Classification at scale
High-dim data (20K+ features), weakly-supervised and hierarchical learning. Partial labels, novel-class discovery, single-cell RNA-seq.
I engineer ML systems that survive the distance from research to production. PhD in applied mathematics (single-cell classification); now building LLM evaluation frameworks, agent training loops, and high-dimensional classification pipelines.
Doctorate: 2024
Peer-reviewed: 4 papers
PLOS Comp Bio: 2×
Focus: LLM eval & agents
Three years teaching models to classify cells they'd never seen. The same muscle — rigorous evaluation, principled uncertainty, systems that survive messy real data — is what makes LLM agents useful in production.
RAG, prompt engineering, RL loops for agent training. Instrumented evaluation, reward-hacking prevention, reliability-focused design.
Docker, CI/CD, experiment tracking, Streamlit dashboards, model monitoring. Research systems built to be reproducible and deployed — internal tools serving researcher workflows.
An LLM agent trained via a reinforcement learning loop to autonomously fit Kolmogorov-Arnold Networks for symbolic regression. The agent receives structured feedback on generalization, OOD extrapolation, and expression parsimony, with no gradient flowing through the LLM.
Best score (GPT-OSS 120B): 0.694
Threshold 0.65 reached at round 5 of 20
→ Evaluation loop architecture for autonomous agent training. A building block for research systems where LLMs train specialized models without human supervision.
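The loop described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `Feedback` fields mirror the three feedback axes named in the text, but the weights, the `propose`/`evaluate` callables, and the 0-to-1 scaling are all assumptions made for the example. The only values taken from the page are the 0.65 threshold and the 20-round budget.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """Structured feedback for one round; each field is assumed to lie in [0, 1]."""
    generalization: float  # fit quality on held-out in-distribution data
    extrapolation: float   # fit quality on out-of-distribution inputs
    parsimony: float       # higher means a shorter symbolic expression

    def score(self, weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
        # Hypothetical weighting: the real scoring rule is not given in the text.
        w_gen, w_ood, w_par = weights
        return (w_gen * self.generalization
                + w_ood * self.extrapolation
                + w_par * self.parsimony)


def run_loop(
    propose: Callable[[List[Tuple[Any, Feedback]]], Any],
    evaluate: Callable[[Any], Feedback],
    threshold: float = 0.65,
    max_rounds: int = 20,
) -> Tuple[float, Optional[int]]:
    """Run the feedback loop; return (best score, first round reaching threshold).

    `propose` stands in for the LLM agent: it only reads the history of past
    configurations and their feedback, so no gradient flows through it.
    `evaluate` stands in for fitting a model with that configuration and
    measuring the three feedback axes.
    """
    history: List[Tuple[Any, Feedback]] = []
    best = 0.0
    first_hit: Optional[int] = None
    for round_idx in range(1, max_rounds + 1):
        config = propose(history)      # agent picks the next configuration
        feedback = evaluate(config)    # fit and measure, outside the LLM
        history.append((config, feedback))
        best = max(best, feedback.score())
        if first_hit is None and feedback.score() >= threshold:
            first_hit = round_idx      # remember when the bar was first cleared
    return best, first_hit
```

Keeping `propose` and `evaluate` as plain callables means the same loop can be run against a stubbed agent in tests before any LLM is attached.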
View on GitHub →
Systematic framework for diagnosing LLM failures and testing optimization strategies in production. Case study on medical entity extraction; modular and extensible to other use cases.
→ Reduced inference cost by 10× with accuracy within one percentage point (91% vs 92%) on medical entity extraction.
Pick the right model for the right complexity and cost.
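One way to read "the right model for the right complexity and cost" is as a routing rule: send each input to the cheapest model that can handle it. The sketch below illustrates that idea only. The tier names, costs, and the word-count heuristic are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical model label
    cost_per_call: float  # illustrative cost, arbitrary units
    max_complexity: int   # highest complexity level this tier handles well


def estimate_complexity(text: str) -> int:
    """Toy heuristic (an assumption): longer inputs are treated as harder."""
    n_words = len(text.split())
    if n_words <= 20:
        return 1
    if n_words <= 100:
        return 2
    return 3


def route(text: str, tiers: Sequence[ModelTier]) -> ModelTier:
    """Return the cheapest tier whose capability covers the estimated complexity."""
    need = estimate_complexity(text)
    eligible = [t for t in tiers if t.max_complexity >= need]
    if not eligible:
        raise ValueError("no model tier can handle this input")
    return min(eligible, key=lambda t: t.cost_per_call)
```

In practice the complexity estimate would come from measured failure rates per input type rather than from length alone; the routing step stays the same.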
Deep learning models for biological data with 20K+ features. Custom PyTorch architectures with automated feature selection and dimensionality reduction for scRNA-seq classification.
→ State-of-the-art performance on real scRNA-seq datasets. Benchmarked against existing methods in peer-reviewed work (see Publications).
End-to-end ML solution for energy demand forecasting with a Streamlit dashboard for real-time monitoring and decision-making.
→ Full pipeline from data ingestion to deployed dashboard.
View project →
Machine learning models for classifying single-cell RNA sequencing data under partial label learning. New methods benchmarked against adapted existing approaches on real and synthetic datasets.
An extreme learning scenario in which models must classify with no label information. A hierarchical assumption on the labels supplies the missing structure, making predictions on unlabelled data possible.
Contributed to an interactive Plotly/Dash visualization tool for random-walk simulations on large-scale networks.
Extension of hierarchical classification to the weakly-supervised problem. Three algorithms benchmarked on C. elegans transcriptomic profiles.
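For readers unfamiliar with partial label learning, which several of the publications above address, the setting can be illustrated with a common textbook loss: each sample comes with a set of candidate labels, and the model is rewarded for placing probability mass anywhere inside that set. This is a generic sketch and not one of the methods proposed in the papers; the function names are chosen for the example.

```python
import math
from typing import Collection, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_label_loss(logits: Sequence[float], candidates: Collection[int]) -> float:
    """Negative log of the probability mass assigned to the candidate label set.

    With a single candidate this reduces to ordinary cross-entropy; with every
    class as a candidate the sample carries no information and the loss is zero.
    """
    if not candidates:
        raise ValueError("candidate set must contain at least one class index")
    probs = softmax(logits)
    mass = sum(probs[c] for c in set(candidates))
    return -math.log(mass)
```

A larger candidate set always gives a smaller or equal loss for the same logits, which matches the intuition that an ambiguous label constrains the model less.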
PhD in Mathematics and Computer Science
Master 2 Mathematics and Applications (CEPS)