Mingyuan Ma
I am a Software Engineer at NVIDIA, working on LLM Inference Workload Performance in the Compute Architecture Group. I conduct end-to-end inference performance benchmarking and analysis, and build automation infrastructure for HPC benchmarking.
I received my M.S. in Data Science from Harvard University, with cross-registration in EECS at MIT. Before that, I completed my B.A. in Computer Science and B.A. in Statistics (double major, with High Distinction honors) at UC Berkeley.
My research interests include LLM Inference Systems, Efficient Deep Learning, and Continual Learning. I have collaborated with SGLang / Sky Computing Lab at UC Berkeley on distributed GPU-sharing inference systems, with Microsoft Research Asia on reasoning frameworks for Small Language Models, and with HPC-AI Lab at NUS on continual learning of vision-language models. I also worked at Moonshot AI (Kimi) on efficient LLM architectures.
News
| Jul 1, 2025 | Joined NVIDIA as a Software Engineer working on LLM Inference Workload Performance in the Compute Architecture Group |
|---|---|
| May 20, 2025 | I graduated from Harvard University with an M.S. in Data Science |
| Jan 20, 2025 | Our paper “Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers” was accepted to ICLR 2025 |
| Jan 15, 2025 | Our paper “Octopus: On-device language model for function calling of software APIs” was accepted to the NAACL 2025 Industry Track (Oral) |
| Oct 1, 2024 | Started collaborating with SGLang / Sky Computing Lab at UC Berkeley on distributed GPU-sharing inference systems |
| Jul 13, 2023 | Our paper “Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models” was accepted to ICCV 2023 |
| May 13, 2023 | I graduated from UC Berkeley Magna cum Laude, double majoring in Statistics and Computer Science |
| Mar 20, 2023 | I will start my Master’s degree in Data Science at Harvard SEAS |