Mingyuan Ma


I am a Software Engineer at NVIDIA, working on LLM Inference Workload Performance in the Compute Architecture Group. I conduct end-to-end inference performance benchmarking and analysis, and build automation infrastructure for HPC benchmarking.

I received my M.S. in Data Science from Harvard University, with cross-registration in EECS at MIT. Before that, I completed my B.A. degrees in Computer Science and Statistics (double major, with High Distinction honors) at UC Berkeley.

My research interests include LLM Inference Systems, Efficient Deep Learning, and Continual Learning. I have collaborated with SGLang / Sky Computing Lab at UC Berkeley on distributed GPU-sharing inference systems, with Microsoft Research Asia on reasoning frameworks for Small Language Models, and with HPC-AI Lab at NUS on continual learning of vision-language models. I also worked at Moonshot AI (Kimi) on efficient LLM architectures.



News

Jul 1, 2025 Joined NVIDIA as a Software Engineer working on LLM Inference Workload Performance in the Compute Architecture Group :computer:
May 20, 2025 I graduated from Harvard University with an M.S. in Data Science :mortar_board:
Jan 20, 2025 Our paper “Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers” has been accepted to ICLR 2025 :star2:
Jan 15, 2025 Our paper “Octopus: On-device language model for function calling of software APIs” has been accepted to the NAACL 2025 Industry Track (Oral) :sparkles:
Oct 1, 2024 Started collaborating with SGLang / Sky Computing Lab at UC Berkeley on distributed GPU-sharing inference systems :rocket:
Jul 13, 2023 Our paper “Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models” has been accepted to ICCV 2023 :star2:
May 13, 2023 I graduated from UC Berkeley with High Distinction, double majoring in Statistics and Computer Science :star2:
Mar 20, 2023 I will start my Master’s degree in Data Science at Harvard SEAS :sparkles:


Publications

  1. Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
    Zangwei Zheng, Mingyuan Ma, Kai Wang, and 3 more authors
    ICCV 2023, 2023
  2. Octopus: On-device language model for function calling of software APIs
    Wei Chen, Zhiyuan Li, and Mingyuan Ma
    NAACL 2025 Industry Track (Oral), 2024
  3. Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
    Zhenting Qi, Mingyuan Ma, Jiahang Xu, and 3 more authors
    ICLR 2025, 2024
  4. ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
    Huandong Chang, Zicheng Ma, Mingyuan Ma, and 4 more authors
    arXiv preprint, 2025
  5. Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
    Shan Yu, Jiarong Xing, Yifan Qiao, and 4 more authors
    Under Review at OSDI 2026, 2025