Ruofeng Yang 杨若峰

Ph.D. Candidate · Shanghai Jiao Tong University, John Hopcroft Center for Computer Science

Bio

I am a Ph.D. candidate at the John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, advised by Prof. Shuai Li (李帅). I am currently a visiting research intern at the School of Computing, NUS (Jan – Jun 2026), supervised by Prof. Xiaokui Xiao (IEEE Fellow) and funded by the China Scholarship Council's AI Excellence Program.

My research focuses on the theory of diffusion models and reinforcement learning — from sample complexity and consistency models to RL post-training for 3D and video generation (Tencent Rhino-Bird, Meituan Longcat-Video), and increasingly on long-horizon agents and multi-agent auto-research. I created ARIS — Auto Research in Sleep, an open-source long-horizon multi-agent auto-research platform (10K+ GitHub stars, HuggingFace Daily Papers #1, VALSE 2026 talk).

I expect to graduate in 2027 and am seeking industrial research positions. I am also open to internship opportunities at any time. If you're interested, please feel free to reach out via email or WeChat (yrf13618645542).

Research Experience

Research Intern at Meituan · Longcat-Video Team · Meituan Beidou Talent Program · 2025-06 – 2025-12
Research Intern at Tencent IEG · Rhino-Bird Elite Program · 2024-03 – 2024-09

Research Interests

Education

Ph.D. in Computer Science · Shanghai Jiao Tong University · advised by Prof. Shuai Li (李帅) · 2022-09 – 2027-06 (expected)
B.E., Naval Architecture, Ocean & Civil Engineering · Shanghai Jiao Tong University · 2018-09 – 2022-06

News

2026-01 · Started visiting research internship at NUS School of Computing (China Scholarship Council AI Excellence Program, supervised by Prof. Xiaokui Xiao).
2026-04 · ARIS reached 10K+ GitHub stars; technical report featured as HuggingFace Daily Papers #1 (arXiv:2605.03042).
2026-03 · ARIS won AI Digital Crew Project of the Day.
2026 · Invited talk at VALSE 2026 on ARIS (Continual Learning and Continual Agents session).
2026 · Two papers at ICLR 2026 (MoE structure for diffusion; few-shot pretraining warm-up).
2025 · "Improved Discretization Complexity Analysis of Consistency Models" accepted at ICML 2025.
2025 · Awarded the National Scholarship for Ph.D. Students from the Ministry of Education, China.
2025 · Joined the Meituan Longcat-Video team (Beidou Talent Program); co-authored DFS-GRPO (SoTA T2I post-training on UniGenBench).

Selected Publications

Work on diffusion models can be roughly divided into four areas: pretraining, supervised fine-tuning, RL post-training, and sampling algorithm design. My work covers all four, plus the long-horizon multi-agent auto-research direction.

Long-Horizon Multi-Agent Auto-Research

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
Ruofeng Yang, Yongcan Li, Shuai Li
arXiv preprint arXiv:2605.03042, 2026

Diffusion Model MoE Structure and Pretraining

Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
Ruofeng Yang*, Yongcan Li*, Bo Jiang, Chen Cheng, Shuai Li (* equal contribution)
ICLR, 2026

Diffusion Model Supervised Fine-tuning

Few-Shot Diffusion Models Escape the Curse of Dimensionality
Ruofeng Yang, Bo Jiang, Cheng Chen, Ruinan Jin, Baoxiang Wang, Shuai Li
NeurIPS, 2024
Evaluating the Role of Great Pre-trained Diffusion Models in Few-shot Phase: Warm-up and Acceleration
Ruofeng Yang*, Yongcan Li*, Bo Jiang, Chen Cheng, Shuai Li (* equal contribution)
ICLR 2026 DeLTa Workshop, 2026

Diffusion Model Post-Training

DFS-GRPO: Reward Guided Tree Search Leads to Provable Improvement in Diffusion Models
Ruofeng Yang, et al.
Preprint, in submission, 2025
Meituan Longcat-Video team. In our first project, we studied how to train diffusion models efficiently with the GRPO algorithm and proposed DFS-GRPO. Using FLUX.1-dev as the base model (proposed in 2024/05), DFS-GRPO achieves SoTA performance on UniGenBench among all T5-based T2I models and ranks 5th among all open-source T2I models. We then turned to improving physical properties in video generation via post-training algorithms, as well as reward modeling with V-JEPA2 / Qwen 3.5 and training-free methods.
Contrastive Guidance and Feedback: A Suitable Way to Improving 3D Consistency of Multi-view Diffusion Model
Ruofeng Yang, Yaqing Zhang, Le Wan, Shuai Li
Preprint, in submission, 2024
Tencent IEG · Rhino-Bird Research Elite Program. An online DPO-style algorithm with group-relative reward design, aimed at improving the 3D consistency of multi-view diffusion models. We first model the 3D generation process from a theoretical perspective and prove that contrastive guidance is suitable for 3D generation. We then train a 3D consistency feedback model using the contrastive method. Finally, we fine-tune the pre-trained model with this contrastive 3D consistency feedback and direct preference optimization (DPO), achieving strong performance.

Diffusion Model Sampling and Condition Generation

Improved Discretization Complexity Analysis of Consistency Models: Variance Exploding Forward Process and Decay Discretization Scheme
Ruofeng Yang, Bo Jiang, Cheng Chen, Shuai Li
ICML, 2025
The Polynomial Iteration Complexity for Variance Exploding Diffusion Models: Analyzing SDE and ODE Samplers
Ruofeng Yang, Bo Jiang, Shuai Li
AISTATS, 2025
Leveraging Drift to Improve Sample Complexity of Variance Exploding Diffusion Models
Ruofeng Yang, Zhijie Wang, Bo Jiang, Shuai Li
NeurIPS, 2024
Elucidating Rectified Flow with Deterministic Sampler: Polynomial Discretization Complexity for Multi and One-step Models
Ruofeng Yang, Zhaoyu Zhu, Bo Jiang, Chen Cheng, Shuai Li
Preprint, arXiv:2508.08735, 2025
Elucidating Guidance in Variance Exploding Diffusion Models: Fast Convergence and Better Diversity
Ruofeng Yang, Qiuyi Yu, Bo Jiang, Chen Cheng, Shuai Li
ICLR 2026 2nd DeLTa Workshop, 2025

Representation and Reinforcement Learning Theory

Understanding Representation Learnability of Nonlinear Self-Supervised Learning
Ruofeng Yang, Xiangyuan Li, Bo Jiang, Shuai Li
AAAI, 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
NeurIPS, 2023
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li
ICLR, 2023
Full list on Google Scholar.

Awards

National Scholarship for Ph.D. Students — Ministry of Education, China, 2025
Best Paper, 2nd Prize — Few-Shot Diffusion Models — TongAI 2025, 2025
Excellent Doctoral Academic Forum, 3rd Prize — John Hopcroft Center, SJTU, 2024
Outstanding Graduate — Shanghai Jiao Tong University, 2022
Outstanding Winner (top 0.1%) — Mathematical Contest in Modeling, 2020
Tung Scholarship — Tung Foundation (Hong Kong), 2020
Yang You Scholarship — Shanghai Jiao Tong University, 2019, 2021

Blogs & Tutorials & Talks

Diffusion and Representation Learning and Manifold Learning (Theory & Application) [html] [pdf] [slides]
A Simple Introduction to Diffusion Model (Application) [pdf]
Why Rectified Flow is Better? Elucidating VP, VE, and RF-based Diffusion Models (Theory) [slides]
ARIS: A Cross-Model, Persistent, Adversarial Multi-Agent Autonomous Research System · VALSE 2026 · Continual Learning and Continual Agents session, 2026 [slides]
Why Rectified Flow is Better? Elucidating VP, VE, and RF-based Diffusion Models — CSML 2025, 2025
Few-Shot Diffusion Models Escape the Curse of Dimensionality — TongAI 2025 (Best Paper, 2nd Prize), 2025

Professional Services

Conference Reviewer for:

International Conference on Machine Learning (ICML) 2025, 2026
International Conference on Learning Representations (ICLR) 2025, 2026
International Conference on Artificial Intelligence and Statistics (AISTATS) 2025
Neural Information Processing Systems (NeurIPS) 2024, 2025
Autonomous Agents and Multiagent Systems (AAMAS) 2022

Teaching

AI3601 Reinforcement Learning (Undergraduate) · Teaching Assistant · Spring 2023 · 2024 · 2025
AI2615 Algorithm Design and Analysis · Teaching Assistant · Spring 2022
CS445 Combinatorics (Undergraduate) · Teaching Assistant · Fall 2020 · 2021