Xiaomi SU7 Reinforcement Learning World Model Explained
Q: What are “Reinforcement Learning” and the “World Model”?
A: Reinforcement Learning is a method for training models. Earlier end-to-end training relied on “Imitation Learning”, in which the model learns by copying recorded driving behavior. With Reinforcement Learning, the model instead explores repeatedly in virtual environments constructed by the World Model, earning points for correct actions and losing points for wrong ones. In this way the model can discover and master better driving strategies on its own, so that assisted driving behavior is better matched to real road conditions.
The key to successful Reinforcement Learning is therefore a high-quality World Model. Xiaomi’s World Model can reconstruct the real world with high fidelity and can also extend it generatively: it can create specific scenarios in digital space and supply training environments for the conditions that affect assisted driving, such as sunny, rainy, snowy, and foggy weather.
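The points-based exploration described above can be illustrated with a toy example. The sketch below is not Xiaomi's system: it uses tabular Q-learning, a textbook Reinforcement Learning algorithm, and a hand-written function stands in for the World Model. The names (simulate_step, score, SAFE_GAP), the gap units, and the point values are all assumptions made for illustration. The agent learns, for each weather condition, whether to brake, hold, or accelerate so that it keeps an assumed safe following gap.

```python
import random

# Hypothetical toy setup; none of these names or numbers come from Xiaomi.
WEATHERS = ["sunny", "rainy", "snowy", "foggy"]
ACTIONS = ["brake", "hold", "accelerate"]
# Assumed safe following gap (abstract units) for each weather condition.
SAFE_GAP = {"sunny": 2, "rainy": 3, "snowy": 4, "foggy": 3}
MAX_GAP = 6


def simulate_step(gap, action):
    """Toy stand-in for a world model: return the next gap to the lead car."""
    delta = {"brake": 1, "hold": 0, "accelerate": -1}[action]
    return max(0, min(MAX_GAP, gap + delta))


def score(weather, gap):
    """Earn points for a correct outcome, lose points for a wrong one."""
    if gap < SAFE_GAP[weather]:
        return -10.0  # too close to the lead car: large penalty
    if gap == SAFE_GAP[weather]:
        return 1.0  # exactly the assumed safe gap: reward
    return -1.0  # needlessly far back: small penalty


def train(episodes=5000, steps=10, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning over (weather, gap) states."""
    rng = random.Random(seed)
    q = {(w, g): {a: 0.0 for a in ACTIONS}
         for w in WEATHERS for g in range(MAX_GAP + 1)}
    for _ in range(episodes):
        # Each episode is a freshly generated scenario.
        weather = rng.choice(WEATHERS)
        gap = rng.randint(0, MAX_GAP)
        for _ in range(steps):
            state = (weather, gap)
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_gap = simulate_step(gap, action)
            # Q-learning target: immediate points plus discounted future value.
            target = (score(weather, next_gap)
                      + gamma * max(q[(weather, next_gap)].values()))
            q[state][action] += alpha * (target - q[state][action])
            gap = next_gap
    return q


def best_action(q, weather, gap):
    """Return the highest-valued action the agent has learned for a state."""
    return max(ACTIONS, key=lambda a: q[(weather, gap)][a])


q_table = train()
for weather in WEATHERS:
    print(weather, [best_action(q_table, weather, g) for g in range(MAX_GAP + 1)])
```

No demonstration data is used here. The policy comes entirely from trial, error, and points, which is the contrast with Imitation Learning drawn above. A real system would replace the hand-written simulate_step with a learned, high-fidelity simulator and the lookup table with a neural network.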
It is worth mentioning that Xiaomi’s World Model work has been recognized at leading international academic venues:
Xiaomi’s ViSE algorithm won the championship and innovation awards in the ICCV 2025 RealADSim Challenge;
Xiaomi’s paper on generative models has been accepted by NeurIPS.
*ICCV is one of the top three international computer vision conferences, and NeurIPS is one of the top three international artificial intelligence and machine learning conferences.