Insights into Top Paper Nominee, “Planning-oriented Autonomous Driving”

A Q&A with the Authors

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Paper Presentation: Tuesday, 20 June, 3:50 p.m. PDT, East Exhibit Halls A-B

The CVPR 2023 paper, “Planning-oriented Autonomous Driving,” introduces UniAD (Unified Autonomous Driving algorithm framework), which distinguishes itself from all previous approaches in terms of its functionality, capability, and openness. Find out more in the following Q&A with the research team.

CVPR: Will you please share a little more about your work and results? How is it different than the standard approaches to date?

We present UniAD (Unified Autonomous Driving algorithm framework), a task-oriented end-to-end framework for autonomous driving. UniAD incorporates a variety of structured autonomous driving tasks spanning perception, prediction, and planning, in pursuit of safe planning. We aim to answer the following questions: (1) which preceding modules are necessary; (2) how to organize the tasks into a coherent system; and (3) how the preceding modules contribute to the final module, planning.

To promote synergistic interaction among these tasks, we devise a unified query design that serves as an interface to synchronize all tasks during training and to transmit valuable safety-critical knowledge to the planner. Notably, on the challenging nuScenes dataset, UniAD surpasses previous end-to-end methods by a large margin, achieving state-of-the-art results in all investigated tasks. It even outperforms LiDAR-based frameworks on the planning task while using vision-based input only. We also show that the planning-oriented design can potentially compensate for upstream errors at the planner, for example by generating reasonable planning outcomes even when a critical vehicle goes undetected in an upstream module.

UniAD distinguishes itself from all previous approaches in terms of its functionality, capability, and openness.

Functionality: In contrast to independently trained standalone modules or parallel multi-task learning designs, UniAD performs five crucial driving tasks jointly to assist planning: detection, tracking, mapping, motion forecasting, and occupancy prediction. Compared to previous end-to-end research, it exposes more representative intermediate outputs for the different tasks and is more interpretable.
Capability: Benefiting from our unified query design, UniAD optimizes all task nodes jointly and demonstrates impressive capability in coordinating tasks, which contributes to safe planning.
Openness: UniAD is the first open-source full-stack autonomous driving algorithm worldwide, aiming to benefit future research and development of autonomous driving in both academia and industry. For more details, see https://opendrivelab.com.
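
To make the idea of jointly run, planning-oriented task nodes more concrete, here is a minimal sketch in plain Python. It is not the UniAD implementation: the names `TASK_ORDER`, `make_node`, `run_pipeline`, and `joint_loss`, the toy query vectors, and the loss weights are all assumptions made for illustration. It only shows the general pattern the answer describes, in which each task node can read the outputs of the nodes before it, planning comes last, and a single combined loss ties all tasks together during training.

```python
from typing import Callable, Dict, List

# Toy stand-in for "queries": task name -> small feature vector.
Queries = Dict[str, List[float]]

# Hypothetical ordering of the tasks named in the text, ending with planning.
TASK_ORDER: List[str] = [
    "detection", "tracking", "mapping",
    "motion_forecasting", "occupancy_prediction", "planning",
]


def make_node(name: str) -> Callable[[Queries], List[float]]:
    """Build a toy task node that summarizes everything upstream of it."""
    def node(upstream: Queries) -> List[float]:
        # Toy rule (an assumption): output [number of upstream entries, sum of their values].
        total = sum(sum(q) for q in upstream.values())
        return [float(len(upstream)), total]
    node.__name__ = name
    return node


def run_pipeline(task_order: List[str], sensor_feature: List[float]) -> Queries:
    """Run task nodes in order; each node sees all queries produced so far."""
    queries: Queries = {"sensor": sensor_feature}
    for name in task_order:
        queries[name] = make_node(name)(dict(queries))
    return queries


def joint_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-task losses into one scalar so all nodes are optimized together."""
    return sum(weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    outputs = run_pipeline(TASK_ORDER, [1.0, 2.0])
    # The planning node sees the sensor feature plus all five upstream tasks.
    print(outputs["planning"][0])
    print(joint_loss({"mapping": 1.0, "planning": 2.0}, {"mapping": 0.5, "planning": 2.0}))
```

In this sketch the planner is simply the last node in the chain, so any change to an upstream node changes what the planner receives, which is the sense in which the design is planning-oriented.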

CVPR: How did your model outperform other options? What was the key factor in these results?

The impressive performance of UniAD can be attributed to three major factors: a unified query design, end-to-end joint optimization, and exploration of complex relations in the scene.

Unified query design: Multiple groups of queries are optimized to abstract task-specific agent-level or scene-level knowledge in each task node. These queries efficiently transmit that knowledge to downstream nodes and further assist planning.
End-to-end joint optimization: All task nodes are connected by unified queries and optimized together to improve task coordination and feature alignment across nodes.
Complex interactions in the driving scene: All task modules are designed using transformers, with a variety of attention mechanisms to model the interactions between dynamic agents and static map elements in the scene.
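
As a rough illustration of how a query can gather knowledge from other queries through attention, the sketch below implements standard scaled dot-product cross-attention in plain Python. It is a generic textbook formulation, not UniAD's actual modules: the function names, the two-dimensional toy vectors, and the idea of a single "planning query" attending to two "upstream queries" are assumptions chosen to keep the example small.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over a set of key/value pairs."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted sum of the value vectors, one output dimension at a time.
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return attended, weights


if __name__ == "__main__":
    # Hypothetical toy data: one planning query and two upstream queries.
    planning_query = [1.0, 0.0]
    upstream_keys = [[1.0, 0.0], [0.0, 1.0]]
    upstream_values = [[10.0, 0.0], [0.0, 10.0]]
    attended, weights = cross_attention(planning_query, upstream_keys, upstream_values)
    print(weights)   # the first upstream query receives the larger weight
    print(attended)
```

The attention weights sum to one, so the downstream query receives a blend of upstream information that leans toward whichever upstream query matches it best.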

CVPR: So, what’s next? What do you see as the future of your research?

We envision several directions for future work. Given the strong motivation for applying autonomous driving technology, a critical next step is to deploy UniAD in the real world. This will require engineering efforts such as finetuning, distillation, and quantization to make it deployable. While UniAD has been primarily benchmarked on the nuScenes public dataset, deployment in real-world scenarios is likely to expose new challenges and opportunities for further enhancement.

In terms of feasible enhancements to UniAD, the design space of each task module and of the learning scheme is still worth exploring. For instance, the implementation of MapFormer could be upgraded to more advanced topology reasoning algorithms to provide richer geometry instructions for autonomous vehicles. We are glad to see that some research on this is already under way.

We have observed that the perception performance of UniAD is affected by the long-tail data distribution of the nuScenes dataset. One promising direction is to scale up the training data to cover more scenarios and diverse driving situations.

We view UniAD as the first step towards a large foundation model for autonomous driving. A potential direction is to jointly scale up the data and model capacity to build such a foundation model with visual and language understanding and decision intelligence. This would require significant research efforts but has the potential to revolutionize the field of autonomous driving.

CVPR: What more would you like to add?

We want to emphasize the importance of our planning-oriented philosophy for building reliable autonomous driving systems. This philosophy focuses on effectively coordinating multiple tasks and transmitting upstream knowledge to the final planner to make safe decisions.

Given the success of large foundation models, we believe that high-level decision intelligence in dynamic environments could emerge from large-scale training with well-organized tasks and data. The modularized end-to-end training mechanism of UniAD has the potential to be developed into a pre-training framework that serves all autonomous driving applications.

Annually, CVPR recognizes top research in the field through its prestigious “Best Paper Awards.” This year, from more than 9,000 paper submissions, the CVPR 2023 Paper Awards Committee selected 12 candidates for the coveted honor of Best Paper. Join us for the Award Session on Wednesday, 21 June at 8:30 a.m. to find out which nominees take home the distinction of “Best Paper” at CVPR 2023.