Insights into “Data-driven Feature Tracking for Event Cameras”
A Q&A with the Authors
Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza
Paper Presentation: Thursday, 22 June, 3:40 p.m. PDT, East Exhibit Halls A-B
For the authors of the CVPR 2023 paper “Data-driven Feature Tracking for Event Cameras,” the key to improving feature tracking with event cameras lies in the method’s data-driven nature, a novel frame attention module, and a unique self-supervision scheme designed to improve generalization. The following Q&A explores just how this technique works.
CVPR: Will you please share a little more about your work and results? How is it different from the standard approaches to date?
Our work primarily focuses on improving feature tracking using event cameras, a type of bio-inspired vision sensor that provides information asynchronously when the brightness change at an individual pixel crosses a certain threshold. Event cameras are particularly advantageous due to their high temporal resolution, increased resilience to motion blur, low power consumption, and sparse output. However, feature tracking techniques for event cameras have so far relied on classical model assumptions, which often lead to poorer performance in noisy conditions or in diverse scenarios.
Our approach differs from existing methods as it introduces the first data-driven feature tracker for event cameras. Instead of depending on handcrafted techniques, which often require extensive parameter tuning and fall short under varying conditions, we leverage machine learning to build a more flexible and generalizable tracker.
We developed a neural network that localizes a template patch from a grayscale image in subsequent event patches, thereby allowing for high-quality feature tracking at an arbitrary frequency. Our network architecture includes a correlation volume for assignment and employs recurrent layers for maintaining long-term consistency. We also introduced a unique frame attention module that shares information across different feature tracks in an image, further enhancing the robustness of the tracking process.
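To make that structure concrete, here is a minimal PyTorch-style sketch of such a tracker. It is a hypothetical illustration only: the layer names, patch size, event-patch representation, and state dimensions are assumptions, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class EventTrackerSketch(nn.Module):
    """Toy tracker: encode a grayscale template patch and an event patch,
    correlate them, and update a recurrent track state to predict displacement."""
    def __init__(self, channels=32, patch=31, event_bins=5, hidden_dim=128):
        super().__init__()
        # Encoders for the grayscale template patch and the event patch
        # (the event patch is assumed to be a small voxel-grid-like tensor).
        self.template_enc = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.event_enc = nn.Sequential(nn.Conv2d(event_bins, channels, 3, padding=1), nn.ReLU())
        # Recurrent state maintains long-term consistency for each feature track.
        self.gru = nn.GRUCell(patch * patch, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2)   # predicted (dx, dy) displacement

    def forward(self, template, events, hidden=None):
        t = self.template_enc(template)        # (B, C, H, W)
        e = self.event_enc(events)             # (B, C, H, W)
        # Per-pixel correlation between template and event features, flattened
        # into one cost vector per track (a much-simplified "correlation volume").
        corr = (t * e).sum(dim=1).flatten(1)   # (B, H*W)
        hidden = self.gru(corr, hidden)        # recurrent update of the track state
        return self.head(hidden), hidden

# Example: 8 feature tracks, 31x31 patches, 5 temporal event bins.
tracker = EventTrackerSketch()
disp, state = tracker(torch.randn(8, 1, 31, 31), torch.randn(8, 5, 31, 31))
```

The actual network is richer than this sketch; the point is only to show how a grayscale template patch, an event patch, and a recurrent per-track state fit together.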
CVPR: How did your model outperform other options? What was the key factor in these results?
The key to our model's superior performance lies in its data-driven nature, the novel frame attention module, and a unique self-supervision scheme designed to improve generalization.
The data-driven nature of our approach allows the tracker to learn from data and hence adapt to various scenarios without requiring extensive parameter tuning, a common limitation in standard methods. This approach enables our model to generalize across diverse environments and conditions, significantly enhancing performance.
The frame attention module helps share information across different feature tracks in one image. This shared information aids in accurate feature tracking, enhancing the overall robustness and quality of the tracking process.
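Conceptually, this resembles self-attention over the set of track states within one frame. The following is a minimal sketch under that assumption, using a standard multi-head attention layer with illustrative dimensions rather than the paper’s exact module:

```python
import torch
import torch.nn as nn

class FrameAttentionSketch(nn.Module):
    """Toy frame attention: all feature tracks in one frame attend to each other."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, track_states):
        # track_states: (1, num_tracks, dim) -- the states of all tracks in one image.
        shared, _ = self.attn(track_states, track_states, track_states)
        return self.norm(track_states + shared)  # residual fusion of the shared information

# Example: 64 feature tracks, each with a 128-dimensional state.
states = torch.randn(1, 64, 128)
fused = FrameAttentionSketch()(states)           # same shape, now informed by other tracks
```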
Furthermore, we use a self-supervision scheme based on 3D point triangulation using camera poses. This self-supervision strategy allows us to fine-tune our tracker on real data after initially training it on a synthetic optical flow dataset.
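Roughly, the supervision signal can be illustrated with standard direct linear transform (DLT) triangulation; this is a simplified, assumed formulation for intuition, not our exact loss. Predicted 2D track points from several views are triangulated using the known camera projection matrices, and reprojecting the resulting 3D point yields a target in each view:

```python
import numpy as np

def triangulate(points_2d, projections):
    """DLT triangulation. points_2d: (N, 2) pixel coordinates of one track in N views;
    projections: (N, 3, 4) camera projection matrices from the known poses."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]                          # homogeneous -> Euclidean 3D point

def reprojection_targets(points_2d, projections):
    """Reproject the triangulated 3D point to obtain a supervisory target per view."""
    X_h = np.append(triangulate(points_2d, projections), 1.0)
    targets = []
    for P in projections:
        x = P @ X_h
        targets.append(x[:2] / x[2])
    return np.array(targets)                     # (N, 2) targets for the predicted tracks
```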
Combining these elements resulted in our data-driven tracker outperforming the existing approaches in relative feature age by up to 120%, with the lowest latency. The performance gap further increased to 130% when we adapted our tracker to real data using our self-supervision strategy.
Another advantage over prior event-based trackers is that our approach can leverage parallel computation on GPUs by avoiding asynchronous computation. As a result, even without optimizing the code for deployment, our method achieved faster inference than existing methods, further highlighting the effectiveness of our approach.
CVPR: So, what’s next? What do you see as the future of your research?
There are numerous potential extensions to our work. In this paper, we predetermined which features we intended to track. An intriguing next step would be to develop an algorithm that can autonomously detect and decide which features should be tracked based on the specific needs of downstream applications. For instance, in the context of visual odometry, we could integrate our approach into an end-to-end pipeline. This could facilitate the optimization of feature detection and tracking directly, further enhancing the system's overall effectiveness and efficiency.
In addition, this research could be invaluable for robotics applications, like fast navigation with drones. Implementing our data-driven feature tracker could help improve the real-time navigation and obstacle-avoidance capabilities of these autonomous systems.
Finally, another intriguing direction would be exploring ways to reduce bandwidth consumption. We could achieve this by triggering frames only when absolutely necessary, for example, when the system loses too many tracks. This approach could strike a balance between ensuring high-quality feature tracking and managing resource usage, a critical consideration for mobile and power-sensitive applications.
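As a purely illustrative sketch of such a triggering rule (our own simplification, with hypothetical names and threshold):

```python
# Hypothetical frame-triggering heuristic: request a new intensity frame only
# when fewer than a given fraction of the originally initialized tracks survive.
def should_trigger_frame(num_active_tracks: int, num_initial_tracks: int,
                         min_ratio: float = 0.5) -> bool:
    return num_active_tracks < min_ratio * num_initial_tracks
```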
Annually, CVPR recognizes top research in the field through its prestigious “Best Paper Awards.” This year, from more than 9,000 paper submissions, the CVPR 2023 Paper Awards Committee selected 12 candidates for the coveted honor of Best Paper. Join us for the Award Session on Wednesday, 21 June at 8:30 a.m. to find out which nominees take home the distinction of “Best Paper” at CVPR 2023.