Workshop: 5th International Workshop on Event-based Vision
BRAT: Bidirectional Relative Positional Attention Transformer for Event-based Eye tracking
Yuliang Wu · Han Han · Jinze Chen · Wei Zhai · Yang Cao · Zheng-Jun Zha
Event-based eye tracking, with its neuromorphic triggering mechanism, offers high temporal resolution and low power consumption, making it a promising technology for future applications. However, this triggering mechanism is a double-edged sword: while it provides exceptional temporal precision for eye tracking, it also introduces challenges such as the loss of static information, increased complexity of dynamic signals, and irregular spatio-temporal distribution of events. To tackle these challenges, this paper presents a Bidirectional Relative Positional Attention Transformer (BRAT) architecture, designed to fully exploit the spatio-temporal sequence information within event streams, enabling stable and precise eye tracking. The proposed network is composed of a spatial encoder and a temporal decoder. The former utilizes a CNN to extract geometric structural features from event representations, while the latter combines a Bi-GRU block and the BRAT block to analyze temporal motion patterns and accurately localize pupil positions. Furthermore, we propose a multi-time-step training strategy, which improves the model's stability and accuracy by incorporating event representations across multiple time spans as input. Experiments on the ThreeET-plus benchmark demonstrate that BRAT tracks complex eye movement patterns accurately and stably, achieving state-of-the-art performance.
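The core mechanism named in the title, bidirectional relative positional attention, can be illustrated with a minimal sketch: standard (non-causal, hence bidirectional) scaled dot-product attention whose scores are shifted by a learned bias indexed by the relative temporal offset between time steps. This is a generic illustration of relative positional attention, not the paper's actual BRAT block; all names, shapes, and the form of the bias table are assumptions.

```python
import numpy as np

def relative_attention(x, w_q, w_k, w_v, rel_bias):
    """Single-head bidirectional attention with a relative position bias.

    x        : (T, d) sequence of per-time-step features
    w_q/k/v  : (d, d) projection matrices (illustrative, untrained)
    rel_bias : (2T-1,) learned bias, one scalar per relative offset
               j - i in [-(T-1), T-1]
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(d)
    # Map each (i, j) pair to its relative-offset index in [0, 2T-2].
    idx = np.arange(T)
    rel = idx[None, :] - idx[:, None] + (T - 1)
    scores = scores + rel_bias[rel]
    # No causal mask: every step attends to past and future (bidirectional).
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

# Tiny usage example with random, untrained parameters.
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
rel_bias = rng.standard_normal(2 * T - 1) * 0.1
out = relative_attention(x, w_q, w_k, w_v, rel_bias)
```

Because the bias depends only on the offset `j - i`, the attention pattern is translation-equivariant along the time axis, which is a common motivation for relative (rather than absolute) positional encodings on irregular event streams.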