Tutorial Wed, Jun 3, 2026 • AM 702

Edge AI in Action: Mastering On-Device Inference

Fabricio Batista Narcizo · Elizabete Munzlinger · Sai Narsi Reddy Donthi Reddy · Shan Ahmed Shaffi

Project Page

Abstract

Edge AI enables real-time, low-latency inference directly on devices, but achieving high performance and efficiency requires specialized optimization and deployment techniques tailored to heterogeneous hardware. This tutorial provides a hands-on guide to on-device inference, focusing on end-to-end workflows for optimizing and deploying deep learning models on leading edge AI platforms such as Qualcomm Snapdragon and NVIDIA Jetson. It covers key techniques including model compression, quantization, and hardware-aware optimization, along with practical tools such as the Snapdragon Neural Processing Engine (SNPE) and TensorRT. Through comparative analysis and real-world case studies, the tutorial highlights best practices for achieving efficient, low-latency performance in applications ranging from computer vision to multimodal AI.