Edge AI in Action: Mastering On-Device Inference
Abstract
Edge AI enables real-time inference directly on devices, but achieving high performance and efficiency requires optimization and deployment techniques tailored to heterogeneous hardware. This tutorial provides a hands-on guide to on-device inference, focusing on end-to-end workflows for optimizing and deploying deep learning models on edge platforms such as Qualcomm Snapdragon and NVIDIA Jetson. It covers key techniques including model compression, quantization, and hardware-aware optimization, along with practical tools such as Qualcomm's Snapdragon Neural Processing Engine (SNPE) and NVIDIA's TensorRT. Through comparative analysis and real-world case studies, the tutorial highlights best practices for achieving efficient, low-latency performance in applications ranging from computer vision to multimodal AI.