

Poster

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation

Shahad Albastaki · Anabia Sohail · Iyyakutti Iyappan Ganapathi · Basit Alawode · Asim Khan · Sajid Javed · Naoufel Werghi · Mohammed Bennamoun · Arif Mahmood

Sun 15 Jun 8:30 a.m. PDT — 10:30 a.m. PDT

Abstract:

In Computational Pathology (CPath), the introduction of Vision-Language Models (VLMs) has opened new avenues for research, focusing primarily on aligning image-text pairs at a single magnification level. However, this approach may be insufficient for tasks like cancer subtype classification, tissue phenotyping, and survival analysis due to the limited level of detail that a single-resolution image can provide. Addressing this, we propose a novel multi-resolution paradigm that leverages Whole Slide Images (WSIs) to extract histology patches at multiple resolutions and generates corresponding textual descriptions through an advanced CPath VLM. This method captures a broader range of information and, supported by novel loss functions, enriches feature representations, improves discriminative ability, and enhances generalization across resolutions. Pre-trained on a comprehensive TCGA dataset with 34 million image-language pairs at various resolutions, our fine-tuned model outperforms State-Of-The-Art (SOTA) counterparts across multiple datasets and tasks, demonstrating its effectiveness in CPath. The code is available on GitHub at xxx.
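The abstract does not specify the loss functions, so the sketch below is only one plausible reading of multi-resolution image-text alignment: a CLIP-style symmetric InfoNCE contrastive loss computed per magnification level and averaged. The encoder interfaces, the resolution keys ('5x', '10x', '20x'), and all function names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of multi-resolution image-text contrastive pre-training.
# Assumes image_encoder / text_encoder are callables returning (B, D) embeddings
# for a batch of paired patches and captions at a given magnification.
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched image-text pairs lie on the diagonal; contrast in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multi_resolution_loss(image_encoder, text_encoder,
                          patches_by_res, tokens_by_res):
    """Average the contrastive loss over patch/caption pairs per magnification."""
    total = 0.0
    for res in patches_by_res:                     # e.g. keys '5x', '10x', '20x'
        img_emb = image_encoder(patches_by_res[res])
        txt_emb = text_encoder(tokens_by_res[res])
        total = total + info_nce(img_emb, txt_emb)
    return total / len(patches_by_res)
```

Averaging a per-resolution contrastive term is one simple way to encourage representations that stay discriminative across magnifications; the paper's actual objectives may combine or weight resolutions differently.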
