Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vocabulary Semantic Segmentation

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

We propose a training-free framework that aligns CLIP’s global context with local window features to improve open-vocabulary semantic segmentation, enabling dense predictions on unseen concepts without additional supervision.

Recommended citation: ByeongCheol Lee, Hyun Seok Seong, Sangeek Hyun, Gilhan Park, WonJun Moon, and Jae-Pil Heo. "Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vocabulary Semantic Segmentation." In CVPR 2026.

[GitHub] [arXiv]