Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vocabulary Semantic Segmentation

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

We propose a training-free framework that aligns CLIP’s global context with local window features to improve open-vocabulary semantic segmentation, enabling dense predictions on unseen concepts without additional supervision.

Recommended citation: ByeongCheol Lee, Hyun Seok Seong, Sangeek Hyun, Gilhan Park, WonJun Moon, and Jae-Pil Heo. "Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vocabulary Semantic Segmentation." In CVPR 2026.
Download Paper

[GitHub] [arXiv]


Byeong Cheol Lee
