TRUST-VL: A Unified and Explainable Vision–Language Model for General Multimodal Misinformation Detection

Zehong Yan, Peng Qi, Wynne Hsu and Mong Li Lee
National University of Singapore


EMNLP, 2025 / PDF / Project Page / Code / Data

We present TRUST-VL, a unified and explainable vision–language model for multimodal misinformation detection tasks spanning textual, visual, and cross-modal distortions. It is enhanced by the Question-Aware Visual Amplifier (QAVA) and trained on the large-scale TRUST-Instruct dataset of 198K reasoning samples.


Mitigating GenAI-powered Evidence Pollution for Out-of-Context Multimodal Misinformation Detection

Zehong Yan1, Peng Qi1, Wynne Hsu1, Mong Li Lee1
1NUS Centre for Trusted Internet & Community, National University of Singapore

arXiv, 2025 / PDF / Project Page / Code / Data

We investigate how GenAI-polluted evidence affects existing out-of-context (OOC) detectors, revealing performance degradation of more than 9 percentage points. We then propose two strategies to mitigate this degradation: cross-modal evidence reranking and cross-modal claim–evidence reasoning.


Modeling Complex Interactions in Long Documents for Aspect-Based Sentiment Analysis

Zehong Yan1, Wynne Hsu1, Mong Li Lee1, David Roy Bartram-Shaw2
1NUS Centre for Trusted Internet & Community, National University of Singapore
2Edelman Data & Intelligence

WASSA Workshop, ACL, 2024 / PDF / Project Page / Code / Data

We introduce DART, a hierarchical transformer-based framework for aspect-based sentiment analysis in long documents. DART handles the complexities of longer texts through a global context interaction block and two levels of aspect-specific aggregation. For empirical validation, we curate two long-document aspect-based sentiment analysis datasets: SocialNews and TrustData.


SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee
National University of Singapore

CVPR, 2024 / PDF / Project Page / Code / Video

This paper proposes SNIFFER, a new multimodal large language model for explainable out-of-context misinformation detection, designed to deliver accurate detection and persuasive explanations simultaneously. Through two-stage instruction tuning and retrieval enhancement, SNIFFER models both internal image–text inconsistency and external claim–evidence relationships.
