Publications

Preprints

Cross-Domain Contrastive Training of Embedding Models for Insight-Guided Agentic Reasoning

Tsz Ting Chung, Mo Yu*, Jie Zhou*, Dit-Yan Yeung. Preprint.

(Agentic Retrieval) To address poor retrieval from queries to procedural guidance, we introduce a new embedding model trained on preference pairs that generalizes across reasoning and agentic domains.

Project · Paper

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Mo Yu*, Tsz Ting Chung*, Chulun Zhou*, Tong Li*, Rui Lu*, Jiangnan Li*, Liyan Xu*, Haoshu Lu, Ning Zhang, Jing Li, Jie Zhou. In Submission.

(Measure of AGI) Introduce a long-context benchmark requiring global comprehension and deep reasoning. Experiments show that ICL, RAG, SFT, and DeepResearch systems lag behind humans, with the gap widening further on reasoning.

Project · Paper · Hugging Face

Publications

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Tsz Ting Chung, Lemao Liu, Mo Yu, Dit-Yan Yeung. ICML 2026.

(Many-shot CoT-ICL) Many-shot CoT-ICL scales unreliably: similarity-based retrieval fails and order effects grow with more demonstrations. We reframe CoT-ICL as in-context test-time learning governed by two principles and propose CDS to improve reasoning.

Project · Paper

DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models

Tsz Ting Chung, Lemao Liu, Mo Yu, Dit-Yan Yeung. EMNLP 2025.

(Logic Evaluation) Introduce a new benchmark to assess LLMs’ logical reasoning while minimizing external influences, address data distribution bias, and propose a metric to reduce evaluation bias and uncertainty.

Project · Paper · Hugging Face

Unified Triplet-Level Granularity Hallucination Evaluation for Vision Language Models

Junjie Wu*, Tsz Ting Chung*, Kai Chen*, Dit-Yan Yeung. TMLR 2025.

(LVLM Hallucination) Introduce a new framework that evaluates LVLM hallucination at the triplet level, together with a benchmark dataset and a mitigation method derived from the paper’s findings.

Project · Paper · Code

The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding

Mo Yu*, Lemao Liu*, Junjie Wu*, Tsz Ting Chung*, Shunchi Zhang*, Jiangnan Li, Dit-Yan Yeung, Jie Zhou. NAACL 2025 (Oral).

(Measure of AGI) Investigate the stochastic parrot phenomenon and propose a task that alleviates memorization via grid-format inputs that abstractly describe physical phenomena.

Project · Paper · Code

Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

Tsz Ting Chung, Leyang Cui, Lemao Liu, Xinting Huang, Shuming Shi, Dit-Yan Yeung. EMNLP 2024.

(Token Compression) With lightweight tuning and few additional parameters, LLMs achieve comparable or better performance on natural language understanding tasks using compressed demonstrations.

Project · Paper · Code