About Me

I'm an AI researcher focused on foundational problems (efficiency, sparsity, reasoning, transfer learning, and continual learning) where progress benefits the entire field rather than a single narrow task. I draw on insights from biological systems to find simple, generalizable ideas. I care about both the science and its real-world impact, and I enjoy building research agendas from the ground up while collaborating across disciplines on problems that matter at scale.

Education

  • PhD in Computer Science, University of Colorado Boulder — advised by Katharina von der Wense and Larry Hunter
  • BS in Computational and Applied Mathematics (Epidemiology focus), minor in Statistics, pre-med track

Outside Research

Muay Thai  ·  Hiking  ·  Baking  ·  Snowboarding  ·  Scuba diving

Sagi Shaier

Research

EXCITATION paper figure
arXiv 2026

Excitation: Momentum For Experts

We propose EXCITATION, an optimizer-, domain-, and model-agnostic framework that dynamically modulates updates in sparse architectures using batch-level expert utilization. By amplifying highly-utilized experts and selectively suppressing low-utilization ones, EXCITATION sharpens routing specialization, rescues deep MoEs from structural confusion, and improves convergence speed and final performance across language and vision tasks.

Read paper
MALAMUTE paper figure
ACL Findings 2025

MALAMUTE: A Multilingual, Template-Free, Granular Educational Probing Dataset

We introduce MALAMUTE, the first education-based cloze-style dataset for evaluating course-related knowledge in language models. Built from expert-written, peer-reviewed probes across 71 university-level textbooks in English, Spanish, and Polish, MALAMUTE spans 33,361 curriculum concepts and 116,887 prompts, revealing significant subject-level knowledge gaps in masked and causal LMs.

Read paper
COMET paper figure
ICLR 2025

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-inspired Fixed Routing

We propose COMET, a general deep learning method that induces a modular, sparse architecture with an exponential number of overlapping experts. COMET replaces the trainable gating function used in Sparse Mixture of Experts with a fixed, biologically inspired random projection, causing similar inputs to share more parameters — facilitating positive knowledge transfer, faster learning, and improved generalization across image classification, language modeling, and regression tasks.
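
As a rough illustration of the idea described above, the following sketch gates one hidden layer with a fixed (untrained) random projection followed by a top-k selection. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the top-k rule, the layer sizes, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

def fixed_topk_mask(x, proj, k):
    """Binary mask keeping the k largest entries of a fixed projection of each input."""
    scores = x @ proj                                   # shape: (batch, hidden)
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]   # indices of the k largest scores per row
    mask = np.zeros_like(scores)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask

def masked_layer(x, weight, proj, k):
    """ReLU layer whose active units are chosen by the fixed projection, not a learned gate."""
    hidden = np.maximum(x @ weight, 0.0)
    return hidden * fixed_topk_mask(x, proj, k)

rng = np.random.default_rng(0)
d_in, d_hidden, k = 8, 16, 4                            # hypothetical sizes
proj = rng.standard_normal((d_in, d_hidden))            # fixed: would never receive gradient updates
weight = rng.standard_normal((d_in, d_hidden)) * 0.1    # would be trainable in a real model

x = rng.standard_normal((3, d_in))
mask = fixed_topk_mask(x, proj, k)
out = masked_layer(x, weight, proj, k)
```

In this sketch each distinct mask selects a different subset of hidden units, so the number of possible overlapping "experts" grows combinatorially with the layer width, and inputs with similar projections tend to select overlapping subsets.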

Read paper
Asking Again paper figure
arXiv 2024

Asking Again and Again: Exploring LLM Robustness to Repeated Questions

We study whether repeating questions inside prompts improves LLM reading comprehension. Across five recent models and three datasets, varying question repetition from one to five times can increase accuracy by up to 6%, but the effect is not statistically significant across models, settings, and datasets, suggesting repetition alone does not reliably improve output quality.
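
The manipulation studied here can be pictured with a small prompt builder. This is a minimal sketch: the prompt layout, the field labels, and the function name `repeat_question_prompt` are hypothetical and are not taken from the paper.

```python
def repeat_question_prompt(context, question, repeats=1):
    """Build a reading-comprehension prompt that states the question `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    questions = "\n".join([f"Question: {question}"] * repeats)
    return f"Context: {context}\n{questions}\nAnswer:"

prompt = repeat_question_prompt(
    context="The Nile flows north into the Mediterranean Sea.",
    question="Which sea does the Nile flow into?",
    repeats=3,
)
```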

Read paper
Lost in the Middle paper figure
arXiv 2024

Lost in the Middle, and In-Between: Enhancing Language Models' Ability to Reason Over Long Contexts in Multi-Hop QA

We examine the lost-in-the-middle problem in multi-hop question answering, where multiple necessary pieces of information are spread across long inputs. Performance degrades with both distance from context edges and distance between supporting facts, while document compression and chain-of-thought prompting offer possible mitigation paths.

Read paper
Adaptive QA paper figure
EMNLP 2024

Adaptive Question Answering: Enhancing Language Model Proficiency for Addressing Knowledge Conflicts with Source Citations

We address a gap in QA research by proposing the task of question answering with source citation in ambiguous settings, where multiple valid answers exist. We contribute five new datasets, the first ambiguous multi-hop QA dataset with real-world contexts, two new evaluation metrics, and strong baselines across five LLMs.

Read paper
Input emphasis paper figure
ACL Findings 2024

It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension

Across 9 LLMs and 3 datasets, we show that presenting context before the question improves accuracy by up to 31%. Emphasizing the context (by concatenating a few tokens) further yields gains of up to 36%, allowing smaller models to outperform significantly larger counterparts — a surprisingly simple yet powerful finding.
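
The two prompt variations described above (ordering and emphasis) can be sketched as follows. This is only an illustration: the field labels, the `emphasis` marker, and the function name `build_prompt` are hypothetical, and the specific tokens used in the paper are not reproduced here.

```python
def build_prompt(context, question, context_first=True, emphasis=""):
    """Assemble a prompt with the context before or after the question.

    `emphasis` is an optional short marker concatenated in front of the context.
    """
    context_part = f"{emphasis}Context: {context}"
    question_part = f"Question: {question}"
    if context_first:
        parts = [context_part, question_part]
    else:
        parts = [question_part, context_part]
    return "\n".join(parts) + "\nAnswer:"

ctx = "Marie Curie won Nobel Prizes in physics and chemistry."
q = "In which fields did Marie Curie win Nobel Prizes?"
context_first_prompt = build_prompt(ctx, q, context_first=True, emphasis="[IMPORTANT] ")
question_first_prompt = build_prompt(ctx, q, context_first=False)
```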

Read paper
Template probing paper figure
EACL 2024

Comparing Template-based and Template-free Language Model Probing

We evaluate 16 LMs on 10 probing datasets using template-based and template-free approaches. We find that the two approaches often rank models differently, that scores drop by up to 42 Acc@1 between them, and that perplexity correlates with accuracy in opposite directions depending on the approach used.

Read paper
Desiderata paper figure
EACL 2024

Desiderata For The Context Use Of Question Answering Systems

We outline a comprehensive set of desiderata for QA systems and evaluate 11 systems on 4 datasets simultaneously. Key findings: systems less susceptible to noise are not necessarily more consistent; combining conflicting knowledge and noise can reduce performance by up to 96%.

Read paper
Stochastic Parrots paper figure
AACL 2023

Who Are All The Stochastic Parrots Imitating? They Should Tell Us!

We argue that language models in their current state will never be fully trustworthy in critical settings, and propose a novel strategy: building LMs that can cite their sources, pointing users to the training data backing their outputs. We outline the NLP sub-tasks required and call for a discussion on LM transparency.

Read paper
Personalized Medicine paper figure
AACL 2023

Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical QA Systems

We show that irrelevant patient demographic information changes up to 23% of answers in text-based QA systems and up to 15% in knowledge-graph-grounded systems — including accuracy-affecting changes — raising significant fairness concerns for biomedical AI.

Read paper
Knowledge-enhanced dialogue survey figure
arXiv 2022

Mind the Knowledge Gap: A Survey of Knowledge-enhanced Dialogue Systems

We present the first survey of knowledge-enhanced dialogue systems. We define three categories (internal, external, and hybrid) and survey motivation, datasets, knowledge search, encoding, and incorporation methods, drawing on theories from linguistics and cognitive science to propose future improvements.

Read paper
Disease Informed Neural Networks figure
Letters in Biomathematics 2022

Data-driven approaches for predicting spread of infectious diseases through DINNs: Disease Informed Neural Networks

We introduce DINNs, neural networks capable of learning how diseases spread, forecasting their progression, and identifying unique epidemiological parameters such as death rates. We apply them to 11 highly infectious diseases (COVID, HIV, Ebola, Zika, and more), each modeled by a system of ODEs ranging from 3 to 9 dimensions.
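
To give a feel for how an ODE model can constrain a learned trajectory, the sketch below computes an ODE residual for a 3-dimensional SIR model. Everything here is an assumption made for illustration: the choice of SIR, the parameter names `beta` and `gamma`, the forward-Euler simulator standing in for a network's output, and the finite-difference derivatives (a neural approach would typically use automatic differentiation instead).

```python
import numpy as np

def simulate_sir(beta, gamma, s0=0.99, i0=0.01, dt=0.01, steps=1000):
    """Forward-Euler simulation of the normalized SIR model (S + I + R = 1)."""
    traj = np.empty((steps + 1, 3))
    traj[0] = (s0, i0, 1.0 - s0 - i0)
    for n in range(steps):
        s, i, r = traj[n]
        new_infections = beta * s * i
        recoveries = gamma * i
        traj[n + 1] = (
            s - dt * new_infections,
            i + dt * (new_infections - recoveries),
            r + dt * recoveries,
        )
    return traj

def ode_residual(traj, dt, beta, gamma):
    """Mean squared mismatch between finite-difference derivatives and the SIR right-hand side."""
    s, i, _ = traj.T
    deriv = np.gradient(traj, dt, axis=0)
    rhs = np.stack([-beta * s * i, beta * s * i - gamma * i, gamma * i], axis=1)
    return float(np.mean((deriv - rhs) ** 2))

dt = 0.01
traj = simulate_sir(beta=0.3, gamma=0.1, dt=dt)   # stand-in for a network's predicted trajectory
res_true = ode_residual(traj, dt, beta=0.3, gamma=0.1)
res_wrong = ode_residual(traj, dt, beta=0.6, gamma=0.1)
```

In this toy setup the residual is smaller for the parameters that generated the trajectory than for a mismatched pair, which is the kind of signal a parameter-identification loss could exploit.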

Read paper

Contact

Interested in collaborating or just want to say hi? Feel free to reach out.

sagishaier@gmail.com