Research

My current research focus is on memory systems for continually learning LLMs. I am also interested in probabilistic (Bayesian) inference and sampling methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC); recurrent alternatives to transformers, such as state space models and linear-attention variants; and LLM adaptation, test-time training, and reasoning.

Before joining Mila, I was fortunate to work on a variety of research topics during my final year at the University of Toronto: probabilistic inference for language model alignment with Prof. Roger Grosse, causal inference using normalizing flows with Prof. Rahul Krishnan, and optimal scaling for Markov chain Monte Carlo with Prof. Jeffrey Rosenthal. I also spent a summer at EPFL in Switzerland, working on state space models for graphs with Prof. Volkan Cevher.

For more on how these interests came together, see How my interests developed.

What I am actively thinking about

  • How to coordinate different memory systems, and how to evaluate them
  • How probabilistic inference can drive exploration in LLM reasoning beyond the base model
  • Meta-learning, and how to model its underlying optimization problems

Publications

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse. The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
Abstract

Reinforcement learning (RL) has become a predominant technique to align language models (LMs) with human preferences or promote outputs which are deemed to be desirable by a given reward function. Standard RL approaches optimize average reward, while methods explicitly focused on reducing the probability of undesired outputs typically come at a cost to average-case performance. To improve this tradeoff, we introduce RePULSe, a new training method that augments the standard RL loss with an additional loss that uses learned proposals to guide sampling low-reward outputs, and then reduces those outputs’ probability. We run experiments demonstrating that RePULSe produces a better tradeoff of expected reward versus the probability of undesired outputs and is more adversarially robust, compared to standard RL alignment approaches and alternatives.

Exploring the Generalizability of the Optimal 0.234 Acceptance Rate in Random-Walk Metropolis and Parallel Tempering Algorithms

Aidan Li, Liyan Wang, Tianye Dou, Jeffrey S. Rosenthal. Communications in Statistics - Simulation and Computation, 2025.
Abstract

For random-walk Metropolis (RWM) and parallel tempering (PT) algorithms, an asymptotic acceptance rate of around 0.234 is known to be optimal in certain high-dimensional limits. However, its practical relevance is uncertain due to restrictive derivation conditions. We synthesise previous theoretical advances in extending the 0.234 acceptance rate to more general settings, and demonstrate its applicability with a comprehensive empirical simulation study on examples examining how acceptance rates affect Expected Squared Jumping Distance (ESJD). Our experiments show the optimality of the 0.234 acceptance rate for RWM is surprisingly robust even in lower dimensions across various non-spherically symmetric proposal distributions, multimodal target distributions that may not have an i.i.d. product density, and curved Rosenbrock target distributions with nonlinear correlation structure. Parallel tempering experiments also show that the idealized 0.234 spacing of inverse temperatures may be approximately optimal for low dimensions and non i.i.d. product target densities, and that constructing an inverse temperature ladder with spacings given by a swap acceptance of 0.234 is a viable strategy.
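To illustrate the quantities the abstract discusses, here is a minimal pure-Python random-walk Metropolis sampler on a standard Gaussian target that records the acceptance rate and ESJD; the dimension, proposal scale, and step count are illustrative defaults, not settings from the paper.

```python
import math
import random

def rwm_sample(d=10, sigma=0.7, n_steps=20000, seed=0):
    """Random-walk Metropolis on a d-dimensional standard Gaussian target.

    Returns (acceptance_rate, esjd), where ESJD is the squared jumping
    distance averaged over all iterations (rejections contribute zero).
    """
    rng = random.Random(seed)
    x = [0.0] * d
    logp = -0.5 * sum(xi * xi for xi in x)  # unnormalized log-density
    accepts, esjd_sum = 0, 0.0
    for _ in range(n_steps):
        # Gaussian random-walk proposal centred at the current state
        prop = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        logp_prop = -0.5 * sum(pi * pi for pi in prop)
        # Metropolis accept/reject step
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            esjd_sum += sum((pi - xi) ** 2 for pi, xi in zip(prop, x))
            x, logp = prop, logp_prop
            accepts += 1
    return accepts / n_steps, esjd_sum / n_steps
```

Scanning `sigma` and plotting ESJD against the resulting acceptance rate is the kind of experiment the abstract refers to: in moderate to high dimensions the ESJD curve for this Gaussian target peaks near an acceptance rate of about 0.234 (the classical tuning is sigma roughly 2.38/sqrt(d)).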

Experimenting with Experimentation: Rethinking the Role of Experimentation in Educational Design

Mohi Reza, Akmar Chowdhury, Aidan Li, Mahathi Gandhamaneni, Joseph Jay Williams. 3rd Annual Workshop on A/B Testing and Platform-Enabled Learning Research at the ACM Conference on Learning @ Scale, 2022.
Abstract

What if we take a broader view of what it means to run an education experiment? In this paper, we explore opportunities that arise when we think beyond the commonly-held notion that the purpose of an experiment is to either accept or reject a pre-defined hypothesis and instead, reconsider experimentation as a means to explore the complex design space of creating and improving instructional content. This is an approach we call experiment-inspired design. Then, to operationalize these ideas in a real-world experimentation venue, we investigate the implications of running a sequence of interventions teaching first-year students meta-skills: transferable skills applicable to multiple areas of their lives, such as planning, and managing stress. Finally, using two examples as case studies for meta-skills interventions (stress-reappraisal and mental contrasting with implementation intentions), we reflect on our experiences with experiment-inspired design and share six preliminary lessons on how to use experimentation for design.