Research
I am currently exploring the field of continual learning and searching for interesting problems to work on. My interest in continual learning stems from two main sources:
- Since model decay in the real world is inevitable as data distributions shift, successful continual learning schemes would allow models to be used for longer, reducing the need to pretrain new models from scratch and saving costs.
- From a cognitive science perspective, continual learning is a natural way to model human learning.
I was fortunate to work on a variety of research topics during my Bachelor's degree at the University of Toronto. I worked on probabilistic inference for language model alignment with Prof. Roger Grosse, causal inference using normalizing flows with Prof. Rahul Krishnan, and optimal scaling for Markov chain Monte Carlo with Prof. Jeffrey Rosenthal. I also spent a summer at EPFL in Switzerland, where I worked on state space models for graphs with Prof. Volkan Cevher.
Continual Learning
Continual learning is a field of machine learning that focuses on learning from a stream of data, where the data is not i.i.d. The goal is to learn a model that can adapt to new tasks while maintaining performance on previous ones. This is challenging because a model tends to either forget old tasks or fail to learn new ones; this tension is known as the stability-plasticity dilemma.
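The forgetting half of this dilemma is easy to see in a minimal sketch. Below, a plain logistic regression is trained on one hypothetical synthetic task, then trained further on a second, conflicting task with no replay or regularization; its accuracy on the first task collapses. The tasks and all names here are illustrative, not from any of the projects below.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n, flip):
    """Synthetic binary task: the label depends on the sign of the first
    feature. Task B (flip=True) reverses the rule, so it conflicts with A."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(float)
    return X, (1 - y) if flip else y

def train(w, X, y, lr=0.5, steps=200):
    """Plain logistic-regression gradient descent: no replay, no regularizer."""
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0) == (y > 0.5)).mean())

XA, yA = make_task(500, flip=False)   # task A
XB, yB = make_task(500, flip=True)    # task B: conflicting labeling rule

w = train(np.zeros(2), XA, yA)
acc_A_before = accuracy(w, XA, yA)    # high right after training on A

w = train(w, XB, yB)                  # continue training on B only
acc_A_after = accuracy(w, XA, yA)     # collapses: catastrophic forgetting

print(acc_A_before, acc_A_after)
```

Continual learning methods (replay buffers, regularization such as EWC, parameter isolation) aim to keep `acc_A_after` high while still learning task B.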
Research Projects
Current
- Koopman theory / dynamical systems: Studying Koopman theory for dynamical systems and how machine learning can be used to model them.
Past
- Language model alignment (WIP): Using probabilistic inference to reduce the probability of harmful outputs from language models.
- Causal inference with normalizing flows: A differentiable, likelihood-based causal order discovery framework using normalizing flows and Plackett-Luce models.
- State space models for temporal graph data: A novel state space model architecture for processing temporal graph data such as EEG seizure recordings.
- MCMC optimal scaling: Analysis of acceptance rates and scaling properties for Metropolis algorithms in high-dimensional spaces.
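A classical result in this area (Roberts, Gelman and Gilks) is that for random-walk Metropolis on a d-dimensional i.i.d. target, a proposal scale of about 2.38/sqrt(d) is asymptotically optimal, giving an acceptance rate near 0.234. A minimal sketch on an assumed standard Gaussian target, unrelated to the specific analysis in the project above:

```python
import numpy as np

def rwm_acceptance(d, sigma, n_steps=20000, seed=0):
    """Random-walk Metropolis on N(0, I_d); returns the acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    log_p = lambda z: -0.5 * z @ z   # log density of N(0, I), up to a constant
    accepts = 0
    for _ in range(n_steps):
        prop = x + sigma * rng.normal(size=d)
        if np.log(rng.uniform()) < log_p(prop) - log_p(x):
            x = prop
            accepts += 1
    return accepts / n_steps

d = 50
acc = rwm_acceptance(d, sigma=2.38 / np.sqrt(d))
print(round(acc, 3))   # near the asymptotic optimum of 0.234 for large d
```

At finite d the empirical rate deviates somewhat from 0.234, which is part of what makes the finite-dimensional scaling analysis interesting.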