I’m a Member of Technical Staff at Anthropic working on aligning language models with human preferences. Previously, I was a PhD student at the University of Sussex with Chris Buckley and Anil Seth, focusing on RL from human feedback (RLHF), and spent time as a visiting researcher at NYU working with Ethan Perez, Sam Bowman and Kyunghyun Cho. I also interned at Naver Labs Europe and FAR AI. Before that, I studied cognitive science, philosophy and physics at the University of Warsaw, where I worked on compositional generalisation and emergent communication with Joanna Rączaszek-Leonardi and Piotr Miłoś, and on Bayesian accounts of self-organisation with Marcin Miłkowski.
Highlighted papers
- RL with KL penalties is better viewed as Bayesian inference
  Findings of EMNLP 2022
- Energy-based models for code generation under compilability constraints
  NLP4Programming workshop, ACL 2021
- Measuring non-trivial compositionality in emergent communication
  Emergent communication workshop, NeurIPS 2020
- Developmentally motivated emergence of compositional communication via template transfer
  Emergent communication workshop, NeurIPS 2019