I am a Ph.D. student in Computer Science at the University of Mannheim. I am interested in understanding the internal mechanisms and representations of neural networks, and using these insights to control and improve them. I also work with the Interpretable Neural Networks lab of Prof. David Bau and post about research on LessWrong.
arXiv 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
NeurIPS 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
ACL 2024
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
The complete list of publications can be found on Google Scholar or Semantic Scholar.