Hi, I’m Tim.
I am currently exploring how machines learn, and I am interested in solving the problems that arise from creating and adopting machine intelligence.
At the Center for Human-Compatible AI, I am working on understanding how transformer language models represent the various entities involved in generating natural language.
Previously, I worked on stress-testing CoT monitoring at LASR Labs, and on interpretability, red-teaming, and steering of language models with the Algorithmic Alignment Group.
Previous Work
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav*, Pablo Bernabeu-Pérez*, Nathan Helm-Burger*, Timothy Kostolansky*, Hannes Whittingham*, Mary Phuong
Pre-print
The Effect of Activation Functions On Superposition in Toy Models
Timothy Kostolansky*, Vedang Lad*
Blog Post
Iterative Interactive Inverse Constitutional AI (I^3CAI)
Timothy Kostolansky*, Julian Manyika*
Class Project