Hi, I’m Tim.

I explore how machines learn. I am also interested in solving the problems that arise from creating and adopting machine intelligence.

Currently, at the Center for Human-Compatible AI, I am working on understanding how transformer language models represent the various entities involved in generating natural language.

Previously, I worked on stress-testing CoT monitoring at LASR Labs, and on interpretability, red-teaming, and steering of language models with the Algorithmic Alignment Group.

Timothy H. Kostolansky

Previous Work

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

Benjamin Arnav*, Pablo Bernabeu-Pérez*, Nathan Helm-Burger*, Timothy Kostolansky*, Hannes Whittingham*, Mary Phuong

Pre-print

Inverse Constitutional AI

Timothy Kostolansky

Master's Thesis

The Effect of Activation Functions On Superposition in Toy Models

Timothy Kostolansky*, Vedang Lad*

Blog Post

Iterative Interactive Inverse Constitutional AI (I^3CAI)

Timothy Kostolansky*, Julian Manyika*

Class Project

RL-Augmented Action Spaces in MsPacman

Timothy Kostolansky*, Julian Yocum*

Class Project