Prior Work

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

Benjamin Arnav*, Pablo Bernabeu-Pérez*, Nathan Helm-Burger*, Timothy H. Kostolansky*, Hannes Whittingham*, Mary Phuong

arXiv, poster, In Proc. NeurIPS 2025, June 2025

Inverse Constitutional AI

Timothy H. Kostolansky

pdf, Master's Thesis, May 2024

Iterative Interactive Inverse Constitutional AI (I^3CAI)

Timothy H. Kostolansky*, Julian Manyika*

pdf, Class Project, May 2024

RL-Augmented Action Spaces in MsPacman

Timothy H. Kostolansky*, Julian Yocum*

pdf, Class Project, May 2024

The Effect of Activation Functions On Superposition in Toy Models

Timothy H. Kostolansky*, Vedang Lad*

blog post, Blog Post, December 2023