Prior Work
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav*, Pablo Bernabeu-Pérez*, Nathan Helm-Burger*, Timothy H. Kostolansky*, Hannes Whittingham*, Mary Phuong
Iterative Interactive Inverse Constitutional AI (I^3CAI)
Timothy H. Kostolansky*, Julian Manyika*
pdf, Class Project, May 2024
RL-Augmented Action Spaces in MsPacman
Timothy H. Kostolansky*, Julian Yocum*
pdf, Class Project, May 2024
The Effect of Activation Functions On Superposition in Toy Models
Timothy H. Kostolansky*, Vedang Lad*
Blog Post, December 2023