Autoencoders are letting us peer into the black box of artificial intelligence. They could help us create AI that is better understood and more easily controlled. AI has led to breakthroughs in drug ...
Anthropic says it has developed a new tool designed to better understand how its Claude AI model processes information and generates responses.
Bhalla, Usha, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, and Flavio Calmon. "Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability." ...
Li, Aaron Jiaxun, Suraj Srinivas, Usha Bhalla, and Himabindu Lakkaraju. "Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders." Proceedings of the Conference of the ...
Anthropic says it may have found a way to understand what its AI model Claude is "thinking" internally. The company's new system translates hidden AI activation patterns into readable text, which ...