Testing Machine Intelligence: Mechanistic Interpretability and Commonsense Reasoning
Abstract
This study investigates commonsense reasoning as a key feature of Artificial General Intelligence (AGI) and critiques traditional testing methodologies like the Turing Test and the Winograd Schema Challenge (WSC). It proposes mechanistic interpretability as an internal evaluation approach to assess AI’s reasoning capabilities. Mechanistic interpretability provides insights into the internal workings of models, enabling a more precise assessment of their ability to reason, adapt, and generalize knowledge.
Why Commonsense Reasoning?
Commonsense reasoning enables systems to make inferences and decisions based on everyday knowledge, encompassing:
- Contextual understanding (time, space, causality, relationships).
- Deductive and abductive reasoning over ambiguous or incomplete information (see the sketch after this list).
- A knowledge base of physical, social, and cultural facts.
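As a toy illustration of the deductive/abductive distinction, the Python sketch below works over a hand-written rule set; the facts, rules, and function names are invented for this sketch only.

```python
# Tiny, invented example contrasting deductive and abductive commonsense inference.
facts = {"it rained"}
rules = [
    ("it rained", "the ground is wet"),
    ("the sprinkler ran", "the ground is wet"),
]

def deduce(known, rules):
    """Deduction: from known causes and rules, conclude their effects."""
    derived = set(known)
    for cause, effect in rules:
        if cause in derived:
            derived.add(effect)
    return derived

def abduce(observation, rules):
    """Abduction: from an observed effect, hypothesize plausible causes."""
    return [cause for cause, effect in rules if effect == observation]

print(deduce(facts, rules))                # adds "the ground is wet"
print(abduce("the ground is wet", rules))  # ["it rained", "the sprinkler ran"]
```

Deduction moves from causes to guaranteed conclusions, while abduction picks plausible explanations from incomplete information, which is why both appear above as ingredients of commonsense reasoning.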
Relationship to AGI
AGI refers to systems capable of flexible problem-solving and adaptability across diverse tasks without specialized programming. While commonsense reasoning is not sufficient to define AGI, it is a necessary component for reasoning about the world, making intuitive judgments, and adapting to complex, real-world scenarios.
Failures of Prior Testing Methodologies
Limitations of the Winograd Schema Challenge
The WSC aimed to test commonsense reasoning through pronoun-disambiguation tasks, e.g., deciding whether "it" refers to the trophy or the suitcase in "The trophy doesn't fit in the suitcase because it is too big." However:
- Transformer-based models succeeded by exploiting statistical correlations rather than genuine reasoning.
- Behavioral tests fail to evaluate the internal mechanisms of intelligence, a shortcoming long leveled at the Turing Test, most famously by the Chinese Room Argument.
Key Observations
- Testing methodologies must assess how models reason, not just their outcomes.
- Models often pass behavioral tests without achieving genuine commonsense reasoning.
Mechanistic Interpretability: A New Paradigm for Testing
Mechanistic interpretability focuses on understanding the internal processes of models, including their architectures, parameters, and reasoning mechanisms. This approach contrasts with prior testing methods like prompting and probing, which primarily assess outputs rather than internal dynamics.
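As a minimal illustration of this contrast, the PyTorch sketch below uses a toy model (not any specific system discussed here): a behavioral test only sees the final output, while forward hooks expose the intermediate computations that a mechanistic analysis would examine.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model standing in for a larger network.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
x = torch.randn(1, 16)

# Behavioral testing: only the final output is observed.
output_only = model(x)

# Mechanistic view: forward hooks record what each layer computes,
# so the internal trace, not just the answer, can be inspected.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

_ = model(x)
for name, act in activations.items():
    print(name, tuple(act.shape))
```

Real mechanistic interpretability goes much further (e.g., attributing behavior to specific circuits), but the difference in what is observable is already visible here.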
Why Mechanistic Interpretability?
- Explanation: It reveals the architectural components and reasoning steps involved in generating an answer.
- Adaptation: It evaluates how models assimilate and accommodate new knowledge, akin to human learning.
Cognitive Foundations
- Inspired by Mercier and Sperber's argumentative theory of reasoning, which emphasizes justification and explanation in human cognition.
- Aligns with Piaget's model of cognitive adaptation:
  - Assimilation: Integrating new observations into an existing world model.
  - Accommodation: Reshaping the world model to incorporate previously unaccounted-for experiences.
Mechanistic Reasoning in Practice
Knowledge-Critical Subnetworks
Mechanistic interpretability can identify knowledge-critical subnetworks, in which specific parameters encode a target piece of knowledge (see the sketch after this list):
- Removing the relevant subnetwork alters predictions on prompts that depend on that knowledge.
- Control mechanisms ensure that unrelated knowledge remains unaffected.
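The sketch below illustrates the ablation idea under strong simplifying assumptions: a hand-picked binary mask stands in for a discovered knowledge-critical subnetwork (in the actual line of work such masks are learned), and zeroing the masked weights lets us compare how predictions shift for different prompts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for one layer of a trained model.
layer = nn.Linear(8, 8)

# Hypothetical mask marking the "knowledge-critical" parameters;
# here it is hand-picked purely for illustration.
mask = torch.zeros_like(layer.weight)
mask[:2, :] = 1.0

related_prompt = torch.randn(1, 8)    # placeholder for a prompt that depends on the knowledge
unrelated_prompt = torch.randn(1, 8)  # placeholder for an unrelated prompt

with torch.no_grad():
    before_related = layer(related_prompt)
    before_unrelated = layer(unrelated_prompt)

    # Ablate the subnetwork: zero out only the masked parameters.
    layer.weight.mul_(1.0 - mask)

    after_related = layer(related_prompt)
    after_unrelated = layer(unrelated_prompt)

# The evaluation criterion: a genuine knowledge-critical subnetwork shifts
# predictions on related prompts while leaving unrelated behavior largely intact.
print("related shift:  ", (before_related - after_related).abs().mean().item())
print("unrelated shift:", (before_unrelated - after_unrelated).abs().mean().item())
```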
RECKONING System
The RECKONING framework leverages meta-learning to dynamically adapt model weights (a simplified sketch follows this list):
- Encodes knowledge from a few examples of new tasks (few-shot learning).
- Dynamically integrates this knowledge into its internal structure.
- Provides explanations by tracing which facts contributed to the model’s reasoning process.
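The sketch below is a rough approximation of the dynamic knowledge-encoding idea, not the RECKONING authors' implementation: a few "fact" examples are folded into a temporary copy of the weights by inner-loop gradient steps before a query is answered. The outer meta-learning loop that trains the base model to benefit from such updates is omitted, and the model, data, and function names are placeholders.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical tiny model standing in for the reasoner.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Placeholder "facts" to be encoded on the fly.
fact_inputs = torch.randn(4, 8)
fact_labels = torch.randint(0, 2, (4,))
query = torch.randn(1, 8)  # placeholder for the question

def encode_knowledge(base_model, inputs, labels, steps=3, lr=0.1):
    """Inner loop: fold the given facts into a temporary copy of the weights."""
    adapted = copy.deepcopy(base_model)
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(adapted(inputs), labels)
        loss.backward()
        optimizer.step()
    return adapted

adapted_model = encode_knowledge(model, fact_inputs, fact_labels)

# The adapted copy now carries the new knowledge; the base model is untouched.
print("answer from adapted model:", adapted_model(query).argmax(dim=-1).item())
```

Because the knowledge ends up in the weights rather than only in a prompt, one can in principle trace which encoded facts drove a given answer, which is the explanatory property highlighted above.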
Implications
Mechanistic reasoning showcases a model’s ability to:
- Adapt to new tasks without specialized training.
- Encode and retrieve knowledge dynamically for more effective reasoning.
Recommendations for Future Testing
- No Human in the Loop: Tests must eliminate reliance on human evaluation to avoid biases inherent in Turing-style tests.
- Explainability: Models must justify their answers by explaining their reasoning and the knowledge used.
- Dynamic Knowledge Encoding: Evaluate models’ assimilation and accommodation of new knowledge using mechanistic interpretability.
- Distractor Inclusion: Test reasoning robustness by incorporating irrelevant or misleading information (see the sketch after this list).
- Meta-Learning Evaluation: Compare models’ quick adaptation capabilities to assess their reasoning versatility.
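To make the explainability and distractor recommendations concrete, here is a hedged sketch of what a single test item and its scoring could look like; the schema, field names, and example content are invented for illustration and are not drawn from an existing benchmark.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningTestItem:
    """Hypothetical schema for one mechanistic-reasoning test case."""
    question: str
    supporting_facts: list[str]   # knowledge the answer should rely on
    distractor_facts: list[str]   # irrelevant or misleading statements
    expected_answer: str
    expected_citations: list[int] = field(default_factory=list)  # facts a correct justification should cite

# Invented example item.
item = ReasoningTestItem(
    question="Can a sealed, full glass bottle of water be safely frozen?",
    supporting_facts=[
        "Water expands when it freezes.",
        "Rigid glass can crack when its contents expand.",
    ],
    distractor_facts=[
        "The bottle was bought on a Tuesday.",
        "Freezers are usually kept in kitchens.",
    ],
    expected_answer="No",
    expected_citations=[0, 1],
)

def score(predicted_answer: str, cited_facts: list[int], item: ReasoningTestItem) -> dict:
    """Scores the answer and whether the cited facts match the expected justification."""
    return {
        "answer_correct": predicted_answer.strip().lower() == item.expected_answer.lower(),
        "citations_correct": sorted(cited_facts) == sorted(item.expected_citations),
    }

print(score("No", [0, 1], item))
```

Scoring both the answer and the cited facts ties the behavioral outcome to the knowledge actually used, in line with the explainability requirement above.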
Conclusion
Mechanistic interpretability provides a powerful lens for evaluating AI systems’ commonsense reasoning capabilities. By moving beyond behavioral tests and focusing on internal processes, this approach aligns testing methodologies with the broader goal of achieving explainable, adaptable, and genuinely intelligent systems. The integration of dynamic knowledge encoding and meta-learning frameworks like RECKONING offers promising directions for advancing AI toward AGI.