Testing Machine Intelligence: Mechanistic Interpretability and Commonsense Reasoning
Abstract
This study investigates commonsense reasoning as a key feature of Artificial General Intelligence (AGI) and critiques traditional testing methodologies like the Turing Test and the Winograd Schema Challenge (WSC). It proposes mechanistic interpretability as an internal evaluation approach to assess AI’s reasoning capabilities. Mechanistic interpretability provides insights into the internal workings of models, enabling a more precise assessment of their ability to reason, adapt, and generalize knowledge.
Why Commonsense Reasoning?
Commonsense reasoning enables systems to make inferences and decisions based on everyday knowledge, encompassing:
- Contextual understanding (time, space, causality, relationships).
- Deductive and abductive reasoning over ambiguous or incomplete information (see the sketch after this list).
- A knowledge base of physical, social, and cultural facts.
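As a toy illustration of the deductive/abductive distinction, the Python sketch below works over a hand-written rule set; the facts, rules, and function names are invented for this sketch only.

```python
# Tiny, invented example contrasting deductive and abductive commonsense inference.
facts = {"it rained"}
rules = [
    ("it rained", "the ground is wet"),
    ("the sprinkler ran", "the ground is wet"),
]

def deduce(known, rules):
    """Deduction: from known causes and rules, conclude their effects."""
    derived = set(known)
    for cause, effect in rules:
        if cause in derived:
            derived.add(effect)
    return derived

def abduce(observation, rules):
    """Abduction: from an observed effect, hypothesize plausible causes."""
    return [cause for cause, effect in rules if effect == observation]

print(deduce(facts, rules))                # adds "the ground is wet"
print(abduce("the ground is wet", rules))  # ["it rained", "the sprinkler ran"]
```

Deduction moves from causes to guaranteed conclusions, while abduction picks plausible explanations from incomplete information, which is why both appear above as ingredients of commonsense reasoning.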
Relationship to AGI
AGI refers to systems capable of flexible problem-solving and adaptability across diverse tasks without specialized programming. While commonsense reasoning is not sufficient to define AGI, it is a necessary component for reasoning about the world, making intuitive judgments, and adapting to complex, real-world scenarios.
Failures of Prior Testing Methodologies
Limitations of the Winograd Schema Challenge
The WSC aimed to test commonsense reasoning through pronoun-disambiguation tasks, e.g., deciding whether "it" refers to the trophy or the suitcase in "The trophy doesn't fit in the suitcase because it is too big." However:
- Transformer-based models succeeded by exploiting statistical correlations rather than genuine reasoning.
- Behavioral tests fail to evaluate the internal mechanisms of intelligence, a shortcoming long leveled at the Turing Test, most famously by the Chinese Room Argument.
Key Observations
- Testing methodologies must assess how models reason, not just their outcomes.
- Models often pass behavioral tests without achieving genuine commonsense reasoning.
Mechanistic Interpretability: A New Paradigm for Testing
Mechanistic interpretability focuses on understanding the internal processes of models, including their architectures, parameters, and reasoning mechanisms. This approach contrasts with prior testing methods like prompting and probing, which primarily assess outputs rather than internal dynamics.
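As a minimal illustration of this contrast, the PyTorch sketch below uses a toy model (not any specific system discussed here): a behavioral test only sees the final output, while forward hooks expose the intermediate computations that a mechanistic analysis would examine.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model standing in for a larger network.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
x = torch.randn(1, 16)

# Behavioral testing: only the final output is observed.
output_only = model(x)

# Mechanistic view: forward hooks record what each layer computes,
# so the internal trace, not just the answer, can be inspected.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

_ = model(x)
for name, act in activations.items():
    print(name, tuple(act.shape))
```

Real mechanistic interpretability goes much further (e.g., attributing behavior to specific circuits), but the difference in what is observable is already visible here.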
Why Mechanistic Interpretability?
- Explanation: It reveals the architectural components and reasoning steps involved in generating an answer.
- Adaptation: It evaluates how models assimilate and accommodate new knowledge, akin to human learning.
Cognitive Foundations
- Inspired by Mercier and Sperber's argumentative theory of reasoning, which emphasizes justification and explanation in human cognition.
- Aligns with Piaget's model of cognitive adaptation:
  - Assimilation: Integrating new observations into an existing world model.
  - Accommodation: Reshaping the world model to incorporate previously unaccounted-for experiences.
Mechanistic Reasoning in Practice
Knowledge-Critical Subnetworks
Mechanistic interpretability can identify knowledge-critical subnetworks, in which specific parameters encode a target piece of knowledge (see the sketch after this list):
- Removing the relevant subnetwork alters predictions on prompts that depend on that knowledge.
- Control mechanisms ensure that unrelated knowledge remains unaffected.
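The sketch below illustrates the ablation idea under strong simplifying assumptions: a hand-picked binary mask stands in for a discovered knowledge-critical subnetwork (in the actual line of work such masks are learned), and zeroing the masked weights lets us compare how predictions shift for different prompts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for one layer of a trained model.
layer = nn.Linear(8, 8)

# Hypothetical mask marking the "knowledge-critical" parameters;
# here it is hand-picked purely for illustration.
mask = torch.zeros_like(layer.weight)
mask[:2, :] = 1.0

related_prompt = torch.randn(1, 8)    # placeholder for a prompt that depends on the knowledge
unrelated_prompt = torch.randn(1, 8)  # placeholder for an unrelated prompt

with torch.no_grad():
    before_related = layer(related_prompt)
    before_unrelated = layer(unrelated_prompt)

    # Ablate the subnetwork: zero out only the masked parameters.
    layer.weight.mul_(1.0 - mask)

    after_related = layer(related_prompt)
    after_unrelated = layer(unrelated_prompt)

# The evaluation criterion: a genuine knowledge-critical subnetwork shifts
# predictions on related prompts while leaving unrelated behavior largely intact.
print("related shift:  ", (before_related - after_related).abs().mean().item())
print("unrelated shift:", (before_unrelated - after_unrelated).abs().mean().item())
```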
RECKONING System
The RECKONING framework leverages meta-learning to dynamically adapt model weights (a simplified sketch follows this list):
- Encodes knowledge from a few examples of new tasks (few-shot learning).
- Dynamically integrates this knowledge into its internal structure.
- Provides explanations by tracing which facts contributed to the model’s reasoning process.
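The sketch below is a rough approximation of the dynamic knowledge-encoding idea, not the RECKONING authors' implementation: a few "fact" examples are folded into a temporary copy of the weights by inner-loop gradient steps before a query is answered. The outer meta-learning loop that trains the base model to benefit from such updates is omitted, and the model, data, and function names are placeholders.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical tiny model standing in for the reasoner.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Placeholder "facts" to be encoded on the fly.
fact_inputs = torch.randn(4, 8)
fact_labels = torch.randint(0, 2, (4,))
query = torch.randn(1, 8)  # placeholder for the question

def encode_knowledge(base_model, inputs, labels, steps=3, lr=0.1):
    """Inner loop: fold the given facts into a temporary copy of the weights."""
    adapted = copy.deepcopy(base_model)
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(adapted(inputs), labels)
        loss.backward()
        optimizer.step()
    return adapted

adapted_model = encode_knowledge(model, fact_inputs, fact_labels)

# The adapted copy now carries the new knowledge; the base model is untouched.
print("answer from adapted model:", adapted_model(query).argmax(dim=-1).item())
```

Because the knowledge ends up in the weights rather than only in a prompt, one can in principle trace which encoded facts drove a given answer, which is the explanatory property highlighted above.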
Implications
Mechanistic reasoning showcases a model’s ability to:
- Adapt to new tasks without specialized training.
- Encode and retrieve knowledge dynamically for more effective reasoning.
Recommendations for Future Testing
- No Human in the Loop: Tests must eliminate reliance on human evaluation to avoid biases inherent in Turing-style tests.
- Explainability: Models must justify their answers by explaining their reasoning and the knowledge used.
- Dynamic Knowledge Encoding: Evaluate models’ assimilation and accommodation of new knowledge using mechanistic interpretability.
- Distractor Inclusion: Test reasoning robustness by incorporating irrelevant or misleading information (see the sketch after this list).
- Meta-Learning Evaluation: Compare models’ quick adaptation capabilities to assess their reasoning versatility.
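To make the explainability and distractor recommendations concrete, here is a hedged sketch of what a single test item and its scoring could look like; the schema, field names, and example content are invented for illustration and are not drawn from an existing benchmark.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningTestItem:
    """Hypothetical schema for one mechanistic-reasoning test case."""
    question: str
    supporting_facts: list[str]   # knowledge the answer should rely on
    distractor_facts: list[str]   # irrelevant or misleading statements
    expected_answer: str
    expected_citations: list[int] = field(default_factory=list)  # facts a correct justification should cite

# Invented example item.
item = ReasoningTestItem(
    question="Can a sealed, full glass bottle of water be safely frozen?",
    supporting_facts=[
        "Water expands when it freezes.",
        "Rigid glass can crack when its contents expand.",
    ],
    distractor_facts=[
        "The bottle was bought on a Tuesday.",
        "Freezers are usually kept in kitchens.",
    ],
    expected_answer="No",
    expected_citations=[0, 1],
)

def score(predicted_answer: str, cited_facts: list[int], item: ReasoningTestItem) -> dict:
    """Scores the answer and whether the cited facts match the expected justification."""
    return {
        "answer_correct": predicted_answer.strip().lower() == item.expected_answer.lower(),
        "citations_correct": sorted(cited_facts) == sorted(item.expected_citations),
    }

print(score("No", [0, 1], item))
```

Scoring both the answer and the cited facts ties the behavioral outcome to the knowledge actually used, in line with the explainability requirement above.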
Conclusion
Mechanistic interpretability provides a powerful lens for evaluating AI systems’ commonsense reasoning capabilities. By moving beyond behavioral tests and focusing on internal processes, this approach aligns testing methodologies with the broader goal of achieving explainable, adaptable, and genuinely intelligent systems. The integration of dynamic knowledge encoding and meta-learning frameworks like RECKONING offers promising directions for advancing AI toward AGI.