(NLP) Probing BERT: Understanding Language Model Behavior

Overview

This project investigates the interpretability of BERT’s masked language modeling behavior through the design and implementation of probing techniques. The objective is to evaluate how well BERT’s representations align with human-like language knowledge and to explore which features are encoded within its contextual embeddings.


Key Components

  1. Probing Design:
    • Created probes to interpret BERT’s masked language modeling predictions (a minimal prediction sketch follows this list).
    • Investigated the extent to which BERT encodes lexical, syntactic, and semantic features.
  2. Techniques Used:
    • Analyzed BERT’s attention mechanisms to examine how features interact across tokens.
    • Conducted feature probes to isolate specific language structures encoded in embeddings.
  3. Evaluation:
    • Benchmarked outputs against predefined linguistic features.
    • Assessed how closely BERT’s sentence completions and structural interpretations match human behavior.
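
The masked-prediction probe at the core of this setup can be sketched in a few lines with the Transformers library. This is a minimal illustration rather than the project’s exact code: the bert-base-uncased checkpoint and the example sentence are assumptions, since the report does not record the precise model or corpus.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Illustrative sentence; the project's actual corpus is not reproduced here.
sentence = "The cat sat on the [MASK]."
inputs = tokenizer(sentence, return_tensors="pt")

# Locate the masked position in the input ids.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Rank BERT's top candidates for the blank.
top5 = torch.topk(logits[0, mask_pos], k=5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top5))
```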

Methodology

  1. Masked Language Modeling:
    • Selected a corpus of sentences where key tokens were masked.
    • Predicted the missing tokens using BERT and analyzed the predictions for accuracy and contextual relevance.
  2. Sequence Probing:
    • Designed probes to analyze outputs layer-by-layer (see the probe sketch after this list), focusing on:
      • Grammatical structure recovery.
      • Semantic coherence.
      • Lexical substitutions.
  3. Analysis of Feature Encodings:
    • Applied token-level conditional analyses to examine interactions between individual tokens.
    • Evaluated embedding separability for distinct linguistic categories (e.g., nouns vs. verbs).
  4. Comparison with Human Performance:
    • Benchmarked BERT’s predictions against human-labeled datasets.
    • Evaluated interpretability through visualization techniques, such as attention maps and feature importance plots (an attention-map sketch also follows this list).
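
For steps 2 and 3, a linear probe trained on per-layer hidden states is a standard way to measure how separable linguistic categories are at each layer. The sketch below is illustrative only: the toy noun/verb examples, the bert-base-uncased checkpoint, and the logistic-regression probe are assumptions, not the project’s actual data or probe design.

```python
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy labelled contexts: the target word is used as a noun (0) or a verb (1).
examples = [
    ("The dog runs fast.", "dog", 0),
    ("They run every day.", "run", 1),
    ("A loud noise woke her.", "noise", 0),
    ("We walk to work.", "walk", 1),
]

# Collect the target word's hidden state at every layer for each sentence.
per_layer_features = [[] for _ in range(13)]  # embedding layer + 12 layers
labels = []
for sentence, word, label in examples:
    enc = tokenizer(sentence, return_tensors="pt")
    word_id = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    pos = (enc["input_ids"][0] == word_id).nonzero()[0].item()
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # 13 x (1, seq_len, 768)
    for layer, hs in enumerate(hidden_states):
        per_layer_features[layer].append(hs[0, pos].numpy())
    labels.append(label)

# Train a linear probe per layer; higher accuracy = more separable categories.
for layer, X in enumerate(per_layer_features):
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```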
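
For step 4, attention maps can be rendered directly from the attention tensors that Transformers returns. Again, a sketch under stated assumptions: the sentence and the choice of layer and head are arbitrary illustrations, not heads the analysis actually identified.

```python
import torch
import matplotlib.pyplot as plt
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "The keys to the cabinet are on the table."  # illustrative only
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

with torch.no_grad():
    attentions = model(**inputs).attentions  # 12 tensors of (1, heads, seq, seq)

layer, head = 8, 10  # arbitrary pick; not a head singled out by the analysis
attn = attentions[layer][0, head].numpy()

# Heat map of attention weights: rows attend to columns.
plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title(f"Attention weights, layer {layer} head {head}")
plt.colorbar()
plt.tight_layout()
plt.show()
```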

Results

  1. Masked Predictions:
    • BERT consistently predicted tokens with high accuracy, achieving performance comparable to human benchmarks in most scenarios.
    • Attention layers exhibited a strong capacity for capturing dependencies across long sentence spans.
  2. Feature Probing:
    • Probing revealed that BERT encodes rich representations of syntax and semantics, often outperforming traditional language models in capturing these features.
    • Significant alignment was observed between feature embeddings and human-annotated linguistic labels.
  3. Model Interpretability:
    • Attention maps and feature probing techniques provided a clearer understanding of how BERT distributes importance across inputs.
    • The analysis revealed that specific attention heads specialize in detecting particular linguistic structures, such as noun-verb dependencies.

Libraries Used

  • Transformers: Hugging Face library for BERT and related models.
  • Matplotlib: For visualizing attention distributions and feature importance.
  • Scikit-learn: For feature analysis and separability evaluations.
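
A typical environment for the sketches above can be installed as follows (exact package versions are not recorded in this document):

```
pip install torch transformers scikit-learn matplotlib
```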

Resources