Are LLM Belief Updates Consistent with Bayes’ Theorem?
This paper evaluates whether large language models (LLMs) update their beliefs over propositions in a Bayesian-consistent manner as they scale in size and capability.
Built an AI agent in 24 hours to analyze user banking data, predict financial trends, and create meme-based insights; a top-3 project at the bunq Hackathon.
Exploring the ongoing debate in AI between symbolic, subsymbolic, and hybrid approaches, with a focus on deep learning limitations and the promise of neuro-compositional systems.
Exploring commonsense reasoning as a cornerstone of machine intelligence and evaluating mechanistic interpretability as a method for assessing AI capabilities.
Analyzing the ERA agent’s architecture, functionality, and applications in determining the veracity of news headlines
Implementing Epsilon-Greedy, Semi-Gradient SARSA, and Q-Learning for solving multi-armed bandit and episodic MDP tasks
Exploring the interpretability of BERT’s outputs through probing techniques for linguistic and semantic analysis
Exploring structural causal models and the PC algorithm for causal discovery
Evaluating sequence labeling and contextual embeddings for linguistic tasks
Analyzing probabilistic parsing techniques for syntactic structure recognition
A study on the effectiveness of n-gram models and perplexity evaluation for language prediction tasks
Analyzing factors contributing to diabetes patient readmissions using various explainable AI techniques
Exploring various models for job interview invitation predictions using explainability techniques
Exploring Vision Transformers and RNNs for brain state classification
A study on classifying deceptive reviews using Naive Bayes, Logistic Regression, Decision Trees, and Random Forests
A comparative analysis between logistic regression and SVM models for digit recognition tasks
Offline color modeling and real-time voxel labeling for multi-person tracking
Implementation of a small-scale variant of the model from the GloVe paper
Implementation of tutorials from a neuroscience masterclass for the CCSS at Utrecht University
Implementation of a two-stream network for action classification
Implementation and optimization of LeNet variants for FashionMNIST classification