Action Recognition in Still Images and Videos

Project Overview

This project classifies human actions in still images and videos using Convolutional Neural Networks (CNNs). We implement a Two-stream network architecture that combines spatial information from individual frames with temporal information from optical flow.

Key Components

Datasets:
- Still images: Stanford40
- Videos: HMDB51
- Custom two-stream dataset
Models:
- Frames CNN (Spatial model)
- Optical Flow CNN (Temporal model)
- Two-stream CNN (Combined model)
Techniques:
- Transfer learning
- Fine-tuning
- Optical flow extraction
- CNN output fusion

Methodology

1. Train and evaluate the Spatial model on Stanford40.
2. Fine-tune the Spatial model on HMDB51.
3. Train and evaluate the Temporal model on HMDB51.
4. Load the pre-trained weights of the Spatial and Temporal models into the Two-stream model.
5. Evaluate the Two-stream model on the custom dataset.
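
The move from Stanford40 (40 action classes) to HMDB51 (51 action classes) means the final classification layer cannot be reused as-is, while earlier layers can. The sketch below illustrates that idea in plain Python, without any deep learning framework. It is not the project's actual code: the layer names, the shapes, and the helpers `make_layer` and `transfer_weights` are hypothetical and chosen only for illustration.

```python
def make_layer(shape, fill):
    """Build a toy layer as (shape, flat list of weights). Hypothetical helper."""
    size = 1
    for dim in shape:
        size *= dim
    return (shape, [fill] * size)


def transfer_weights(pretrained, target):
    """Copy every layer whose name and shape match; leave all others untouched."""
    copied, skipped = [], []
    for name, (shape, values) in pretrained.items():
        if name in target and target[name][0] == shape:
            target[name] = (shape, list(values))  # copy, do not alias
            copied.append(name)
        else:
            skipped.append(name)
    return copied, skipped


# Toy model trained on Stanford40: the classifier has 40 outputs.
stanford40_model = {
    "conv1": make_layer((3, 3, 3, 8), 0.5),
    "dense": make_layer((8, 16), 0.25),
    "classifier": make_layer((16, 40), 0.1),
}

# Fresh model for HMDB51: same backbone, but the classifier has 51 outputs.
hmdb51_model = {
    "conv1": make_layer((3, 3, 3, 8), 0.0),
    "dense": make_layer((8, 16), 0.0),
    "classifier": make_layer((16, 51), 0.0),
}

copied, skipped = transfer_weights(stanford40_model, hmdb51_model)
print("copied:", copied)
print("skipped (re-initialised for fine-tuning):", skipped)
```

In a real framework the same pattern usually appears as loading a checkpoint while excluding the classification head, then training the new head (and optionally the backbone) on the target dataset.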

Two-stream Hypothesis

The project draws on the two-streams hypothesis of the human visual cortex, which describes two processing pathways:

- Ventral stream: Object recognition
- Dorsal stream: Motion processing, used here for action recognition

By combining a deep CNN that recognizes objects in still frames with a second CNN that processes motion extracted as optical flow, we aim to outperform either stream used alone on action recognition tasks.
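
One common way to combine the two streams is late fusion: each CNN produces class scores, and the scores are merged before picking a label. The text does not specify which fusion method this project uses, so the sketch below assumes a weighted average of softmax probabilities. The function names, the `spatial_weight` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Weighted average of the per-class probabilities of the two streams."""
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = spatial_weight
    return [w * s + (1.0 - w) * t for s, t in zip(spatial, temporal)]


def predict(fused):
    """Return the index of the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up logits for a 3-class toy problem: the spatial stream prefers
# class 0, while the temporal stream strongly prefers class 1.
spatial_logits = [2.0, 1.0, 0.1]
temporal_logits = [0.2, 3.0, 0.1]

fused = fuse_scores(spatial_logits, temporal_logits)
label = predict(fused)
print("fused probabilities:", fused)
print("predicted class index:", label)
```

Averaging probabilities keeps the fused output a valid distribution and lets a confident stream override an uncertain one. Alternatives such as concatenating features and training a joint classifier on top would need additional training on the two-stream dataset.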

Resources

Code Repository: GitHub - Action Recognition
Course: Computer Vision at Utrecht University


Riccardo Campanella