# Action Recognition in Still Images and Videos

## Project Overview
This project performs action classification in still images and videos using convolutional neural networks (CNNs). We implement a two-stream network architecture that combines spatial and temporal information for improved action recognition.
## Key Components
- Datasets:
  - Still images: Stanford40
  - Videos: HMDB51
  - Custom two-stream dataset
- Models:
  - Frames CNN (spatial model)
  - Optical flow CNN (temporal model)
  - Two-stream CNN (combined model)
- Techniques:
  - Transfer learning
  - Fine-tuning
  - Optical flow extraction
  - CNN output fusion
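In practice, optical flow for the temporal stream would be computed with a library routine (e.g. OpenCV's Farnebäck method). As a self-contained illustration of the underlying idea, here is a minimal single-window Lucas-Kanade estimator in NumPy; the function name and the synthetic frames are ours, not the project's actual code:

```python
import numpy as np

def lucas_kanade_flow(prev, curr):
    """Estimate a single (u, v) translation between two grayscale
    frames by solving the Lucas-Kanade least-squares system with
    one window covering the whole image."""
    ix = np.gradient(prev, axis=1)   # horizontal spatial gradient
    iy = np.gradient(prev, axis=0)   # vertical spatial gradient
    it = curr - prev                 # temporal gradient
    # Normal equations of the brightness-constancy constraint
    # ix*u + iy*v + it = 0, summed over all pixels.
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    u, v = np.linalg.solve(a, b)
    return u, v

# Example: a Gaussian blob shifted one pixel to the right.
y, x = np.mgrid[0:32, 0:32]
frame1 = np.exp(-((x - 16.0) ** 2 + (y - 16.0) ** 2) / 18.0)
frame2 = np.roll(frame1, 1, axis=1)
u, v = lucas_kanade_flow(frame1, frame2)  # u is close to 1, v close to 0
```

Real extractors compute such an estimate per pixel over small windows (or via a dense method), producing the two-channel flow fields the temporal CNN consumes.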
## Methodology
1. Train and evaluate the spatial model on Stanford40
2. Fine-tune the spatial model on HMDB51
3. Train and evaluate the temporal model on HMDB51
4. Combine the pre-trained weights in the two-stream model
5. Evaluate the two-stream model on the custom dataset
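The fine-tuning steps above rely on transfer learning: a backbone pre-trained on one dataset is frozen (or updated slowly) while a new classification head is trained on the target dataset. A minimal NumPy sketch of the "frozen backbone, trainable head" idea — all names, shapes, and hyperparameters below are illustrative, not the project's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Stand-in for features produced by a frozen, pre-trained backbone:
# 200 samples, 64-dimensional features, 5 action classes.
features = rng.normal(size=(200, 64))
labels = rng.integers(0, 5, size=200)
targets = np.eye(5)[labels]

# Only the new head (a linear softmax classifier) is trained;
# the backbone features stay fixed throughout.
w = np.zeros((64, 5))
lr = 0.5
losses = []
for _ in range(100):
    probs = softmax(features @ w)
    loss = -np.mean(np.log(probs[np.arange(200), labels] + 1e-12))
    losses.append(loss)
    grad = features.T @ (probs - targets) / 200  # cross-entropy gradient
    w -= lr * grad
```

Full fine-tuning then unfreezes some backbone layers and continues training with a smaller learning rate, which is what step 2 above does when moving from Stanford40 to HMDB51.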
## Two-stream Hypothesis
The project draws on the two-streams hypothesis of the human visual cortex:
- Ventral stream ("what"): object recognition
- Dorsal stream ("where/how"): motion and action recognition
By combining a deep CNN that recognizes objects in still images with motion information extracted from optical flow, we aim to improve performance on action recognition tasks.
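A common way to combine the two streams, matching the "CNN output fusion" technique listed above, is late fusion: take a (possibly weighted) average of the per-class softmax scores from the spatial and temporal networks and predict the arg-max class. A minimal sketch of that averaging scheme; the weight and the score values are made-up numbers for illustration:

```python
import numpy as np

def fuse_predictions(spatial_scores, temporal_scores, w_spatial=0.5):
    """Late fusion: weighted average of per-class softmax scores
    from the spatial and temporal streams, then arg-max."""
    fused = w_spatial * spatial_scores + (1.0 - w_spatial) * temporal_scores
    return fused, fused.argmax(axis=1)

# Illustrative softmax scores for 2 clips over 3 action classes.
spatial = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
temporal = np.array([[0.2, 0.7, 0.1],
                     [0.1, 0.8, 0.1]])
fused, pred = fuse_predictions(spatial, temporal)
# Both clips end up assigned to class 1: the temporal stream's
# confident motion evidence outweighs the spatial stream on clip 0.
```

Because each input row already sums to 1, the fused scores remain a valid distribution over classes; the weight `w_spatial` can be tuned on a validation set.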
## Resources
- Code Repository: GitHub - Action Recognition
- Course: Computer Vision at Utrecht University