AIAA 4220 - Introduction to Embodied AI Project 2: Egocentric Object Detection Challenge
Overview
Welcome to the second project of the AIAA 4220 Introduction to Embodied AI course! This competition challenges you to develop robust object detection algorithms for first-person perspective images.
🏆 Kaggle Competition: https://www.kaggle.com/t/beb9f7424edf46799536b93620e26f2f
📖 How to Join Kaggle Competition: https://www.kaggle.com/discussions/general/400680
Competition Highlights:
- Dataset: 13,000 egocentric images from EgoObjects dataset
- Task: Detect 10 common object categories in first-person view
- Framework Freedom: Use any detection method (YOLO, Faster R-CNN, DETR, etc.)
- Team Size: 1-3 students per team
- Baseline Performance: mAP@0.5:0.95 of 0.759-0.788 provided (MMDetection)
- Prize: Top 3 teams present methods on Dec 3 (4%/2%/1% extra credit)
This is a learning-focused competition where methodology explanation and code quality matter as much as leaderboard ranking. Perfect for exploring modern object detection techniques while working with challenging real-world data.
Description
The Challenge
Egocentric (first-person) vision presents unique challenges compared to traditional third-person object detection. Objects appear at unusual angles, suffer from motion blur, and exhibit extreme scale variations. Your task is to build a detector that can reliably identify everyday objects in this challenging domain.
Dataset Details
- Source: Curated from EgoObjects dataset
- Total Images: 13,000 first-person perspective images
- Split: 10,000 train / 1,000 validation / 2,000 test
- Annotation Format: COCO-style JSON with 2D bounding boxes
Object Categories (10 classes):
- Box
- Book
- Bottle
- Chair
- Mug
- Door
- Shelf
- Plate
- Sofa
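If you want to get a feel for the data before training, the COCO-style annotation files can be inspected with pycocotools. Below is a minimal sketch; the path data/train.json is an assumption (only data/val.json is referenced later in this guide), so adjust it to match the released files.
# Sketch: count annotated instances per category in the training split.
from collections import Counter
from pycocotools.coco import COCO

coco = COCO("data/train.json")  # assumed annotation path
print(f"{len(coco.imgs)} images, {len(coco.anns)} annotations")
cat_names = {c["id"]: c["name"] for c in coco.loadCats(coco.getCatIds())}
counts = Counter(cat_names[a["category_id"]] for a in coco.anns.values())
for name, n in counts.most_common():
    print(f"{name:8s} {n}")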
Technical Requirements
Complete Submission Package:
- Source Code: All training and inference scripts
- Technical Report: Method explanation, architecture choices, hyperparameters
- Resource Documentation: GPU usage, training time, model parameters
- Predictions: Test set results in specified JSON format
Prediction Format:
[
  {
    "image_id": 12345,
    "category_id": 3,
    "bbox": [x_min, y_min, width, height],
    "score": 0.85
  }
]
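To catch formatting mistakes early, it can help to run a small schema check over your raw predictions before converting them. The helper below is a hypothetical sketch (not a provided script); it only assumes the fields shown above and the 0-9 category range stated in the Submission Requirements.
# Hypothetical helper: sanity-check prediction entries against the expected schema.
def check_predictions(preds):
    for p in preds:
        assert isinstance(p["image_id"], int)
        assert p["category_id"] in range(10)   # 10 classes, ids 0-9
        x, y, w, h = p["bbox"]                 # COCO format: [x_min, y_min, width, height]
        assert w > 0 and h > 0
        assert 0.0 <= p["score"] <= 1.0

check_predictions([
    {"image_id": 12345, "category_id": 3, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.85},
])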
Baseline Method Provided
We provide a complete MMDetection pipeline to get you started:
MMDetection Framework
- Professional-grade detection toolkit
- Includes Faster R-CNN and Deformable DETR implementations
- Expected mAP@0.5:0.95: 0.759-0.788
- Complete training/validation/testing pipeline with visualization tools
Alternative Detection Methods (Explore on Your Own!)
For additional challenge and learning, consider exploring these state-of-the-art detection algorithms:
- YOLOv10-N/S (NMS-free, high throughput)
- RT-DETRv2-Lite (lightweight Transformer)
- RTMDet-tiny/nano (OpenMMLab family)
- NanoDet-Plus (mobile-friendly)
- PP-PicoDet / PP-YOLOE (mobile-optimized)
- LW-DETR (lightweight DETR alternative)
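If you go the YOLO route, the Ultralytics package offers a compact training API. The sketch below is illustrative only and rests on assumptions: it requires a recent Ultralytics release with YOLOv10 weights, ego.yaml is a hypothetical dataset config you would write yourself after converting the COCO annotations to YOLO format, and the raw outputs still need converting back to the COCO-style [x, y, w, h] boxes expected by the submission format.
# Illustrative sketch only; not the provided MMDetection baseline.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")                 # pretrained weights (assumption)
model.train(data="ego.yaml", epochs=50, imgsz=640)

results = model("path/to/test_image.jpg")   # hypothetical test image path
print(results[0].boxes.xyxy, results[0].boxes.conf, results[0].boxes.cls)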
Learning Objectives
- Understand modern object detection architectures
- Experience training on challenging real-world data
- Learn to balance accuracy vs. computational efficiency
- Practice professional ML workflow (train/val/test pipeline)
This competition emphasizes engineering excellence over pure performance optimization. Clean, well-documented code that demonstrates understanding of the underlying principles will be highly valued.
Getting Started Guide
Environment Setup
Before starting, set up Conda and PyTorch. If you haven’t already, download and install Miniconda:
👉 Miniconda Installation Guide
Then create and activate a clean environment:
# Create a conda environment
conda create -n ego2d python=3.9
conda activate ego2d
Install PyTorch (ensure it matches your CUDA version):
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
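A quick sanity check that the install matches your GPU setup:
# Verify PyTorch and CUDA before moving on.
import torch
print(torch.__version__)           # expect 2.4.1
print(torch.version.cuda)          # expect 12.1 for the cu121 wheel
print(torch.cuda.is_available())   # should be True on a GPU machine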
MMDetection Setup
Install MMDetection and dependencies:
pip install openmim
mim install mmengine==0.10.7 mmcv==2.1.0 mmdet==3.3.0
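You can confirm the stack imports cleanly and the versions match:
# Verify the MMDetection stack.
import mmengine, mmcv, mmdet
print(mmengine.__version__, mmcv.__version__, mmdet.__version__)   # expect 0.10.7 2.1.0 3.3.0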
We provide two baseline implementations:
- Faster R-CNN (anchor-based)
- Deformable DETR (end-to-end)
Training
Run training using the distributed training script (e.g., using 4 GPUs):
# Change to mm directory
cd mm
# Train Deformable DETR
bash tools/dist_train.sh config/deformable-detr_r50.py 4
# Or train Faster R-CNN
bash tools/dist_train.sh config/faster-rcnn_r50_fpn_giou_20e.py 4
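For a quick debugging run (or to experiment with hyperparameters), you can also load a baseline config programmatically with mmengine and dump a modified copy. This is a sketch under assumptions: the attribute name train_cfg.max_epochs depends on how the provided configs are structured, and the dumped filename is hypothetical.
# Sketch: shorten the schedule of a baseline config for a quick debug run.
from mmengine.config import Config

cfg = Config.fromfile("config/deformable-detr_r50.py")
cfg.train_cfg.max_epochs = 1        # assumes an epoch-based training loop (assumption)
cfg.dump("config/deformable-detr_r50_debug.py")
You can then pass the dumped config to the same dist_train.sh command, using 1 as the GPU count if you only have a single GPU.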
Testing and Submission
Step 1: Generate Predictions
Run inference on the test set to generate predictions:
# For Faster R-CNN
bash tools/dist_test.sh config/faster-rcnn_r50_fpn_giou_20e.py work_dirs/faster-rcnn_r50_fpn_giou_20e/epoch_20.pth 4
# For Deformable DETR
bash tools/dist_test.sh config/deformable-detr_r50.py work_dirs/deformable-detr_r50/epoch_50.pth 4
This will produce a prediction file (e.g., test.bbox.json) in the corresponding work_dirs folder.
Step 2: Convert to Kaggle Format
Use the provided conversion script to convert MMDetection output to Kaggle submission format:
# Return to project root
cd ..
# Convert predictions to submission.csv
python convert_to_submission.py --pred mm/work_dirs/deformable-detr_r50/test.bbox.json --output submission.csv
Expected output:
Loaded 200000 predictions from mm/work_dirs/deformable-detr_r50/test.bbox.json
Found 2000 unique images
✓ Saved submission to submission.csv
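For reference, the conversion step essentially groups the per-detection entries in test.bbox.json by image and writes one CSV row per test image in the id/predictions layout described under Submission Requirements below. The rough sketch that follows illustrates the idea; use the provided script for your actual submission, since this sketch omits edge cases such as test images with no detections.
# Rough sketch of the conversion idea; prefer the provided convert_to_submission.py.
import csv
import json
from collections import defaultdict

with open("mm/work_dirs/deformable-detr_r50/test.bbox.json") as f:
    preds = json.load(f)

by_image = defaultdict(list)
for p in preds:
    by_image[p["image_id"]].append(p)

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "predictions"])
    for image_id in sorted(by_image):
        writer.writerow([image_id, json.dumps(by_image[image_id])])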
Step 3: Upload to Kaggle
- Go to the Kaggle competition page
- Click “Submit Predictions”
- Upload submission.csv
- Important: Add your student ID in the submission description
- Click “Make Submission”
Format Verification
Before generating test results for final submission, verify the output format using the validation set:
Steps:
- Modify test_dataloader and test_evaluator in mm/config/data.py to use the validation set
- Run inference on the validation set to generate predictions:
bash tools/dist_test.sh config/faster-rcnn_r50_fpn_giou_20e.py work_dirs/faster-rcnn_r50_fpn_giou_20e/epoch_20.pth 4
- Evaluate the predicted results using the provided evaluation script:
python eval.py --gt data/val.json --pred mm/work_dirs/faster-rcnn_r50_fpn_giou_20e/test.bbox.json
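For context, the reported numbers are the standard COCO metrics; a minimal equivalent using pycocotools looks roughly like this (the provided eval.py may differ in its details):
# Sketch: COCO-style evaluation with pycocotools.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("data/val.json")
dt = gt.loadRes("mm/work_dirs/faster-rcnn_r50_fpn_giou_20e/test.bbox.json")
evaluator = COCOeval(gt, dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints mAP@0.5:0.95, mAP@0.5, mAP@0.75, AR, ...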
Visualization
To inspect the dataset:
cd tools
python browse_dataset.py ../mm/config/deformable-detr_r50.py --output-dir vis
To visualize prediction results:
bash tools/dist_test.sh config/faster-rcnn_r50_fpn_giou_20e.py work_dirs/faster-rcnn_r50_fpn_giou_20e/epoch_20.pth 4 --show --show-dir vis
Evaluation
Primary Metric
Mean Average Precision (mAP) at IoU 0.50:0.95
This is the standard COCO evaluation metric, which averages precision across IoU thresholds from 0.5 to 0.95 (step 0.05). It provides a comprehensive measure of both localization accuracy and classification quality.
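Concretely, with C = 10 classes the ranking metric can be written as

\[
\text{mAP}@0.5{:}0.95 \;=\; \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \frac{1}{C} \sum_{c=1}^{C} \text{AP}_c(t),
\]

where AP_c(t) is the average precision of class c at IoU threshold t.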
Detailed Metrics Reported
- mAP@0.5:0.95: Primary ranking metric (IoU thresholds 0.5-0.95)
- mAP@0.5: Detection accuracy at relaxed IoU threshold
- mAP@0.75: Detection accuracy at strict IoU threshold
- mAP by object size: Small/Medium/Large object performance
- Average Recall (AR): maximum recall given at most 1, 10, or 100 detections per image
Submission Requirements
File Format: CSV file named submission.csv with the following structure:
| Column | Description |
|---|---|
| id | Image ID (2000 rows, one per test image) |
| predictions | JSON array of detections for each image |
Each detection in the predictions column must include:
{
  "image_id": <int>,
  "category_id": <int>,   // 0-9 for the 10 object classes
  "bbox": [x, y, w, h],   // COCO format: [x_min, y_min, width, height]
  "score": <float>        // Confidence score between 0 and 1
}
Important: Use the provided convert_to_submission.py script to ensure correct format. Do not manually edit the CSV file.
Kaggle Submission: When submitting to Kaggle, include your student ID in the submission description to ensure proper grading.
Academic Integrity
- Collaboration: Teams of 1-3 students allowed
- External Data: Only provided dataset permitted
- Pre-trained Models: Allowed and encouraged (ImageNet, COCO weights)
- Framework Freedom: Any detection framework/architecture permitted
Timeline
- Competition Start: October 11, 2024
- Submission Deadline: December 3, 2024, 11:59 PM
- Results Presentation: Top 3 teams present on December 3
The provided baselines serve as reference implementations to help you get started; your final ranking will be determined by your performance relative to other teams on the Kaggle leaderboard.
Submission Checklist
Required Deliverables
✅ Source Code: Complete training and inference scripts with clear documentation
✅ Technical Report: Detailed explanation of your method including:
- Detection framework/model used
- Any modifications or improvements made
- Training configurations and hyperparameters
- Resources used (GPU type, number of GPUs, training time)
- Model size (number of parameters)
✅ Prediction Results: Test set predictions converted to submission.csv using the provided conversion script
✅ Kaggle Submission: Submit submission.csv to Kaggle with student ID included in description
Final Submission Format
Your submission should include:
- Kaggle submission: Upload submission.csv with student ID in description
- All source code files
- Technical documentation (PDF or Markdown)
- Instructions to reproduce your results
Quick Reference: Complete Workflow
# 1. Train your model
cd mm
bash tools/dist_train.sh config/your_config.py 4
# 2. Generate predictions on test set
bash tools/dist_test.sh config/your_config.py work_dirs/your_model/epoch_X.pth 4
# 3. Convert to Kaggle format
cd ..
python convert_to_submission.py --pred mm/work_dirs/your_model/test.bbox.json --output submission.csv
# 4. Upload submission.csv to Kaggle (include student ID in description)
Good luck, and may your detection boxes be precise! 🎯
- AIAA 4220 - Introduction to Embodied AI Project 2: Egocentric Object Detection Challenge