AI Video Face Search
Computer vision system detecting specified people or objects for B-roll classification
Technologies Used
Computer Vision
Machine Learning
FaceNet
Streamlit
Python
Prototyping
Executive Summary
This application automates video indexing, a labor-intensive task. Originally designed for media production teams, it uses computer vision to scan hours of footage and locate specific individuals, turning a manual search into an automated workflow.
Business Context
Content creators and media archives face a common challenge: retrieving specific footage from massive video libraries is slow and expensive, and manual review of B-roll or archival footage creates a bottleneck. This solution removes that bottleneck by enabling 'semantic' search within video content: finding people by their visual identity rather than by file name.
Live Streamlit Demo
Try the application yourself. Upload a video and a reference photo (or use the demo data) to see the face search in action.
Source Code
View the repository for technical details on the computer vision pipeline, embedding generation, and performance optimization.
Computer Vision Pipeline
The system follows a modular pipeline: Frame Extraction -> Face Detection (MTCNN) -> Embedding Generation (Inception-ResNet V1) -> Vector Similarity Search. Because the stages are decoupled, each one can be optimized independently to trade speed against accuracy.
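The staged design can be illustrated with a minimal sketch in which each stage is passed in as a plain callable. This is not the repository's code: the names search_video, Hit, detect_faces, embed_face, and score are hypothetical, and in a real run the detector and embedder would wrap MTCNN and the Inception-ResNet model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence, Tuple

Embedding = Sequence[float]


@dataclass
class Hit:
    """A frame timestamp where a detected face scored above the threshold."""
    timestamp_s: float
    score: float


def search_video(
    frames: Iterable[Tuple[float, Any]],
    detect_faces: Callable[[Any], List[Any]],
    embed_face: Callable[[Any], Embedding],
    score: Callable[[Embedding], float],
    threshold: float,
) -> List[Hit]:
    """Run the four stages in order over (timestamp, frame) pairs.

    Each stage is an injected callable (all hypothetical names), so a
    detector or embedding model can be swapped without touching the loop.
    """
    hits: List[Hit] = []
    for timestamp_s, frame in frames:          # stage 1: extracted frames
        for face in detect_faces(frame):       # stage 2: face detection
            embedding = embed_face(face)       # stage 3: embedding generation
            similarity = score(embedding)      # stage 4: similarity to the target
            if similarity >= threshold:
                hits.append(Hit(timestamp_s, similarity))
    return hits
```

Keeping the loop free of model-specific code is one way to get the independence described above: sampling fewer frames, or swapping in a faster detector, changes only the argument passed in.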
Technical Approach
The system takes a one-shot learning approach built on a deep network, Inception-ResNet V1. Unlike traditional classifiers that need thousands of training images per person, this architecture 'enrolls' a new identity from just a few reference photos. It combines MTCNN for robust face detection with facenet-pytorch for generating 512-dimensional embeddings, so an identity can be matched by comparing embeddings across varying lighting and angles.
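Enrollment and matching can be sketched in a few lines, assuming embeddings have already been produced by the model. The approach shown here (averaging normalized reference embeddings into a template and comparing by cosine similarity) is a common technique and an assumption, not necessarily what the repository does; the function names and the 0.6 threshold are hypothetical and would need tuning on real footage.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def enroll(reference_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Build one identity template from a few reference embeddings.

    Assumption: the template is the normalized mean of the normalized
    reference embeddings.
    """
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    normalized = [l2_normalize(e) for e in reference_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings, in [-1, 1]."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def is_match(template: Sequence[float], candidate: Sequence[float],
             threshold: float = 0.6) -> bool:
    """Hypothetical decision rule: match when similarity reaches the threshold."""
    return cosine_similarity(template, candidate) >= threshold
```

With real 512-dimensional embeddings the same functions apply unchanged; only the threshold would need calibration against known matches and non-matches.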
Key Outcomes
The system achieved real-time processing speeds for standard-definition video and successfully handles partial occlusions and profile views. A user-friendly Streamlit interface abstracts the complex ML operations, making the technology accessible to non-technical video editors.
Business Value
The tool drastically reduces post-production search time, allowing teams to repurpose existing content more effectively. The 'enrollment' feature lets editors track new subjects without engineering intervention, so the tool scales with the organization's needs.