AI Video Face Search

A computer vision system that detects specified people or objects in video for B-roll classification

Technologies Used

Computer Vision

Machine Learning

FaceNet

Streamlit

Python

Prototyping

Executive Summary

This application automates the labor-intensive process of video indexing. Originally designed for media production teams, it uses face detection and recognition to scan hours of footage and locate specific individuals, turning a manual search task into an automated workflow.

Business Context

Content creators and media archives face a common challenge: retrieving specific footage from massive video libraries is slow and expensive, and manual review of B-roll or archival footage becomes a bottleneck. This solution removes that bottleneck by enabling 'semantic' search within video content: people are found by their visual identity rather than by file name.

Live Streamlit Demo

Try the application yourself. Upload a video and a reference photo (or use the demo data) to see the face search in action.

🚀 Live Demo

Source Code

View the repository for technical details on the computer vision pipeline, embedding generation, and performance optimization.

💻 View Source Code

Computer Vision Pipeline

The system follows a modular pipeline: Frame Extraction -> Face Detection (MTCNN) -> Embedding Generation (Inception-ResNet V1) -> Vector Similarity Search. Because each stage only consumes the output of the previous one, each can be optimized independently to balance speed and accuracy.
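The stage boundaries can be illustrated with a minimal sketch. This is not the repository's code: the function names (`sample_frame_indices`, `search_video`), the one-frame-per-interval sampling rule, and the callable-based stage interface are all assumptions made for illustration. In a real run, the `detect_faces` and `embed_face` arguments would wrap MTCNN and the Inception-ResNet V1 model.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

Frame = Any                   # placeholder for a decoded image
Face = Any                    # placeholder for a cropped face
Embedding = Sequence[float]   # placeholder for a face embedding


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Frame extraction (hypothetical rule): keep one frame per `every_seconds` of video."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def search_video(
    frames: Iterable[Tuple[int, Frame]],
    detect_faces: Callable[[Frame], List[Face]],
    embed_face: Callable[[Face], Embedding],
    is_match: Callable[[Embedding], bool],
) -> List[int]:
    """Run detection -> embedding -> similarity check on each frame.

    Returns the indices of frames that contain at least one matching face.
    Each stage is passed in as a callable, so it can be swapped or tuned
    without touching the others.
    """
    hits = []
    for index, frame in frames:
        faces = detect_faces(frame)
        if any(is_match(embed_face(face)) for face in faces):
            hits.append(index)
    return hits
```

Passing the stages in as plain callables keeps the loop itself trivial, so a faster detector or a different similarity rule can be dropped in without changing the rest of the pipeline.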

Technical Approach

The system takes a one-shot learning approach built on a deep network (Inception-ResNet V1). Unlike traditional classifiers that need thousands of training images per person, this architecture 'enrolls' a new identity from just a few reference photos, without retraining the model. MTCNN handles face detection, and facenet-pytorch generates a 512-dimensional embedding for each detected face. Identities are then matched by comparing embeddings, which allows precise matching across varying lighting and angles.
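A minimal sketch of enrollment and matching on embeddings, under stated assumptions: the source does not say how reference photos are combined or which similarity measure is used, so averaging the normalized reference embeddings, comparing by cosine similarity, and the `0.6` threshold are all illustrative choices. The function names are hypothetical. The toy vectors in the usage note are 2-dimensional for readability; the real embeddings would have 512 dimensions.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enroll(reference_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine a few reference embeddings into one identity vector.

    Assumption: the identity is the normalized mean of the normalized references.
    """
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    normalized = [l2_normalize(e) for e in reference_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def is_same_person(identity: Sequence[float], candidate: Sequence[float],
                   threshold: float = 0.6) -> bool:
    """Declare a match when similarity reaches the (assumed) threshold."""
    return cosine_similarity(identity, candidate) >= threshold
```

For example, `identity = enroll([[1.0, 0.0], [0.0, 1.0]])` yields a unit vector pointing between the two references, and `is_same_person(identity, [1.0, 1.0])` returns `True`. For unit-length vectors, cosine similarity and Euclidean distance rank candidates identically, so either could serve as the comparison; the threshold would need tuning on real footage.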

Key Outcomes

The system achieved real-time processing speeds on standard-definition video and successfully handles partial occlusions and profile views. A Streamlit interface abstracts the underlying ML operations, making the technology accessible to non-technical video editors.

Business Value

The tool drastically reduces post-production search time, allowing teams to repurpose existing content more effectively. Because new subjects are added through the 'enrollment' feature, editors can track them without engineering intervention, so the tool scales with the organization's needs.