Brian Jin
Applied AI Engineer

AI Video Face Search

Computer vision system detecting specified people or objects for B-roll classification

Technologies Used
Computer Vision
Machine Learning
FaceNet
Streamlit
Python
Prototyping

Executive Summary

This application automates the labor-intensive process of video indexing. Originally designed for media production teams, it uses face detection and face-embedding matching to scan hours of footage and locate specific individuals, turning a manual search task into an automated workflow.

Business Context

Content creators and media archives face a common challenge: retrieving specific footage from massive video libraries is slow and expensive, and manual review of B-roll or archival footage creates a bottleneck. This solution removes that bottleneck by enabling 'semantic' search within video content: people are found not by file name, but by their visual identity.

Live Streamlit Demo

Try the application yourself. Upload a video and a reference photo (or use the demo data) to see the face search in action.

Source Code

View the repository for technical details on the computer vision pipeline, embedding generation, and performance optimization.

Computer Vision Pipeline

The system follows a modular pipeline: Frame Extraction -> Face Detection (MTCNN) -> Embedding Generation (Inception ResNet V1) -> Vector Similarity Search. Because the stages are decoupled, each one can be tuned independently to trade speed against accuracy.
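
The four stages above can be sketched as a small driver that takes each stage as a plain callable. This is a minimal illustration of the modular idea, not the repository's code: the names sample_frames, search_video, detect_faces, embed_face, is_match, and every_n are all hypothetical, and in the real system the detector and embedder would be backed by MTCNN and Inception ResNet V1.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def sample_frames(
    frames: Iterable[Any], fps: float, every_n: int
) -> Iterator[Tuple[float, Any]]:
    """Stage 1 (hypothetical): yield (timestamp_seconds, frame) for every n-th frame."""
    if fps <= 0 or every_n < 1:
        raise ValueError("fps must be positive and every_n must be >= 1")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index / fps, frame


def search_video(
    frames: Iterable[Any],
    fps: float,
    detect_faces: Callable[[Any], Iterable[Any]],  # Stage 2: frame -> face crops
    embed_face: Callable[[Any], Any],              # Stage 3: face crop -> embedding
    is_match: Callable[[Any], bool],               # Stage 4: embedding -> hit or miss
    every_n: int = 5,
) -> List[float]:
    """Return the timestamps of sampled frames that contain a matching face."""
    hits: List[float] = []
    for timestamp, frame in sample_frames(frames, fps, every_n):
        for face in detect_faces(frame):
            if is_match(embed_face(face)):
                hits.append(timestamp)
                break  # one match per frame is enough to index it
    return hits
```

Since each stage is just an argument, a faster detector or a stricter matching rule can be swapped in without touching the loop, and every_n gives a simple speed-versus-coverage knob: a larger stride means fewer frames are analyzed.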

[Figure: Workflow Architecture]

Technical Approach

The system takes a one-shot learning approach built on a deep face-embedding network (Inception ResNet V1). Unlike traditional models that require thousands of training images, this architecture 'enrolls' a new identity from just a few reference photos. MTCNN provides robust face detection, and facenet-pytorch generates 512-dimensional embeddings, so identities can be matched by embedding similarity across varying lighting and angles.
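
Enrollment and matching on embeddings might look like the sketch below. It assumes that reference embeddings are averaged into a single template and compared by cosine similarity against a fixed threshold; the source does not state which similarity measure or threshold the project uses, so the function names and the 0.6 default are illustrative assumptions. The vectors here are plain Python lists standing in for the 512-dimensional embeddings.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def enroll(reference_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Build one identity template from a few reference embeddings (assumed: mean)."""
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    normalized = [l2_normalize(e) for e in reference_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings, in [-1, 1]."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def is_match(
    template: Sequence[float], candidate: Sequence[float], threshold: float = 0.6
) -> bool:
    """Hypothetical decision rule; the threshold is illustrative, not from the project."""
    return cosine_similarity(template, candidate) >= threshold
```

No model training happens at enrollment time: adding a new person only means computing a new template, which is what makes the few-photo workflow possible. The threshold would need tuning on real footage, since raising it reduces false matches at the cost of missing more true ones.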

Key Outcomes

The system achieved real-time processing speeds on standard-definition video and handles partial occlusions and profile views. A user-friendly Streamlit interface abstracts the underlying ML operations, making the technology accessible to non-technical video editors.

Business Value

The tool substantially reduces post-production search time, allowing teams to repurpose existing content more effectively. The 'enrollment' feature lets users track new subjects without engineering intervention, so the tool scales with the organization's needs.

© 2025 Brian Zelun Jin. All rights reserved.