Building a Face Detector with Raspberry Pi, Kinesis Video Streams, and Rekognition Video
Introduction
In this guide, we will build a face detection system using a USB camera connected to a Raspberry Pi. The project leverages Amazon Kinesis Video Streams and Amazon Rekognition Video to process video and detect faces in real time.
Prerequisites
Hardware Requirements
- Raspberry Pi 4B with 4GB RAM, running Ubuntu 23.10 (installed via Raspberry Pi Imager)
- USB camera
Software Requirements
- GStreamer: Used to process and stream video.
- Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin, and JNI: install from the AWS GitHub repository (build steps are covered later in this guide).
- AWS SAM CLI: install by following the official documentation.
- Python 3.11
Setting Up the Project
You can pull the example code used in this post from my GitHub repository.
Directory Structure
/
|-- src/
| |-- app.py
| `-- requirements.txt
|-- samconfig.toml
`-- template.yaml
AWS SAM Template
Here’s the AWS SAM template that provisions the required resources: the Lambda function and its IAM role, the Rekognition stream processor and face collection, the Kinesis video stream, and the Kinesis data stream.
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn
Python Script
requirements.txt
Leave this file empty. The function only uses boto3, which is already included in the AWS Lambda Python runtime.
app.py
The Rekognition Video stream processor publishes detected-face data to the Kinesis data stream; each record arrives in the Lambda event as a Base64-encoded string, which the handler decodes before processing. For detailed information about the data structure, refer to the official documentation.
The Lambda function then generates an HLS URL for the matching video segment using the Kinesis Video Streams archived media get_hls_streaming_session_url API.
import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))

kvs_client = boto3.client('kinesisvideo')

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # Refer to https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html
        # for details on the structure.
        if not stream_processor_event['FaceSearchResponse']:
            continue
        logger.info(stream_processor_event)

        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # Retrieves the data endpoint for the stream and creates an archived-media client bound to it.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn,
    )['DataEndpoint']

    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # Generates an HLS streaming URL for the video stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html
    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)

    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = start_timestamp + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
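If you want to exercise the handler locally before deploying, you can feed it a hand-built Kinesis event. The following sketch is hypothetical (it is not part of the repository) and assumes you run it from the src/ directory with an AWS region configured in your environment, since app.py creates a boto3 client at import time. With an empty FaceSearchResponse the handler returns without calling any AWS API.

import base64
import json

from app import lambda_handler  # assumes this script lives next to app.py in src/

# Minimal stream processor event; FaceSearchResponse is empty, so the handler
# skips HLS URL generation and makes no AWS calls.
payload = {
    'InputInformation': {'KinesisVideo': {}},
    'FaceSearchResponse': [],
}

# Wrap the payload the way Lambda delivers Kinesis records: Base64-encoded data.
event = {
    'Records': [
        {'kinesis': {'data': base64.b64encode(json.dumps(payload).encode()).decode()}}
    ]
}

print(lambda_handler(event, None))  # expected output: {'statusCode': 200}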
Build and Deploy
Build and deploy the project using the following commands:
sam build
sam deploy
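After the deployment finishes, you can confirm the stack reached a healthy state; a quick check, assuming the stack name face-detector-using-kinesis-video-streams (the same name used in the log command later in this guide):
aws cloudformation describe-stacks \
    --stack-name face-detector-using-kinesis-video-streams \
    --query 'Stacks[0].StackStatus'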
Indexing Faces
To detect faces captured by the USB camera, you first need to index the faces you want to recognize into the Amazon Rekognition face collection. The IndexFaces API is used for this purpose.
Command for Indexing Faces
Before running the following command, replace the placeholders <YOUR_BUCKET>, <YOUR_OBJECT>, and <PERSON_ID> with the actual values relevant to your use case.
aws rekognition index-faces \
--image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
--collection-id FaceCollection \
--external-image-id <PERSON_ID>
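To confirm the face was registered, you can list the contents of the collection; the ExternalImageId you supplied should appear in the output:
aws rekognition list-faces --collection-id FaceCollection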
Important Notes
- Amazon Rekognition does not store actual images in the face collection. Instead, it extracts and saves facial features as metadata.
- Only the extracted facial feature data and associated metadata (such as the external image ID) are retained in the collection.
For more details, refer to the AWS documentation on indexing faces.
For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.
Setting Up the Video Producer
This guide uses a Raspberry Pi 4B with 4GB RAM running Ubuntu 23.10 as the video producer. A USB camera is connected to the Raspberry Pi to stream video to Amazon Kinesis Video Streams.
Building the AWS GStreamer Plugin
AWS provides the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI. This SDK includes the kvssink GStreamer element, which lets the Raspberry Pi stream video to Kinesis Video Streams. To enable streaming, build the plugin on your Raspberry Pi by following the steps below.
Build Steps
Run the following commands. Depending on your system’s specifications, the build may take 20 minutes or more.
sudo apt update
sudo apt upgrade
sudo apt install \
make \
cmake \
build-essential \
m4 \
autoconf \
default-jdk
sudo apt install \
libssl-dev \
libcurl4-openssl-dev \
liblog4cplus-dev \
libgstreamer1.0-dev \
libgstreamer-plugins-base1.0-dev \
gstreamer1.0-plugins-base-apps \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-ugly \
gstreamer1.0-tools
git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build
sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make
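If the build is slow, make can run jobs in parallel across the Pi's cores; a variant of the final step, assuming there is enough free memory for parallel compilation:
sudo make -j$(nproc)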
Verify the Build
Once the build completes, verify the result with the following commands:
cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink
The output should display details similar to this:
Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <kinesis-video-support@amazon.com>
...
Persistent Environment Variables
To avoid resetting the environment variables every time, add the following exports to your ~/.profile:
echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile
Running GStreamer
After building the plugin, connect your USB camera to the Raspberry Pi and run the following command to stream video to Amazon Kinesis Video Streams. Replace <KINESIS_VIDEO_STREAM_NAME> with face-detector-kinesis-video-stream (the stream created by the SAM template), and fill in your access key, secret key, and region.
gst-launch-1.0 -v v4l2src device=/dev/video0 \
! videoconvert \
! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
! video/x-h264,stream-format=avc,alignment=au \
! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"
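Passing keys on the command line leaves them in your shell history. The producer SDK can also pick up credentials from the standard AWS environment variables, so a variant like the following should work as well (a sketch; the pipeline itself is unchanged):
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_KEY>

gst-launch-1.0 -v v4l2src device=/dev/video0 \
    ! videoconvert \
    ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
    ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
    ! video/x-h264,stream-format=avc,alignment=au \
    ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 aws-region="<YOUR_AWS_REGION>"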
Verifying Video Stream
You can verify the live stream by navigating to the Kinesis Video Streams management console, where the video should play back in near real time.
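If you prefer the CLI, you can also confirm the stream exists and note its ARN; a quick sanity check using the stream name from the template:
aws kinesisvideo describe-stream --stream-name face-detector-kinesis-video-stream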
Testing
Starting the Rekognition Video Stream Processor
Start the Rekognition Video stream processor. The processor consumes the Kinesis video stream, searches the video for faces that match those in the face collection, and streams the results to the Kinesis data stream.
Run the following command to start the stream processor:
aws rekognition start-stream-processor --name face-detector-rekognition-stream-processor
Verify the status of the stream processor to ensure it is running:
aws rekognition describe-stream-processor --name face-detector-rekognition-stream-processor | grep "Status"
The expected output should show "Status": "RUNNING".
Capturing Faces
Once the USB camera captures video, the following process is initiated:
- Video Data Streaming: The video data is streamed to the Kinesis Video Stream.
- Face Detection: The Rekognition Video stream processor analyzes the video stream and detects faces based on the face collection.
- Result Streaming: Detected face data is streamed to the Kinesis Data Stream.
- HLS URL Generation: A Lambda function generates an HLS URL for playback.
To check the results, view the Lambda function logs with the following command:
sam logs -n Function --stack-name face-detector-using-kinesis-video-streams --tail
Sample Log Data
The log records include detailed information about the stream processor events, such as the following example:
{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx",
            "FragmentNumber": "91343852333181501717324262640137742175000164731",
            "ServerTimestamp": 1702208586.022,
            "ProducerTimestamp": 1702208585.699,
            "FrameOffsetInSeconds": 0.0,
        }
    },
    "StreamProcessorInformation": {"Status": "RUNNING"},
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.4744676,
                    "Width": 0.29107505,
                    "Left": 0.33036956,
                    "Top": 0.19599175,
                },
                "Confidence": 99.99677,
                "Landmarks": [
                    {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"},
                    {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"},
                    {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"},
                    {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"},
                    {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"},
                ],
                "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913},
                "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298},
            },
            "MatchedFaces": [
                {
                    "Similarity": 99.986176,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.417963,
                            "Width": 0.406223,
                            "Left": 0.28826,
                            "Top": 0.242463,
                        },
                        "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "Confidence": 99.996605,
                        "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "ExternalImageId": "iwasa",
                    },
                }
            ],
        }
    ],
}
HLS URL for Video Playback
The logs also include the generated HLS URL for on-demand video playback, such as:
https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx
Playing the Video
- Open the HLS URL using a supported browser like Safari or Edge.
- Chrome does not natively support HLS playback. You can use a third-party extension, such as Native HLS Playback, or play the URL from the command line as sketched below.
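As a command-line alternative, ffplay (part of ffmpeg, assuming it is installed) can open the HLS URL directly. Quote the URL so the shell does not mangle the query string, and replace <HLS_URL> with the URL taken from the Lambda logs:
ffplay "<HLS_URL>"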
Cleaning Up
To avoid incurring unnecessary costs, ensure you clean up all the AWS resources provisioned during this guide. Follow these steps:
1. Stop the Rekognition Stream Processor
Use the following command to stop the Rekognition stream processor:
aws rekognition stop-stream-processor --name face-detector-rekognition-stream-processor
2. Delete the SAM Stack
Run the command below to delete all resources provisioned by the AWS Serverless Application Model (SAM):
sam delete
These commands will remove all associated resources, including the Kinesis Video Stream, Kinesis Data Stream, Lambda function, and the Rekognition stream processor.
Conclusion
This guide walked you through setting up a real-time face detection system using a Raspberry Pi, USB camera, Amazon Kinesis Video Streams, and Amazon Rekognition Video. By integrating these technologies, you can process and analyze video streams efficiently, leveraging the power of AWS services to handle scalability and performance demands.
Happy Coding! 🚀