Building a Face Detector with Raspberry Pi, Kinesis Video Streams, and Rekognition Video
Introduction
In this guide, we will build a face detection system using a USB camera connected to a Raspberry Pi. The project leverages Amazon Kinesis Video Streams and Amazon Rekognition Video to process video and detect faces in real time.
Prerequisites
Hardware Requirements
- Raspberry Pi 4B with 4GB RAM, running Ubuntu 23.10 (installed via Raspberry Pi Imager)
- USB camera
Software Requirements
- GStreamer: Used to process and stream video.
- Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin, and JNI: install from the AWS GitHub repository (build steps are covered later in this guide).
- AWS SAM CLI: install by following the official documentation.
- Python 3.11
Setting Up the Project
You can pull the example code used in this post from my GitHub repository.
Directory Structure
/
|-- src/
| |-- app.py
| `-- requirements.txt
|-- samconfig.toml
`-- template.yaml
AWS SAM Template
Here’s the AWS SAM template that provisions the required resources: the Lambda function and its IAM role, the Rekognition stream processor and face collection, the Kinesis video stream, and the Kinesis data stream.
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn
Python Script
requirements.txt
Leave this file empty. The function only uses boto3, which is already included in the AWS Lambda Python runtime.
app.py
The Rekognition Video stream processor publishes detected-face data to the Kinesis data stream; each record arrives in the Lambda event as a Base64-encoded string, which the handler decodes before processing. For detailed information about the data structure, refer to the official documentation.
The Lambda function then generates an HLS URL for the matching video segment using the Kinesis Video Streams archived media get_hls_streaming_session_url API.
import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))

kvs_client = boto3.client('kinesisvideo')

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # Refer to https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html
        # for details on the structure.
        if not stream_processor_event['FaceSearchResponse']:
            continue
        logger.info(stream_processor_event)

        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # Retrieves the data endpoint for the stream and creates an archived-media client bound to it.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn,
    )['DataEndpoint']

    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # Generates an HLS streaming URL for the video stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html
    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)

    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = start_timestamp + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
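If you want to exercise the handler locally before deploying, you can feed it a hand-built Kinesis event. The following sketch is hypothetical (it is not part of the repository) and assumes you run it from the src/ directory with an AWS region configured in your environment, since app.py creates a boto3 client at import time. With an empty FaceSearchResponse the handler returns without calling any AWS API.

import base64
import json

from app import lambda_handler  # assumes this script lives next to app.py in src/

# Minimal stream processor event; FaceSearchResponse is empty, so the handler
# skips HLS URL generation and makes no AWS calls.
payload = {
    'InputInformation': {'KinesisVideo': {}},
    'FaceSearchResponse': [],
}

# Wrap the payload the way Lambda delivers Kinesis records: Base64-encoded data.
event = {
    'Records': [
        {'kinesis': {'data': base64.b64encode(json.dumps(payload).encode()).decode()}}
    ]
}

print(lambda_handler(event, None))  # expected output: {'statusCode': 200}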
Build and Deploy
Build and deploy the project using the following commands:
sam build
sam deploy
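After the deployment finishes, you can confirm the stack reached a healthy state; a quick check, assuming the stack name face-detector-using-kinesis-video-streams (the same name used in the log command later in this guide):
aws cloudformation describe-stacks \
    --stack-name face-detector-using-kinesis-video-streams \
    --query 'Stacks[0].StackStatus'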
Indexing Faces
To detect faces captured by the USB camera, you first need to index the faces you want to recognize into the Amazon Rekognition face collection. The IndexFaces API is used for this purpose.
Command for Indexing Faces
Before running the following command, replace the placeholders <YOUR_BUCKET>, <YOUR_OBJECT>, and <PERSON_ID> with the actual values relevant to your use case.
aws rekognition index-faces \
--image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
--collection-id FaceCollection \
--external-image-id <PERSON_ID>
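To confirm the face was registered, you can list the contents of the collection; the ExternalImageId you supplied should appear in the output:
aws rekognition list-faces --collection-id FaceCollection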
Important Notes
- Amazon Rekognition does not store actual images in the face collection. Instead, it extracts and saves facial features as metadata.
- Only the extracted facial feature data and associated metadata (such as the external image ID) are retained in the collection.
For more details, refer to the AWS documentation on indexing faces.
For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.
Setting Up the Video Producer
This guide uses a Raspberry Pi 4B with 4GB RAM running Ubuntu 23.10 as the video producer. A USB camera is connected to the Raspberry Pi to stream video to Amazon Kinesis Video Streams.
Building the AWS GStreamer Plugin
AWS provides the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI. This SDK includes the kvssink GStreamer element, which lets the Raspberry Pi stream video to Kinesis Video Streams. To enable streaming, build the plugin on your Raspberry Pi by following the steps below.
Build Steps
Run the following commands. Depending on your system’s specifications, the build may take 20 minutes or more.
sudo apt update
sudo apt upgrade
sudo apt install \
make \
cmake \
build-essential \
m4 \
autoconf \
default-jdk
sudo apt install \
libssl-dev \
libcurl4-openssl-dev \
liblog4cplus-dev \
libgstreamer1.0-dev \
libgstreamer-plugins-base1.0-dev \
gstreamer1.0-plugins-base-apps \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-ugly \
gstreamer1.0-tools
git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build
sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make
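If the build is slow, make can run jobs in parallel across the Pi's cores; a variant of the final step, assuming there is enough free memory for parallel compilation:
sudo make -j$(nproc)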
Verify the Build
Once the build completes, verify the result with the following commands:
cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink
The output should display details similar to this:
Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <kinesis-video-support@amazon.com>
...
Persistent Environment Variables
To avoid resetting the environment variables every time, add the following exports to your ~/.profile:
echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile
Running GStreamer
After building the plugin, connect your USB camera to the Raspberry Pi and run the following command to stream video to Amazon Kinesis Video Streams. Replace <KINESIS_VIDEO_STREAM_NAME> with face-detector-kinesis-video-stream (the stream created by the SAM template), and fill in your access key, secret key, and region.
gst-launch-1.0 -v v4l2src device=/dev/video0 \
! videoconvert \
! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
! video/x-h264,stream-format=avc,alignment=au \
! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"
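Passing keys on the command line leaves them in your shell history. The producer SDK can also pick up credentials from the standard AWS environment variables, so a variant like the following should work as well (a sketch; the pipeline itself is unchanged):
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_KEY>

gst-launch-1.0 -v v4l2src device=/dev/video0 \
    ! videoconvert \
    ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
    ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
    ! video/x-h264,stream-format=avc,alignment=au \
    ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 aws-region="<YOUR_AWS_REGION>"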
Verifying Video Stream
You can verify the live stream by navigating to the Kinesis Video Streams management console, where the video should play back in near real time.
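If you prefer the CLI, you can also confirm the stream exists and note its ARN; a quick sanity check using the stream name from the template:
aws kinesisvideo describe-stream --stream-name face-detector-kinesis-video-stream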
Testing
Starting the Rekognition Video Stream Processor
Start the Rekognition Video stream processor. The processor consumes the Kinesis video stream, searches the video for faces that match those in the face collection, and streams the results to the Kinesis data stream.
Run the following command to start the stream processor:
aws rekognition start-stream-processor --name face-detector-rekognition-stream-processor
Verify the status of the stream processor to ensure it is running:
aws rekognition describe-stream-processor --name face-detector-rekognition-stream-processor | grep "Status"
The expected output should show "Status": "RUNNING".
Capturing Faces
Once the USB camera captures video, the following process is initiated:
- Video Data Streaming: The video data is streamed to the Kinesis Video Stream.
- Face Detection: The Rekognition Video stream processor analyzes the video stream and detects faces based on the face collection.
- Result Streaming: Detected face data is streamed to the Kinesis Data Stream.
- HLS URL Generation: A Lambda function generates an HLS URL for playback.
To check the results, view the Lambda function logs with the following command:
sam logs -n Function --stack-name face-detector-using-kinesis-video-streams --tail
Sample Log Data
The log records include detailed information about the stream processor events, such as the following example:
{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx",
            "FragmentNumber": "91343852333181501717324262640137742175000164731",
            "ServerTimestamp": 1702208586.022,
            "ProducerTimestamp": 1702208585.699,
            "FrameOffsetInSeconds": 0.0,
        }
    },
    "StreamProcessorInformation": {"Status": "RUNNING"},
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.4744676,
                    "Width": 0.29107505,
                    "Left": 0.33036956,
                    "Top": 0.19599175,
                },
                "Confidence": 99.99677,
                "Landmarks": [
                    {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"},
                    {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"},
                    {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"},
                    {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"},
                    {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"},
                ],
                "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913},
                "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298},
            },
            "MatchedFaces": [
                {
                    "Similarity": 99.986176,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.417963,
                            "Width": 0.406223,
                            "Left": 0.28826,
                            "Top": 0.242463,
                        },
                        "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "Confidence": 99.996605,
                        "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "ExternalImageId": "iwasa",
                    },
                }
            ],
        }
    ],
}
HLS URL for Video Playback
The logs also include the generated HLS URL for on-demand video playback, such as:
https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx
Playing the Video
- Open the HLS URL using a supported browser like Safari or Edge.
- Chrome does not natively support HLS playback. You can use a third-party extension, such as Native HLS Playback, or play the URL from the command line as sketched below.
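As a command-line alternative, ffplay (part of ffmpeg, assuming it is installed) can open the HLS URL directly. Quote the URL so the shell does not mangle the query string, and replace <HLS_URL> with the URL taken from the Lambda logs:
ffplay "<HLS_URL>"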
Cleaning Up
To avoid incurring unnecessary costs, ensure you clean up all the AWS resources provisioned during this guide. Follow these steps:
1. Stop the Rekognition Stream Processor
Use the following command to stop the Rekognition stream processor:
aws rekognition stop-stream-processor --name face-detector-rekognition-stream-processor
2. Delete the SAM Stack
Run the command below to delete all resources provisioned by the AWS Serverless Application Model (SAM):
sam delete
These commands will remove all associated resources, including the Kinesis Video Stream, Kinesis Data Stream, Lambda function, and the Rekognition stream processor.
Conclusion
This guide walked you through setting up a real-time face detection system using a Raspberry Pi, USB camera, Amazon Kinesis Video Streams, and Amazon Rekognition Video. By integrating these technologies, you can process and analyze video streams efficiently, leveraging the power of AWS services to handle scalability and performance demands.
Happy Coding! 🚀