Building a Face Detection System with Raspberry Pi, Kinesis Video Streams, and Rekognition Video

岩佐孝浩

December 9, 2023



Introduction

In this post, we build a face detection system using a USB camera connected to a Raspberry Pi. The project uses Amazon Kinesis Video Streams and Amazon Rekognition Video to detect faces in real time.

AWS architecture diagram

Prerequisites

Hardware requirements

- Raspberry Pi 4B (4 GB RAM)
  - Running Ubuntu 23.10 (installed with Raspberry Pi Imager)
- USB camera

Software requirements

- GStreamer: used to process and stream the video.
  - Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin, and JNI: installed from the AWS GitHub repository.
- AWS SAM CLI: installed following the official documentation.
- Python 3.11

Project setup

The sample code for this post can be cloned from the GitHub repository.

Directory structure

/
|-- src/
|   |-- app.py
|   `-- requirements.txt
|-- samconfig.toml
`-- template.yaml

AWS SAM template

The following AWS SAM template (an extension of AWS CloudFormation) provisions the required resources, including the Lambda function, the Rekognition stream processor, and the Kinesis Video Stream.

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn

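Python script

requirements.txt

Leave this file empty. The only third-party dependency, boto3, is already included in the Lambda Python runtime.

app.py

The Rekognition Video stream processor writes the detected face data to the Kinesis Data Stream, and the Lambda function receives each record's payload as a Base64-encoded string (decoded by the base64.b64decode call in lambda_handler). See the official documentation for details on the data structure.

The Lambda function generates an HLS URL with the KinesisVideoArchivedMedia#get_hls_streaming_session_url API (the call at the end of the get_hls_streaming_session_url function).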

import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))
kvs_client = boto3.client('kinesisvideo')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # Refer to https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html for details on the structure.

        if not stream_processor_event['FaceSearchResponse']:
            continue

        logger.info(stream_processor_event)
        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # Retrieves the data endpoint for the stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn
    )['DataEndpoint']
    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # Generates an HLS streaming URL for the video stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html

    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)
    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST) + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
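
The decoding step in lambda_handler can be tried locally without any AWS resources. The following is a minimal sketch, not part of the sample repository: the helper names and the reduced sample payload are assumptions made for illustration, and only the `kinesis.data`, `FaceSearchResponse`, `MatchedFaces`, and `ExternalImageId` fields come from the handler and the documented output format.

```python
import base64
import json


def decode_stream_processor_event(record: dict) -> dict:
    """Decode one Kinesis record into the stream processor's JSON payload."""
    return json.loads(base64.b64decode(record['kinesis']['data']).decode())


def matched_external_image_ids(stream_processor_event: dict) -> list[str]:
    """Collect the ExternalImageId of every matched face, in order."""
    ids = []
    for face_search in stream_processor_event.get('FaceSearchResponse') or []:
        for match in face_search.get('MatchedFaces', []):
            external_id = match['Face'].get('ExternalImageId')
            if external_id is not None:
                ids.append(external_id)
    return ids


# Hypothetical payload, reduced to the fields used above.
sample_payload = {
    'FaceSearchResponse': [
        {'MatchedFaces': [{'Similarity': 99.9, 'Face': {'ExternalImageId': 'person-1'}}]},
    ],
}
# Kinesis delivers the payload to Lambda as a Base64 string, so encode it the same way.
sample_record = {
    'kinesis': {'data': base64.b64encode(json.dumps(sample_payload).encode()).decode()},
}

event = decode_stream_processor_event(sample_record)
print(matched_external_image_ids(event))
```

Using `.get('FaceSearchResponse') or []` lets the helper tolerate records where the key is missing or empty, which the handler above treats as "no face found".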

Build and deploy

Build and deploy the project with the following commands:

sam build
sam deploy

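Indexing faces

To detect faces with the USB camera, you first need to index the faces into an Amazon Rekognition face collection. The Index Faces API is used for this purpose.

Command to index a face

Before running the following command, replace <YOUR_BUCKET>, <YOUR_OBJECT>, and <PERSON_ID> with your own values.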

aws rekognition index-faces \
  --image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
  --collection-id FaceCollection \
  --external-image-id <PERSON_ID>
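
If you would rather index faces from a script, the same request can be made with boto3. This is a sketch under a few assumptions: the function names are illustrative, the collection ID defaults to the `FaceCollection` value from the template, and credentials and region are expected to come from your usual AWS configuration.

```python
def build_index_faces_request(bucket: str, key: str, person_id: str,
                              collection_id: str = 'FaceCollection') -> dict:
    """Build the keyword arguments for Rekognition's IndexFaces API."""
    return {
        'CollectionId': collection_id,
        'Image': {'S3Object': {'Bucket': bucket, 'Name': key}},
        'ExternalImageId': person_id,
    }


def index_face(bucket: str, key: str, person_id: str) -> list[str]:
    """Index the faces in one S3 image and return the FaceId values assigned."""
    import boto3  # imported here so the request builder works without boto3 installed

    client = boto3.client('rekognition')
    response = client.index_faces(**build_index_faces_request(bucket, key, person_id))
    return [record['Face']['FaceId'] for record in response['FaceRecords']]


# Example call (requires AWS credentials and an existing image in S3):
# index_face('<YOUR_BUCKET>', '<YOUR_OBJECT>', '<PERSON_ID>')
```

Separating the request builder from the API call keeps the part that needs AWS access small, so the request shape can be checked on its own.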

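Notes

Amazon Rekognition does not store the actual images in the face collection. Instead, it extracts the facial features and stores them as metadata.
As a result, only the facial feature data needed for matching is kept, not the source image.

For details, see the guide on adding faces to a collection, which states: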

For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.

Setting up the video producer

This post uses a Raspberry Pi 4B (4 GB RAM) running Ubuntu 23.10 as the video producer. A USB camera connected to the Raspberry Pi streams video to Amazon Kinesis Video Streams.

Raspberry Pi and USB camera setup

Building the AWS GStreamer plugin

AWS provides the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI. This SDK makes it possible to stream video from the Raspberry Pi to Kinesis Video Streams.

AWS also provides Docker images for the GStreamer plugin, but they may not work on the Raspberry Pi because of architecture constraints.

To build the plugin, follow the build instructions for Raspberry Pi.

Building the GStreamer plugin

To stream video from the Raspberry Pi to Amazon Kinesis Video Streams, build the GStreamer plugin provided by AWS using the steps below.

Build steps

Run the following commands. Depending on your system's specifications, the build may take 20 minutes or more.

sudo apt update
sudo apt upgrade
sudo apt install \
  make \
  cmake \
  build-essential \
  m4 \
  autoconf \
  default-jdk
sudo apt install \
  libssl-dev \
  libcurl4-openssl-dev \
  liblog4cplus-dev \
  libgstreamer1.0-dev \
  libgstreamer-plugins-base1.0-dev \
  gstreamer1.0-plugins-base-apps \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-tools

git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build

sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make

Verifying the build

Once the build finishes, check the result with the following commands:

cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink

The output should include details like the following:

Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <kinesis-video-support@amazon.com>
...

Persisting the environment variables

To avoid setting the environment variables again in every session, append the exports to ~/.profile:

echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile

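Running GStreamer

After building the plugin, connect the USB camera to the Raspberry Pi and run the following command to stream video data to Amazon Kinesis Video Streams.

Note that raising the video quality (for example, a higher resolution or frame rate) can increase your AWS costs.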

gst-launch-1.0 -v v4l2src device=/dev/video0 \
  ! videoconvert \
  ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
  ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
  ! video/x-h264,stream-format=avc,alignment=au \
  ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"
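
The cost note above can be made concrete with some rough arithmetic. The sketch below estimates how much video data a given encoder bitrate produces, assuming the x264enc bitrate setting (in kbit/s) is held on average and ignoring container overhead. It deliberately leaves out prices, which vary by region and change over time; the function name and the 2000 kbit/s comparison value are illustrative.

```python
def estimated_ingest_gb(bitrate_kbps: float, hours: float) -> float:
    """Approximate the video data produced at a given bitrate over a duration.

    Ignores container overhead and assumes the encoder holds its target bitrate.
    """
    bits = bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1e9  # decimal gigabytes


# bitrate=500 is the value used in the pipeline above.
print(round(estimated_ingest_gb(500, 24), 2))   # 5.4
# A hypothetical higher-quality setting for comparison.
print(round(estimated_ingest_gb(2000, 24), 2))  # 21.6
```

Under these assumptions, a full day of streaming at the settings above comes to roughly 5.4 GB, and the volume scales linearly with the bitrate.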

Checking the video stream

Go to the Kinesis Video Streams management console to check the live stream. The video should appear in real time.

Kinesis Video Streams management console

Testing

Starting the Rekognition Video stream processor

Start the Rekognition Video stream processor. It subscribes to the Kinesis Video Stream, searches the video for faces in the face collection, and streams the results to the Kinesis Data Stream.

Run the following command to start the stream processor:

aws rekognition start-stream-processor --name face-detector-rekognition-stream-processor

Then check the stream processor's status to confirm that it is running:

aws rekognition describe-stream-processor --name face-detector-rekognition-stream-processor | grep "Status"

The output should show "Status": "RUNNING".
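
The start-and-check sequence can also be scripted so that it waits until the processor is actually running. The following is a minimal sketch, assuming a boto3 Rekognition client is passed in; the function name, timeout, and polling interval are illustrative choices rather than values from the sample repository.

```python
import time


def wait_for_status(client, name: str, target: str = 'RUNNING',
                    timeout_seconds: float = 120, poll_seconds: float = 5) -> str:
    """Poll DescribeStreamProcessor until the processor reaches `target`.

    Raises RuntimeError if the processor reports FAILED, and TimeoutError
    if `target` is not reached within `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_stream_processor(Name=name)
        status = response['Status']
        if status == target:
            return status
        if status == 'FAILED':
            raise RuntimeError(response.get('StatusMessage', 'stream processor failed'))
        if time.monotonic() >= deadline:
            raise TimeoutError(f'{name} is still {status}')
        time.sleep(poll_seconds)


# Example usage (requires AWS credentials):
# import boto3
# client = boto3.client('rekognition')
# client.start_stream_processor(Name='face-detector-rekognition-stream-processor')
# wait_for_status(client, 'face-detector-rekognition-stream-processor')
```

Passing the client in as an argument keeps the polling logic independent of AWS, so it can be exercised with a stub object.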

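Capturing a face

When the USB camera captures video, the following process takes place:

1. Video streaming: the video data is streamed to the Kinesis Video Stream.
2. Face detection: the Rekognition Video stream processor analyzes the video stream and detects faces based on the face collection.
3. Result streaming: the detected face data is streamed to the Kinesis Data Stream.
4. HLS URL generation: the Lambda function generates an HLS URL for playback.

To check the results, view the Lambda function's logs with the following command: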

sam logs -n Function --stack-name face-detector-using-kinesis-video-streams --tail

Sample log data

The log records contain detailed information about each stream processor event. For example:

{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx",
            "FragmentNumber": "91343852333181501717324262640137742175000164731",
            "ServerTimestamp": 1702208586.022,
            "ProducerTimestamp": 1702208585.699,
            "FrameOffsetInSeconds": 0.0
        }
    },
    "StreamProcessorInformation": {"Status": "RUNNING"},
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.4744676,
                    "Width": 0.29107505,
                    "Left": 0.33036956,
                    "Top": 0.19599175
                },
                "Confidence": 99.99677,
                "Landmarks": [
                    {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"},
                    {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"},
                    {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"},
                    {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"},
                    {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"}
                ],
                "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913},
                "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298}
            },
            "MatchedFaces": [
                {
                    "Similarity": 99.986176,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.417963,
                            "Width": 0.406223,
                            "Left": 0.28826,
                            "Top": 0.242463
                        },
                        "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "Confidence": 99.996605,
                        "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "ExternalImageId": "iwasa"
                    }
                }
            ]
        }
    ]
}
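
The BoundingBox values in the sample above are ratios of the frame dimensions, not pixels. As a rough illustration, the sketch below converts the DetectedFace box into pixel coordinates for the 320x240 frame size used in the GStreamer pipeline. The function name is an assumption for this example, and clamping to the frame is a design choice for drawing, since a box can extend past the frame edge when a face is only partly visible.

```python
def bounding_box_to_pixels(box: dict, frame_width: int, frame_height: int) -> tuple[int, int, int, int]:
    """Convert a ratio-based BoundingBox into (left, top, right, bottom) pixels."""

    def clamp(value: int, upper: int) -> int:
        # Keep the coordinate inside the frame.
        return max(0, min(value, upper))

    left = round(box['Left'] * frame_width)
    top = round(box['Top'] * frame_height)
    right = round((box['Left'] + box['Width']) * frame_width)
    bottom = round((box['Top'] + box['Height']) * frame_height)
    return (
        clamp(left, frame_width),
        clamp(top, frame_height),
        clamp(right, frame_width),
        clamp(bottom, frame_height),
    )


# DetectedFace.BoundingBox from the sample log above.
detected_box = {'Height': 0.4744676, 'Width': 0.29107505, 'Left': 0.33036956, 'Top': 0.19599175}
print(bounding_box_to_pixels(detected_box, 320, 240))  # (106, 47, 199, 161)
```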

HLS URL for video playback

The logs also contain the HLS URL for on-demand video playback. For example:

https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx

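Playing the video

Open the HLS URL in a browser that supports HLS, such as Safari or Edge.
At the time of writing, Chrome does not support HLS playback natively, so use a third-party extension such as Native HLS Playback.

HLS playback example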

Cleanup

To avoid unnecessary costs, clean up all the AWS resources provisioned in this post by following the steps below.

1. Stop the Rekognition stream processor

Stop the Rekognition stream processor with the following command:

aws rekognition stop-stream-processor --name face-detector-rekognition-stream-processor

2. Delete the SAM stack

Run the following command to delete all the resources provisioned with the AWS Serverless Application Model (SAM):

sam delete

Together, these commands delete all the related resources, including the Kinesis Video Stream, the Kinesis Data Stream, the Lambda function, and the Rekognition stream processor.

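Summary

This post walked through setting up a real-time face detection system with a Raspberry Pi, a USB camera, Amazon Kinesis Video Streams, and Amazon Rekognition Video. Combining these technologies lets you draw on the scalability and performance of AWS services to process and analyze video streams efficiently.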

Happy Coding! 🚀