Where's Waldo versus AWS

by | 7 Nov 2019 | DevOps Expertise, Infrastructure & Operations

Face detection can be a complicated subject. But the AWS Rekognition API allows you to do quite advanced things very quickly. In this post I'm going to show you how to win at "where's Waldo" every time.


I'll show you how to use the API from Python with the boto3 library, which lets you automate almost anything on AWS. This tutorial assumes Python 3.6 or later.

First, let's create a virtualenv so we don't pollute the system Python with new packages:

python -m venv mon-venv

Activate it:

source mon-venv/bin/activate

And install boto3:

pip install boto3

Here we go!


To communicate with AWS via boto3, there are a few preliminary steps. First, you need to create an IAM user in AWS and retrieve its access key ID and secret access key. Once that's done, there are two ways to use them: either declare them in ~/.aws/credentials, where boto3 reads them automatically, or export them as environment variables like this:

export AWS_ACCESS_KEY_ID="my-super-key"
export AWS_SECRET_ACCESS_KEY="my-key-that-nobody-knows"
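For reference, the ~/.aws/credentials file uses an INI-style layout. The values below are placeholders, of course:

```ini
[default]
aws_access_key_id = my-super-key
aws_secret_access_key = my-key-that-nobody-knows
```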

Let's go for the detection!

We'll start by instantiating the boto3 client for Rekognition, then read our image as bytes and send it to the API:

import boto3

client = boto3.client('rekognition')
with open('/path/to/image.jpg', 'rb') as f:
    image_bytes = f.read()
response = client.detect_faces(Image={'Bytes': image_bytes}, Attributes=['ALL'])

If we do the test with this image:

The result is as follows: (I've truncated it a little for readability.)

{'FaceDetails': [{'AgeRange': {'High': 34, 'Low': 22},
 'BoundingBox': {'Height':…,'Left':…,'Top':…,'Width':…},
 'Confidence': 99.99996185302734,
 'Emotions': [
  {'Confidence': 0.184126317501068, 'Type': 'CONFUSED'},
  {'Confidence': 0.522737801074981, 'Type': 'SAD'},
  {'Confidence': 0.094188556075096, 'Type': 'SURPRISED'},
  {'Confidence': 0.011522555723786, 'Type': 'FEAR'},
  {'Confidence': 98.85511779785156, 'Type': 'CALM'},
  {'Confidence': 0.038878932595252, 'Type': 'DISGUSTED'},
  {'Confidence': 0.062760457396507, 'Type': 'HAPPY'},
  {'Confidence': 0.230674102902412, 'Type': 'ANGRY'}],
 'Beard': {'Confidence': 89.78614044189453, 'Value': True},              
 'Eyeglasses': {'Confidence': 97.04179382324219, 'Value': False},
 'EyesOpen': {'Confidence': 99.22305297851562, 'Value': True},
 'Gender': {'Confidence': 98.88516998291016, 'Value': 'Male'},
 'Smile': {'Confidence': 99.73528289794922, 'Value': False},
 'Sunglasses': {'Confidence': 98.868484497070, 'Value': False},
 'MouthOpen': {'Confidence': 99.42462158203125, 'Value': False},
 'Mustache': {'Confidence': 66.82974243164062, 'Value': False},
 'Landmarks': [
   {'Type': 'eyeLeft',
     'X': 0.38918817043304443,
     'Y': 0.2930305302143097},
   {'Type': 'eyeRight', …},
   {'Type': 'mouthLeft', …},
   {'Type': 'mouthRight', …},
   {'Type': 'nose', …},
   {'Type': 'leftEyeBrowLeft', …},
   {'Type': 'leftEyeBrowRight', …},
   {'Type': 'leftEyeBrowUp', …},
   {'Type': 'rightEyeBrowLeft', …},
   {'Type': 'rightEyeBrowRight', …},
   {'Type': 'rightEyeBrowUp', …},
   {'Type': 'leftEyeLeft', …},
   {'Type': 'leftEyeRight', …},
   {'Type': 'leftEyeUp', …},
   {'Type': 'leftEyeDown', …},
   {'Type': 'rightEyeLeft', …},
   {'Type': 'rightEyeRight', …},
   {'Type': 'rightEyeUp', …},
   {'Type': 'rightEyeDown', …},
   {'Type': 'noseLeft', …},
   {'Type': 'noseRight', …},
   {'Type': 'mouthUp', …},
   {'Type': 'mouthDown', …},
   {'Type': 'leftPupil', …},
   {'Type': 'rightPupil', …},
   {'Type': 'upperJawlineLeft', …},
   {'Type': 'midJawlineLeft', …},
   {'Type': 'chinBottom', …},
   {'Type': 'midJawlineRight', …},
   {'Type': 'upperJawlineRight', …}],
  'Pose': {'Pitch': …, 'Roll': …, 'Yaw': …},
  'Quality': {'Brightness': …, 'Sharpness': …}}]

You can see that it really detects a lot of things! Emotions, the position of facial elements, some distinctive features (beard, glasses, ...), the rotations of the face, and every detail is always accompanied by a confidence index. It is also able to process a photo with several faces and give the characteristics of all the faces recognized in the photo.
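As a quick illustration of working with that response (using a hand-made dict with the same shape, not a live API call), here is how you could pull out the dominant emotion of each detected face:

```python
def dominant_emotions(response):
    """Return the highest-confidence emotion label for each detected face."""
    results = []
    for face in response['FaceDetails']:
        # Each face carries a list of emotions, each with its own confidence
        top = max(face['Emotions'], key=lambda e: e['Confidence'])
        results.append(top['Type'])
    return results

# Sample shaped like the detect_faces output above
sample = {'FaceDetails': [{'Emotions': [
    {'Confidence': 0.52, 'Type': 'SAD'},
    {'Confidence': 98.86, 'Type': 'CALM'},
    {'Confidence': 0.23, 'Type': 'ANGRY'}]}]}

print(dominant_emotions(sample))  # ['CALM']
```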

And now, Waldo, it's just you and me!

We're going to try to find one mug in a cheerful panorama of weird people:

A mug (yes, the same as above. What's the big deal? I wasn't going to look for a new image either!).
Lots of weird people

So we're just going to send this whole thing through the mill and see what happens.

with open('/path/to/image-face.jpg', 'rb') as f:
    face = f.read()
with open('/path/to/image-team.jpg', 'rb') as f:
    team = f.read()
response = client.compare_faces(SourceImage={'Bytes': face},
                                TargetImage={'Bytes': team},
                                SimilarityThreshold=90)  # the magic 90 is a detection threshold based on Rekognition's confidence score

Phew! After all these efforts we could take a break but no, we're brave and we'll analyze the result to see how it goes:

{'FaceMatches': [
    {'Face': {
       'BoundingBox': {
         'Height': 0.06476461887359619,
         'Left': 0.5835976004600525,
         'Top': 0.2256196290254593,
         'Width': 0.023400133475661278},
       'Confidence': 99.99836730957031,
       'Landmarks': [
         {'Type': 'eyeLeft',
          'X': 0.5919212698936462,
          'Y': 0.2561943531036377},
         {'Type': 'eyeRight', …},
         {'Type': 'mouthLeft', …},
         {'Type': 'mouthRight', …},
         {'Type': 'nose', …}]},
     'Similarity': …}],
 'UnmatchedFaces': [{}, {}, …]}

It found me! It locates my mug in the photo, gives a few characteristics (fewer than detect_faces), always with the confidence score, and as a bonus returns all the detected faces that don't match. Depending on the threshold, the quality of the photo, and how much Photoshop effort went into pasting my face everywhere, it could also find several faces that "match".
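Note that the bounding box is expressed as ratios of the image dimensions, so to actually draw a rectangle around the match you have to convert it to pixels. A small helper of my own (not part of boto3):

```python
def box_to_pixels(box, img_width, img_height):
    """Convert a Rekognition BoundingBox (ratios of image size) to pixel coordinates."""
    left = int(box['Left'] * img_width)
    top = int(box['Top'] * img_height)
    width = int(box['Width'] * img_width)
    height = int(box['Height'] * img_height)
    return left, top, width, height

# Using the bounding box of the match above, on a hypothetical 2000x1500 photo
box = {'Height': 0.0648, 'Left': 0.5836, 'Top': 0.2256, 'Width': 0.0234}
print(box_to_pixels(box, 2000, 1500))  # (1167, 338, 46, 97)
```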

The ultimate test

We'll see what AWS can do versus Waldo:

We start with simple detection, to see whether Waldo has some special trick that lets him stay camouflaged. The result:

{'FaceDetails': [], …}

That Waldo really is too good...
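If you script this kind of search, it's worth guarding against that empty FaceDetails list. A minimal sketch:

```python
def best_face(response):
    """Return the highest-confidence face dict, or None if no face was detected."""
    faces = response.get('FaceDetails', [])
    if not faces:
        return None  # Waldo wins this round
    return max(faces, key=lambda f: f['Confidence'])

print(best_face({'FaceDetails': []}))  # None
```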



Architect Developer @theTribe
