How would you feel if you knew your device could tell how you felt? While other companies are busy trying to give machines intellectual intelligence, Boston-based startup Affectiva is trying to give them emotional intelligence. Imagine customer service robots that could gauge their responses based on your emotions, games that can adjust difficulty and scenarios based on your emotional response, a healthcare app on your smartphone that could use your facial expressions to read your pain levels, or even a car that knows how its passengers are feeling.
“[We're] motivated by one simple question: What if technology could identify emotions the same way humans can?” Jay Turcot, Affectiva's Director of Applied AI, told an audience during a presentation at the 2017 GPU Technology Conference (GTC). “We believe interacting with technology being a cold experience is a side effect of machines not having empathy.”
Affectiva, which spun out of the MIT Media Lab in 2009, has released an Emotion AI software development kit (SDK) focused on letting developers create machines that can understand human emotion. It does this by analyzing facial expressions. “It turns out the face is a great window into our emotional and cognitive state,” Turcot said. “The face shows a diversity of emotions as well as estimates of intensity.”
Affectiva's Abdelrahman Mahmoud talks to an audience at GTC 2017 about autonomous vehicle applications for Emotion AI. (Image source: Design News)
To teach its AI how to recognize emotions, Affectiva employed several layers of deep learning – specifically a combination of a convolutional neural network (CNN) and a support vector machine (SVM). The CNN mimics animal visual processing, and the SVM is used for making classifications.
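As a rough illustration of how a convolutional stage can feed a support vector machine, the Python sketch below extracts features with two small convolution filters and then trains a linear SVM on them. It is a toy under stated assumptions, not Affectiva's implementation: the filters are hand-set rather than learned (a real CNN learns its filters from data), the 4x4 "images" are synthetic edge patterns, and every name in it (`edge_image`, `extract_features`, `train_linear_svm`, `classify`) is hypothetical.

```python
import random

def conv2d_valid(image, kernel):
    """Slide a small kernel over the image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def extract_features(image, kernels):
    """Convolution, then ReLU, then global max pooling: one feature per kernel."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        features.append(max(max(0.0, v) for row in fmap for v in row))
    return features

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Linear SVM fitted by sub-gradient descent on the regularized hinge loss.
    Labels must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularization shrink
            if margin < 1:  # inside the margin or misclassified
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def edge_image(orientation, intensity, split, size=4):
    """Synthetic image: bright above a horizontal edge or left of a vertical edge."""
    if orientation == "horizontal":
        return [[intensity if r < split else 0.0 for _ in range(size)]
                for r in range(size)]
    return [[intensity if c < split else 0.0 for c in range(size)]
            for _ in range(size)]

# Hand-set filters (an assumption): horizontal-edge and vertical-edge detectors.
KERNELS = [[[1, 1], [-1, -1]], [[1, -1], [1, -1]]]

train = [(edge_image(o, i, s), 1 if o == "horizontal" else -1)
         for o in ("horizontal", "vertical")
         for i in (1.0, 0.5)
         for s in (1, 2, 3)]
X = [extract_features(img, KERNELS) for img, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

def classify(image):
    """Full pipeline: convolutional features in, SVM decision out."""
    return predict(w, b, extract_features(image, KERNELS))

print(classify(edge_image("horizontal", 0.75, 2)))
print(classify(edge_image("vertical", 0.75, 3)))
```

The division of labor is the point: the convolutional stage turns raw pixels into a compact description of what is in the image, and the SVM draws the decision boundary between classes in that feature space.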
The first thing to train the AI on was object detection – recognizing faces in photos and videos. Once the AI was able to recognize faces, it then moved to facial action and attribute classification, meaning, “Can we codify each specific facial expression?” according to Turcot. Smiles, for example, are not broad and universal. Some people have big smiles; for others, a smile is more subtle. The challenge is in getting the AI to recognize all types of smiles, not just one.
The final layer Turcot called Facial Expression Interpretation. “Can we look at what we're seeing and map that into an estimate of [someone's] emotional state?”
Turcot told the audience that what Affectiva encountered was, in essence, a multi-attribute classification problem. “It turns out in the real world expressions are really subtle,” he said. Expressions are combinations of features (eyes, brow, lips, etc.), and age, race, and gender can all play a role in how an expression takes shape. And this doesn't even begin to account for environmental factors like lighting and someone's orientation to the camera.
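To make "multi-attribute classification" concrete: unlike a single-label classifier, which picks exactly one class, a multi-attribute classifier scores each facial attribute independently, so several can be active at once. The Python sketch below shows one common way to do that, with an independent sigmoid per attribute. The attribute names, the raw scores, and the 0.5 threshold are all illustrative assumptions and are not taken from Affectiva's SDK.

```python
import math

# Hypothetical attribute names, for illustration only.
ATTRIBUTES = ["smile", "brow_raise", "lip_press", "eye_closure"]

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def multi_attribute_predict(scores, threshold=0.5):
    """Score every attribute independently; any number of them may be active."""
    probs = {name: sigmoid(z) for name, z in zip(ATTRIBUTES, scores)}
    active = [name for name, p in probs.items() if p >= threshold]
    return probs, active

# Made-up raw scores, as if produced by the last layer of a network.
probs, active = multi_attribute_predict([2.0, 0.3, -1.5, -3.0])
print(active)
```

Because each attribute gets its own sigmoid, the probabilities are not forced to sum to one, which is what allows a face to register as both smiling and raising a brow in the same frame.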
“The real challenge is how can we train a neural net to perform this task fast enough to run on a device?” Turcot said.
He said the Affectiva team knew the solution couldn't be cloud-based because processing would take too long and users would want the functionality available offline as well. To train the system