How would you feel if you knew your device could tell how you felt? While other companies are busy trying to give machines intellectual intelligence, Boston-based startup Affectiva is trying to give them emotional intelligence. Imagine customer service robots that could tailor their responses to your emotions, games that can adjust difficulty and scenarios based on your emotional response, a healthcare app on your smartphone that could use your facial expressions to read your pain levels, or even a car that knows how its passengers are feeling.
“[We're] motivated by one simple question: What if technology could identify emotions the same way humans can?” Jay Turcot, Affectiva's Director of Applied AI, told an audience during a presentation at the 2017 GPU Technology Conference (GTC). “We believe interacting with technology being a cold experience is a side effect of machines not having empathy.”
Affectiva, which spun out of the MIT Media Lab in 2009, has released an Emotion AI software development kit (SDK) focused on letting developers create machines that can understand human emotion. It does this by analyzing facial expressions. “It turns out the face is a great window into our emotional and cognitive state,” Turcot said. “The face shows a diversity of emotions as well as estimates of intensity.”
|Affectiva's Abdelrahman Mahmoud talks to an audience at GTC 2017 about autonomous vehicle applications for Emotion AI. (Image source: Design News).|
To teach its AI how to recognize emotions, Affectiva employed several layers of deep learning, specifically a combination of a convolutional neural network (CNN) and a support vector machine (SVM). The CNN mimics animal visual processing, while the SVM is used for making classifications.
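To make the CNN-plus-SVM pairing concrete, here is a minimal, stdlib-only sketch of the two stages: a convolutional layer extracts features from an image, and a linear SVM decision function classifies them. The random image, filters, and weights are hypothetical stand-ins; a real system like Affectiva's would load trained parameters and far deeper networks.

```python
import random

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def cnn_features(image, kernels):
    """Toy CNN stage: convolve, apply ReLU, then average-pool each
    feature map down to a single number."""
    feats = []
    for k in kernels:
        fmap = conv2d(image, k)
        vals = [max(v, 0.0) for row in fmap for v in row]
        feats.append(sum(vals) / len(vals))
    return feats

def svm_decision(features, weights, bias):
    """Linear SVM decision function: the sign of w.x + b."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else -1

# Hypothetical stand-ins: random numbers replace a real face crop and the
# trained filter/SVM weights a production system would load.
random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]
kernels = [[[random.random() - 0.5 for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
weights = [random.random() - 0.5 for _ in range(4)]
label = svm_decision(cnn_features(image, kernels), weights, 0.0)  # +1 or -1
```

The division of labor mirrors Turcot's description: the convolutional stage turns raw pixels into features, and the SVM draws the classification boundary over those features.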
The first thing to train the AI on was object detection: recognizing faces in photos and videos. Once the AI was able to recognize faces, it moved on to facial action and attribute classification, meaning, “Can we codify each specific facial expression?” according to Turcot. Smiles, for example, are not uniform and universal. Some people have big smiles; for others, a smile is more subtle. The challenge is in getting the AI to recognize all types of smiles, not just one.
The final layer Turcot called Facial Expression Interpretation. “Can we look at what we're seeing and map that into an estimate of [someone's] emotional state?”
Turcot told the audience what Affectiva encountered was, in essence, a multi-attribute classification problem. “It turns out in the real world expressions are really subtle,” he said. Expressions are combinations of features (eyes, brow, lips, etc.), and age, race, and gender can all play a role in how an expression takes shape. And this doesn't even begin to account for environmental factors like lighting and someone's orientation to the camera.
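A multi-attribute (multi-label) classifier differs from an ordinary classifier in that several outputs can be active at once. The sketch below, a hedged illustration rather than Affectiva's actual method, uses one independent sigmoid per attribute so a smile and a brow raise can fire together; the attribute names, weights, and three-number "embedding" are all made up for the example.

```python
import math

# Hypothetical attribute names; Affectiva's real taxonomy is its own.
ATTRIBUTES = ["smile", "brow_raise", "brow_furrow", "lip_press", "eye_widen"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify_attributes(features, weights, biases, threshold=0.5):
    """Multi-attribute head: one independent sigmoid per attribute, so
    several expressions can be detected at once, unlike a softmax that
    forces a single winner."""
    scores = {}
    for name, w_col, b in zip(ATTRIBUTES, weights, biases):
        z = sum(f * w for f, w in zip(features, w_col)) + b
        scores[name] = sigmoid(z)
    active = [name for name, p in scores.items() if p >= threshold]
    return scores, active

# Toy inputs: a 3-number "embedding" and made-up per-attribute weights.
features = [0.8, -0.2, 0.5]
weights = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, -2.0],
           [0.5, 0.5, 0.5], [0.0, -2.0, 0.0]]
biases = [0.0] * 5
scores, active = classify_attributes(features, weights, biases)
# active -> ["smile", "lip_press", "eye_widen"]
```

Treating each facial action as its own yes/no question is what makes the problem "multi-attribute": the model never has to pick just one expression per face.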
“The real challenge is how can we train a neural net to perform this task fast enough to run on a device?” Turcot said.
He said the Affectiva team knew the solution couldn't be cloud-based because processing would take too long, and users would want the functionality available offline as well. To train the system, the team used machine learning and training data sourced from around the world, containing spontaneous, real-world reactions captured in the wild so as to allow for plenty of variance in lighting, positioning, and other factors.
In a separate talk at GTC, Abdelrahman Mahmoud, Product Manager at Affectiva, talked about applying Emotion AI to autonomous driving. “We believe that Emotion AI fits in in trying to build a holistic view of the in-car occupants, understanding of their emotional, mental, and physiological state, and being able to intervene when appropriate.” Mahmoud said this functionality could range from something as simple as using head movement for hands-free gesture control of the infotainment system, to monitoring children and passengers, all the way to complex tasks like understanding emotional content and having the car adjust its driving style based on its passengers' emotional responses.
Imagine if a car could tell when its driver is inattentive (texting, eating, putting on makeup, etc.), experiencing cognitive load (boredom, frustration, or confusion), or even drowsy or intoxicated, and was able to respond accordingly. Cars have sensors to monitor their mechanical systems, but generally overlook the occupants' mental states and emotions, Mahmoud told the audience.
Affectiva's software has already been used successfully in an academic setting as well. In April, a team of researchers affiliated with Affectiva published a study in the open-access online science journal PLOS ONE, “A Large-scale Analysis of Sex Differences in Facial Expressions,” which used Affectiva's software to examine gender differences in expressing emotions across five countries, including the United States, the UK, France, and Germany. The study is the first large-scale, naturalistic analysis of gender differences in facial expressions conducted using Affectiva's software. In the study, 1,862 participants were asked to watch a number of emotion-eliciting online videos while at home or in some other natural environment. Among other findings, the study shows that women overall express significantly more positive emotions than men.
Mahmoud said Affectiva has spent over five years on studies like these, helping the company understand human emotion and better train its AI. A 2012 study found that combining facial data with voice data produced much more accurate recognition of drowsiness in drivers. And a 2016 study found similar results by combining audio and video to recognize frustration in drivers. “Frustration, for example, isn't just negative,” Mahmoud said. “It can be expressed sarcastically or even positively.” That's why the additional voice component helps.
Turcot shared some tips for developers who may be hoping to leverage Affectiva's Emotion AI SDK for their own purposes. First, he said, you should use all the data available to you when training the AI. He said Affectiva used 40,000 videos in its first training set, which, when annotated frame by frame, resulted in millions of images for the system to analyze and learn from.
He also emphasized finding ways to shrink your model. “You want a model with a small enough memory footprint to run on a small device,” he said. Reducing model complexity can be a big help here: the goal is a trained model that condenses its knowledge into an efficient architecture. Finding and eliminating redundancy in your layers is one way to do this, particularly for the machine vision aspects of facial expression recognition. “Smaller filters are faster, but can be highly correlated. Looking for that redundancy could result in faster architectures,” Turcot suggested.
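One simple way to act on Turcot's redundancy tip is to measure the correlation between learned filters and drop near-duplicates. The sketch below is an illustrative greedy approach under that assumption, not Affectiva's published method: filters whose flattened weights correlate almost perfectly with an already-kept filter contribute little new information and can be pruned.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length weight vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def prune_correlated(filters, threshold=0.95):
    """Greedy pruning: keep a filter only if it is not highly correlated
    (|r| >= threshold) with any filter already kept."""
    kept = []
    for f in filters:
        if all(abs(pearson(f, g)) < threshold for g in kept):
            kept.append(f)
    return kept

# Four distinct 3x3 filters, flattened: diagonal, anti-diagonal,
# vertical-edge, and horizontal-edge detectors...
diag  = [1, 0, 0,  0, 1, 0,  0, 0, 1]
anti  = [0, 0, 1,  0, 1, 0,  1, 0, 0]
vert  = [1, 0, -1, 1, 0, -1, 1, 0, -1]
horiz = [1, 1, 1,  0, 0, 0, -1, -1, -1]
base = [diag, anti, vert, horiz]
# ...plus a slightly rescaled near-copy of each (perfectly correlated).
bank = base + [[w * 1.01 + 0.001 for w in f] for f in base]
pruned = prune_correlated(bank)
print(len(bank), "->", len(pruned))  # prints "8 -> 4"
```

Halving the filter bank this way shrinks both the memory footprint and the per-frame compute, which is exactly the constraint on-device inference imposes.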
“Also, match your architecture to the problem,” he said. “Small architectures can still work very well and are sufficiently small for on-device processing. ... Simplifying the problem means you don't have to build an oversized [neural network] to make it robust.” He cautioned against simply copying architectures from other developers: if your problem doesn't match their problem or constraints, you won't get optimal performance.
Turcot said Affectiva has plans to incorporate more accurate facial emotion detection into its SDK as it develops and is also doing research into incorporating tone and voice into its emotional assessment. Further down the line the system may also be able to use other modes of non-verbal communication like hand gestures and body language to more accurately assess emotion as well.
“The biggest take-home is these deep learning methods still outperform traditional methods, even with model constraints,” Turcot told the audience. “A lot of people think you have to go as deep as possible with neural networks, [but] you can still train small networks that do very, very well.”
A version of Affectiva's Emotion AI, AffdexMe, is currently available on Google Play and via the Apple App Store.
Chris Wiltz is the Managing Editor of Design News