For humans, emotional intelligence is key to both personal and professional success, and the first step towards achieving it is awareness of one's own and other people's feelings. To achieve optimal effectiveness, Artificial Intelligence (AI) systems will likewise need to gain an artificial equivalent of emotional intelligence. Understanding what humans feel remains one of the long-standing, fundamental problems in AI, and one that needs to be solved not only to achieve general AI, but also to ensure acceptance of AI systems by humans and to mitigate the potential negative effects of AI on our well-being.
Recent developments in deep learning have significantly advanced the state of the art in affective computing, and the technology already supports a $20 billion market. Despite this, there is mounting evidence that automatic emotion recognition, as practiced today, is flawed, to the extent that part of the scientific community has called for it to be regulated and even banned.
Existing methods for emotion detection are, by and large, unimodal: a single source of stimuli is used to map what humans feel onto one of a set of predefined (and often mutually exclusive) categories, such as 'happy', 'sad', 'angry', 'scared', 'surprised' or 'disgusted'.
They usually rely on one of the following:
- Facial expressions thought to be representative of different emotional states (e.g. a frown or a smile) are detected in images and used to assign one of the emotional labels.
- Voice analysis is performed to identify patterns in speech that could be indicative of different emotions (e.g. stuttering, increased stress or a higher speaking rate).
- Natural Language Understanding techniques leverage written content for emotion recognition, taking into account context in addition to the meaning of words and phrases.
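Whatever the modality, these unimodal pipelines share the same final step: a model produces one score per category, and the label with the highest score is reported as the person's emotion. A minimal sketch in Python (the score values below are hypothetical placeholders, not the output of any real model):

```python
# The discrete, mutually exclusive categories mentioned above.
EMOTIONS = ["happy", "sad", "angry", "scared", "surprised", "disgusted"]

def predict_label(scores):
    """Map a unimodal model's per-category scores to a single label
    by picking the category with the highest score."""
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion category")
    best = max(range(len(EMOTIONS)), key=lambda i: scores[i])
    return EMOTIONS[best]

# Hypothetical softmax-like output of a facial-expression model:
print(predict_label([0.70, 0.05, 0.10, 0.05, 0.05, 0.05]))  # happy
```

The mutual exclusivity is exactly what makes such systems brittle: a forced smile yields high 'happy' scores even when the underlying state is different.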
Used separately, these approaches leave much to be desired at best, and completely miss the target at worst, often producing false predictions of what humans feel, which can lead to biased decisions. Just because someone is frowning does not mean they are angry; if they laugh, it does not necessarily imply they find something funny; if they say that something is 'great', they might not really mean it. A forced smile, a fake laugh or sarcasm are difficult to identify reliably with unimodal approaches, because much of the relevant information is available only through other channels. This makes it very challenging to establish a proper context for evaluating emotional state.
Our approach is multi-modal and multi-view: we combine the visual, language and acoustic views of the acquired data to understand emotions in complex, multi-party human interactions. Our technology leverages state-of-the-art computer vision, audio and natural language processing methods to capture complex emotional states, relying on securely extracted, privacy-preserving features.
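One common way to combine modality-specific predictions is late fusion: each modality emits a probability distribution over the emotion categories, and the distributions are merged with per-modality weights. The sketch below illustrates the general idea only; the weights and probabilities are hypothetical, and this is not a description of our actual model:

```python
EMOTIONS = ["happy", "sad", "angry", "scared", "surprised", "disgusted"]

def late_fusion(modality_probs, weights):
    """Weighted average of per-modality probability distributions."""
    total = sum(weights)
    fused = [0.0] * len(EMOTIONS)
    for probs, w in zip(modality_probs, weights):
        for i, p in enumerate(probs):
            fused[i] += w * p / total
    return fused

# Hypothetical forced-smile scenario: the visual channel alone says
# "happy", but voice and language disagree, and fusion corrects it.
vision   = [0.80, 0.05, 0.05, 0.02, 0.05, 0.03]
voice    = [0.10, 0.40, 0.30, 0.10, 0.05, 0.05]
language = [0.15, 0.35, 0.35, 0.05, 0.05, 0.05]

fused = late_fusion([vision, voice, language], weights=[0.2, 0.4, 0.4])
print(EMOTIONS[max(range(len(fused)), key=lambda i: fused[i])])  # sad
```

This is precisely the failure mode unimodal systems cannot handle: a smiling face pushes a vision-only model to 'happy', while the fused estimate reflects the cues carried by the other channels.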
The Cinteraction team consists of experts in Artificial Intelligence, Machine Learning, Software Engineering and Business Intelligence. Its founding members are published university professors and successful serial entrepreneurs.