Crowdsourced Affect Recognition
Vinereactor

Research

We are researching novel ways to use computer vision, crowdsourcing, and deep learning algorithms to teach machines to interact with humans, improve affect recognition, and perform general-purpose video emotion classification.

You Watch

We show you videos. These videos are randomly scraped from vine.co, and we use Amazon Mechanical Turk to show each video to 30 unique viewers.
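On Mechanical Turk, the 30-unique-viewers requirement maps directly onto a HIT with 30 assignments. The sketch below, using the boto3 MTurk client, is illustrative only: the question form, reward, and durations are placeholders rather than the settings used in this project.

import boto3

# Illustrative sketch: post one stimulus video as a HIT with 30 assignments,
# so 30 unique workers each watch it and record a reaction.
mturk = boto3.client("mturk", region_name="us-east-1")

# Hypothetical ExternalQuestion form that embeds the vine video and a webcam recorder.
question_xml = open("reaction_task.xml").read()

hit = mturk.create_hit(
    Title="Watch a short video and record your reaction",
    Description="Watch a short clip while your webcam records your facial reaction.",
    Reward="0.10",                      # placeholder reward in USD
    MaxAssignments=30,                  # 30 unique viewers per stimulus video
    AssignmentDurationInSeconds=600,    # placeholder time limit
    LifetimeInSeconds=7 * 24 * 3600,    # placeholder HIT lifetime
    Question=question_xml,
)
print(hit["HIT"]["HITId"])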

We Record

We record your reactions. For each 7-second vine video, we are collecting approximately 0.5 GB of viewer reaction video.

Machines Learn

We train machines to recognize emotion. Using deep learning, we train computers to recognize face muscle activations and transfer the emotion expressed by the viewer to the video.



Database and Data Collection

Although machines are increasingly pervasive in our everyday lives, we are still forced to interact with them through limited communication channels. Our overarching goal is to support new and complex interactions by teaching the computer to interpret the expressions of the user. Towards this goal, we present Vinereactor, a new labeled database for face analysis and affect recognition. Our dataset is one of the first to explore human expression recognition in response to a stimulus video, enabling a new facet of affect analysis research. Furthermore, our dataset is the largest of its kind, nearly an order of magnitude larger than its closest related work.
See our research publication:

Edward Kim and Shruthika Vangala, "Vinereactor: Crowdsourced Spontaneous Facial Expression Data",
International Conference on Multimedia Retrieval, 2016.

@inproceedings{kim2016vinereactor,
  title={Vinereactor: Crowdsourced Spontaneous Facial Expression Data},
  author={Kim, Edward and Vangala, Shruthika},
  booktitle={International Conference on Multimedia Retrieval (ICMR)},
  year={2016},
  organization={ACM}
}

Deep Learning

One of the most important cues in human communication is the interpretation of facial expressions. We present a novel computer vision approach for Action Unit (AU) recognition based upon deep learning. We introduce a new convolutional neural network training loss specific to AU intensity that utilizes a binned cross entropy method to fine-tune an existing network. We demonstrate that this loss trains more effectively and accurately than an L2 regression or a naive cross entropy approach. Additionally, our model naturally represents the co-occurrence of action units and can handle missing data through regression data imputation. Finally, our experimental results demonstrate the improvement of our framework over the current state of the art.
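To make the binned intensity loss concrete, here is a minimal sketch in PyTorch. It assumes AU intensities on the 0-5 FACS scale discretized into six bins, one softmax head per AU, and a mask for missing annotations; the bin edges, number of AUs, and masking scheme are illustrative choices, not the exact configuration from the paper.

import torch
import torch.nn.functional as F

NUM_AUS = 12    # illustrative number of action units
NUM_BINS = 6    # one bin per integer intensity level 0-5 (assumption)

def binned_intensity_loss(logits, intensities, mask=None):
    """Cross entropy over binned AU intensities.

    logits:      (batch, NUM_AUS, NUM_BINS) raw scores from the network head
    intensities: (batch, NUM_AUS) continuous AU intensities in [0, 5]
    mask:        (batch, NUM_AUS) 1.0 where the AU label is present, 0.0 if missing
    """
    # Discretize each continuous intensity into a class index (bin) per AU.
    targets = intensities.round().clamp(0, NUM_BINS - 1).long()
    ce = F.cross_entropy(
        logits.reshape(-1, NUM_BINS), targets.reshape(-1), reduction="none"
    ).reshape_as(targets)
    if mask is not None:
        # Skip missing annotations rather than imputing them inside the loss.
        ce = ce * mask
        return ce.sum() / mask.sum().clamp(min=1)
    return ce.mean()

In practice this head would sit on top of a pretrained convolutional network that is then fine-tuned with the loss, as described above.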
See our research publication:

Edward Kim and Shruthika Vangala, "Deep Action Unit Classification using a Binned Intensity Loss and Semantic Context Model",
International Conference on Pattern Recognition, 2016.

@inproceedings{kim2016deep,
  title={Deep action unit classification using a binned intensity loss and semantic context model},
  author={Kim, Edward and Vangala, Shruthika},
  booktitle={23rd International Conference on Pattern Recognition (ICPR)},
  pages={4136--4141},
  year={2016},
  organization={IEEE}
}

Affect Transfer Learning

The key insight of our research is that the emotion classification of an image sequence is not an absolute truth, nor does it depend solely on the input video. In other words, machines cannot simply look at images and infer emotion; humans need to be in the loop. Some people will find a video amusing, whereas others may not. Thus, the reaction of human observers (as opposed to the actual video content) is far more informative for solving the problem of emotion classification. If we can quantify the emotion displayed by a human watching these videos, we can attempt to transfer the emotion classified from the observer to the video.
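As a toy illustration of this transfer step, the sketch below takes per-observer emotion labels for one stimulus video (for example, produced by the AU model above) and assigns the video their consensus label; the label names and the simple majority-vote rule are assumptions for illustration, not the exact aggregation used in our experiments.

from collections import Counter

def transfer_label(reaction_labels):
    """Majority vote over per-observer emotion labels for one stimulus video."""
    counts = Counter(reaction_labels)
    label, count = counts.most_common(1)[0]
    # Return the consensus label together with the fraction of observers who agree.
    return label, count / len(reaction_labels)

# Example: most observers were amused, one was neutral.
print(transfer_label(["amusement", "amusement", "neutral", "amusement"]))
# -> ('amusement', 0.75)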

Support

This research is supported by an AWS in Education Research Grant award.

The Data

Here is a preview of some of our data.

Stimulus Videos

We collected 200 random vine videos from the comedy channel on vine.co by scraping the featured videos from the website sporadically between May and September 2015. The videos range from 2 to 7 seconds in duration and come with associated metadata including the URL, number of likes, number of comments, and number of "revines".

[ XML ] [ CSV ] [ JSON ]
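As an illustration, the snippet below reads the JSON export of this metadata; the filename and field names (url, likes, comments, revines) follow the description above and are assumptions that may differ from the released files.

import json

# Hypothetical filename; substitute the actual metadata export.
with open("stimulus_videos.json") as f:
    videos = json.load(f)

# Print a few fields per video (field names assumed from the description above).
for v in videos:
    print(v.get("url"), v.get("likes"), v.get("comments"), v.get("revines"))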

Reaction Videos

We collected 6,029 video responses from 343 unique Mechanical Turk workers in response to the 200 stimulus videos, with a minimum of 30 reaction videos per stimulus video. The total number of video frames in the database is 1,380,343 at a resolution of 320x240. All of the reaction frames are stored in JPG format, and the total size of the reaction video data is 143 GB. Of the 6,029 video responses, 3,455 (57.3%) were approved for research use. The resulting public dataset includes 222 unique subjects with an average of 17.275 reactions per stimulus video.
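The headline figures above can be checked with a quick back-of-envelope calculation; the frames-per-reaction number is derived here rather than stated above.

# Back-of-envelope check of the dataset statistics quoted above.
total_reactions = 6029
approved = 3455
stimulus_videos = 200
total_frames = 1_380_343

print(f"approval rate: {approved / total_reactions:.1%}")                # ~57.3%
print(f"approved reactions per stimulus: {approved / stimulus_videos}")  # 17.275
print(f"frames per reaction (all responses): {total_frames / total_reactions:.0f}")  # ~229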

Data coming soon!

Sample of the Dataset

Ethics approval was obtained from an IRB panel, permitting video data to be collected from Amazon Mechanical Turk workers.

[Two sample stimulus videos are embedded here: "The almost Lady and the Tramp story." and an untitled clip, each shown with its average viewer rating.]

About Us

Edward Kim

Principal Investigator
I am currently an Associate Professor of Computer Science at Drexel University. My research interests are image processing, computer vision, and multimodal machine learning. You can visit my webpage at edwardkim.net.

Contact Us

Contact us for more information.