
Researchers Think Playing a Game Could Help AI Better Understand Human Emotions

by Gamifications FTW Publications, January 14th, 2025

Too Long; Didn't Read

Researchers have developed a gamified method of acquiring annotated facial emotion data without an explicit labeling effort by humans.

Authors:

(1) Krist Shingjergji, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands (krist.shingjergji@ou.nl);

(2) Deniz Iren, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands (deniz.iren@ou.nl);

(3) Felix Böttger, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands;

(4) Corrie Urlings, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands;

(5) Roland Klemke, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands.

Editor's note: This is Part 4 of 6 of a study detailing the development of a gamified method of acquiring annotated facial emotion data. Read the rest below.

IV. METHODOLOGY

In this section, we present the overall methodology of this study and describe the experimental setup for the data collection and analysis. To evaluate the efficacy and utility of the proposed gamified data collection method, i.e., Facegame, we pose the first research question as follows.


RQ1: Do the data generated by the players represent emotional facial expressions accurately?


We also hypothesize that by playing the game repeatedly, players engage in deliberate practice of their facial expression and perception skills and are thus able to improve them. Hence, our second research question is as follows.


RQ2: Do the players improve their facial expression and perception skills through repeated play?


Finally, we aim to evaluate how the proposed explanation method is perceived and whether it yields an improved understanding of the outcome of the FER model. Hence, we formulate the third research question as follows.


RQ3: Do the natural language explanations help players understand the outcome of the FER model?


A. Experiment setup and procedure


For the experiment, we adjusted Facegame slightly. Participants were asked to play six rounds of the game, each round corresponding to one of six target images and each image representing one basic emotion. Each target image was shown five times in a row so that participants could observe how their score changed with each try. They were given five seconds, indicated by an on-screen countdown, to mimic the face. Afterwards, the score of their attempt was displayed (Eq. 1). To examine the potential effect of the natural language prescriptions, we divided the participants into two groups: one received only the score (control group), and the other received the natural language prescriptions as well as the score (treatment group).
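For concreteness, the session flow can be outlined as a short sketch. This is illustrative only: the scoring function of Eq. 1 is not reproduced in this excerpt, so `score_attempt` below is a hypothetical placeholder, and the timing constants simply mirror the description above.

```python
# Illustrative outline of one experiment session. `score_attempt` is a
# hypothetical placeholder for the Facegame similarity score (Eq. 1).
import random
import time

TARGET_EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
TRIES_PER_ROUND = 5   # each target image is shown five times in a row
CAPTURE_SECONDS = 5   # on-screen countdown while the player mimics the face

def score_attempt(target_emotion: str) -> float:
    """Placeholder standing in for the similarity score of Eq. 1."""
    return random.random()

def run_session(show_prescriptions: bool) -> None:
    for emotion in TARGET_EMOTIONS:                   # six rounds, one per basic emotion
        for attempt in range(1, TRIES_PER_ROUND + 1):
            time.sleep(CAPTURE_SECONDS)               # countdown shown to the player
            score = score_attempt(emotion)
            print(f"{emotion}, try {attempt}: score = {score:.2f}")
            if show_prescriptions:                    # treatment group only
                print("  (natural language prescription displayed here)")

run_session(show_prescriptions=True)  # treatment group; use False for control
```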


B. Analysis of the collected emotional facial image data


To answer the first research question, we analyzed the collected emotional facial image data and evaluated to what extent the collected data can be used to model facial expressions of emotions. To this end, we used two commonly used in-the-wild datasets, Facial Expression Recognition 2013 (FER2013) [42] and the Real-world Affective Face Database (RAF-DB) [43], [44], and trained a simple neural network classifier.


FER2013 is a large-scale dataset of images collected automatically via the Google image search API and labelled with the six basic emotions and neutral. RAF-DB is a real-world dataset of face images collected from the Internet and manually labelled through crowdsourcing. The neural network architecture we trained is similar to the baseline model used in the study by Bishay et al. [45]: a shallow convolutional neural network (CNN) with four convolutional layers of 32, 32, 64, and 64 filters, respectively, each using the ReLU activation function. The first three convolutional layers are each followed by a max-pooling layer with a 2×2 filter, and the last one by a flatten layer. The final two layers are fully connected: the first has 96 neurons, while the second has six sigmoid units representing the predictions of the six emotions.
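As a sanity check on that description, a minimal Keras sketch of such a network might look as follows. The kernel size, input resolution (48×48 grayscale, as in FER2013), the activation of the 96-neuron layer, the optimizer, and the loss are our assumptions; the paper specifies only the layer structure.

```python
# Minimal sketch of the shallow CNN described above; kernel size, input
# shape, hidden-layer activation, optimizer, and loss are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline_cnn(input_shape=(48, 48, 1)) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Four convolutional layers with 32, 32, 64, and 64 filters (ReLU);
        # the first three are each followed by 2x2 max-pooling.
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        # The last convolutional layer is followed by a flatten layer.
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.Flatten(),
        # Two fully connected layers: 96 neurons, then six sigmoid units,
        # one per basic emotion.
        layers.Dense(96, activation="relu"),
        layers.Dense(6, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

build_baseline_cnn().summary()
```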


Considering that FER2013 and RAF-DB are large-scale, unbalanced datasets, samples were taken to match the size and class distribution of the Facegame data. The samples were created by randomly drawing, without replacement, 200 and 50 instances of each of the six emotions from the training and testing sets, respectively. This technique yielded a balanced sample of each set containing 1,200 instances for training and 300 instances for testing. To obtain generalizable observations, we formed five different samples from each of the FER2013 and RAF-DB datasets. Finally, we compared the performance of the models trained on each sample set twice: first without the inclusion of the Facegame data, and second with it.
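The balanced sampling step can be sketched as follows, assuming each split is loaded into a pandas DataFrame with an emotion label column; the function and column names here are ours, not the authors'.

```python
# Sketch of the balanced sampling described above; the DataFrame layout and
# names ("emotion", "image_path") are assumed for illustration.
import numpy as np
import pandas as pd

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def balanced_sample(df: pd.DataFrame, per_class: int, seed: int) -> pd.DataFrame:
    """Draw `per_class` instances per emotion without replacement."""
    parts = [group.sample(n=per_class, random_state=seed)
             for _, group in df.groupby("emotion")]
    return pd.concat(parts)

# Stand-in for a loaded FER2013 or RAF-DB training split.
train_df = pd.DataFrame({
    "emotion": np.repeat(EMOTIONS, 500),
    "image_path": [f"img_{i}.png" for i in range(3000)],
})

# Five independent samples, each balanced: 6 x 200 = 1,200 training instances.
train_samples = [balanced_sample(train_df, per_class=200, seed=s) for s in range(5)]
print(len(train_samples[0]))  # 1200
```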


C. Participant Survey


To receive quantitative and qualitative feedback regarding the design of Facegame, and to partially answer the third research question, we prepared a questionnaire that participants were asked to fill in after completing the game experiment. The questions covered demographics, i.e., age and gender; technical information, i.e., the type of device and browser used for the game; and quantitative and qualitative feedback on the game. The quantitative feedback was given as a score on a five-point Likert scale (Very Satisfied, Somewhat Satisfied, Neutral, Somewhat Unsatisfied, Very Unsatisfied) for different aspects of the game: ease of use, loading time and browser compatibility, and design. The survey for the participants in the treatment group included additional questions regarding the natural language prescriptions; specifically, they were asked to score the usefulness and the design of the prescriptions. The qualitative feedback was sought with two open-ended questions about comments and suggestions regarding the functionality and the design of the game. Similarly, the participants of the treatment group were asked two additional open-ended questions regarding the clarity of the displayed prescriptions and any other comments on them. The analysis of the survey data consisted of descriptive statistics on the age of the participants, quantitative analysis of the satisfaction scores regarding the design aspects of the game, and qualitative analysis of the open-ended questions.


D. Analysis of Skill Improvement


To answer the second and third research questions, we analyzed the consecutive scores of the players in each round. Specifically, we compared the first score of each round against the mean of the remaining four scores of the same round. We tested the significance of the difference between the mean scores within each group, and also between the groups, using t-tests.
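A minimal SciPy sketch of those tests on synthetic scores is given below; the array shape (players × five tries per round), group sizes, and score distributions are assumptions, and only the structure of the comparison follows the paper.

```python
# Sketch of the score comparison with synthetic data; shapes, group sizes,
# and score distributions are assumptions made for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.60, 0.10, size=(20, 5))    # score-only group
treatment = rng.normal(0.65, 0.10, size=(20, 5))  # score + prescriptions

def improvement(scores: np.ndarray) -> np.ndarray:
    """Per player: mean of tries 2-5 minus the score of the first try."""
    return scores[:, 1:].mean(axis=1) - scores[:, 0]

# Within-group: paired t-test of the first try vs. the mean of the rest.
t_within, p_within = stats.ttest_rel(control[:, 1:].mean(axis=1), control[:, 0])
# Between groups: independent t-test on the improvement deltas.
t_between, p_between = stats.ttest_ind(improvement(treatment), improvement(control))
print(f"within control: t={t_within:.2f}, p={p_within:.3f}")
print(f"between groups: t={t_between:.2f}, p={p_between:.3f}")
```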


This paper is available on arxiv under CC BY 4.0 DEED license.