WASSA 2018 Implicit Emotion Shared Task Data

Data format

The data consists of two columns separated by a tab character. The first column contains the emotion class of the word that has been removed from the text in the second column. The position of the removed word is marked with [#TARGETWORD#]. Data available only to participants with a username and password is marked with (P).
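As an illustration of the format described above, here is a minimal Python sketch for reading the labeled files. The function names read_examples and open_data are hypothetical and not part of the shared task materials, and the sketch assumes UTF-8 text with exactly one tab between the label and the tweet text.

```python
import gzip


def read_examples(lines):
    """Yield (label, text) pairs from tab-separated lines.

    Assumption: each non-empty line is "<emotion label><TAB><tweet text>".
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split only on the first tab so any later tabs stay in the text.
        label, text = line.split("\t", 1)
        yield label, text


def open_data(path):
    """Open a data file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

A file such as sample.csv.gz could then be read with `with open_data("sample.csv.gz") as f: rows = list(read_examples(f))`, assuming it follows the same two-column layout.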

              | As published originally             | Tweet texts with Labels | Tweet ID with labels
Sample Data   | sample.csv.gz
Trial Data    | Data without labels and Labels      | Data with Labels        | Tweet IDs with Labels
Training Data | Data with labels (P)                | Data with Labels (P)    | Tweet IDs with Labels
Test Data     | Test data without labels and Labels | Data with Labels        | Tweet IDs with Labels

Below is the information as it was available during the shared task. The data is the same as in the table above.

  • Sample data (published on 2018-02-08; disclaimer: the preprocessing of the data changed between the sample data and the training, trial, and test data.)
  • Training data v3 (published on 2018-06-18)
    • Bugfix from v2: Some tweets that originally contained the word "unhappy" were labeled as sad, others as joy. Because the trigger word in these tweets is only "happy", so that the tweet contains un[#triggerword#], we fixed this such that they are always labeled as joy.
    • To access the training data, fill out the form at [not available anymore]. Make sure that you mention your real name and affiliation. An individual password will be sent to you a couple of days after you have filled out the form.
    • Tweet-IDs will be made publicly available after the shared task.
  • Development/Trial data without labels v3 and associated labels (published 2018-06-18, this is the format you should prepare for submission; an evaluation script is available at evaluation).
  • Test data without labels (published on 2018-07-02), labels of the test data (published 2018-07-13)
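
The v3 bugfix for "unhappy" tweets listed above can be sketched as a simple relabeling rule. This is only an illustration under assumptions, not the script the organizers used: the function name fix_unhappy_label is hypothetical, the placeholder spelling is taken from the bugfix note, and matching is case-insensitive in case the files spell the marker differently.

```python
import re


def fix_unhappy_label(label, text, placeholder="[#triggerword#]"):
    """Return "joy" if the placeholder directly follows the prefix "un".

    Assumption: a removed "happy" inside "unhappy" appears in the text
    as "un" + placeholder, starting at a word boundary.
    """
    pattern = r"\bun" + re.escape(placeholder)
    if re.search(pattern, text, flags=re.IGNORECASE):
        return "joy"
    return label  # all other rows keep their original label
```

The word boundary keeps words such as "sun" followed by the placeholder from being relabeled by accident.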

Old versions of the data: