Evaluation
The evaluation will be based on the macro-averaged F1 measure. For development purposes, we provide an evaluation script which you can download at
The script takes two files as input: a gold file (a file like trial.labels in the data section) and a prediction file in the same format.
This is only to help with the evaluation during development. Participation is managed through our Codalab page at https://competitions.codalab.org/competitions/19214.
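As a rough illustration of what comparing a gold file with a prediction file involves, the following sketch counts true positives, false positives, and false negatives per label and averages the per-label F1 scores. It is not the official script. The function names (`read_labels`, `per_label_counts`, `macro_f1`) are hypothetical, and the one-label-per-line file layout is an assumption, not something this page specifies.

```python
from collections import Counter


def read_labels(path):
    """Read labels from a file, ASSUMING one label per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def per_label_counts(gold, pred):
    """Return {label: (tp, fp, fn)} for two equally long label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and prediction sequences differ in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label gains a false positive
            fn[g] += 1  # the gold label gains a false negative
    labels = sorted(set(gold) | set(pred))
    return {label: (tp[label], fp[label], fn[label]) for label in labels}


def macro_f1(counts):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for tp, fp, fn in counts.values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with in-memory labels instead of files.
gold = ["joy", "sad", "joy", "fear"]
pred = ["joy", "joy", "joy", "fear"]
counts = per_label_counts(gold, pred)
print(counts)
print(round(macro_f1(counts), 3))
```

With real files, the two lists would come from `read_labels("trial.labels")` and `read_labels(...)` on your prediction file, provided the files follow the assumed layout.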
To help you estimate the performance, a baseline bag-of-words (unigrams and bigrams) logistic regression classifier, trained on the whole training data (version 3) and tested on the whole trial data (version 3), leads to the following result:
Label | TP | FP | FN | P | R | F |
---|---|---|---|---|---|---|
joy | 1197 | 536 | 539 | 0.691 | 0.690 | 0.690 |
sad | 801 | 556 | 659 | 0.59 | 0.549 | 0.569 |
disgust | 972 | 634 | 625 | 0.605 | 0.609 | 0.607 |
anger | 826 | 712 | 774 | 0.537 | 0.516 | 0.526 |
surprise | 953 | 744 | 647 | 0.562 | 0.596 | 0.578 |
fear | 1036 | 624 | 562 | 0.624 | 0.648 | 0.636 |
MicAvg | 5785 | 3806 | 3806 | 0.603 | 0.603 | 0.603 |
MacAvg | | | | 0.601 | 0.601 | 0.601 |
Official result: 0.601042964046
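The averaged rows of the version 3 table can be reproduced from its per-label counts. The sketch below copies the TP, FP, and FN values from that table. It takes the macro average as the unweighted mean of the per-label F1 scores and the micro average as the F1 of the summed counts. The variable and function names are illustrative, and the official script is not guaranteed to be implemented this way, although the macro value agrees with the official result reported above.

```python
# (TP, FP, FN) per label, copied from the version 3 baseline table.
COUNTS = {
    "joy": (1197, 536, 539),
    "sad": (801, 556, 659),
    "disgust": (972, 634, 625),
    "anger": (826, 712, 774),
    "surprise": (953, 744, 647),
    "fear": (1036, 624, 562),
}


def precision(tp, fp, fn):
    return tp / (tp + fp)


def recall(tp, fp, fn):
    return tp / (tp + fn)


def f1(tp, fp, fn):
    # Harmonic mean of precision and recall, written in terms of counts.
    return 2 * tp / (2 * tp + fp + fn)


# Macro average: compute F1 per label, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in COUNTS.values()) / len(COUNTS)

# Micro average: sum the counts over all labels, then compute F1 once.
totals = tuple(sum(c[i] for c in COUNTS.values()) for i in range(3))
micro_f1 = f1(*totals)

print(f"macro F1 = {macro_f1:.3f}")  # 0.601
print(f"micro F1 = {micro_f1:.3f}")  # 0.603
```

Because every instance receives exactly one predicted label, the summed FP and FN counts are equal, which is why micro precision, recall, and F1 coincide in the MicAvg row.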
For comparison, these were the results for the data of version 2.
Label | TP | FP | FN | P | R | F |
---|---|---|---|---|---|---|
joy | 975 | 653 | 622 | 0.599 | 0.611 | 0.605 |
sad | 821 | 646 | 778 | 0.56 | 0.513 | 0.536 |
disgust | 970 | 642 | 627 | 0.602 | 0.607 | 0.605 |
anger | 818 | 705 | 782 | 0.537 | 0.511 | 0.524 |
surprise | 952 | 747 | 648 | 0.56 | 0.595 | 0.577 |
fear | 1038 | 624 | 560 | 0.625 | 0.65 | 0.637 |
MicAvg | 5574 | 4017 | 4017 | 0.581 | 0.581 | 0.581 |
MacAvg | | | | 0.58 | 0.581 | 0.58 |
Official result: 0.580426957687
For comparison, these were the results for the data of version 1.
Label | TP | FP | FN | P | R | F |
---|---|---|---|---|---|---|
joy | 979 | 656 | 621 | 0.599 | 0.612 | 0.605 |
sad | 817 | 644 | 783 | 0.559 | 0.511 | 0.534 |
disgust | 975 | 643 | 625 | 0.603 | 0.609 | 0.606 |
anger | 818 | 708 | 782 | 0.536 | 0.511 | 0.523 |
surprise | 957 | 739 | 643 | 0.564 | 0.598 | 0.581 |
fear | 1038 | 626 | 562 | 0.624 | 0.649 | 0.636 |
MicAvg | 5584 | 4016 | 4016 | 0.582 | 0.582 | 0.582 |
MacAvg | | | | 0.581 | 0.582 | 0.581 |
Official result: 0.580853294
These results were achieved with the simple text classifier (a wrapper for liblinear, available at https://bitbucket.org/rklinger/simpletextclassifier), tested on Mac OS X and Linux with Java 1.8:
# get and compile simple text classifier baseline
unset SSH_ASKPASS
git clone https://bitbucket.org/rklinger/simpletextclassifier.git
cd simpletextclassifier
mvn compile assembly:single
cd ..
# get training data (you need your credentials here)
wget --user USERNAME --password PASSWORD http://implicitemotions.wassa2018.com/data/protected/train-v3.csv.gz
gunzip train-v3.csv.gz
# get trial data
wget http://implicitemotions.wassa2018.com/data/unprotected/trial-v3.csv.gz
wget http://implicitemotions.wassa2018.com/data/unprotected/trial-v3.labels.gz
gunzip trial-v3.csv.gz
gunzip trial-v3.labels.gz
# get official evaluation script
wget http://implicitemotions.wassa2018.com/evaluation/evaluate-iest.py
chmod +x evaluate-iest.py
# train model
./simpletextclassifier/bin/run.sh --train ./train-v3.csv --model ./iest.model
# apply model
./simpletextclassifier/bin/run.sh --test ./trial-v3.csv --model ./iest.model > ./trial-v3.prediction
# evaluate
./evaluate-iest.py trial-v3.labels trial-v3.prediction