Summary: Machine Learning on the Roseanne-ABC Firing Incident Dataset

Summary of results: which methodology/modality “wins”?

                                Vanilla                                Merged
Algorithm             Speed     CCI %    ROC AUC  RMSE    F-1         CCI %    ROC AUC  RMSE    F-1
ZeroR                 Instant   37.9333  0.4990   0.4144  NULL        47.1000  0.4990   0.4536  NULL
OneR                  Instant   43.0000  0.5420   0.5339  NULL        52.9833  0.5660   0.5599  NULL
NaiveBayes            Fast      63.8500  0.8160   0.3808  0.6410      63.9667  0.8000   0.4374  0.6430
IBK                   Fast      56.5333  0.6910   0.4386  0.5230      59.5833  0.6510   0.4972  0.5470
RandomTree            Fast      59.5833  0.6800   0.4474  0.5920      62.7167  0.6700   0.4954  0.6210
SimpleLogistic        Moderate  73.6500  0.8850   0.3065  0.7320      73.6500  0.8730   0.3502  0.7300
DecisionTable         Slow      too slow for viable computation on consumer-grade hardware
MultilayerPerceptron  Slow      too slow for viable computation on consumer-grade hardware
RandomForest          Slow      too slow for viable computation on consumer-grade hardware
                                Vanilla                                Merged
Meta-Classifier       Speed     CCI %    ROC AUC  RMSE    F-1         CCI %    ROC AUC  RMSE    F-1
Stack (ZR, NB)        Moderate  37.9333  0.4990   0.4144  NULL        vacuous results, omitted
Stack (NB, RT)        Moderate  63.7000  0.8230   0.3795  0.6350      61.9833  0.6980   0.4523  0.6130
Vote (ZR, NB, RT)     Moderate  62.0833  0.8430   0.3414  0.6110      64.0500  0.8330   0.3830  0.6260
CostSensitive (ZR)    Instant   37.9333  0.4990   0.4144  NULL        36.6667  0.4990   0.4623  NULL
CostSensitive (OR)    Instant   42.7000  0.5400   0.5353  NULL        39.6167  0.5170   0.6345  NULL
CostSensitive (NB)    Fast      63.8500  0.8160   0.3808  0.6410      64.0833  0.8010   0.4365  0.6450
CostSensitive (IBK)   Fast      56.5333  0.6910   0.4386  0.5230      59.5833  0.6510   0.4972  0.5470
CostSensitive (RT)    Fast      59.5833  0.6800   0.4474  0.5920      63.3833  0.7050   0.4728  0.6350
CostSensitive (SL)    Moderate  73.6500  0.8850   0.3065  0.7320      74.7833  0.8780   0.3478  0.7450
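
All of the learners above are Weka classifiers, and every figure in the tables comes from a 10-fold cross-validation run. For reference, that evaluation loop can be reproduced with Weka's Java API roughly as follows; the ARFF file name and the choice of NaiveBayes are illustrative, not my exact setup:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CrossValidateOneClassifier {
        public static void main(String[] args) throws Exception {
            // Illustrative file name; the class label is assumed to be the last attribute.
            Instances data = DataSource.read("roseanne_vanilla.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Swap in ZeroR, OneR, IBk, RandomTree, SimpleLogistic, etc. as needed.
            Classifier cls = new NaiveBayes();

            // 10-fold cross-validation, the protocol behind every row in the tables above.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));

            System.out.printf("CCI %%   : %.4f%n", eval.pctCorrect());
            System.out.printf("ROC AUC : %.4f%n", eval.weightedAreaUnderROC());
            System.out.printf("RMSE    : %.4f%n", eval.rootMeanSquaredError());
            System.out.printf("F-1     : %.4f%n", eval.weightedFMeasure());
        }
    }

Here pctCorrect(), weightedAreaUnderROC(), rootMeanSquaredError(), and weightedFMeasure() map onto the CCI %, ROC AUC, RMSE, and F-1 columns, on the assumption that the weighted averages are the figures being reported.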

My full results are contained in a separate text file in lab journal format. The salient results and observations are as follows.

My methodological strategy began with a wide selection of algorithms. In particular, I was concerned about the trade-off between speed and accuracy. If an algorithm yields >80% accuracy (NB: none of those I queried did) but takes days to compute (e.g., MultilayerPerceptron), it may be unsuitable for rapid analysis of the constantly changing media landscape in which we work.

An algorithm that takes an hour and returns ~70% accuracy may be more desirable (as was the case with SimpleLogistic). Furthermore, depending on the use case (such as user interfaces or a mobile app), one might prefer something with lower accuracy but near-instantaneous results (NaiveBayes being an excellent candidate).
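
To keep the speed ratings from being purely impressionistic, the same cross-validation call can be wrapped in a wall-clock timer. A rough sketch, again with an illustrative file name and only a subset of the candidate classifiers:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SimpleLogistic;
    import weka.classifiers.rules.ZeroR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SpeedAccuracyCheck {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            // An illustrative subset of the candidates from the table above.
            Classifier[] candidates = { new ZeroR(), new NaiveBayes(), new SimpleLogistic() };

            for (Classifier cls : candidates) {
                long start = System.nanoTime();
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(cls, data, 10, new Random(1));
                double seconds = (System.nanoTime() - start) / 1e9;

                // Wall-clock time for the full 10-fold run next to the headline accuracy figures.
                System.out.printf("%-16s %8.1f s   CCI %6.2f %%   ROC AUC %.3f%n",
                        cls.getClass().getSimpleName(), seconds,
                        eval.pctCorrect(), eval.weightedAreaUnderROC());
            }
        }
    }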

Another concern is whether certain modalities can demonstrate the rather cynical scenario where people willing to publicly defend a racist cannot be easily discerned from noise and chatter. I shall return to this concern shortly.

All in all, NaiveBayes performed well in terms of worst-case computation time, taking very little time to return results. Using what I refer to as the “vanilla” dataset, it achieved a weighted-average ROC AUC just over 0.8 and approximately 64% correctly classified instances. ZeroR, on the other hand, performed horribly, essentially classifying everything as a single class (NB: this is still better than random guessing).

A dark horse contender, however, showed up late in the game: SimpleLogistic. SimpleLogistic took under an hour to build a model and run a 10-fold cross-validation while returning better accuracy (CCI %, ROC AUC, RMSE) than the other algorithms I queried. Each run took about 30-40 minutes depending on such modalities as merged data, penalties, and so on. I find that speed-accuracy trade-off to be reasonable. Furthermore, unlike almost all the other algorithms investigated, applying CostSensitiveClassifier to merged SimpleLogistic yielded slightly improved accuracy (in terms of CCI % and ROC AUC); however, this came at the cost of a higher RMSE.
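
For reference, “applying CostSensitiveClassifier” here means wrapping the base learner in Weka's CostSensitiveClassifier and handing it a penalty matrix, roughly as sketched below. The merged-data file name and the 3x3 matrix (assuming three classes: Pro, Anti, and Unclear/Unrelated) are placeholders rather than the penalties recorded in my journal.

    import java.util.Random;

    import weka.classifiers.CostMatrix;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SimpleLogistic;
    import weka.classifiers.meta.CostSensitiveClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CostSensitiveSimpleLogistic {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_merged.arff");   // illustrative merged-data file name
            data.setClassIndex(data.numAttributes() - 1);

            // Illustrative 3x3 penalty matrix (rows = actual class, columns = predicted class);
            // the penalties actually used in my lab journal are not reproduced here.
            CostMatrix penalties = CostMatrix.parseMatlab("[0 1 1; 2 0 1; 1 1 0]");

            CostSensitiveClassifier csc = new CostSensitiveClassifier();
            csc.setClassifier(new SimpleLogistic());
            csc.setCostMatrix(penalties);
            csc.setMinimizeExpectedCost(true);   // predict the class with the lowest expected cost

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(csc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }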

Lastly, aside from the SimpleLogistic observations above, using meta-classification strategies such as voting and stacking, or introducing penalties, did not yield results very different from the algorithms they modulated, but they were nonetheless interesting to observe.
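
For completeness, the Vote and Stacking rows in the meta-classifier table are wired together roughly as follows; the file name and the choice of Stacking meta-learner are assumptions (Weka's default meta-learner is ZeroR):

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.Stacking;
    import weka.classifiers.meta.Vote;
    import weka.classifiers.rules.ZeroR;
    import weka.classifiers.trees.RandomTree;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MetaClassifierWiring {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            // Vote (ZR, NB, RT): combines the base classifiers' probability estimates
            // (averaging them by default).
            Vote vote = new Vote();
            vote.setClassifiers(new Classifier[] { new ZeroR(), new NaiveBayes(), new RandomTree() });

            // Stack (NB, RT): base learners whose predictions feed a meta-level learner.
            Stacking stack = new Stacking();
            stack.setClassifiers(new Classifier[] { new NaiveBayes(), new RandomTree() });
            stack.setMetaClassifier(new NaiveBayes());   // illustrative meta-learner choice

            for (Classifier cls : new Classifier[] { vote, stack }) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(cls, data, 10, new Random(1));
                System.out.printf("%-10s CCI %6.2f %%   ROC AUC %.3f%n",
                        cls.getClass().getSimpleName(), eval.pctCorrect(), eval.weightedAreaUnderROC());
            }
        }
    }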

Upon further scrutiny, I felt tempted to give merit to the rather bold claim that Pro-Roseanne rhetoric is hard to discern from noise (a claim suggested by running the RandomTree algorithm with penalties). A counter-example was discovered when running vanilla IBK with penalties, where Anti-Roseanne instances came back misclassified as Unclear/Unrelated rather often.
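
Both of those readings come from the per-class breakdown of the confusion matrices rather than the headline figures. A minimal sketch of how to print them, here for IBk without the penalty matrix (which I have not reproduced), is:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ConfusionMatrixCheck {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new IBk(), data, 10, new Random(1));

            // Rows are actual classes, columns are predicted classes; heavy off-diagonal
            // counts in the Unclear/Unrelated column are what suggest a class is being
            // lost in the noise.
            System.out.println(eval.toMatrixString("=== Confusion matrix (IBk, 10-fold CV) ==="));
            System.out.println(eval.toClassDetailsString());
        }
    }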

The tension between those two readings underscores a subtle but critical point: just as one courts a moral hazard by cherry-picking data, one may run into a similar, far graver problem by bashing models to support a claim and hiding behind algorithms as a sort of unassailable black box.