Summary of results: which methodology/modality “wins?”
| Algorithm | Speed | Vanilla CCI % | Vanilla ROC AUC | Vanilla RMSE | Vanilla F-1 | Merged CCI % | Merged ROC AUC | Merged RMSE | Merged F-1 |
|---|---|---|---|---|---|---|---|---|---|
| ZeroR | Instant | 37.9333 | 0.4990 | 0.4144 | NULL | 47.1000 | 0.4990 | 0.4536 | NULL |
| OneR | Instant | 43.0000 | 0.5420 | 0.5339 | NULL | 52.9833 | 0.5660 | 0.5599 | NULL |
| NaiveBayes | Fast | 63.8500 | 0.8160 | 0.3808 | 0.6410 | 63.9667 | 0.8000 | 0.4374 | 0.6430 |
| IBk | Fast | 56.5333 | 0.6910 | 0.4386 | 0.5230 | 59.5833 | 0.6510 | 0.4972 | 0.5470 |
| RandomTree | Fast | 59.5833 | 0.6800 | 0.4474 | 0.5920 | 62.7167 | 0.6700 | 0.4954 | 0.6210 |
| SimpleLogistic | Moderate | 73.6500 | 0.8850 | 0.3065 | 0.7320 | 73.6500 | 0.8730 | 0.3502 | 0.7300 |
| DecisionTable | Slow | too slow for viable computation on consumer-grade hardware | | | | | | | |
| MultilayerPerceptron | Slow | too slow for viable computation on consumer-grade hardware | | | | | | | |
| RandomForest | Slow | too slow for viable computation on consumer-grade hardware | | | | | | | |
| Meta-Classifier | Speed | Vanilla CCI % | Vanilla ROC AUC | Vanilla RMSE | Vanilla F-1 | Merged CCI % | Merged ROC AUC | Merged RMSE | Merged F-1 |
|---|---|---|---|---|---|---|---|---|---|
| Stack (ZR, NB) | Moderate | 37.9333 | 0.4990 | 0.4144 | NULL | vacuous results, omitted | | | |
| Stack (NB, RT) | Moderate | 63.7000 | 0.8230 | 0.3795 | 0.6350 | 61.9833 | 0.6980 | 0.4523 | 0.6130 |
| Vote (ZR, NB, RT) | Moderate | 62.0833 | 0.8430 | 0.3414 | 0.6110 | 64.0500 | 0.8330 | 0.3830 | 0.6260 |
| CostSensitive (ZR) | Instant | 37.9333 | 0.4990 | 0.4144 | NULL | 36.6667 | 0.4990 | 0.4623 | NULL |
| CostSensitive (OR) | Instant | 42.7000 | 0.5400 | 0.5353 | NULL | 39.6167 | 0.5170 | 0.6345 | NULL |
| CostSensitive (NB) | Fast | 63.8500 | 0.8160 | 0.3808 | 0.6410 | 64.0833 | 0.8010 | 0.4365 | 0.6450 |
| CostSensitive (IBk) | Fast | 56.5333 | 0.6910 | 0.4386 | 0.5230 | 59.5833 | 0.6510 | 0.4972 | 0.5470 |
| CostSensitive (RT) | Fast | 59.5833 | 0.6800 | 0.4474 | 0.5920 | 63.3833 | 0.7050 | 0.4728 | 0.6350 |
| CostSensitive (SL) | Moderate | 73.6500 | 0.8850 | 0.3065 | 0.7320 | 74.7833 | 0.8780 | 0.3478 | 0.7450 |
My full results are contained in a separate text file in lab-journal format. The salient findings are discussed below.
My methodological strategy began with a wide selection of algorithms. In particular, I was concerned with the trade-off between speed and accuracy. If an algorithm yields >80% accuracy (NB: none of those I queried did so) but takes days to compute (e.g., MultilayerPerceptron), it may be unsuitable for rapid analysis of the constantly changing media landscape in which we work.
An algorithm that takes an hour and returns ~70% accuracy may be more desirable (as was the case with SimpleLogistic). Furthermore, depending on the use case (such as an interactive user interface or a mobile app), one might prefer something with lower accuracy but near-instantaneous results (NaiveBayes being an excellent candidate).
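For concreteness, the comparison above can be reproduced with a small Weka (Java) harness along the following lines. This is a minimal sketch: the file name tweets_vanilla.arff, the random seed, and the assumption that the class label is the last attribute are placeholders for illustration, not the exact setup recorded in my lab journal.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SimpleLogistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.rules.OneR;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.RandomTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SpeedAccuracyHarness {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file; assumes the class attribute is the last one.
        Instances data = new DataSource("tweets_vanilla.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] candidates = {
            new ZeroR(), new OneR(), new NaiveBayes(),
            new IBk(), new RandomTree(), new SimpleLogistic()
        };

        for (Classifier cls : candidates) {
            long start = System.currentTimeMillis();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1)); // 10-fold cross-validation
            long elapsed = System.currentTimeMillis() - start;

            // Same metrics as the summary tables: CCI %, weighted ROC AUC, RMSE, weighted F-1.
            System.out.printf("%-20s %8d ms  CCI=%.4f%%  AUC=%.4f  RMSE=%.4f  F1=%.4f%n",
                cls.getClass().getSimpleName(), elapsed,
                eval.pctCorrect(), eval.weightedAreaUnderROC(),
                eval.rootMeanSquaredError(), eval.weightedFMeasure());
        }
    }
}
```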
Another concern is whether certain modalities demonstrate a rather cynical scenario: that people willing to publicly defend a racist cannot easily be discerned from noise and chatter. I shall return to this concern shortly.
All in all, NaiveBayes performed well in terms of worst-case computation time, taking very little time to produce results. On what I refer to as the “vanilla” dataset, it achieved a weighted-average ROC AUC just over 0.8 and approximately 64% correctly classified instances. ZeroR, on the other hand, performed horribly, essentially classifying everything into a single class (NB: this is still better than guessing at random).
A dark-horse contender, however, showed up late in the game: SimpleLogistic. It took under an hour to build a model and run a 10-fold cross-validation while returning better accuracy (CCI %, ROC AUC, RMSE) than the other algorithms I queried; in practice each run took about 30-40 minutes depending on such modalities as merged data, penalties, and so on. I find this speed-accuracy trade-off to be reasonable. Furthermore, unlike almost all other algorithms investigated, applying CostSensitiveClassifier to SimpleLogistic on the merged dataset yielded slightly improved accuracy (in terms of CCI % and ROC AUC); however, the RMSE remained higher than in the vanilla configuration.
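A cost-sensitive run of that kind could be wired up roughly as follows. This is a sketch only: the uniform off-diagonal cost matrix, the three-class assumption, and the tweets_merged.arff file name are illustrative assumptions, not necessarily the penalty scheme actually used.

```java
import java.util.Random;

import weka.classifiers.CostMatrix;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SimpleLogistic;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSensitiveSimpleLogistic {
    public static void main(String[] args) throws Exception {
        // Hypothetical merged dataset; assumes a 3-class nominal label in the last position.
        Instances data = new DataSource("tweets_merged.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Illustrative cost matrix: every misclassification costs 1, correct predictions cost 0.
        CostMatrix costs = CostMatrix.parseMatlab("[0 1 1; 1 0 1; 1 1 0]");

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new SimpleLogistic());
        csc.setCostMatrix(costs);
        // Predict the class with minimum expected cost rather than reweighting training instances.
        csc.setMinimizeExpectedCost(true);

        Evaluation eval = new Evaluation(data, costs);
        eval.crossValidateModel(csc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("=== CostSensitive(SimpleLogistic), merged ===", false));
    }
}
```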
Lastly, aside from the observations made in the previous paragraph, meta-classification strategies such as voting and stacking, as well as the introduction of penalties, did not yield results very different from those of the algorithms they modulated, but they were nonetheless interesting to observe.
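For reference, the voting and stacking configurations in the second table could be assembled along these lines and dropped into the same evaluation harness shown earlier. The meta-learner for the stacks is not specified above, so the SimpleLogistic meta-classifier here is an assumption on my part (Weka's Stacking defaults to ZeroR).

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SimpleLogistic;
import weka.classifiers.meta.Stacking;
import weka.classifiers.meta.Vote;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.RandomTree;

public class EnsembleSetups {
    /** Stack (NB, RT): base learners combined by a meta-learner trained on their outputs. */
    static Classifier stackNbRt() {
        Stacking stack = new Stacking();
        stack.setClassifiers(new Classifier[] { new NaiveBayes(), new RandomTree() });
        stack.setMetaClassifier(new SimpleLogistic()); // assumed meta-learner; Weka's default is ZeroR
        return stack;
    }

    /** Vote (ZR, NB, RT): combines the base learners' class probability estimates (average of probabilities by default). */
    static Classifier voteZrNbRt() {
        Vote vote = new Vote();
        vote.setClassifiers(new Classifier[] { new ZeroR(), new NaiveBayes(), new RandomTree() });
        return vote;
    }
}
```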
Upon further scrutiny, I felt tempted to give merit to the rather bold claim that pro-Roseanne rhetoric is hard to discern from noise (suggested by running the RandomTree algorithm with penalties). A counter-example appeared when running vanilla IBk with penalties: anti-Roseanne instances came back misclassified as Unclear/Unrelated rather often.
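Claims about specific classes rest on the per-class confusion matrix rather than on the aggregate scores, so it is worth printing that breakdown explicitly. A minimal helper, assuming an Evaluation object like the ones in the earlier sketches:

```java
import weka.classifiers.Evaluation;

public class ConfusionInspection {
    /** Print the per-class breakdown that class-specific claims rest on. */
    static void printBreakdown(Evaluation eval) throws Exception {
        // Rows are actual classes, columns are predicted classes.
        System.out.println(eval.toMatrixString("=== Confusion matrix ==="));
        System.out.println(eval.toClassDetailsString("=== Per-class precision/recall/F-1 ==="));
    }
}
```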
This underscores a subtle but critical notion: just as one courts a moral hazard by cherry-picking data, one may run into a similar, far graver problem by bending models until they support a claim and hiding behind algorithms as a sort of unassailable black box.