Summary: Machine Learning on the Roseanne-ABC Firing Incident Dataset

Summary of results: which methodology/modality “wins”?

                                Vanilla                                Merged
Algorithm             Speed     CCI %    ROC AUC  RMSE    F-1         CCI %    ROC AUC  RMSE    F-1
ZeroR                 Instant   37.9333  0.4990   0.4144  NULL        47.1000  0.4990   0.4536  NULL
OneR                  Instant   43.0000  0.5420   0.5339  NULL        52.9833  0.5660   0.5599  NULL
NaiveBayes            Fast      63.8500  0.8160   0.3808  0.6410      63.9667  0.8000   0.4374  0.6430
IBK                   Fast      56.5333  0.6910   0.4386  0.5230      59.5833  0.6510   0.4972  0.5470
RandomTree            Fast      59.5833  0.6800   0.4474  0.5920      62.7167  0.6700   0.4954  0.6210
SimpleLogistic        Moderate  73.6500  0.8850   0.3065  0.7320      73.6500  0.8730   0.3502  0.7300
DecisionTable         Slow      too slow for viable computation on consumer-grade hardware
MultilayerPerceptron  Slow      too slow for viable computation on consumer-grade hardware
RandomForest          Slow      too slow for viable computation on consumer-grade hardware
                                Vanilla                                Merged
Meta-Classifier       Speed     CCI %    ROC AUC  RMSE    F-1         CCI %    ROC AUC  RMSE    F-1
Stack (ZR, NB)        Moderate  37.9333  0.4990   0.4144  NULL        vacuous results, omitted
Stack (NB, RT)        Moderate  63.7000  0.8230   0.3795  0.6350      61.9833  0.6980   0.4523  0.6130
Vote (ZR, NB, RT)     Moderate  62.0833  0.8430   0.3414  0.6110      64.0500  0.8330   0.3830  0.6260
CostSensitive (ZR)    Instant   37.9333  0.4990   0.4144  NULL        36.6667  0.4990   0.4623  NULL
CostSensitive (OR)    Instant   42.7000  0.5400   0.5353  NULL        39.6167  0.5170   0.6345  NULL
CostSensitive (NB)    Fast      63.8500  0.8160   0.3808  0.6410      64.0833  0.8010   0.4365  0.6450
CostSensitive (IBK)   Fast      56.5333  0.6910   0.4386  0.5230      59.5833  0.6510   0.4972  0.5470
CostSensitive (RT)    Fast      59.5833  0.6800   0.4474  0.5920      63.3833  0.7050   0.4728  0.6350
CostSensitive (SL)    Moderate  73.6500  0.8850   0.3065  0.7320      74.7833  0.8780   0.3478  0.7450
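
All of the learners above are Weka classifiers, and every figure in the tables comes from a 10-fold cross-validation run. For reference, that evaluation loop can be reproduced with Weka's Java API roughly as follows; the ARFF file name and the choice of NaiveBayes are illustrative, not my exact setup:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CrossValidateOneClassifier {
        public static void main(String[] args) throws Exception {
            // Illustrative file name; the class label is assumed to be the last attribute.
            Instances data = DataSource.read("roseanne_vanilla.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Swap in ZeroR, OneR, IBk, RandomTree, SimpleLogistic, etc. as needed.
            Classifier cls = new NaiveBayes();

            // 10-fold cross-validation, the protocol behind every row in the tables above.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));

            System.out.printf("CCI %%   : %.4f%n", eval.pctCorrect());
            System.out.printf("ROC AUC : %.4f%n", eval.weightedAreaUnderROC());
            System.out.printf("RMSE    : %.4f%n", eval.rootMeanSquaredError());
            System.out.printf("F-1     : %.4f%n", eval.weightedFMeasure());
        }
    }

Here pctCorrect(), weightedAreaUnderROC(), rootMeanSquaredError(), and weightedFMeasure() map onto the CCI %, ROC AUC, RMSE, and F-1 columns, on the assumption that the weighted averages are the figures being reported.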

My full results are contained in a separate text file in lab journal format. The salient results and observations are as follows.

My methodological strategy began with a wide selection of algorithms. In particular, I was concerned about the trade-off between speed and accuracy. If an algorithm yields >80% accuracy (NB: none of those I queried did) but takes days to compute (e.g., MultilayerPerceptron), it may be unsuitable for rapid analysis of the constantly changing media landscape in which we work.

An algorithm that takes an hour and returns ~70% accuracy may be more desirable (as was the case with SimpleLogistic). Furthermore, depending on the use case (such as user interfaces or a mobile app), one might prefer something with lower accuracy but near-instantaneous results (NaiveBayes being an excellent candidate).
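
To keep the speed ratings from being purely impressionistic, the same cross-validation call can be wrapped in a wall-clock timer. A rough sketch, again with an illustrative file name and only a subset of the candidate classifiers:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SimpleLogistic;
    import weka.classifiers.rules.ZeroR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SpeedAccuracyCheck {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            // An illustrative subset of the candidates from the table above.
            Classifier[] candidates = { new ZeroR(), new NaiveBayes(), new SimpleLogistic() };

            for (Classifier cls : candidates) {
                long start = System.nanoTime();
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(cls, data, 10, new Random(1));
                double seconds = (System.nanoTime() - start) / 1e9;

                // Wall-clock time for the full 10-fold run next to the headline accuracy figures.
                System.out.printf("%-16s %8.1f s   CCI %6.2f %%   ROC AUC %.3f%n",
                        cls.getClass().getSimpleName(), seconds,
                        eval.pctCorrect(), eval.weightedAreaUnderROC());
            }
        }
    }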

Another concern is whether certain modalities can demonstrate the rather cynical scenario where people willing to publicly defend a racist cannot be easily discerned from noise and chatter. I shall return to this concern shortly.

All in all, NaiveBayes performed well in terms of worst-case computation time, taking very little time to return results. Using what I refer to as the “vanilla” dataset, it achieved a weighted-average ROC AUC just over 0.8 and approximately 64% correctly classified instances. ZeroR, on the other hand, performed horribly, essentially classifying everything as a single class (NB: this is still better than random guessing).

A dark horse contender, however, showed up late in the game: SimpleLogistic. SimpleLogistic took under an hour to build a model and run a 10-fold cross-validation while returning better accuracy (CCI %, ROC AUC, RMSE) than the other algorithms I queried. Each run took about 30-40 minutes depending on such modalities as merged data, penalties, and so on. I find that speed-accuracy trade-off to be reasonable. Furthermore, unlike almost all the other algorithms investigated, applying CostSensitiveClassifier to merged SimpleLogistic yielded slightly improved accuracy (in terms of CCI % and ROC AUC); however, this came at the cost of a higher RMSE.
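
For reference, “applying CostSensitiveClassifier” here means wrapping the base learner in Weka's CostSensitiveClassifier and handing it a penalty matrix, roughly as sketched below. The merged-data file name and the 3x3 matrix (assuming three classes: Pro, Anti, and Unclear/Unrelated) are placeholders rather than the penalties recorded in my journal.

    import java.util.Random;

    import weka.classifiers.CostMatrix;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SimpleLogistic;
    import weka.classifiers.meta.CostSensitiveClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CostSensitiveSimpleLogistic {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_merged.arff");   // illustrative merged-data file name
            data.setClassIndex(data.numAttributes() - 1);

            // Illustrative 3x3 penalty matrix (rows = actual class, columns = predicted class);
            // the penalties actually used in my lab journal are not reproduced here.
            CostMatrix penalties = CostMatrix.parseMatlab("[0 1 1; 2 0 1; 1 1 0]");

            CostSensitiveClassifier csc = new CostSensitiveClassifier();
            csc.setClassifier(new SimpleLogistic());
            csc.setCostMatrix(penalties);
            csc.setMinimizeExpectedCost(true);   // predict the class with the lowest expected cost

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(csc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }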

Lastly, aside from the SimpleLogistic observations above, using meta-classification strategies such as voting and stacking, or introducing penalties, did not yield results very different from the algorithms they modulated, but they were nonetheless interesting to observe.
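
For completeness, the Vote and Stacking rows in the meta-classifier table are wired together roughly as follows; the file name and the choice of Stacking meta-learner are assumptions (Weka's default meta-learner is ZeroR):

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.Stacking;
    import weka.classifiers.meta.Vote;
    import weka.classifiers.rules.ZeroR;
    import weka.classifiers.trees.RandomTree;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MetaClassifierWiring {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            // Vote (ZR, NB, RT): combines the base classifiers' probability estimates
            // (averaging them by default).
            Vote vote = new Vote();
            vote.setClassifiers(new Classifier[] { new ZeroR(), new NaiveBayes(), new RandomTree() });

            // Stack (NB, RT): base learners whose predictions feed a meta-level learner.
            Stacking stack = new Stacking();
            stack.setClassifiers(new Classifier[] { new NaiveBayes(), new RandomTree() });
            stack.setMetaClassifier(new NaiveBayes());   // illustrative meta-learner choice

            for (Classifier cls : new Classifier[] { vote, stack }) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(cls, data, 10, new Random(1));
                System.out.printf("%-10s CCI %6.2f %%   ROC AUC %.3f%n",
                        cls.getClass().getSimpleName(), eval.pctCorrect(), eval.weightedAreaUnderROC());
            }
        }
    }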

Upon further scrutiny, I felt tempted to give merit to the rather bold claim that Pro-Roseanne rhetoric is hard to discern from noise (a claim suggested by running the RandomTree algorithm with penalties). A counter-example was discovered when running vanilla IBK with penalties, where Anti-Roseanne instances came back misclassified as Unclear/Unrelated rather often.
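
Both of those readings come from the per-class breakdown of the confusion matrices rather than the headline figures. A minimal sketch of how to print them, here for IBk without the penalty matrix (which I have not reproduced), is:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ConfusionMatrixCheck {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("roseanne_vanilla.arff");  // illustrative file name
            data.setClassIndex(data.numAttributes() - 1);

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new IBk(), data, 10, new Random(1));

            // Rows are actual classes, columns are predicted classes; heavy off-diagonal
            // counts in the Unclear/Unrelated column are what suggest a class is being
            // lost in the noise.
            System.out.println(eval.toMatrixString("=== Confusion matrix (IBk, 10-fold CV) ==="));
            System.out.println(eval.toClassDetailsString());
        }
    }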

The tension between those two readings underscores a subtle but critical point: just as one courts a moral hazard by cherry-picking data, one may run into a similar, far graver problem by bashing models to support a claim and hiding behind algorithms as a sort of unassailable black box.