A Distributed Fair Random Forest
Thesis posted on 02.05.2020 by James Fantin
Machine learning algorithms are increasingly responsible for making critical decisions that have broad societal impact, which raises questions about the fairness of these algorithms. Although fair models have been proposed, many require direct access to private demographic data, and such access may be impossible under new privacy regulations. We propose a distributed fair random forest algorithm that does not require direct access to private demographic data. Our approach generates random decision trees, adds a tree to the forest only if it is fair, and combines the trees' predictions with a weighted voting mechanism to improve accuracy. Building on existing literature, we assume that a third party holds the private demographic data and can communicate with a data center that builds the model without compromising the privacy of individuals' demographic data. We compare our algorithm against existing fair random forest and decision tree algorithms and show that our method can outperform them.
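The following is a minimal sketch of the general idea (random trees, a fairness filter run by a third party, weighted voting). It is an illustration under stated assumptions, not the thesis's algorithm. Everything specific here is assumed and hypothetical: trees are "completely random" (feature and threshold drawn at random), fairness is measured as the demographic parity difference and compared against a hypothetical threshold `epsilon`, each accepted tree's vote is weighted by its training accuracy, and the names `DemographicHolder`, `build_fair_forest` and `forest_predict` are invented for this example. The data center never sees the protected attribute; it sends a candidate tree's predictions to the third party, which returns only one aggregate number.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


@dataclass
class Node:
    """A split node (feature/threshold with two children) or a leaf (label only)."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0

    def predict(self, x: Row) -> int:
        node = self
        while node.left is not None:  # leaves have no children
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node.label


def majority(labels: List[int], default: int) -> int:
    """Majority 0/1 label (ties go to 1); falls back to `default` for empty leaves."""
    if not labels:
        return default
    return int(2 * sum(labels) >= len(labels))


def random_tree(X: List[Row], y: List[int], depth: int, rng: random.Random, default: int = 0) -> Node:
    """Completely random tree (assumption): feature and threshold are drawn at random, not optimized."""
    label = majority(y, default)
    if depth == 0 or len(set(y)) <= 1:
        return Node(label=label)
    f = rng.randrange(len(X[0]))
    t = rng.choice([row[f] for row in X])
    lo = [i for i, row in enumerate(X) if row[f] <= t]
    hi = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        feature=f, threshold=t,
        left=random_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, rng, label),
        right=random_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, rng, label),
    )


class DemographicHolder:
    """Hypothetical third party: the only party that sees the protected attribute."""

    def __init__(self, protected: List[int]):
        self._protected = list(protected)  # 0/1 group membership per training row

    def parity_gap(self, predictions: List[int]) -> float:
        """Demographic parity difference |P(yhat=1 | g=1) - P(yhat=1 | g=0)|.
        Only this aggregate number is returned to the data center."""
        rates = []
        for g in (0, 1):
            preds = [p for p, a in zip(predictions, self._protected) if a == g]
            rates.append(sum(preds) / len(preds) if preds else 0.0)
        return abs(rates[1] - rates[0])


def build_fair_forest(X, y, holder, n_candidates=200, max_trees=25,
                      depth=3, epsilon=0.1, seed=0) -> List[Tuple[Node, float]]:
    """Data center side: keep a random tree only if the third party reports it as fair."""
    rng = random.Random(seed)
    forest: List[Tuple[Node, float]] = []
    for _ in range(n_candidates):
        if len(forest) >= max_trees:
            break
        tree = random_tree(X, y, depth, rng)
        preds = [tree.predict(x) for x in X]
        if holder.parity_gap(preds) <= epsilon:  # fairness check by the third party
            accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
            forest.append((tree, accuracy))  # vote weight = training accuracy (assumption)
    return forest


def forest_predict(forest: List[Tuple[Node, float]], x: Row) -> int:
    """Accuracy-weighted vote: each tree adds +w for class 1 and -w for class 0."""
    score = sum(w if tree.predict(x) == 1 else -w for tree, w in forest)
    return int(score > 0)


# Toy data: feature 1 is shifted by group membership, feature 0 is not.
data_rng = random.Random(1)
X, y, group = [], [], []
for _ in range(400):
    g = data_rng.randint(0, 1)
    x0, x1 = data_rng.random(), data_rng.random() + 0.3 * g
    X.append([x0, x1])
    y.append(int(x0 > 0.5))
    group.append(g)

holder = DemographicHolder(group)
forest = build_fair_forest(X, y, holder)
print(len(forest), "fair trees kept")
print("prediction:", forest_predict(forest, [0.9, 0.5]))
```

Because candidate trees are generated without looking at the protected attribute, the fairness filter can run entirely on the third party's side. Sharing per-row predictions with the third party is itself a simplifying assumption of this sketch; a real protocol might restrict what is exchanged further.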