A Distributed Fair Random Forest
James Fantin∗
Advised by Dr. Chao Lan
May 2, 2020
Abstract
Machine learning algorithms are increasingly responsible for making critical decisions which
have broad societal impact. Questions are arising about the fairness of the algorithms which
make these decisions. While existing models have been proposed, many require direct access to
private data which may be impossible given new privacy regulations. We propose a distributed
fair random forest algorithm which does not require direct access to private demographic data.
Our approach uses randomly generated decision trees, which are added to our forest if they
are fair, combined with a weighted voting mechanism for accuracy. Building on existing literature, we
assume a third party holds the private demographic data and can communicate with a data center
that builds a model without compromising the privacy of individuals’ demographic data. We
compare our algorithm against existing fair random forest and decision tree algorithms and show
that our method can outperform existing methods.
1 Introduction
Issues with the fairness of machine learning models are becoming widely apparent. In 2018, Amazon
shut down an internal AI model used to give recommendations for hiring decisions because it
discriminated against female applicants [1]. Recently Apple was sued for discrimination because a
machine learning model gave female applicants for the Apple credit card a lower credit limit than
male applicants [11]. In response to fairness concerns, many fair algorithms have been produced.
Unfortunately, existing methods often require direct access to private demographic information.
This poses a challenge as governments are beginning to regulate access to private information. The
European Union’s General Data Protection Regulation (GDPR) and California’s Consumer Privacy Act (CCPA) give strict requirements for holding sensitive demographic data. This
has wide-ranging implications for the privacy of protected features which may even be impossible
to directly access. With direct access to private data becoming impossible, techniques are needed
which allow fair algorithms to be developed without direct access to private data. Recently, there
has been work developing distributed fair frameworks which assume a third party has access to
private features and is responsible for building models which are given to a data center containing
the non-private data [4].
In this paper, we propose a distributed fair random forest technique based on the two-party
framework developed in [4]. We maintain the same assumptions about the existence of a third
∗Department of Computer Science, University of Wyoming, email: jfantin1@uwyo.edu
party responsible for holding private demographic data and a data center holding the rest of the
data. The third party is responsible for auditing the algorithms produced by the data center to
ensure they are fair.
In our paper, we explore extending the technique to a random forest. Random forests have many
benefits which make them among the most widely used algorithms in machine learning. One benefit is
interpretability: feature contribution scores measure how important each feature is for predicting
the label [9]. Fairness research has already seen the benefits of interpretable random
forests with regard to which features are selected in fair models versus unfair models [10]. This allows
one to easily compare the importance of features across models and discover which features are
the most important to fair models.
We apply our algorithm to three data sets used in fairness research and compare with existing
models to demonstrate that our algorithm achieves comparable fairness and accuracy, often
outperforming existing models in fairness. This trade-off can be controlled by a hyper-parameter
ρ.
2 Related Work
New discussions about access to protected data are appearing. Encryption-based techniques which
hide demographic information have been proposed [7], but these come at the price of increased
time complexity. It also may be difficult to convince stakeholders and policymakers
that these encryption techniques are secure and would not cause any data privacy leaks if they were
ever compromised [3].
Fair decision tree techniques have also been developed. Most consist of a modification of the
information gain metric when selecting attributes to split leaf nodes. The general idea is to add a
new fairness information gain that measures how closely the selected attribute correlates with the
protected features. Often this criterion is subtracted from or multiplied by the standard information gain
criterion to split on attributes that are accurate but fair [6]. In [6], the entropy score is used
for the fair information gain criterion and measures the entropy with respect to the protected feature.
Another decision tree technique called FAHT uses a Hoeffding tree with a similarly modified
information gain which measures the fairness gain of a specific attribute [12]. Here the fairness
information gain is combined using multiplication as opposed to a subtraction approach to account
for the varying scales between the two metrics. This paper develops the algorithm for an on-line
setting.
The existing fair random forest technique modifies the information gain similarly
to the previous papers, altering the Gini impurity score to measure both the class label and
the protected feature [10]. In this paper, the fair information gain is subtracted from the standard
information gain criteria. Intuitively, the model chooses features at nodes in the decision trees
which are correlated with the label but not with the protected attribute. One item of note is that
if a feature is highly correlated with both the label and the protected attribute, but no better feature
exists to split the node on, then this feature will still be chosen. So on certain data sets, this algorithm
can still lead to biased results. In the paper, the fair decision trees are then combined using the
standard majority voting routine into a fair random forest.
3 Fair Random Forest Framework
We propose a new technique using randomly generated decision trees as opposed to a regularization
approach for fairness. This allows our method to generate trees quickly and adapts easily
into a distributed framework. An added benefit is that we can
generate many fair trees and threshold the fairness of each tree in order to optimize the trade-off
between fairness and accuracy.
In line with existing fairness research, we will use the common statistical parity [8] metric and
the normed disparate impact [8] metric to analyze the fairness of each algorithm.
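As an illustration, both metrics can be computed from binary predictions and a binary protected attribute roughly as follows. This is a sketch using common definitions of the two measures; the exact formulas in [8] may differ slightly, and the function and variable names are ours.

```python
import numpy as np

def statistical_parity(y_pred, protected):
    """Absolute difference in positive-prediction rates between the two groups."""
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    rate0 = y_pred[protected == 0].mean()
    rate1 = y_pred[protected == 1].mean()
    return abs(rate0 - rate1)

def normed_disparate(y_pred, protected):
    """Ratio of the smaller positive-prediction rate to the larger (1.0 = fair)."""
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    rate0 = y_pred[protected == 0].mean()
    rate1 = y_pred[protected == 1].mean()
    if max(rate0, rate1) == 0:
        return float("nan")  # undefined when neither group gets positive predictions
    return min(rate0, rate1) / max(rate0, rate1)
```

A statistical parity of 0 means both groups receive positive predictions at the same rate, which is why a constant predictor is trivially fair under this metric.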
We use the extra random decision tree algorithm developed in [2] for the base trees in our
random forest technique. We add the constraint that only one feature is randomly selected at each
split. This added constraint amounts to building a completely random decision tree. For improved
performance, we use the implementation of this algorithm in the popular scikit-learn
Python package.
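For instance, a completely random tree of this kind can be configured in scikit-learn roughly as follows. This is a sketch; the hyper-parameter values shown are illustrative, not the paper's tuned values.

```python
import numpy as np
from sklearn.tree import ExtraTreeClassifier

# One candidate feature per split (max_features=1), combined with random split
# thresholds (splitter="random", the ExtraTreeClassifier default), yields a
# completely random decision tree in the sense of [2].
tree = ExtraTreeClassifier(max_features=1, splitter="random",
                           min_samples_leaf=1, random_state=0)

# Toy data: the tree fits and predicts like any other sklearn estimator.
X = np.array([[0.1], [0.9], [0.2], [0.8]])
y = np.array([0, 1, 0, 1])
tree.fit(X, y)
print(tree.predict(X))
```

Because feature and threshold choices are random, building such a tree requires no search over split candidates, which is what makes generating many candidate trees cheap.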
An added benefit of using random decision trees is that with a few added constraints our random
decision trees can be made differentially private. Jagannathan et al. use a slightly modified version of
the extra random decision tree algorithm and show that, by removing a pruning step in the tree
generation process and adding noise to each leaf node, the algorithm is ε-differentially private
[5]. Their work considers the random forest framework with random decision trees in a similar
fashion to our technique. Thus, by following their steps, our algorithm could be made differentially
private, though future research is needed to prove that differential privacy is maintained when only
the fairest random decision trees are selected.
As opposed to existing fair random forest algorithms, our algorithm maintains individual privacy
using a third party which holds private demographic information. A data center is then responsible
for generating the decision trees and storing the non-private data. The data center builds a random
decision tree on a training set D and sends the tree to the third party. The third party audits
the tree on D and returns it if its statistical parity is below ρ, the fairness constraint set by the
data center. A smaller ρ builds a fairer model than a larger ρ. The hyper-parameter ρ
can be used to balance the trade-off between increased accuracy and increased fairness.
The linear expression to learn fair models in [4] cannot easily be applied to decision trees.
Instead, we propose a weighted voting technique to create a linear combination of decision trees.
The standard random forest algorithm uses a majority voting technique to predict the label for
each instance. Majority voting is not optimal with our fair model, as our selection step often admits trivial
decision trees. We define a trivial decision tree as a decision tree DT which, for every instance x,
predicts x has label 0 (or label 1 for all instances). Trivial decision trees
are always fair and have a statistical parity value of 0, so they are often chosen by our
technique when ρ is small enough.
While a random forest built with many trivial decision trees would be fair, it would not lead to
any interesting predictions, which we desire for real world use cases. To solve this issue, we propose
a weighted voting mechanism to learn an accurate forest. We collect the predictions
of all decision trees and then use a ridge regression model to learn a weight for each decision
tree. This has the added benefit that most trivial decision trees will receive a weight of 0, since their
predictions do not help predict the true label. Thus, the ridge model eliminates trivial decision trees
and weights more accurate trees higher, leading to more useful predictions,
which we demonstrate is true in practice.
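A small illustration of this effect, assuming scikit-learn's Ridge (the variable names and toy values are ours):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Rows = training instances, columns = the predictions of each candidate tree.
# Column 0 is a trivial tree (always predicts 0); column 1 tracks the label.
preds = np.array([[0, 1],
                  [0, 0],
                  [0, 1],
                  [0, 0]])
labels = np.array([1, 0, 1, 0])

ridge = Ridge(alpha=1.0).fit(preds, labels)
# A constant column carries no information once the data are centered for the
# intercept term, so the trivial tree's weight comes out as zero.
print(ridge.coef_)
```

At prediction time, the forest's output is then the ridge model applied to the vector of per-tree votes, so trees with zero weight contribute nothing.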
3.1 Algorithm Design
In this section we present the pseudocode for our algorithm. ExtraDecisionTree is a function
which produces a random decision tree [2]. The resulting object has two functions: fit, which builds
the tree, and predict, which predicts labels for given data. StatisticalParity is
a function which calculates the statistical parity. Ridge is a function which produces a standard
ridge regression model with the two standard functions fit and predict. For our fair random forest R,
we assume it contains a list R.dTrees, which we use to store decision trees, and R.ridgeModel,
where we store the ridge regression model.
Algorithm 1: Training A Fair Random Forest
Input:
D, training set
L, true label
M , number of fair models
F , fair attribute
α, ridge regression alpha value
ρ, fairness constraint
Output: A fair random forest R
i = 0;
predAr = [];
while i < M do
decisionTree = ExtraDecisionTree(D);
decisionTree.fit(D);
prediction = decisionTree.predict(D);
sp = StatisticalParity(D, F , prediction);
if sp < ρ then
i++;
R.addTree(decisionTree);
predAr.append(prediction);
end
end
ridgeModel = new Ridge(α);
ridgeModel.fit(predAr, L);
R.ridgeModel = ridgeModel;
In the distributed model, a third party with the private demographic data would be responsible
for measuring the statistical parity of each model. This would allow the third party to control all
access to the private demographic data. The third party would then only return the tree if its statistical
parity was below ρ.
Algorithm 2: Prediction of Fair Random Forest
Input:
R, A Fair Random Forest
D, data set
Output: A prediction pred for D
predAr = [];
for dTree in R.dTrees do
predAr.append(dTree.predict(D));
end
pred = R.ridgeModel.predict(predAr);
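Under the assumption that the third party's audit is inlined (in deployment, the statistical-parity check would run on the third party's side), Algorithms 1 and 2 can be sketched in Python as follows. The class, variable names, the attempt limit, and the 0.5 decision threshold are our choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import ExtraTreeClassifier


def statistical_parity(pred, protected):
    """|P(pred=1 | protected=0) - P(pred=1 | protected=1)|."""
    return abs(pred[protected == 0].mean() - pred[protected == 1].mean())


class DFRF:
    def __init__(self, n_trees=100, rho=0.05, alpha=1.0,
                 max_depth=None, max_attempts=10000, seed=0):
        self.n_trees, self.rho, self.alpha = n_trees, rho, alpha
        self.max_depth, self.max_attempts = max_depth, max_attempts
        self.rng = np.random.RandomState(seed)

    def fit(self, X, y, protected):
        self.trees, cols = [], []
        attempts = 0
        while len(self.trees) < self.n_trees:
            attempts += 1
            if attempts > self.max_attempts:  # guard: rho may be unsatisfiable
                raise RuntimeError("could not find enough fair trees")
            tree = ExtraTreeClassifier(max_features=1, splitter="random",
                                       max_depth=self.max_depth,
                                       random_state=self.rng.randint(1 << 30))
            tree.fit(X, y)
            pred = tree.predict(X)
            # Third-party audit: keep the tree only if it is fair enough.
            if statistical_parity(pred, protected) < self.rho:
                self.trees.append(tree)
                cols.append(pred)
        # Weighted voting: ridge regression learns one weight per tree.
        self.ridge = Ridge(alpha=self.alpha).fit(np.column_stack(cols), y)
        return self

    def predict(self, X):
        votes = np.column_stack([t.predict(X) for t in self.trees])
        # Thresholding the regression output at 0.5 is our choice for binary labels.
        return (self.ridge.predict(votes) >= 0.5).astype(int)
```

The attempt limit is a practical safeguard the pseudocode leaves implicit: if ρ is set so low that almost no random tree passes the audit, the training loop would otherwise never terminate.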
4 Experiment
In this section, we evaluate our proposed algorithm on three real-world data sets and compare
against two other tree-based methods. All code for our algorithm is written in Python and available
at our website1, along with the data used in each of the model comparisons. Because the FAHT
code was not made available by the authors, we reproduce it in Java; this code is also available on our website2.
4.1 Data
To allow comparison against existing fair distributed research [4], we compare against the Com-
munity Crime3, COMPAS4, and Credit Card5 data sets. The Community data set contains 1993
instances with 101 features. The label is the crime rate of the community. The protected feature
is the fraction of African-American residents, which we threshold to a binary value: 0
if the fraction is less than 0.5 and 1 otherwise. The COMPAS data set contains 18317 instances
with 40 features. We removed records with incomplete data, reducing the data set to 1600 instances
and 15 features. The binary label is risk of recidivism, with race as the protected feature.
The Credit Card data set contains 30000 instances with 23 features. The label is whether a person defaults
on their credit payment, and education degree is the sensitive attribute.
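For example, the binarization of the Community Crime protected feature amounts to a simple threshold (the fraction values below are illustrative):

```python
import numpy as np

# Fraction of African-American residents per community (illustrative values),
# thresholded at 0.5: below 0.5 maps to 0, otherwise to 1.
fraction = np.array([0.10, 0.50, 0.73, 0.49])
protected = (fraction >= 0.5).astype(int)
print(protected)  # [0 1 1 0]
```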
4.2 Design
For each algorithm, we randomly chose 75% of the instances for training data and 25% for testing
data. The models are evaluated on 50 trials where we randomly shuffle the data sets in each trial.
Since the FAHT algorithm uses an on-line setting, there is no training/testing split, but random
shuffles are used to account for concept drift which may occur over time in the data.
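The evaluation protocol above can be sketched as follows, assuming scikit-learn utilities; the toy data and the single random tree stand in for the real data sets and models:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = (X[:, 0] > 0.5).astype(int)  # toy data standing in for a real data set

errors = []
for trial in range(50):
    # Fresh 75% / 25% shuffle-and-split on each trial.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, shuffle=True, random_state=trial)
    tree = ExtraTreeClassifier(max_features=1, splitter="random",
                               random_state=trial).fit(X_tr, y_tr)
    errors.append(np.mean(tree.predict(X_te) != y_te))

# Report mean and standard deviation over the 50 trials, as in Tables 1-3.
print(f"{np.mean(errors):.4f} +/- {np.std(errors):.4f}")
```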
We optimize each algorithm using grid search for hyper-parameters or values chosen in the
corresponding paper. FAHT has no hyper-parameters in the model so no tuning is needed. For our
1https://github.com/pjlake98/Distributed-Fair-Random-Forest
2https://github.com/pjlake98/Fair-Forest
3https://archive.ics.uci.edu/ml/datasets/communities+and+crime
4https://www.kaggle.com/danofer/compass
5https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Table 1: Performance on the Community Crime Data Set
Method Statistical Parity Normed Disparate Classification Error
FAHT 0.0943 ± 0.4405 0.7590 ± 0.4533 0.1482 ± 0.0117
FRF 0.0991 ± 0.3755 0.7754 ± 0.6879 0.1263 ± 0.0741
FDT 0.1730 ± 0.1914 0.7177 ± 0.3171 0.1395 ± 0.0392
DFRF 0.0153 ± 0.0430 0.7252 ± 0.6741 0.1413 ± 0.0770
Table 2: Performance on the COMPAS Data Set
Method Statistical Parity Normed Disparate Classification Error
FAHT 0.0008 ± 0.0037 0.2100 ± 0.3535 0.2296 ± 0.0031
FRF 0.0000 ± 0.0000 NA 0.2281 ± 0.0176
FDT 0.0019 ± 0.0019 0.3581 ± 0.6419 0.2302 ± 0.0164
DFRF 0.0037 ± 0.0023 0.6612 ± 0.6307 0.2296 ± 0.0104
Table 3: Performance on the Credit Card Data Set
Method Statistical Parity Normed Disparate Classification Error
FAHT 0.0020 ± 0.0050 0.2561 ± 0.2435 0.2309 ± 0.0013
FRF 0.0626 ± 0.0436 0.3632 ± 0.1450 0.1882 ± 0.0124
FDT 0.0461 ± 0.0289 0.2757 ± 0.1508 0.1924 ± 0.0138
DFRF 0.0123 ± 0.0540 0.1285 ± 0.2904 0.2055 ± 0.0183
DFRF algorithm, we select 100 as the number of base models and 1 for the minimum number of
instances in each leaf. In the community crime data set, we chose 50 for the alpha value in ridge
regression with a max depth of 20. For the COMPAS data set, we chose 5 for alpha and 4 for the
max depth. For the credit card data set, we chose 225 for alpha and 12 for the max depth.
For the FRF (fair random forest) algorithm we choose 1 for the minimum number of instances
in each leaf in each trial. We select 4 as the max depth on the community crime set, 6 for the
max depth on the COMPAS set, and 6 for the credit card set. For the FDT (fair decision tree) we
choose 1 for the minimum number of instances in each leaf in each trial. We select 4 for the max
depth on the community crime set, 6 for the max depth on the COMPAS set, and 6 for the credit card
set.
4.3 Results
Our results are summarized in Tables 1, 2 and 3. On the Community Crime data set, our DFRF
algorithm achieves by far the lowest statistical parity of the compared algorithms, with a slight
cost to model accuracy. On the other two data sets it still maintains small statistical parity, in
line with existing models. This suggests our DFRF algorithm can remove bias
while maintaining low classification error.
4.4 Sensitivity Analysis
One benefit of our technique is that our fairness constraint ρ is a hyper-parameter. Thus, we can
alter ρ to optimize the trade-off between fairness and accuracy. We demonstrate this behavior on
the Community Crime data set in Figure 1. The performance is averaged over 50 random trials
with the same hyper-parameters as were used in Table 1. By choosing a lower ρ, we obtain
a fairer model at the cost of higher classification error.
Figure 1: Classification Performance versus ρ
5 Conclusion
We propose a distributed fair random forest algorithm which protects the privacy of private de-
mographic data. We assume a third party maintains access to the private demographic data and
communicates with a data center to audit trees for fairness without compromising individuals’
demographic data. We compare our algorithm against existing fair tree algorithms and demonstrate that it
maintains similar accuracy and fairness, while sometimes achieving the best fairness, on three real-world
data sets.
References
[1] Dastin, J. Amazon scraps secret ai recruiting tool that showed bias against women. Reuters
(2018).
[2] Geurts, P., Ernst, D., and Wehenkel, L. Extremely randomized trees. Machine learning
63, 1 (2006), 3–42.
[3] Holstein, K., Wortman Vaughan, J., Daumé III, H., Dudik, M., and Wallach, H.
Improving fairness in machine learning systems: What do industry practitioners need? In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019),
pp. 1–16.
[4] Hu, H., Liu, Y., Wang, Z., and Lan, C. A distributed fair machine learning framework with
private demographic data protection. 2019 IEEE International Conference on Data Mining
(2019).
[5] Jagannathan, G., Pillaipakkamnatt, K., and Wright, R. N. A practical differentially
private random decision tree classifier. In 2009 IEEE International Conference on Data Mining
Workshops (2009), IEEE, pp. 114–121.
[6] Kamiran, F., Calders, T., and Pechenizkiy, M. Discrimination aware decision tree
learning. In 2010 IEEE International Conference on Data Mining (2010), IEEE, pp. 869–874.
[7] Kilbertus, N., Gascón, A., Kusner, M. J., Veale, M., Gummadi, K. P., and Weller,
A. Blind justice: Fairness with encrypted sensitive attributes. Proceedings of the 35th International Conference on Machine Learning (2018).
[8] McNamara, D., Ong, C. S., and Williamson, R. C. Provably fair representations. CoRR
abs/1710.04394 (2017).
[9] Palczewska, A., Palczewski, J., Robinson, R. M., and Neagu, D. Interpreting random
forest classification models using a feature contribution method. In Integration of reusable
systems. Springer, 2014, pp. 193–218.
[10] Raff, E., Sylvester, J., and Mills, S. Fair forests: Regularized tree induction to minimize
model bias. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
(2018), pp. 243–250.
[11] Vigdor, N. Apple card investigated after gender discrimination complaints. The New York
Times (2019).
[12] Zhang, W., and Ntoutsi, E. Faht: an adaptive fairness-aware decision tree classifier.
Twenty-Eighth International Joint Conference on Artificial Intelligence (2019).