I have been working with Anjanette Raymond (JD) at the Ostrom Workshop at Indiana University and Dr. Sriraam Natarajan at The University of Texas at Dallas on a project that uses data mining techniques to learn more about the nature of judicial bias. Following prior work, I used a random forest classifier to predict the outcomes of Supreme Court cases in the Supreme Court Database (SCDB). After confirming the predictive power of this method, I began engineering demographic features, such as the race and gender of the petitioner and the respondent. I validated these features by predicting the race of the petitioner from the other information about a case. Because race can be predicted from the remaining case features, it appears to be associated with how a case is handled.
The next step for this project is to use Bayesian networks and other models to understand the complex relationships behind judicial bias. Simple statistics are enough to show disparities in outcomes across races. Machine learning approaches, however, can help us tease out the causal relationships that lead to these outcomes. For example, the judges might be biased, but other variables (such as access to a skilled attorney) could also produce the results we see. Most likely, it is a combination of factors, which we will be better equipped to mitigate once we understand their influence.
The Supreme Court Database is simply the most easily accessible dataset of court cases. In the future, I would also like to focus on local cases, which may be more relevant to the average person. Traffic violations are of particular interest, because we suspect that many jurisdictions store this data alongside demographic information from driver's licenses. That would let us avoid the arduous process of manually finding and entering feature data.