Enhancing transparency and trust in predictive analytics for education

Hadis Anahideh, assistant professor and director of the OPLEX Lab at UIC

Hadis Anahideh, assistant professor and director of the OPLEX Lab at UIC, is leading a Department of Education-funded project, with collaborator Denisa Gandara at UT Austin, that uses artificial intelligence and machine learning models to predict college students’ outcomes fairly.

The project aims to provide university officials and decision-makers with unbiased tools and information to support decisions about admissions and about interventions that help students succeed in their programs.

The findings of this research, titled “Fair Multivariate Adaptive Regression Splines for Ensuring Equity and Transparency,” were presented at the prestigious AAAI 2024 conference and published in its proceedings.

Over the years, universities have collected data about previous students with diverse characteristics and outcomes. The outcomes may include whether a student graduated from the program or the student’s total GPA, either of which can serve as a measure of success. Other characteristics may include high school math scores or SAT scores.

“Many variables and a wealth of information have been collected over the years by universities, but decision-makers cannot review each student’s history one by one to analyze what has happened in previous years,” Anahideh said.

Using artificial intelligence and machine learning models to read such data, extract essential insights, and create a model that decision-makers can apply to incoming students is the most efficient way to alleviate this workload.

“The models will not make decisions outright, but they will help decision-makers to make more informed choices based on what happened in the past,” she said.

Anahideh and her team identified several challenges in existing models that needed to be addressed. A major problem is the bias present in historical data. The data show disparities in success rates for students from certain demographic groups, as well as lower representation of those groups overall. These disparities may stem from factors such as resource availability, family backgrounds, parental education levels, or family income.

“The historical data has historical issues,” she said. “For many reasons, the presence and success of certain demographic groups in the education system was pretty low compared to the majority.”

She added that, for various reasons, some demographic groups have historically had lower representation and success rates within the education system compared to others. Factors such as gender and other demographic characteristics play significant roles in these disparities.

When existing machine learning models or predictive analytics techniques are trained directly on data collected in the past, they inherently absorb any biases present within that data. The models cannot recognize these biases as a problem and therefore produce biased predictions.

Recent studies have begun to tackle this problem, and researchers are actively working to mitigate bias in machine learning.

“What we did was akin to filling a significant gap in the literature,” she said. “There was a lack of attention to numerical responses, such as total GPA, which has values. It is not only if they would graduate or not. It is what would be the specific value of GPA?”

For those settings, very few models could address bias, and those that did had limited capacity in terms of model complexity and computational efficiency. They could not extract information from complex settings like education in a timely manner.

“What we proposed was a model designed to handle these numerical metrics of success. It’s complex enough that it can extract the intricate information, and it can provide fair decision rules for the decision-makers using the model,” she said.

The decision rules are based on the characteristics and background of the students, assigning specific cutoff points for key variables. For example, using the model, a student with an SAT score above a certain value and a family income below a certain threshold would have a certain predicted GPA. This provides a sound rationale for the model’s prediction, making it accessible to practitioners.
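As a rough illustration of how such rule-based predictions work, the sketch below shows hinge-function terms of the kind used in multivariate adaptive regression splines (MARS), the model family named in the paper’s title. The variable names, cutoff values, and coefficients are hypothetical, chosen only to show how explicit cutoffs combine into a predicted GPA; they are not taken from the study.

```python
import numpy as np

# Minimal sketch of a MARS-style additive model built from hinge functions.
# All knots (cutoffs) and coefficients below are hypothetical illustrations.

def hinge(x, knot, direction):
    """max(0, x - knot) when direction is +1, max(0, knot - x) when -1."""
    return np.maximum(0.0, direction * (x - knot))

def predict_gpa(sat, family_income):
    pred = 2.8                                          # baseline GPA (hypothetical intercept)
    pred += 0.004 * hinge(sat, 1200, +1)                # SAT above a 1200 cutoff raises the prediction
    pred -= 0.002 * hinge(family_income / 1000, 40, -1) # income below roughly $40k lowers it
    return np.clip(pred, 0.0, 4.0)                      # keep the prediction on a 0-4 GPA scale

print(predict_gpa(sat=1350, family_income=35_000))      # -> about 3.39
```

Because each term is an explicit cutoff on a single variable, a practitioner can read off exactly why a prediction moved up or down, which is what makes rules of this form accessible.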

“We also incorporated fairness metrics, so the outcome is not biased. The model we developed is more fair and more accurate compared to other fair regression models that can handle numerical responses. Our model outperformed those, and that was a big success,” Anahideh said.
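One common way to quantify fairness for numerical predictions is to compare average predicted outcomes and average errors across demographic groups. The snippet below sketches such a group-gap check as a generic illustration; it is not the specific fairness criterion used in the paper.

```python
import numpy as np

def group_gaps(y_true, y_pred, group):
    """Compare a regression model's behavior across demographic groups.

    Returns the largest gap in mean predicted outcome and the largest gap
    in mean absolute error between any two groups; smaller gaps suggest
    the model treats the groups more similarly.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    mean_pred, mae = {}, {}
    for g in np.unique(group):
        mask = group == g
        mean_pred[g] = y_pred[mask].mean()
        mae[g] = np.abs(y_true[mask] - y_pred[mask]).mean()
    pred_gap = max(mean_pred.values()) - min(mean_pred.values())
    error_gap = max(mae.values()) - min(mae.values())
    return pred_gap, error_gap
```

A fair regression method can use gaps like these either as constraints during training or as evaluation metrics when comparing models.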

The model also contributes to “trustworthiness research,” which seeks to build trust between researchers and practitioners, particularly because complex models may be too complicated for practitioners to understand.

“If we want them to trust us, we need to be transparent. And our model is very transparent by providing the decision rules,” she said. “We created an online tool that is publicly available, and everybody can use it.”