Today, artificial intelligence (AI) and machine learning (ML) algorithms and models can be found in just about every vertical and industry. They are used by financial institutions to speed up loan approvals; in healthcare to assist with diagnostics; by judicial systems to help predict recidivism rates; by social media networks to choose which content to show each user; in public welfare systems to help select those eligible for assistance; by recruiters and college admissions programs; and in many other settings.
AI and ML were widely welcomed as bringing many benefits, chief among them faster decision-making, but it was also believed that AI-led decisions would be fairer. Unlike a human being, an algorithm is not swayed by a nice smile, doesn’t have a bad day, and doesn’t simply take a dislike to someone. A research paper on the criminal justice system in the US concluded that algorithms could help reduce racial disparities.
However, AI and ML algorithms have still resulted in “unfair” decisions. For example, COMPAS, an algorithm used to predict recidivism in Broward County, Florida, incorrectly labeled African-American defendants as “high-risk” at nearly twice the rate it mislabeled white defendants.
The unfairness built into some AI algorithms has led to a demand for explainable AI, where the decision-making process can be examined. And since the input values for these ML and AI models are most often personal information, the models must also comply with data security and privacy policies, bringing ML under the DPO’s purview.
The potential unfairness of AI/ML models
An algorithm doesn’t understand concepts such as “fair,” “bias,” or “prejudice,” but it can still produce unfair, biased, and prejudiced results. This is highly problematic when AI models make such consequential decisions. As NIST observed, “AI can make decisions that affect whether a person is admitted into a school, authorized for a bank loan or accepted as a rental applicant.”
There are a number of ways that AI can produce unfair results. If the model is trained on a dataset that is not representative, it will learn skewed patterns that lead to unfair predictions. Amazon discovered this when it used AI to screen candidates for technical roles. It fed the algorithm data from its most successful hires over the past 10 years, but almost all of those hires were men. The model concluded that male candidates were preferable and penalized female applicants.
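To make this concrete, here is a minimal sketch (in Python with pandas, and not Amazon’s actual pipeline) of the kind of pre-training check that can surface the problem: look at how a sensitive attribute is distributed in the training data, and how outcomes differ across groups. The column names and values are invented for illustration.

```python
# A minimal, hypothetical pre-training check on a skewed dataset.
import pandas as pd

training_data = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "M", "M", "M", "M", "M", "F"],
    "hired":  [1,   1,   0,   1,   1,   0,   1,   1,   1,   0],
})

# Share of each group in the training set; a heavy skew means the model
# will learn almost entirely from one group's examples.
print(training_data["gender"].value_counts(normalize=True))

# Outcome rate per group; a large gap is an early warning that the label
# itself encodes historical bias.
print(training_data.groupby("gender")["hired"].mean())
```

A check this simple obviously won’t catch every form of bias, but it illustrates how a skewed training set can be flagged before a model ever learns from it.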
Another possibility is that the algorithm picks up on statistical correlations that are accurate but illegal to act on. For example, a mortgage lending model could notice that older people are more likely to default on their loans and reduce loan approvals based on age, which would violate most anti-discrimination laws.
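One simple, hypothetical way to detect that kind of pattern is to correlate the model’s decisions with the protected attribute after the fact, even if that attribute was never an explicit input (it can leak in through proxies). The data below is made up purely to illustrate the idea.

```python
# A minimal sketch: correlate model decisions with a protected attribute.
import pandas as pd

results = pd.DataFrame({
    "age":      [25, 32, 41, 48, 55, 62, 67, 70],
    "approved": [1,  1,  1,  1,  0,  0,  0,  0],   # the model's loan decisions
})

# A strongly negative correlation here would flag the illegal age-based
# pattern described above for human review.
print(results["age"].corr(results["approved"]))
```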
The rise of explainable AI
In theory, AI biases can be corrected much more easily than human ones. With an AI model, you just need to find and fix the root of the bias, such as a training set made up almost entirely of male tech employees.
But that’s easier said than done. AI algorithms can be hard to understand and interpret, and sometimes it’s a struggle to find the root cause. Some algorithms are open and easy to read, but others are black boxes. It can be impossible to know what produced an unfair result, and impossible even to prove that it’s unfair.
Developers and engineers might not realize that they are building a model or using training data that will produce unfair results. That’s why you need an explainable AI model where you can trace the process it uses to reach a conclusion. McKinsey notes that “Explainability techniques could help identify whether the factors considered in a decision reflect bias and could enable more accountability than in human decision making.”
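One widely used explainability technique is permutation importance: shuffle each input feature in turn and measure how much the model’s accuracy drops, which reveals which inputs actually drive its decisions. The sketch below uses scikit-learn on an invented loan dataset; the features, including the suspicious zip_code proxy, are assumptions for illustration rather than a recommended feature set.

```python
# A minimal explainability sketch using permutation importance.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X = pd.DataFrame({
    "income":      [40, 85, 60, 30, 95, 50, 72, 38],
    "zip_code":    [10, 20, 10, 10, 20, 10, 20, 10],  # potential proxy for a protected attribute
    "loan_amount": [5, 20, 12, 4, 25, 9, 15, 6],
})
y = [0, 1, 1, 0, 1, 0, 1, 0]  # past approval decisions the model learns from

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature and measure the drop in accuracy; a large drop means
# the model leans heavily on that feature when making decisions.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(X.columns, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

If a proxy feature like zip_code turns out to dominate the model’s decisions, that is exactly the kind of finding an explainability review should escalate.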
Science itself requires engineers to be able to test their processes and explain how a model arrived at a given result. If that’s not the case, we’ll end up back with humans following their gut instinct, except that gut instinct has been replaced by impenetrable AI.
Yet translating fairness or ethics into machine learning is tough when algorithms don’t understand such concepts. David De Cremer, founder and director of the Center of AI Technology for Humankind at NUS Business School, writes in HBR that “to address this issue, computer scientists and engineers are focusing primarily on how to govern the use of data provided to help the algorithm learn and how to use guiding principles and techniques that can promote interpretable AI: systems that allow us to understand how the results emerged.”
This is the challenge of explainable AI: how to turn your black box AI or ML model into a process where you can explain all the steps that led to the result and prove that it’s not based on a data privacy violation.
New regulations raise the stakes for explainable AI
As well as the ethical need for fairness and transparency, privacy regulations require you to research, understand, and explain your model. In Europe, for example, the GDPR requires companies that use automated decision-making to provide data subjects with “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.”
Existing anti-discrimination laws for hiring, extending credit, and other actions already prohibit using data such as age, gender, race, and sexual orientation. These laws extend to automated, AI-based decision-making, so data scientists have to prove that such data was never part of the training set.
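Part of that proof can be automated as a guard in the training pipeline that fails loudly if any known protected column slips into the feature set. The sketch below assumes the training data is a pandas DataFrame and that the list of protected column names is maintained elsewhere; both are assumptions for illustration.

```python
# A minimal, hypothetical guard against protected attributes in training data.
import pandas as pd

PROTECTED_COLUMNS = {"age", "gender", "race", "sexual_orientation"}

def assert_no_protected_features(training_data: pd.DataFrame) -> None:
    """Fail the pipeline if any protected attribute appears in the training set."""
    leaked = PROTECTED_COLUMNS & set(training_data.columns)
    if leaked:
        raise ValueError(f"Protected attributes found in training data: {leaked}")

# Passes silently because only non-protected features are present.
assert_no_protected_features(pd.DataFrame({"income": [40, 85], "tenure": [2, 7]}))
```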
In the US, the ADPPA bill under discussion in Congress includes a detailed description of what must be done to ensure that ML models don’t infringe on privacy rights. The ADPPA requires companies to:
- Make their models explainable;
- Audit models to reduce the risk that they could produce biased results (see the sketch after this list);
- Prove they evaluated the algorithm sufficiently throughout the design phase;
- Prove they evaluated the training data used to develop the algorithm;
- Receive explicit consent from the data subject if training data includes personal data.
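As a concrete (if simplified) example of the auditing requirement above, one common check is the disparate impact ratio: the selection rate of the least-favored group divided by that of the most-favored group. The ADPPA text does not mandate this particular metric, and the groups and decisions below are invented.

```python
# A minimal bias-audit sketch on hypothetical model decisions.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = decisions.groupby("group")["selected"].mean()
ratio = rates.min() / rates.max()

# A common rule of thumb (the "four-fifths rule") flags ratios below 0.8
# for closer review.
print(f"selection rates:\n{rates}\ndisparate impact ratio: {ratio:.2f}")
```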
The challenges of data privacy in an era of explainable AI
Issues around data privacy in ML and AI algorithms are not new, although the ADPPA sharpens the question. Today’s big tech companies use enormous, democratized datasets, allowing anyone in the company to access data potentially covered by data privacy laws. This raises the stakes for ensuring that data is de-contextualized before it is used in AI models.
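What de-contextualization looks like in practice varies, but a minimal sketch is to drop direct identifiers and coarsen quasi-identifiers before the data ever reaches a model-building team. The columns below are hypothetical, and real pipelines would go considerably further (pseudonymization, aggregation, access controls).

```python
# A minimal, hypothetical de-identification step before data is used for ML.
import pandas as pd

raw = pd.DataFrame({
    "email":     ["a@example.com", "b@example.com"],
    "zip_code":  ["11201", "94107"],
    "purchases": [3, 7],
})

deidentified = raw.drop(columns=["email"]).assign(
    # Keep only the 3-digit ZIP prefix so location is coarsened, not exact.
    zip_code=raw["zip_code"].str[:3],
)
print(deidentified)
```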
Even minor, accidental uses of personal data in ML algorithms can have enormous consequences for the data subject. For example, an ML recruitment algorithm might ignore qualified candidates because of their race, or an AI-powered dynamic pricing algorithm on an ecommerce site could raise the price for a customer because of their zip code. Social harms such as loss of trust or radicalization to extremist beliefs can arise from algorithms that push content towards consumers, and it’s not far-fetched to suggest that someone could be arrested or imprisoned because an algorithm misused private data to conclude they were guilty of a crime.
In an era of easily accessible, cloud-based ML tools, many people without a data science background can create and run their own ML models. Lacking that experience, they may train on prohibited sensitive data, or run a model without specific consent from the data subject.
With the typical disconnect between privacy professionals and the development team, data protection and governance personnel may never know that this breach has occurred – until a lawsuit arrives. DPOs are left chasing feathers in the wind, trying to keep all their models and training datasets under supervision.
How Privya can help
Ensuring explainable AI that doesn’t infringe on data protection regulations requires bringing together the insights of DPOs and the ML/AI know-how of data science teams. DPOs today have a huge role to play in the creation of AI/ML models, even if they don’t have any expertise in the data science field.
Privya can help. Privya is a privacy tech company that scans code as it’s being built. It understands the flow of an algorithm, detects whether personal data is being built into it, and flags the algorithm, all in an automated manner. In this way, Privya helps provide the data compliance input that data science teams need, while giving DPOs visibility into development flows that could trigger an issue in an ML algorithm and violate principles of fairness and ethics.