AI data could be tainted even as it’s being cleaned

Risk USA: Expert says even touching raw data could lead to loss of context

Data cleansing efforts should be properly documented, says Capital One's Hanif

Companies cleaning the data they’re using for their machine learning models could unintentionally adulterate it in the process, one expert has said.

“Anytime you touch the data before it enters your algorithm, there is absolutely always the risk that it removes something that has contextual information, and you don’t know it yet,” said Zachary Hanif, principal machine learning engineer at Capital One, who spoke on a panel on data science at the Risk USA conference in New York on November 9.

