Using machine learning to understand the adversary in credit card fraud
An electric toothbrush purchased in Florida, a pair of sneakers purchased in Michigan, a vacuum cleaner purchased in Ohio – you didn’t buy them, but someone using your credit card did.
Credit card fraud is a costly problem for banks and a major frustration for consumers. In 2016, credit card fraud resulted in losses of $22 billion globally.
For banking firms, there is a need to strike a balance between the volume of flagged credit card transactions – where a customer’s card gets declined – and losses due to fraud, especially as consumers expect safety and surety from their credit card company. In managing these competing concerns through data science applications, banks have often relied on static models that act as filters between fraud and non-fraud transactions and that rely on supervised training. These models, which change very little over time, are exposed to the risk of being learned and circumvented.
And as fraudsters learn the classifier and chip away at its effectiveness, data scientists at credit card companies spend large amounts of time and money to combat their efforts.
"To know your enemy, you must become your enemy." – Sun Tzu, The Art of War
DSI Master of Science in Data Science students Tyler Lewris (‘18), Adrian Mead (‘18), and Sai Prasanth (‘18) undertook a capstone project for a multinational banking firm that aimed to use adversarial learning to create a better system for identifying fraud. Adversarial learning is a field of machine learning concerned with problems that arise in highly contentious environments – such as spam filtering, cybersecurity, and credit card fraud – where a data scientist must contend with a malicious actor dedicated to beating whatever system the data scientist is trying to deploy.
“This will not only cut costs for the credit card company, it will be a boon for consumers,” said MSDS ‘18 student Adrian Mead, a researcher working on the project. “The likely benefits from reductions in fraud include easier access to credit for riskier consumers (as fraud itself can be spotted more successfully), overall lower APRs for credit card customers (as there would be a lower cost to be recouped from fraud), and a reduction in the number of consumers that need to spend precious time on the phone with a fraud investigator as they review recent charges. Merchants also bear some of the burden, so decreases in fraud could also lead to lower prices on a large number of goods.”
Lewris, Mead, and Prasanth worked closely with the client, which included representatives from the banking firm, and with faculty mentors Stephen Adams, Peter Alonzi, and Peter Beling. Capstone projects play an integral role in the 11-month MSDS program. Interwoven through the entire program, the projects both elucidate concepts learned in the classroom and allow students to apply data science tools and techniques to real-world problems in industry.
To better understand fraudulent activity, the group built models that take into account the behavior of the adversary. The competitive advantage of modeling adversary behavior lies in the financial institution’s ability to learn changing fraud strategies and adapt accordingly.
Previous adversarial learning work in fraud prevention showed increased effectiveness over static models that did not account for changing fraudster behavior. Lewris, Mead, and Prasanth extended this work by utilizing reinforcement learning. Inspired by behavioral psychology, reinforcement learning focuses on performance, and involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). A recent example is Google's AlphaGo, a Go-playing program trained in large part through reinforcement learning that was able to beat the world's best human players, something that many thought was impossible.
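The exploration–exploitation balance described above is commonly handled with an epsilon-greedy rule: most of the time the agent exploits its current knowledge, but occasionally it explores at random. A minimal sketch (the function and values here are illustrative, not taken from the project):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index from estimated action values.

    With probability `epsilon`, explore a uniformly random action;
    otherwise exploit the action with the highest estimated value.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Larger `epsilon` means more exploration early on; in practice it is often decayed over time so the agent settles into exploiting what it has learned.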
This project used reinforcement learning by framing the fraudster and card issuer interaction as a Markov Decision Process (MDP) and performing prediction and control. By defining a set of states that an agent of interest can be in, actions that this agent can take, and rewards that the agent can procure through traversing these states and actions, there are a number of techniques that allow the agent to quickly identify the best possible series of states and actions that lead to the greatest expected rewards. The MDP at play here takes on the perspective of an agent (in this case, the fraudster with a stolen credit card) who interacts with an environment (merchants and a fraud classifier), by taking actions (transactions), and receiving rewards (relating to whether the transactions were successful/declined).
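As a rough illustration of that framing – with states, actions, reward values, and the classifier interface all invented for the example, not taken from the project – a single MDP transition for the fraudster agent might look like:

```python
# Hypothetical MDP transition for a fraudster agent.
# State: number of successful transactions so far.
# Actions: transaction types. Reward: value kept if the charge clears;
# zero (and episode over) if the issuer's classifier declines the card.

ACTIONS = ["small_purchase", "large_purchase"]

def step(state, action, classifier_flags):
    """One transition of the toy MDP.

    `classifier_flags(state, action)` is a stand-in for the card
    issuer's fraud classifier: it returns True to decline the charge.
    Returns (next_state, reward, episode_done).
    """
    if classifier_flags(state, action):
        return state, 0.0, True                 # declined: episode ends
    reward = 20.0 if action == "small_purchase" else 200.0
    return state + 1, reward, False             # cleared: keep going
```

For example, against a classifier that flags all large purchases, the agent is rewarded only for small ones – exactly the kind of structure a reinforcement learner can discover from repeated episodes.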
This approach allowed the researchers to simulate fraudulent episodes in such a way that techniques like model-free policy iteration can identify an optimal policy for the fraudster. An episode ends when the card is terminated by the credit card company for fraud.
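A model-free agent can learn such a policy from simulated episodes alone. The tabular Q-learning loop below is a stand-in for the model-free techniques mentioned above, with an entirely made-up environment: a small charge is rarely declined, a large charge usually is, and an episode ends when the card is declined or after five purchases.

```python
import random
from collections import defaultdict

def simulate_fraud_episodes(n_episodes=2000, alpha=0.1, gamma=0.9,
                            eps=0.2, seed=0):
    """Tabular Q-learning over simulated fraud episodes (toy dynamics).

    Action 0: small charge (rarely declined, reward 20).
    Action 1: large charge (usually declined, reward 50).
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])   # q[state][action]
    for _ in range(n_episodes):
        state, done = 0, False
        while not done and state < 5:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] >= q[state][1] else 1
            declined_prob = 0.05 if a == 0 else 0.7   # toy classifier
            if rng.random() < declined_prob:
                reward, nxt, done = 0.0, state, True  # card terminated
            else:
                reward, nxt = (20.0 if a == 0 else 50.0), state + 1
            # standard Q-learning update toward the bootstrapped target
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q
```

Under these invented dynamics the agent learns to prefer the low-risk small charge, since the large charge's higher reward is outweighed by the chance of ending the episode early.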
“We found that when we used reinforcement learning to train a fraud-committing agent to be good at committing fraud (that is, as the agent in our Markov Decision Process), we were able to most successfully confuse the fraud agent's ability to learn how to commit fraud successfully when making small and frequent changes to our fraud classifier (the model which says whether or not a transaction is fraud),” Mead said. “However, there was also a threshold to how effective our attempts at confusion were and we saw diminishing returns to varying the classifier too often.”
Lewris, Mead, and Prasanth found that, compared to a static classifier, using reinforcement learning to make small changes to the fraud classifier on a regular basis led to a significant decrease in the ability of a fraud agent to learn an optimal policy.
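One simple way to picture "small changes on a regular basis" is a classifier whose decision threshold is nudged by a small random amount every so many transactions. The sketch below is a toy version of that idea – the function, parameters, and period are assumptions for illustration, not the team's actual mechanism:

```python
import random

def varying_threshold_classifier(base_threshold=0.5, jitter=0.05,
                                 period=100, seed=0):
    """Return a classifier whose fraud threshold drifts over time.

    Every `period` transactions, the threshold is reset to the base
    value plus a small random perturbation, so an adversary who has
    learned the old boundary must keep re-learning it.
    """
    rng = random.Random(seed)
    threshold = base_threshold
    count = 0

    def classify(fraud_score):
        nonlocal threshold, count
        count += 1
        if count % period == 0:
            threshold = min(1.0, max(0.0,
                base_threshold + rng.uniform(-jitter, jitter)))
        return fraud_score >= threshold   # True = flag as fraud
    return classify
```

Keeping the perturbations small preserves classifier accuracy on legitimate traffic, while the frequent changes degrade the fraud agent's ability to converge on an optimal policy – consistent with the diminishing returns the team observed when varying the classifier too often.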