New Age Fraud Analytics: Machine Learning on Hadoop

Fraud represents the biggest loss for banks, accounting for upward of $1.744 billion in losses annually. The banking industry spends millions each year on technologies aimed at reducing fraud and retaining customers, but that spending does little to protect banks. Let’s look at why current fraud detection approaches don't work as well as they should and how machine learning on big data can help.

Most current approaches to fraud detection are largely static and rely on patterns and signatures derived from a subset of historical transactions. Banks often use sophisticated mathematical models built from known historical fraud to decide whether a transaction occurring in real time is fraudulent. Little, if any, attention is paid to detecting first-time fraud, which has no known signature. Moreover, the signatures themselves are not comprehensive, because they are created from only a subset of the data. As a result, banks are always playing catch-up, and first-time fraud often goes undetected.

Legacy fraud solutions

Another issue is the frequency with which the models are updated. In many cases the models used to detect fraudulent patterns are only updated once a year due to the difficulty, cost and time required for accurate model creation and deployment. A transactional fraud scheme may go undetected for months before being properly categorized in an updated model.

Finally, the most important consideration for banks is balancing the flagging of suspected fraudulent transactions against the negative impact on customer satisfaction when legitimate transactions are mistakenly declined. Reducing the number of these false positives depends directly on how accurately fraudulent activity is detected. The deficiencies of current techniques, which rely on patterns derived from subsets of historical transactions, offer an opportunity to create new models with higher predictive accuracy.
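To make that trade-off concrete, here is a minimal Python sketch. The scores, labels, and the `confusion_at` helper are all hypothetical and invented for illustration; they do not come from any bank's system. The sketch only shows how moving a decision threshold on a model's fraud score trades missed fraud against wrongly declined transactions.

```python
# Hypothetical model scores (higher = more suspicious) and true labels
# (1 = fraud, 0 = legitimate). All values are invented for illustration.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]


def confusion_at(threshold, scores, labels):
    """Count outcomes when every score >= threshold is flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # fraud correctly caught
        elif flagged and label == 0:
            fp += 1  # legitimate transaction wrongly declined
        elif label == 1:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate transaction approved
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


for threshold in (0.3, 0.5, 0.8):
    c = confusion_at(threshold, scores, labels)
    print(f"threshold={threshold}: caught={c['tp']} "
          f"missed={c['fn']} false_alarms={c['fp']}")
```

On this toy data, the loosest threshold catches all four fraud cases but declines three legitimate transactions, while the strictest one declines none and misses two fraud cases. With a fixed model, the threshold can only move along that trade-off; a more accurate model is what improves both numbers at once.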

A superior solution, therefore, is to implement a comprehensive fraud detection approach that can detect both known and novel instances of fraud as they occur in real time, with a higher level of accuracy. So how do you build a modern fraud analytics solution that works better at detecting illegal transactions while minimizing the false alarms that annoy customers? The answer is machine learning on big data.

The advent of big data on distributed Hadoop-based platforms like MapR has made it possible to store and process large amounts of data economically and efficiently. This enables enterprises to use comprehensive historical transaction data to discover fraud signatures, which was not possible before. Increasing the quantity of data available for analysis can greatly increase the accuracy of fraud detection systems. The challenge then is to find tools and techniques that can analyze data at huge scale in real time and detect, with high accuracy, first-time fraudulent activities that have no known signatures.

One example of a platform that can provide a new-age solution to the fraud analytics problem is Skytree, the first machine learning platform built from the ground up to work on large data sets at high performance with best-in-class accuracy. It runs natively on MapR Hadoop clusters and supports a large set of supervised and unsupervised learning methods. These techniques can detect fraud based on known patterns and signatures, and can also detect first-time fraud by identifying anomalous transactions. In addition, Skytree’s unique automated model and parameter selection technique makes it easy to iterate through multiple methods on larger datasets, making frequent model updates possible and giving the most accurate results.
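The idea of catching first-time fraud by spotting anomalous transactions can be sketched in a few lines of Python. This is not Skytree's API or method, which this article does not describe; it is a deliberately simple stand-in (a robust z-score based on the median and the median absolute deviation) applied to invented transaction amounts. The function names and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to scale by, so nothing can be scored as unusual.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, cutoff=3.5):
    """Return the indexes of values whose robust z-score exceeds the cutoff."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > cutoff]


# Invented transaction amounts for one hypothetical customer.
amounts = [42.0, 18.5, 63.0, 27.9, 35.0, 51.2, 2950.0, 22.4, 47.3, 30.1]
print(flag_anomalies(amounts))  # prints [6], the 2950.0 transaction
```

No fraud labels or known signatures are used here: the large transaction is flagged only because it is far from this customer's usual behavior. A production system would score many features at once and at far larger scale, but the principle is the same.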

Skytree and MapR diagram

Fraud patterns diagram

Cutting-edge fraud detection systems need to be adaptive, agile and accurate. This requires deep analysis of ever-growing datasets and continuous updates to production models. High-performance machine learning on Hadoop uses both supervised and unsupervised learning methods to detect fraud, which enables accurate and timely detection of both repeat and first-time fraud. With advanced machine learning on big data, fraud no longer needs to be the price of doing business.

