Understanding Payment Fraud Prediction as a Data Science Problem Involving Payment Big Data

Banks and other financial organizations process billions of transactions every day. As part of their day-to-day activities, they also have to detect and prevent payment fraud, which is becoming more sophisticated over time. Historically, these organizations relied on manual or rule-based systems for fraud detection, but these are no longer sufficient.

Advancements in data science mean that today we can build fast and effective fraud prediction systems that continuously learn and improve as fraud patterns evolve.

In this article, we introduce payment fraud prediction as a data science problem.

Understanding Payment Fraud

Fraud is a billion-dollar business and it is increasing every year. The PwC global economic crime survey of 2018 found that half (49 percent) of the 7,200 companies they surveyed had experienced fraud of some kind.

This is an increase from the PwC 2016 study in which slightly more than a third of organizations surveyed (36%) had experienced economic crime.

What is Fraud?

“The term fraud here refers to the abuse of an organization’s system without necessarily leading to direct legal consequences.”

In a competitive environment, fraud can become a business-critical problem if it is widespread and the prevention procedures are not fail-safe.

Payment Fraud Detection

Payment fraud detection, as part of overall fraud control, automates the screening and checking process and reduces its manual parts. It is a challenging problem because it is impossible to be certain about the legitimacy of the intention behind an application or transaction.

Types of Credit Card Fraud

a). Identity Theft

Someone else obtains your card details, for example by observing them or through a fake phone call that convinces you to share them.

b). Stolen Cards

Your card is lost or stolen, and the person who has it knows how to use it.

c). Hacking

Although it is the least likely of the three, high-level hacking of bank account details does happen.

Challenges for Fraud Detection Systems

There are many challenges for fraud detection systems, including:

Imbalanced Data
Cost of Fraud Detection
Enormous Quantities of Real Time Data
Rapidly Changing Fraud Patterns
Adaptive techniques used against the model by the scammers

How to Tackle these Challenges?

Speed

The model used must be simple and fast enough to detect the anomaly and classify it as a fraudulent transaction as quickly as possible.

Class Imbalance

Class imbalance can be handled with techniques such as resampling the data or weighting the classes during training.
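One common way to deal with imbalance is to weight each class inversely to its frequency, so that the rare fraud class counts as much as the legitimate class during training. The sketch below is only an illustration: the helper name balanced_class_weights and the toy labels are assumptions, not part of the dataset or of any particular library.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels (made up for illustration): 998 legitimate, 2 fraudulent.
labels = [0] * 998 + [1] * 2
weights = balanced_class_weights(labels)

# After weighting, both classes contribute the same total weight.
total_legit = 998 * weights[0]
total_fraud = 2 * weights[1]
print(weights[1], total_legit, total_fraud)
```

Many classifiers accept per-class or per-sample weights of this kind; whether weighting or resampling works better depends on the model and the data.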

Protecting Privacy

To protect user privacy, the dimensionality of the data can be reduced, and key fields can be encrypted or anonymized.
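As a rough illustration of anonymizing key fields, the sketch below replaces raw values with a keyed hash (HMAC-SHA256) from the Python standard library. This is one possible approach and strictly speaking pseudonymization rather than full anonymization; the key, the helper name pseudonymize, and the sample record are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY):
    """Return a keyed SHA-256 digest so the raw value is not stored."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up record using field names from the dataset description.
record = {"P_emaildomain": "example.com", "card1": "12345"}
masked = {field: pseudonymize(value) for field, value in record.items()}
print(masked)
```

Because the same input always maps to the same digest, the masked fields can still be used as categorical features or join keys.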

Fraud Experts

A more trustworthy source, such as fraud experts who double-check the data, must be used, at least for training the model.

Analysis of Fraud Dataset

In this example we want to predict the probability that an online transaction is fraudulent, as denoted by the binary target isFraud.

This is a binary classification problem, i.e. our target variable is a binary attribute (is the transaction fraudulent or not?), and our goal is to classify transactions as “fraudulent” or “not fraudulent” as well as possible.

Data Organization

The data is split into two files, identity and transaction, which are joined by TransactionID.

Not all transactions have corresponding identity information.
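Since not every transaction has identity information, a left join on TransactionID keeps all transactions and leaves the identity columns empty where there is no match. The sketch below uses pandas with tiny made-up frames; the values are illustrative only, and the real files contain many more columns.

```python
import pandas as pd

# Toy stand-ins for the transaction and identity files (values are made up).
transaction = pd.DataFrame({
    "TransactionID": [1001, 1002, 1003],
    "isFraud": [0, 0, 1],
})
identity = pd.DataFrame({
    "TransactionID": [1001, 1003],
    "DeviceType": ["desktop", "mobile"],
})

# Left join: keep every transaction, attach identity columns where available.
merged = transaction.merge(identity, on="TransactionID", how="left")

# Transactions without identity information have missing identity columns.
missing_identity = int(merged["DeviceType"].isna().sum())
print(merged)
print(missing_identity)
```

An inner join would silently drop the transactions that have no identity record, which is usually not what is wanted here.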

Feature Analysis

Categorical Features – Transaction

ProductCD
card1 – card6
addr1, addr2
P_emaildomain
R_emaildomain
M1 – M9

Categorical Features – Identity

DeviceType
DeviceInfo
id_12 – id_38

TransactionDT

timedelta from a given reference datetime (not an actual timestamp).

“TransactionDT first value is 86400, which corresponds to the number of seconds in a day (60 * 60 * 24 = 86400) so I think the unit is seconds.

Using this, we know the data spans 6 months, as the maximum value is 15811131, which would correspond to day 183.”
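The arithmetic in the quote can be checked with a few lines. This is a minimal sketch that assumes, as the quote does, that TransactionDT is measured in seconds; the helper name seconds_to_days is made up for illustration.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def seconds_to_days(transaction_dt):
    """Convert a TransactionDT offset in seconds to (fractional) days."""
    return transaction_dt / SECONDS_PER_DAY


first_dt = 86400      # first TransactionDT value quoted above
last_dt = 15811131    # maximum TransactionDT value quoted above

first_day = seconds_to_days(first_dt)
last_day = seconds_to_days(last_dt)
span_days = last_day - first_day

print(first_day, round(last_day), round(span_days))
```

The first value lands exactly on day 1 and the maximum rounds to day 183, which is consistent with a span of roughly six months.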

TransactionAMT

transaction payment amount in USD

ProductCD

product code, i.e. the product for each transaction

card1 – card6

payment card information, such as card type, card category, issue bank, country, etc.

addr

address

dist

distance

P_emaildomain and R_emaildomain

Purchaser and Recipient email domain

C1-C14

counting, such as how many addresses are found to be associated with the payment card, etc. The actual meaning is masked.

D1-D15

timedelta, such as days between previous transaction, etc.

M1-M9

match, such as names on card and address, etc.

Vxxx

Vesta engineered rich features, including ranking, counting, and other entity relations.

“For example, how many times the payment card associated with a IP and email or address appeared in a 24 hours time range, etc.”

Data Related Challenges

The main challenges in credit card fraud detection are:

An enormous volume of data is processed every day.
The model must be fast enough to respond to the scam in time.
The data is imbalanced: most transactions (99.8%) are not fraudulent, which makes the fraudulent ones hard to detect.
Data availability is limited, as the data is mostly private.
Misclassified data can be another major issue, as not every fraudulent transaction is caught and reported.
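To see why imbalance makes detection hard, consider a model that labels every transaction as legitimate. The sketch below uses made-up labels with the 99.8% share mentioned above and is only an illustration: accuracy looks excellent while no fraud is caught at all.

```python
# Toy labels (made up): 998 legitimate (0) and 2 fraudulent (1) transactions.
y_true = [0] * 998 + [1] * 2

# A trivial "model" that always predicts "not fraudulent".
y_pred = [0] * len(y_true)

true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (true_pos + true_neg) / len(y_true)
# Recall: share of actual frauds that were detected.
recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0

print(accuracy, recall)
```

For this reason, metrics that focus on the fraud class, such as recall and precision, tend to be more informative than plain accuracy on data like this.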

References

https://www.kaggle.com/hassanamin/fraud-complete-eda/data

By Hassan Amin