payment fraud

Banks and other financial organizations process billions of transactions every day. As part of their day-to-day activities, they also have to detect and prevent payment fraud, which is becoming more and more sophisticated over time. Historically, these organizations used manual or rule-based systems for fraud detection, but that is no longer sufficient.

Advancements in data science mean that today we are able to build fast and effective fraud prediction systems that continuously learn and improve as fraud patterns evolve.

In this article, we introduce payment fraud prediction as a data science problem.

Understanding Payment Fraud

Fraud is a billion-dollar business, and it is growing every year. The PwC global economic crime survey of 2018 found that about half (49 percent) of the 7,200 companies surveyed had experienced fraud of some kind.

This is an increase from the PwC 2016 study, in which slightly more than a third (36 percent) of the organizations surveyed had experienced economic crime.

What is Fraud?

“The term fraud here refers to the abuse of an organization’s system without necessarily leading to direct legal consequences.”

In a competitive environment, fraud can become a business-critical problem if it is widespread and the prevention procedures are not fail-safe.

Payment Fraud Detection

Payment fraud detection, as part of overall fraud control, automates the screening and checking process and helps reduce its manual parts. It is a challenging problem because it is impossible to be certain about the legitimacy of the intention behind an application or transaction.

Types of Credit Card Fraud 

a). Identity Theft

Your card details are obtained by another person, for example by someone watching you enter them or through a fake phone call that convinces you to share them.

b). Stolen Cards

Your card is lost or stolen, and the person who now has it knows how to use it.

c). Hacking

Although it is highly improbable, high-level hacking of bank account details does happen.

Challenges for Fraud Detection Systems

There are many challenges for fraud detection systems, including:

  • Imbalanced Data
  • Cost of Fraud Detection
  • Enormous Quantities of Real Time Data
  • Rapidly Changing Fraud Patterns
    • Adaptive techniques used against the model by the scammers

How to Tackle these Challenges?

Speed

The model used must be simple and fast enough to detect the anomaly and classify it as a fraudulent transaction as quickly as possible.

Class Imbalance

Class imbalance can be handled with standard techniques such as resampling the training data or giving the rare fraud class a higher weight during training.
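One common way to handle class imbalance is to weight each class inversely to its frequency, so that errors on the rare fraud class count more during training. The sketch below computes such weights in plain Python. The function name `balanced_class_weights` and the toy labels are assumptions made for illustration; the formula is the widely used "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels (made up for illustration): 998 legitimate, 2 fraudulent.
labels = [0] * 998 + [1] * 2
weights = balanced_class_weights(labels)

# The rare class receives a much larger weight than the common class.
print(weights)
```

Weights like these can typically be passed to a classifier as per-class or per-sample weights. Resampling (undersampling the majority class or oversampling the minority class) is an alternative with the same goal.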

Protecting Privacy

To protect the privacy of the user, the dimensionality of the data can be reduced, and key fields can be encrypted or anonymized.
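As a minimal sketch of anonymizing a key field, a sensitive value can be replaced by a keyed hash, so the raw value never reaches the modelling dataset. The helper name `pseudonymize`, the secret key, and the sample card number below are all hypothetical; a real system would need proper key management, which is out of scope here.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a keyed SHA-256 digest that stands in for the raw value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and card number, for illustration only.
SECRET_KEY = b"replace-with-a-real-secret"
token_a = pseudonymize("4111111111111111", SECRET_KEY)
token_b = pseudonymize("4111111111111111", SECRET_KEY)
token_other = pseudonymize("5500000000000004", SECRET_KEY)

# Same input gives the same token; different inputs give different tokens.
print(token_a == token_b, token_a == token_other)
```

Because the same input always maps to the same token, the pseudonymized field can still be used as a categorical feature or as a join key.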

Fraud Experts

A trustworthy source, such as a team of fraud experts, should double-check the data, at least the data used to train the model.

Analysis of Fraud Dataset

In this example we want to predict the probability that an online transaction is fraudulent, as denoted by the binary target isFraud.

This is a binary classification problem, i.e. our target variable is a binary attribute (is the transaction fraudulent or not?), and our goal is to classify transactions as “fraudulent” or “not fraudulent” as accurately as possible.
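The link between a predicted probability and the binary isFraud label can be sketched as a simple threshold. The function name `classify`, the default threshold of 0.5, and the sample probabilities are assumptions made for illustration, not values taken from the dataset.

```python
def classify(fraud_probability, threshold=0.5):
    """Map a predicted fraud probability to a binary isFraud label."""
    return 1 if fraud_probability >= threshold else 0


# Made-up model outputs for three transactions.
probabilities = [0.02, 0.47, 0.91]
labels_default = [classify(p) for p in probabilities]
labels_lower = [classify(p, threshold=0.4) for p in probabilities]

print(labels_default)
print(labels_lower)
```

Lowering the threshold flags more transactions as fraud, which catches more true fraud at the cost of more false alarms.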

Data Organization

The data is split into two files, identity and transaction, which are joined by TransactionID.

Not all transactions have corresponding identity information.
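Since some transactions have no identity record, the natural way to combine the two files is a left join that keeps every transaction. The sketch below shows the idea with the standard library only. The CSV contents are tiny made-up stand-ins for the real files, and the helper `left_join` is hypothetical.

```python
import csv
import io

# Tiny made-up stand-ins for the transaction and identity files.
TRANSACTION_CSV = """TransactionID,isFraud,ProductCD
1001,0,W
1002,1,C
1003,0,W
"""
IDENTITY_CSV = """TransactionID,DeviceType
1002,mobile
"""


def left_join(transaction_rows, identity_rows, key="TransactionID"):
    """Keep every transaction; attach identity columns where they exist."""
    identity_by_id = {row[key]: row for row in identity_rows}
    first = identity_rows[0] if identity_rows else {}
    identity_cols = [col for col in first if col != key]
    joined = []
    for row in transaction_rows:
        match = identity_by_id.get(row[key], {})
        merged = dict(row)
        for col in identity_cols:
            merged[col] = match.get(col)  # None when there is no identity record
        joined.append(merged)
    return joined


transactions = list(csv.DictReader(io.StringIO(TRANSACTION_CSV)))
identities = list(csv.DictReader(io.StringIO(IDENTITY_CSV)))
joined = left_join(transactions, identities)

for row in joined:
    print(row)
```

With pandas, the same idea would be a merge on TransactionID with `how="left"`, which fills the identity columns with missing values for transactions that have no match.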

Feature Analysis

Categorical Features – Transaction

  • ProductCD
  • card1 – card6
  • addr1, addr2
  • P_emaildomain
  • R_emaildomain
  • M1 – M9

Categorical Features – Identity

  • DeviceType
  • DeviceInfo
  • id_12 – id_38

TransactionDT

timedelta from a given reference datetime (not an actual timestamp). 

“TransactionDT first value is 86400, which corresponds to the number of seconds in a day (60 * 60 * 24 = 86400) so I think the unit is seconds. 

Using this, we know the data spans 6 months, as the maximum value is 15811131, which would correspond to day 183.”
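The arithmetic in the quote above can be checked in a few lines. This sketch assumes, as the quoted author does, that TransactionDT is measured in seconds; the function name `dt_to_days` is illustrative.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def dt_to_days(transaction_dt):
    """Convert a TransactionDT offset, assumed to be in seconds, to days."""
    return transaction_dt / SECONDS_PER_DAY


first_day = dt_to_days(86400)     # smallest value quoted above
last_day = dt_to_days(15811131)   # largest value quoted above
span_days = last_day - first_day

print(first_day, round(last_day), round(span_days))
```

The first value lands exactly on day 1 and the last value just short of day 183, so the data covers roughly 182 days, which is about six months.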

TransactionAMT 

transaction payment amount in USD

ProductCD

product code

The product for each transaction

card1 – card6

payment card information, such as card type, card category, issue bank, country, etc.

addr 

address

dist 

distance

P_emaildomain and R_emaildomain

Purchaser and Recipient email domain

C1-C14

counting, such as how many addresses are found to be associated with the payment card, etc. The actual meaning is masked.

D1-D15

timedelta, such as days between previous transaction, etc.

M1-M9

match, such as names on card and address, etc.

Vxxx

Vesta engineered rich features, including ranking, counting, and other entity relations.

“For example, how many times the payment card associated with a IP and email or address appeared in a 24 hours time range, etc.”

Data Related Challenges

Main challenges involved in credit card fraud detection are:

  • Enormous data is processed every day. 
  • Model must be fast enough to respond to the scam in time.
  • Imbalanced data, i.e. most transactions (99.8%) are not fraudulent, which makes the fraudulent ones really hard to detect.
  • Data availability as the data is mostly private.
  • Misclassified data can be another major issue, as not every fraudulent transaction is caught and reported.
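To see why imbalance is such a problem, consider a "model" that never flags fraud. The sketch below uses made-up labels with the 99.8% / 0.2% split mentioned in the list above; the helper names `accuracy` and `recall` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual fraud cases that the predictions caught."""
    actual_fraud = sum(1 for t in y_true if t == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / actual_fraud if actual_fraud else 0.0


# Made-up labels: 998 legitimate transactions and 2 fraudulent ones.
y_true = [0] * 998 + [1] * 2
y_pred = [0] * 1000  # a "model" that never flags fraud

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(acc, rec)
```

The accuracy looks excellent, yet not a single fraudulent transaction is caught. For this reason, metrics that focus on the fraud class, such as recall and precision, are usually more informative than plain accuracy.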

References

https://www.kaggle.com/hassanamin/fraud-complete-eda/data

By Hassan Amin

Dr. Syed Hassan Amin holds a Ph.D. in Computer Science from Imperial College London, United Kingdom, and an MS in Computer System Engineering from GIKI, Pakistan. During his Ph.D., he worked on Image Processing, Computer Vision, and Machine Learning. He has done research and development in many areas, including Urdu and local language Optical Character Recognition, Retail Analysis, Affiliate Marketing, Fraud Prediction, 3D reconstruction of face images from 2D images, and Retinal Image analysis.