The financial cost of fraud was estimated at more than 5 trillion dollars in 2019, with losses rising by 56% over the past decade. This accounts for approximately 6% of global GDP.
As key players in facilitating money flows, large financial institutions and banks are on the front lines of tackling fraud. Besides the huge financial losses incurred and the possibility of stiff penalties from regulators for AML violations, financial institutions also suffer significant damage to their reputation and may lose the trust of their customers.
Fraudsters are relying on increasingly sophisticated techniques to bypass conventional detection systems, and traditional rule-based approaches to tackling fraud are outdated. Banks and other financial institutions need to explore new methods to better combat identity theft, phishing attacks, credit card fraud, money laundering and other types of financial crime.
Drawbacks of Existing Approaches
The finance sector has traditionally relied on rule-based methods to screen for financial fraud. These rules are created by a panel of experts based on past observations and experience. The list of rules is updated as new cases are flagged. This approach has the following limitations and problems:
- Due to the ever-changing nature of fraud, rule books are becoming more complex and more difficult to maintain and implement.
- High false-positive rates, i.e. most of the activities flagged as fraudulent turn out to be legitimate customer activities. This leads to a poorer customer experience as transactions are held up for investigation.
- A traditional relational database obscures the relations between entities and makes it hard to trace connections and suspicious activities.
- It is a labour-intensive approach as it requires human intervention at every stage of evaluation, identification, and monitoring.
- Existing anti-money laundering (AML) measures include generating risk ratings and flagging various suspicious behaviours, e.g. cash deposits over \$10,000. This approach fails to detect funds laundered in smaller, non-rounded dollar amounts.
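To make the last limitation concrete, the sketch below applies a fixed-threshold rule to a handful of made-up deposits. The account names, the amounts and the `flag_large_deposits` helper are hypothetical; only the 10,000 threshold is taken from the example above.

```python
from collections import defaultdict

THRESHOLD = 10_000  # rule from the text: flag single cash deposits over this amount

# Hypothetical deposits: (account, amount)
deposits = [
    ("acc_A", 12_000),  # one large deposit
    ("acc_B", 4_950),   # several smaller, non-rounded deposits
    ("acc_B", 4_870),
    ("acc_B", 4_910),
    ("acc_C", 300),
]

def flag_large_deposits(deposits, threshold=THRESHOLD):
    """Return accounts with at least one single deposit above the threshold."""
    return {account for account, amount in deposits if amount > threshold}

def total_by_account(deposits):
    """Sum all deposits per account."""
    totals = defaultdict(int)
    for account, amount in deposits:
        totals[account] += amount
    return dict(totals)

flagged = flag_large_deposits(deposits)
totals = total_by_account(deposits)

# Accounts whose combined deposits exceed the threshold but which the rule never flags
missed = {acc for acc, total in totals.items() if total > THRESHOLD and acc not in flagged}
print(flagged, missed)
```

In this toy data the rule catches `acc_A`, but `acc_B` moves a larger total in three smaller deposits and goes unnoticed. A real rule book would add further rules for this case, which is exactly how it grows harder to maintain.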
Due to these limitations, banks are exploring more sophisticated techniques which include social network analysis, advanced data mining, natural language processing, and other machine-learning and AI-based techniques. For a case study on some of the different applications of machine learning for fraud detection, check out our previous article.
At Cylynx, we help companies adopt graph technology to solve problems ranging from detecting mule accounts to real-time transaction monitoring. By lowering the barriers to adoption of graph technologies, financial institutions of all sizes will be better equipped to deal with more sophisticated fraudulent techniques. In the rest of this article, we give an overview of network analysis for fraud detection.
Social Network Analysis for Fraud Detection
A graph (also known as a network) is a data structure consisting of nodes connected together by edges. Nodes are sometimes also known as vertices and edges are sometimes referred to as relations.
Many common scenarios that we encounter every day have a natural graph representation, from the structure of a molecule, to the web woven by a spider, to road networks or social network connections. For a comprehensive yet accessible overview of graph theory, Barabási’s Network Science Book is highly recommended and easily available online.
Financial transactions can similarly be represented as a graph. Customers or entities can be represented by nodes, and the transactions between them by edges. That is one possible network map and probably the one most commonly seen in the industry. In our experience, a more detailed representation that also takes into account products, geography or IP addresses can further enrich the network and the analytical insights derived from it.
Importantly, representing banking data as transaction relationships brings more analytical value than simply analysing the properties of the entities or transactions in tabular form. It also opens the door to applying various graph algorithms such as path finding or centrality measures. These methods can help in fraud analytics by utilizing relationship information in addition to user-level attributes.
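As a minimal sketch of this representation, the snippet below turns a list of transactions into a directed graph stored as nested dictionaries. The customer names, amounts and the `build_transaction_graph` helper are hypothetical, and summing repeated transfers into a single weighted edge is just one possible modelling choice.

```python
# Hypothetical transactions: (sender, receiver, amount)
transactions = [
    ("alice", "bob", 120.0),
    ("alice", "bob", 80.0),
    ("bob", "carol", 50.0),
    ("carol", "alice", 45.0),
    ("bob", "dave", 10.0),
]

def build_transaction_graph(transactions):
    """Directed graph as nested dicts: graph[sender][receiver] = total amount sent."""
    graph = {}
    for sender, receiver, amount in transactions:
        graph.setdefault(sender, {})
        graph.setdefault(receiver, {})  # pure receivers also become nodes
        graph[sender][receiver] = graph[sender].get(receiver, 0.0) + amount
    return graph

graph = build_transaction_graph(transactions)
nodes = sorted(graph)
edges = sorted(
    (sender, receiver, amount)
    for sender, receivers in graph.items()
    for receiver, amount in receivers.items()
)
print(nodes)
print(edges)
```

Richer representations would add further node types (for example products or IP addresses) and link them to customers in the same way.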
Applications of Network Analytics for Fraud Detection
Different machine learning and AI techniques vary in efficacy across fraud detection problems. For example, neural-network-based face recognition systems work well for identity verification as part of a KYC onboarding routine and can help cut identity fraud.
Network analytics lends itself well to identifying suspicious relationships and transaction behaviour. We list four kinds of fraud for which network analytics has proved effective.
1. Fraud Rings
Fraud rings can consist of ten to thousands of criminals devoted to committing a specific type of fraud. This can include identity theft, mail or tax fraud, or forgery.
These organised syndicates engage in a wide range of illicit behaviours including cheque and signature fraud, submitting false loan applications or using stolen credit cards.
2. Synthetic Identities
Synthetic identity fraud happens when criminals combine real and fake information to create a new identity. This new identity is then used to open new accounts or make fraudulent purchases. The identities are also used to pump up the credit score of fake IDs to extend their credit. These accounts are often used as mule accounts to facilitate other illegal activities while staying under the radar of the risk and compliance teams.
3. Account Takeover
In this type of fraud, a criminal takes control of an account that belongs to a bank customer. They then use the customer’s information to make unauthorized transactions, possibly siphoning the funds over time to an overseas account, which makes remedial action difficult.
4. Insurance Fraud
Insurance fraud includes false quotes and claims, inflated claims, or disaster fraud. These can be committed by organized groups to steal sums through fraudulent business activities.
Common Analytical Methods
In this section, we dive deeper into some of the more commonly used network analytics methods and explain how they are used for fraud detection.
1. Centrality Measures
Centrality measures are among the most commonly used indices or statistics calculated on networks. They are used to give a sense of a node or entity's prominence within a network. One common measure is degree centrality, which simply reflects the number of relationships or edges that a given node has.
Another popular measure is the PageRank score. It is the web ranking algorithm that powered Google’s search engine, at least as initially released. PageRank works under the assumption that the more important an entity is, the more likely it is to be linked to by other entities.
These measures can help shed light on the accounts which are most central to the entire transaction network and help to identify suspiciously well-connected accounts. They can be calculated as part of a feature library and used for supervised learning problems.
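The two measures above can be sketched in a few lines of plain Python. The graph, the node names and the `degree` and `pagerank` helpers are hypothetical, and the PageRank routine is a simple power iteration with an assumed damping factor of 0.85 rather than a production implementation.

```python
# Hypothetical directed transaction graph: sender -> list of receivers
graph = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["d", "e"],
    "d": [],
    "e": [],
}

def degree(graph):
    """Number of edges touching each node (in-degree plus out-degree)."""
    counts = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; nodes without outgoing edges spread their rank uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[node] for node, targets in graph.items() if not targets)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in graph}
        for node, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

degrees = degree(graph)
scores = pagerank(graph)
most_connected = max(degrees, key=degrees.get)
top_ranked = max(scores, key=scores.get)
print(most_connected, top_ranked)
```

Both measures single out `hub` in this toy graph. In practice the per-node scores would be stored alongside other account attributes and used as features, as described above.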
2. Paths and Traversal
As the name suggests, paths are routes that connect one entity to another. Given a particular network, one could ask for the shortest route between two given entities. This is known as the shortest path problem.
Alternatively, one could simply explore the relationships from a given starting point and traverse neighbouring entities at random, an approach known as a random walk.
Path and traversal methods are good ways of detecting cyclical patterns since cycles are simply paths that start from a particular node and end at the same node. Path-finding or cycle detection algorithms are also commonly used by investigators or forensic teams to discover links between multiple accounts or individuals.
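A minimal sketch of both ideas, assuming a small hypothetical graph of accounts: `shortest_path` is a standard breadth-first search, and `find_cycle_through` (a hypothetical helper) reuses it to look for a route from any neighbour back to the starting account.

```python
from collections import deque

# Hypothetical directed transaction graph: sender -> receivers
graph = {
    "acc1": ["acc2"],
    "acc2": ["acc3"],
    "acc3": ["acc1", "acc4"],
    "acc4": [],
}

def shortest_path(graph, source, target):
    """Breadth-first search; returns a shortest list of hops, or None if unreachable."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

def find_cycle_through(graph, start):
    """Return one cycle that starts and ends at `start`, or None if there is none."""
    for neighbour in graph.get(start, []):
        back = shortest_path(graph, neighbour, start)
        if back is not None:
            return [start] + back
    return None

path = shortest_path(graph, "acc1", "acc4")
cycle = find_cycle_through(graph, "acc1")
print(path)
print(cycle)
```

Here funds leaving `acc1` can return to it via `acc2` and `acc3`, which is the kind of circular pattern an investigator would want surfaced.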
3. Entity Link Analysis
Unlike road networks, which clearly indicate whether a road exists between two given places, financial datasets are not as straightforward. Due to data collection difficulties and insufficient information, an analyst often has to guess whether two particular individuals are related. To rephrase it in graph terminology, one can ask whether a link exists between two given nodes. In network theory, the task of predicting the existence of a link between two entities in a network is known as the link prediction problem.
Algorithms to tackle this class of problems can be used to perform related-party checks or estimate the probability of an account being associated with a known fraudulent account. For an interesting use case, check out the following research paper by Hasan and Zaki which applies link prediction to examine groups of terrorists and criminals.
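One of the simplest link prediction heuristics scores a pair of unconnected nodes by how much their neighbourhoods overlap (the Jaccard coefficient). The sketch below is only an illustration of that heuristic on a hypothetical relationship graph; it is not the method of any particular paper, and the `rank_missing_links` helper is an assumed name.

```python
from itertools import combinations

# Hypothetical undirected graph of known relationships: node -> set of neighbours
graph = {
    "ann": {"bob", "cat", "dan"},
    "eve": {"bob", "cat"},
    "bob": {"ann", "eve"},
    "cat": {"ann", "eve"},
    "dan": {"ann", "fay"},
    "fay": {"dan"},
}

def jaccard(graph, u, v):
    """Shared neighbours divided by all neighbours of either node."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

def rank_missing_links(graph):
    """Score every pair of nodes that is not already linked, highest score first."""
    candidates = [
        (jaccard(graph, u, v), u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    return sorted(candidates, key=lambda candidate: -candidate[0])

ranked = rank_missing_links(graph)
best_score, best_u, best_v = ranked[0]
print(best_u, best_v, best_score)
```

A high score suggests that two parties may be related even though no direct link was recorded, which is the starting point for a related-party check. More elaborate approaches treat the same task as a supervised learning problem.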
4. Cluster Analysis
Clustering is the task of finding sets of similarly related nodes such that nodes within a given set are close to each other but further apart from nodes in other sets. Graph clustering methods differ from other methods such as k-means clustering by focusing on the links or edges that connect nodes rather than their attributes.
In a social network graph, clustering tends to produce clusters of people with similar hobbies and interests. In a transaction banking graph, clustering can be similarly used to surface business connections or social circles.
These clusters or communities can be used for various fraud analytics applications. For example, anomaly detection can be carried out by asking how likely a particular entity in a given cluster is to make a transaction with an entity in an arbitrarily selected cluster. The less likely it is for that transaction to take place, the higher the anomaly score that would be assigned.
The algorithms discussed in this section illustrate how network analytics can help to discover relationships, behaviours and activities. This can help financial companies reduce false positives while detecting new and novel fraud cases.
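The anomaly scoring idea can be sketched as follows, assuming cluster labels have already been produced by some graph clustering step. The labels, the transaction history and the `anomaly_score` helper are hypothetical, and using one minus the empirical cluster-to-cluster frequency is just one simple way to define the score.

```python
from collections import Counter

# Hypothetical cluster labels, assumed to come from a graph clustering step
cluster_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "c1": "C"}

# Hypothetical historical transactions: (sender, receiver)
history = [
    ("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a1", "a3"),
    ("a2", "b1"),
    ("b1", "b2"), ("b2", "b1"),
]

def cluster_transition_counts(history, cluster_of):
    """Count past transactions for each (sender cluster, receiver cluster) pair."""
    counts = Counter()
    for sender, receiver in history:
        counts[(cluster_of[sender], cluster_of[receiver])] += 1
    return counts

def anomaly_score(sender, receiver, counts, cluster_of):
    """1 minus the share of the sender cluster's past transactions that went to the receiver's cluster."""
    src, dst = cluster_of[sender], cluster_of[receiver]
    total = sum(n for (s, _), n in counts.items() if s == src)
    if total == 0:
        return 1.0  # no history for this cluster: treat as maximally unusual
    return 1.0 - counts[(src, dst)] / total

counts = cluster_transition_counts(history, cluster_of)
within = anomaly_score("a1", "a2", counts, cluster_of)
across = anomaly_score("a1", "b2", counts, cluster_of)
unseen = anomaly_score("a1", "c1", counts, cluster_of)
print(within, across, unseen)
```

A transaction inside cluster A scores low, a transfer to cluster B scores higher, and a transfer to a cluster that A has never transacted with receives the maximum score.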
Identifying Motives and Motifs
By focusing on relationships and links rather than individual data points, a network approach allows better identification of suspicious groups and subgroups, whose recurring connection patterns are known more technically as network motifs. Here we list some patterns (or motifs) that could be indicative of suspicious motives in a banking transaction network:
- Highly connected customers that are not businesses, as seen from a centrality measure such as the PageRank score. An average customer tends to have a relatively small number of trusted parties with whom they transact. Highly connected customers, especially across multiple cliques, could be indicative of possible illegal activity such as phishing scams.
- Clusters of connected personal information points, e.g. contact number, device ID, IP address and accounts. It is rare for such PII to be shared across customers. Such patterns could be indicative of attempts to use stolen credentials for more sophisticated scams or an attempt by the adversary to create mule accounts to bypass traditional screening methods.
- Circular movement of funds between accounts or nodes. A visual representation of such behaviour would be patterns of triangles, squares or other cyclic patterns. Such behaviours suggest efforts to obscure the provenance of funds and should be flagged for possible money laundering violations.
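As an illustration of the second pattern, the sketch below groups accounts that are linked, directly or through a chain of other accounts, by a shared identifier. The account records, the identifier strings and the `shared_identifier_groups` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical account records: account -> identifiers observed for it
records = {
    "acc1": {"phone:111", "device:X"},
    "acc2": {"phone:222", "device:X"},
    "acc3": {"phone:222", "ip:10.0.0.5"},
    "acc4": {"phone:333", "device:Y"},
}

def shared_identifier_groups(records, min_size=2):
    """Group accounts connected, directly or transitively, by a shared identifier."""
    accounts_by_identifier = defaultdict(set)
    for account, identifiers in records.items():
        for identifier in identifiers:
            accounts_by_identifier[identifier].add(account)

    seen, groups = set(), []
    for start in sorted(records):
        if start in seen:
            continue
        # Depth-first walk: account -> identifier -> other accounts using it
        group, stack = set(), [start]
        while stack:
            account = stack.pop()
            if account in group:
                continue
            group.add(account)
            for identifier in records[account]:
                stack.extend(accounts_by_identifier[identifier] - group)
        seen |= group
        if len(group) >= min_size:
            groups.append(sorted(group))
    return groups

groups = shared_identifier_groups(records)
print(groups)
```

Here `acc1` and `acc3` share nothing directly, yet both end up in the same group through `acc2`. That transitive link is easy to miss when the data is inspected one table row at a time.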
Other Benefits of Using Graph Network Analysis
Besides providing access to a wide range of network science algorithms, a network approach brings several other benefits to fraud investigation and data exploration.
Data Exploration
Visualising connections as a graph often helps in decision-making and provides better clarity and context than information in tabular form. This increases the efficiency of incident response teams and makes complex relationship structures more easily understood.
Single Source of Truth
Traditional methods of manual screening involve a long and tedious process since data is often stored in multiple silos.
Graph database solutions allow many different types of data to be stored in a single source of truth. Information on a particular customer's behaviours and fraudulent activities can be easily queried and retrieved.
Transparency and Explainability
Clear graphical evidence provides better transparency and explainability. As regulations and requirements on machine learning explainability and fairness grow, network science offers a good middle ground between accuracy and explainability.
Pattern matching is a relatively simple idea to grasp, and many graph techniques are grounded in strong mathematical and statistical theory.
In Summary
Financial institutions and banks need to keep up with the increasingly sophisticated techniques and methodologies used by fraudsters. Network science offers a comprehensive set of tools to uncover suspicious patterns and activities.
At Cylynx, we aim to simplify the adoption of graph technology for financial institutions. We are piloting transaction monitoring solutions with financial institutions to deploy state-of-the-art techniques for risk screening and monitoring. We are also developing Motif, a business intelligence platform for graph data which can help financial institutions:
1. Detect and monitor fraudulent activities
2. Understand connections and relationships between different accounts
3. Identify suspicious networks
4. Improve employee productivity by decreasing the time spent on false-positive cases
It helps connect data across traditional databases, graph databases or other information silos to form a clear picture of user activities and behaviour. Get in touch with us for a pilot trial today.