SQL for Fraud Analytics — 5 Patterns Every Fraud Analyst Writes Weekly

Published May 2026 · 14 min read · 🛡️ Fraud / Banking

A fraud analyst's day is mostly SQL. Not Python notebooks, not dashboards — raw queries hunting for anomalies, velocity spikes, geographic mismatches, and cross-account collusion. The patterns are recurring. Five of them cover roughly 80% of investigative work. This is a hands-on tour of those five, with runnable examples on real FDIC banking data and a pointer to the transaction-level practice now live in SQL Quest's fraud track.

In this article

  1. Why fraud analytics is SQL-heavy
  2. Bank-level vs transaction-level — same SQL, different data
  3. Pattern 1: Statistical anomaly bounds (3-sigma, IQR)
  4. Pattern 2: Velocity rules (events per window)
  5. Pattern 3: Geographic / temporal mismatch
  6. Pattern 4: Cross-account self-join collusion
  7. Pattern 5: Recursive CTE chargeback investigation
  8. What you can practice today on FDIC data
  9. From SQL fluency to fraud analyst role

Why fraud analytics is SQL-heavy

Fraud is the inverse of analytics. Analytics aggregates the known: how much revenue did we book this quarter, how does the retention curve look. Fraud hunts the unknown: which transactions look wrong, which accounts share suspicious markers, which chargebacks chain back to a single bad actor.

The mismatch matters because most BI tools are optimized for analytics. They show you what something is, not what something shouldn't be. Fraud analysts therefore spend more time in raw SQL than analytics teammates do — they're writing one-off investigative queries, not maintaining dashboards.

Three reasons SQL stays central even as ML-based fraud detection grows:

  1. ML models raise alerts, but a human analyst confirms or dismisses each one, and that confirmation step is ad-hoc SQL against raw transaction tables.
  2. The features those models consume (velocity counts, distance deltas, shared-marker links) are built with SQL pipelines.
  3. Regulators and dispute networks demand explainable evidence; a query you can show is easier to defend than a model score you can't unpack.

If you want a fraud analyst job at a bank, fintech, payments processor, or insurer, your SQL needs to be sharper than a generic data analyst's. Not because the syntax is harder — it's the same SQL — but because the patterns are different.

Bank-level vs transaction-level — same SQL, different data

One distinction worth pinning down before the patterns: fraud analysis splits along data altitude.

Layer | What you analyze | Used by
Bank-level | NPL ratio, asset growth, capital adequacy, branch concentration | Regulators (FDIC, OCC), risk teams
Account-level | Customer behavior baselines, KYC patterns, dormant-then-active | Compliance, AML investigators
Transaction-level | Card swipes, transfers, deposits, chargebacks | Fraud analysts at banks/fintechs

The five patterns below work at every layer. The data shape changes; the SQL is roughly the same. A 3-sigma cutoff on NPL ratio teaches the same skill as a 3-sigma cutoff on transaction amounts. We'll show examples mostly at the transaction layer (where most fraud-analyst work happens) and call out where each maps to bank-level data you can practice on FDIC's public dataset today.

Pattern 1: Statistical anomaly bounds

The simplest fraud detection: define what "normal" looks like, flag everything outside.

Two common bound shapes: the Z-score cutoff (flag anything more than 3 standard deviations from the mean) and the IQR fence (flag anything more than 1.5× the interquartile range beyond the quartiles). Here's the 3-sigma version; an IQR sketch follows the caveat below.

-- Pattern 1: 3-sigma anomaly cutoff on transaction amounts
WITH mean_calc AS (
  SELECT AVG(amount) AS mean_amount
  FROM transactions
  WHERE txn_date >= DATE('now', '-30 days')
),
baseline AS (
  SELECT
    m.mean_amount,
    -- SQLite has no built-in STDDEV; compute it manually over the
    -- same 30-day window. Most other engines accept STDDEV(amount).
    SQRT(AVG((t.amount - m.mean_amount) * (t.amount - m.mean_amount))) AS sd_amount
  FROM transactions t CROSS JOIN mean_calc m
  WHERE t.txn_date >= DATE('now', '-30 days')
  GROUP BY m.mean_amount
)
SELECT
  t.txn_id,
  t.account_id,
  t.amount,
  t.txn_date,
  ROUND((t.amount - b.mean_amount) / b.sd_amount, 2) AS z_score
FROM transactions t CROSS JOIN baseline b
WHERE ABS((t.amount - b.mean_amount) / b.sd_amount) > 3
ORDER BY z_score DESC;

The query computes the mean and standard deviation across the last 30 days, then flags any transaction whose Z-score is above 3 (positive) or below -3 (negative). Output: a list of suspicious transactions ordered by how far from the mean they are.

Caveat: 3-sigma assumes a roughly normal distribution, and transaction amounts are heavy-tailed. If you only flag beyond 3-sigma, you'll catch the obvious mega-outliers and miss the cluster of slightly-fishy amounts sitting just under the cutoff. In real fraud work, you usually combine 3-sigma with percentile-based bounds.
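Here's a minimal IQR-fence sketch to pair with it, assuming PostgreSQL's PERCENTILE_CONT and the same transactions table (SQLite has no ordered-set aggregates, so you'd precompute the quartiles another way):

-- IQR fence: flag amounts beyond 1.5x the interquartile range (PostgreSQL)
WITH quartiles AS (
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
  FROM transactions
  WHERE txn_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT t.txn_id, t.account_id, t.amount, t.txn_date
FROM transactions t CROSS JOIN quartiles q
WHERE t.amount > q.q3 + 1.5 * (q.q3 - q.q1)   -- above the upper fence
   OR t.amount < q.q1 - 1.5 * (q.q3 - q.q1);  -- below the lower fence

Because the fences come from quartiles rather than the mean, a single giant outlier can't drag the cutoffs with it, which keeps the rule stable on heavy-tailed data.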

Mapping to bank-level (practice today)

The same Z-score logic flags banks with abnormal NPL ratios. Replace amount with npl_ratio and transactions with financials. Banks far above the mean NPL are credit-quality outliers — exactly what regulators look for. SQL Quest's Banking track includes an "Above-Median NPL Banks" challenge that walks you through this pattern.
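A quick sketch of that swap, assuming a financials table with an npl_ratio column as described above (bank_id is a hypothetical key column; most warehouse engines ship STDDEV built in):

-- Bank-level Z-score: flag credit-quality outliers on NPL ratio
WITH baseline AS (
  SELECT AVG(npl_ratio) AS mean_npl, STDDEV(npl_ratio) AS sd_npl
  FROM financials
)
SELECT f.bank_id, f.npl_ratio,
       ROUND((f.npl_ratio - b.mean_npl) / b.sd_npl, 2) AS z_score
FROM financials f CROSS JOIN baseline b
WHERE (f.npl_ratio - b.mean_npl) / b.sd_npl > 3
ORDER BY z_score DESC;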

Pattern 2: Velocity rules (events per window)

Velocity is fraud's most reliable signal. A legitimate user makes roughly five card transactions per day; a compromised card hits 50 transactions in 30 minutes as the fraudster tests stolen credentials before the bank blocks the card. Counting events in a sliding window reliably catches this.

-- Pattern 2: Velocity rule using window functions
SELECT
  account_id,
  txn_id,
  amount,
  txn_at,
  COUNT(*) OVER (
    PARTITION BY account_id
    ORDER BY txn_at
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS txns_last_5_min,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY txn_at
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS spend_last_5_min
FROM transactions
QUALIFY
  txns_last_5_min >= 5
  OR spend_last_5_min > 2000;

For each transaction, compute how many transactions and how much total spend occurred on the same account in the last 5 minutes. Flag accounts hitting either threshold. Tune the thresholds to your loss tolerance — lower thresholds catch more fraud but generate more false positives.

QUALIFY is Snowflake/BigQuery/Databricks syntax for filtering on window-function output. PostgreSQL and SQLite lack it and need a CTE wrapper. Interval-based RANGE frames also vary by engine: PostgreSQL 11+ accepts INTERVAL offsets directly, while SQLite only allows numeric RANGE offsets, so you order by epoch seconds instead (a SQLite sketch follows the CTE version below):

-- Same query, PostgreSQL-compatible (no QUALIFY)
WITH velocity AS (
  SELECT
    account_id, txn_id, amount, txn_at,
    COUNT(*) OVER (
      PARTITION BY account_id
      ORDER BY txn_at
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS txns_last_5_min,
    SUM(amount) OVER (
      PARTITION BY account_id
      ORDER BY txn_at
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS spend_last_5_min
  FROM transactions
)
SELECT * FROM velocity
WHERE txns_last_5_min >= 5 OR spend_last_5_min > 2000;
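For SQLite specifically, a sketch that orders the window by epoch seconds so the RANGE frame can use a plain numeric offset (300 seconds = 5 minutes). The unixepoch() function needs SQLite 3.38+; on older versions, substitute CAST(strftime('%s', txn_at) AS INTEGER):

-- SQLite variant: numeric RANGE frame over epoch seconds
WITH velocity AS (
  SELECT
    account_id, txn_id, amount, txn_at,
    COUNT(*) OVER (
      PARTITION BY account_id
      ORDER BY unixepoch(txn_at)
      RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS txns_last_5_min,
    SUM(amount) OVER (
      PARTITION BY account_id
      ORDER BY unixepoch(txn_at)
      RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS spend_last_5_min
  FROM transactions
)
SELECT * FROM velocity
WHERE txns_last_5_min >= 5 OR spend_last_5_min > 2000;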

Interview tip: Window functions with RANGE BETWEEN ... PRECEDING are among the most frequently asked SQL interview patterns at fintech and payments companies (Stripe, Square, PayPal, Capital One). If you can write a velocity rule cold in 90 seconds, you've cleared a major hurdle.

Pattern 3: Geographic / temporal mismatch

A card swiped in New York at 10:00 cannot legitimately be swiped in Tokyo at 10:05. An account's transactions should trace a coherent location-time profile; when they don't, something is wrong.

-- Pattern 3: Cards swiped impossibly far apart in short window
WITH ranked_swipes AS (
  SELECT
    account_id,
    txn_at,
    location_lat,
    location_lng,
    LAG(txn_at)       OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_at,
    LAG(location_lat) OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_lat,
    LAG(location_lng) OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_lng
  FROM transactions
),
with_distance AS (
  SELECT
    account_id,
    txn_at,
    prev_at,
    -- Great-circle distance in km (spherical law of cosines), simplified for illustration
    ROUND(
      6371 * ACOS(
        COS(RADIANS(prev_lat)) * COS(RADIANS(location_lat)) *
        COS(RADIANS(location_lng) - RADIANS(prev_lng)) +
        SIN(RADIANS(prev_lat)) * SIN(RADIANS(location_lat))
      )
    ) AS distance_km,
    (EXTRACT(EPOCH FROM (txn_at - prev_at)) / 60.0) AS minutes_between
  FROM ranked_swipes
  WHERE prev_at IS NOT NULL
)
SELECT
  account_id,
  txn_at,
  prev_at,
  distance_km,
  minutes_between
FROM with_distance
-- Column aliases can't be referenced in the same SELECT's WHERE, hence the extra CTE
WHERE distance_km > 200
  AND minutes_between < 30;

This catches the classic "impossible travel" pattern: same account, two transactions more than 200 km apart within 30 minutes. The distance expression is the spherical law of cosines, a simplified stand-in for the full Haversine formula; both compute great-circle distance between two lat/lng points. SQL engines without trig functions will need a UDF or an external service.

Real production rules layer in airline travel time (200 km in 30 min is impossible, but 2000 km in 4 hours is fine if there's a flight) and known-VPN flags (a VPN hop can fake the geography). The pure-SQL version is the starting point.
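One way to fold travel time into pure SQL is to flag on implied speed rather than a fixed distance/time pair. A minimal sketch reusing the with_distance CTE from the query above; the 900 km/h cutoff (roughly the cruising speed of a commercial jet) is an assumption you'd tune:

-- Speed-based variant: flag physically impossible implied speed
SELECT
  account_id,
  txn_at,
  prev_at,
  distance_km,
  minutes_between,
  ROUND(distance_km / (minutes_between / 60.0)) AS implied_kmh  -- km per hour implied by the gap
FROM with_distance
WHERE minutes_between > 0
  AND distance_km / (minutes_between / 60.0) > 900;

This version lets 2000 km in 4 hours pass (500 km/h, plausible by air) while still flagging 200 km in 10 minutes (1200 km/h).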

Pattern 4: Cross-account self-join collusion

Sophisticated fraudsters spread activity across many accounts to stay under per-account velocity rules. The detection: find accounts that share suspicious markers — same email domain, same device, same shipping address, same IP block.

-- Pattern 4: Accounts sharing the same device fingerprint
SELECT
  a1.account_id AS account_a,
  a2.account_id AS account_b,
  a1.device_fingerprint,
  a1.created_at AS a_created,
  a2.created_at AS b_created,
  ABS(EXTRACT(EPOCH FROM (a1.created_at - a2.created_at)) / 3600.0) AS hours_apart
FROM accounts a1
JOIN accounts a2
  ON a1.device_fingerprint = a2.device_fingerprint
  AND a1.account_id < a2.account_id  -- avoid double-counting and self-pairs
WHERE a1.device_fingerprint IS NOT NULL
ORDER BY hours_apart ASC;

The a1.account_id < a2.account_id trick is essential — without it you get every pair twice plus self-pairs. Most pairs are innocent (shared family device, public computer), but the cluster of accounts created within minutes of each other on the same device is a classic synthetic-identity fraud pattern.

Production fraud teams extend this to ~5-10 dimensions simultaneously: device + email-domain + ip-block + browser-fingerprint + behavioral-pattern. The SQL is the same shape — just more JOIN conditions.
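A hedged sketch of that extension, scoring each device-matched pair by how many additional markers line up; email_domain and ip_block are hypothetical column names standing in for whatever your accounts table actually carries:

-- Score device-matched pairs by additional shared markers
SELECT
  a1.account_id AS account_a,
  a2.account_id AS account_b,
  1  -- device match is guaranteed by the join condition
    + CASE WHEN a1.email_domain = a2.email_domain THEN 1 ELSE 0 END
    + CASE WHEN a1.ip_block = a2.ip_block THEN 1 ELSE 0 END
    AS shared_markers
FROM accounts a1
JOIN accounts a2
  ON a1.device_fingerprint = a2.device_fingerprint
  AND a1.account_id < a2.account_id
WHERE a1.device_fingerprint IS NOT NULL
ORDER BY shared_markers DESC;

Pairs that match on three markers at once are rarely innocent, so sorting by the score floats the strongest collusion candidates to the top.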

Pattern 5: Recursive CTE chargeback investigation

When a customer disputes a transaction, the chargeback team needs to understand the chain: what merchant, what original card, what previous chargebacks on that account, what disputes on that merchant. Recursive CTEs walk the chain.

-- Pattern 5: Recursive walk through related disputes
WITH RECURSIVE dispute_chain AS (
  -- Anchor: the dispute we're investigating
  SELECT
    dispute_id,
    account_id,
    merchant_id,
    related_dispute_id,
    1 AS depth
  FROM disputes
  WHERE dispute_id = 12345

  UNION ALL

  -- Recursive: follow same-account or same-merchant disputes
  SELECT
    d.dispute_id,
    d.account_id,
    d.merchant_id,
    d.related_dispute_id,
    chain.depth + 1
  FROM disputes d
  JOIN dispute_chain chain
    ON (d.account_id = chain.account_id OR d.merchant_id = chain.merchant_id)
    AND d.dispute_id != chain.dispute_id
  WHERE chain.depth < 5  -- prevent runaway recursion
)
SELECT DISTINCT dispute_id, account_id, merchant_id, depth
FROM dispute_chain
ORDER BY depth, dispute_id;

This finds every dispute connected through shared-account or shared-merchant edges, up to 5 hops away. The depth < 5 guard prevents runaway recursion if the dispute graph has cycles. Output: the connected cluster of disputes around the original investigation.

Recursive CTEs intimidate analysts who haven't used them; they're actually the cleanest way to walk hierarchical or graph-like data. Worth practicing until they feel routine — they show up frequently in fraud investigation interviews.
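On PostgreSQL, a path-array guard is a common hardening step on top of the depth cap. A sketch of the same walk that refuses to revisit any dispute (ARRAY and <> ALL are PostgreSQL-specific):

-- Cycle-safe variant: track visited disputes in a path array (PostgreSQL)
WITH RECURSIVE dispute_chain AS (
  SELECT dispute_id, account_id, merchant_id, 1 AS depth,
         ARRAY[dispute_id] AS path
  FROM disputes
  WHERE dispute_id = 12345

  UNION ALL

  SELECT d.dispute_id, d.account_id, d.merchant_id, chain.depth + 1,
         chain.path || d.dispute_id
  FROM disputes d
  JOIN dispute_chain chain
    ON (d.account_id = chain.account_id OR d.merchant_id = chain.merchant_id)
  WHERE d.dispute_id <> ALL (chain.path)  -- skip anything already visited
    AND chain.depth < 5
)
SELECT DISTINCT dispute_id, account_id, merchant_id, depth
FROM dispute_chain
ORDER BY depth, dispute_id;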

What you can practice today on FDIC data

SQL Quest's Banking & Finance track covers all five pattern shapes at the bank-level layer using the FDIC public dataset. The patterns transfer 1:1 to transaction-level work — same SQL skills, different data table.

Pattern | FDIC equivalent challenge
Anomaly bounds | "Above-Median NPL Banks" — flag credit-quality outliers
Velocity (rate of change) | "QoQ Asset Growth — Top Banks" — detect abnormal asset-base movement using LAG
Mismatch (geographic) | "Bank Count Per State" / "Top 3 Banks Per State" — concentration analysis (PARTITION BY state)
Cross-account JOIN | "Bank with Branch Count and Income" — join institutions ↔ branches ↔ financials
Failure investigation | "Failed Banks Recent Decade" — timeline of failures + estimated_loss patterns
Practice fraud-adjacent SQL on FDIC data →

🆕 Now live — fraud_transactions track: A dedicated synthetic credit-card transaction dataset is now in SQL Quest's Banking & Finance track. 200 accounts, 2,000 transactions, 76 chargebacks, with realistic fraud injection: 15 amount outliers (3-sigma+), 4 velocity bursts (10+ txns in 5 min), 3 impossible-travel pairs (NYC→Tokyo in 11 min), 5 accounts in a collusion ring (shared device fingerprint), and 3-deep chargeback chains. Five new challenges (#270-274) walk through each pattern with runnable SQL. Start with Challenge #270 →, or jump to the dedicated Fraud Analytics SQL practice landing page for all 12 challenges, the dataset schema, and career context.

From SQL fluency to fraud analyst role

If you want to break into fraud analytics from a non-fraud background (banker, data analyst, career switcher), here's a tested ladder:

  1. Get the five patterns above cold. Be able to write each one in 10 minutes without reference. This is interview-ready territory.
  2. Learn the domain. Read about chargebacks (the Visa/MC dispute lifecycle), KYC (Know Your Customer), AML (Anti-Money Laundering, the BSA/SAR framework), card fraud taxonomy (CNP vs CP, ATO, synthetic identity, mule accounts).
  3. Build a portfolio project. Use a public fraud dataset (Kaggle "IEEE-CIS Fraud Detection" is the most-cited; Capital One has a synthetic dataset). Write 5-10 investigative queries. Document them on GitHub. Fraud teams hire on portfolio + SQL more than degree.
  4. Apply to fraud-adjacent roles first. Risk analyst, payments-ops analyst, transaction-monitoring analyst — these are easier entry points than direct fraud-analyst openings, and they hand you transaction data exposure.
  5. Move internal once you're inside. Banks and fintechs strongly prefer internal moves into fraud teams. A year in transaction monitoring at a bank is worth more than a year of external interviewing.

The role pays well — entry fraud analysts at US banks/fintechs earn $70-100k, senior fraud analysts $120-180k, fraud strategy/leadership $200k+. The work is intellectually engaging (you're hunting bad actors), and demand is structurally growing: fraud volume scales with transaction volume, which scales with internet payments.

Practice on real banking data — free tier

SQL Quest's Banking & Finance track has 20 challenges on real FDIC data covering the five fraud-adjacent SQL patterns. A wrong-answer diagnostic explains exactly which row mismatched and why, and an AI Coach helps when you're stuck.

Start the Banking SQL track →