SQL for Fraud Analytics — 5 Patterns Every Fraud Analyst Writes Weekly

Published May 2026 · 14 min read · 🛡️ Fraud / Banking

A fraud analyst's day is mostly SQL. Not Python notebooks, not dashboards — raw queries hunting for anomalies, velocity spikes, geographic mismatches, and cross-account collusion. The patterns are recurring. Five of them cover roughly 80% of investigative work. This is a hands-on tour of those five, with runnable examples on real FDIC banking data and a pointer to the transaction-level practice now live in SQL Quest's fraud track.

In this article

  1. Why fraud analytics is SQL-heavy
  2. Bank-level vs transaction-level — same SQL, different data
  3. Pattern 1: Statistical anomaly bounds (3-sigma, IQR)
  4. Pattern 2: Velocity rules (events per window)
  5. Pattern 3: Geographic / temporal mismatch
  6. Pattern 4: Cross-account self-join collusion
  7. Pattern 5: Recursive CTE chargeback investigation
  8. What you can practice today on FDIC data
  9. From SQL fluency to fraud analyst role

Why fraud analytics is SQL-heavy

Fraud is the inverse of analytics. Analytics aggregates the known: how much revenue did we book this quarter, how does the retention curve look. Fraud hunts the unknown: which transactions look wrong, which accounts share suspicious markers, which chargebacks chain back to a single bad actor.

The mismatch matters because most BI tools are optimized for analytics. They show you what something is, not what something shouldn't be. Fraud analysts therefore spend more time in raw SQL than analytics teammates do — they're writing one-off investigative queries, not maintaining dashboards.

Three reasons SQL stays central even as ML-based fraud detection grows:

  1. ML models raise alerts, but a human analyst confirms or dismisses each one, and that confirmation step is ad-hoc SQL against raw transaction tables.
  2. The features those models consume (velocity counts, distance deltas, shared-marker links) are built with SQL pipelines.
  3. Regulators and dispute networks demand explainable evidence; a query you can show is easier to defend than a model score you can't unpack.

If you want a fraud analyst job at a bank, fintech, payments processor, or insurer, your SQL needs to be sharper than a generic data analyst's. Not because the syntax is harder — it's the same SQL — but because the patterns are different.

Bank-level vs transaction-level — same SQL, different data

One distinction worth pinning down before the patterns: fraud analysis splits along data altitude.

Layer | What you analyze | Used by
Bank-level | NPL ratio, asset growth, capital adequacy, branch concentration | Regulators (FDIC, OCC), risk teams
Account-level | Customer behavior baselines, KYC patterns, dormant-then-active | Compliance, AML investigators
Transaction-level | Card swipes, transfers, deposits, chargebacks | Fraud analysts at banks/fintechs

The five patterns below work at every layer. The data shape changes; the SQL is roughly the same. A 3-sigma cutoff on NPL ratio teaches the same skill as a 3-sigma cutoff on transaction amounts. We'll show examples mostly at the transaction layer (where most fraud-analyst work happens) and call out where each maps to bank-level data you can practice on FDIC's public dataset today.

Pattern 1: Statistical anomaly bounds

The simplest fraud detection: define what "normal" looks like, flag everything outside.

Two common bound shapes: the Z-score cutoff (flag anything more than 3 standard deviations from the mean) and the IQR fence (flag anything more than 1.5× the interquartile range beyond the quartiles). Here's the 3-sigma version; an IQR sketch follows the caveat below.

-- Pattern 1: 3-sigma anomaly cutoff on transaction amounts
WITH mean_calc AS (
  SELECT AVG(amount) AS mean_amount
  FROM transactions
  WHERE txn_date >= DATE('now', '-30 days')
),
baseline AS (
  SELECT
    m.mean_amount,
    -- SQLite has no built-in STDDEV; compute it manually over the
    -- same 30-day window. Most other engines accept STDDEV(amount).
    SQRT(AVG((t.amount - m.mean_amount) * (t.amount - m.mean_amount))) AS sd_amount
  FROM transactions t CROSS JOIN mean_calc m
  WHERE t.txn_date >= DATE('now', '-30 days')
  GROUP BY m.mean_amount
)
SELECT
  t.txn_id,
  t.account_id,
  t.amount,
  t.txn_date,
  ROUND((t.amount - b.mean_amount) / b.sd_amount, 2) AS z_score
FROM transactions t CROSS JOIN baseline b
WHERE ABS((t.amount - b.mean_amount) / b.sd_amount) > 3
ORDER BY z_score DESC;

The query computes the mean and standard deviation across the last 30 days, then flags any transaction whose Z-score is above 3 (positive) or below -3 (negative). Output: a list of suspicious transactions ordered by how far from the mean they are.

Caveat: 3-sigma assumes a roughly normal distribution, and transaction amounts are heavy-tailed. If you only flag beyond 3-sigma, you'll catch the obvious mega-outliers and miss the cluster of slightly-fishy amounts sitting just under the cutoff. In real fraud work, you usually combine 3-sigma with percentile-based bounds.
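Here's a minimal IQR-fence sketch to pair with it, assuming PostgreSQL's PERCENTILE_CONT and the same transactions table (SQLite has no ordered-set aggregates, so you'd precompute the quartiles another way):

-- IQR fence: flag amounts beyond 1.5x the interquartile range (PostgreSQL)
WITH quartiles AS (
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
  FROM transactions
  WHERE txn_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT t.txn_id, t.account_id, t.amount, t.txn_date
FROM transactions t CROSS JOIN quartiles q
WHERE t.amount > q.q3 + 1.5 * (q.q3 - q.q1)   -- above the upper fence
   OR t.amount < q.q1 - 1.5 * (q.q3 - q.q1);  -- below the lower fence

Because the fences come from quartiles rather than the mean, a single giant outlier can't drag the cutoffs with it, which keeps the rule stable on heavy-tailed data.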

Mapping to bank-level (practice today)

The same Z-score logic flags banks with abnormal NPL ratios. Replace amount with npl_ratio and transactions with financials. Banks far above the mean NPL are credit-quality outliers — exactly what regulators look for. SQL Quest's Banking track includes an "Above-Median NPL Banks" challenge that walks you through this pattern.
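A quick sketch of that swap, assuming a financials table with an npl_ratio column as described above (bank_id is a hypothetical key column; most warehouse engines ship STDDEV built in):

-- Bank-level Z-score: flag credit-quality outliers on NPL ratio
WITH baseline AS (
  SELECT AVG(npl_ratio) AS mean_npl, STDDEV(npl_ratio) AS sd_npl
  FROM financials
)
SELECT f.bank_id, f.npl_ratio,
       ROUND((f.npl_ratio - b.mean_npl) / b.sd_npl, 2) AS z_score
FROM financials f CROSS JOIN baseline b
WHERE (f.npl_ratio - b.mean_npl) / b.sd_npl > 3
ORDER BY z_score DESC;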

Pattern 2: Velocity rules (events per window)

Velocity is fraud's most reliable signal. A legitimate user makes roughly five card transactions per day; a compromised card hits 50 transactions in 30 minutes as the fraudster tests stolen credentials before the bank blocks the card. Counting events in a sliding window reliably catches this.

-- Pattern 2: Velocity rule using window functions
SELECT
  account_id,
  txn_id,
  amount,
  txn_at,
  COUNT(*) OVER (
    PARTITION BY account_id
    ORDER BY txn_at
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS txns_last_5_min,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY txn_at
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS spend_last_5_min
FROM transactions
QUALIFY
  txns_last_5_min >= 5
  OR spend_last_5_min > 2000;

For each transaction, compute how many transactions and how much total spend occurred on the same account in the last 5 minutes. Flag accounts hitting either threshold. Tune the thresholds to your loss tolerance — lower thresholds catch more fraud but generate more false positives.

QUALIFY is Snowflake/BigQuery/Databricks syntax for filtering on window-function output. PostgreSQL and SQLite lack it and need a CTE wrapper. Interval-based RANGE frames also vary by engine: PostgreSQL 11+ accepts INTERVAL offsets directly, while SQLite only allows numeric RANGE offsets, so you order by epoch seconds instead (a SQLite sketch follows the CTE version below):

-- Same query, PostgreSQL-compatible (no QUALIFY)
WITH velocity AS (
  SELECT
    account_id, txn_id, amount, txn_at,
    COUNT(*) OVER (
      PARTITION BY account_id
      ORDER BY txn_at
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS txns_last_5_min,
    SUM(amount) OVER (
      PARTITION BY account_id
      ORDER BY txn_at
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS spend_last_5_min
  FROM transactions
)
SELECT * FROM velocity
WHERE txns_last_5_min >= 5 OR spend_last_5_min > 2000;
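For SQLite specifically, a sketch that orders the window by epoch seconds so the RANGE frame can use a plain numeric offset (300 seconds = 5 minutes). The unixepoch() function needs SQLite 3.38+; on older versions, substitute CAST(strftime('%s', txn_at) AS INTEGER):

-- SQLite variant: numeric RANGE frame over epoch seconds
WITH velocity AS (
  SELECT
    account_id, txn_id, amount, txn_at,
    COUNT(*) OVER (
      PARTITION BY account_id
      ORDER BY unixepoch(txn_at)
      RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS txns_last_5_min,
    SUM(amount) OVER (
      PARTITION BY account_id
      ORDER BY unixepoch(txn_at)
      RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS spend_last_5_min
  FROM transactions
)
SELECT * FROM velocity
WHERE txns_last_5_min >= 5 OR spend_last_5_min > 2000;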

Interview tip: Window functions with RANGE BETWEEN ... PRECEDING are among the most frequently asked SQL interview patterns at fintech and payments companies (Stripe, Square, PayPal, Capital One). If you can write a velocity rule cold in 90 seconds, you've cleared a major hurdle.

Pattern 3: Geographic / temporal mismatch

A card swiped in New York at 10:00 cannot legitimately be swiped in Tokyo at 10:05. An account's transactions should trace a coherent location-time profile; when they don't, something is wrong.

-- Pattern 3: Cards swiped impossibly far apart in short window
WITH ranked_swipes AS (
  SELECT
    account_id,
    txn_at,
    location_lat,
    location_lng,
    LAG(txn_at)       OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_at,
    LAG(location_lat) OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_lat,
    LAG(location_lng) OVER (PARTITION BY account_id ORDER BY txn_at) AS prev_lng
  FROM transactions
),
with_distance AS (
  SELECT
    account_id,
    txn_at,
    prev_at,
    -- Great-circle distance in km (spherical law of cosines), simplified for illustration
    ROUND(
      6371 * ACOS(
        COS(RADIANS(prev_lat)) * COS(RADIANS(location_lat)) *
        COS(RADIANS(location_lng) - RADIANS(prev_lng)) +
        SIN(RADIANS(prev_lat)) * SIN(RADIANS(location_lat))
      )
    ) AS distance_km,
    (EXTRACT(EPOCH FROM (txn_at - prev_at)) / 60.0) AS minutes_between
  FROM ranked_swipes
  WHERE prev_at IS NOT NULL
)
SELECT
  account_id,
  txn_at,
  prev_at,
  distance_km,
  minutes_between
FROM with_distance
-- Column aliases can't be referenced in the same SELECT's WHERE, hence the extra CTE
WHERE distance_km > 200
  AND minutes_between < 30;

This catches the classic "impossible travel" pattern: same account, two transactions more than 200 km apart within 30 minutes. The distance expression is the spherical law of cosines, a simplified stand-in for the full Haversine formula; both compute great-circle distance between two lat/lng points. SQL engines without trig functions will need a UDF or an external service.

Real production rules layer in airline travel time (200 km in 30 min is impossible, but 2000 km in 4 hours is fine if there's a flight) and known-VPN flags (a VPN hop can fake the geography). The pure-SQL version is the starting point.
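One way to fold travel time into pure SQL is to flag on implied speed rather than a fixed distance/time pair. A minimal sketch reusing the with_distance CTE from the query above; the 900 km/h cutoff (roughly the cruising speed of a commercial jet) is an assumption you'd tune:

-- Speed-based variant: flag physically impossible implied speed
SELECT
  account_id,
  txn_at,
  prev_at,
  distance_km,
  minutes_between,
  ROUND(distance_km / (minutes_between / 60.0)) AS implied_kmh  -- km per hour implied by the gap
FROM with_distance
WHERE minutes_between > 0
  AND distance_km / (minutes_between / 60.0) > 900;

This version lets 2000 km in 4 hours pass (500 km/h, plausible by air) while still flagging 200 km in 10 minutes (1200 km/h).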

Pattern 4: Cross-account self-join collusion

Sophisticated fraudsters spread activity across many accounts to stay under per-account velocity rules. The detection: find accounts that share suspicious markers — same email domain, same device, same shipping address, same IP block.

-- Pattern 4: Accounts sharing the same device fingerprint
SELECT
  a1.account_id AS account_a,
  a2.account_id AS account_b,
  a1.device_fingerprint,
  a1.created_at AS a_created,
  a2.created_at AS b_created,
  ABS(EXTRACT(EPOCH FROM (a1.created_at - a2.created_at)) / 3600.0) AS hours_apart
FROM accounts a1
JOIN accounts a2
  ON a1.device_fingerprint = a2.device_fingerprint
  AND a1.account_id < a2.account_id  -- avoid double-counting and self-pairs
WHERE a1.device_fingerprint IS NOT NULL
ORDER BY hours_apart ASC;

The a1.account_id < a2.account_id trick is essential — without it you get every pair twice plus self-pairs. Most pairs are innocent (shared family device, public computer), but the cluster of accounts created within minutes of each other on the same device is a classic synthetic-identity fraud pattern.

Production fraud teams extend this to ~5-10 dimensions simultaneously: device + email-domain + ip-block + browser-fingerprint + behavioral-pattern. The SQL is the same shape — just more JOIN conditions.
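A hedged sketch of that extension, scoring each device-matched pair by how many additional markers line up; email_domain and ip_block are hypothetical column names standing in for whatever your accounts table actually carries:

-- Score device-matched pairs by additional shared markers
SELECT
  a1.account_id AS account_a,
  a2.account_id AS account_b,
  1  -- device match is guaranteed by the join condition
    + CASE WHEN a1.email_domain = a2.email_domain THEN 1 ELSE 0 END
    + CASE WHEN a1.ip_block = a2.ip_block THEN 1 ELSE 0 END
    AS shared_markers
FROM accounts a1
JOIN accounts a2
  ON a1.device_fingerprint = a2.device_fingerprint
  AND a1.account_id < a2.account_id
WHERE a1.device_fingerprint IS NOT NULL
ORDER BY shared_markers DESC;

Pairs that match on three markers at once are rarely innocent, so sorting by the score floats the strongest collusion candidates to the top.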

Pattern 5: Recursive CTE chargeback investigation

When a customer disputes a transaction, the chargeback team needs to understand the chain: what merchant, what original card, what previous chargebacks on that account, what disputes on that merchant. Recursive CTEs walk the chain.

-- Pattern 5: Recursive walk through related disputes
WITH RECURSIVE dispute_chain AS (
  -- Anchor: the dispute we're investigating
  SELECT
    dispute_id,
    account_id,
    merchant_id,
    related_dispute_id,
    1 AS depth
  FROM disputes
  WHERE dispute_id = 12345

  UNION ALL

  -- Recursive: follow same-account or same-merchant disputes
  SELECT
    d.dispute_id,
    d.account_id,
    d.merchant_id,
    d.related_dispute_id,
    chain.depth + 1
  FROM disputes d
  JOIN dispute_chain chain
    ON (d.account_id = chain.account_id OR d.merchant_id = chain.merchant_id)
    AND d.dispute_id != chain.dispute_id
  WHERE chain.depth < 5  -- prevent runaway recursion
)
SELECT DISTINCT dispute_id, account_id, merchant_id, depth
FROM dispute_chain
ORDER BY depth, dispute_id;

This finds every dispute connected through shared-account or shared-merchant edges, up to 5 hops away. The depth < 5 guard prevents runaway recursion if the dispute graph has cycles. Output: the connected cluster of disputes around the original investigation.

Recursive CTEs intimidate analysts who haven't used them; they're actually the cleanest way to walk hierarchical or graph-like data. Worth practicing until they feel routine — they show up frequently in fraud investigation interviews.
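On PostgreSQL, a path-array guard is a common hardening step on top of the depth cap. A sketch of the same walk that refuses to revisit any dispute (ARRAY and <> ALL are PostgreSQL-specific):

-- Cycle-safe variant: track visited disputes in a path array (PostgreSQL)
WITH RECURSIVE dispute_chain AS (
  SELECT dispute_id, account_id, merchant_id, 1 AS depth,
         ARRAY[dispute_id] AS path
  FROM disputes
  WHERE dispute_id = 12345

  UNION ALL

  SELECT d.dispute_id, d.account_id, d.merchant_id, chain.depth + 1,
         chain.path || d.dispute_id
  FROM disputes d
  JOIN dispute_chain chain
    ON (d.account_id = chain.account_id OR d.merchant_id = chain.merchant_id)
  WHERE d.dispute_id <> ALL (chain.path)  -- skip anything already visited
    AND chain.depth < 5
)
SELECT DISTINCT dispute_id, account_id, merchant_id, depth
FROM dispute_chain
ORDER BY depth, dispute_id;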

What you can practice today on FDIC data

SQL Quest's Banking & Finance track covers all five pattern shapes at the bank-level layer using the FDIC public dataset. The patterns transfer 1:1 to transaction-level work — same SQL skills, different data table.

Pattern | FDIC equivalent challenge
Anomaly bounds | "Above-Median NPL Banks" — flag credit-quality outliers
Velocity (rate of change) | "QoQ Asset Growth — Top Banks" — detect abnormal asset-base movement using LAG
Mismatch (geographic) | "Bank Count Per State" / "Top 3 Banks Per State" — concentration analysis (PARTITION BY state)
Cross-account JOIN | "Bank with Branch Count and Income" — join institutions ↔ branches ↔ financials
Failure investigation | "Failed Banks Recent Decade" — timeline of failures + estimated_loss patterns
Practice fraud-adjacent SQL on FDIC data →

🆕 Now live — fraud_transactions track: A dedicated synthetic credit-card transaction dataset is now in SQL Quest's Banking & Finance track. 200 accounts, 2,000 transactions, 76 chargebacks, with realistic fraud injection: 15 amount outliers (3-sigma+), 4 velocity bursts (10+ txns in 5 min), 3 impossible-travel pairs (NYC→Tokyo in 11 min), 5 accounts in a collusion ring (shared device fingerprint), and 3-deep chargeback chains. Five new challenges (#270-274) walk through each pattern with runnable SQL. Start with Challenge #270 →, or jump to the dedicated Fraud Analytics SQL practice landing page for all 12 challenges, the dataset schema, and career context.

From SQL fluency to fraud analyst role

If you want to break into fraud analytics from a non-fraud background (banker, data analyst, career switcher), here's a tested ladder:

  1. Get the five patterns above cold. Be able to write each one in 10 minutes without reference. This is interview-ready territory.
  2. Learn the domain. Read about chargebacks (the Visa/MC dispute lifecycle), KYC (Know Your Customer), AML (Anti-Money Laundering, the BSA/SAR framework), card fraud taxonomy (CNP vs CP, ATO, synthetic identity, mule accounts).
  3. Build a portfolio project. Use a public fraud dataset (Kaggle "IEEE-CIS Fraud Detection" is the most-cited; Capital One has a synthetic dataset). Write 5-10 investigative queries. Document them on GitHub. Fraud teams hire on portfolio + SQL more than degree.
  4. Apply to fraud-adjacent roles first. Risk analyst, payments-ops analyst, transaction-monitoring analyst — these are easier entry points than direct fraud-analyst openings, and they hand you transaction data exposure.
  5. Move internal once you're inside. Banks and fintechs strongly prefer internal moves into fraud teams. A year in transaction monitoring at a bank is worth more than a year of external interviewing.

The role pays well — entry fraud analysts at US banks/fintechs earn $70-100k, senior fraud analysts $120-180k, fraud strategy/leadership $200k+. The work is intellectually engaging (you're hunting bad actors), and demand is structurally growing: fraud volume scales with transaction volume, which scales with internet payments.

Practice on real banking data — free tier

SQL Quest's Banking & Finance track has 20 challenges on real FDIC data covering the five fraud-adjacent SQL patterns. A wrong-answer diagnostic explains exactly which row mismatched and why, and an AI Coach helps when you're stuck.

Start the Banking SQL track →