StudyComputer ScienceDatabases

SQL for Finance - Querying Your Way Through Millions of Trades

2026-04-03 21 min read Computer Science
SQL for Finance - Querying Your Way Through Millions of Trades

SQL for Finance: Querying Your Way Through Millions of Trades

Somewhere, right now, a database is holding 47 million trade records, 3 billion market data ticks, and the entire transaction history of a bank that has existed since 1832. That database does not care that it is Monday morning. It does not care that the CEO wants “a quick number” for the board meeting. It does not care that you just spilled coffee on your keyboard. It just sits there, waiting for a well-written query.

SQL (Structured Query Language) is how you talk to that database. It is the universal language of data retrieval, and in finance, it is everywhere. Trade databases, risk systems, regulatory reporting, market data warehouses, position management, P&L reconciliation, compliance monitoring. If money moved, SQL knows about it.

The thing about SQL in finance is that it is not particularly complicated. You do not need a PhD in database theory. You need to know about 20 patterns really well, and you need to understand the specific shapes of financial data. The rest is just practice and knowing which questions to ask.

SQL is the only language where you can earn a six-figure salary knowing fewer than 20 keywords. SELECT, FROM, WHERE, JOIN, GROUP BY, ORDER BY, and you are already dangerous.


The Financial Database: What You Are Working With

Before writing queries, let us set up the tables we will use throughout this article. These represent a simplified but realistic trading database.

Table: trades

The core of any trading system. Every executed trade gets a row.

CREATE TABLE trades (
    trade_id        SERIAL PRIMARY KEY,
    trade_date      DATE NOT NULL,
    trade_time      TIMESTAMP NOT NULL,
    trader          VARCHAR(50) NOT NULL,
    desk            VARCHAR(30) NOT NULL,
    ticker          VARCHAR(10) NOT NULL,
    side            VARCHAR(4) NOT NULL,    -- 'BUY' or 'SELL'
    quantity        INTEGER NOT NULL,
    price           DECIMAL(12, 4) NOT NULL,
    notional        DECIMAL(15, 2) GENERATED ALWAYS AS (quantity * price) STORED,
    currency        VARCHAR(3) DEFAULT 'USD',
    status          VARCHAR(10) DEFAULT 'FILLED'
);

Table: positions

The current inventory. How much of each stock does each desk hold right now?

CREATE TABLE positions (
    position_id     SERIAL PRIMARY KEY,
    desk            VARCHAR(30) NOT NULL,
    ticker          VARCHAR(10) NOT NULL,
    quantity        INTEGER NOT NULL,          -- Positive = long, negative = short
    avg_cost        DECIMAL(12, 4) NOT NULL,
    current_price   DECIMAL(12, 4),
    last_updated    TIMESTAMP DEFAULT NOW()
);

Table: market_data

End-of-day prices for all instruments.

CREATE TABLE market_data (
    ticker          VARCHAR(10) NOT NULL,
    trade_date      DATE NOT NULL,
    open_price      DECIMAL(12, 4),
    high_price      DECIMAL(12, 4),
    low_price       DECIMAL(12, 4),
    close_price     DECIMAL(12, 4),
    volume          BIGINT,
    PRIMARY KEY (ticker, trade_date)
);

Sample Data

Let us insert some realistic trades:

INSERT INTO trades (trade_date, trade_time, trader, desk, ticker, side, quantity, price)
VALUES
    ('2026-03-28', '2026-03-28 09:31:00', 'Alice', 'Equities', 'AAPL', 'BUY',  1000, 148.25),
    ('2026-03-28', '2026-03-28 09:45:00', 'Alice', 'Equities', 'MSFT', 'BUY',   500, 375.80),
    ('2026-03-28', '2026-03-28 10:12:00', 'Bob',   'Equities', 'GOOGL', 'SELL',  300, 152.40),
    ('2026-03-28', '2026-03-28 11:03:00', 'Alice', 'Equities', 'AAPL', 'BUY',   500, 149.10),
    ('2026-03-28', '2026-03-28 14:22:00', 'Charlie', 'Fixed Income', 'TLT', 'BUY', 2000, 95.50),
    ('2026-03-28', '2026-03-28 15:15:00', 'Bob',   'Equities', 'NVDA', 'BUY',   200, 825.30),
    ('2026-03-31', '2026-03-31 09:30:00', 'Alice', 'Equities', 'AAPL', 'SELL',  800, 151.20),
    ('2026-03-31', '2026-03-31 10:05:00', 'David', 'Derivatives', 'SPY', 'BUY', 1500, 520.45),
    ('2026-03-31', '2026-03-31 11:30:00', 'Bob',   'Equities', 'GOOGL', 'BUY',  400, 155.10),
    ('2026-03-31', '2026-03-31 14:00:00', 'Charlie', 'Fixed Income', 'TLT', 'SELL', 1000, 96.20);

This is a tiny dataset. In production, you would be looking at millions of rows per day across dozens of instruments. But the queries are the same. SQL does not care if the table has 10 rows or 10 billion.


The Basics: SELECT, WHERE, ORDER BY

Querying Trades

-- All trades for a specific date
SELECT trade_id, trade_time, trader, ticker, side, quantity, price, notional
FROM trades
WHERE trade_date = '2026-03-28'
ORDER BY trade_time;
trade_idtrade_timetradertickersidequantitypricenotional
109:31:00AliceAAPLBUY1000148.25148,250.00
209:45:00AliceMSFTBUY500375.80187,900.00
310:12:00BobGOOGLSELL300152.4045,720.00
411:03:00AliceAAPLBUY500149.1074,550.00
514:22:00CharlieTLTBUY200095.50191,000.00
615:15:00BobNVDABUY200825.30165,060.00

Filtering with Multiple Conditions

-- Alice's buys over $100K notional
SELECT ticker, side, quantity, price, notional
FROM trades
WHERE trader = 'Alice'
  AND side = 'BUY'
  AND notional > 100000
ORDER BY notional DESC;
tickersidequantitypricenotional
MSFTBUY500375.80187,900.00
AAPLBUY1000148.25148,250.00

Pattern Matching with LIKE and IN

-- All tech stock trades
SELECT trader, ticker, side, quantity, price
FROM trades
WHERE ticker IN ('AAPL', 'MSFT', 'GOOGL', 'NVDA', 'META', 'AMZN')
ORDER BY ticker, trade_time;

-- All traders whose name starts with 'A'
SELECT DISTINCT trader
FROM trades
WHERE trader LIKE 'A%';

Key Insight: IN is your best friend when filtering against a known list of values. It is cleaner than chaining OR conditions and easier to maintain. When the list comes from another table, use a subquery or JOIN instead.


Aggregations: GROUP BY

Aggregation is where SQL starts earning its keep. Summarizing millions of rows into a handful of meaningful numbers is what databases are built for.

Daily Trading Volume by Desk

SELECT
    trade_date,
    desk,
    COUNT(*) AS num_trades,
    SUM(notional) AS total_notional,
    ROUND(AVG(notional), 2) AS avg_trade_size
FROM trades
GROUP BY trade_date, desk
ORDER BY trade_date, total_notional DESC;
trade_datedesknum_tradestotal_notionalavg_trade_size
2026-03-28Equities4435,730.00108,932.50
2026-03-28Fixed Income1191,000.00191,000.00
2026-03-28Equities1165,060.00165,060.00
2026-03-31Derivatives1780,675.00780,675.00
2026-03-31Equities2183,000.0091,500.00
2026-03-31Fixed Income196,200.0096,200.00

Top Traders by Volume

SELECT
    trader,
    COUNT(*) AS total_trades,
    SUM(quantity) AS total_shares,
    SUM(notional) AS total_notional,
    ROUND(AVG(price), 2) AS avg_price
FROM trades
GROUP BY trader
ORDER BY total_notional DESC;
tradertotal_tradestotal_sharestotal_notionalavg_price
Alice42800531,660.00206.09
David11500780,675.00520.45
Charlie23000287,200.0095.85
Bob3900272,820.00377.60

Filtering Aggregated Results with HAVING

-- Desks with more than $500K in daily notional
SELECT
    trade_date,
    desk,
    SUM(notional) AS total_notional
FROM trades
GROUP BY trade_date, desk
HAVING SUM(notional) > 500000
ORDER BY total_notional DESC;

WHERE filters rows before aggregation. HAVING filters groups after aggregation. Mixing them up is the SQL equivalent of putting on your shoes before your socks. Technically possible, but everyone will notice.


JOINs: Combining Tables

Real-world queries almost always involve multiple tables. JOINs are how you connect them.

INNER JOIN: Matching Rows Only

-- Trades with current market prices
SELECT
    t.trade_date,
    t.ticker,
    t.side,
    t.price AS trade_price,
    m.close_price AS market_close,
    ROUND((m.close_price - t.price) * t.quantity, 2) AS unrealized_pnl
FROM trades t
INNER JOIN market_data m
    ON t.ticker = m.ticker
    AND t.trade_date = m.trade_date
WHERE t.side = 'BUY'
ORDER BY unrealized_pnl DESC;

LEFT JOIN: Keep All Rows from the Left Table

-- All trades, with market data if available
SELECT
    t.trade_id,
    t.ticker,
    t.price AS trade_price,
    m.close_price,
    CASE
        WHEN m.close_price IS NULL THEN 'NO MARKET DATA'
        ELSE 'OK'
    END AS data_status
FROM trades t
LEFT JOIN market_data m
    ON t.ticker = m.ticker
    AND t.trade_date = m.trade_date;

Key Insight: LEFT JOIN is essential in finance for data quality checks. If a trade has no matching market data, you want to know about it (not silently drop it, which is what INNER JOIN would do). “Show me everything, and flag what’s missing” is a very common pattern.

Self-JOIN: Comparing a Table to Itself

-- Find pairs of trades in the same stock on the same day
-- (potential wash trades or paired activity)
SELECT
    t1.trade_id AS trade_1,
    t2.trade_id AS trade_2,
    t1.ticker,
    t1.trader AS trader_1,
    t2.trader AS trader_2,
    t1.side AS side_1,
    t2.side AS side_2,
    t1.quantity AS qty_1,
    t2.quantity AS qty_2
FROM trades t1
INNER JOIN trades t2
    ON t1.ticker = t2.ticker
    AND t1.trade_date = t2.trade_date
    AND t1.trade_id < t2.trade_id  -- Avoid duplicates and self-matches
    AND t1.side != t2.side          -- Opposite sides
ORDER BY t1.ticker, t1.trade_time;

Self-joins are how compliance teams find suspicious trading patterns. “Did anyone buy and sell the same stock on the same day?” is a query that compliance runs more often than you think.


Window Functions: The Power Tool

Window functions are where SQL goes from “useful” to “incredibly powerful.” They let you perform calculations across related rows without collapsing the result set (unlike GROUP BY, which reduces rows).

Running Total of Daily Notional

SELECT
    trade_date,
    trade_time,
    trader,
    ticker,
    notional,
    SUM(notional) OVER (
        PARTITION BY trade_date
        ORDER BY trade_time
    ) AS running_total
FROM trades
ORDER BY trade_date, trade_time;
trade_datetrade_timetradertickernotionalrunning_total
2026-03-2809:31:00AliceAAPL148,250148,250
2026-03-2809:45:00AliceMSFT187,900336,150
2026-03-2810:12:00BobGOOGL45,720381,870
2026-03-2811:03:00AliceAAPL74,550456,420
2026-03-2814:22:00CharlieTLT191,000647,420
2026-03-2815:15:00BobNVDA165,060812,480

Ranking Traders

SELECT
    trader,
    SUM(notional) AS total_notional,
    RANK() OVER (ORDER BY SUM(notional) DESC) AS rank,
    ROUND(
        SUM(notional) * 100.0 / SUM(SUM(notional)) OVER (),
        1
    ) AS pct_of_total
FROM trades
GROUP BY trader
ORDER BY rank;
tradertotal_notionalrankpct_of_total
David780,675141.7%
Alice531,660228.4%
Charlie287,200315.3%
Bob272,820414.6%

Moving Average of Stock Prices

-- 5-day moving average of closing prices
SELECT
    ticker,
    trade_date,
    close_price,
    ROUND(
        AVG(close_price) OVER (
            PARTITION BY ticker
            ORDER BY trade_date
            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
        ), 4
    ) AS ma_5day,
    ROUND(
        AVG(close_price) OVER (
            PARTITION BY ticker
            ORDER BY trade_date
            ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
        ), 4
    ) AS ma_20day
FROM market_data
WHERE ticker = 'AAPL'
ORDER BY trade_date DESC
LIMIT 10;

LAG and LEAD: Looking at Adjacent Rows

-- Daily price change and percentage change
SELECT
    ticker,
    trade_date,
    close_price,
    LAG(close_price) OVER (
        PARTITION BY ticker ORDER BY trade_date
    ) AS prev_close,
    ROUND(
        close_price - LAG(close_price) OVER (
            PARTITION BY ticker ORDER BY trade_date
        ), 4
    ) AS daily_change,
    ROUND(
        (close_price - LAG(close_price) OVER (
            PARTITION BY ticker ORDER BY trade_date
        )) / LAG(close_price) OVER (
            PARTITION BY ticker ORDER BY trade_date
        ) * 100, 2
    ) AS pct_change
FROM market_data
WHERE ticker IN ('AAPL', 'MSFT')
ORDER BY ticker, trade_date DESC;

Key Insight: Window functions are the single most important SQL feature for financial analysis. They let you compute running totals, moving averages, rankings, percentiles, and row comparisons without subqueries or self-joins. If you learn one advanced SQL feature, make it window functions.

Before window functions existed, computing a running total required a self-join or a correlated subquery. Both were ugly and slow. Window functions turned a 15-line query into a 3-line query. The person who invented them deserves a Nobel Prize, or at least a really nice dinner.


CTEs: Common Table Expressions

CTEs (the WITH clause) let you name a subquery and reference it multiple times. They make complex queries readable and maintainable.

Daily P&L Report

WITH daily_buys AS (
    SELECT
        trade_date,
        ticker,
        SUM(quantity) AS bought_qty,
        SUM(notional) AS bought_notional,
        ROUND(SUM(notional) / NULLIF(SUM(quantity), 0), 4) AS avg_buy_price
    FROM trades
    WHERE side = 'BUY'
    GROUP BY trade_date, ticker
),
daily_sells AS (
    SELECT
        trade_date,
        ticker,
        SUM(quantity) AS sold_qty,
        SUM(notional) AS sold_notional,
        ROUND(SUM(notional) / NULLIF(SUM(quantity), 0), 4) AS avg_sell_price
    FROM trades
    WHERE side = 'SELL'
    GROUP BY trade_date, ticker
)
SELECT
    COALESCE(b.trade_date, s.trade_date) AS trade_date,
    COALESCE(b.ticker, s.ticker) AS ticker,
    COALESCE(b.bought_qty, 0) AS bought,
    COALESCE(s.sold_qty, 0) AS sold,
    COALESCE(b.bought_qty, 0) - COALESCE(s.sold_qty, 0) AS net_quantity,
    COALESCE(b.bought_notional, 0) AS buy_value,
    COALESCE(s.sold_notional, 0) AS sell_value,
    COALESCE(s.sold_notional, 0) - COALESCE(b.bought_notional, 0) AS realized_pnl
FROM daily_buys b
FULL OUTER JOIN daily_sells s
    ON b.trade_date = s.trade_date
    AND b.ticker = s.ticker
ORDER BY trade_date, ticker;

Position Builder

Build current positions from the trade history:

WITH trade_flows AS (
    SELECT
        ticker,
        SUM(CASE WHEN side = 'BUY' THEN quantity ELSE -quantity END) AS net_quantity,
        SUM(CASE WHEN side = 'BUY' THEN notional ELSE -notional END) AS net_cost
    FROM trades
    GROUP BY ticker
),
valued_positions AS (
    SELECT
        tf.ticker,
        tf.net_quantity,
        ROUND(tf.net_cost / NULLIF(tf.net_quantity, 0), 4) AS avg_cost,
        m.close_price AS current_price,
        ROUND(tf.net_quantity * m.close_price, 2) AS market_value,
        ROUND(tf.net_quantity * m.close_price - tf.net_cost, 2) AS unrealized_pnl
    FROM trade_flows tf
    LEFT JOIN (
        SELECT DISTINCT ON (ticker)
            ticker, close_price
        FROM market_data
        ORDER BY ticker, trade_date DESC
    ) m ON tf.ticker = m.ticker
    WHERE tf.net_quantity != 0
)
SELECT
    ticker,
    net_quantity,
    avg_cost,
    current_price,
    market_value,
    unrealized_pnl,
    ROUND(unrealized_pnl / NULLIF(ABS(net_quantity * avg_cost), 0) * 100, 2)
        AS pnl_pct
FROM valued_positions
ORDER BY ABS(market_value) DESC;

Key Insight: CTEs are not just syntactic sugar. They make your intent explicit. A reader can look at daily_buys and daily_sells and immediately understand the structure, instead of mentally parsing nested subqueries three levels deep. In finance, where queries get audited by risk and compliance, readability is not a luxury. It is a requirement.

If your query is so complex that you cannot explain it to a colleague in 30 seconds, break it into CTEs. Each CTE should answer one clear question. “What did we buy?” “What did we sell?” “What is the net?” Three simple questions instead of one incomprehensible query.


Real-World Queries: The Ones You Actually Need

Query 1: VWAP (Volume-Weighted Average Price)

VWAP is the gold standard benchmark for execution quality. It tells you the average price at which a stock traded, weighted by volume.

SELECT
    ticker,
    trade_date,
    ROUND(
        SUM(price * quantity) / NULLIF(SUM(quantity), 0),
        4
    ) AS vwap,
    SUM(quantity) AS total_volume,
    MIN(price) AS low_price,
    MAX(price) AS high_price
FROM trades
WHERE trade_date = '2026-03-28'
GROUP BY ticker, trade_date
ORDER BY total_volume DESC;
tickertrade_datevwaptotal_volumelow_pricehigh_price
TLT2026-03-2895.5000200095.5095.50
AAPL2026-03-28148.53331500148.25149.10
MSFT2026-03-28375.8000500375.80375.80
GOOGL2026-03-28152.4000300152.40152.40
NVDA2026-03-28825.3000200825.30825.30

If a trader bought AAPL at $149.10 and the VWAP was $148.53, they paid more than the market average. That is 57 cents of “execution slippage.” On 500 shares, that is $285 they left on the table. Execution desks live and die by these numbers.

Query 2: Largest Trades (Outlier Detection)

-- Find trades that are more than 3 standard deviations above the mean notional
WITH trade_stats AS (
    SELECT
        AVG(notional) AS mean_notional,
        STDDEV(notional) AS std_notional
    FROM trades
)
SELECT
    t.trade_id,
    t.trade_date,
    t.trader,
    t.ticker,
    t.notional,
    ROUND((t.notional - ts.mean_notional) / ts.std_notional, 2) AS z_score
FROM trades t
CROSS JOIN trade_stats ts
WHERE t.notional > ts.mean_notional + 3 * ts.std_notional
ORDER BY t.notional DESC;

Query 3: Trader Activity Heatmap Data

-- Hourly trading activity breakdown
SELECT
    trader,
    EXTRACT(HOUR FROM trade_time) AS hour,
    COUNT(*) AS num_trades,
    SUM(notional) AS total_notional
FROM trades
GROUP BY trader, EXTRACT(HOUR FROM trade_time)
ORDER BY trader, hour;

Query 4: Concentration Risk

-- What percentage of total exposure is in each stock?
WITH total AS (
    SELECT SUM(ABS(quantity * current_price)) AS total_exposure
    FROM positions
)
SELECT
    p.ticker,
    p.quantity,
    ABS(p.quantity * p.current_price) AS exposure,
    ROUND(
        ABS(p.quantity * p.current_price) / t.total_exposure * 100,
        2
    ) AS pct_of_total
FROM positions p
CROSS JOIN total t
ORDER BY exposure DESC;

If one stock is 40% of your total exposure, you do not have a portfolio. You have a stock and some friends. Concentration risk queries keep risk managers employed and traders annoyed.


Performance: Making Queries Not Terrible

When your trades table has 50 million rows, a badly written query can take minutes. Here is how to keep things fast.

Indexes

Indexes are the single biggest performance lever in SQL. They are like the index at the back of a textbook: instead of reading every page, you jump straight to the right one.

-- Essential indexes for a trades table
CREATE INDEX idx_trades_date ON trades (trade_date);
CREATE INDEX idx_trades_ticker ON trades (ticker);
CREATE INDEX idx_trades_trader ON trades (trader);
CREATE INDEX idx_trades_date_ticker ON trades (trade_date, ticker);

-- Essential indexes for market_data
CREATE INDEX idx_market_date ON market_data (trade_date);
CREATE INDEX idx_market_ticker_date ON market_data (ticker, trade_date);

EXPLAIN: Understanding Query Performance

EXPLAIN ANALYZE
SELECT ticker, SUM(notional)
FROM trades
WHERE trade_date = '2026-03-28'
GROUP BY ticker;

This shows you the query execution plan: which indexes are used, how many rows are scanned, and where the time is spent. If you see “Seq Scan” on a large table, you need an index.

Performance Rules of Thumb

Do ThisNot ThisWhy
Filter with indexed columns in WHEREWrap indexed columns in functionsWHERE trade_date = '2026-03-28' uses the index. WHERE DATE(trade_time) = '2026-03-28' does not.
Use specific columns in SELECTUse SELECT *Fetching all columns reads more data from disk.
Use EXISTS for existence checksUse COUNT(*) > 0EXISTS stops at the first match. COUNT scans everything.
Join on indexed columnsJoin on computed expressionsThe optimizer can use indexes for direct column joins.
Use LIMIT for exploratory queriesFetch millions of rows to “take a quick look”Your query returns 47 million rows. Do you really need all of them?

Key Insight: The difference between a query that takes 50 milliseconds and one that takes 5 minutes is almost always an index. A composite index on (trade_date, ticker) can turn a full table scan into an index-only scan. Always index the columns you filter and join on.


Common Pitfalls

  1. Dividing by zero. Financial data loves to have zero quantities, zero prices, and empty positions. Always use NULLIF(denominator, 0) to prevent division-by-zero errors. price / NULLIF(quantity, 0) returns NULL instead of crashing.

  2. Forgetting NULL handling. NULL is not zero. NULL is not empty string. NULL is “unknown.” Any arithmetic with NULL returns NULL. Use COALESCE(value, 0) when you need a default.

  3. Floating-point precision. Never use FLOAT for money. Use DECIMAL(12, 4) or NUMERIC. Floating-point arithmetic can turn $100.00 into $99.99999999 and cause P&L reconciliation nightmares.

  4. Not using transactions. When updating positions or inserting related records, always wrap the operations in a transaction (BEGIN ... COMMIT). Otherwise a crash halfway through can leave your data inconsistent.

  5. Ignoring time zones. A trade executed at “3 PM” could be New York, London, or Tokyo time. Always store timestamps in UTC and convert for display. Financial data across time zones without UTC is a debugging nightmare.

  6. Using SELECT * in production queries. It fetches every column, breaks when schema changes, and makes the query optimizer’s life harder. Always list the specific columns you need.


Wrapping Up

SQL is the unsung hero of finance technology. Every trade, every position, every risk calculation, and every regulatory report ultimately comes from a SQL query hitting a database. The language itself is deceptively simple (you can learn the basics in an afternoon), but mastering the patterns that matter in finance (window functions, CTEs, aggregation, proper JOINs) is what separates “I can write a SELECT statement” from “I can build the query that tells the risk committee exactly how exposed we are.”

Start with the basics: SELECT, WHERE, GROUP BY, JOIN. Then learn window functions, because they change everything. Then learn CTEs, because they make complex queries readable. Then learn EXPLAIN, because performance matters when your table has 50 million rows.

And always remember the golden rule: use DECIMAL for money, NULLIF for division, and UTC for timestamps. Break any of these rules and you will spend your weekend debugging a reconciliation break instead of enjoying it.

SQL is not sexy. SQL is not trending. SQL has not changed much since 1986. But SQL is running every financial institution on the planet, and it will still be running them long after the trendy frameworks have come and gone. Learn it well.


Cheat Sheet

Key Questions & Answers

What SQL features matter most for financial data analysis?

Window functions (running totals, moving averages, ranking), CTEs (readable multi-step queries), GROUP BY with HAVING (aggregation and filtering), and JOINs (combining trade, market, and position data). These four features cover 90% of financial SQL work.

Why should I use DECIMAL instead of FLOAT for financial data?

FLOAT uses binary floating-point arithmetic, which cannot represent many decimal fractions exactly. This leads to rounding errors like $99.99999999 instead of $100.00. DECIMAL stores exact decimal values. In finance, a fraction of a cent times millions of trades adds up to real money.

How do I compute VWAP in SQL?

SUM(price * quantity) / SUM(quantity). This gives the volume-weighted average price. Always use NULLIF(SUM(quantity), 0) in the denominator to handle edge cases where total volume is zero.

What is the difference between WHERE and HAVING?

WHERE filters individual rows before aggregation. HAVING filters groups after aggregation. Use WHERE for conditions on raw data (e.g., WHERE trade_date = '2026-03-28'). Use HAVING for conditions on aggregated results (e.g., HAVING SUM(notional) > 1000000).

Key Concepts at a Glance

ConceptSummary
SELECT … FROM … WHEREBasic query: choose columns, table, and filter conditions
GROUP BY + aggregate functionsSummarize data: COUNT, SUM, AVG, MIN, MAX
HAVINGFilter after aggregation (WHERE filters before)
INNER JOINOnly rows that match in both tables
LEFT JOINAll rows from left table, matching rows from right (NULL if no match)
FULL OUTER JOINAll rows from both tables, NULLs where no match
Window functionsCalculations across related rows: SUM() OVER, RANK(), LAG(), LEAD()
PARTITION BYDivides rows into groups for window function calculation
CTE (WITH clause)Named subquery for readability and reuse
VWAPSUM(price * quantity) / SUM(quantity)
NULLIF(x, 0)Returns NULL instead of dividing by zero
COALESCE(x, default)Returns the first non-NULL value
DECIMAL(12,4)Exact decimal type for financial amounts
INDEXSpeeds up queries by creating lookup structures on columns
EXPLAIN ANALYZEShows the query execution plan and actual timings
UTC timestampsAlways store times in UTC, convert for display

Sources & Further Reading

PreviousWeb Scraping 101 - Extracting Financial Data with Python