SQL for Finance: Querying Your Way Through Millions of Trades¶
Somewhere, right now, a database is holding 47 million trade records, 3 billion market data ticks, and the entire transaction history of a bank that has existed since 1832. That database does not care that it is Monday morning. It does not care that the CEO wants “a quick number” for the board meeting. It does not care that you just spilled coffee on your keyboard. It just sits there, waiting for a well-written query.
SQL (Structured Query Language) is how you talk to that database. It is the universal language of data retrieval, and in finance, it is everywhere. Trade databases, risk systems, regulatory reporting, market data warehouses, position management, P&L reconciliation, compliance monitoring. If money moved, SQL knows about it.
The thing about SQL in finance is that it is not particularly complicated. You do not need a PhD in database theory. You need to know about 20 patterns really well, and you need to understand the specific shapes of financial data. The rest is just practice and knowing which questions to ask.
SQL is the only language where you can earn a six-figure salary knowing fewer than 20 keywords. SELECT, FROM, WHERE, JOIN, GROUP BY, ORDER BY, and you are already dangerous.
The Financial Database: What You Are Working With¶
Before writing queries, let us set up the tables we will use throughout this article. These represent a simplified but realistic trading database.
Table: trades¶
The core of any trading system. Every executed trade gets a row.
CREATE TABLE trades (
trade_id SERIAL PRIMARY KEY,
trade_date DATE NOT NULL,
trade_time TIMESTAMP NOT NULL,
trader VARCHAR(50) NOT NULL,
desk VARCHAR(30) NOT NULL,
ticker VARCHAR(10) NOT NULL,
side VARCHAR(4) NOT NULL, -- 'BUY' or 'SELL'
quantity INTEGER NOT NULL,
price DECIMAL(12, 4) NOT NULL,
notional DECIMAL(15, 2) GENERATED ALWAYS AS (quantity * price) STORED,
currency VARCHAR(3) DEFAULT 'USD',
status VARCHAR(10) DEFAULT 'FILLED'
);
Table: positions¶
The current inventory. How much of each stock does each desk hold right now?
CREATE TABLE positions (
position_id SERIAL PRIMARY KEY,
desk VARCHAR(30) NOT NULL,
ticker VARCHAR(10) NOT NULL,
quantity INTEGER NOT NULL, -- Positive = long, negative = short
avg_cost DECIMAL(12, 4) NOT NULL,
current_price DECIMAL(12, 4),
last_updated TIMESTAMP DEFAULT NOW()
);
Table: market_data¶
End-of-day prices for all instruments.
CREATE TABLE market_data (
ticker VARCHAR(10) NOT NULL,
trade_date DATE NOT NULL,
open_price DECIMAL(12, 4),
high_price DECIMAL(12, 4),
low_price DECIMAL(12, 4),
close_price DECIMAL(12, 4),
volume BIGINT,
PRIMARY KEY (ticker, trade_date)
);
Sample Data¶
Let us insert some realistic trades:
INSERT INTO trades (trade_date, trade_time, trader, desk, ticker, side, quantity, price)
VALUES
('2026-03-28', '2026-03-28 09:31:00', 'Alice', 'Equities', 'AAPL', 'BUY', 1000, 148.25),
('2026-03-28', '2026-03-28 09:45:00', 'Alice', 'Equities', 'MSFT', 'BUY', 500, 375.80),
('2026-03-28', '2026-03-28 10:12:00', 'Bob', 'Equities', 'GOOGL', 'SELL', 300, 152.40),
('2026-03-28', '2026-03-28 11:03:00', 'Alice', 'Equities', 'AAPL', 'BUY', 500, 149.10),
('2026-03-28', '2026-03-28 14:22:00', 'Charlie', 'Fixed Income', 'TLT', 'BUY', 2000, 95.50),
('2026-03-28', '2026-03-28 15:15:00', 'Bob', 'Equities', 'NVDA', 'BUY', 200, 825.30),
('2026-03-31', '2026-03-31 09:30:00', 'Alice', 'Equities', 'AAPL', 'SELL', 800, 151.20),
('2026-03-31', '2026-03-31 10:05:00', 'David', 'Derivatives', 'SPY', 'BUY', 1500, 520.45),
('2026-03-31', '2026-03-31 11:30:00', 'Bob', 'Equities', 'GOOGL', 'BUY', 400, 155.10),
('2026-03-31', '2026-03-31 14:00:00', 'Charlie', 'Fixed Income', 'TLT', 'SELL', 1000, 96.20);
This is a tiny dataset. In production, you would be looking at millions of rows per day across dozens of instruments. But the queries are the same. SQL does not care if the table has 10 rows or 10 billion.
The Basics: SELECT, WHERE, ORDER BY¶
Querying Trades¶
-- All trades for a specific date
SELECT trade_id, trade_time, trader, ticker, side, quantity, price, notional
FROM trades
WHERE trade_date = '2026-03-28'
ORDER BY trade_time;
| trade_id | trade_time | trader | ticker | side | quantity | price | notional |
|---|---|---|---|---|---|---|---|
| 1 | 09:31:00 | Alice | AAPL | BUY | 1000 | 148.25 | 148,250.00 |
| 2 | 09:45:00 | Alice | MSFT | BUY | 500 | 375.80 | 187,900.00 |
| 3 | 10:12:00 | Bob | GOOGL | SELL | 300 | 152.40 | 45,720.00 |
| 4 | 11:03:00 | Alice | AAPL | BUY | 500 | 149.10 | 74,550.00 |
| 5 | 14:22:00 | Charlie | TLT | BUY | 2000 | 95.50 | 191,000.00 |
| 6 | 15:15:00 | Bob | NVDA | BUY | 200 | 825.30 | 165,060.00 |
Filtering with Multiple Conditions¶
-- Alice's buys over $100K notional
SELECT ticker, side, quantity, price, notional
FROM trades
WHERE trader = 'Alice'
AND side = 'BUY'
AND notional > 100000
ORDER BY notional DESC;
| ticker | side | quantity | price | notional |
|---|---|---|---|---|
| MSFT | BUY | 500 | 375.80 | 187,900.00 |
| AAPL | BUY | 1000 | 148.25 | 148,250.00 |
Pattern Matching with LIKE and IN¶
-- All tech stock trades
SELECT trader, ticker, side, quantity, price
FROM trades
WHERE ticker IN ('AAPL', 'MSFT', 'GOOGL', 'NVDA', 'META', 'AMZN')
ORDER BY ticker, trade_time;
-- All traders whose name starts with 'A'
SELECT DISTINCT trader
FROM trades
WHERE trader LIKE 'A%';
Key Insight:
INis your best friend when filtering against a known list of values. It is cleaner than chainingORconditions and easier to maintain. When the list comes from another table, use a subquery or JOIN instead.
Aggregations: GROUP BY¶
Aggregation is where SQL starts earning its keep. Summarizing millions of rows into a handful of meaningful numbers is what databases are built for.
Daily Trading Volume by Desk¶
SELECT
trade_date,
desk,
COUNT(*) AS num_trades,
SUM(notional) AS total_notional,
ROUND(AVG(notional), 2) AS avg_trade_size
FROM trades
GROUP BY trade_date, desk
ORDER BY trade_date, total_notional DESC;
| trade_date | desk | num_trades | total_notional | avg_trade_size |
|---|---|---|---|---|
| 2026-03-28 | Equities | 4 | 435,730.00 | 108,932.50 |
| 2026-03-28 | Fixed Income | 1 | 191,000.00 | 191,000.00 |
| 2026-03-28 | Equities | 1 | 165,060.00 | 165,060.00 |
| 2026-03-31 | Derivatives | 1 | 780,675.00 | 780,675.00 |
| 2026-03-31 | Equities | 2 | 183,000.00 | 91,500.00 |
| 2026-03-31 | Fixed Income | 1 | 96,200.00 | 96,200.00 |
Top Traders by Volume¶
SELECT
trader,
COUNT(*) AS total_trades,
SUM(quantity) AS total_shares,
SUM(notional) AS total_notional,
ROUND(AVG(price), 2) AS avg_price
FROM trades
GROUP BY trader
ORDER BY total_notional DESC;
| trader | total_trades | total_shares | total_notional | avg_price |
|---|---|---|---|---|
| Alice | 4 | 2800 | 531,660.00 | 206.09 |
| David | 1 | 1500 | 780,675.00 | 520.45 |
| Charlie | 2 | 3000 | 287,200.00 | 95.85 |
| Bob | 3 | 900 | 272,820.00 | 377.60 |
Filtering Aggregated Results with HAVING¶
-- Desks with more than $500K in daily notional
SELECT
trade_date,
desk,
SUM(notional) AS total_notional
FROM trades
GROUP BY trade_date, desk
HAVING SUM(notional) > 500000
ORDER BY total_notional DESC;
WHERE filters rows before aggregation. HAVING filters groups after aggregation. Mixing them up is the SQL equivalent of putting on your shoes before your socks. Technically possible, but everyone will notice.
JOINs: Combining Tables¶
Real-world queries almost always involve multiple tables. JOINs are how you connect them.
INNER JOIN: Matching Rows Only¶
-- Trades with current market prices
SELECT
t.trade_date,
t.ticker,
t.side,
t.price AS trade_price,
m.close_price AS market_close,
ROUND((m.close_price - t.price) * t.quantity, 2) AS unrealized_pnl
FROM trades t
INNER JOIN market_data m
ON t.ticker = m.ticker
AND t.trade_date = m.trade_date
WHERE t.side = 'BUY'
ORDER BY unrealized_pnl DESC;
LEFT JOIN: Keep All Rows from the Left Table¶
-- All trades, with market data if available
SELECT
t.trade_id,
t.ticker,
t.price AS trade_price,
m.close_price,
CASE
WHEN m.close_price IS NULL THEN 'NO MARKET DATA'
ELSE 'OK'
END AS data_status
FROM trades t
LEFT JOIN market_data m
ON t.ticker = m.ticker
AND t.trade_date = m.trade_date;
Key Insight: LEFT JOIN is essential in finance for data quality checks. If a trade has no matching market data, you want to know about it (not silently drop it, which is what INNER JOIN would do). “Show me everything, and flag what’s missing” is a very common pattern.
Self-JOIN: Comparing a Table to Itself¶
-- Find pairs of trades in the same stock on the same day
-- (potential wash trades or paired activity)
SELECT
t1.trade_id AS trade_1,
t2.trade_id AS trade_2,
t1.ticker,
t1.trader AS trader_1,
t2.trader AS trader_2,
t1.side AS side_1,
t2.side AS side_2,
t1.quantity AS qty_1,
t2.quantity AS qty_2
FROM trades t1
INNER JOIN trades t2
ON t1.ticker = t2.ticker
AND t1.trade_date = t2.trade_date
AND t1.trade_id < t2.trade_id -- Avoid duplicates and self-matches
AND t1.side != t2.side -- Opposite sides
ORDER BY t1.ticker, t1.trade_time;
Self-joins are how compliance teams find suspicious trading patterns. “Did anyone buy and sell the same stock on the same day?” is a query that compliance runs more often than you think.
Window Functions: The Power Tool¶
Window functions are where SQL goes from “useful” to “incredibly powerful.” They let you perform calculations across related rows without collapsing the result set (unlike GROUP BY, which reduces rows).
Running Total of Daily Notional¶
SELECT
trade_date,
trade_time,
trader,
ticker,
notional,
SUM(notional) OVER (
PARTITION BY trade_date
ORDER BY trade_time
) AS running_total
FROM trades
ORDER BY trade_date, trade_time;
| trade_date | trade_time | trader | ticker | notional | running_total |
|---|---|---|---|---|---|
| 2026-03-28 | 09:31:00 | Alice | AAPL | 148,250 | 148,250 |
| 2026-03-28 | 09:45:00 | Alice | MSFT | 187,900 | 336,150 |
| 2026-03-28 | 10:12:00 | Bob | GOOGL | 45,720 | 381,870 |
| 2026-03-28 | 11:03:00 | Alice | AAPL | 74,550 | 456,420 |
| 2026-03-28 | 14:22:00 | Charlie | TLT | 191,000 | 647,420 |
| 2026-03-28 | 15:15:00 | Bob | NVDA | 165,060 | 812,480 |
Ranking Traders¶
SELECT
trader,
SUM(notional) AS total_notional,
RANK() OVER (ORDER BY SUM(notional) DESC) AS rank,
ROUND(
SUM(notional) * 100.0 / SUM(SUM(notional)) OVER (),
1
) AS pct_of_total
FROM trades
GROUP BY trader
ORDER BY rank;
| trader | total_notional | rank | pct_of_total |
|---|---|---|---|
| David | 780,675 | 1 | 41.7% |
| Alice | 531,660 | 2 | 28.4% |
| Charlie | 287,200 | 3 | 15.3% |
| Bob | 272,820 | 4 | 14.6% |
Moving Average of Stock Prices¶
-- 5-day moving average of closing prices
SELECT
ticker,
trade_date,
close_price,
ROUND(
AVG(close_price) OVER (
PARTITION BY ticker
ORDER BY trade_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
), 4
) AS ma_5day,
ROUND(
AVG(close_price) OVER (
PARTITION BY ticker
ORDER BY trade_date
ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
), 4
) AS ma_20day
FROM market_data
WHERE ticker = 'AAPL'
ORDER BY trade_date DESC
LIMIT 10;
LAG and LEAD: Looking at Adjacent Rows¶
-- Daily price change and percentage change
SELECT
ticker,
trade_date,
close_price,
LAG(close_price) OVER (
PARTITION BY ticker ORDER BY trade_date
) AS prev_close,
ROUND(
close_price - LAG(close_price) OVER (
PARTITION BY ticker ORDER BY trade_date
), 4
) AS daily_change,
ROUND(
(close_price - LAG(close_price) OVER (
PARTITION BY ticker ORDER BY trade_date
)) / LAG(close_price) OVER (
PARTITION BY ticker ORDER BY trade_date
) * 100, 2
) AS pct_change
FROM market_data
WHERE ticker IN ('AAPL', 'MSFT')
ORDER BY ticker, trade_date DESC;
Key Insight: Window functions are the single most important SQL feature for financial analysis. They let you compute running totals, moving averages, rankings, percentiles, and row comparisons without subqueries or self-joins. If you learn one advanced SQL feature, make it window functions.
Before window functions existed, computing a running total required a self-join or a correlated subquery. Both were ugly and slow. Window functions turned a 15-line query into a 3-line query. The person who invented them deserves a Nobel Prize, or at least a really nice dinner.
CTEs: Common Table Expressions¶
CTEs (the WITH clause) let you name a subquery and reference it multiple times. They make complex queries readable and maintainable.
Daily P&L Report¶
WITH daily_buys AS (
SELECT
trade_date,
ticker,
SUM(quantity) AS bought_qty,
SUM(notional) AS bought_notional,
ROUND(SUM(notional) / NULLIF(SUM(quantity), 0), 4) AS avg_buy_price
FROM trades
WHERE side = 'BUY'
GROUP BY trade_date, ticker
),
daily_sells AS (
SELECT
trade_date,
ticker,
SUM(quantity) AS sold_qty,
SUM(notional) AS sold_notional,
ROUND(SUM(notional) / NULLIF(SUM(quantity), 0), 4) AS avg_sell_price
FROM trades
WHERE side = 'SELL'
GROUP BY trade_date, ticker
)
SELECT
COALESCE(b.trade_date, s.trade_date) AS trade_date,
COALESCE(b.ticker, s.ticker) AS ticker,
COALESCE(b.bought_qty, 0) AS bought,
COALESCE(s.sold_qty, 0) AS sold,
COALESCE(b.bought_qty, 0) - COALESCE(s.sold_qty, 0) AS net_quantity,
COALESCE(b.bought_notional, 0) AS buy_value,
COALESCE(s.sold_notional, 0) AS sell_value,
COALESCE(s.sold_notional, 0) - COALESCE(b.bought_notional, 0) AS realized_pnl
FROM daily_buys b
FULL OUTER JOIN daily_sells s
ON b.trade_date = s.trade_date
AND b.ticker = s.ticker
ORDER BY trade_date, ticker;
Position Builder¶
Build current positions from the trade history:
WITH trade_flows AS (
SELECT
ticker,
SUM(CASE WHEN side = 'BUY' THEN quantity ELSE -quantity END) AS net_quantity,
SUM(CASE WHEN side = 'BUY' THEN notional ELSE -notional END) AS net_cost
FROM trades
GROUP BY ticker
),
valued_positions AS (
SELECT
tf.ticker,
tf.net_quantity,
ROUND(tf.net_cost / NULLIF(tf.net_quantity, 0), 4) AS avg_cost,
m.close_price AS current_price,
ROUND(tf.net_quantity * m.close_price, 2) AS market_value,
ROUND(tf.net_quantity * m.close_price - tf.net_cost, 2) AS unrealized_pnl
FROM trade_flows tf
LEFT JOIN (
SELECT DISTINCT ON (ticker)
ticker, close_price
FROM market_data
ORDER BY ticker, trade_date DESC
) m ON tf.ticker = m.ticker
WHERE tf.net_quantity != 0
)
SELECT
ticker,
net_quantity,
avg_cost,
current_price,
market_value,
unrealized_pnl,
ROUND(unrealized_pnl / NULLIF(ABS(net_quantity * avg_cost), 0) * 100, 2)
AS pnl_pct
FROM valued_positions
ORDER BY ABS(market_value) DESC;
Key Insight: CTEs are not just syntactic sugar. They make your intent explicit. A reader can look at
daily_buysanddaily_sellsand immediately understand the structure, instead of mentally parsing nested subqueries three levels deep. In finance, where queries get audited by risk and compliance, readability is not a luxury. It is a requirement.
If your query is so complex that you cannot explain it to a colleague in 30 seconds, break it into CTEs. Each CTE should answer one clear question. “What did we buy?” “What did we sell?” “What is the net?” Three simple questions instead of one incomprehensible query.
Real-World Queries: The Ones You Actually Need¶
Query 1: VWAP (Volume-Weighted Average Price)¶
VWAP is the gold standard benchmark for execution quality. It tells you the average price at which a stock traded, weighted by volume.
SELECT
ticker,
trade_date,
ROUND(
SUM(price * quantity) / NULLIF(SUM(quantity), 0),
4
) AS vwap,
SUM(quantity) AS total_volume,
MIN(price) AS low_price,
MAX(price) AS high_price
FROM trades
WHERE trade_date = '2026-03-28'
GROUP BY ticker, trade_date
ORDER BY total_volume DESC;
| ticker | trade_date | vwap | total_volume | low_price | high_price |
|---|---|---|---|---|---|
| TLT | 2026-03-28 | 95.5000 | 2000 | 95.50 | 95.50 |
| AAPL | 2026-03-28 | 148.5333 | 1500 | 148.25 | 149.10 |
| MSFT | 2026-03-28 | 375.8000 | 500 | 375.80 | 375.80 |
| GOOGL | 2026-03-28 | 152.4000 | 300 | 152.40 | 152.40 |
| NVDA | 2026-03-28 | 825.3000 | 200 | 825.30 | 825.30 |
If a trader bought AAPL at $149.10 and the VWAP was $148.53, they paid more than the market average. That is 57 cents of “execution slippage.” On 500 shares, that is $285 they left on the table. Execution desks live and die by these numbers.
Query 2: Largest Trades (Outlier Detection)¶
-- Find trades that are more than 3 standard deviations above the mean notional
WITH trade_stats AS (
SELECT
AVG(notional) AS mean_notional,
STDDEV(notional) AS std_notional
FROM trades
)
SELECT
t.trade_id,
t.trade_date,
t.trader,
t.ticker,
t.notional,
ROUND((t.notional - ts.mean_notional) / ts.std_notional, 2) AS z_score
FROM trades t
CROSS JOIN trade_stats ts
WHERE t.notional > ts.mean_notional + 3 * ts.std_notional
ORDER BY t.notional DESC;
Query 3: Trader Activity Heatmap Data¶
-- Hourly trading activity breakdown
SELECT
trader,
EXTRACT(HOUR FROM trade_time) AS hour,
COUNT(*) AS num_trades,
SUM(notional) AS total_notional
FROM trades
GROUP BY trader, EXTRACT(HOUR FROM trade_time)
ORDER BY trader, hour;
Query 4: Concentration Risk¶
-- What percentage of total exposure is in each stock?
WITH total AS (
SELECT SUM(ABS(quantity * current_price)) AS total_exposure
FROM positions
)
SELECT
p.ticker,
p.quantity,
ABS(p.quantity * p.current_price) AS exposure,
ROUND(
ABS(p.quantity * p.current_price) / t.total_exposure * 100,
2
) AS pct_of_total
FROM positions p
CROSS JOIN total t
ORDER BY exposure DESC;
If one stock is 40% of your total exposure, you do not have a portfolio. You have a stock and some friends. Concentration risk queries keep risk managers employed and traders annoyed.
Performance: Making Queries Not Terrible¶
When your trades table has 50 million rows, a badly written query can take minutes. Here is how to keep things fast.
Indexes¶
Indexes are the single biggest performance lever in SQL. They are like the index at the back of a textbook: instead of reading every page, you jump straight to the right one.
-- Essential indexes for a trades table
CREATE INDEX idx_trades_date ON trades (trade_date);
CREATE INDEX idx_trades_ticker ON trades (ticker);
CREATE INDEX idx_trades_trader ON trades (trader);
CREATE INDEX idx_trades_date_ticker ON trades (trade_date, ticker);
-- Essential indexes for market_data
CREATE INDEX idx_market_date ON market_data (trade_date);
CREATE INDEX idx_market_ticker_date ON market_data (ticker, trade_date);
EXPLAIN: Understanding Query Performance¶
EXPLAIN ANALYZE
SELECT ticker, SUM(notional)
FROM trades
WHERE trade_date = '2026-03-28'
GROUP BY ticker;
This shows you the query execution plan: which indexes are used, how many rows are scanned, and where the time is spent. If you see “Seq Scan” on a large table, you need an index.
Performance Rules of Thumb¶
| Do This | Not This | Why |
|---|---|---|
| Filter with indexed columns in WHERE | Wrap indexed columns in functions | WHERE trade_date = '2026-03-28' uses the index. WHERE DATE(trade_time) = '2026-03-28' does not. |
| Use specific columns in SELECT | Use SELECT * | Fetching all columns reads more data from disk. |
| Use EXISTS for existence checks | Use COUNT(*) > 0 | EXISTS stops at the first match. COUNT scans everything. |
| Join on indexed columns | Join on computed expressions | The optimizer can use indexes for direct column joins. |
| Use LIMIT for exploratory queries | Fetch millions of rows to “take a quick look” | Your query returns 47 million rows. Do you really need all of them? |
Key Insight: The difference between a query that takes 50 milliseconds and one that takes 5 minutes is almost always an index. A composite index on
(trade_date, ticker)can turn a full table scan into an index-only scan. Always index the columns you filter and join on.
Common Pitfalls¶
Dividing by zero. Financial data loves to have zero quantities, zero prices, and empty positions. Always use
NULLIF(denominator, 0)to prevent division-by-zero errors.price / NULLIF(quantity, 0)returns NULL instead of crashing.Forgetting NULL handling. NULL is not zero. NULL is not empty string. NULL is “unknown.” Any arithmetic with NULL returns NULL. Use
COALESCE(value, 0)when you need a default.Floating-point precision. Never use
FLOATfor money. UseDECIMAL(12, 4)orNUMERIC. Floating-point arithmetic can turn $100.00 into $99.99999999 and cause P&L reconciliation nightmares.Not using transactions. When updating positions or inserting related records, always wrap the operations in a transaction (
BEGIN ... COMMIT). Otherwise a crash halfway through can leave your data inconsistent.Ignoring time zones. A trade executed at “3 PM” could be New York, London, or Tokyo time. Always store timestamps in UTC and convert for display. Financial data across time zones without UTC is a debugging nightmare.
Using SELECT * in production queries. It fetches every column, breaks when schema changes, and makes the query optimizer’s life harder. Always list the specific columns you need.
Wrapping Up¶
SQL is the unsung hero of finance technology. Every trade, every position, every risk calculation, and every regulatory report ultimately comes from a SQL query hitting a database. The language itself is deceptively simple (you can learn the basics in an afternoon), but mastering the patterns that matter in finance (window functions, CTEs, aggregation, proper JOINs) is what separates “I can write a SELECT statement” from “I can build the query that tells the risk committee exactly how exposed we are.”
Start with the basics: SELECT, WHERE, GROUP BY, JOIN. Then learn window functions, because they change everything. Then learn CTEs, because they make complex queries readable. Then learn EXPLAIN, because performance matters when your table has 50 million rows.
And always remember the golden rule: use DECIMAL for money, NULLIF for division, and UTC for timestamps. Break any of these rules and you will spend your weekend debugging a reconciliation break instead of enjoying it.
SQL is not sexy. SQL is not trending. SQL has not changed much since 1986. But SQL is running every financial institution on the planet, and it will still be running them long after the trendy frameworks have come and gone. Learn it well.
Cheat Sheet¶
Key Questions & Answers¶
What SQL features matter most for financial data analysis?¶
Window functions (running totals, moving averages, ranking), CTEs (readable multi-step queries), GROUP BY with HAVING (aggregation and filtering), and JOINs (combining trade, market, and position data). These four features cover 90% of financial SQL work.
Why should I use DECIMAL instead of FLOAT for financial data?¶
FLOAT uses binary floating-point arithmetic, which cannot represent many decimal fractions exactly. This leads to rounding errors like $99.99999999 instead of $100.00. DECIMAL stores exact decimal values. In finance, a fraction of a cent times millions of trades adds up to real money.
How do I compute VWAP in SQL?¶
SUM(price * quantity) / SUM(quantity). This gives the volume-weighted average price. Always useNULLIF(SUM(quantity), 0)in the denominator to handle edge cases where total volume is zero.
What is the difference between WHERE and HAVING?¶
WHERE filters individual rows before aggregation. HAVING filters groups after aggregation. Use WHERE for conditions on raw data (e.g.,
WHERE trade_date = '2026-03-28'). Use HAVING for conditions on aggregated results (e.g.,HAVING SUM(notional) > 1000000).
Key Concepts at a Glance¶
| Concept | Summary |
|---|---|
| SELECT … FROM … WHERE | Basic query: choose columns, table, and filter conditions |
| GROUP BY + aggregate functions | Summarize data: COUNT, SUM, AVG, MIN, MAX |
| HAVING | Filter after aggregation (WHERE filters before) |
| INNER JOIN | Only rows that match in both tables |
| LEFT JOIN | All rows from left table, matching rows from right (NULL if no match) |
| FULL OUTER JOIN | All rows from both tables, NULLs where no match |
| Window functions | Calculations across related rows: SUM() OVER, RANK(), LAG(), LEAD() |
| PARTITION BY | Divides rows into groups for window function calculation |
| CTE (WITH clause) | Named subquery for readability and reuse |
| VWAP | SUM(price * quantity) / SUM(quantity) |
| NULLIF(x, 0) | Returns NULL instead of dividing by zero |
| COALESCE(x, default) | Returns the first non-NULL value |
| DECIMAL(12,4) | Exact decimal type for financial amounts |
| INDEX | Speeds up queries by creating lookup structures on columns |
| EXPLAIN ANALYZE | Shows the query execution plan and actual timings |
| UTC timestamps | Always store times in UTC, convert for display |
Sources & Further Reading¶
- Date, C.J., SQL and Relational Theory, O’Reilly
- Molinaro, A., SQL Cookbook, O’Reilly
- PostgreSQL Documentation, Window Functions
- PostgreSQL Documentation, Queries with WITH (CTEs)
- Mode Analytics, SQL Tutorial
- SQLZoo, Interactive SQL Tutorial
- Investopedia, Volume-Weighted Average Price (VWAP)
- Use The Index, Luke, SQL Indexing and Tuning
- Khan Academy, Intro to SQL