Build a financial market data analytics platform. Ingest stock price ticks, compute aggregates, and power live dashboards — all on your Ferrosa cluster.
|
Note
|
Complete the 3-Node Cluster Setup tutorial first, or have a running Ferrosa cluster accessible on localhost:9042.
|
The Use Case
A financial market data platform
You are building the backend for a financial data platform — think a simplified Bloomberg terminal or a stock tracking application. The system needs to handle three things:
-
Ingest raw price ticks. Every time a stock trades on an exchange, you receive a tick containing the symbol, price, volume, and timestamp. During market hours, popular stocks like AAPL can generate hundreds of ticks per second. Across thousands of listed securities, you are looking at tens of thousands of writes per second.
-
Serve historical data. Traders and analysts need to view price history for any stock over any time window — the last 5 minutes, the last hour, or the entire trading day.
-
Power live dashboards. Charting tools display OHLC candles (Open, High, Low, Close) at various intervals — 1-minute, 5-minute, hourly. These aggregations need to be fast because charts refresh in real time.
This workload is overwhelmingly write-heavy during market hours. A relational database with traditional indexes would fall behind on ingestion. And computing OHLC aggregates on the fly from raw ticks — potentially millions of rows — would make dashboard queries unacceptably slow.
The solution is a two-table approach: store the raw ticks for history, and maintain pre-computed candle aggregations for dashboards. Ferrosa handles both write patterns effortlessly because every write is a fast append to the commit log.
Design the Schema
Create the keyspace
CREATE KEYSPACE IF NOT EXISTS market_data WITH replication = {
'class': 'SimpleStrategy', 'replication_factor': 3
};
We use SimpleStrategy with replication factor 3, just like the IoT tutorial.
Every piece of market data is stored on all three nodes in your cluster.
The raw ticks table
This table stores every individual trade as it happens. It is the source of truth for price history.
symbol text,
trade_date date,
trade_time timestamp,
price decimal,
volume bigint,
exchange text,
PRIMARY KEY ((symbol, trade_date), trade_time)
) WITH CLUSTERING ORDER BY (trade_time DESC);
There is something new here compared to the IoT tutorial: a composite partition key.
Notice the double parentheses around (symbol, trade_date).
That means both the symbol and the date together form the partition key.
Why not just use symbol as the partition key?
Because a single stock can generate millions of ticks per year.
If all of AAPL’s ticks went into one partition, that partition would grow without bound.
By including trade_date in the partition key, each partition holds exactly one stock’s ticks for one day.
A busy stock might generate 50,000 ticks in a day — a very manageable partition size.
The tradeoff is that every query must specify both symbol and trade_date.
For a financial application, this is natural — you almost always look at one stock on one day.
| Column | Type | Purpose |
|---|---|---|
|
|
Stock ticker (e.g., "AAPL", "GOOGL") |
|
|
Trading day — part of partition key for bucketing |
|
|
Exact time of the trade, down to the millisecond |
|
|
Trade price (decimal avoids floating-point errors) |
|
|
Number of shares in this trade |
|
|
Which exchange executed the trade (NYSE, NASDAQ) |
|
Tip
|
Why decimal for prices? Financial applications should never use float or double for money. Floating-point arithmetic can produce rounding errors — for example, 0.1 + 0.2 equals 0.30000000000000004 in most languages. The decimal type stores exact values, just like your calculator would.
|
The OHLC candles table
OHLC stands for Open, High, Low, Close — the four prices that define a candle on a stock chart. Each candle summarizes all the trades in a time interval (typically 1 minute, 5 minutes, or 1 hour).
Computing these from raw ticks on every dashboard refresh would be far too slow. Instead, your application computes them as ticks arrive and writes the results to a pre-aggregated table:
symbol text,
candle_date date,
candle_time timestamp,
open_price decimal,
high_price decimal,
low_price decimal,
close_price decimal,
volume bigint,
trade_count int,
PRIMARY KEY ((symbol, candle_date), candle_time)
) WITH CLUSTERING ORDER BY (candle_time DESC);
This table uses the same composite partition key pattern as the ticks table: (symbol, candle_date).
A full trading day has 390 one-minute candles (6.5 hours of market time), so each partition is tiny — just 390 rows.
Dashboard queries return instantly.
The daily summary table
Let’s also create a daily summary table for end-of-day reports:
CREATE TABLE IF NOT EXISTS daily_summary (
symbol text,
trade_date date,
open_price decimal,
high_price decimal,
low_price decimal,
close_price decimal,
total_volume bigint,
trade_count int,
vwap decimal,
PRIMARY KEY (symbol, trade_date)
) WITH CLUSTERING ORDER BY (trade_date DESC);
The vwap column stands for Volume Weighted Average Price — a common metric traders use to assess whether they got a good fill price.
With symbol as the partition key and trade_date as the clustering key, you can query any stock’s daily history across multiple days in a single read.
Insert Sample Data
Insert raw ticks
Let’s simulate a morning of trading. We will insert ticks for three stocks: AAPL, GOOGL, and MSFT. In production, these would arrive from a market data feed at thousands per second. Here we will insert them manually to see the data shape.
USE market_data;
-- AAPL ticks
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:30:00.000', 195.25, 15000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:30:15.230', 195.30, 8500, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:30:32.100', 195.18, 12000, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:30:45.500', 195.42, 20000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:31:02.800', 195.50, 9200, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:31:18.450', 195.38, 11000, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:31:33.920', 195.55, 7800, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:31:50.100', 195.60, 14500, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:32:05.330', 195.48, 6300, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:32:22.750', 195.65, 18000, 'NASDAQ');
-- GOOGL ticks
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:30:00.000', 178.90, 22000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:30:20.400', 178.85, 13500, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:30:41.800', 179.10, 9800, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:31:05.200', 179.25, 17000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:31:28.600', 179.05, 11200, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:31:50.900', 179.30, 8400, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:32:10.350', 179.45, 15600, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:32:35.100', 179.50, 19000, 'NASDAQ');
-- MSFT ticks
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:30:00.000', 442.10, 25000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:30:18.700', 442.25, 12000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:30:35.400', 442.00, 16500, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:30:52.100', 442.50, 19800, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:31:10.600', 442.75, 8900, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:31:30.200', 442.60, 14200, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:31:48.800', 442.80, 21000, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:32:05.500', 443.00, 17500, 'NASDAQ');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:32:22.300', 442.90, 10600, 'NYSE');
INSERT INTO ticks (symbol, trade_date, trade_time, price, volume, exchange)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:32:40.900', 443.15, 23000, 'NASDAQ');
-- AAPL 1-minute candles
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:30:00', 195.25, 195.42, 195.18, 195.42, 55500, 4);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:31:00', 195.50, 195.60, 195.38, 195.60, 33500, 3);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('AAPL', '2024-06-15', '2024-06-15 09:32:00', 195.48, 195.65, 195.48, 195.65, 24300, 2);
-- GOOGL 1-minute candles
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:30:00', 178.90, 179.10, 178.85, 179.10, 45300, 3);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:31:00', 179.25, 179.30, 179.05, 179.30, 36600, 3);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('GOOGL', '2024-06-15', '2024-06-15 09:32:00', 179.45, 179.50, 179.45, 179.50, 34600, 2);
-- MSFT 1-minute candles
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:30:00', 442.10, 442.50, 442.00, 442.50, 73300, 4);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:31:00', 442.75, 442.80, 442.60, 442.80, 44100, 3);
INSERT INTO candles_1m (symbol, candle_date, candle_time, open_price, high_price, low_price, close_price, volume, trade_count)
VALUES ('MSFT', '2024-06-15', '2024-06-15 09:32:00', 443.00, 443.15, 442.90, 443.15, 51100, 3);
-- Daily summaries
INSERT INTO daily_summary (symbol, trade_date, open_price, high_price, low_price, close_price, total_volume, trade_count, vwap)
VALUES ('AAPL', '2024-06-14', 194.50, 196.20, 194.10, 195.80, 45200000, 128500, 195.42);
INSERT INTO daily_summary (symbol, trade_date, open_price, high_price, low_price, close_price, total_volume, trade_count, vwap)
VALUES ('AAPL', '2024-06-13', 193.80, 195.00, 193.20, 194.50, 38700000, 112000, 194.15);
INSERT INTO daily_summary (symbol, trade_date, open_price, high_price, low_price, close_price, total_volume, trade_count, vwap)
VALUES ('GOOGL', '2024-06-14', 177.50, 179.30, 177.10, 178.90, 28400000, 95200, 178.35);
INSERT INTO daily_summary (symbol, trade_date, open_price, high_price, low_price, close_price, total_volume, trade_count, vwap)
VALUES ('MSFT', '2024-06-14', 440.00, 443.50, 439.80, 442.10, 32100000, 108000, 441.85);
Insert OHLC candles
In production, your application would compute these candles from the raw ticks in real time. Here, we will insert them directly. Each candle summarizes one minute of trading.
Insert daily summaries
The daily summaries provide end-of-day aggregate data for multi-day analysis.
Query Patterns
Get the latest trades for a stock
A trader wants to see what just happened with AAPL. This query returns the 10 most recent ticks for today:
SELECT trade_time, price, volume, exchange
FROM ticks
WHERE symbol = 'AAPL'
AND trade_date = '2024-06-15'
LIMIT 10;
Because the partition key is (symbol, trade_date), Ferrosa knows exactly which partition to read.
And because we used CLUSTERING ORDER BY (trade_time DESC), the newest ticks come first without any sorting overhead.
Price history for a time window
An analyst wants to see all AAPL trades during the first two minutes of trading:
SELECT trade_time, price, volume, exchange
FROM ticks
WHERE symbol = 'AAPL'
AND trade_date = '2024-06-15'
AND trade_time >= '2024-06-15 09:30:00'
AND trade_time <= '2024-06-15 09:32:00';
The clustering key (trade_time) supports range queries natively.
Ferrosa seeks directly to the start of the range and reads forward.
No index scan, no sort — just a sequential read of the data that is already stored in order.
OHLC candles for charting
A charting library on the frontend needs 1-minute candle data to draw a candlestick chart. This is the query that powers the chart:
SELECT candle_time, open_price, high_price, low_price, close_price, volume
FROM candles_1m
WHERE symbol = 'AAPL'
AND candle_date = '2024-06-15';
This returns all candles for AAPL on June 15th, newest first. A full trading day has 390 candles — small enough to send to the frontend in a single response. The chart renders instantly because the data is already aggregated.
Compare candles across stocks
Want to compare the first three minutes of AAPL and MSFT? Run the same candle query for each symbol. In your application, you would execute these in parallel:
-- AAPL candles
SELECT candle_time, open_price, close_price, volume
FROM candles_1m
WHERE symbol = 'AAPL'
AND candle_date = '2024-06-15';
-- MSFT candles
SELECT candle_time, open_price, close_price, volume
FROM candles_1m
WHERE symbol = 'MSFT'
AND candle_date = '2024-06-15';
Each query hits a different partition, so they can run concurrently on different nodes. This is one of the strengths of the Cassandra data model — queries that touch different partitions scale horizontally by nature.
Volume analysis
Which minutes had the most trading volume for MSFT?
Query the candle data and sort by volume in your application, or simply look at the volume column:
SELECT candle_time, volume, trade_count
FROM candles_1m
WHERE symbol = 'MSFT'
AND candle_date = '2024-06-15';
The first minute of trading (09:30) typically has the highest volume as overnight orders fill — and our sample data reflects that pattern with 73,300 shares traded.
Multi-day price history
For longer-term analysis, query the daily summary table.
Because symbol is the partition key and trade_date is the clustering key, you can pull a stock’s entire history in one query:
SELECT trade_date, open_price, close_price, total_volume, vwap
FROM daily_summary
WHERE symbol = 'AAPL';
This returns every daily summary for AAPL, newest first.
Add a LIMIT clause if you only need the last few days.
What You Learned
You have built a complete market data analytics backend. Here is what you practiced:
-
Composite partition keys for bounded partitions. Using
(symbol, trade_date)as the partition key keeps each partition to one stock’s ticks for one day. Partitions stay small and predictable, even as the total dataset grows to billions of rows. -
Pre-computed aggregations for fast dashboards. Instead of computing OHLC candles from raw ticks on every query, you maintain a
candles_1mtable with pre-aggregated data. This is the "write more, read fast" philosophy of the Cassandra data model. -
Multiple tables for multiple access patterns. Raw ticks for detailed history, 1-minute candles for charts, daily summaries for reports — three tables serving three different query patterns, all from the same underlying data.
-
Using
decimalfor financial data. Floating-point types introduce rounding errors that are unacceptable in finance. Thedecimaltype stores exact values. -
Descending clustering order for "latest first" queries. By sorting newest trades first on disk, the most common query pattern — "show me what just happened" — reads the first few rows of the partition with zero sorting overhead.
Next, try the E-Commerce Platform tutorial to learn about collection types, denormalized query tables, and the update/delete patterns used in shopping cart and order workflows.