Content Personalization

Build a recommendation engine backend. Track user preferences, content interactions, and viewing history to deliver personalized experiences.

Note	This tutorial assumes you have a running 3-node Ferrosa cluster. If you haven’t set one up yet, follow the 3-Node Cluster Setup guide first — it only takes about 5 minutes.

Create the Keyspace

Personalization data needs to be fast and always available. When a user opens your app, they expect to see their recommendations instantly — not a loading spinner. Let’s create a keyspace with full replication across all three nodes.

With RF=3, every user’s preference data is available even if a node goes down. In a streaming service, downtime means lost engagement — so redundancy is worth it.

Design the Schema

A personalization engine needs four things: who the user is, what content exists, what the user has watched, and what to recommend next. One table for each.

Users

User profiles store preferences that drive recommendations. We use a set for genres because users can love multiple categories.

Content

The content catalog. Every movie, series, and documentary in the library. We use a set for tags so you can attach multiple descriptors to each title.

Watch History

Every time a user watches something, we record it. Partitioned by user, clustered by time descending. We denormalize the title into this table so we can display the history without joining to the content table.

Tip	In Ferrosa (and Cassandra-style databases), joins are expensive. By storing the title in the watch history table, we can display "Continue Watching" lists with a single partition read. The small storage cost is worth the massive speed gain.

Recommendations

Pre-computed recommendations for each user, ranked by a relevance score. Your recommendation algorithm (collaborative filtering, content-based, etc.) writes to this table, and the app reads from it.

Four tables, each serving a specific screen in the app: profile, catalog, history, and the "For You" page.

The full schema (keyspace, tables, and index):

CREATE KEYSPACE IF NOT EXISTS personalization
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
  };

USE personalization;

CREATE TABLE users (
    user_id             uuid PRIMARY KEY,
    name                text,
    email               text,
    signup_date         date,
    preferred_genres    set<text>,
    preferred_language  text
);

CREATE TABLE content (
    content_id    uuid PRIMARY KEY,
    title         text,
    content_type  text,       -- movie, series, documentary
    genre         text,
    release_year  int,
    rating_avg    float,
    tags          set<text>
);

CREATE TABLE watch_history (
    user_id       uuid,
    watched_at    timestamp,
    content_id    uuid,
    title         text,
    progress_pct  int,
    rating        int,
    PRIMARY KEY (user_id, watched_at)
) WITH CLUSTERING ORDER BY (watched_at DESC);

CREATE TABLE recommendations (
    user_id     uuid,
    rank        int,
    content_id  uuid,
    title       text,
    score       float,
    reason      text,
    PRIMARY KEY (user_id, rank)
);

CREATE INDEX idx_content_genre ON content (genre);

Create User Profiles

Let’s create three users with different taste profiles. These preferences will drive the recommendations later.

Lena loves sci-fi and documentaries. Marco is into comedies and dramas. Yuki gravitates toward anime and action. Three very different taste profiles that will produce very different recommendations.

Add Content Catalog

Let’s populate the catalog with a mix of content types and genres. Ten titles across sci-fi, comedy, drama, anime, documentary, and more. Each has tags that help the recommendation engine find similar content.

Record Watch History

Every play event gets recorded. The progress_pct field lets you build "Continue Watching" rows, and the rating feeds back into the recommendation algorithm.

Fifteen watch events across three users. Notice that Lena stopped "The Last Algorithm" at 60% and "The Silk Road Diaries" at 35% — those are prime candidates for the "Continue Watching" row. A rating of 0 means the user hasn’t rated it yet.

Store Recommendations

In production, a recommendation algorithm would compute these scores. For this tutorial, we will insert them directly. Each user gets a ranked list with a human-readable reason explaining why the content was recommended.

Each recommendation includes a reason that explains why it was suggested. This transparency builds trust — users engage more when they understand why something was recommended to them.

All data (users, content, watch history, and recommendations):

USE personalization;

-- User profiles
INSERT INTO users (user_id, name, email, signup_date, preferred_genres, preferred_language)
VALUES (aaaa1111-1111-1111-1111-111111111111,
        'Lena Johansson', 'lena@example.com', '2025-08-10',
        {'sci-fi', 'thriller', 'documentary'}, 'en');

INSERT INTO users (user_id, name, email, signup_date, preferred_genres, preferred_language)
VALUES (bbbb2222-2222-2222-2222-222222222222,
        'Marco Rossi', 'marco@example.com', '2025-11-22',
        {'comedy', 'drama', 'romance'}, 'it');

INSERT INTO users (user_id, name, email, signup_date, preferred_genres, preferred_language)
VALUES (cccc3333-3333-3333-3333-333333333333,
        'Yuki Tanaka', 'yuki@example.com', '2026-01-05',
        {'anime', 'sci-fi', 'action'}, 'ja');

-- Content catalog
INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Stellar Drift', 'movie', 'sci-fi', 2025, 4.5, {'space', 'exploration', 'award-winner'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'The Last Algorithm', 'series', 'thriller', 2026, 4.7, {'tech', 'suspense', 'binge-worthy'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Ocean Depths', 'documentary', 'documentary', 2025, 4.8, {'nature', 'marine-biology', 'stunning-visuals'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Roman Holiday Redux', 'movie', 'romance', 2026, 3.9, {'italy', 'travel', 'feel-good'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Laughing Matters', 'series', 'comedy', 2025, 4.2, {'standup', 'improv', 'light-hearted'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Neon Ronin', 'series', 'anime', 2026, 4.6, {'cyberpunk', 'action', 'japanese'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Quantum Entangled', 'movie', 'sci-fi', 2024, 4.1, {'physics', 'mind-bending', 'indie'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'The Silk Road Diaries', 'documentary', 'documentary', 2025, 4.4, {'history', 'travel', 'cultural'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Midnight in Milan', 'movie', 'drama', 2026, 4.3, {'italy', 'family', 'emotional'});

INSERT INTO content (content_id, title, content_type, genre, release_year, rating_avg, tags)
VALUES (uuid(), 'Blade Circuit', 'movie', 'action', 2025, 3.8, {'martial-arts', 'futuristic', 'fast-paced'});

-- Watch history: Lena Johansson (sci-fi and documentary fan)
INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (aaaa1111-1111-1111-1111-111111111111, '2026-03-10 20:00:00', uuid(), 'Stellar Drift', 100, 5);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (aaaa1111-1111-1111-1111-111111111111, '2026-03-12 19:30:00', uuid(), 'The Last Algorithm', 60, 0);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (aaaa1111-1111-1111-1111-111111111111, '2026-03-14 21:00:00', uuid(), 'Ocean Depths', 100, 5);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (aaaa1111-1111-1111-1111-111111111111, '2026-03-16 20:15:00', uuid(), 'Quantum Entangled', 100, 4);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (aaaa1111-1111-1111-1111-111111111111, '2026-03-17 22:00:00', uuid(), 'The Silk Road Diaries', 35, 0);

-- Watch history: Marco Rossi (comedy and drama fan)
INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (bbbb2222-2222-2222-2222-222222222222, '2026-03-11 18:00:00', uuid(), 'Laughing Matters', 100, 4);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (bbbb2222-2222-2222-2222-222222222222, '2026-03-13 20:30:00', uuid(), 'Midnight in Milan', 100, 5);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (bbbb2222-2222-2222-2222-222222222222, '2026-03-15 19:00:00', uuid(), 'Roman Holiday Redux', 100, 3);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (bbbb2222-2222-2222-2222-222222222222, '2026-03-16 21:00:00', uuid(), 'Ocean Depths', 45, 0);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (bbbb2222-2222-2222-2222-222222222222, '2026-03-17 20:00:00', uuid(), 'The Last Algorithm', 25, 0);

-- Watch history: Yuki Tanaka (anime and action fan)
INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (cccc3333-3333-3333-3333-333333333333, '2026-03-09 22:00:00', uuid(), 'Neon Ronin', 100, 5);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (cccc3333-3333-3333-3333-333333333333, '2026-03-11 21:30:00', uuid(), 'Blade Circuit', 100, 4);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (cccc3333-3333-3333-3333-333333333333, '2026-03-14 20:00:00', uuid(), 'Stellar Drift', 100, 4);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (cccc3333-3333-3333-3333-333333333333, '2026-03-16 19:00:00', uuid(), 'Quantum Entangled', 80, 0);

INSERT INTO watch_history (user_id, watched_at, content_id, title, progress_pct, rating)
VALUES (cccc3333-3333-3333-3333-333333333333, '2026-03-17 21:00:00', uuid(), 'The Last Algorithm', 100, 5);

-- Recommendations for Lena (sci-fi, documentary fan)
INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (aaaa1111-1111-1111-1111-111111111111, 1, uuid(), 'Neon Ronin', 0.94, 'Fans of Stellar Drift also loved this cyberpunk epic');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (aaaa1111-1111-1111-1111-111111111111, 2, uuid(), 'Blade Circuit', 0.87, 'Action sci-fi with themes you enjoy');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (aaaa1111-1111-1111-1111-111111111111, 3, uuid(), 'Midnight in Milan', 0.72, 'Highly rated drama trending this week');

-- Recommendations for Marco (comedy, drama fan)
INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (bbbb2222-2222-2222-2222-222222222222, 1, uuid(), 'The Silk Road Diaries', 0.91, 'You loved Italian settings — explore more cultural stories');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (bbbb2222-2222-2222-2222-222222222222, 2, uuid(), 'Stellar Drift', 0.83, 'Award-winning film popular with drama fans');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (bbbb2222-2222-2222-2222-222222222222, 3, uuid(), 'Neon Ronin', 0.68, 'Trending series with strong character development');

-- Recommendations for Yuki (anime, action fan)
INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (cccc3333-3333-3333-3333-333333333333, 1, uuid(), 'Ocean Depths', 0.89, 'Visually stunning — similar to anime art you enjoy');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (cccc3333-3333-3333-3333-333333333333, 2, uuid(), 'Laughing Matters', 0.76, 'Light palate cleanser between action binges');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (cccc3333-3333-3333-3333-333333333333, 3, uuid(), 'Roman Holiday Redux', 0.61, 'Popular with users who have similar taste profiles');

INSERT INTO recommendations (user_id, rank, content_id, title, score, reason)
VALUES (cccc3333-3333-3333-3333-333333333333, 4, uuid(), 'Midnight in Milan', 0.55, 'Emotional storytelling with cinematic visuals');

Power the Personalization Engine

These are the queries that power the different screens in a streaming app. Each one is a single-partition read — fast enough for real-time rendering.

User’s Watch History

The "Recently Watched" row on the home screen. Most recent first, thanks to the descending clustering order.

Recommendations for a User

The "For You" section. Recommendations are pre-sorted by rank, so this query returns them in the right order with no extra work.

Content by Genre

Browsing by category. We use the secondary index on genre to make this query work without ALLOW FILTERING.

Important

Secondary indexes work well for low-cardinality columns like genre (where there are a limited number of distinct values). Avoid them on high-cardinality columns like user_id or email — for those, create a dedicated lookup table instead.

Top-Rated Content

Build a "Top Picks" row by filtering for highly rated content.

Update Viewing Progress

When a user pauses a movie, update their progress so they can resume later.

Find Users with Similar Taste

Collaborative filtering starts with finding users who share genre preferences. This query finds everyone who likes sci-fi.

Both Lena and Yuki will show up here — which is why the recommendation engine suggested some of the same titles to both of them.

All queries (selects, updates, and filters):

USE personalization;

-- Lena's recent viewing activity
SELECT watched_at, title, progress_pct, rating
FROM watch_history
WHERE user_id = aaaa1111-1111-1111-1111-111111111111
LIMIT 10;

-- Yuki's personalized recommendations
SELECT rank, title, score, reason
FROM recommendations
WHERE user_id = cccc3333-3333-3333-3333-333333333333;

-- Browse all sci-fi content
SELECT title, content_type, release_year, rating_avg
FROM content
WHERE genre = 'sci-fi';

-- All content rated 4.5 or higher
SELECT title, content_type, genre, rating_avg
FROM content
WHERE rating_avg >= 4.5
ALLOW FILTERING;

-- Marco paused Ocean Depths at 45%, now resuming to 78%
UPDATE watch_history
SET progress_pct = 78
WHERE user_id = bbbb2222-2222-2222-2222-222222222222
  AND watched_at = '2026-03-16 21:00:00';

-- Find users who like sci-fi
SELECT user_id, name, preferred_genres
FROM users
WHERE preferred_genres CONTAINS 'sci-fi'
ALLOW FILTERING;

What You Learned

You just built the backend for a content recommendation engine. Here is what you accomplished:

Designed a denormalized schema where titles are duplicated across tables to avoid expensive joins
Used CQL collection types (sets) to model multi-valued preferences like genres and tags
Built a watch history table with descending time order for instant "Recently Watched" queries
Created a pre-ranked recommendations table that serves the "For You" page in a single read
Used a secondary index on genre for catalog browsing, and learned when indexes are appropriate
Queried set membership with CONTAINS to find users with overlapping taste profiles

This architecture powers real streaming services. The key insight is that every screen in the app maps to a single-partition query. Ferrosa handles millions of users this way because each query touches exactly one partition — no matter how large the cluster grows.