Partial Indexes in MySQL: The Trick PostgreSQL Doesn't Want You to Know
Emulating filtered indexes with generated columns and functional key parts - no migration required
PostgreSQL has this nice trick - CREATE INDEX ... WHERE status = 'pending'. The index only stores rows that match the condition. If you have 50 million orders and only 200,000 are pending, your index is tiny. Scans are fast. Disk usage is minimal.
MySQL doesn’t have that syntax. There’s no WHERE clause in CREATE INDEX. And if you google “partial index MySQL,” you’ll mostly find people saying “just switch to PostgreSQL.”
But here’s the thing - since MySQL 8.0, you can get almost the same result. It takes a different path, but the destination is the same: a small index that only contains the rows you care about.
The core idea
Instead of telling the index which rows to include, you tell it what value to store. For rows you don’t want - you store NULL.
MySQL indexes do include NULL entries (unlike PostgreSQL’s partial indexes that skip them entirely), so you won’t get identical disk savings. But the optimizer still benefits massively because the non-NULL keys are clustered together, and your queries only scan that small portion.
The trick works through two mechanisms: functional key parts and generated stored columns.
Approach 1: functional index
Say you have an orders table with 50M rows. Only 0.4% are pending. You want to quickly find pending orders by their ID.
CREATE INDEX idx_pending
ON orders ((IF(status = 'pending', order_id, NULL)));The IF() expression returns the order_id for pending rows and NULL for everything else. The index stores all 50M entries, but only 200K have meaningful (non-NULL) values.
The catch? Your query has to use the exact same expression:
SELECT *
FROM orders
WHERE IF(status = 'pending', order_id, NULL) = 12345;That’s ugly. It works, but it leaks implementation details into your application code. Your ORM won’t like it. Your teammates won’t like it either.
Approach 2: generated column (the better way)
This is what I’d actually recommend for production:
ALTER TABLE orders
ADD COLUMN pending_id BIGINT
AS (IF(status = 'pending', order_id, NULL)) STORED,
ADD INDEX idx_pending_id (pending_id);Now you have a real column - pending_id - that’s automatically maintained by MySQL. It’s order_id for pending rows, NULL for everything else. Your queries look normal:
SELECT *
FROM orders
WHERE pending_id IS NOT NULL;
-- Or find a specific pending order:
SELECT *
FROM orders
WHERE pending_id = 12345;The optimizer picks up idx_pending_id automatically. No query hints needed. No ugly IF() in your WHERE clause. Run EXPLAIN and you’ll see the index scan hitting only the pending rows.
Approach 3: boolean flag + composite
When you need to fetch all pending orders sorted by date:
ALTER TABLE orders
ADD COLUMN is_pending TINYINT(1)
AS (status = 'pending') STORED,
ADD INDEX idx_pending_date (is_pending, created_at);Now SELECT * FROM orders WHERE is_pending = 1 ORDER BY created_at DESC uses an index range scan. The index is compact because most rows have is_pending = 0, and the optimizer skips those efficiently.
The gotchas
NULLs are still indexed. This is the biggest difference from PostgreSQL. A true partial index in Postgres with WHERE status = 'pending' physically excludes non-matching rows. MySQL’s approach stores NULL keys, so the index is larger. For a 50M row table with 200K pending rows, PostgreSQL’s partial index stores 200K entries. MySQL’s stores 50M entries (most of them NULL). Still way faster than a full index on status, but not as compact.
Exact expression match matters for functional indexes. If your index uses IF(status = 'pending', order_id, NULL) but your query uses WHERE status = 'pending', the optimizer won’t connect the dots. That’s why the generated column approach is better - it gives you a clean column name to query against.
STORED, not VIRTUAL. Generated columns can be VIRTUAL (computed on read) or STORED (physically saved). Only STORED columns can be indexed. The STORED keyword means MySQL maintains the value on every INSERT and UPDATE, so there’s a small write overhead.
Run ANALYZE TABLE after. The optimizer needs fresh statistics to know about your new index. Always ANALYZE TABLE orders; after adding the index.
When this pays off
The sweet spot is clear: a column with low cardinality where one value is rare and you query for it frequently. Some classic examples:
status = 'pending'on an orders table (0.5% of rows)is_deleted = 1on a soft-delete table (0.1% of rows)needs_review = trueon a content moderation queueretry_count > 0on a job queue
If the rare value is more than 15-20% of the table, a regular index works fine. The partial approach shines when you’re indexing the needle, not the haystack.
Quick validation
After creating the index, always verify:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE pending_id = 12345;You should see your idx_pending_id in the output, with a low row estimate. If EXPLAIN shows a full table scan, check that you ran ANALYZE TABLE and that your WHERE clause matches the generated column - not the original status column.
If you want to take this further, combine it with covering indexes - add the columns you SELECT to the index so MySQL never touches the table at all.
Found this useful? Subscribe to Podo Stack for weekly database patterns, Kubernetes internals, and Cloud Native tools ripe for production.
Still running full indexes on columns where 99% of rows don’t matter? Try the generated column approach - it takes one ALTER TABLE and the improvement shows up immediately in EXPLAIN.


