Dealing with large amounts of data in production comes with a unique set of challenges. As table sizes grow from millions toward billions of rows, query performance will degrade even for the most finely tuned indexes. If you are not careful, a popular endpoint of your application that struggles with blocking I/O calls to your database can tie up your server’s resources and cause widespread grief among your users.
Consider an enterprise-y application where you allow customers to create storefronts. The most used and important tables will contain an account_id. Envision that each online store offers 10,000 unique SKUs on average. If all the inventory items end up in a single table and you have 10,000 customers, there would be 100 million rows in it. Now, shudder at the thought of shipping an inefficient query into production that ends up doing sequential scans over that entire massive table during some common business logic.
Something you could do in this scenario is list partition the table by account_id. Here’s a simplified schema:
CREATE TABLE inventory_items (
  account_id bigint NOT NULL,
  name varchar NOT NULL,
  sku varchar NOT NULL,
  price_cents int NOT NULL
) PARTITION BY LIST (account_id);
The PARTITION BY LIST (account_id) clause defines the partition key. If you try to INSERT a row into inventory_items, it will expect a partition to exist for the supplied account_id. For example, in order to insert an inventory item with an account_id of 1, you would first need to create one like so:
CREATE TABLE IF NOT EXISTS inventory_items_account_1
PARTITION OF inventory_items FOR VALUES IN (1);
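To make that concrete, here’s roughly what happens if you insert a row whose account_id has no partition yet (jumping ahead to the Rails console for a moment; the exact error text may vary by PostgreSQL version):

ActiveRecord::Base.connection.execute(<<~SQL)
  INSERT INTO inventory_items (account_id, name, sku, price_cents)
  VALUES (42, 'widget', 'SKU-42', 100)
SQL
#=> raises ActiveRecord::StatementInvalid, wrapping PostgreSQL's
#   ERROR: no partition of relation "inventory_items" found for row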
This means each account’s inventory items will end up in their own table, which results in smaller, isolated tables. Indexes defined on the parent table are created independently for each partition as well.
If you need sku to be unique per account_id, instead of creating a composite UNIQUE INDEX on a gargantuan single table, you could model it as a primary key constraint:
ALTER TABLE ONLY inventory_items
ADD CONSTRAINT inventory_items_pkey PRIMARY KEY (account_id, sku);
If needed, each individual partition can also have its own indexes, constraints, and default values, distinct from the others.
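As a quick illustration (a sketch, with hypothetical index choices and migration class): an index added on the parent propagates to every partition, while an index added on a single partition stays local to it:

class AddInventoryIndexes < ActiveRecord::Migration[7.1]
  def change
    # On the parent: PostgreSQL creates a matching index on each
    # existing and future partition automatically.
    add_index :inventory_items, [:account_id, :name]

    # Directly on one partition: only account 1's table carries
    # this extra index and its write overhead.
    add_index :inventory_items_account_1, :price_cents
  end
end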
This may seem fairly novel and interesting, but may not feel compelling if it’s not easy to incorporate into your web framework of choice. So then, how could one implement this in Ruby on Rails?
First, let’s create the schema for our partitioned inventory_items table:
class CreateInventoryItems < ActiveRecord::Migration[7.1]
  def change
    create_table(
      :inventory_items,
      primary_key: [:account_id, :sku],
      options: "PARTITION BY LIST (account_id)",
    ) do |t|
      t.references :account, null: false
      t.string :name, null: false
      t.string :sku, null: false
      t.integer :price_cents, null: false
    end
  end
end
Note the composite primary key definition: this is the bit that builds the primary key constraint. Support for composite primary keys landed in Rails 7.1; on prior releases, a third-party gem can be used.
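If you’re stuck on an older Rails and would rather avoid a gem, one workaround (a sketch, using the same table as above) is to create the table without an id column and add the composite constraint with raw DDL:

class CreateInventoryItems < ActiveRecord::Migration[7.0]
  def change
    create_table(
      :inventory_items,
      id: false,
      options: "PARTITION BY LIST (account_id)",
    ) do |t|
      t.references :account, null: false
      t.string :name, null: false
      t.string :sku, null: false
      t.integer :price_cents, null: false
    end

    # Rails < 7.1 can't declare a composite primary key natively,
    # so fall back to DDL for the constraint itself.
    reversible do |dir|
      dir.up do
        execute(<<~SQL)
          ALTER TABLE inventory_items
            ADD CONSTRAINT inventory_items_pkey PRIMARY KEY (account_id, sku);
        SQL
      end
    end
  end
end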
In lieu of using composite primary keys, you could forgo a primary key entirely:
create_table(
  :analytic_events,
  id: false,
  options: "PARTITION BY LIST (account_id)",
)
This could make sense in scenarios where you are just INSERT-ing with no plans to ever SELECT or run operations on specific rows by primary key.
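For instance, with a hypothetical AnalyticEvent model on top of that table (the column names here are made up for illustration), bulk writes work fine without a primary key:

class AnalyticEvent < ApplicationRecord
end

# returning: false, since there is no primary key to return.
AnalyticEvent.insert_all(
  [
    { account_id: 1, name: "page_view", occurred_at: Time.current },
    { account_id: 1, name: "checkout", occurred_at: Time.current },
  ],
  returning: false,
)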
Ultimately, Rails’ default behavior of using a SEQUENCE to generate the parent table’s id column is not enough here: PostgreSQL cannot enforce uniqueness across partitions, so any primary key on a partitioned table must include the partition key.
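Concretely, a migration that keeps Rails’ default id primary key on a partitioned table should fail at CREATE TABLE time:

create_table(
  :inventory_items,
  options: "PARTITION BY LIST (account_id)",
) do |t|
  t.references :account, null: false
end
#=> fails with something like:
#   ERROR: unique constraint on partitioned table must include all partitioning columns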
Let’s get back to our example. Now that we have our partitioned table, we need to add some logic to Account that knows how to create and drop our list partitions when those records are created and destroyed. Here’s a model concern for that:
module ListPartitionable
  extend ActiveSupport::Concern

  class_methods do
    def manage_list_partition_tables_for(*table_names)
      self.list_partition_tables = table_names
    end
  end

  included do
    class_attribute :list_partition_tables, default: []

    after_create :create_list_partition_tables
    after_destroy :drop_list_partition_tables

    # e.g. "inventory_items" for Account id 1 => "inventory_items_account_1"
    private def partition_table_name_for(table_name)
      "#{table_name}_#{self.class.name.foreign_key.sub(/_id$/, "")}_#{id}"
    end

    private def create_list_partition_tables
      list_partition_tables.each do |table_name|
        self.class.connection.execute(
          <<~SQL.squish,
            CREATE TABLE IF NOT EXISTS #{partition_table_name_for(table_name)}
            PARTITION OF #{table_name} FOR VALUES IN (#{id});
          SQL
        )
      end
    end

    private def drop_list_partition_tables
      list_partition_tables.each do |table_name|
        self.class.connection.execute(
          <<~SQL.squish,
            DROP TABLE IF EXISTS #{partition_table_name_for(table_name)};
          SQL
        )
      end
    end
  end
end
A simplified pair of models for said example would look like this:
class Account < ApplicationRecord
  include ListPartitionable

  has_many :inventory_items, dependent: nil

  manage_list_partition_tables_for :inventory_items
end

class InventoryItem < ApplicationRecord
  belongs_to :account
end
If you need to seed the partitioned tables for any preexisting accounts, you can run Account.find_each { |account| account.send(:create_list_partition_tables) } (send, because the callback is private). Going forward, account creation and destruction will cascade into partition table management through ActiveRecord hooks.
Another perk to point out is that there is no need for a destructive dependent: option (hence the explicit dependent: nil), since an Account’s destruction drops its siloed inventory items table. DROP-ing a table is faster than doing bulk deletes and negates the need to VACUUM the deleted tuples.
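A quick sanity check of that behavior from the console (table_exists? is standard schema introspection):

account = Account.create!(name: "doomed")
ActiveRecord::Base.connection.table_exists?("inventory_items_account_#{account.id}")
#=> true

account.destroy!
ActiveRecord::Base.connection.table_exists?("inventory_items_account_#{account.id}")
#=> false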
With your models and tables set up, you can now interact with your ActiveRecord objects in the normal ways you’d expect:
account = Account.create!(name: "account")
item = account.inventory_items.create!(
  sku: "1234",
  name: "dummies guide to rails",
  price_cents: 999, # required by the NOT NULL constraint
)
account.inventory_items.find_by(sku: "1234").name
#=> "dummies guide to rails"
An EXPLAIN call on your generated SQL will confirm your query plans, e.g.:
EXPLAIN SELECT "inventory_items".*
FROM "inventory_items"
WHERE "inventory_items"."account_id" = 1
AND "inventory_items"."sku" = "1234"
------------------------------------------
-- Index Scan using inventory_items_account_1_pkey on inventory_items_account_1 inventory_items
-- Index Cond: ((account_id = '1'::bigint) AND ((sku)::text = '1234'::text))
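You can also get at this from the console without hand-writing SQL, since ActiveRecord relations respond to explain:

account.inventory_items.where(sku: "1234").explain
# Prints the EXPLAIN output for the generated query, which should
# show the same single-partition index scan as above.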
Nice, so that’s it? Almost… Let’s rewind to the create_table definition in our migration and notice the use of options:. That string is appended to the CREATE TABLE statement PostgreSQL runs during the migration, but it isn’t persisted to db/schema.rb when the schema is dumped. This means that if you load db/schema.rb in your test environment (rather than re-running migrations), your partitioned tables won’t be created correctly.
As of this commit, Rails supports dumping list partitioning options to db/schema.rb. On older versions, you’ll need to inject support for it yourself:
# needed for Rails < 7.2.2 or main branch prior to this commit:
# https://github.com/rails/rails/commit/d4df3d5f81344ed07187eaced049f1c59d624c34
module SchemaStatementsPartitionAwareness
  def table_options(table_name)
    options = {}

    if (comment = table_comment(table_name))
      options[:comment] = comment
    end

    if (partition_definition = table_partition_definition(table_name))
      options[:options] = "PARTITION BY #{partition_definition}"
    end

    options
  end

  def inherited_table?(table_name)
    scope = quoted_scope(table_name, type: "BASE TABLE")

    !!query_value(<<~SQL.squish, "SCHEMA")
      SELECT inhparent::pg_catalog.regclass
      FROM pg_catalog.pg_inherits p
      LEFT JOIN pg_catalog.pg_class c ON p.inhrelid = c.oid
      LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = #{scope[:name]}
      AND c.relkind IN (#{scope[:type]})
      AND n.nspname = #{scope[:schema]}
    SQL
  end

  def table_partition_definition(table_name)
    scope = quoted_scope(table_name, type: "BASE TABLE")

    query_value(<<~SQL.squish, "SCHEMA")
      SELECT pg_catalog.pg_get_partkeydef(c.oid)
      FROM pg_catalog.pg_class c
      LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = #{scope[:name]}
      AND c.relkind IN (#{scope[:type]})
      AND n.nspname = #{scope[:schema]}
    SQL
  end
end

ActiveSupport.on_load(:active_record_postgresqladapter) do
  ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsPartitionAwareness)
end
You’ll probably also want to avoid dumping child partitions to your schema:
module SchemaDumperExcludeChildPartitions
  def ignored?(table_name)
    super || @connection.inherited_table?(table_name)
  end
end

ActiveSupport.on_load(:active_record_postgresqladapter) do
  ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaDumper.prepend(SchemaDumperExcludeChildPartitions)
end
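Both patches can live together in a single initializer, e.g. a file like config/initializers/partition_schema_support.rb (the name is arbitrary), so they’re applied before any schema dump runs.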
Alternatively, a simpler way around this is to just set your schema format to :sql in config/application.rb and call it a day:
config.active_record.schema_format = :sql
As with any clever solution that may veer from standard conventions, there are trade-offs to consider.
Native list partitioning planning time is pretty abysmal when it comes to a large number of partitions in older versions of PostgreSQL. On version 11.x, I encountered planning times above 10 seconds with 10,000 partitions, even when specifying the partition key. Upon upgrading to 15.5, that planning time dropped to under 1 ms. If you plan on having a large number of partitions, I’d recommend keeping your PG version up to date.
Query performance can be dramatically improved in certain situations, but if you aren’t careful you can still footgun yourself royally. For example, if we had 10,000 accounts and searched InventoryItem.where(sku: "1234"), that single query would have to scan all 10,000 partitions and merge the results. List partitioning is not a good choice if you need to search across partitions without knowing which partition keys you need, and your ActiveRecord queries should be written accordingly.
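In practice, that means always scoping by the partition key so the planner can prune. A quick contrast:

# Prunes to a single partition: the partition key is in the WHERE clause.
account.inventory_items.where(sku: "1234")
InventoryItem.where(account_id: account.id, sku: "1234")

# Scans every partition: no partition key supplied.
InventoryItem.where(sku: "1234")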
There is mental overhead required to understand how list partitioning works, which may become friction if you share the codebase with a larger team that isn’t as database savvy, or is less comfortable with “hacks” like monkey patches in initializer files to support db/schema.rb dumping. It may also be fair to consider list partitioning a premature optimization if you don’t expect massive amounts of data and the other potential benefits aren’t relevant.
Disclaimers aside, I’ve been running such a list partitioning system at scale in production for a couple of weeks now and so far, so good!