Way Mondo

PostgreSQL Native List Partitioning in Rails

A Pattern for ActiveRecord

February 18, 2024

Dealing with large amounts of data in production comes with a unique set of challenges. As table sizes grows from millions towards billions of rows, query performance will degrade even for the most finely tuned indexes. If you are not careful, a popular endpoint of your application that struggles with blocking I/O calls to your database can tie up your server’s resources and cause widespread grief among your users.

Consider an enterprise-y application where you allow customers to create storefronts. The most used and important tables will contain an account_id. Envision that each online store offers 10,000 unique SKUs on average. If all the inventory items end up in a single table and you have 10,000 customers, there would be 100 million rows in it. Now, shudder at the thought of shipping an inefficient query into production that ends up doing sequential scans over that entire massive table during some common business logic.

Something you could do in this scenario is list partition the table by account_id. Here’s a simplified schema:

CREATE TABLE inventory_items (
  account_id bigint NOT NULL,
  name varchar NOT NULL,
  sku varchar NOT NULL,
  price_cents int NOT NULL
) PARTITION BY LIST (account_id);

The PARTITION BY LIST (account_id) clause defines the partition key. If you try to INSERT a row into inventory_items, it will expect a partitioned table to exist for the supplied account_id. For example, in order to insert an inventory item with an account_id of 1, you would need to create one like so:

CREATE TABLE IF NOT EXISTS inventory_items_account_1
  PARTITION OF inventory_items FOR VALUES IN (1);

This means each account’s inventory items will end up in their own table, which results in smaller, isolated tables. Indexes defined on the parent table are created independently for each partition as well.

If you need a unique index for sku by account_id, instead of creating a composite UNIQUE INDEX on a gargantuan single table, you could instead model it as a primary key constraint:

ALTER TABLE ONLY inventory_items
  ADD CONSTRAINT inventory_items_pkey PRIMARY KEY (account_id, sku);

If needed, each individual partition can also have their own indexes, constraints and default values, distinct from others.

This may seem fairly novel and interesting, but may not feel compelling if it’s not easy to incorporate into your web framework of choice. So then, how could one implement this in Ruby on Rails?

First, let’s create the schema for our partitioned inventory_items table:

class CreateInventroyItems < ActiveRecord::Migration[7.0]
  def change
    create_table(
      :inventory_items,
      primary_key: [:account_id, :sku],
      options: "PARTITION BY LIST (account_id)",
    ) do |t|
      t.references :account, null: false
      t.string :name, null: false
      t.string :sku, null: false
      t.integer :price_cents, null: false
    end
  end
end

Note the composite primary key definition - this is the bit that will build the primary key constraint. Support for this landed in Rails in 7.1 and in prior releases, a 3rd party gem can be used.

In lieu of using composite primary keys, you could forego a primary key entirely:

create_table(
  :analytic_events,
  id: false,
  options: "PARTITION BY LIST (account_id)",
)

This could make sense in scenarios where you are just INSERT-ing with no plans to ever SELECT or run operations on specific rows by primary key.

Ultimately, Rails’ default behavior of specifying a SEQUENCE to control the generation of the parent table’s id column is not enough to ensure primary key uniqueness across partitioned tables. So if you need a primary key, it must include the partition key as part of it.

Let’s get back to our example. Now that we have our partitioned table, we need to add some logic to Account that will know how to create our list partitions when those records are created and destroyed. Here’s a model concern for that:

module ListPartitionable
  extend ActiveSupport::Concern

  class_methods do
    def manage_list_partition_tables_for(*table_names)
      self.list_partition_tables = table_names
    end
  end

  included do
    class_attribute :list_partition_tables, default: []

    after_create :create_list_partition_tables
    after_destroy :drop_list_partition_tables

    private def partition_table_name_for(table_name)
      "#{table_name}_#{self.class.name.foreign_key.sub(/_id$/, "")}_#{id}"
    end

    private def create_list_partition_tables
      list_partition_tables.each do |table_name|
        self.class.connection.execute(
          <<~SQL.squish,
            CREATE TABLE IF NOT EXISTS #{partition_table_name_for(table_name)}
              PARTITION OF #{table_name} FOR VALUES IN (#{id});
          SQL
        )
      end
    end

    private def drop_list_partition_tables
      list_partition_tables.each do |table_name|
        self.class.connection.execute(
          <<~SQL.squish,
            DROP TABLE IF EXISTS #{partition_table_name_for(table_name)};
          SQL
        )
      end
    end
  end
end

A simplified pair of models for said example would look like this:

class Account < ApplicationRecord
  include ListPartitionable
  has_many :inventory_items, dependent: nil
  manage_list_partition_tables_for :inventory_items
end

class InventoryItem < ApplicationRecord
  belongs_to :account
end

If you need to seed the partitioned tables for any preexisting accounts, you can run Account.find_each(&:create_list_partition_tables). Going forward, account creation and destruction will cascade into partition table management through ActiveRecord hooks.

Another perk to point out is that there is no need to specify a dependent: option since an Account’s destruction drops its siloed inventory items table. DROP-ing a table is faster than doing bulk deletes and negates the need to VACUUM deleted tuples.

With your models and tables set up, you can now interact with your ActiveRecord objects in the normal ways you’d expect:

account = Account.create!(name: "account")
item = account.inventory_items.create!(
  sku: "1234",
  name: "dummies guide to rails",
)
account.inventory_items.find_by(sku: "1234").name
#=> "dummies guide to rails"

An EXPLAIN call on your generated SQL will confirm your query plans, e.g.:

EXPLAIN SELECT "inventory_items".*
  FROM "inventory_items"
  WHERE "inventory_items"."account_id" = 1
  AND "inventory_items"."sku" = "1234"
------------------------------------------
-- Index Scan using inventory_items_account_1_pkey on inventory_items_account_1 inventory_items
--   Index Cond: ((account_id = '1'::bigint) AND ((sku)::text = '1234'::text))

Nice, so that’s it? Almost… Let’s rewind to the create_table definition in our migration, and notice the use of options:. This is appended to the CREATE TABLE PostgeSQL statement when run as a migration, but those options aren’t persisted to db/schema.rb when the schema is dumped. This means that if you load db/schema.rb in your test environment (rather than re-running migrations), your partitioned tables won’t be able to be created correctly.

I have a pull request open on Rails to support this, but for now you can add an initializer with a monkey patch:

module SchemaDumperExcludeChildPartitions
  def ignored?(table_name)
    super || @connection.inherited_table?(table_name)
  end
end

module SchemaStatementsPartitionAwareness
  # TODO: this can be removed if this PR makes it into Rails
  # https://github.com/rails/rails/pull/50475
  def table_options(table_name)
    options = {}
    if (comment = table_comment(table_name))
      options[:comment] = comment
    end
    if (partition_definition = table_partition_definition(table_name))
      options[:options] = "PARTITION BY #{partition_definition}"
    end
    options
  end

  def inherited_table?(table_name)
    scope = quoted_scope(table_name, type: "BASE TABLE")

    !!query_value(<<~SQL.squish, "SCHEMA")
      SELECT inhparent::pg_catalog.regclass
      FROM pg_catalog.pg_inherits p
        LEFT JOIN pg_catalog.pg_class c ON p.inhrelid = c.oid
        LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = #{scope[:name]}
        AND c.relkind IN (#{scope[:type]})
        AND n.nspname = #{scope[:schema]}
    SQL
  end

  def table_partition_definition(table_name)
    scope = quoted_scope(table_name, type: "BASE TABLE")

    query_value(<<~SQL.squish, "SCHEMA")
      SELECT pg_catalog.pg_get_partkeydef(c.oid)
      FROM pg_catalog.pg_class c
        LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = #{scope[:name]}
        AND c.relkind IN (#{scope[:type]})
        AND n.nspname = #{scope[:schema]}
    SQL
  end
end

ActiveSupport.on_load(:active_record_postgresqladapter) do
  ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaDumper.prepend(SchemaDumperExcludeChildPartitions)
  ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsPartitionAwareness)
end

Alternatively, a simpler way around this is to just set your schema format to :sql in config/application.rb and call it a day:

config.active_record.schema_format = :sql

As with any clever solution that may veer from standard conventions, there are trade-offs to consider.

  • Native list partitioning planning time is pretty abysmal when it comes to large number of partitions in older versions of PostgreSQL. In version 11.x, I was encountered planning times above 10 seconds with 10,000 partitions, even when specifying the partition key. Upon upgrading to 15.5, that planning time dropped to under 1ms. If you plan on having a large number of partitions, I’d recommend keeping your PG version up to date.

  • Query performance can be dramatically improved in certain situations, but if you aren’t careful you can still footgun yourself royally. For example, If we had 10,000 accounts and searched InventoryItem.where(sku: "1234") that would result in 10,000 separate queries against each partition to be merged together. List partitioning is not a good choice if you need to search across partitions without knowing which partition keys you need, and your ActiveRecord queries should be written accordingly.

  • There is mental overhead required for understanding how list partitioning works that may become friction if sharing the codebase with a larger team that may not be as database savvy, or are less comfortable with “hacks” like monkey patches in initializer files to support db/schema.rb dumping. It may also be fair to consider list partitioning as a premature optimization if you don’t expect massive amounts of data and the other potential benefits aren’t relevant.

Disclaimers aside, I’ve been running such a list partitioning system at scale in production for a couple weeks now and so far, so good!

Way Mondo is me, Justin Talbott
/ waymondo