Are Exclusive Arcs Evil?

There inevitably comes a time while designing a data schema when it’s desirable to have an entity that can belong to one of a number of different types of entities. A Comment could belong to a Post or it could belong to a Page - therefore the Page and Post entities both share the characteristic of being commentable. This concept is known as polymorphism and becomes a consideration in relational database design.

In its purest form, a relationship in a database consists of a single foreign key on one table that points to the primary key of another table. In many cases, this association is not optional and/or you should not be able to delete a record without addressing it’s foreign references.

CREATE TABLE users (
  id bigint PRIMARY KEY NOT NULL,
  name varchar NOT NULL
);

CREATE TABLE posts (
  id bigint PRIMARY KEY NOT NULL,
  user_id bigint REFERENCES users NOT NULL,
  content text NOT NULL
);

The examples in this post will be using PostgreSQL specifically but should reflect similar syntax and capabilities of the larger SQL ecosystem.

The REFERENCES clause above represents a foreign key constraint. It prevents your application’s business logic from infringing upon the data’s referential integrity. If one tries to delete a User without specifying what to do with its existing Posts, the database will rightfully error.

One of the common patterns for polymorphism, such as the one in Ruby on Rails’s ActiveRecord ORM, is to use an additional column in the foreign identification strategy.

class Post < ApplicationRecord
  has_many :comments, as: :commentable
end

class Page < ApplicationRecord
  has_many :comments, as: :commentable
end

class Comment < ApplicationRecord
  belongs_to :commentable, polymorphic: true
end

CREATE TABLE comments (
  id bigint PRIMARY KEY NOT NULL,
  commentable_id bigint NOT NULL,
  commentable_type varchar NOT NULL,
  content text NOT NULL
);

In this case commentable_type is a string ("Post" or "Page") that refers to the application’s definition of the class that knows how to identify the foreign table. Ruby on Rails is a beloved and battle-tested framework, so this pattern has surely stood the test of time for many, yet what might a critique of this pattern be?

If you want to change the name of the ActiveRecord class used for identifying the foreign table, you must also update the database strings that represent it. Likewise, your database doesn’t know what a "Post" or a "Page" are. Because of this, you also aren’t able to add a foreign key constraint as you can’t uniquely identify the primary key of another table without first turning commentable_type into a class within your application. Referential integrity can’t be maintained and you could end up with invalid data that shouldn’t be allowed to exist.

So what other options are there? One decent one is to have a join table for each relationship.

CREATE TABLE comments (
  id bigint PRIMARY KEY NOT NULL,
  content text NOT NULL
)

CREATE TABLE comments_posts (
  post_id bigint REFERENCES posts NOT NULL,
  comment_id bigint REFERENCES comments NOT NULL
)

CREATE TABLE comments_pages (
  page_id bigint REFERENCES pages NOT NULL,
  comment_id bigint REFERENCES comments NOT NULL
)

CREATE UNIQUE INDEX idx_comments_posts_on_comment_id
  ON comments_posts USING btree (comment_id);

CREATE UNIQUE INDEX idx_comments_pages_on_comment_id
  ON comments_pages USING btree (comment_id);

What downsides might there be to this approach? Firstly, you can create a Comment that doesn’t belong to anything since there is no foreign constraint on the table itself. The above example is capable of expressing exclusivity of a relationship via the unique index constraints, but only for each individual join table. Therefore nothing could prevent the database from having a Comment that belongs to both a Post and a Page.

There is another pattern known as an exclusive arc. It is when the polymorphic entity’s schema contains all possible foreign keys.

CREATE TABLE comments (
  id bigint PRIMARY KEY NOT NULL,
  content text NOT NULL,
  page_id bigint REFERENCES pages,
  post_id bigint REFERENCES posts
)

If we left it there, this would also be inadequate, or some might even say… evil? What would prevent a Comment from having both a post_id and a page_id? Or neither?

Luckily, SQL databases support another type of constraint called a check constraint, which allows specification that a certain condition must be true for the row to be valid. This can be used to ensure that only one of the foreign keys is set per row (or at most one, in the case of an optional relationship).

CREATE TABLE comments (
  id bigint PRIMARY KEY NOT NULL,
  content text NOT NULL,
  page_id bigint REFERENCES comments,
  post_id bigint REFERENCES posts, 
  CONSTRAINT commentable CHECK ((
   CASE WHEN (page_id IS NULL) THEN 0 ELSE 1 END +
   CASE WHEN (post_id IS NULL) THEN 0 ELSE 1 END) = 1)
)

And with that, you get the benefit of both foreign key and check constraints without needing additional join tables or application logic stored as strings. Referential integrity gods be pleased.

So how would one implement this in Ruby on Rails? The schema above actually maps cleanly onto a set of optional belongs_to relationships, and a validation could be added to mirror the check constraint’s logic. Extra "commentable" getter and setter methods could be dynamically handled methods on the Comment model itself. An incomplete, un-optimized, mostly illustrative example would look something like:

class Post < ApplicationRecord
  has_many :comments, dependent: :destroy
end

class Page < ApplicationRecord
  has_many :comments, dependent: :destroy
end

class Comment < ApplicationRecord
  belongs_to :post, optional: true
  belongs_to :page, optional: true
  validate :post_or_page_present?

  def commentable
    post || page
  end

  def commentable=(post_or_page)
    case post_or_page
    when Post
      self.post = post_or_page
      self.page = nil
    when Page
      self.page = post_or_page
      self.post = nil
    end
  end

  private

  def post_or_page_present?
    return unless [post, page].compact.size == 1

    errors.add(:base, :commentable) 
  end
end

If your spidey sense is tingling, you might notice the boilerplate here can be abstracted away with the (arguably evil) power of metaprograamming. I acted upon this and created the activerecord-exclusive-arc gem.

In a nutshell, it provides a Rails generator command that will generate the necessary migration and inject your ActiveRecord model with the exclusive arc’s module and registration.

$ bin/rails g exclusive_arc Comment commentable post page
   create  db/migrate/20230617204515_comment_commentable_exclusive_arc_post_page.rb
   insert  app/models/comment.rb

class CommentCommentableExclusiveArcPostPage < ActiveRecord::Migration[7.0]
  def change
    add_reference :comments, :post, foreign_key: true, index: {where: "post_id IS NOT NULL"}
    add_reference :comments, :page, foreign_key: true, index: {where: "page_id IS NOT NULL"}
    add_check_constraint(
      :comments,
      "(CASE WHEN post_id IS NULL THEN 0 ELSE 1 END + CASE WHEN page_id IS NULL THEN 0 ELSE 1 END) = 1",
      name: "commentable"
    )
  end
end

class Comment < ApplicationRecord
  include ExclusiveArc::Model
  has_exclusive_arc :commentable, [:post, :page]
end

You can find out more information in the gem’s README.

Ultimately, I’d claim that the exclusive arc is not evil - rather, it is a benevolent polymorphism pattern that seeks to maintain your data’s integrity if you leverage the power of your database’s constraints. And with the right tooling, they can be ergonomically pleasant to work with as well.

Way Mondo

Are Exclusive Arcs Evil?

Or: What Are They and How to Implement One in Ruby on Rails