There inevitably comes a time while designing a data schema when it’s desirable to have an entity
that can belong to one of a number of different types of entities. A Comment
could belong to a
Post
or it could belong to a Page
- therefore the Page
and Post
entities both share the
characteristic of being commentable
. This concept is known as polymorphism and becomes a
consideration in relational database design.
In its purest form, a relationship in a database consists of a single foreign key on one table that points to the primary key of another table. In many cases, this association is not optional and/or you should not be able to delete a record without addressing it’s foreign references.
CREATE TABLE users (
id bigint PRIMARY KEY NOT NULL,
name varchar NOT NULL
);
CREATE TABLE posts (
id bigint PRIMARY KEY NOT NULL,
user_id bigint REFERENCES users NOT NULL,
content text NOT NULL
);
The examples in this post will be using PostgreSQL
specifically but should reflect similar syntax
and capabilities of the larger SQL
ecosystem.
The REFERENCES
clause above represents a foreign key constraint. It prevents your application’s
business logic from infringing upon the data’s referential integrity. If one tries to delete a
User
without specifying what to do with its existing Posts
, the database will rightfully error.
One of the common patterns for polymorphism, such as the one in Ruby on Rails’s ActiveRecord
ORM, is to use an additional column in the foreign identification strategy.
class Post < ApplicationRecord
has_many :comments, as: :commentable
end
class Page < ApplicationRecord
has_many :comments, as: :commentable
end
class Comment < ApplicationRecord
belongs_to :commentable, polymorphic: true
end
CREATE TABLE comments (
id bigint PRIMARY KEY NOT NULL,
commentable_id bigint NOT NULL,
commentable_type varchar NOT NULL,
content text NOT NULL
);
In this case commentable_type
is a string ("Post"
or "Page"
) that refers to the application’s
definition of the class that knows how to identify the foreign table. Ruby on Rails is a beloved and
battle-tested framework, so this pattern has surely stood the test of time for many, yet what might
a critique of this pattern be?
If you want to change the name of the ActiveRecord
class used for identifying the foreign table,
you must also update the database strings that represent it. Likewise, your database doesn’t know
what a "Post"
or a "Page"
are. Because of this, you also aren’t able to add a foreign key
constraint as you can’t uniquely identify the primary key of another table without first turning
commentable_type
into a class within your application. Referential integrity can’t be maintained
and you could end up with invalid data that shouldn’t be allowed to exist.
So what other options are there? One decent one is to have a join table for each relationship.
CREATE TABLE comments (
id bigint PRIMARY KEY NOT NULL,
content text NOT NULL
)
CREATE TABLE comments_posts (
post_id bigint REFERENCES posts NOT NULL,
comment_id bigint REFERENCES comments NOT NULL
)
CREATE TABLE comments_pages (
page_id bigint REFERENCES pages NOT NULL,
comment_id bigint REFERENCES comments NOT NULL
)
CREATE UNIQUE INDEX idx_comments_posts_on_comment_id
ON comments_posts USING btree (comment_id);
CREATE UNIQUE INDEX idx_comments_pages_on_comment_id
ON comments_pages USING btree (comment_id);
What downsides might there be to this approach? Firstly, you can create a Comment
that doesn’t
belong to anything since there is no foreign constraint on the table itself. The above example is
capable of expressing exclusivity of a relationship via the unique index constraints, but only for
each individual join table. Therefore nothing could prevent the database from having a Comment
that belongs to both a Post
and a Page
.
There is another pattern known as an exclusive arc. It is when the polymorphic entity’s schema contains all possible foreign keys.
CREATE TABLE comments (
id bigint PRIMARY KEY NOT NULL,
content text NOT NULL,
page_id bigint REFERENCES pages,
post_id bigint REFERENCES posts
)
If we left it there, this would also be inadequate, or some might even say…
evil?
What would prevent a Comment
from having both a post_id
and a page_id
? Or neither?
Luckily, SQL databases support another type of constraint called a check constraint, which allows specification that a certain condition must be true for the row to be valid. This can be used to ensure that only one of the foreign keys is set per row (or at most one, in the case of an optional relationship).
CREATE TABLE comments (
id bigint PRIMARY KEY NOT NULL,
content text NOT NULL,
page_id bigint REFERENCES comments,
post_id bigint REFERENCES posts,
CONSTRAINT commentable CHECK ((
CASE WHEN (page_id IS NULL) THEN 0 ELSE 1 END +
CASE WHEN (post_id IS NULL) THEN 0 ELSE 1 END) = 1)
)
And with that, you get the benefit of both foreign key and check constraints without needing additional join tables or application logic stored as strings. Referential integrity gods be pleased.
So how would one implement this in Ruby on Rails? The schema above actually maps cleanly onto a
set of optional belongs_to
relationships, and a validation could be added to mirror the check
constraint’s logic. Extra "commentable"
getter and setter methods could be dynamically handled
methods on the Comment
model itself. An incomplete, un-optimized, mostly illustrative example
would look something like:
class Post < ApplicationRecord
has_many :comments, dependent: :destroy
end
class Page < ApplicationRecord
has_many :comments, dependent: :destroy
end
class Comment < ApplicationRecord
belongs_to :post, optional: true
belongs_to :page, optional: true
validate :post_or_page_present?
def commentable
post || page
end
def commentable=(post_or_page)
case post_or_page
when Post
self.post = post_or_page
self.page = nil
when Page
self.page = post_or_page
self.post = nil
end
end
private
def post_or_page_present?
return unless [post, page].compact.size == 1
errors.add(:base, :commentable)
end
end
If your spidey sense is tingling, you might notice the boilerplate here can be abstracted away with
the (arguably evil) power of metaprograamming. I acted upon this and created the
activerecord-exclusive-arc
gem.
In a nutshell, it provides a Rails generator command that will generate the necessary migration and
inject your ActiveRecord
model with the exclusive arc’s module and registration.
$ bin/rails g exclusive_arc Comment commentable post page
create db/migrate/20230617204515_comment_commentable_exclusive_arc_post_page.rb
insert app/models/comment.rb
class CommentCommentableExclusiveArcPostPage < ActiveRecord::Migration[7.0]
def change
add_reference :comments, :post, foreign_key: true, index: {where: "post_id IS NOT NULL"}
add_reference :comments, :page, foreign_key: true, index: {where: "page_id IS NOT NULL"}
add_check_constraint(
:comments,
"(CASE WHEN post_id IS NULL THEN 0 ELSE 1 END + CASE WHEN page_id IS NULL THEN 0 ELSE 1 END) = 1",
name: "commentable"
)
end
end
class Comment < ApplicationRecord
include ExclusiveArc::Model
has_exclusive_arc :commentable, [:post, :page]
end
You can find out more information in the gem’s README.
Ultimately, I’d claim that the exclusive arc is not evil - rather, it is a benevolent polymorphism pattern that seeks to maintain your data’s integrity if you leverage the power of your database’s constraints. And with the right tooling, they can be ergonomically pleasant to work with as well.