Introducing: The Repository Pattern
Note: This is part of the Production vs Tutorial Code Series.
When I first started writing Rails code, I loved ActiveRecord, the Ruby implementation of the Active Record data access pattern.
It felt so natural as an object oriented engineer to interact with “smart” objects that could talk to the database on my behalf.
I even went so far as to implement an Active Record style SDK framework for working with our simplistic C.R.U.D. APIs.
# Models
class Book < ApplicationRecord
  has_and_belongs_to_many :authors
  has_many :book_reviews
  has_many :reviews, through: :book_reviews
  has_many :top_reviews, -> { best_score.limit(3) }, through: :book_reviews, source: :review
  scope :aggregate, -> { select('books.*').group('books.*') } # there may be a better way to do this
end

class Author < ApplicationRecord
  has_and_belongs_to_many :books
end

# Join model between books and reviews
class BookReview < ApplicationRecord
  belongs_to :book
  belongs_to :review
end

class Review < ApplicationRecord
  belongs_to :user
  scope :best_score, -> { order(score: :desc) }
  scope :best_average_score, -> { order('AVG(reviews.score) DESC') }
end
# Non-trivial usage
author = Author.find(params[:id])
books_for_author_sorted_by_review =
  author
    .books
    .aggregate # to return one row per book
    .joins(:book_reviews).joins(:reviews).merge(Review.best_average_score) # to sort by average review
    .includes(:authors) # to avoid N+1 queries when I list the co-authors of the books
    .includes(:top_reviews) # to avoid N+1 queries when I populate a carousel of reviews
# ... etc
But as time has gone on, the novelty has worn thin. I don’t love the hidden complexity of ActiveRecord where you can accidentally trigger DB queries (especially N+1 style in lists). I don’t love writing chained query statements in my controllers and business logic. I definitely don’t love maintaining my own Active Record library as my APIs become more and more complex, and less and less CRUD-y over time.
Enter the Repository (aka Facade) Design Pattern.
A repository or facade is just a layer in an application that hides complexity.
# NOTE: all the same models are defined as above,
# we still are leveraging ActiveRecord under the hood.
module Books
  class << self
    def find_all_by_author(author, **other_params)
      # Book has_and_belongs_to_many :authors, so scope through the association
      find_all(base_query: author.books, **other_params)
    end

    private

    def find_all(base_query: Book.all, order_by: :published_at, joins: [], includes: [])
      need_to_aggregate = false
      sorted_query =
        case order_by
        when :best_average_review
          need_to_aggregate = true
          joins += %i[book_reviews reviews] # builds a new array, the caller's is left untouched
          base_query.merge(Review.best_average_score)
        else
          base_query.order(order_by)
        end
      query_with_joins =
        if joins.present?
          joins.uniq.reduce(sorted_query) do |query, relationship|
            query.joins(relationship)
          end
        else
          sorted_query
        end
      query_with_aggregation =
        if need_to_aggregate
          query_with_joins.aggregate
        else
          query_with_joins
        end
      query_with_aggregation.includes(includes)
    end
  end
end
# Usage
author = Author.find(params[:id])
books_for_author_sorted_by_review =
  Books.find_all_by_author(
    author,
    order_by: :best_average_review,
    includes: %i[authors top_reviews]
  )
This is more work ahead of time because you have to consider various actual use cases and plan for what arguments you might need.
However, the sheer readability, clarity, and simplicity of the calling code more than makes up for the implementation complexity.
Committing to the Bit
The problem with the above example, of course, is that we still get “smart” objects back, which could accidentally trigger more queries despite our best efforts.
The good news is that since Rails 6.1, you can replace base_query: Book.all with base_query: Book.strict_loading.
This raises an error whenever code tries to lazy-load an association that was not preloaded via includes: [].
Another approach would be to abandon active_record in favor of the rom (Ruby Object Mapper) gem, which implements the Data Mapper pattern in Ruby.
This is a whole different approach to interacting with your database, where you separate the definition of the queries from the definition of the (static) data returned.
SDKs
My general approach to SDKs is now a repository pattern, with request methods often implemented by the Method Object approach.
These then return dumb data objects for consumers of the SDK to interact with.
In some SDKs we also wrap request responses in Result Monads to indicate success/failure rather than raising an exception on a bad API response.
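To make that shape concrete, here is a minimal sketch in plain Ruby. Everything in it is hypothetical and made up for illustration: the ReviewsSdk module, the FetchReview method object, the Review and Result structs, and the injected http_get callable that stands in for a real HTTP client.

```ruby
# Hypothetical SDK sketch: repository + method object + result wrapper.
module ReviewsSdk
  # Dumb data object: plain values, no way to reach the API from here.
  Review = Struct.new(:id, :score, :body, keyword_init: true)

  # Minimal result wrapper; a real project might use a gem for this instead.
  Result = Struct.new(:value, :error, keyword_init: true) do
    def success?
      error.nil?
    end
  end

  # Method object: one class per request, one public entry point.
  class FetchReview
    def self.call(id, http_get:)
      new(id, http_get).call
    end

    def initialize(id, http_get)
      @id = id
      @http_get = http_get
    end

    def call
      status, payload = @http_get.call("/reviews/#{@id}")
      return Result.new(error: "unexpected status #{status}") unless status == 200

      Result.new(value: Review.new(id: payload[:id], score: payload[:score], body: payload[:body]))
    end
  end

  # Repository entry point that consumers of the SDK call.
  def self.find_review(id, http_get:)
    FetchReview.call(id, http_get: http_get)
  end
end

# Usage with a stubbed transport instead of a real network call.
fake_http = lambda do |path|
  path == "/reviews/1" ? [200, { id: 1, score: 5, body: "Great" }] : [404, nil]
end

found = ReviewsSdk.find_review(1, http_get: fake_http)
missing = ReviewsSdk.find_review(2, http_get: fake_http)
```

The caller checks found.success? and reads found.value, and a bad response shows up as missing.error rather than as an exception to rescue.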
Adopting the Repository Pattern
The great news about the repository pattern is that it is very easy to adopt piecemeal.
- Take any DB query or SDK call you make in multiple places and put it behind a method on a new module.
- Update all usages of this call
- Find a similar query and create a new method on the repository
- See if you can extract some common code between the two
- See if the resulting code is so similar that it could be handled with different argument values rather than different named methods.
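The steps above can be sketched roughly as follows. To keep the example runnable without Rails, it uses a hypothetical in-memory BookRepo with made-up BookRow data in place of real queries; in an actual app the bodies would be ActiveRecord or SDK calls.

```ruby
# Hypothetical in-memory stand-in for a table, so the sketch runs without Rails.
BookRow = Struct.new(:title, :author, :published_at, :average_score, keyword_init: true)

module BookRepo
  ROWS = [
    BookRow.new(title: "A", author: "Ann", published_at: 2001, average_score: 3.5),
    BookRow.new(title: "B", author: "Ann", published_at: 1999, average_score: 4.8),
    BookRow.new(title: "C", author: "Bob", published_at: 2010, average_score: 4.1)
  ].freeze

  class << self
    # Early steps: each query that was repeated around the codebase
    # gets its own named method on the repository.
    def newest_by_author(author)
      find_all(author: author, order_by: :published_at)
    end

    def best_reviewed_by_author(author)
      find_all(author: author, order_by: :average_score)
    end

    # Later steps: the code the two share becomes one method whose
    # behavior is selected by argument values.
    def find_all(author: nil, order_by: :published_at)
      rows = author ? ROWS.select { |row| row.author == author } : ROWS
      rows.sort_by { |row| row.public_send(order_by) }.reverse
    end
  end
end

newest = BookRepo.newest_by_author("Ann").map(&:title)
best = BookRepo.best_reviewed_by_author("Ann").map(&:title)
```

Whether the named methods stay as thin wrappers or disappear in favor of find_all with arguments is a judgment call; keeping them costs little and keeps call sites readable.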
Also: always resist the urge to use secret knowledge of the internals of a repository.
Take the opportunity to call strict_loading on the results for ActiveRecord, and then treat the return value as a dumb data object (both for ActiveRecord and SDKs).
Find that you need to load an associated record? Make that part of the repository method call (either through includes(:association) or a secondary SDK call), and inject the preloaded data into the return values before they leave the repository.
Consider wrapping your basic data object(s) in a ViewModel or Decorator instead of returning a bare ActiveRecord or similar class.
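One lightweight way to do that wrapping is Ruby's standard-library SimpleDelegator. The sketch below is only an illustration: BookData and BookCard are hypothetical names, and the struct stands in for whatever object your repository actually returns.

```ruby
require "delegate"

# Hypothetical dumb data object a repository might return.
BookData = Struct.new(:title, :authors, :average_score, keyword_init: true)

# Decorator: adds presentation helpers and forwards everything else
# to the wrapped object.
class BookCard < SimpleDelegator
  def byline
    "by #{authors.join(' and ')}"
  end

  def stars
    "#{average_score.round(1)} / 5"
  end
end

card = BookCard.new(
  BookData.new(title: "Example Book", authors: ["A. Author", "B. Writer"], average_score: 4.26)
)
```

Views can call card.title, card.byline, and card.stars without knowing whether the underlying data came from the database or an SDK.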
Adapting the Repository Pattern
No, that isn’t a typo.
The Repository pattern can also really shine when paired with the Adapter design pattern, especially when you have semi-homogeneous data stored in divergent subsystems.
Let’s say you have “content” data stored in a variety of places:
- a local DB
- a CMS accessible via API
- a custom API
module Content
  module Books # DB
    def self.fetch(id)
      Book.find_by(id: id)
    end
  end

  module Articles # CMS
    def self.fetch(id)
      MyArticlesCMS.get(id)
    end
  end

  module Reviews # API
    def self.fetch(id)
      MyReviewsApi.find(id)
    end
  end

  class << self
    def books
      Books
    end

    def articles
      Articles
    end

    def reviews
      Reviews
    end
  end
end
# Usage
book = Content.books.fetch(params[:book_id])
article = Content.articles.fetch(params[:article_id])
review = Content.reviews.fetch(params[:review_id])
If you wrap the data coming back in a common interface (say, title, description, etc.) you can end up mixing and matching search results in lists, which can be nice.
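As a rough sketch of what that common interface could look like: the ContentItem struct, the ContentAdapters module, and the field names of each source (blurb, headline, summary, and so on) are all assumptions invented for this example, with stubbed hashes standing in for the DB row and the two API payloads.

```ruby
# Hypothetical common shape for anything the Content repository returns.
ContentItem = Struct.new(:kind, :title, :description, keyword_init: true)

# Each adapter maps one source's field names onto the shared shape.
module ContentAdapters
  def self.from_book(row)
    ContentItem.new(kind: :book, title: row[:title], description: row[:blurb])
  end

  def self.from_article(payload)
    ContentItem.new(kind: :article, title: payload["headline"], description: payload["summary"])
  end

  def self.from_review(payload)
    ContentItem.new(kind: :review, title: "Review of #{payload[:book_title]}", description: payload[:body])
  end
end

# Stubbed source data standing in for the DB, the CMS, and the reviews API.
results = [
  ContentAdapters.from_book({ title: "Some Book", blurb: "A novel." }),
  ContentAdapters.from_article({ "headline" => "Some Article", "summary" => "A long read." }),
  ContentAdapters.from_review({ book_title: "Some Book", body: "Loved it." })
]

# Because every item answers to title and description, mixed results can
# be sorted and rendered together.
listing = results.sort_by(&:title).map { |item| "#{item.kind}: #{item.title}" }
```

In the Content module above, each fetch method would run its raw result through the matching adapter before returning it.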
Conclusion
The Repository pattern is a very powerful tool to clean up and organize your data access code, and you can start implementing it immediately in small steps.
Your code will be easier to read and reason about, and you will find hidden opportunities for code reuse and simplification through simple method refactoring.
Hopefully all the links in the above article will be valuable for future reading.