Introducing: The Repository Pattern

Note: This is part of the Production vs Tutorial Code Series.

When I first started writing Rails code, I loved ActiveRecord, Rails' implementation of the Active Record data access pattern. As an object-oriented engineer, it felt so natural to interact with “smart” objects that could talk to the database on my behalf. I even went so far as to implement an Active Record style SDK framework for working with our simple CRUD APIs.

# Models
class Book < ApplicationRecord
  has_and_belongs_to_many :authors
  has_many :book_reviews
  has_many :reviews, through: :book_reviews
  has_many :top_reviews, -> { best_score.limit(3) }, through: :book_reviews, source: :review
  
  scope :aggregate, -> { select('books.*').group('books.*') } # there may be a better way to do this
end

class Author < ApplicationRecord
  has_and_belongs_to_many :books
end

class BookReview < ApplicationRecord
  belongs_to :book
  belongs_to :review
end

class Review < ApplicationRecord
  belongs_to :user
  
  scope :best_score, -> { order(score: :desc) }
  scope :best_average_score, -> { order('AVG(reviews.score) DESC') }
end

# Non-trivial usage
author = Author.find(params[:id])
books_for_author_sorted_by_review =
  author
    .books
    .aggregate # to return one row per book
    .joins(:book_reviews).joins(:reviews).merge(Review.best_average_score) # to sort by average review
    .includes(:authors) # to avoid N+1 queries when I list the co-authors of the books
    .includes(:top_reviews) # to avoid N+1 queries when I populate a carousel of reviews  
# ... etc

But as time has gone on, the novelty has worn thin. I don’t love the hidden complexity of ActiveRecord where you can accidentally trigger DB queries (especially N+1 style in lists). I don’t love writing chained query statements in my controllers and business logic. I definitely don’t love maintaining my own Active Record library as my APIs become more and more complex, and less and less CRUD-y over time.

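Enter the Repository Design Pattern (a close cousin of the Facade pattern).

A repository is a layer in an application that hides the complexity of data access behind a small set of intention-revealing methods, the same way a facade hides a complicated subsystem behind a simple interface.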

# NOTE: all the same models are defined as above,
#  we are still leveraging ActiveRecord under the hood.
module Books
  class << self
    def find_all_by_author(author, **other_params)
      # Scope through the HABTM association (books have no `author` column)
      find_all(base_query: author.books, **other_params)
    end

    private

    def find_all(base_query: Book.all, order_by: :published_at, joins: [], includes: [])
      need_to_aggregate = false

      sorted_query =
        case order_by
        when :best_average_review
          need_to_aggregate = true
          # Build a new array instead of mutating one the caller passed in
          joins += %i[book_reviews reviews]
          base_query.merge(Review.best_average_score)
        else
          base_query.order(order_by)
        end

      # reduce over an empty list simply returns sorted_query unchanged
      query_with_joins =
        joins.uniq.reduce(sorted_query) do |query, relationship|
          query.joins(relationship)
        end

      query_with_aggregation =
        need_to_aggregate ? query_with_joins.aggregate : query_with_joins

      query_with_aggregation.includes(includes)
    end
  end
end

# Usage
author = Author.find(params[:id])
books_for_author_sorted_by_review = 
  Books.find_all_by_author(
    author, 
    order_by: :best_average_review, 
    includes: %i[authors top_reviews]
  )

This is more work up front, because you have to consider the actual use cases and plan which arguments you might need.

However, the sheer readability, clarity, and simplicity of the calling code more than make up for the implementation complexity.

Committing to the Bit

The problem with the example above, of course, is that we still get “smart” ActiveRecord objects back, which can accidentally trigger more queries despite our best efforts.

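The good news is that since Rails 6.1, you can replace base_query: Book.all with base_query: Book.strict_loading (or call .strict_loading on whichever base query a public method passes in). Lazily loading any association that wasn't preloaded through includes: then raises an ActiveRecord::StrictLoadingViolationError instead of silently firing another query.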

Another approach would be to abandon ActiveRecord in favor of the rom (Ruby Object Mapper) gem, which implements the Data Mapper pattern in Ruby.

This is a whole different approach to interacting with your database, where you separate the definition of the queries from the definition of the (static) data returned.
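To make that split concrete, here is a toy, plain-Ruby sketch of the Data Mapper idea. It is not rom's API: `BooksRelation`, `BookMapper`, and `BookData` are hypothetical names, and an in-memory array of hashes stands in for the database. Queries live on the relation, while callers only ever receive frozen value objects with no way to talk to the database.

```ruby
# Toy illustration of the Data Mapper split (hypothetical names, NOT rom's API).
# `rows` stands in for what a database adapter would return.

# The (static) data returned: a plain value object with no persistence methods.
BookData = Struct.new(:id, :title, :score, keyword_init: true)

# The query side: a relation that only knows how to narrow down rows.
class BooksRelation
  def initialize(rows)
    @rows = rows
  end

  def with_min_score(min)
    BooksRelation.new(@rows.select { |row| row[:score] >= min })
  end

  def to_a
    @rows
  end
end

# The mapper: turns raw rows into frozen BookData values,
# dropping any columns the data object doesn't declare.
module BookMapper
  def self.call(rows)
    rows.map { |row| BookData.new(**row.slice(:id, :title, :score)).freeze }
  end
end

rows = [
  { id: 1, title: "Book One", score: 4.5, internal_flag: true },
  { id: 2, title: "Book Two", score: 3.0, internal_flag: false }
]

books = BookMapper.call(BooksRelation.new(rows).with_min_score(4).to_a)
books.map(&:title)  # => ["Book One"]
books.first.frozen? # => true
```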

SDKs

My general approach to SDKs is now the repository pattern, with request methods often implemented as Method Objects. These return dumb data objects for consumers of the SDK to interact with. In some SDKs we also wrap responses in Result monads to indicate success or failure, rather than raising an exception on a bad API response.
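A minimal sketch of that SDK shape, assuming hypothetical names (`ReviewsSdk`, `FetchReview`, `ReviewData`, `SdkResult`) and a fake HTTP lambda in place of a real client: the repository exposes `find`, each request is a Method Object with a single `call`, and responses come back as a small Result struct wrapping a frozen data object.

```ruby
# Hypothetical names throughout; the HTTP layer is a lambda so the
# sketch runs without a network.

# Dumb data object handed to SDK consumers.
ReviewData = Struct.new(:id, :score, :body, keyword_init: true)

# Minimal Result type: either a value or an error, never an exception.
SdkResult = Struct.new(:value, :error, keyword_init: true) do
  def success?
    error.nil?
  end
end

# Method Object: one class per request, with a single public #call.
class FetchReview
  def initialize(http, id)
    @http = http
    @id = id
  end

  def call
    status, body = @http.call("/reviews/#{@id}")
    return SdkResult.new(error: "HTTP #{status}") unless status == 200

    review = ReviewData.new(id: body["id"], score: body["score"], body: body["body"]).freeze
    SdkResult.new(value: review)
  end
end

# Repository: the only entry point SDK consumers use.
class ReviewsSdk
  def initialize(http)
    @http = http
  end

  def find(id)
    FetchReview.new(@http, id).call
  end
end

fake_http = lambda do |path|
  if path == "/reviews/1"
    [200, { "id" => 1, "score" => 5, "body" => "Great read" }]
  else
    [404, nil]
  end
end

sdk = ReviewsSdk.new(fake_http)
sdk.find(1).value.score # => 5
sdk.find(2).error       # => "HTTP 404"
```

A real SDK might use a dedicated result library such as dry-monads instead of a hand-rolled struct; the struct just keeps the sketch dependency-free.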

Adopting the Repository Pattern

The great news about the repository pattern is that it is very easy to adopt piecemeal.

1. Take any DB query or SDK call you make in multiple places and put it behind a method on a new module.
2. Update all usages of that call.
3. Find a similar query and create a new method for it on the repository.
4. See if you can extract some common code between the two methods.
5. See if the resulting code is similar enough that it could be handled with different argument values rather than different named methods.
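The steps above might play out like this in a small, hypothetical example. `Posts` and the in-memory `POSTS` table are stand-ins for a real repository and database: two queries that used to be inlined at call sites become named methods, and their shared logic collapses into one private `find_all` that varies by arguments.

```ruby
# Hypothetical in-memory "table" standing in for the database.
POSTS = [
  { id: 1, title: "Intro", published: true,  author_id: 7 },
  { id: 2, title: "Draft", published: false, author_id: 7 },
  { id: 3, title: "Guide", published: true,  author_id: 9 }
].freeze

module Posts
  class << self
    # Queries that were duplicated across call sites now live
    # behind named repository methods...
    def published
      find_all(published: true)
    end

    def published_by_author(author_id)
      find_all(published: true, author_id: author_id)
    end

    private

    # ...and their shared logic becomes one method whose behavior varies
    # by argument values instead of by method name.
    def find_all(**filters)
      POSTS.select { |row| filters.all? { |key, value| row[key] == value } }
    end
  end
end

Posts.published.map { |post| post[:id] }              # => [1, 3]
Posts.published_by_author(7).map { |post| post[:id] } # => [1]
```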

Also: always resist the urge to rely on knowledge of a repository's internals. Take the opportunity to call strict_loading on ActiveRecord results, and then treat the return value as a dumb data object (for both ActiveRecord and SDKs).

Find that you need to load an associated record? Make that part of the repository method call (either through includes(:association) or a secondary SDK call), and inject the preloaded data into the return values before they leave the repository.
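Here is a minimal sketch of injecting preloaded data, assuming hypothetical `BOOK_ROWS` and `AUTHOR_ROWS` sources (in a real app, perhaps a DB query and a secondary SDK call). The repository fetches every author it needs in one batched lookup, then embeds them in frozen value objects, so callers can't trigger further loads.

```ruby
# Hypothetical sources: in a real app, BOOK_ROWS might come from a DB query
# and AUTHOR_ROWS from a secondary SDK call.
BOOK_ROWS = [
  { id: 1, title: "Book One", author_ids: [10, 11] },
  { id: 2, title: "Book Two", author_ids: [11] }
].freeze

AUTHOR_ROWS = {
  10 => { id: 10, name: "Ann" },
  11 => { id: 11, name: "Bo" }
}.freeze

AuthorView = Struct.new(:id, :name, keyword_init: true)
BookView = Struct.new(:id, :title, :authors, keyword_init: true)

module Catalog
  class << self
    # How many author fetches happened (for illustration only).
    attr_reader :author_lookups

    def books_with_authors
      # One batched author lookup for all books, instead of one per book (N+1).
      author_ids = BOOK_ROWS.flat_map { |row| row[:author_ids] }.uniq
      authors_by_id = fetch_authors(author_ids)

      # Inject the preloaded authors before the values leave the repository.
      BOOK_ROWS.map do |row|
        authors = row[:author_ids].map { |id| authors_by_id.fetch(id) }.freeze
        BookView.new(id: row[:id], title: row[:title], authors: authors).freeze
      end
    end

    private

    def fetch_authors(ids)
      @author_lookups = (@author_lookups || 0) + 1
      ids.to_h { |id| [id, AuthorView.new(**AUTHOR_ROWS.fetch(id)).freeze] }
    end
  end
end

books = Catalog.books_with_authors
books.map { |book| book.authors.map(&:name) } # => [["Ann", "Bo"], ["Bo"]]
Catalog.author_lookups                        # => 1
```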

Consider wrapping your basic data object(s) in a ViewModel or Decorator instead of returning a bare ActiveRecord (or similar) object.
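One way to do that in plain Ruby is a Decorator built on the standard library's `SimpleDelegator`. `BookRecord` and `BookPresenter` are hypothetical names, and the record is a simple Struct standing in for whatever your repository returns.

```ruby
require "date"
require "delegate"

# Hypothetical plain data object a repository might return.
BookRecord = Struct.new(:title, :published_on, :average_score, keyword_init: true)

# Decorator: unknown methods are forwarded to the wrapped record,
# while presentation helpers live here instead of on the record.
class BookPresenter < SimpleDelegator
  def display_title
    "#{title} (#{published_on.year})"
  end

  def rating_label
    average_score ? "%.1f / 5" % average_score : "No reviews yet"
  end
end

record = BookRecord.new(title: "Book One", published_on: Date.new(2020, 5, 1), average_score: 4.5)
book = BookPresenter.new(record)

book.title         # => "Book One" (delegated to the record)
book.display_title # => "Book One (2020)"
book.rating_label  # => "4.5 / 5"
```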

Adapting the Repository Pattern

No, that isn’t a typo. The Repository pattern can also really shine when paired with the Adapter design pattern, especially when you have semi-homogeneous data stored in divergent subsystems.

Let’s say you have “content” data stored in a variety of places:

- a local DB
- a CMS accessible via API
- a custom API

module Content
  module Books # DB
    def self.fetch(id)
      Book.find_by(id: id)
    end
  end
  
  module Articles # CMS
    def self.fetch(id)
      MyArticlesCMS.get(id)
    end
  end
  
  module Reviews # API
    def self.fetch(id)
      MyReviewsApi.find(id)
    end
  end
  
  class << self
    def books
      Books
    end
    
    def articles
      Articles
    end
    
    def reviews
      Reviews
    end
  end
end

# Usage

book = Content.books.fetch(params[:book_id])
article = Content.articles.fetch(params[:article_id])
review = Content.reviews.fetch(params[:review_id])

If you wrap the data coming back in a common interface (say, title, description, etc.), you can mix and match search results from different sources in a single list, which can be nice.
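A plain-Ruby sketch of such a common interface, assuming hypothetical field names for each source (`blurb`, `headline`, `summary`, `book_title`, `body`) and using literal hashes in place of what `Content.books`, `Content.articles`, and `Content.reviews` would actually return. Each adapter maps its source's shape onto the same `ContentItem` struct, so results can be sorted and listed together.

```ruby
# Common interface every adapter returns (hypothetical field names).
ContentItem = Struct.new(:kind, :title, :description, keyword_init: true)

# Each adapter translates one source's shape into a ContentItem.
# The input hashes stand in for a DB record, a CMS payload, and an API payload.
module ContentAdapters
  def self.from_book(book)
    ContentItem.new(kind: :book, title: book[:title], description: book[:blurb])
  end

  def self.from_article(article)
    ContentItem.new(kind: :article, title: article["headline"], description: article["summary"])
  end

  def self.from_review(review)
    ContentItem.new(kind: :review, title: "Review: #{review[:book_title]}", description: review[:body])
  end
end

items = [
  ContentAdapters.from_book({ title: "Book One", blurb: "A novel." }),
  ContentAdapters.from_article({ "headline" => "Reading Tips", "summary" => "How to read more." }),
  ContentAdapters.from_review({ book_title: "Book One", body: "Loved it." })
]

# Mixed sources, one list, one sort key.
items.sort_by(&:title).map(&:title)
# => ["Book One", "Reading Tips", "Review: Book One"]
```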

Conclusion

The Repository pattern is a very powerful tool to clean up and organize your data access code, and you can start implementing it immediately in small steps.

Your code will be easier to read and reason about, and you will find hidden opportunities for code reuse and simplification through simple method refactoring.

Hopefully all the links in the above article will be valuable for future reading.