find_each warnings

October 29th, 2016 - Bonn

Although using find_each or find_in_batches to iterate through large collections seems like a very good idea, there is a gotcha with scopes carrying a limit or an order: due to the implementation, they are (or were, up until the latest ActiveRecord versions) silently ignored.
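To see why a scoped order cannot survive batching, here is a minimal pure-Ruby sketch of the keyset strategy the batch methods rely on: order by primary key, take a batch, then continue from the last id seen. It assumes an in-memory array stands in for the table; `Record` and `batches_by_id` are hypothetical names, not ActiveRecord API.

```ruby
# Hypothetical in-memory stand-in for a table row; not ActiveRecord.
Record = Struct.new(:id, :name)

# Keyset batching sketch: always sort by primary key, take batch_size rows,
# then continue after the last id. Any order the caller applied is lost,
# because the cursor only works if ids increase from batch to batch.
def batches_by_id(records, batch_size:)
  sorted = records.sort_by(&:id)   # forced "ORDER BY id ASC"
  last_id = nil
  batches = []
  loop do
    batch = sorted.select { |r| last_id.nil? || r.id > last_id }.first(batch_size)
    break if batch.empty?
    batches << batch
    break if batch.size < batch_size
    last_id = batch.last.id        # "WHERE id > last_id" on the next query
  end
  batches
end

people  = [Record.new(3, "Ann"), Record.new(1, "Zoe"), Record.new(2, "Bob")]
by_name = people.sort_by(&:name)   # what an .order(:name) scope asked for
batches = batches_by_id(by_name, batch_size: 2)
p batches.flatten.map(&:name)      # => ["Zoe", "Bob", "Ann"] (id order, not name order)
```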

I was playing with the idea of patching these methods to raise an exception when an order was going to be ignored, but it turns out that there are already a few things in place, and more to come.

In Rails 4.2.6, the logger already warns when an order or a limit is present:

ActiveRecord::Batches (4.2.6)

def find_in_batches(options = {})
  # ...
  if logger && (arel.orders.present? || arel.taken.present?)
    logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
  end
  # ...
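A minimal sketch of that 4.2.6 behavior in plain Ruby, assuming a hypothetical `ScopeInfo` struct in place of the Arel manager: the check only logs, and nothing stops the batch from running. The `true`/`false` return value is added here for illustration only.

```ruby
require "logger"
require "stringio"

# Hypothetical description of a scope: its ORDER BY terms and its LIMIT.
ScopeInfo = Struct.new(:orders, :limit)

# Mirrors the 4.2.6 check: warn when an order or limit is present, then
# carry on. Returns whether it warned (illustration only).
def warn_if_ignored(scope, logger)
  if logger && (!scope.orders.empty? || !scope.limit.nil?)
    logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    return true
  end
  false
end

log    = StringIO.new
logger = Logger.new(log)
warned = warn_if_ignored(ScopeInfo.new(["name ASC"], nil), logger)
quiet  = warn_if_ignored(ScopeInfo.new([], nil), logger)
puts log.string   # a single WARN line; the unscoped call logs nothing
```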

In Rails 5.0.0.1, there is an option to raise an exception instead of logging a warning, which we can configure at the application level:

http://api.rubyonrails.org/v5.0.0.1/classes/ActiveRecord/Batches.html#method-i-find_in_batches

:error_on_ignore - Overrides the application config to specify if an error should be raised when the order and limit have to be ignored due to batching.

ActiveRecord::Batches (5.0.1)

# ActiveRecord::Base.error_on_ignored_order_or_limit = true

def act_on_order_or_limit_ignored(error_on_ignore)
  raise_error = (error_on_ignore.nil? ? self.klass.error_on_ignored_order_or_limit : error_on_ignore)

  if raise_error
    raise ArgumentError.new(ORDER_OR_LIMIT_IGNORED_MESSAGE)
  elsif logger
    logger.warn(ORDER_OR_LIMIT_IGNORED_MESSAGE)
  end
end
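The precedence is easy to miss: a per-call error_on_ignore of true or false wins, and only nil falls back to the application-wide flag. A plain-Ruby sketch of that logic, assuming a hypothetical `act_on_ignored` helper and message constant (the real method reads the flag from the model class):

```ruby
require "logger"
require "stringio"

# Hypothetical message; the real constant's text may differ.
IGNORED_MESSAGE = "order/limit ignored by batching"

# A per-call true/false wins; nil falls back to the app-wide default.
def act_on_ignored(error_on_ignore, app_default, logger)
  raise_error = error_on_ignore.nil? ? app_default : error_on_ignore
  if raise_error
    raise ArgumentError, IGNORED_MESSAGE
  elsif logger
    logger.warn(IGNORED_MESSAGE)
  end
end

log    = StringIO.new
logger = Logger.new(log)

# App default off, no per-call option: only a warning.
act_on_ignored(nil, false, logger)

# App default off, but the call opts in: raises.
begin
  act_on_ignored(true, false, logger)
  raised = false
rescue ArgumentError
  raised = true
end

# App default on, but the call opts out: warns instead of raising.
act_on_ignored(false, true, logger)
puts log.string.lines.size   # 2
```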

In the next release of Rails we'll get support for limit, whereas an ignored order will still log a warning or raise an exception, depending on the ActiveRecord::Base.error_on_ignored_order setting.

https://github.com/rails/rails/commit/451437c6f57e66cc7586ec966e530493927098c7

The error_on_ignored_order_or_limit flag has been deprecated in favor of the new error_on_ignored_order.

Xavier Noria

Batch processing methods support limit:

Post.limit(10_000).find_each do |post|
  # ...
end

It also works in find_in_batches and in_batches.
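One way a limit can be honored while batching is to cap each batch by the number of rows still allowed, so the final batch may come out smaller. This is a minimal pure-Ruby sketch of that idea over an in-memory array, assuming hypothetical `Row` and `batches_with_limit` names; it is not the code from the commit above.

```ruby
# Hypothetical in-memory stand-in for a table row; ids need not be contiguous.
Row = Struct.new(:id)

# Keyset batching that honors an optional limit: each batch is capped by
# the number of rows still allowed, so the last batch may be smaller.
def batches_with_limit(rows, batch_size:, limit: nil)
  sorted = rows.sort_by(&:id)
  remaining = limit
  last_id = nil
  result = []
  loop do
    size = remaining ? [batch_size, remaining].min : batch_size
    break if size.zero?              # limit exhausted
    batch = sorted.select { |r| last_id.nil? || r.id > last_id }.first(size)
    break if batch.empty?
    result << batch
    remaining -= batch.size if remaining
    break if batch.size < size       # table exhausted
    last_id = batch.last.id
  end
  result
end

rows = (1..10).map { |i| Row.new(i * 10) }
p batches_with_limit(rows, batch_size: 3, limit: 7).map { |b| b.map(&:id) }
# => [[10, 20, 30], [40, 50, 60], [70]]
```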

In a Rails < 5 app, we can patch ActiveRecord to add the raise-on-ignore behavior manually, like here:

# initializers/raise_on_ignored_limit_or_order.rb
raise 'Cannot patch this version of ActiveRecord' unless ActiveRecord::VERSION::STRING == '4.2.6'

module ActiveRecord
  module Batches
    # source:  https://github.com/rails/rails/blob/v4.2.6/activerecord/lib/active_record/relation/batches.rb#L98
    # User.limit(1).find_each...
    # ArgumentError: Limit is not supported with find_each
    #
    # remove the limit first: User.limit(1).limit(nil).find_each do ...
    #
    def find_in_batches(options = {})
      raise ArgumentError.new('Limit is not supported with find_each') if arel.taken
      raise ArgumentError.new('Order is not supported with find_each') if arel.orders.present?

      options.assert_valid_keys(:start, :batch_size)

      relation = self
      start = options[:start]
      batch_size = options[:batch_size] || 1000

      unless block_given?
        return to_enum(:find_in_batches, options) do
          total = start ? where(table[primary_key].gteq(start)).size : size
          (total - 1).div(batch_size) + 1
        end
      end

      # if logger && (arel.orders.present? || arel.taken.present?)
      #  logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
      # end

      relation = relation.reorder(batch_order).limit(batch_size)
      records = start ? relation.where(table[primary_key].gteq(start)).to_a : relation.to_a

      while records.any?
        records_size = records.size
        primary_key_offset = records.last.id
        raise "Primary key not included in the custom select clause" unless primary_key_offset

        yield records

        break if records_size < batch_size

        records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
      end
    end
  end
end
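The effect of the patch can be sketched without Rails, assuming a hypothetical `FakeRelation` struct that only tracks orders and a limit: batching fails fast instead of silently dropping the scope, and clearing the limit with limit(nil) (as the patch's comment suggests) lets it run.

```ruby
# Hypothetical, immutable stand-in for a relation that only tracks the two
# things the guard inspects. Not the ActiveRecord API.
FakeRelation = Struct.new(:orders, :limit_value) do
  def limit(n)
    FakeRelation.new(orders, n)
  end

  def order(*terms)
    FakeRelation.new(orders + terms, limit_value)
  end

  # Same fail-fast idea as the initializer: refuse to batch rather than
  # silently ignoring the scope's limit or order.
  def find_each
    raise ArgumentError, "Limit is not supported with find_each" if limit_value
    raise ArgumentError, "Order is not supported with find_each" unless orders.empty?
    :batched # stands in for actually iterating
  end
end

users = FakeRelation.new([], nil)

begin
  users.limit(1).find_each
rescue ArgumentError => e
  puts e.message   # Limit is not supported with find_each
end

# Clearing the limit first lets the batch run.
p users.limit(1).limit(nil).find_each   # => :batched
```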