find_each warnings
Although using find_each or find_in_batches to iterate through large collections seems like a very good idea, there is a gotcha with scopes carrying limit or order information: because both methods batch by primary key and impose their own order and limit on every query, the scoped values are (or were, up until the latest active_record versions) silently ignored.
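To see why a scoped order or limit cannot survive primary-key batching, here is a minimal in-memory sketch. It does not use ActiveRecord: `Scope`, `ROWS`, and `records` are hypothetical stand-ins invented for illustration, and the loop only approximates what the real methods do.

```ruby
# Hypothetical in-memory "table"; no database or ActiveRecord involved.
ROWS = [
  { id: 1, name: "delta" },
  { id: 2, name: "alpha" },
  { id: 3, name: "charlie" },
  { id: 4, name: "bravo" },
  { id: 5, name: "echo" }
].freeze

# Hypothetical stand-in for a relation carrying an order and a limit.
Scope = Struct.new(:rows, :order, :limit) do
  # Runs the "query": optional id floor, then ORDER BY, then LIMIT.
  def records(after_id: nil)
    result = rows
    result = result.select { |row| row[:id] > after_id } if after_id
    result = result.sort_by { |row| row[order] } if order
    result = result.first(limit) if limit
    result
  end

  # Batches by primary key. The caller's order and limit are overwritten
  # with :id and batch_size, which is where they get lost.
  def find_each(batch_size: 2)
    relation = Scope.new(rows, :id, batch_size)
    batch = relation.records
    while batch.any?
      batch.each { |row| yield row }
      break if batch.size < batch_size
      batch = relation.records(after_id: batch.last[:id])
    end
  end
end

scope = Scope.new(ROWS, :name, 3)

# A plain query honors the scope: three rows, sorted by name.
puts scope.records.map { |row| row[:name] }.inspect

# Batching visits every row in id order instead.
visited = []
scope.find_each { |row| visited << row[:name] }
puts visited.inspect
```

The plain query returns three names in alphabetical order, while the batched iteration walks all five rows by id, which is the silent mismatch described above.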
I was playing with the idea of patching these methods to raise an exception whenever an order was going to be ignored. It turns out that there are already a few things in place, and more to come.
With Rails 4.2.6, the logger already warns when an order or limit is present:
def find_in_batches(options = {})
  # ...
  if logger && (arel.orders.present? || arel.taken.present?)
    logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
  end
  # ...
end
In Rails 5.0.0.1, there is an option to raise an exception instead of logging a warning, which we can also configure at the application level:
http://api.rubyonrails.org/v5.0.0.1/classes/ActiveRecord/Batches.html#method-i-find_in_batches
:error_on_ignore - Overrides the application config to specify if an error should be raised when the order and limit have to be ignored due to batching.
# ActiveRecord::Base.error_on_ignored_order_or_limit = true
def act_on_order_or_limit_ignored(error_on_ignore)
  raise_error = (error_on_ignore.nil? ? self.klass.error_on_ignored_order_or_limit : error_on_ignore)

  if raise_error
    raise ArgumentError.new(ORDER_OR_LIMIT_IGNORED_MESSAGE)
  elsif logger
    logger.warn(ORDER_OR_LIMIT_IGNORED_MESSAGE)
  end
end
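The precedence in that method is easy to misread: the per-call :error_on_ignore option wins whenever it is not nil, and only a nil falls back to the application-level flag. Below is a small sketch of that decision in plain Ruby. `FakeRelation`, its `warnings` array, and the message text are hypothetical stand-ins, not part of ActiveRecord.

```ruby
# Hypothetical stand-in mirroring the Rails 5.0 decision logic quoted above.
class FakeRelation
  # Placeholder text, not the real ORDER_OR_LIMIT_IGNORED_MESSAGE.
  MESSAGE = "order and limit are ignored while batching"

  class << self
    # Plays the role of ActiveRecord::Base.error_on_ignored_order_or_limit.
    attr_accessor :error_on_ignored_order_or_limit
  end
  self.error_on_ignored_order_or_limit = false

  attr_reader :warnings

  def initialize
    @warnings = [] # stands in for logger.warn output
  end

  # nil means "no per-call preference": use the class-level setting.
  def act_on_order_or_limit_ignored(error_on_ignore)
    raise_error =
      error_on_ignore.nil? ? self.class.error_on_ignored_order_or_limit : error_on_ignore

    if raise_error
      raise ArgumentError, MESSAGE
    else
      @warnings << MESSAGE
    end
  end
end

relation = FakeRelation.new

# Global flag off, no per-call option: only a warning is recorded.
relation.act_on_order_or_limit_ignored(nil)

# Global flag on, but the call opts out explicitly: still only a warning.
FakeRelation.error_on_ignored_order_or_limit = true
relation.act_on_order_or_limit_ignored(false)

# Global flag on, no per-call option: the fallback raises.
begin
  relation.act_on_order_or_limit_ignored(nil)
rescue ArgumentError => e
  puts "raised: #{e.message}"
end

puts "warnings recorded: #{relation.warnings.size}"
```

In a real Rails 5.0 app the per-call form would be something like `Post.order(:title).find_each(error_on_ignore: true) { |post| ... }`, going by the option documented at the link above.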
In the next release of Rails we’ll get support for limit, whereas order will leave a warning or raise an exception depending on our setting of ActiveRecord::Base.error_on_ignored_order:
https://github.com/rails/rails/commit/451437c6f57e66cc7586ec966e530493927098c7
The flag error_on_ignored_order_or_limit has been deprecated in favor of the current error_on_ignored_order.
Xavier Noria
Batch processing methods support limit:
Post.limit(10_000).find_each do |post|
  # ...
end
It also works in find_in_batches and in_batches
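One way a limit can be honored while still batching by primary key is to track how many records remain and shrink the final batch accordingly. The sketch below shows that idea over a plain array of ids. `batched_ids` is a hypothetical helper written for illustration, not the code from the linked commit.

```ruby
# Hypothetical helper: splits ids into primary-key batches while honoring
# an overall limit. Works on an in-memory array, not on a database.
def batched_ids(ids, limit:, batch_size:)
  remaining = limit
  last_id = nil
  batches = []

  while remaining > 0
    # Never ask for more than what is left of the overall limit.
    take = [remaining, batch_size].min
    batch = ids.select { |id| last_id.nil? || id > last_id }.sort.first(take)
    break if batch.empty?

    batches << batch
    remaining -= batch.size
    break if batch.size < take # the "table" ran out before the limit did

    last_id = batch.last
  end

  batches
end

# Ten ids, limit of seven, batches of three: the last batch holds one id.
puts batched_ids((1..10).to_a, limit: 7, batch_size: 3).inspect
```

When the limit is larger than the number of available ids, the loop simply stops at the first short batch.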
With a Rails < 5 app, we can patch in the raise-on-ignore behavior manually, like this:
# initializers/raise_on_ignored_limit_or_order.rb
raise 'Can not patch this version of ActiveRecord' unless ActiveRecord::VERSION::STRING == '4.2.6'

module ActiveRecord
  module Batches
    # source: https://github.com/rails/rails/blob/v4.2.6/activerecord/lib/active_record/relation/batches.rb#L98
    # User.limit(1).find_each...
    # ArgumentError: Limit is not supported with find_each
    #
    # remove the limit first: User.limit(1).limit(nil).find_each do ...
    #
    def find_in_batches(options = {})
      raise ArgumentError.new('Limit is not supported with find_each') if arel.taken
      raise ArgumentError.new('Order is not supported with find_each') if arel.orders.present?

      options.assert_valid_keys(:start, :batch_size)

      relation = self
      start = options[:start]
      batch_size = options[:batch_size] || 1000

      unless block_given?
        return to_enum(:find_in_batches, options) do
          total = start ? where(table[primary_key].gteq(start)).size : size
          (total - 1).div(batch_size) + 1
        end
      end

      # if logger && (arel.orders.present? || arel.taken.present?)
      #   logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
      # end

      relation = relation.reorder(batch_order).limit(batch_size)
      records = start ? relation.where(table[primary_key].gteq(start)).to_a : relation.to_a

      while records.any?
        records_size = records.size
        primary_key_offset = records.last.id
        raise "Primary key not included in the custom select clause" unless primary_key_offset

        yield records

        break if records_size < batch_size

        records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
      end
    end
  end
end