aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorHomu <homu@barosl.com>2016-07-29 07:11:10 +0900
committerHomu <homu@barosl.com>2016-07-29 07:11:10 +0900
commitccf79ff17a9f2be2ce47e7809e52e2edf62708f7 (patch)
treedcec33cf63c658c7e6ccdae3fdc808ebc0de9c02
parent881439a0df4aca1c987d30b105c696020a87427d (diff)
parentb4fd57fd7c747ce8972ccb14aec013ce3da391bb (diff)
downloadbundler-ccf79ff17a9f2be2ce47e7809e52e2edf62708f7.tar.gz
Auto merge of #4818 - NickLaMuro:better_net_http_reuse_for_compact_index, r=indirect
Allow for Thread reuse for CompactIndex fetcher Overview -------- One of the main benefits of the `Net::HTTP::Persistent` library is that it maintains the HTTP connections so that multiple requests to the same host don't need to re-instantiate the http tcp socket to make a request, and also allows this to be done across threads. But when used with the `CompactIndex` client, the each iteration of the `remaining_gems?` loop throws away the `Bundle::Worker` and initiates new Threads with new connections. Since the resulting work on these threads only returns to the caller the result, and can be collected cleanly, it is safe to reuse these threads by other repeated `gem_name` iterations and gain a bit of a performance boost in the process. While in this code change, we do run `@bundle_worker.stop`, we could simple skip this now as we will never use more then one `Bundle::Worker`'s worth of threads, keeping around the worker for further calls to `.specs` on the same fetcher. That said, this has not been don at this time. Metrics ------- This is the result of running the fetcher for fetching gem metadata against the Gemfile for the [ManageIQ](https://github.com/ManageIQ/manageiq) project (these are sorted collection of results from the quickest times to the slowest times from a collection of benchmarks made over time, and are not in any other order besides that): | No changes | Shared threads, 25 workers | Shared threads, 15 workers | | ---------- | --------------------------- | -------------------------- | | 14.780272 | 4.429732 | 4.424576 | | 14.987697 | 4.664379 | 4.447966 | | 16.161742 | 7.470335 | 4.596416 | | 16.698832 | 8.538736 | 8.103314 | | 16.973982 | 10.293958 | 10.368032 | | 17.044186 | 12.548805 | 10.873393 | | 17.401267 | 13.527048 | 11.734078 | | 17.417851 | 13.640801 | 12.379456 | | 17.611628 | 13.695791 | 14.071921 | | 17.702503 | 13.930277 | 14.151556 | | 17.926312 | 14.512490 | 14.247501 | | 17.953948 | 14.547216 | 14.372980 | | 18.116374 | 14.562247 | 14.494275 | | 18.193501 | 15.054224 | 14.496102 | | 18.278697 | 16.325938 | 14.496330 | | 18.313061 | 16.773398 | 14.799070 | | 18.561052 | 17.029987 | 15.377202 | | 18.657163 | 17.321094 | 15.658238 | | 18.661411 | 17.443478 | 17.018813 | | 18.739866 | 17.895521 | 17.142276 | | 19.008614 | 18.075015 | 17.414206 | | 19.174890 | 18.159239 | 17.860201 | | 19.235642 | 18.188842 | 18.460573 | The following changes to the `lib/bundler/source/rubygems.rb` were used to capture the metrics. ```diff diff --git a/lib/bundler/source/rubygems.rb b/lib/bundler/source/rubygems.rb index aedad70..2cd637b 100644 --- a/lib/bundler/source/rubygems.rb +++ b/lib/bundler/source/rubygems.rb @@ -350,11 +350,17 @@ module Bundler " Downloading full index instead..." unless allow_api if allow_api + require "benchmark" + puts Benchmark.measure { api_fetchers.each do |f| Bundler.ui.info "Fetching gem metadata from #{f.uri}", Bundler.ui.debug? idx.use f.specs_with_retry(dependency_names, self) Bundler.ui.info "" unless Bundler.ui.debug? # new line now that the dots are over end + } + + puts; puts; puts "exiting..." + exit ``` And the command used to capture the metrics using this branch: ``` $ for I in `seq 1 15`; do \ BUNDLE_DISABLE_POSTIT=1 ruby -I /path/to/bundler/lib /path/to/bundler/exe/bundle update; \ done ``` Some things to note about this data: - These results are the lowest of the recorded and usable results, not an average - I went with the lowest because is show the range of improvement the best, but I didn't get an even number of usable results, so this was the best way (I found) to share a somewhat unbiased distribution - Usable results were the following - Used the `compact_index` resolver - Didn't require a retry - Unfortunately, almost half of my results had to be thrown away because they didn't match that criteria One thing that I also observed while running in debug node and inspecting the threads is the it seems that one or two workers tend take a majority of the requests, while the other workers seem to only process one. This is why the metrics show a second column where I adjusted the number of workers spawned to 15 (no reflected in this code change), something to also consider. Tests ----- Wasn't able to run tests locally when I using the latest version of master (not sure what broke), and unsure how I would test this functionality. Let me know if this is something you would like to go forward with adding and I will look into adding some tests around it.
-rw-r--r--lib/bundler/fetcher/compact_index.rb18
1 files changed, 15 insertions, 3 deletions
diff --git a/lib/bundler/fetcher/compact_index.rb b/lib/bundler/fetcher/compact_index.rb
index e6f936c2..be5683db 100644
--- a/lib/bundler/fetcher/compact_index.rb
+++ b/lib/bundler/fetcher/compact_index.rb
@@ -44,6 +44,8 @@ module Bundler
complete_gems.concat(deps.map(&:first)).uniq!
remaining_gems = next_gems - complete_gems
end
+ @bundle_worker.stop if @bundle_worker
+ @bundle_worker = nil # reset it. Not sure if necessary
gem_info
end
@@ -86,15 +88,25 @@ module Bundler
end.tap do |client|
client.in_parallel = lambda do |inputs, &blk|
func = lambda {|object, _index| blk.call(object) }
- worker_name = "Compact Index (#{display_uri.host})"
- worker = Bundler::Worker.new(25, worker_name, func)
+ worker = bundle_worker(func)
inputs.each {|input| worker.enq(input) }
- inputs.map { worker.deq }.tap { worker.stop }
+ inputs.map { worker.deq }
end
end
end
end
+ def bundle_worker(func = nil)
+ if @bundle_worker
+ @bundle_worker.tap do |worker|
+ worker.instance_variable_set(:@func, func) if func
+ end
+ else
+ worker_name = "Compact Index (#{display_uri.host})"
+ @bundle_worker ||= Bundler::Worker.new(25, worker_name, func)
+ end
+ end
+
def cache_path
Bundler.user_cache.join("compact_index", remote.cache_slug)
end