diff options
author | Homu <homu@barosl.com> | 2016-07-29 07:11:10 +0900 |
---|---|---|
committer | Homu <homu@barosl.com> | 2016-07-29 07:11:10 +0900 |
commit | ccf79ff17a9f2be2ce47e7809e52e2edf62708f7 (patch) | |
tree | dcec33cf63c658c7e6ccdae3fdc808ebc0de9c02 | |
parent | 881439a0df4aca1c987d30b105c696020a87427d (diff) | |
parent | b4fd57fd7c747ce8972ccb14aec013ce3da391bb (diff) | |
download | bundler-ccf79ff17a9f2be2ce47e7809e52e2edf62708f7.tar.gz |
Auto merge of #4818 - NickLaMuro:better_net_http_reuse_for_compact_index, r=indirect
Allow for Thread reuse for CompactIndex fetcher
Overview
--------
One of the main benefits of the `Net::HTTP::Persistent` library is that it maintains the HTTP connections so that multiple requests to the same host don't need to re-instantiate the http tcp socket to make a request, and also allows this to be done across threads. But when used with the `CompactIndex` client, the each iteration of the `remaining_gems?` loop throws away the `Bundle::Worker` and initiates new Threads with new connections.
Since the resulting work on these threads only returns to the caller the result, and can be collected cleanly, it is safe to reuse these threads by other repeated `gem_name` iterations and gain a bit of a performance boost in the process.
While in this code change, we do run `@bundle_worker.stop`, we could simple skip this now as we will never use more then one `Bundle::Worker`'s worth of threads, keeping around the worker for further calls to `.specs` on the same fetcher. That said, this has not been don at this time.
Metrics
-------
This is the result of running the fetcher for fetching gem metadata against the Gemfile for the [ManageIQ](https://github.com/ManageIQ/manageiq) project (these are sorted collection of results from the quickest times to the slowest times from a collection of benchmarks made over time, and are not in any other order besides that):
| No changes | Shared threads, 25 workers | Shared threads, 15 workers |
| ---------- | --------------------------- | -------------------------- |
| 14.780272 | 4.429732 | 4.424576 |
| 14.987697 | 4.664379 | 4.447966 |
| 16.161742 | 7.470335 | 4.596416 |
| 16.698832 | 8.538736 | 8.103314 |
| 16.973982 | 10.293958 | 10.368032 |
| 17.044186 | 12.548805 | 10.873393 |
| 17.401267 | 13.527048 | 11.734078 |
| 17.417851 | 13.640801 | 12.379456 |
| 17.611628 | 13.695791 | 14.071921 |
| 17.702503 | 13.930277 | 14.151556 |
| 17.926312 | 14.512490 | 14.247501 |
| 17.953948 | 14.547216 | 14.372980 |
| 18.116374 | 14.562247 | 14.494275 |
| 18.193501 | 15.054224 | 14.496102 |
| 18.278697 | 16.325938 | 14.496330 |
| 18.313061 | 16.773398 | 14.799070 |
| 18.561052 | 17.029987 | 15.377202 |
| 18.657163 | 17.321094 | 15.658238 |
| 18.661411 | 17.443478 | 17.018813 |
| 18.739866 | 17.895521 | 17.142276 |
| 19.008614 | 18.075015 | 17.414206 |
| 19.174890 | 18.159239 | 17.860201 |
| 19.235642 | 18.188842 | 18.460573 |
The following changes to the `lib/bundler/source/rubygems.rb` were used to capture the metrics.
```diff
diff --git a/lib/bundler/source/rubygems.rb b/lib/bundler/source/rubygems.rb
index aedad70..2cd637b 100644
--- a/lib/bundler/source/rubygems.rb
+++ b/lib/bundler/source/rubygems.rb
@@ -350,11 +350,17 @@ module Bundler
" Downloading full index instead..." unless allow_api
if allow_api
+ require "benchmark"
+ puts Benchmark.measure {
api_fetchers.each do |f|
Bundler.ui.info "Fetching gem metadata from #{f.uri}", Bundler.ui.debug?
idx.use f.specs_with_retry(dependency_names, self)
Bundler.ui.info "" unless Bundler.ui.debug? # new line now that the dots are over
end
+ }
+
+ puts; puts; puts "exiting..."
+ exit
```
And the command used to capture the metrics using this branch:
```
$ for I in `seq 1 15`; do \
BUNDLE_DISABLE_POSTIT=1 ruby -I /path/to/bundler/lib /path/to/bundler/exe/bundle update; \
done
```
Some things to note about this data:
- These results are the lowest of the recorded and usable results, not an average
- I went with the lowest because is show the range of improvement the best, but I didn't get an even number of usable results, so this was the best way (I found) to share a somewhat unbiased distribution
- Usable results were the following
- Used the `compact_index` resolver
- Didn't require a retry
- Unfortunately, almost half of my results had to be thrown away because they didn't match that criteria
One thing that I also observed while running in debug node and inspecting the threads is the it seems that one or two workers tend take a majority of the requests, while the other workers seem to only process one. This is why the metrics show a second column where I adjusted the number of workers spawned to 15 (no reflected in this code change), something to also consider.
Tests
-----
Wasn't able to run tests locally when I using the latest version of master (not sure what broke), and unsure how I would test this functionality. Let me know if this is something you would like to go forward with adding and I will look into adding some tests around it.
-rw-r--r-- | lib/bundler/fetcher/compact_index.rb | 18 |
1 files changed, 15 insertions, 3 deletions
diff --git a/lib/bundler/fetcher/compact_index.rb b/lib/bundler/fetcher/compact_index.rb index e6f936c2..be5683db 100644 --- a/lib/bundler/fetcher/compact_index.rb +++ b/lib/bundler/fetcher/compact_index.rb @@ -44,6 +44,8 @@ module Bundler complete_gems.concat(deps.map(&:first)).uniq! remaining_gems = next_gems - complete_gems end + @bundle_worker.stop if @bundle_worker + @bundle_worker = nil # reset it. Not sure if necessary gem_info end @@ -86,15 +88,25 @@ module Bundler end.tap do |client| client.in_parallel = lambda do |inputs, &blk| func = lambda {|object, _index| blk.call(object) } - worker_name = "Compact Index (#{display_uri.host})" - worker = Bundler::Worker.new(25, worker_name, func) + worker = bundle_worker(func) inputs.each {|input| worker.enq(input) } - inputs.map { worker.deq }.tap { worker.stop } + inputs.map { worker.deq } end end end end + def bundle_worker(func = nil) + if @bundle_worker + @bundle_worker.tap do |worker| + worker.instance_variable_set(:@func, func) if func + end + else + worker_name = "Compact Index (#{display_uri.host})" + @bundle_worker ||= Bundler::Worker.new(25, worker_name, func) + end + end + def cache_path Bundler.user_cache.join("compact_index", remote.cache_slug) end |