path: root/vm_core.h
Commit message (Author, Date, Files, Lines changed)
* altstack is native thread's attr (Koichi Sasada, 2022-05-24, 1 file, -7/+3)

  Move th->altstack to th->nt->altstack.
* add `rb_th_serial()` (Koichi Sasada, 2022-05-24, 1 file, -0/+6)

  `rb_th_serial(th)` returns th's serial, for debug-print purposes.
* remove `NON_SCALAR_THREAD_ID` support (Koichi Sasada, 2022-05-24, 1 file, -2/+0)

  `NON_SCALAR_THREAD_ID` indicates that `pthread_t` is non-scalar (not a
  pointer), and s390x is the only known such platform. The supporting code
  is very complex yet is used only for debug-print information, so this
  patch removes `NON_SCALAR_THREAD_ID` support and makes the code simple.
* `rb_thread_t::serial` for debug (Koichi Sasada, 2022-05-20, 1 file, -0/+1)

  `rb_thread_t::serial` is an auto-incremented serial number for threads.
  It can overflow, so it is not a unique ID for each thread; it is only for
  debug printing. `RUBY_DEBUG_LOG` shows this information.

  Also skip EC-related information if the EC is NULL. This patch makes it
  possible to use `RUBY_DEBUG_LOG` without a set-up EC.
* Delete autoload data from global features after autoload has completed. (#5910) (Samuel Williams, 2022-05-17, 1 file, -8/+18)

  * Update naming of critical-section assertion macros.
  * Improve locking for autoload.
* Fix various autoload race conditions. (#5898) (Samuel Williams, 2022-05-15, 1 file, -1/+10)

  * Add RUBY_VM_CRITICAL_SECTION for detecting unexpected context switches.
  * Prevent a race between GC mark and autoload setup.
  * Protect races on autoload state (a sketch of the guarded pattern follows below).
  * Avoid a potential race condition when allocating `autoload_featuremap`.
  * Add a NEWS entry for the autoload fixes.
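  As an illustration (not from the commit; the `MyLib::Config` autoload and
  file names are hypothetical), the kind of pattern these fixes must make
  safe is many threads racing to trigger the same autoload:

  ```ruby
  # Hypothetical setup: my_lib.rb defines
  #   module MyLib; autoload :Config, "my_lib/config"; end
  require "my_lib"

  # Many threads hit the not-yet-loaded constant at once; every thread
  # must observe a fully defined MyLib::Config, never a half-initialized
  # one, and my_lib/config.rb must be loaded exactly once.
  threads = 10.times.map do
    Thread.new { MyLib::Config }
  end
  threads.each(&:join)
  ```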
* Rust YJIT (Alan Wu, 2022-04-27, 1 file, -7/+6)

  In December 2021, we opened an [issue] to solicit feedback regarding the
  porting of the YJIT codebase from C99 to Rust. There were some
  reservations, but this project was given the go-ahead by Ruby core
  developers and Matz. Since then, we have successfully completed the port
  of YJIT to Rust.

  The new Rust version of YJIT has reached parity with the C version, in
  that it passes all the CRuby tests, is able to run all of the YJIT
  benchmarks, and performs similarly to the C version (because it works the
  same way and largely generates the same machine code). We've even
  incorporated some design improvements, such as a more fine-grained
  constant invalidation mechanism, which we expect will make a big
  difference in Ruby on Rails applications.

  Because we want to be careful, YJIT is guarded behind a configure option:

  ```shell
  ./configure --enable-yjit      # Build YJIT in release mode
  ./configure --enable-yjit=dev  # Build YJIT in dev/debug mode
  ```

  By default, YJIT does not get compiled and cargo/rustc is not required.
  If YJIT is built in dev mode, then `cargo` is used to fetch development
  dependencies, but when building in release mode, `cargo` is not required,
  only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.

  The YJIT command-line options remain mostly unchanged, and more details
  about the build process are documented in `doc/yjit/yjit.md`. The CI
  tests have been updated and do not take any more resources than before.

  The development history of the Rust port is available at the following
  commit for interested parties:
  https://github.com/Shopify/ruby/commit/1fd9573d8b4b65219f1c2407f30a0a60e537f8be

  Our hope is that Rust YJIT will be compiled and included as a part of
  system packages and compiled binaries of the Ruby 3.2 release. We do not
  anticipate any major problems, as Rust is well supported on every
  platform that YJIT supports, but to make sure this process works
  smoothly, we would like to reach out to those who take care of building
  system packages before the 3.2 release ships and resolve any issues that
  may come up.

  [issue]: https://bugs.ruby-lang.org/issues/18481

  Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
  Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
  Co-authored-by: Kevin Newton <kddnewton@gmail.com>
* introduce struct `rb_native_thread` (Koichi Sasada, 2022-04-23, 1 file, -12/+10)

  `rb_thread_t` contained `native_thread_data_t` to represent
  thread-implementation-dependent data. This patch separates that data out,
  renames it `rb_native_thread`, and points to it from `rb_thread_t`. Now
  one Ruby thread (`rb_thread_t`) has one native thread (`rb_native_thread`).
* rename thread internal naming (Koichi Sasada, 2022-04-22, 1 file, -2/+0)

  Now the GVL is not process-*global*, so this patch uses other words:

  * `rb_global_vm_lock_t` -> `struct rb_thread_sched`
  * `gvl->owner` -> `sched->running`
  * `gvl->waitq` -> `sched->readyq`
  * `rb_gvl_init` -> `rb_thread_sched_init`
  * `gvl_destroy` -> `rb_thread_sched_destroy`
  * `gvl_acquire` -> `thread_sched_to_running` # waiting -> ready -> running
  * `gvl_release` -> `thread_sched_to_waiting` # running -> waiting
  * `gvl_yield` -> `thread_sched_yield`
  * `GVL_UNLOCK_BEGIN` -> `THREAD_BLOCKING_BEGIN`
  * `GVL_UNLOCK_END` -> `THREAD_BLOCKING_END`
  * removed:
    * `rb_ractor_gvl`
    * `rb_vm_gvl_destroy` (not used)

  There are still GVL-named functions such as `rb_thread_call_without_gvl()`,
  but I don't have a good name to replace them. Maybe GVL stands for
  "Great Valuable Lock" or something like that.
* Finer-grained constant cache invalidation (take 2) (Kevin Newton, 2022-04-01, 1 file, -32/+8)

  This commit reintroduces finer-grained constant cache invalidation.

  After 8008fb7 got merged, it was causing issues on token-threaded builds
  (such as on Windows). The issue was that when you're iterating through
  instruction sequences and using the translator functions to get back the
  instruction structs, you're using either `rb_vm_insn_null_translator` or
  `rb_vm_insn_addr2insn2`, depending on whether it's a direct-threading
  build. `rb_vm_insn_addr2insn2` does some normalization to always return
  the non-trace version of whatever instruction you're looking at;
  `rb_vm_insn_null_translator` does not do that normalization.

  This means that when you're looping through the instructions and trying
  to do an opcode comparison, the result can change depending on the type
  of threading you're using, which can be very confusing. So this commit
  creates a new translator function, `rb_vm_insn_normalizing_translator`,
  that always returns the non-trace version, so that opcode comparisons
  don't have to worry about different configurations.

  [Feature #18589]
* Prefix ccan headers (#4568) (Nobuyoshi Nakada, 2022-03-30, 1 file, -11/+11)

  * Prefixed ccan headers
  * Remove unprefixed names in ccan/build_assert
  * Remove unprefixed names in ccan/check_type
  * Remove unprefixed names in ccan/container_of
  * Remove unprefixed names in ccan/list

  Co-authored-by: Samuel Williams <samuel.williams@oriontransfer.co.nz>
* Revert "Finer-grained inline constant cache invalidation"Nobuyoshi Nakada2022-03-251-8/+32
| | | | | | | | | | | | This reverts commits for [Feature #18589]: * 8008fb7352abc6fba433b99bf20763cf0d4adb38 "Update formatting per feedback" * 8f6eaca2e19828e92ecdb28b0fe693d606a03f96 "Delete ID from constant cache table if it becomes empty on ISEQ free" * 629908586b4bead1103267652f8b96b1083573a8 "Finer-grained inline constant cache invalidation" MSWin builds on AppVeyor have been crashing since the merger.
* Finer-grained inline constant cache invalidation (Kevin Newton, 2022-03-24, 1 file, -32/+8)

  Current behavior: caches depend on a global counter, so all constant
  mutations cause caches to be invalidated.

  ```ruby
  class A
    B = 1
  end

  def foo
    A::B # inline cache depends on global counter
  end

  foo # populate inline cache
  foo # hit inline cache

  C = 1 # global counter increments, all caches are invalidated

  foo # misses inline cache due to `C = 1`
  ```

  Proposed behavior: caches depend on name components, so only constant
  mutations with corresponding names invalidate the cache.

  ```ruby
  class A
    B = 1
  end

  def foo
    A::B # inline cache depends on constants named "A" and "B"
  end

  foo # populate inline cache
  foo # hit inline cache

  C = 1 # caches that depend on the name "C" are invalidated

  foo # hits inline cache because the IC only depends on "A" and "B"
  ```

  Examples of breaking the new cache:

  ```ruby
  module C
    # Breaks the `foo` cache because the "A" constant is set and the cache
    # in foo depends on "A" and "B"
    class A; end
  end

  B = 1
  ```

  We expect the new cache scheme to be invalidated less often because names
  aren't frequently reused. With the cache being invalidated less, we can
  rely on its stability more to keep our constant references fast and
  reduce the need to throw away generated code in YJIT.
* Add ISEQ_BODY macro (Peter Zhu, 2022-03-24, 1 file, -1/+3)

  Use the ISEQ_BODY macro to get the rb_iseq_constant_body of an ISeq.
  Using this macro will make it easier to change the allocation strategy
  of rb_iseq_constant_body when using Variable Width Allocation.
* [wasm] eval_intern.h gc.c vm_core.h: include wasm/setjmp.h instead of sysroot header (Yuta Saito, 2022-01-19, 1 file, -1/+5)
* make `overloaded_cme_table` truly weak key map (Koichi Sasada, 2021-12-21, 1 file, -0/+1)

  `overloaded_cme_table` keeps cme -> monly_cme pairs to manage the
  corresponding `monly_cme` for each `cme`. The `monly_cme` should live at
  least as long as its `cme`, but the previous patch lost the reference to
  the living `monly_cme`. Now `overloaded_cme_table` values are always
  roots (only the keys are weak references), which means a `monly_cme` is
  not freed until the corresponding `cme` is invalidated. To make
  management easy, move `overloaded_cme_table` to `rb_vm_t`.
* fix local TP memory leak (Koichi Sasada, 2021-12-15, 1 file, -1/+2)

  Free `rb_hook_list_t` itself if needed. To recognize the need, this patch
  introduces an `rb_hook_list_t::is_local` flag. This patch is a successor
  of https://github.com/ruby/ruby/pull/4652.
* Introduce an option "--dump=insns_without_opt" for debugging purposes (Yusuke Endoh, 2021-12-13, 1 file, -1/+1)
* `Primitive.mandatory_only?` for fast path (Koichi Sasada, 2021-11-15, 1 file, -0/+2)

  Compared with C methods, a built-in method written in Ruby is slower when
  only mandatory parameters are given, because it needs to check the
  arguments and fill in default values for optional and keyword parameters
  (C methods can check the number of parameters with `argc`, so there is no
  such overhead). Passing only mandatory arguments is common (optional
  arguments are exceptional in many cases), so it is important to provide a
  fast path for this common case.

  `Primitive.mandatory_only?` is a special builtin function used in an `if`
  expression like this:

  ```ruby
  def self.at(time, subsec = false, unit = :microsecond, in: nil)
    if Primitive.mandatory_only?
      Primitive.time_s_at1(time)
    else
      Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
    end
  end
  ```

  It makes two ISeqs:

  ```ruby
  # (1)
  def self.at(time, subsec = false, unit = :microsecond, in: nil)
    Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
  end

  # (2)
  def self.at(time)
    Primitive.time_s_at1(time)
  end
  ```

  and (2) is pointed to by (1). Note that `Primitive.mandatory_only?`
  should be used only in the condition of an `if` statement, and that `if`
  statement should be the entire method body (you cannot put any expression
  before or after the `if` statement).

  A method entry with `mandatory_only?` (`Time.at` in the above case) is
  marked as `iseq_overload`. When the method is dispatched with only
  mandatory arguments (`Time.at(0)`, for example), another method entry
  with ISeq (2) is made as the mandatory-only method entry, and it is
  cached in an inline method cache.

  The idea is similar to the one discussed in
  https://bugs.ruby-lang.org/issues/16254, but it only checks for mandatory
  parameters or more, because in many cases only mandatory parameters are
  given. If we find other cases (optional or keyword parameters used
  frequently with a performance cost), we can extend the feature.
* Select including thread impl file at config time (Yuta Saito, 2021-10-30, 1 file, -5/+1)
* vm_core.h: Avoid unaligned access to ic_serial on 32-bit machine (Yusuke Endoh, 2021-10-29, 1 file, -4/+37)

  This caused a Bus Error on 32-bit Solaris.
* Make Coverage suspendable (#4856) (Yusuke Endoh, 2021-10-25, 1 file, -1/+3)

  Add `Coverage.suspend`, `Coverage.resume`, and some other methods.

  [Feature #18176] [ruby-core:105321]
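  A minimal usage sketch (not from the commit), assuming the API added by
  #4856 and a hypothetical measured file `foo.rb`:

  ```ruby
  require "coverage"

  Coverage.start(lines: true)
  load "./foo.rb"     # executed lines in foo.rb are counted
  Coverage.suspend    # stop measuring, keeping the results collected so far
  load "./foo.rb"     # lines executed while suspended are not counted
  Coverage.resume     # measurement continues from the suspended state
  p Coverage.result   # => { ".../foo.rb" => { lines: [...] } }
  ```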
* suppress warnings for probable NULL dereferences (Nobuyoshi Nakada, 2021-10-24, 1 file, -0/+2)
* `RubyVM.keep_script_lines` (Koichi Sasada, 2021-10-21, 1 file, -0/+1)

  `RubyVM.keep_script_lines` enables keeping the script lines for each ISeq
  and AST. This feature is for debugger/REPL support.

  ```ruby
  RubyVM.keep_script_lines = true

  eval("def foo = nil\ndef bar = nil")

  pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
  ```
* Add comments about special runtime routines YJIT calls (Alan Wu, 2021-10-20, 1 file, -0/+2)

  When YJIT makes calls to routines without reconstructing interpreter
  state through jit_prepare_routine_call(), it relies on the routine to
  never allocate, raise, or push/pop control frames. Comment about this on
  the routines that YJIT calls.

  This is probably something we should dynamically verify on debug builds.
  It's hard to statically verify, as it requires verifying all functions in
  the call tree. Maybe something to look at in the future.
* Cleanup diff against upstream. Add comments (Alan Wu, 2021-10-20, 1 file, -2/+2)

  I did a `git diff --stat` against upstream and looked at all the files
  that are outside of YJIT to come up with these minor changes.
* YJIT: Fancier opt_getinlinecache (Alan Wu, 2021-10-20, 1 file, -0/+3)

  Make sure `opt_getinlinecache` is in a block all on its own, and
  invalidate it from the interpreter when `opt_setinlinecache` runs. It
  will recompile with a filled cache the second time around. This lets
  YJIT run well even when the inline cache for a constant is cold.
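  For reference (not part of the commit), you can see the instruction pair
  in question on a 3.1-era build by disassembling a constant reference:

  ```ruby
  # The disassembly includes opt_getinlinecache ... getconstant ...
  # opt_setinlinecache, the sequence this commit teaches YJIT to
  # invalidate and recompile once the cache is filled.
  puts RubyVM::InstructionSequence.compile("Foo::Bar").disasm
  ```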
* Yet Another Ruby JIT! (Jose Narvaez, 2021-10-20, 1 file, -4/+4)

  Renaming uJIT to YJIT. AKA s/ujit/yjit/g.
* WIP refactor block lists to use darray (Maxime Chevalier-Boisvert, 2021-10-20, 1 file, -1/+3)
* Tie lifetime of uJIT blocks to iseqs (Alan Wu, 2021-10-20, 1 file, -0/+5)

  * Tie lifetime of uJIT blocks to iseqs. Blocks weren't being freed when
    iseqs are collected.
  * Add rb_dary. Use it for the method dependency table.
  * Keep track of blocks per iseq. Remove the global version_tbl.
  * Block version bookkeeping fix
  * dary -> darray
  * free ujit_blocks
  * comment about size of ujit_blocks
* WIP JIT-to-JIT returns (Maxime Chevalier-Boisvert, 2021-10-20, 1 file, -0/+2)
* Do not load file with same realpath twice when requiring (Jeremy Evans, 2021-10-02, 1 file, -0/+1)

  This fixes issues with paths being loaded twice in certain cases when
  symlinks are used.

  It took me multiple attempts to get this working. My original attempt
  tried to convert paths to realpaths before adding them to
  $LOADED_FEATURES. Unfortunately, this doesn't work well with the loaded
  feature index, which is based off load paths and not realpaths. While I
  was able to get require working, I'm fairly sure the loaded feature index
  was not being used as expected, which would have significant performance
  implications. Additionally, I was never able to get that approach working
  with autoload when autoloading a non-realpath file. It also broke some
  specs.

  This takes a more conservative approach. Directly before loading the
  file, if a file with the same realpath has been required, the loading of
  the file is skipped. The realpaths are stored as fstrings in a hidden
  hash. When rebuilding the loaded feature index, the hash of realpaths is
  also rebuilt. I'm guessing this makes the rebuilding process slower, but
  I don't think that is a hot path. In general, modifying loaded features
  is only done when reloading, and that tends to happen in non-production
  environments.

  Change the test_require_with_loaded_features_pop test to use 30 threads
  and 300 iterations, instead of 4 threads and 1000 iterations. I saw only
  sporadic failures with 4/1000, but consistent failures with 30/300
  threads. These failures were due to the fact that concurrent deletions
  from $LOADED_FEATURES in other threads can result in rb_ary_entry
  returning nil when rebuilding the loaded features index. To avoid
  concurrency issues when rebuilding the loaded features index, the
  building of the index itself is left alone, and afterwards a separate
  loop is done on a copy of the loaded feature snapshot in order to rebuild
  the realpaths hash.

  Fixes [Bug #17885]
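  A reproduction sketch of the symptom being fixed (not from the commit;
  the file and directory names are hypothetical):

  ```ruby
  require "fileutils"

  # lib/feature.rb exists, and symlib is a symlink to lib, so the same
  # file is reachable under two different paths.
  FileUtils.mkdir_p("lib")
  File.write("lib/feature.rb", "puts 'feature loaded'")
  File.symlink("lib", "symlib") unless File.symlink?("symlib")

  require "./lib/feature"     # loads; its realpath is recorded
  require "./symlib/feature"  # different $LOADED_FEATURES entry, same
                              # realpath: with this fix, loading is skipped
  ```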
* Revert "Do not load file with same realpath twice when requiring"Jeremy Evans2021-09-181-1/+0
| | | | | | | This reverts commit ddb85c5d2bdf75a83eb163856508691a7436b446. This commit causes unexpected warnings in TestTranscode#test_loading_race occasionally in CI.
* Do not load file with same realpath twice when requiring (Jeremy Evans, 2021-09-18, 1 file, -0/+1)

  This fixes issues with paths being loaded twice in certain cases when
  symlinks are used.

  It took me multiple attempts to get this working. My original attempt
  tried to convert paths to realpaths before adding them to
  $LOADED_FEATURES. Unfortunately, this doesn't work well with the loaded
  feature index, which is based off load paths and not realpaths. While I
  was able to get require working, I'm fairly sure the loaded feature index
  was not being used as expected, which would have significant performance
  implications. Additionally, I was never able to get that approach working
  with autoload when autoloading a non-realpath file. It also broke some
  specs.

  This takes a more conservative approach. Directly before loading the
  file, if a file with the same realpath has been required, the loading of
  the file is skipped. The realpaths are stored as fstrings in a hidden
  hash. When rebuilding the loaded feature index, the hash of realpaths is
  also rebuilt. I'm guessing this makes the rebuilding process slower, but
  I don't think that is a hot path. In general, modifying loaded features
  is only done when reloading, and that tends to happen in non-production
  environments.

  Change the test_require_with_loaded_features_pop test to use 30 threads
  and 300 iterations, instead of 4 threads and 1000 iterations. I saw only
  sporadic failures with 4/1000, but consistent failures with 30/300
  threads. These failures were due to the fact that concurrent deletions
  from $LOADED_FEATURES in other threads can result in rb_ary_entry
  returning nil when rebuilding the loaded features index. To avoid
  concurrency issues when rebuilding the loaded features index, the
  building of the index itself is left alone, and afterwards a separate
  loop is done on a copy of the loaded feature snapshot in order to rebuild
  the realpaths hash.

  Fixes [Bug #17885]
* suppress GCC's -Wsuggest-attribute=format (卜部昌平, 2021-09-10, 1 file, -0/+1)

  I was not aware of this because I use clang these days.
* Use free instead of xfree to free altstack (Yusuke Endoh, 2021-09-06, 1 file, -1/+1)

  The altstack memory of a thread may be freed even after the VM is
  destructed. After that, the GC is no longer available, so calling xfree
  may lead to a segfault. This changeset uses the bare free function to
  free the altstack memory instead of xfree. [Bug #18126]
* Remove root_jmpbuf in rb_thread_struct (Nobuyoshi Nakada, 2021-08-10, 1 file, -1/+0)

  It has not been used since 1b82c877dfa72e8505ded149fd0e3ba956529d3f.
* Replace copy coroutine with pthread implementation. (Samuel Williams, 2021-07-01, 1 file, -5/+2)
* Add a cache for class variables (eileencodes, 2021-06-18, 1 file, -0/+5)

  Redo of 34a2acdac788602c14bf05fb616215187badd504 and
  931138b00696419945dc03e10f033b1f53cd50f3, which were reverted.
  GitHub PR #4340.

  This change implements a cache for class variables. Previously there was
  no cache for cvars. Cvar access is slow because it needs to travel all
  the way up the ancestor tree before returning the cvar value; the deeper
  the ancestor tree, the slower cvar access will be.

  The benefits of the cache are more visible with a higher number of
  included modules, due to the way Ruby looks up class variables. The
  benchmark here includes 26 modules and shows that with the cache this
  branch is 6.5x faster when accessing class variables.

  ```
  compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
  built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

  |         |compare-ruby|built-ruby|
  |:--------|-----------:|---------:|
  |vm_cvar  |      5.681M|   36.980M|
  |         |           -|     6.51x|
  ```

  The next benchmark uses Benchmark.ips to call `ActiveRecord::Base.logger`
  from within a Rails application. ActiveRecord::Base.logger has 71
  ancestors. The more ancestors a tree has, the clearer the speed increase;
  i.e., if Base had only one ancestor we'd see no improvement. This
  benchmark is run on a vanilla Rails application.

  Benchmark code:

  ```ruby
  require "benchmark/ips"
  require_relative "config/environment"

  Benchmark.ips do |x|
    x.report "logger" do
      ActiveRecord::Base.logger
    end
  end
  ```

  Ruby 3.0 master / Rails 6.1:

  ```
  Warming up --------------------------------------
                logger   155.251k i/100ms
  Calculating -------------------------------------
  ```

  Ruby 3.0 with cvar cache / Rails 6.1:

  ```
  Warming up --------------------------------------
                logger     1.546M i/100ms
  Calculating -------------------------------------
                logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
  ```

  Lastly we ran a benchmark to demonstrate the difference between master
  and our cache when the number of modules increases. This benchmark
  measures 1 ancestor, 30 ancestors, and 100 ancestors.

  Ruby 3.0 master:

  ```
  Warming up --------------------------------------
              1 module     1.231M i/100ms
            30 modules   432.020k i/100ms
           100 modules   145.399k i/100ms
  Calculating -------------------------------------
              1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
            30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
           100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

  Comparison:
              1 module: 12209958.3 i/s
            30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
           100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
  ```

  Ruby 3.0 with cvar cache:

  ```
  Warming up --------------------------------------
              1 module     1.641M i/100ms
            30 modules     1.655M i/100ms
           100 modules     1.620M i/100ms
  Calculating -------------------------------------
              1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
            30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
           100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

  Comparison:
              1 module: 16279458.0 i/s
           100 modules: 16087484.6 i/s - same-ish: difference falls within error
            30 modules: 15891406.2 i/s - same-ish: difference falls within error
  ```

  Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Pack iseq_inline_constant_cache_entry (Nobuyoshi Nakada, 2021-06-09, 1 file, -11/+8)

  Reordered iseq_inline_constant_cache_entry members so as not to exceed
  the size of RValue.
* Clarify these are just for MJIT (Takashi Kokubun, 2021-06-02, 1 file, -2/+2)

  and not for third-party libraries. See:
  e6484a153038703447b50fcac26349249922ab28
* Enable VM_ASSERT in --jit CIs (#4543) (Takashi Kokubun, 2021-06-01, 1 file, -3/+3)
* Add Thread#native_thread_id [Feature #17853] (NARUSE, Yui, 2021-05-26, 1 file, -0/+7)
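  A quick usage example (not from the commit) of the new method, on a build
  that includes this feature:

  ```ruby
  # native_thread_id returns the OS-level thread id (platform-dependent,
  # e.g. from gettid(2) on Linux). It returns nil once the thread is no
  # longer associated with a native thread, and the OS may reuse ids.
  main_id = Thread.current.native_thread_id
  t = Thread.new { sleep 0.1 }
  p main_id               # => e.g. 12345
  p t.native_thread_id    # => a different integer while t is alive
  t.join
  p t.native_thread_id    # => nil after the thread terminates
  ```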
* Update a comment about what 'inline' attr means (Takashi Kokubun, 2021-05-21, 1 file, -1/+4)
* Revert "Filling cache values on cvar write"Aaron Patterson2021-05-111-5/+0
| | | | | This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3. This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
* Add a cache for class variables (eileencodes, 2021-05-11, 1 file, -0/+5)

  This change implements a cache for class variables. Previously there was
  no cache for cvars. Cvar access is slow because it needs to travel all
  the way up the ancestor tree before returning the cvar value; the deeper
  the ancestor tree, the slower cvar access will be.

  The benefits of the cache are more visible with a higher number of
  included modules, due to the way Ruby looks up class variables. The
  benchmark here includes 26 modules and shows that with the cache this
  branch is 6.5x faster when accessing class variables.

  ```
  compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
  built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]

  |         |compare-ruby|built-ruby|
  |:--------|-----------:|---------:|
  |vm_cvar  |      5.681M|   36.980M|
  |         |           -|     6.51x|
  ```

  The next benchmark uses Benchmark.ips to call `ActiveRecord::Base.logger`
  from within a Rails application. ActiveRecord::Base.logger has 71
  ancestors. The more ancestors a tree has, the clearer the speed increase;
  i.e., if Base had only one ancestor we'd see no improvement. This
  benchmark is run on a vanilla Rails application.

  Benchmark code:

  ```ruby
  require "benchmark/ips"
  require_relative "config/environment"

  Benchmark.ips do |x|
    x.report "logger" do
      ActiveRecord::Base.logger
    end
  end
  ```

  Ruby 3.0 master / Rails 6.1:

  ```
  Warming up --------------------------------------
                logger   155.251k i/100ms
  Calculating -------------------------------------
  ```

  Ruby 3.0 with cvar cache / Rails 6.1:

  ```
  Warming up --------------------------------------
                logger     1.546M i/100ms
  Calculating -------------------------------------
                logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
  ```

  Lastly we ran a benchmark to demonstrate the difference between master
  and our cache when the number of modules increases. This benchmark
  measures 1 ancestor, 30 ancestors, and 100 ancestors.

  Ruby 3.0 master:

  ```
  Warming up --------------------------------------
              1 module     1.231M i/100ms
            30 modules   432.020k i/100ms
           100 modules   145.399k i/100ms
  Calculating -------------------------------------
              1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
            30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
           100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

  Comparison:
              1 module: 12209958.3 i/s
            30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
           100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
  ```

  Ruby 3.0 with cvar cache:

  ```
  Warming up --------------------------------------
              1 module     1.641M i/100ms
            30 modules     1.655M i/100ms
           100 modules     1.620M i/100ms
  Calculating -------------------------------------
              1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
            30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
           100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

  Comparison:
              1 module: 16279458.0 i/s
           100 modules: 16087484.6 i/s - same-ish: difference falls within error
            30 modules: 15891406.2 i/s - same-ish: difference falls within error
  ```

  Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Fix trivial -Wundef warnings (Benoit Daloze, 2021-05-04, 1 file, -4/+4)

  * See [Feature #17752]

  Co-authored-by: xtkoba (Tee KOBAYASHI) <xtkoba+ruby@gmail.com>
* Adjust struct member offset for i386 Cygwin (xtkoba, 2021-05-01, 1 file, -0/+8)

  Fixes [Bug #17606]
* Use rb_fstring for "defined" strings. (Aaron Patterson, 2021-03-17, 1 file, -1/+0)

  We can take advantage of fstrings to de-duplicate the defined strings.
  This means we don't need to keep the list of defined strings on the VM
  (or register them as mark objects).
* global call-cache cache table for rb_funcall* (Koichi Sasada, 2021-01-29, 1 file, -0/+5)

  The rb_funcall* functions (rb_funcall(), rb_funcallv(), ...) invoke a
  Ruby method on a given receiver. Ruby 2.7 introduced an inline method
  cache with a static memory area. However, Ruby 3.0 reimplemented the
  method-cache data structures and the inline cache was removed. Without
  the inline cache, rb_funcall* searched for methods every time. In most
  cases the per-class method cache (pCMC) helps, but the pCMC requires
  VM-wide locking, which hurts performance in multi-Ractor execution,
  especially when all Ractors call methods with rb_funcall*.

  This patch introduces a Global Call-Cache Cache Table (gccct) for
  rb_funcall*. Call caches were introduced in Ruby 3.0 to manage method
  cache entries atomically, and the gccct enables method caching without
  VM-wide locking. This table solves the performance issue in multi-Ractor
  execution. [Bug #17497]

  Ruby-level method invocation does not use the gccct because it has an
  inline method cache, and the table size is limited. Basically rb_funcall*
  is not used frequently, so 1023 entries can be enough. We will revisit
  the table size if it is not enough.
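  As an illustration only (the real table lives in C inside the VM), the
  caching idea can be sketched in Ruby: a fixed-size table indexed by a
  hash of the (class, method name) pair, where a hit skips the full method
  search. The names and structure here are illustrative, not the CRuby
  implementation:

  ```ruby
  # Toy model of the gccct idea.
  GCCCT_SIZE = 1023
  GCCCT = Array.new(GCCCT_SIZE)

  def lookup_with_gccct(klass, mid)
    idx = [klass, mid].hash % GCCCT_SIZE
    entry = GCCCT[idx]
    return entry[2] if entry && entry[0] == klass && entry[1] == mid  # hit

    meth = klass.instance_method(mid)  # slow path: full method search
    GCCCT[idx] = [klass, mid, meth]    # overwrite whatever hashed here before
    meth
  end

  m = lookup_with_gccct(String, :upcase)  # miss: searches and fills the slot
  m = lookup_with_gccct(String, :upcase)  # hit: returns the cached entry
  p m.bind_call("gccct")                  # => "GCCCT"
  ```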