aboutsummaryrefslogtreecommitdiffstats
path: root/yjit/bindgen/src/main.rs
Commit message (Collapse)AuthorAgeFilesLines
* Optimized forwarding callers and calleesAaron Patterson2024-06-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls. Calls it optimizes look like this: ```ruby def bar(a) = a def foo(...) = bar(...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) = bar(1, 2, ...) # optimized foo(123) ``` ```ruby def bar(*a) = a def foo(...) list = [1, 2] bar(*list, ...) # optimized end foo(123) ``` All variants of the above but using `super` are also optimized, including a bare super like this: ```ruby def foo(...) super end ``` This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this: ```ruby def m x = GC.stat(:total_allocated_objects) yield GC.stat(:total_allocated_objects) - x end def bar(a) = a def foo(...) = bar(...) def test m { foo(123) } end test p test # allocates 1 object on master, but 0 objects with this patch ``` ```ruby def bar(a, b:) = a + b def foo(...) = bar(...) def test m { foo(1, b: 2) } end test p test # allocates 2 objects on master, but 0 objects with this patch ``` How does it work? ----------------- This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI. I think this description is kind of confusing, so let's walk through an example with code. ```ruby def delegatee(a, b) = a + b def delegator(...) delegatee(...) # CI2 (FORWARDING) end def caller delegator(1, 2) # CI1 (argc: 2) end ``` Before we call the delegator method, the stack looks like this: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | 5| delegatee(...) # CI2 (FORWARDING) | 6| end | 7| | 8| def caller | -> 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in to `delegator`, it writes `CI1` on to the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object. Here is the ISeq disasm fo `delegator`: ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` The local called `...` will contain the caller's CI: CI1. Here is the stack when we enter `delegator`: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 -> 4| # | CI1 (argc: 2) 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | 9| delegator(1, 2) # CI1 (argc: 2) | 10| end | ``` The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2). Before executing the `send` instruction, we push `...` on the stack. The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped): ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil 0006 leave [Re] ``` Instruction 001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the callers stack to this stack: ``` Executing Line | Code | Stack ---------------+---------------------------------------+-------- 1| def delegatee(a, b) = a + b | self 2| | 1 3| def delegator(...) | 2 4| # | CI1 (argc: 2) -> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me 6| end | specval 7| | type 8| def caller | self 9| delegator(1, 2) # CI1 (argc: 2) | 1 10| end | 2 ``` The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as perfectly forward splat args, kwargs, etc. Since we're able to copy the stack from `caller` in to `delegator`'s stack, we can avoid allocating objects. I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`. I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`. For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs: ```ruby SomeObject.new(foo: 1) ``` If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
* Do not emit shape transition warnings when YJIT is compilingJean Boussier2024-06-041-1/+1
| | | | | | | [Bug #20522] If `Warning.warn` is redefined in Ruby, emitting a warning would invoke Ruby code, which can't safely be done when YJIT is compiling.
* Stop marking chilled strings as frozenÉtienne Barrié2024-05-281-0/+1
| | | | | | | | | | | | | They were initially made frozen to avoid false positives for cases such as: str = str.dup if str.frozen? But this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
* YJIT: Add specialized codegen function for `TrueClass#===` (#10640)Randy Stauner2024-04-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * YJIT: Add specialized codegen function for `TrueClass#===` TrueClass#=== is currently number 10 in the most frequent C calls list of the lobsters benchmark. ``` require "benchmark/ips" def wrap true === true true === false true === :x end Benchmark.ips do |x| x.report(:wrap) do wrap end end ``` ``` before Warming up -------------------------------------- wrap 1.791M i/100ms Calculating ------------------------------------- wrap 17.806M (± 1.0%) i/s - 89.544M in 5.029363s after Warming up -------------------------------------- wrap 4.024M i/100ms Calculating ------------------------------------- wrap 40.149M (± 1.1%) i/s - 201.223M in 5.012527s ``` Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Co-authored-by: Kevin Menard <kevin.menard@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> * Fix the new test for RJIT --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Co-authored-by: Kevin Menard <kevin.menard@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Optimize local variables when EP == BP (take 2) (#10607)Takashi Kokubun2024-04-251-0/+2
| | | | | | | | | | | | | * Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)" This reverts commit c8783441952217c18e523749c821f82cd7e5d222. * YJIT: Take care of GC references in ISEQ invariants Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> --------- Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* YJIT: Add a specialized codegen function for `Class#superclass`. (#10613)Kevin Menard2024-04-241-0/+1
| | | | | | | | Add a specialized codegen function for `Class#superclass`. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Co-authored-by: Randy Stauner <randy.stauner@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Revert "YJIT: Optimize local variables when EP == BP" (#10584)Alan Wu2024-04-191-2/+0
| | | | | | This reverts commit 4cc58ea0b865f2fd20f1e881ddbd4c4fab0b072c. Since the change landed call-threshold=1 CI runs have been timing out. There has also been `verify-ctx` violations. Revert for now while we debug.
* YJIT: Optimize local variables when EP == BP (#10487)Takashi Kokubun2024-04-171-0/+2
|
* YJIT: Support splat with C methods with -1 arityAlan Wu2024-02-271-0/+2
| | | | | Usually we deal with splats by speculating that they're of a specific size. In this case, the C method takes a pointer and a length, so we can support changing sizes just fine.
* YJIT: Pass nil to anonymous kwrest when empty (#9972)Alan Wu2024-02-151-0/+1
| | | | | | | This is the same optimization as e4272fd29 ("Avoid allocation when passing no keywords to anonymous kwrest methods") but for YJIT. For anonymous kwrest parameters, nil is just as good as an empty hash. On the usage side, update `splatkw` to handle `nil` with a leaner path.
* Move rb_class_allocate_instance from gc.c to object.cPeter Zhu2024-02-141-1/+4
|
* Specialize String#byteslice(a, b) (#9939)Aaron Patterson2024-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | * Specialize String#byteslice(a, b) This adds a specialization for String#byteslice when there are two parameters. This makes our protobuf parser go from 5.84x slower to 5.33x slower ``` Comparison: decode upstream (53738 bytes): 7228.5 i/s decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower Comparison: decode upstream (53738 bytes): 7024.8 i/s decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower ``` * Update yjit/src/codegen.rs --------- Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* YJIT: Add support for `**kwrest` parametersAlan Wu2024-02-121-2/+1
| | | | | | Now that `...` uses `**kwrest` instead of regular splat and ruby2keywords, we need to support these type of methods to support `...` well.
* YJIT: Add top ISEQ call counts to --yjit-stats (#9906)Takashi Kokubun2024-02-091-0/+2
|
* YJIT: Add codegen for Float arithmetics (#9774)Takashi Kokubun2024-01-311-0/+4
| | | | | * YJIT: Add codegen for Float arithmetics * Add Flonum and Fixnum tests
* YJIT: Fix exits on splatkw instruction (#9711)Takashi Kokubun2024-01-261-0/+1
|
* YJIT: Support concattoarray and pushtoarray (#9708)Takashi Kokubun2024-01-251-0/+1
|
* YJIT: Avoid leaks by skipping objects with a singleton classAlan Wu2024-01-241-0/+1
| | | | | | | | | | | | | | For receiver with a singleton class, there are multiple vectors YJIT can end up retaining the object. There is a path in jit_guard_known_klass() that bakes the receiver into the code, and the object could also be kept alive indirectly through a path starting at the CME object baked into the code. To avoid these leaks, avoid compiling calls on objects with a singleton class. See: https://github.com/Shopify/ruby/issues/552 [Bug #20209]
* YJIT: Fix ruby2_keywords splat+rest and drop bogus checksAlan Wu2024-01-231-0/+1
| | | | | | | | | | | | | | | | | | YJIT didn't guard for ruby2_keywords hash in case of splat calls that land in methods with a rest parameter, creating incorrect results. The compile-time checks didn't correspond to any actual effects of ruby2_keywords, so it was masking this bug and YJIT was needlessly refusing to compile some code. About 16% of fallback reasons in `lobsters` was due to the ISeq check. We already handle the tagging part with exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard for all splat cases. Note for backporting: You also need 7f51959ff1. [Bug #20195]
* YJIT: Optimize defined?(yield) (#9599)Takashi Kokubun2024-01-191-0/+1
| | | | | | | * YJIT: Optimize defined?(yield) * Remove an irrelevant comment * s/get/gen/
* YJIT: Fix unused warningsAlan Wu2024-01-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ``` warning: unused import: `condition::Condition` --> src/asm/arm64/arg/mod.rs:13:9 | 13 | pub use condition::Condition; | ^^^^^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default warning: unused import: `rb_yjit_fix_mul_fix as rb_fix_mul_fix` --> src/cruby.rs:188:9 | 188 | pub use rb_yjit_fix_mul_fix as rb_fix_mul_fix; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ warning: unused import: `rb_insn_len as raw_insn_len` --> src/cruby.rs:142:9 | 142 | pub use rb_insn_len as raw_insn_len; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default ``` Make asm public so it stops warning about unused public stuff in there.
* Refactor rb_shape_transition_shape_capa outJean Boussier2023-11-081-1/+0
| | | | | | | | | | | | | | | | Right now the `rb_shape_get_next` shape caller need to first check if there is capacity left, and if not call `rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`. And on each of these it needs to checks if we got a TOO_COMPLEX back. All this logic is duplicated in the interpreter, YJIT and RJIT. Instead we can have `rb_shape_get_next` do the capacity transition when needed. The caller can compare the old and new shapes capacity to know if resizing is needed. It also can check for TOO_COMPLEX only once.
* YJIT: Lookup IDs on boot instead of binding to themAlan Wu2023-10-171-3/+1
| | | | | | | | | | Previously, the version-controlled `cruby_bindings.inc.rs` file contained the build-time artifact `id.h`, which nobu mentioned hinders the goal of having fewer magic numbers in the repository. Lookup the IDs YJIT needs on boot. It costs cycles, but it's fine since YJIT only uses a handful of IDs at the moment. No perceptible degradation to boot time found in my testing.
* YJIT: Remove duplicate cfp->iseq accessorAlan Wu2023-10-051-1/+0
|
* YJIT: Plug native stack overflowAlan Wu2023-09-141-0/+1
| | | | | | | Previously, TestStack#test_machine_stack_size failed pretty consistently on ARM64 macOS, with Rust code and part of the interpreter used for per-instruction fallback (rb_vm_invokeblock() and friends) touching the stack guard page and crashing with SEGV. I've also seen the same test fail on x64 Linux, though with a different symptom.
* YJIT: implement side chain fallback for setlocal to avoid exiting (#8227)Maxime Chevalier-Boisvert2023-08-171-0/+1
| | | | | | | | | | | * YJIT: implement side chain fallback for setlocal to avoid exiting * Update yjit/src/codegen.rs Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> --------- Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Implement checkmatch instruction (#8203)Takashi Kokubun2023-08-101-0/+1
|
* YJIT: Count throw instructions for each tag (#8188)Takashi Kokubun2023-08-091-0/+2
| | | | | * YJIT: Count throw instructions for each tag * Show % of each throw type
* YJIT: Compile exception handlers (#8171)Takashi Kokubun2023-08-081-0/+2
| | | Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* YJIT: Move ROBJECT_OFFSET_* to yjit.c (#8157)Takashi Kokubun2023-08-021-2/+3
|
* YJIT: getblockparamproxy for when block is a ProcAlan Wu2023-07-271-0/+1
|
* YJIT: Fallback send instructions to vm_sendish (#8106)Takashi Kokubun2023-07-241-0/+2
|
* YJIT: refactoring to allow for fancier call threshold logic (#8078)Maxime Chevalier-Boisvert2023-07-171-0/+1
| | | | | | | | | | | | | * YJIT: refactoring to allow for fancier call threshold logic * Avoid potentially compiling functions multiple times. * Update vm.c Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> --------- Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Make ratio_in_yjit always available (#8064)Takashi Kokubun2023-07-131-0/+3
|
* YJIT: Avoid identity-based known-class guards for IO objects (#7911)Alan Wu2023-06-061-0/+1
| | | | | | `IO#reopen` is very special in that it is able to change the class and singleton class of IO instances. In its presence, it is not correct to assume that IO instances has a stable class/singleton class and guard by comparing identity.
* YJIT: Add codegen for Integer methods (#7665)Takashi Kokubun2023-04-051-1/+6
| | | | | | | * YJIT: Add codegen for Integer methods * YJIT: Update dependencies * YJIT: Fix Integer#[] for argc=2
* [Feature #19579] Remove !USE_RVARGC code (#7655)Peter Zhu2023-04-041-1/+0
| | | | | | | | | | | Remove !USE_RVARGC code [Feature #19579] The Variable Width Allocation feature was turned on by default in Ruby 3.2. Since then, we haven't received bug reports or backports to the non-Variable Width Allocation code paths, so we assume that nobody is using it. We also don't plan on maintaining the non-Variable Width Allocation code, so we are going to remove it.
* YJIT: Add codegen for Array#<< (#7645)Takashi Kokubun2023-04-031-0/+1
|
* Remove an unneeded function copyTakashi Kokubun2023-04-011-1/+1
|
* YJIT: Support entry for multiple PCs per ISEQ (GH-7535)Takashi Kokubun2023-03-171-0/+1
|
* YJIT: Assert that we have the VM lock while markingAlan Wu2023-03-151-0/+2
| | | | | Somewhat important because having the lock is a key part of the soundness reasoning for the `unsafe` usage here.
* YJIT: Introduce no_gc attribute (#7511)Takashi Kokubun2023-03-141-2/+3
|
* YJIT: Handle rest+splat where non-splat < required (#7499)Jimmy Miller2023-03-131-0/+1
|
* Add defined_ivar as YJIT instruction as wellOle Friis Østergaard2023-03-081-0/+1
| | | | | | | | | | This works much like the existing `defined` implementation, but calls out to rb_ivar_defined instead of the more general rb_vm_defined. Other difference to the existing `defined` implementation is that this new instruction has to take the same operands as the CRuby `defined_ivar` instruction.
* YJIT: Handle splat+rest for args pass greater than required (#7468)Jimmy Miller2023-03-071-0/+1
| | | | | | | | | | | For example: ```ruby def my_func(x, y, *rest) p [x, y, rest] end my_func(1, 2, 3, *[4, 5]) ```
* YJIT: Handle special case of splat and rest lining up (#7422)Jimmy Miller2023-03-071-0/+1
| | | | | | | | | | | | | | | | If you have a method that takes rest arguments and a splat call that happens to line up perfectly with that rest, you can just dupe the array rather than move anything around. We still have to dupe, because people could have a custom to_a method or something like that which means it is hard to guarantee we have exclusive access to that array. Example: ```ruby def foo(a, b, *rest) end foo(1, 2, *[3, 4]) ```
* Merge internal/intern/gc.h into internal/gc.hMatt Valentine-House2023-02-271-1/+1
|
* Move `attached_object` into `rb_classext_struct`Jean Boussier2023-02-161-0/+1
| | | | | | Given that signleton classes don't have an allocator, we can re-use these bytes to store the attached object in `rb_classext_struct` without making it larger.
* YJIT: `Kernel#{is_a?,instance_of?}` fast paths (GH-7297)Jimmy Miller2023-02-151-0/+1
| | | | Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Optimize != for Integers and Strings (#7301)Takashi Kokubun2023-02-141-0/+1
|