ruby.git - rhe's working repository

	Commit message	Author	Age	Files	Lines
*	YJIT: Add --yjit-perf (#8697)	Takashi Kokubun	2023-10-18	1	-1/+0
\| \| \|	Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
*	YJIT: Skip adding past_page_bytes for past pages (#8433)	Takashi Kokubun	2023-09-13	1	-2/+22
\| \| \|	YJIT: Skip adding past_pages_bytes for past pages
*	YJIT: Make compiled_* stats available by default (#8379)	Takashi Kokubun	2023-09-06	1	-1/+2
\|	* YJIT: Make compiled_* stats available by default
\|	* Update comment about default counters [ci skip]
\|	  Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
\|	---------
\|	Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
*	YJIT: x64: Split mem-to-mem Insn::Store like Insn::Mov	Alan Wu	2023-08-22	1	-1/+1
\| \| \|	The ARM backend allows for this so let's make x64 consistent.
*	YJIT: implement fast path for integer multiplication in opt_mult (#8204)	Maxime Chevalier-Boisvert	2023-08-18	3	-1/+78
\|	* YJIT: implement fast path for integer multiplication in opt_mult
\|	* Update yjit/src/codegen.rs
\|	  Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
\|	* Implement mul with overflow checking on arm64
\|	* Fix missing semicolon
\|	* Add arm splitting for lshift, rshift, urshift
\|	---------
\|	Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
*	YJIT: implement codegen for rb_int_lshift (#8201)	Maxime Chevalier-Boisvert	2023-08-11	1	-1/+1
\|	* YJIT: implement codegen for rb_int_lshift
\|	* Update yjit/src/asm/x86_64/mod.rs
\|	  Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
\|	---------
\|	Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
*	YJIT: implement imul instruction encoding in x86 assembler (#8191)	Maxime Chevalier-Boisvert	2023-08-09	2	-0/+34
\|
*	Implement MUL instruction for aarch64 (#8193)	Kevin Newton	2023-08-09	3	-0/+94
\|
*	YJIT: expand bitwise shift support in x86 assembler (#8174)	Maxime Chevalier-Boisvert	2023-08-04	2	-15/+26
\|
*	Add a newline at EOF [ci skip]	Nobuyoshi Nakada	2023-05-24	1	-1/+1
\|
*	YJIT: Fix build on A64	Alan Wu	2023-04-11	1	-1/+1
\| \| \| \|	Typo fix for the last commit (1432b37)
*	YJIT: Fix a compilation warning in x86_64	Takashi Kokubun	2023-04-11	1	-0/+1
\| \| \| \|	This is used only for arm64's cb.jmp_ptr_bytes().
*	YJIT: Reduce paddings if --yjit-exec-mem-size <= 128 on arm64 (#7671)	Takashi Kokubun	2023-04-11	1	-7/+9
\|	* YJIT: Reduce paddings if --yjit-exec-mem-size <= 128 on arm64
\|	* YJIT: Define jmp_ptr_bytes on CodeBlock
*	YJIT: Count the number of actually written bytes (#7658)	Takashi Kokubun	2023-04-05	1	-13/+29
\|
*	YJIT: code_gc(): Assert self is inline to avoid other_cb()	Alan Wu	2023-03-29	1	-3/+6
\| \| \| \| \| \| \|	The derived `&mut` from `other_cb()` overlapped with the parameter `ocb`. Use `cfg!()` instead of `#[cfg...]` to avoid unused warnings.
*	YJIT: Fix overlapping &mut in Assembler::code_gc()	Alan Wu	2023-03-29	1	-10/+6
\| \| \| \| \| \| \| \| \|	Making overlapping `&mut`s triggers Undefined Behavior. This function previously had them through `cb` and `ocb` aliasing with `self` or live references in the caller. To fix the overlap, take `ocb` as a parameter and don't use `get_inline_cb()` in the body of the function.
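A minimal sketch of the pattern this message describes, using hypothetical stand-in types (`CodeBlock`, `Assembler`, the `written` field, and `gc_once` are illustrative and not YJIT's real definitions). Passing the second code block as an explicit `&mut` parameter lets the borrow checker confirm it cannot alias `self`, which a lookup through global state inside the body would not guarantee.

```rust
/// Hypothetical stand-in for a code block; not YJIT's real type.
#[derive(Debug, Default)]
struct CodeBlock {
    written: usize,
}

/// Hypothetical owner of the inline code block.
#[derive(Debug, Default)]
struct Assembler {
    inline_cb: CodeBlock,
}

impl Assembler {
    /// Takes the other block as a parameter instead of fetching it from
    /// global state, so the two `&mut` borrows are statically disjoint.
    fn code_gc(&mut self, ocb: &mut CodeBlock) {
        self.inline_cb.written = 0;
        ocb.written = 0;
    }
}

/// Runs one collection and returns the bytes left in each block.
fn gc_once() -> (usize, usize) {
    let mut asm = Assembler { inline_cb: CodeBlock { written: 128 } };
    let mut ocb = CodeBlock { written: 64 };
    asm.code_gc(&mut ocb);
    (asm.inline_cb.written, ocb.written)
}
```

Calling `asm.code_gc(&mut asm.inline_cb)` would be rejected at compile time, which is the property the fix relies on.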
*	YJIT: Fix a cargo test warning on x86_64 (#7428)	Takashi Kokubun	2023-03-03	1	-0/+1
\|
*	YJIT: Delete stale `frozen_bytes` related code (#7423)	Alan Wu	2023-03-02	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code and comments in there have been disabled by comments for a long time. The issues that the counter used to solve are now solved more comprehensively by "runningness" [tracking][1] introduced by Code GC and [delayed deallocation][2]. Having a single counter doesn't fit our current model where code pages that could be touched or not are interleaved, anyway. Just delete the code. [1]: e7c71c6c9271b0c29f210769159090e17128e740 [2]: a0b0365e905e1ac51998ace7e6fc723406a2f157
*	YJIT: Fix assertion for partially mapped last pages (#7337)	Takashi Kokubun	2023-02-20	1	-1/+1
\| \| \|	Follows up [Bug #19400]
*	YJIT: add counters for polymorphic send and send with known class (#7288)	Maxime Chevalier-Boisvert	2023-02-10	1	-2/+2
\|
*	YJIT: Use the system page size when the code page size is too small (#7267)	Alan Wu	2023-02-09	1	-28/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously on ARM64 Linux systems that use 64 KiB pages (`CONFIG_ARM64_64K_PAGES=y`), YJIT was panicking on boot due to a failed assertion. The assertion was making sure that code GC can free the last code page that YJIT manages without freeing unrelated memory. YJIT prefers picking 16 KiB as the granularity at which to free code memory, but when the system can only free at 64 KiB granularity, that is not possible. The fix is to use the system page size as the code page size when the system page size is 64 KiB. Continue to use 16 KiB as the code page size on common systems that use 16/4 KiB pages. Add asserts to code_gc() and free_page() about code GC's assumptions. Fixes [Bug #19400]
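The selection rule in this message can be sketched as follows. The helper name `code_page_size` and the constant name are hypothetical, and treating every system page larger than 16 KiB like the 64 KiB case is an assumption on top of what the message states.

```rust
/// Granularity the message says YJIT prefers for freeing code memory.
const PREFERRED_CODE_PAGE_SIZE: usize = 16 * 1024;

/// Hypothetical helper: pick the code page size from the system page size.
/// Assumption: any system page larger than the preferred size is used as is.
fn code_page_size(system_page_size: usize) -> usize {
    if system_page_size > PREFERRED_CODE_PAGE_SIZE {
        system_page_size
    } else {
        PREFERRED_CODE_PAGE_SIZE
    }
}
```

With this rule the code page size is always a whole multiple of the system page size for the 4, 16, and 64 KiB cases, so freeing one code page never touches unrelated memory.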
*	Fix typos in YJIT [ci skip]	Alan Wu	2023-02-02	3	-4/+4
\|
*	YJIT: other_cb is None in tests	Alan Wu	2023-02-02	1	-0/+1
\| \| \| \| \|	Since the other cb is in CodegenGlobals, and we want Rust tests to be self-contained.
*	YJIT: Move CodegenGlobals::freed_pages into an Rc	Alan Wu	2023-02-02	1	-13/+35
\| \| \| \| \|	This allows for supplying a freed_pages vec in Rust tests. We need it so we can test scenarios that occur after code GC.
*	Add stats so we can keep track of x86 rel32 vs register calls (#7142)	Maxime Chevalier-Boisvert	2023-01-18	1	-0/+4
\|	* Add stats so we can keep track of x86 rel32 vs register calls
\|	  To know if we get that "prime real estate" as Alan put it.
\|	* Fix bug pointed by Alan
*	Enable `clippy` checks for yjit in CI (#7093)	Ian Ker-Seymer	2023-01-12	1	-0/+2
\|	* Add job to check clippy lints in CI
\|	* Address all remaining clippy lints
\|	* Check lints on arm64 as well
\|	* Apply latest clippy lints
\|	* Do not exit 0 on clippy warnings
*	Strip trailing spaces [ci skip]	Nobuyoshi Nakada	2023-01-12	2	-3/+3
\|
*	YJIT: Fix a compilation warning with release build (#7092)	Takashi Kokubun	2023-01-10	1	-1/+4
\|	warning: unused variable: `start_addr`
\|	   --> ../yjit/src/asm/mod.rs:359:39
\|	    \|
\|	359 \|     pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
\|	    \|                                       ^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_start_addr`
\|	    \|
\|	    = note: `#[warn(unused_variables)]` on by default
\|
\|	warning: unused variable: `end_addr`
\|	   --> ../yjit/src/asm/mod.rs:359:60
\|	    \|
\|	359 \|     pub fn remove_comments(&mut self, start_addr: CodePtr, end_addr: CodePtr) {
\|	    \|
*	YJIT: Remove old comments for regenerated branches (#7083)	Takashi Kokubun	2023-01-09	1	-0/+7
\|
*	YJIT: Remove --yjit-code-page-size (#6865)	Alan Wu	2022-12-05	1	-30/+29
\| \| \| \| \|	Certain code page sizes don't work and can cause crashes, so having this value available as a command-line option is a bit dangerous. Remove it and turn it into a constant instead.
*	YJIT: Respect destination num_bits on STUR (#6848)	Takashi Kokubun	2022-12-01	1	-2/+7
\|
*	YJIT: fix 32 and 16 bit register store (#6840)	Jemma Issroff	2022-12-01	2	-1/+22
\|	* Fix 32 and 16 bit register store in YJIT
\|	  Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>
\|	* Remove an unnecessary diff
\|	* Reuse an rm_num_bits result
\|	* Use u16::MAX instead
\|	* Update the link
\|	  Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
\|	* Just use sturh for 16 bits
\|	Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
\|	Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
*	YJIT: Fix IseqPayload::pages memory bloat	Alan Wu	2022-11-30	1	-1/+1
\| \| \| \| \| \|	HashSet::clear() doesn't deallocate the backing buffer and shrink the capacity. Replace with a 0-capacity set instead so we reclaim some memory each code GC.
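The difference can be observed with the standard library alone. This is a small sketch, not the YJIT code; the function name `capacities_after_reset` and the element type are made up for illustration.

```rust
use std::collections::HashSet;

/// Returns the set's capacity after `clear()` and after replacing it
/// with a fresh set that has not allocated yet.
fn capacities_after_reset() -> (usize, usize) {
    let mut pages: HashSet<usize> = (0..1000).collect();

    // `clear()` removes the elements but keeps the allocation for reuse.
    pages.clear();
    let after_clear = pages.capacity();

    // Assigning a new set drops the old backing buffer.
    pages = HashSet::new();
    let after_replace = pages.capacity();

    (after_clear, after_replace)
}
```

The first value stays at least as large as the number of elements that were inserted, while the second is zero until the next insertion.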
*	YJIT: Use NonNull pointer for CodePtr (#6792)	Takashi Kokubun	2022-11-23	1	-1/+2
\|
*	Fix YJIT backend to account for unsigned int immediates (#6789)	Jemma Issroff	2022-11-23	2	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	YJIT: x86_64: Fix cmp with number where sign bit is set Before this commit, we were unconditionally treating unsigned ints as signed ints when counting the number of bits required for representing the immediate in machine code. When the size of the immediate matches the size of the other operand, no sign extension happens, so this was incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this issue. Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org> Co-Authored-By: Alan Wu <alanwu@ruby-lang.org> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
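To illustrate why `0x8000_0000` was miscounted, here is a sketch of the two ways to size an immediate. The function names are hypothetical and the bodies are a simplified reading of the message, not the backend's actual code.

```rust
/// Bits needed when the encoding sign-extends the immediate.
fn sig_imm_bits(imm: i64) -> u8 {
    if imm >= i8::MIN as i64 && imm <= i8::MAX as i64 {
        8
    } else if imm >= i16::MIN as i64 && imm <= i16::MAX as i64 {
        16
    } else if imm >= i32::MIN as i64 && imm <= i32::MAX as i64 {
        32
    } else {
        64
    }
}

/// Bits needed when no sign extension happens, as when the immediate
/// has the same width as the other operand.
fn unsig_imm_bits(imm: u64) -> u8 {
    if imm <= u8::MAX as u64 {
        8
    } else if imm <= u16::MAX as u64 {
        16
    } else if imm <= u32::MAX as u64 {
        32
    } else {
        64
    }
}
```

Counted as a signed value, `0x8000_0000` exceeds `i32::MAX` and appears to need 64 bits; counted as unsigned it fits in 32, which matches the `CMP r/m32, imm32` form mentioned above.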
*	32 bit comparison on shape id	Aaron Patterson	2022-11-18	1	-2/+0
\|	This commit changes the shape id comparisons to use a 32 bit comparison rather than 64 bit. That means we don't need to load the shape id to a register on x86 machines.
\|
\|	Given the following program:
\|
\|	```ruby
\|	class Foo
\|	  def initialize
\|	    @foo = 1
\|	    @bar = 1
\|	  end
\|
\|	  def read
\|	    [@foo, @bar]
\|	  end
\|	end
\|
\|	foo = Foo.new
\|	foo.read
\|	foo.read
\|	foo.read
\|	foo.read
\|	foo.read
\|
\|	puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
\|	```
\|
\|	The machine code we generated _before_ this change is like this:
\|
\|	```
\|	== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
\|	  # getinstancevariable
\|	  0x559a18623023: mov rax, qword ptr [r13 + 0x18]
\|	  # guard object is heap
\|	  0x559a18623027: test al, 7
\|	  0x559a1862302a: jne 0x559a1862502d
\|	  0x559a18623030: cmp rax, 4
\|	  0x559a18623034: jbe 0x559a1862502d
\|	  # guard shape, embedded, and T_OBJECT
\|	  0x559a1862303a: mov rcx, qword ptr [rax]
\|	  0x559a1862303d: movabs r11, 0xffff00000000201f
\|	  0x559a18623047: and rcx, r11
\|	  0x559a1862304a: movabs r11, 0xb000000002001
\|	  0x559a18623054: cmp rcx, r11
\|	  0x559a18623057: jne 0x559a18625046
\|	  0x559a1862305d: mov rax, qword ptr [rax + 0x18]
\|	  0x559a18623061: mov qword ptr [rbx], rax
\|
\|	== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
\|	== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
\|	  # gen_direct_jmp: fallthrough
\|	  # getinstancevariable
\|	  # regenerate_branch
\|	  # getinstancevariable
\|	  # regenerate_branch
\|	  0x559a18623064: mov rax, qword ptr [r13 + 0x18]
\|	  # guard shape, embedded, and T_OBJECT
\|	  0x559a18623068: mov rcx, qword ptr [rax]
\|	  0x559a1862306b: movabs r11, 0xffff00000000201f
\|	  0x559a18623075: and rcx, r11
\|	  0x559a18623078: movabs r11, 0xb000000002001
\|	  0x559a18623082: cmp rcx, r11
\|	  0x559a18623085: jne 0x559a18625099
\|	  0x559a1862308b: mov rax, qword ptr [rax + 0x20]
\|	  0x559a1862308f: mov qword ptr [rbx + 8], rax
\|	```
\|
\|	After this change, it's like this:
\|
\|	```
\|	== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
\|	  # getinstancevariable
\|	  0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
\|	  # guard object is heap
\|	  0x5560c986d027: test al, 7
\|	  0x5560c986d02a: jne 0x5560c986f02d
\|	  0x5560c986d030: cmp rax, 4
\|	  0x5560c986d034: jbe 0x5560c986f02d
\|	  # guard shape
\|	  0x5560c986d03a: cmp word ptr [rax + 6], 0x19
\|	  0x5560c986d03f: jne 0x5560c986f046
\|	  0x5560c986d045: mov rax, qword ptr [rax + 0x10]
\|	  0x5560c986d049: mov qword ptr [rbx], rax
\|
\|	== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
\|	== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
\|	  # gen_direct_jmp: fallthrough
\|	  # getinstancevariable
\|	  # regenerate_branch
\|	  # getinstancevariable
\|	  # regenerate_branch
\|	  0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
\|	  # guard shape
\|	  0x5560c986d050: cmp word ptr [rax + 6], 0x19
\|	  0x5560c986d055: jne 0x5560c986f099
\|	  0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
\|	  0x5560c986d05f: mov qword ptr [rbx + 8], rax
\|	```
\|
\|	The first ivar read is a bit more complex, but the second ivar read is much simpler. I think eventually we could teach the context about the shape, then emit only one shape guard.
*	YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets (#6733)	Takashi Kokubun	2022-11-15	2	-0/+20
\|	* YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets
\|	  Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
\|	* Introduce heap_object_p
\|	* Leave original mov intact
\|	* Remove unneeded branches
\|	* Add a test for movabs
\|	Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
*	YJIT: Include actual memory region size in stats (#6736)	Takashi Kokubun	2022-11-15	1	-2/+5
\|
*	Implement LDURH on Aarch64	Aaron Patterson	2022-11-14	2	-0/+36
\|	When RUBY_DEBUG is enabled, shape ids are 16 bits. I would like to do 16 bit comparisons, so I need to load halfwords sometimes. This commit adds LDURH so that I can load halfwords.
\|
\|	https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/LDURH--Load-Register-Halfword--unscaled--?lang=en
\|
\|	I verified the bytes using clang:
\|
\|	```
\|	$ cat asmthing.s
\|	.global _start
\|	.align 2
\|	_start:
\|	  ldurh w10, [x1]
\|	  ldurh w10, [x1, #123]
\|	$ as asmthing.s -o asmthing.o && objdump --disassemble asmthing.o
\|
\|	asmthing.o: file format mach-o arm64
\|
\|	Disassembly of section __TEXT,__text:
\|
\|	0000000000000000 <ltmp0>:
\|	       0: 2a 00 40 78   ldurh w10, [x1]
\|	       4: 2a b0 47 78   ldurh w10, [x1, #123]
\|	```
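The bytes in the objdump output can be reproduced from the instruction's field layout. The sketch below assumes the LDURH layout from the linked Arm reference page (fixed bits `0x78400000`, a 9-bit signed offset at bit 12, the base register at bit 5, the destination at bit 0). The function name `ldurh` and its signature are illustrative, not the assembler's actual API.

```rust
/// Encodes `LDURH Wt, [Xn, #imm9]` as little-endian bytes.
/// Hypothetical helper; register numbers are passed as plain integers.
fn ldurh(rt: u8, rn: u8, imm9: i16) -> [u8; 4] {
    assert!(rt < 32 && rn < 32, "register number out of range");
    assert!(imm9 >= -256 && imm9 <= 255, "offset must fit in 9 signed bits");

    let insn: u32 = 0x7840_0000
        | (((imm9 as u32) & 0x1ff) << 12) // 9-bit two's-complement offset
        | ((rn as u32) << 5)              // base register
        | (rt as u32);                    // destination register
    insn.to_le_bytes()
}
```

For `w10, [x1]` and `w10, [x1, #123]` this yields the same byte sequences shown in the disassembly above.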
*	YJIT: Reset dropped_bytes when patching code	Alan Wu	2022-11-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	We switch to a new page when we detect dropped_bytes flipping from false to true. Previously, when we patch code for invalidation during code gc, we start with the flag being set to true, so we failed to apply patches that straddle pages. We would write out jumps half way and then stop, which left the code corrupted. Reset the flag before patching so we patch across pages properly.
*	YJIT: Free pages after ObjectSpace API usages (#6676)	Takashi Kokubun	2022-11-07	1	-13/+15
\|
*	YJIT: Make Code GC metrics available for non-stats builds (#6665)	Takashi Kokubun	2022-11-03	1	-2/+0
\|
*	YJIT: Stop incrementing write_pos if cb.has_dropped_bytes (#6664)	Takashi Kokubun	2022-11-03	1	-6/+6
\| \| \| \| \|	Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
*	YJIT: Avoid accumulating freed pages in the payload (#6657)	Takashi Kokubun	2022-11-02	1	-0/+6
\| \| \| \| \| \| \|	Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com> Co-Authored-By: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <alansi.xingwu@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
*	YJIT: Add RubyVM::YJIT.code_gc (#6644)	Takashi Kokubun	2022-10-31	1	-6/+18
\|	* YJIT: Add RubyVM::YJIT.code_gc
\|	* Rename compiled_page_count to live_page_count
*	YJIT: GC and recompile all code pages (#6406)	Takashi Kokubun	2022-10-25	1	-10/+169
\| \| \| \| \|	when it fails to allocate a new page. Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
*	YJIT: Fix page rounding for icache busting	Alan Wu	2022-10-21	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we found the current page by rounding the current pointer to the closest smaller page size. This is incorrect because pages are relative to the start of the address we reserve. For example, if the starting address is 12KiB modulo the 16KiB page size, once we have more than 4KiB of code, calculating with the address would incorrectly give us page 1 when we're actually still on page 0. Before this fix, I could reproduce crashes with: make btest RUN_OPTS=--yjit-code-page-size=32 on ARM64 macOS, where system page sizes are 16KiB.
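The example in this message can be worked through in a few lines. Both function names are hypothetical, and the addresses are made-up numbers chosen so that the region starts 12 KiB past a 16 KiB boundary.

```rust
const PAGE_SIZE: usize = 16 * 1024;

/// Old approach: round the absolute address down to a page boundary.
fn page_of_address(addr: usize) -> usize {
    addr / PAGE_SIZE
}

/// Fixed approach: count pages from the start of the reserved region.
fn page_index(region_start: usize, addr: usize) -> usize {
    (addr - region_start) / PAGE_SIZE
}

/// Returns (old result relative to the region's first page, fixed result)
/// for a write position 5 KiB into a region that starts at 12 KiB modulo
/// the page size.
fn compare() -> (usize, usize) {
    let region_start = 10 * PAGE_SIZE + 12 * 1024;
    let addr = region_start + 5 * 1024;
    let old = page_of_address(addr) - page_of_address(region_start);
    (old, page_index(region_start, addr))
}
```

After 5 KiB of code the old calculation already reports a move to the next page, while the region-relative index is still 0.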
*	YJIT: Skip dumping code for the other cb on --yjit-dump-disasm (#6592)	Takashi Kokubun	2022-10-19	1	-2/+1
\| \| \| \| \|	YJIT: Skip dumping code for the other cb on --yjit-dump-disasm
*	YJIT: fix a #[warn(unused_parens)]	Alan Wu	2022-10-19	1	-1/+1
\|
*	YJIT: fold the "asm_comments" feature into "disasm" (#6591)	Alan Wu	2022-10-19	2	-7/+7
\| \| \| \| \|	Previously, enabling only "disasm" didn't actually build. Since these two features are closely related and we don't really use one without the other, let's simplify and merge the two features together.