| Commit message (Collapse) | Author | Age | Files | Lines |
|
| |
Because d80226e7bd often reduces the number of unloaded units, it
increases the number of unload_units calls, which are heavy.
To mitigate that, this throttles unload_units per `max_cache_size / 10`.
Also hoping to fix
https://ci.appveyor.com/project/ruby/ruby/builds/36552382/job/kjmjgw9cjyf2ksd7
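The throttling idea can be sketched roughly as follows. This is a minimal illustration that assumes the throttle counts unload requests and only runs the heavy unload once `max_cache_size / 10` of them have accumulated. The struct and function names are hypothetical and are not taken from mjit_worker.c.

```c
#include <stdbool.h>

/* Hypothetical throttle state; names are illustrative only. */
struct unload_throttle {
    int max_cache_size;   /* stands in for the max cache size option */
    int pending_requests; /* unload requests seen since the last real unload */
    int unload_runs;      /* how many times the heavy unload actually ran */
};

/* Record one unload request. Returns true only when enough requests have
 * piled up (max_cache_size / 10, at least 1) to justify the heavy unload. */
static bool request_unload(struct unload_throttle *t)
{
    int threshold = t->max_cache_size / 10;
    if (threshold < 1) threshold = 1;

    t->pending_requests++;
    if (t->pending_requests < threshold) return false;

    t->pending_requests = 0;
    t->unload_runs++;     /* stand-in for the real, expensive unload_units() */
    return true;
}
```

With a cache size of 100, nine requests in a row are absorbed and the tenth triggers one unload.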
|
| |
to avoid "Too many JIT code, but skipped unloading units for JIT compaction".
Now we can forget the `in_compact` locking.
Moving some functions from mjit.c to mjit_worker.c because mjit_worker.c
should have functions executed in the JIT worker.
|
| |
for cfd8c7e6ca9f923cee3a062b548d0824fc67e9a5.
|
| |
To avoid SEGV like
http://ci.rvm.jp/logfiles/brlog.trunk-mjit.20201124-061530
|
| |
to define USE_MJIT.
|
| |
vm_core.h needs to be included to know rb_execution_context_t, etc.
I also added a trivial refactoring in mjit.c and missing dependency for
process.c.
|
| |
:bow:
|
| |
* Re-generate C files for JIT compaction every time
* Refactor in_jit return logic
* Just write code in a single file
* Add a TODO comment [ci skip]
|
| |
This has been a TODO since 79df14c04b. While adcf0316d1 covered the
root_fiber of the initial thread, it didn't cover root_fibers of other
threads. Now it's hooked properly in rb_threadptr_root_fiber_setup.
With regard to "XXX: Is this mjit_cont `mjit_cont_free`d?", when
rb_threadptr_root_fiber_release is called, although I'm not sure when
th->root_fiber is truthy, fiber_free seems to call cont_free and
mjit_cont_free. So mjit_conts of root_fibers seem to be freed properly.
|
| |
Apparently #ifdef is always true
|
| |
_MSC_VER used to be the macro to switch JIT compaction. However, since
d4381d2ceb, the correct macro to switch it was changed from _MSC_VER
to _WIN32. As I didn't properly replace all relevant _MSC_VER usages
to _WIN32, these macros have been used inconsistently.
nobu replaced _WIN32 with USE_HEADER_TRANSFORMATION in 5eb446d12f3.
Therefore we had USE_HEADER_TRANSFORMATION and _MSC_VER. This commit
makes sure such inconsistent _MSC_VER usages will be unified to the new
header, also renaming it to USE_JIT_COMPACTION to be more precise about
the requirements. The header transformation itself is not quite relevant
to places changed in this commit.
|
| |
Thanks to Ractor (https://github.com/ruby/ruby/pull/2888 and https://github.com/ruby/ruby/pull/3662),
inline caches support parallel access now.
|
| |
|
| |
We are seeing an error where code that is generated with MJIT contains
references to objects that have been moved. I believe this is due to a
race condition in the compaction function.
`gc_compact` has two steps:
1. Run a full GC to pin objects
2. Compact / update references
Step one is executed with `garbage_collect`. `garbage_collect` calls
`gc_enter` / `gc_exit`, which acquire and release a JIT lock. So a lock
is held for the duration of step 1.
Step two is executed by `gc_compact_after_gc`. It also holds a JIT
lock.
I believe the problem is that the JIT is free to execute between step 1
and step 2. It copies call cache values, but doesn't pin them when it
copies them. So the compactor thinks it's OK to move the call cache
even though it is not safe.
We need to hold a lock for the duration of `garbage_collect` *and*
`gc_compact_after_gc`. This patch introduces a lock level which
increments and decrements. The compaction function can increment and
decrement the lock level and prevent MJIT from executing during both
steps.
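The lock-level idea can be sketched as a nesting counter. This is a single-threaded illustration with hypothetical names. It assumes the JIT worker checks the level before running, and it leaves out the mutex and condition variable that a real implementation would need.

```c
#include <stdbool.h>

/* Hypothetical nesting counter; not the actual MJIT variable. */
static int gc_lock_level = 0;

static void gc_lock_enter(void) { gc_lock_level++; }

static void gc_lock_exit(void)
{
    if (gc_lock_level > 0) gc_lock_level--;
}

/* The JIT worker may only proceed when no GC phase holds the lock. */
static bool jit_may_run(void) { return gc_lock_level == 0; }

/* Records whether the JIT could have run in the gap between the two steps. */
static bool jit_ran_between_steps = false;

static void compact(void)
{
    gc_lock_enter();                        /* outer hold spans both steps */

    gc_lock_enter();                        /* step 1: full GC to pin objects */
    gc_lock_exit();

    jit_ran_between_steps = jit_may_run();  /* the gap between the steps */

    gc_lock_enter();                        /* step 2: compact / update references */
    gc_lock_exit();

    gc_lock_exit();                         /* release the outer hold */
}
```

Because the outer hold keeps the level above zero, the inner enter/exit pairs of each step no longer open a window for the JIT between the steps.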
|
| |
This commit introduces the Ractor mechanism to run Ruby programs in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] for the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You may run into
many bugs when using Ractor. The specification is also subject to
change, so this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers avoid thread-safety issues.
|
| |
Now that vm_empty_cc is VM_CALLCACHE_UNMARKABLE, it has to be properly
ruled out from being GCed.
|
| |
by calling combined functions specialized for each cancel type.
I'm hoping to improve locality of hot code, but this patch's impact should
be insignificant.
|
| |
To fix build failures.
|
| |
This shall fix compile errors.
|
| |
to improve code locality.
Using benchmark-driver/sinatra with 100 methods JIT-ed,
[Before] 12149.97 rps
1.3M /tmp/_ruby_mjit_p31171u145.so
[After] 12818.83 rps
260K /tmp/_ruby_mjit_p32155u145.so
(VM is 13714.89 rps)
|
| |
Running the C compiler for JIT compaction inside a critical section may
lock the main thread for a long time when it triggers GC. As I'm planning
to increase this duration a bit, I'd like to make sure this doesn't stop
the world.
For now, I chose to give up unloading units while JIT compaction is in
progress, assuming other calls may unload them later.
|
| |
|
| |
Split ruby.h
|
| |
changing add_iseq_to_process's debug counter name as well for comparison
|
| |
|
| |
|
| |
jit_unit to avoid marking wrong cc entries when inlined iseq is compiled
multiple times, resolving the TODO added by daf7c48d88.
This obviates pseudo jit_unit in inlined iseq introduced by 7ec2359374
and fixes memory leak of the adhoc unit.
|
| |
Fixing SEGVs like:
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/2744905
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/2744420
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/2741400
|
| |
|
| |
Fixed a TODO in b9007b6c548f91e88fd3f2ffa23de740431fa969
|
| |
It was unnecessary in b9007b6c548f91e88fd3f2ffa23de740431fa969
|
| |
GC can be invoked just after the allocation of jit_unit->cc_entries, so
the array should be zero-cleared.
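A minimal sketch of why zero-clearing matters, assuming a marker that walks the array and skips NULL slots. The helper names are hypothetical, and `calloc` stands in for whatever zeroing allocator the VM actually uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate n cache slots, all zeroed, so a GC that runs before the slots
 * are filled sees only NULL (all-bits-zero is a null pointer on mainstream
 * platforms). */
static const void **alloc_cc_entries(size_t n)
{
    return calloc(n, sizeof(const void *));
}

/* Stand-in for the marking pass: counts the slots it would try to mark. */
static size_t count_markable(const void **entries, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i] != NULL) count++;
    }
    return count;
}
```

With an uninitialized allocation the marking pass could follow garbage pointers. With zeroed slots it finds nothing to mark until entries are stored.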
|
| |
ALLOC_N() can cause GC. Sometimes `mjit_copy_job_handler()`
can be called by the mjit_worker thread, which is not a Ruby thread,
so we need to prevent GC in this function. This patch has some
issues, but I'm introducing it to pass the tests.
|
| |
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Make the call cache (CC) an RVALUE (GC target object) and allocate a new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements, like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of a fixed-size global method cache (GMC), pCMC allows a flexible
cache size.
* Caching CCs reduces CC allocation and allows sharing a CC's fast path
between call sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set an "invalidated" flag on the method
entry itself to represent cache invalidation.
* Compared with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidates all method caches of the class and
its subclasses.
* The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
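Idea (3) can be illustrated with a small sketch. The structs below are simplified, hypothetical stand-ins for the VM's method entry and call cache. They only show why flagging one ME leaves caches for other methods of the same class untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures carry much more state. */
struct method_entry { const char *name; bool invalidated; };
struct call_cache   { const void *klass; const struct method_entry *me; };

/* A cache hit needs the same receiver class and a still-valid ME. */
static bool cache_hit(const struct call_cache *cc, const void *klass)
{
    return cc->me != NULL && cc->klass == klass && !cc->me->invalidated;
}

/* Redefining a method flags only the ME being replaced. */
static void redefine_method(struct method_entry *old_me)
{
    old_me->invalidated = true;
}
```

With a class serial, redefining `foo` would also miss the cache for `bar` on the same class. With the per-ME flag only the `foo` cache misses.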
|
| |
Now, rb_call_info describes how to call the method with a tuple of
(mid, orig_argc, flags, kwarg). In most cases, kwarg == NULL and
mid+argc+flags only requires 64 bits. So this patch packs
rb_call_info into a VALUE (1 word) in such cases. If we cannot
represent it in a VALUE, we use imemo_callinfo, which contains the
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because every callinfo is VALUE
size (a packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), vm_ci_flag(ci), vm_ci_argc(ci), vm_ci_kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
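The packing can be sketched with plain bit operations on a 64-bit word. The field widths, the tag bit, and the helper names below are assumptions chosen for illustration and should not be read as the VM's actual layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative field widths (assumed): 1 tag bit + 15 + 16 + 32 = 64 bits. */
enum { TAG_BITS = 1, ARGC_BITS = 15, FLAG_BITS = 16, MID_BITS = 32 };
enum { ARGC_SHIFT = TAG_BITS,
       FLAG_SHIFT = TAG_BITS + ARGC_BITS,
       MID_SHIFT  = TAG_BITS + ARGC_BITS + FLAG_BITS };

typedef uint64_t packed_ci;

/* A call site can be packed only if it has no kwarg and every field fits. */
static bool ci_packable(uint64_t mid, unsigned argc, unsigned flag, const void *kwarg)
{
    return kwarg == NULL
        && mid  < (UINT64_C(1) << MID_BITS)
        && argc < (1u << ARGC_BITS)
        && flag < (1u << FLAG_BITS);
}

/* Pack the three fields; the low bit tags the word as "packed, not a pointer". */
static packed_ci ci_pack(uint64_t mid, unsigned argc, unsigned flag)
{
    return (mid << MID_SHIFT)
         | ((uint64_t)flag << FLAG_SHIFT)
         | ((uint64_t)argc << ARGC_SHIFT)
         | 1u;
}

static bool     ci_is_packed(packed_ci ci) { return (ci & 1u) != 0; }
static uint64_t ci_mid(packed_ci ci)  { return ci >> MID_SHIFT; }
static unsigned ci_flag(packed_ci ci) { return (unsigned)((ci >> FLAG_SHIFT) & ((1u << FLAG_BITS) - 1)); }
static unsigned ci_argc(packed_ci ci) { return (unsigned)((ci >> ARGC_SHIFT) & ((1u << ARGC_BITS) - 1)); }
```

A call site that fails `ci_packable` (for example, one with keyword arguments) would fall back to a heap-allocated record, which is the role the commit gives to imemo_callinfo.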
|
| |
The same as 8427fca49bd85205f5a8766292dd893f003c0e48.
|
| |
Saves committers' daily life by no longer #include-ing everything from
internal.h and making each file include what it needs instead. This
would significantly speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are the win32-related headers, which tend not to be
self-contained (the headers have inclusion order dependencies).
|
| |
Coverity Scan found this issue.
|
| |
|
| |
|
| |
|
| |
|
| |
https://ci.appveyor.com/project/ruby/ruby/builds/29230976/job/c910t37313edb97k
|
| |
This is a secret feature for me. It's only for testing and any behavior
with this flag override is unsupported.
I needed this because I sometimes want to add debug options but do not
want to disable optimizations, for using Linux perf.
|
| |
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site, as the current
layout uses two separate pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark by at least 3%. For more
details, see the attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instructions
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
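The layout change can be sketched with simplified structs. The field lists are hypothetical stand-ins for `rb_call_info` and `rb_call_cache`. Only the shape matters here: two pointers per call site become one pointer to a combined record.

```c
#include <stddef.h>

/* Simplified stand-ins; the real structs have different fields. */
struct call_info  { unsigned long mid; unsigned int flag; int orig_argc; };
struct call_cache { unsigned long method_state; unsigned long class_serial; const void *me; };

/* Old layout: each call site keeps two pointers into separate buffers. */
struct call_site_separate { struct call_info *ci; struct call_cache *cc; };

/* New layout: one pointer to a combined record, so the call cache and the
 * call info sit next to each other in memory. */
struct call_data { struct call_cache cc; struct call_info ci; };
struct call_site_combined { struct call_data *cd; };
```

The per-call-site saving is one pointer, which is the 8 bytes quoted above on a 64-bit platform. Keeping both parts adjacent is what lowers the worst-case number of cache lines touched.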
|
| |
Now I'm not exactly sure why I needed to check `stop_worker_p` after
`mjit_copy_cache_from_main_thread` of `convert_unit_to_func`
in 4161674b2fbea6bdd01783ac5d3b39d88db22972.
If it's for avoiding deadlock under `in_gc` condition, we should keep it.
However, if it's not the case and it's just for retrying accidental
compilation failure or just to avoid `MJIT_ATOMIC_SET` and
`compact_all_jit_code`, I think this quick stop path is not mandatory.
Because this path is somewhat problematic in my upcoming fix in
mjit_worker, let me try to remove this first and see how CI goes.
|
| |
for all compilations and compaction.
Prior to this commit, the last-compiled code has not been used because
MJIT worker is stopped before setting the code, and compaction has also
been skipped.
But that was not intentional, and a pause with `wait: true` is supposed
to wait until both of those things have finished.
|
| |
`iseq->body->jit_unit->compile_info` should not be referenced before
the null check of `iseq->body->jit_unit`.
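The ordering issue can be shown with a small accessor. The struct definitions are simplified assumptions (the `disable_inlining` field in particular is only a placeholder). The point is that `jit_unit` is tested before `compile_info` is touched.

```c
#include <stddef.h>

/* Simplified, hypothetical shapes of the structures involved. */
struct compile_info { int disable_inlining; };
struct jit_unit     { struct compile_info compile_info; };
struct iseq_body    { struct jit_unit *jit_unit; };

/* Return the compile info, or NULL when the iseq has no JIT unit yet.
 * The null check comes first, so compile_info is never read through NULL. */
static const struct compile_info *compile_info_of(const struct iseq_body *body)
{
    if (body == NULL || body->jit_unit == NULL) return NULL;
    return &body->jit_unit->compile_info;
}
```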
|
| |
Tabs are expanded because previously the file did not have any tab indentation.
Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
|
| |
|