2 files changed, 93 insertions, 4 deletions
diff --git a/README.md b/README.md
index 9ea99fcb73..dc890da0bc 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,11 @@
 YJIT - Yet Another Ruby JIT
 ===========================
 
-**DISCLAIMER: Please note that this project is in early stages of development. It is very much a work in progress, it may cause your software to crash, and current performance results are likely to leave you feeling underwhelmed.**
+**DISCLAIMER: Please note that this project is experimental. It is very much a work in progress, it may cause your software to crash, and current performance results will vary widely, especially on larger applications.**
 
-YJIT is a lightweight, minimalistic Ruby JIT built inside the CRuby/MRI binary.
+YJIT is a lightweight, minimalistic Ruby JIT built inside CRuby.
 It lazily compiles code using a Basic Block Versioning (BBV) architecture. The target use case is that of servers running
-Ruby on Rails, an area where CRuby's MJIT has not yet managed to deliver speedups.
+Ruby on Rails, an area where MJIT has not yet managed to deliver speedups.
 To simplify development, we currently support only macOS and Linux on x86-64, but an ARM64 backend
 is part of future plans.
 This project is open source and falls under the same license as CRuby.
@@ -49,6 +49,16 @@ Because there is no GC for generated code yet, your software could run out of ex
 
 ## Installation
 
+Current YJIT versions are installed by default with CRuby. Make sure to specify the "--yjit" command line option to enable it at runtime.
+
+Experimental YJIT patches that have not yet been merged with CRuby can be found in ruby-build:
+
+```
+ruby-build yjit-dev ~/.rubies/ruby-yjit-dev
+```
+
+They can also be found in the Shopify/yjit repository, which is cloned and built like CRuby.
+
 Start by cloning the `Shopify/yjit` repository:
 
 ```
@@ -71,7 +81,7 @@ Typically configure will choose default C compiler. To specify the C compiler, u
 # Choosing a specific c compiler
 export CC=/path/to/my/choosen/c/compiler
 ```
-before runing `./configure`.
+before running `./configure`.
 
 You can test that YJIT works correctly by running:
 
diff --git a/doc/yjit/yjit_hacking.md b/doc/yjit/yjit_hacking.md
new file mode 100644
index 0000000000..088a9b1675
--- /dev/null
+++ b/doc/yjit/yjit_hacking.md
@@ -0,0 +1,79 @@
+# YJIT Hacking
+
+## Code Generation and Assembly Language
+
+YJIT’s basic purpose is to take ISEQs and generate machine code.
+
+Documentation on each Ruby bytecode can be found in insns.def.
+
+YJIT uses those bytecodes as the “Basic Blocks” in Lazy Basic Block Versioning (LBBV). For more details on LBBV:
+
+* Basic Block Versioning (whitepaper) - https://arxiv.org/abs/1411.0352
+* An Extension of BBV using Object Shapes (whitepaper) - https://arxiv.org/abs/1507.02437
+* Basic Block Versioning talk at ECOOP 2015 - https://www.youtube.com/watch?v=S-aHBuoiYE0
+
+Current YJIT has a simple assembler as a backend. Each method that generates code does it by emitting machine code:
+
+```
+// Excerpt of yjit_gen_exit() from yjit_codegen.c, Sept 2021
+// Generate an exit to return to the interpreter
+static uint32_t
+yjit_gen_exit(VALUE *exit_pc, ctx_t *ctx, codeblock_t *cb)
+{
+    const uint32_t code_pos = cb->write_pos;
+
+    ADD_COMMENT(cb, "exit to interpreter");
+
+    // Generate the code to exit to the interpreters
+    // Write the adjusted SP back into the CFP
+    if (ctx->sp_offset != 0) {
+        x86opnd_t stack_pointer = ctx_sp_opnd(ctx, 0);
+        lea(cb, REG_SP, stack_pointer);
+        mov(cb, member_opnd(REG_CFP, rb_control_frame_t, sp), REG_SP);
+    }
+
+    // Update CFP->PC
+    mov(cb, RAX, const_ptr_opnd(exit_pc));
+    mov(cb, member_opnd(REG_CFP, rb_control_frame_t, pc), RAX);
+```
+
+Later there will be a more complex backend.
+
+## Code Generation vs Code Execution
+
+When you see the lea() call above (“load effective address”), it’s not running the x86 LEA instruction. It’s writing an LEA instruction into the codeblock that the first argument points to. That instruction executes later, when the codeblock itself is run.
+
+This is subtle because YJIT will often wait to compile the method until you’re about to run it -- that’s when it knows the most about what types of arguments the method will receive. So the lea() call happens at compile time, but YJIT often defers compile time until just before runtime.
+
+The ctx structure tracks what is known at compile time about the arguments being passed into the Ruby bytecode. Often YJIT will “peek” at an expected type before it generates machine code.
+
+## Inlined and Outlined Code
+
+When YJIT is generating code, it needs a code pointer. In many cases it needs two, usually called “cb” (codeblock) and “ocb” (out-of-line codeblock).
+
+cb is for “inlined” code and ocb is for “outlined” code such as exits. Inlined code is the normal generated code for Ruby operations, while outlined code handles unusual and error conditions, such as encountering an unexpected parameter type and exiting to the interpreter.
+
+Simple inline code for a method body runs faster, so it's useful to remove unexpected exits from the control flow. An exception or unsupported operation will cause YJIT to generate out-of-line code to handle it.
+
+If you search for ocb in yjit_codegen.c, you can see some places where out-of-line code is generated.
+
+YJIT statistics are only gathered when RUBY_DEBUG or YJIT_STATS is true. In some cases the code to increment YJIT statistics will be generated out-of-line, especially if those statistics are gathered when a side exit happens.
+
+## Statistics and Comments
+
+When RUBY_DEBUG is defined to a true value, YJIT will emit comments into the generated machine code. This can make disassemblies a lot more readable. When RUBY_DEBUG or YJIT_STATS is defined and stats are active (--yjit-stats or export YJIT_STATS=1), code will be generated to collect statistics during the run, and a report will be printed when the process exits.
+
+## Entering and Exiting the Interpreter
+
+YJIT won’t generate machine code for an ISEQ until it’s been run a certain number of times (10 by default). Then, the next time the interpreter would call that ISEQ, it will call the generated machine code version instead. If YJIT hits an unexpected or unsupported operation, it will return to the normal interpreter.
+
+If YJIT returns to the interpreter, the behaviour will be correct but slower. YJIT optimises only some cases of some operations - for instance, YJIT will not optimise a BMETHOD call yet.
+
+When the interpreter calls a YJIT-optimised method again, control will return to YJIT’s generated machine code. The more time that’s spent in YJIT-generated code (the “ratio in YJIT”), the more CPU time YJIT can save with its optimisations.
+
+## Side Exits
+
+When YJIT has compiled an ISEQ and is running it later, sometimes it will hit an unexpected condition. It might see a parameter of a different type than before, or square brackets might be used on a hash when they were first used on an array. To handle those cases, the generated code contains a jump that returns to the interpreter at runtime, called a “side exit.”
+
+Side exits are generated as out-of-line code.
+