aboutsummaryrefslogtreecommitdiffstats
path: root/ast.rb
diff options
context:
space:
mode:
authoryui-knk <spiketeika@gmail.com>2022-09-23 22:40:02 +0900
committerYuichiro Kaneko <spiketeika@gmail.com>2022-11-21 09:01:34 +0900
commitd8601621edcf29e3323b90dcf04b774edd9fb45e (patch)
treefd9b08ca7008ce5ad7f8acafd83accff2347697f /ast.rb
parentbbc4cf5f76c0e17e645ba81ae0b3625c8cbfc4c4 (diff)
downloadruby-d8601621edcf29e3323b90dcf04b774edd9fb45e.tar.gz
Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementation for Language Server Protocol (LSP) sometimes needs token information. For example both `m(1)` and `m(1, )` has same AST structure other than node locations then it's impossible to check the existence of `,` from AST. However in later case, it might be better to suggest variables list for the second argument. Token information is important for such case. This commit adds these methods. * Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of` * Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes. * Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node. [Feature #19070] Impacts on memory usage and performance are below: Memory usage: ``` $ cat test.rb root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true) $ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux] 11408kb # keep_tokens :false $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb 17508kb # keep_tokens :true $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb 30960kb ``` Performance: ``` $ cat ../ast_keep_tokens.yml prelude: | src = <<~SRC module M class C def m1(a, b) 1 + a + b end end end SRC benchmark: without_keep_tokens: | RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false) with_keep_tokens: | RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true) $ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml /home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \ --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \ --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \ --output=markdown --output-compare -v ../ast_keep_tokens.yml compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux] built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux] warming up.. | |compare-ruby|built-ruby| |:--------------------|-----------:|---------:| |without_keep_tokens | 21.659k| 21.303k| | | 1.02x| -| |with_keep_tokens | 6.220k| 5.691k| | | 1.09x| -| ```
Diffstat (limited to 'ast.rb')
-rw-r--r--ast.rb52
1 files changed, 46 insertions, 6 deletions
diff --git a/ast.rb b/ast.rb
index 1bc1168b80..24740ebe28 100644
--- a/ast.rb
+++ b/ast.rb
@@ -29,8 +29,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.parse("x = 1 + 2")
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-1:9>
- def self.parse string, keep_script_lines: false, error_tolerant: false
- Primitive.ast_s_parse string, keep_script_lines, error_tolerant
+ def self.parse string, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+ Primitive.ast_s_parse string, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@@ -44,8 +44,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.parse_file("my-app/app.rb")
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-31:3>
- def self.parse_file pathname, keep_script_lines: false, error_tolerant: false
- Primitive.ast_s_parse_file pathname, keep_script_lines, error_tolerant
+ def self.parse_file pathname, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+ Primitive.ast_s_parse_file pathname, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@@ -63,8 +63,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.of(method(:hello))
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-3:3>
- def self.of body, keep_script_lines: false, error_tolerant: false
- Primitive.ast_s_of body, keep_script_lines, error_tolerant
+ def self.of body, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+ Primitive.ast_s_of body, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@@ -137,6 +137,46 @@ module RubyVM::AbstractSyntaxTree
end
# call-seq:
+ # node.tokens -> array
+ #
+ # Returns tokens corresponding to the location of the node.
+ # Returns nil if keep_tokens is not enabled when parse method is called.
+ # Token is an array of:
+ #
+ # - id
+ # - token type
+ # - source code text
+ # - location [first_lineno, first_column, last_lineno, last_column]
+ #
+ # root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
+ # root.tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
+ # root.tokens.map{_1[2]}.join # => "x = 1 + 2"
+ def tokens
+ return nil unless all_tokens
+
+ all_tokens.each_with_object([]) do |token, a|
+ loc = token.last
+ if ([first_lineno, first_column] <=> [loc[0], loc[1]]) <= 0 &&
+ ([last_lineno, last_column] <=> [loc[2], loc[3]]) >= 0
+ a << token
+ end
+ end
+ end
+
+ # call-seq:
+ # node.all_tokens -> array
+ #
+ # Returns all tokens for the input script regardless the receiver node.
+ # Returns nil if keep_tokens is not enabled when parse method is called.
+ #
+ # root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
+ # root.all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
+ # root.children[-1].all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
+ def all_tokens
+ Primitive.ast_node_all_tokens
+ end
+
+ # call-seq:
# node.children -> array
#
# Returns AST nodes under this one. Each kind of node