From 941ea3713f5b20bc18dcb62037bbffa35d781588 Mon Sep 17 00:00:00 2001
From: zzak
Date: Tue, 5 Feb 2013 00:56:11 +0000
Subject: * lib/racc: Merge Racc documentation downstream, add grammar ref file

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@39050 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
---
 ChangeLog                     |   4 +
 lib/racc/parser.rb            | 203 ++++++++++++++++++++++++++++++++++---
 lib/racc/rdoc/grammar.en.rdoc | 226 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 420 insertions(+), 13 deletions(-)
 create mode 100644 lib/racc/rdoc/grammar.en.rdoc

diff --git a/ChangeLog b/ChangeLog
index 635fc7c12d..dfec02d4fe 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+Tue Feb  5 09:55:00 2013  Zachary Scott
+
+	* lib/racc: Merge Racc documentation downstream, add grammar ref file
+
 Tue Feb  5 08:03:00 2013  Zachary Scott
 
 	* lib/irb.rb, lib/irb/ext/save-history.rb: Add documentation on how to
diff --git a/lib/racc/parser.rb b/lib/racc/parser.rb
index e0b77f157e..20b358a7ce 100644
--- a/lib/racc/parser.rb
+++ b/lib/racc/parser.rb
@@ -18,10 +18,164 @@ unless defined?(::ParseError)
   ParseError = Racc::ParseError
 end
 
+# Racc is a LALR(1) parser generator.
+# It is written in Ruby itself, and generates Ruby programs.
+#
+# == Command-line Reference
+#
+#     racc [-ofilename] [--output-file=filename]
+#          [-erubypath] [--executable=rubypath]
+#          [-v] [--verbose]
+#          [-Ofilename] [--log-file=filename]
+#          [-g] [--debug]
+#          [-E] [--embedded]
+#          [-l] [--no-line-convert]
+#          [-c] [--line-convert-all]
+#          [-a] [--no-omit-actions]
+#          [-C] [--check-only]
+#          [-S] [--output-status]
+#          [--version] [--copyright] [--help] grammarfile
+#
+# [+filename+]
+#   Racc grammar file. Any extension is permitted.
+# [-o+outfile+, --output-file=+outfile+]
+#   A filename for output. The default is <+filename+>.tab.rb
+# [-O+filename+, --log-file=+filename+]
+#   Place logging output in file +filename+.
+#   The default log file name is <+filename+>.output.
+# [-e+rubypath+, --executable=+rubypath+]
+#   Output an executable file (mode 755), where +rubypath+ is the Ruby interpreter.
+# [-v, --verbose]
+#   Verbose mode. Creates a +filename+.output file, like yacc's y.output file.
+# [-g, --debug]
+#   Add debug code to the parser class. To display debugging information,
+#   use this '-g' option and set @yydebug to true in the parser class.
+# [-E, --embedded]
+#   Output a parser which doesn't need runtime files (racc/parser.rb).
+# [-C, --check-only]
+#   Check the syntax of the racc grammar file and quit.
+# [-S, --output-status]
+#   Print messages from time to time while compiling.
+# [-l, --no-line-convert]
+#   Turn off line number conversion.
+# [-c, --line-convert-all]
+#   Convert line numbers of actions, inner, header and footer.
+# [-a, --no-omit-actions]
+#   Call all actions, even if an action is empty.
+# [--version]
+#   Print Racc version and quit.
+# [--copyright]
+#   Print copyright and quit.
+# [--help]
+#   Print usage and quit.
+#
+# == Generating Parser Using Racc
+#
+# To compile a Racc grammar file, simply type:
+#
+#   $ racc parse.y
+#
+# This creates the Ruby script file "parse.tab.rb". The -o option can change the output filename.
+#
+# == Writing A Racc Grammar File
+#
+# If you want your own parser, you have to write a grammar file.
+# A grammar file contains the name of your parser class, grammar for the parser,
+# user code, and anything else.
+# When writing a grammar file, knowledge of yacc is helpful.
+# If you have not used yacc before, Racc is not too difficult.
+#
+# Here's an example Racc grammar file.
+#
+#   class Calcparser
+#   rule
+#     target: exp { print val[0] }
+#
+#     exp: exp '+' exp
+#        | exp '*' exp
+#        | '(' exp ')'
+#        | NUMBER
+#   end
+#
+# Racc grammar files resemble yacc files.
+# But (of course), this is Ruby code.
+# yacc's $$ is the 'result', $1, $2... is
+# an array called 'val', and $0, $-1, $-2... is an array called '_values'.
+#
+# See the {Grammar File Reference}[rdoc-ref:lib/racc/rdoc/grammar.en.rdoc] for
+# more information on grammar files.
+#
+# == Parser
+#
+# Then you must prepare the parse entry method. There are two types of
+# parse methods in Racc, Racc::Parser#do_parse and Racc::Parser#yyparse.
+#
+# Racc::Parser#do_parse is simple.
+#
+# It's yyparse() of yacc, and Racc::Parser#next_token is yylex().
+# Racc::Parser#next_token must return an array like [TOKENSYMBOL, ITS_VALUE].
+# EOF is [false, false].
+# (TOKENSYMBOL is a Ruby symbol (taken from String#intern) by default.
+# If you want to change this, see the grammar reference.)
+#
+# Racc::Parser#yyparse is a little complicated, but useful.
+# It does not use Racc::Parser#next_token, instead it gets tokens from any iterator.
+#
+# For example, yyparse(obj, :scan) calls +obj#scan+,
+# and you can return tokens by yielding them from +obj#scan+.
+#
+# == Debugging
+#
+# When debugging, the "-v" and/or "-g" options are helpful.
+#
+# "-v" creates a verbose log file (.output).
+# "-g" creates a "Verbose Parser".
+# The Verbose Parser prints its internal status when parsing.
+# But it's _not_ automatic.
+# You must use the -g option and set +@yydebug+ to +true+ in order to get output.
+# The -g option only creates the verbose parser.
+#
+# === Racc reported syntax error.
+#
+# Aren't there too many "end"s?
+# The grammar of racc files changed in v0.10.
+#
+# Racc does not use the '%' mark, while yacc uses a huge number of '%' marks.
+#
+# === Racc reported "XXXX conflicts".
+#
+# Try "racc -v xxxx.y".
+# It produces racc's internal log file, xxxx.output.
+#
+# === Generated parsers do not work correctly
+#
+# Try "racc -g xxxx.y".
+# This command lets racc generate a "debugging parser".
+# Then set @yydebug=true in your parser.
+# It produces a working log of your parser.
+#
+# == Re-distributing Racc runtime
+#
+# A parser, which is created by Racc, requires the Racc runtime module;
+# racc/parser.rb.
+#
+# Ruby 1.8.x comes with the Racc runtime module,
+# so you do NOT need to distribute Racc runtime files.
+#
+# If you want to include the Racc runtime module with your parser,
+# this can be done by using the '-E' option:
+#
+#   $ racc -E -omyparser.rb myparser.y
+#
+# This command creates myparser.rb which `includes' the Racc runtime.
+# All you need to do is distribute your parser file (myparser.rb).
+#
+# Note: parser.rb is LGPL, but your parser is not.
+# Your own parser is completely yours.
 module Racc
 
   unless defined?(Racc_No_Extentions)
-    Racc_No_Extentions = false
+    Racc_No_Extentions = false # :nodoc:
   end
 
   class Parser
@@ -42,11 +196,11 @@ module Racc
       raise LoadError, 'selecting ruby version of racc runtime core'
     end
 
-    Racc_Main_Parsing_Routine    = :_racc_do_parse_c
-    Racc_YY_Parse_Method         = :_racc_yyparse_c
-    Racc_Runtime_Core_Version    = Racc_Runtime_Core_Version_C
-    Racc_Runtime_Core_Revision   = Racc_Runtime_Core_Revision_C
-    Racc_Runtime_Type            = 'c'
+    Racc_Main_Parsing_Routine    = :_racc_do_parse_c # :nodoc:
+    Racc_YY_Parse_Method         = :_racc_yyparse_c # :nodoc:
+    Racc_Runtime_Core_Version    = Racc_Runtime_Core_Version_C # :nodoc:
+    Racc_Runtime_Core_Revision   = Racc_Runtime_Core_Revision_C # :nodoc:
+    Racc_Runtime_Type            = 'c' # :nodoc:
   rescue LoadError
     Racc_Main_Parsing_Routine    = :_racc_do_parse_rb
     Racc_YY_Parse_Method         = :_racc_yyparse_rb
@@ -55,12 +209,10 @@ module Racc
     Racc_Runtime_Type            = 'ruby'
   end
 
-  def Parser.racc_runtime_type
+  def Parser.racc_runtime_type # :nodoc:
     Racc_Runtime_Type
   end
 
-  private
-
   def _racc_setup
     @yydebug = false unless self.class::Racc_debug_parser
     @yydebug = false unless defined?(@yydebug)
@@ -97,6 +249,14 @@ module Racc
       end
     }
 
+    # The method to fetch the next token.
+    # If you use the #do_parse method, you must implement #next_token.
+    #
+    # The format of the return value is [TOKEN_SYMBOL, VALUE].
+    # TOKEN_SYMBOL is represented by a Ruby symbol by default, e.g. :IDENT
+    # for 'IDENT'.  ";" (String) for ';'.
+    #
+    # The final symbol (End of file) must be false.
     def next_token
       raise NotImplementedError, "#{self.class}\#next_token is not defined"
     end
@@ -343,27 +503,43 @@ module Racc
       goto_default[k1]
     end
 
+    # This method is called when a parse error is found.
+    #
+    # ERROR_TOKEN_ID is the internal ID of the token which caused the error.
+    # You can get a string representation of this ID by calling
+    # #token_to_str.
+    #
+    # ERROR_VALUE is the value of the error token.
+    #
+    # value_stack is a stack of symbol values.
+    # DO NOT MODIFY this object.
+    #
+    # This method raises ParseError by default.
+    #
+    # If this method returns, the parser enters "error recovering mode".
     def on_error(t, val, vstack)
       raise ParseError, sprintf("\nparse error on value %s (%s)",
                                 val.inspect, token_to_str(t) || '?')
     end
 
+    # Enter error recovering mode.
+    # This method does not call #on_error.
     def yyerror
       throw :racc_jump, 1
     end
 
+    # Exit parser.
+    # Return value is Symbol_Value_Stack[0].
     def yyaccept
       throw :racc_jump, 2
     end
 
+    # Leave error recovering mode.
     def yyerrok
       @racc_error_status = 0
     end
 
-    #
-    # for debugging output
-    #
-
+    # For debugging output
     def racc_read_token(t, tok, val)
       @racc_debug_out.print 'read '
       @racc_debug_out.print tok.inspect, '(', racc_token2str(t), ') '
@@ -430,6 +606,7 @@ module Racc
       raise "[Racc Bug] can't convert token #{tok} to string"
     end
 
+    # Convert the internal ID of a token symbol to a string.
     def token_to_str(t)
       self.class::Racc_token_to_s_table[t]
     end
diff --git a/lib/racc/rdoc/grammar.en.rdoc b/lib/racc/rdoc/grammar.en.rdoc
new file mode 100644
index 0000000000..d7b9df0114
--- /dev/null
+++ b/lib/racc/rdoc/grammar.en.rdoc
@@ -0,0 +1,226 @@
+= Racc Grammar File Reference
+
+== Global Structure
+
+== Class Block and User Code Block
+
+There are two blocks at the top level:
+the 'class' block and the 'user code' block. The 'user code' block MUST
+be placed after the 'class' block.
+
+== Comment
+
+You can insert comments almost anywhere. Two comment styles can be used:
+Ruby style (#.....) and C style (/*......*/).
+
+== Class Block
+
+The class block is formed like this:
+
+  class CLASS_NAME
+    [precedence table]
+    [token declarations]
+    [expected number of S/R conflict]
+    [options]
+    [semantic value conversion]
+    [start rule]
+  rule
+    GRAMMARS
+
+CLASS_NAME is the name of the parser class.
+This is the name of the generated parser class.
+
+If CLASS_NAME includes '::', Racc outputs a module clause.
+For example, writing "class M::C" creates the code below:
+
+  module M
+    class C
+      :
+      :
+    end
+  end
+
+== Grammar Block
+
+The grammar block describes the grammar which is able
+to be understood by the parser. Syntax is:
+
+  (token): (token) (token) (token).... (action)
+
+  (token): (token) (token) (token).... (action)
+         | (token) (token) (token).... (action)
+         | (token) (token) (token).... (action)
+
+(action) is an action which is executed when its (token)s are found.
+(action) is a ruby code block, which is surrounded by braces:
+
+  { print val[0]
+    puts val[1] }
+
+Note that you cannot use '%' strings, here documents, or '%r' regexps in actions.
+
+Actions can be omitted.
+When an action is omitted, '' (empty string) is used.
+
+The return value of an action becomes the value of the left-hand side ($$).
+It is the value of result, or the value returned by a "return" statement.
+
+Here is an example of a whole grammar block.
+
+  rule
+    goal: definition ruls source { result = val }
+
+    definition: /* none */ { result = [] }
+      | definition startdesig { result[0] = val[1] }
+      | definition
+               precrule # this line continues from the line above
+        {
+          result[1] = val[1]
+        }
+
+    startdesig: START TOKEN
+
+You can use the following special local variables in actions.
+
+* result ($$)
+
+The value of the left-hand side (lhs). The default value is val[0].
+
+* val ($1,$2,$3...)
+
+An array of the values of the right-hand side (rhs).
+
+* _values (...$-2,$-1,$0)
+
+A stack of values.
+DO NOT MODIFY this stack unless you know what you are doing.
+
+== Operator Precedence
+
+This feature corresponds to yacc's precedence declarations (%left, %right, %nonassoc).
+To declare this block:
+
+  prechigh
+    nonassoc '++'
+    left '*' '/'
+    left '+' '-'
+    right '='
+  preclow
+
+`right' is yacc's %right, `left' is yacc's %left.
+
+`=' + (symbol) means yacc's %prec:
+
+  prechigh
+    nonassoc UMINUS
+    left '*' '/'
+    left '+' '-'
+  preclow
+
+  rule
+    exp: exp '*' exp
+       | exp '-' exp
+       | '-' exp =UMINUS # equivalent to "%prec UMINUS"
+           :
+           :
+
+== expect
+
+Racc has bison's "expect" directive.
+
+  # Example
+
+  class MyParser
+    expect 3
+  rule
+      :
+      :
+
+This directive declares the "expected" number of shift/reduce conflicts.
+If the "expected" number is equal to the real number of conflicts,
+racc does not print a conflict warning message.
+
+== Declaring Tokens
+
+By declaring tokens, you can avoid many pointless bugs.
+If a declared token does not exist, or an existing token is not declared,
+Racc outputs warnings. The declaration syntax is:
+
+  token TOKEN_NAME AND_IS_THIS
+        ALSO_THIS_IS AGAIN_AND_AGAIN THIS_IS_LAST
+
+== Options
+
+You can write options for the racc command in your racc file.
+
+  options OPTION OPTION ...
+
+Options are:
+
+* omit_action_call
+
+Whether or not to omit calls to empty actions.
+
+* result_var
+
+Whether or not to use the local variable "result".
+
+You can use the 'no_' prefix to invert an option's meaning.
+
+== Converting Token Symbol
+
+By default, token symbols are:
+
+  * naked token string in racc file (TOK, XFILE, this_is_token, ...)
+    --> symbol (:TOK, :XFILE, :this_is_token, ...)
+  * quoted string (':', '.', '(', ...)
+    --> same string (':', '.', '(', ...)
+
+You can change this default with a "convert" block.
+Here is an example:
+
+  convert
+    PLUS 'PlusClass' # We use PlusClass for symbol of `PLUS'
+    MIN 'MinusClass' # We use MinusClass for symbol of `MIN'
+  end
+
+Almost any Ruby value can be used as a token symbol,
+except 'false' and 'nil'. These cause unexpected parse errors.
+
+If you want to use a String as a token symbol, special care is required.
+For example:
+
+  convert
+    class '"cls"' # in code, "cls"
+    PLUS '"plus\n"' # in code, "plus\n"
+    MIN "\"minus#{val}\"" # in code, \"minus#{val}\"
+  end
+
+== Start Rule
+
+Equivalent to '%start' in yacc. This changes the start rule.
+
+  start real_target
+
+This statement will not be used forever, I think.
+
+== User Code Block
+
+A "User Code Block" is Ruby source code which is copied to the output.
+There are three user code blocks: "header", "inner" and "footer".
+
+The format of user code is like this:
+
+  ---- header
+    ruby statement
+    ruby statement
+    ruby statement
+
+  ---- inner
+    ruby statement
+       :
+       :
+
+If four '-' appear at the start of a line,
+racc treats it as the beginning of a user code block.
+The name of a user code block must be one word.
-- 
cgit v1.2.3
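
Reviewer note: the token contract documented for Racc::Parser#next_token in this patch ([TOKEN_SYMBOL, VALUE] pairs, with [false, false] at end of input) can be illustrated with a small hand-written lexer for the Calcparser example grammar. This is only a sketch under stated assumptions: `CalcLexer` and its `tokens` helper are hypothetical names that are not part of Racc, and the class deliberately does not load the Racc runtime, so it shows the token stream only, not actual parsing.

```ruby
require 'strscan'

# Hypothetical helper (not part of Racc): turns an arithmetic string into
# the [TOKEN_SYMBOL, VALUE] pairs that the patch documents for #next_token.
class CalcLexer
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Returns the next token, or [false, false] at end of input.
  def next_token
    @scanner.skip(/\s+/)
    return [false, false] if @scanner.eos?

    if (digits = @scanner.scan(/\d+/))
      [:NUMBER, digits.to_i] # bare token name in the grammar -> Symbol
    else
      char = @scanner.getch
      [char, char]           # quoted token ('+', '*', ...) -> same String
    end
  end

  # Collects every token, including the end-of-input marker.
  def tokens
    result = []
    loop do
      token = next_token
      result << token
      break if token == [false, false]
    end
    result
  end
end

p CalcLexer.new('1 + 2 * (3)').tokens
```

In a real grammar file, a `next_token` method along these lines would presumably live in the `---- inner` user code block, so that it becomes a method of the generated parser class and #do_parse can call it.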