path: root/regparse.c
Commit message | Author | Age | Files | Lines
* st_foreach now free from ANYARGS | 卜部昌平 | 2019-08-27 | 1 | -4/+4
  After 5e86b005c0f2ef30df2f9906c7e2f3abefe286a2, I now think ANYARGS is dangerous and should be extinct. This commit deletes ANYARGS from st_foreach. I strongly believe that this commit should have come with b0af0592fdd9e9d4e4b863fde006d67ccefeac21, which added an extra parameter to st_foreach callbacks.
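The hazard this commit describes can be sketched in plain C. The names below (`data_t`, `foreach_cb`, `table_foreach`, `sum_values`) are hypothetical stand-ins and not Ruby's actual `st.h` API. The sketch only shows why a fully prototyped callback type helps: once the parameter list is spelled out, the compiler can reject a callback whose signature does not match, which an unprototyped `int (*)()` cannot do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not Ruby's st.h: a tiny key/value table. */
typedef uintptr_t data_t;

struct entry { data_t key, value; };

/* Fully prototyped callback type: key, value, user argument.
 * An unprototyped `int (*)()` would let a two-argument callback slip
 * through even after a third parameter was added. */
typedef int foreach_cb(data_t key, data_t value, data_t arg);

enum { CB_CONTINUE = 0, CB_STOP = 1 };

/* Calls func for each entry until it returns CB_STOP.
 * Returns the number of calls made. */
static size_t
table_foreach(const struct entry *tbl, size_t n, foreach_cb *func, data_t arg)
{
    size_t i, calls = 0;

    assert(func != NULL);
    for (i = 0; i < n; i++) {
        calls++;
        if (func(tbl[i].key, tbl[i].value, arg) == CB_STOP) break;
    }
    return calls;
}

/* A callback whose signature matches foreach_cb exactly.
 * arg carries a pointer to the running total. */
static int
sum_values(data_t key, data_t value, data_t arg)
{
    (void)key;
    *(data_t *)arg += value;
    return CB_CONTINUE;
}
```

Passing a function with a different parameter list to `table_foreach` is diagnosed at compile time in this sketch, which is the property the commit is after.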
* Fixed String#grapheme_clusters with wide encodings | Nobuyoshi Nakada | 2019-06-29 | 1 | -0/+6
  * string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources for wide-char encodings. [Bug #15965]
  * regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.
* convert check for array length to assertion and comment out | duerst | 2018-12-07 | 1 | -1/+1
  In regparse.c, in function node_extended_grapheme_cluster, we used a raw if() with exit(1) as a cross-check for our length calculations for the common node array. Convert this to an assertion and comment it out because it is not needed for active code.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove code duplication and put everything into forward order | duerst | 2018-12-07 | 1 | -158/+143
  In file regparse.c, in function node_extended_grapheme_cluster(), eliminate code duplication of CRLF and '.' (any character). This uses the fact that both for Unicode encodings and for non-Unicode encodings, the first alternative is CRLF, and the last alternative is '.' (any character). This puts all of the pieces into forward order (the order of the code follows the order of the syntax definition).
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove an unused variable | duerst | 2018-12-06 | 1 | -2/+0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66240 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* make sure all nodes are freed on error in node_extended_grapheme_cluster() | duerst | 2018-12-06 | 1 | -36/+37
  regparse.c: In function node_extended_grapheme_cluster(), use function-global array node_common and use it for list and alternate construction. This is done so that in case of error, all nodes that have already been constructed can be correctly freed in a single for loop. Document the layout structure.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
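The cleanup pattern described here can be sketched as follows. This is a minimal illustration and not the code in regparse.c: `Node`, `node_new`, `node_free`, `build_nodes`, `NODE_COUNT`, and the `live_nodes` counter are all hypothetical. The idea is that every node pointer lives in one array that starts out all NULL, so a single loop at the error label can free whatever was built so far.

```c
#include <assert.h>
#include <stdlib.h>

#define NODE_COUNT 4              /* hypothetical array size */

typedef struct Node { int kind; } Node;

static int live_nodes = 0;        /* allocations not yet freed (demo only) */

static Node *
node_new(int kind)
{
    Node *n = malloc(sizeof *n);
    if (n != NULL) { n->kind = kind; live_nodes++; }
    return n;
}

static void
node_free(Node *n)
{
    if (n != NULL) { free(n); live_nodes--; }
}

/* Builds NODE_COUNT nodes. fail_at simulates an allocation failure at
 * that index (-1 means no failure). Returns 0 on success, in which case
 * ownership of the nodes moves to out[]; returns -1 on error, in which
 * case everything built so far has been freed. */
static int
build_nodes(Node *out[NODE_COUNT], int fail_at)
{
    Node *nodes[NODE_COUNT];
    int i;

    assert(out != NULL);
    for (i = 0; i < NODE_COUNT; i++) nodes[i] = NULL;

    for (i = 0; i < NODE_COUNT; i++) {
        nodes[i] = (i == fail_at) ? NULL : node_new(i);
        if (nodes[i] == NULL) goto err;
    }
    for (i = 0; i < NODE_COUNT; i++) out[i] = nodes[i];
    return 0;

err:
    /* One loop frees every node; unused slots are NULL and are skipped. */
    for (i = 0; i < NODE_COUNT; i++) node_free(nodes[i]);
    return -1;
}
```

Because unused slots stay NULL, the error path does not need to know how far construction got.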
* remove code duplication and streamline identifiers | duerst | 2018-12-06 | 1 | -58/+37
  In regparse.c:
  * Reduce code duplication by merging the almost identical functions create_sequence_node and create_alternate_node into a new function create_node_from_array, adding a parameter that distinguishes between creating a list and creating an alternative.
  * Streamline variable/function naming. Unicode UAX #29 uses 'sequence', but the regular expression library uses 'list' for the same concept. Keep 'sequence' in the comments that are taken from UAX #29, but use 'list' in variable names.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
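The merge of the two helpers can be illustrated with a small sketch. Everything below is hypothetical (`NodeType`, `Node`, `node_new`, `node_free`, `node_from_array`) and only mirrors the idea in the commit message: one function takes a NULL-terminated array plus a parameter saying whether to link the elements as a list or as an alternative.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_LEAF, NODE_LIST, NODE_ALT } NodeType;

typedef struct Node {
    NodeType type;
    int value;                    /* payload for leaves */
    struct Node *car, *cdr;       /* element and rest, for list/alt cells */
} Node;

static Node *
node_new(NodeType type, int value, Node *car, Node *cdr)
{
    Node *n = malloc(sizeof *n);
    if (n != NULL) { n->type = type; n->value = value; n->car = car; n->cdr = cdr; }
    return n;
}

static void
node_free(Node *n)
{
    if (n == NULL) return;
    node_free(n->car);
    node_free(n->cdr);
    free(n);
}

/* Links a NULL-terminated array into a chain of cells of the given type,
 * working from back to front so the first element ends up at the head.
 * Array slots are reset to NULL as ownership moves into the chain.
 * On allocation failure, frees everything and returns NULL. */
static Node *
node_from_array(NodeType type, Node *arr[])
{
    Node *head = NULL;
    size_t n = 0, i, j;

    assert(type == NODE_LIST || type == NODE_ALT);
    while (arr[n] != NULL) n++;

    for (i = n; i-- > 0; ) {
        Node *cell = node_new(type, 0, arr[i], head);
        if (cell == NULL) {
            node_free(head);      /* frees elements already linked */
            for (j = 0; j <= i; j++) { node_free(arr[j]); arr[j] = NULL; }
            return NULL;
        }
        arr[i] = NULL;
        head = cell;
    }
    return head;
}
```

With the type passed as a parameter, the list and alternative cases share one loop and one error path instead of two near-identical copies.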
* remove obsolete data from unicode.c | duerst | 2018-12-06 | 1 | -4/+0
  * unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ, onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji, because they are not needed anymore for Unicode 11.0.0.
  * regparse.c: Remove external declarations for above arrays.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove unused variables in node_extended_grapheme_cluster() | duerst | 2018-12-05 | 1 | -8/+0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66218 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tweak/remove comments [ci skip] | duerst | 2018-12-05 | 1 | -10/+2
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* adjust some comments in node_extended_grapheme_cluster() [ci skip] | duerst | 2018-12-05 | 1 | -13/+9
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66214 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* update to Unicode 11.0.0 (main step, not complete yet) | duerst | 2018-12-05 | 1 | -309/+174
  - common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
  - test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
  - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. Files for Unicode 10.0.0 will be removed once we are sure 11.0.0 works.
  - lib/unicode_normalize/tables.rb: Updated table.
  - regparse.c: Almost completely reimplement grapheme cluster detection in function node_extended_grapheme_cluster().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove unnecessary settings with NULL_NODE in \X implementation | duerst | 2018-12-02 | 1 | -10/+1
  Remove unnecessary settings of node_array elements to NULL_NODE. We can do this because we initialize the whole array to NULL_NODEs and set everything again to NULL_NODEs when creating a sequence or alternative node.
  Also, fix an index error in the initialization of node_array. (issue #15343)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66139 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix order of declarations and code at start of node_extended_grapheme_cluster() | duerst | 2018-12-02 | 1 | -3/+2
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66138 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix last commit (r66135) | ko1 | 2018-12-02 | 1 | -2/+3
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66137 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* make sure all nodes are freed on error in node_extended_grapheme_cluster() | duerst | 2018-12-02 | 1 | -17/+16
  regparse.c: In function node_extended_grapheme_cluster(), introduce function-global array node_array and use it for sequence and alternate construction. This is done so that in case of error, all nodes that have already been constructed can be correctly freed. (issue #15343)
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66135 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* expand a small comment [ci skip] | duerst | 2018-12-02 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* add/change some comments in node_extended_grapheme_cluster() [ci skip] | duerst | 2018-12-02 | 1 | -5/+7
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66123 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* reformat code [ci skip] | duerst | 2018-12-02 | 1 | -13/+5
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove unnecessary code removing CR/LF from range | duerst | 2018-12-01 | 1 | -16/+1
  Remove code that tries to remove CR and LF from Grapheme_Cluster_Break=Control. This code is unnecessary because Grapheme_Cluster_Break=Control already excludes CR and LF.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2018-12-01 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66115 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* introduce and use create_alternate_node() | duerst | 2018-12-01 | 1 | -52/+68
  Introduce new function create_alternate_node() to create an alternative node from a list of nodes in one go. Use it once (two more uses expected).
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66114 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* eliminate a list with only one element | duerst | 2018-12-01 | 1 | -7/+2
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* remove two unnecessary variables (np2 and np3) | duerst | 2018-11-28 | 1 | -4/+0
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* eliminate intermediate variable in very short block (3 times) | duerst | 2018-11-28 | 1 | -12/+3
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* use create_sequence_node() four more times | duerst | 2018-11-28 | 1 | -78/+59
  Four more uses of create_sequence_node() in node_extended_grapheme_cluster (a few more to come).
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* use create_sequence_node() once more | duerst | 2018-11-28 | 1 | -19/+9
  One more use of create_sequence_node() in node_extended_grapheme_cluster (several more to come).
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66063 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* introduce macro R_ERR to reduce repetitive code | duerst | 2018-11-28 | 1 | -85/+46
  Introduce a new preprocessor macro R_ERR to visually reduce repetitive code checking for return values and going to the err: label at the end of the function node_extended_grapheme_cluster().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* reduce number of arguments on quantify_property_node() | duerst | 2018-11-28 | 1 | -20/+38
  There are only four patterns of the last two arguments to quantify_property_node(). By replacing the lower/upper arguments with a single char, we get more expressive calls, the last argument directly corresponding to the quantifier that we want to use (except for '2', which means exactly two).
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
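The mapping from a single quantifier character to lower/upper bounds can be sketched as below. The commit message names only '2' explicitly; that the other three patterns are '?', '*', and '+' is an assumption, as are the names `quantifier_bounds` and `REPEAT_INFINITE` and the use of -1 as the "unbounded" sentinel.

```c
#include <assert.h>
#include <stddef.h>

#define REPEAT_INFINITE (-1)      /* hypothetical "no upper bound" sentinel */

/* Translates a quantifier character into repetition bounds.
 * Returns 0 on success, -1 for an unknown character. */
static int
quantifier_bounds(char q, int *lower, int *upper)
{
    assert(lower != NULL && upper != NULL);
    *lower = 0;
    *upper = REPEAT_INFINITE;

    switch (q) {
      case '?': *upper = 1;          break;   /* zero or one */
      case '*':                      break;   /* zero or more */
      case '+': *lower = 1;          break;   /* one or more */
      case '2': *lower = *upper = 2; break;   /* exactly two */
      default:  return -1;
    }
    return 0;
}
```

A call site then reads like the regular expression it builds, for example `quantifier_bounds('*', &lo, &hi)` for a starred term.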
* fix order of subexpressions for Hangul | duerst | 2018-11-27 | 1 | -6/+6
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66048 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2018-11-27 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* introduce two more uses of create_sequence_node in node_extended_grapheme_cluster | duerst | 2018-11-27 | 1 | -51/+27
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* correctly handle return value from create_sequence_node() | duerst | 2018-11-27 | 1 | -1/+2
  In function node_extended_grapheme_cluster(), store and test return value from create_sequence_node(). Never forget this!
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2018-11-27 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* declare array for sequence at start of code creating sequence | duerst | 2018-11-27 | 1 | -13/+11
  In function node_extended_grapheme_cluster(), move declaration up so that block encompasses all of the regular expression creation that finally makes up the sequence. Having blocks like this will be great because it directly shows the extent of code belonging to each subexpression of the regular expression being created.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* make sure all nodes are correctly freed in create_property_node() | duerst | 2018-11-27 | 1 | -0/+4
  We make sure that the newly created tree and all remaining nodes passed in through node_array are freed.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c: conform C90 | k0kubun | 2018-11-27 | 1 | -1/+4
  ../regparse.c:5908:28: error: initializer for aggregate is not a compile-time constant [-Werror,-Wc99-extensions]
        Node* sequence[] = { np1, np2, np3, ((Node* )0) };
                             ^~~
  https://travis-ci.org/ruby/ruby/jobs/460197620
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
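The diagnostic above stems from a C90 rule: every expression in the initializer list of an aggregate must be a constant expression, even for automatic variables. The actual fix is not shown in this log; one common C90-conforming rewrite is to declare the array with a fixed size and assign the elements afterwards, sketched here with hypothetical names (`Node`, `count_nodes`, `build_sequence`).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int id; } Node;

/* Counts the entries of a NULL-terminated node array. */
static size_t
count_nodes(Node *seq[])
{
    size_t n = 0;

    assert(seq != NULL);
    while (seq[n] != NULL) n++;
    return n;
}

static size_t
build_sequence(Node *np1, Node *np2, Node *np3)
{
    /* C99 and later accept
     *     Node *seq[] = { np1, np2, np3, NULL };
     * but np1..np3 are not constant expressions, so strict C90 rejects
     * the initializer. Assigning element by element is valid in both. */
    Node *seq[4];

    seq[0] = np1;
    seq[1] = np2;
    seq[2] = np3;
    seq[3] = NULL;
    return count_nodes(seq);
}
```

The cost is an explicit array size that has to be kept in step with the number of assignments.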
* introduce helper function create_sequence_node() | duerst | 2018-11-27 | 1 | -19/+31
  The new function create_sequence_node() uses its second argument (an array of Node*, from left to right, ending with NULL_NODE) to create a sequence of expressions using node_new_list().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66033 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2018-11-27 | 1 | -1/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66032 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* introduce helper function quantify_property_node() | duerst | 2018-11-27 | 1 | -58/+28
  The new function quantify_property_node() combines the functions create_property_node() and quantify_node(), which frequently appear together.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* introduce helper function quantify_node() to wrap function node_new_quantifier | duerst | 2018-11-27 | 1 | -110/+60
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66030 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* use explicit property name when creating nodes for "Grapheme_Cluster_Break=Extend" | duerst | 2018-11-27 | 1 | -18/+7
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* use 'Regional_Indicator' script property instead of fixed constants | duerst | 2018-11-27 | 1 | -4/+1
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66020 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* add some comments in function node_extended_grapheme_cluster() [ci skip] | duerst | 2018-11-27 | 1 | -2/+24
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66014 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* create function create_property_node to extract recurring functionality | duerst | 2018-11-25 | 1 | -72/+33
  Refactoring: In regparse.c, extract creation of a new CClass node and initialization using a property into a new function create_property_node().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c: check the result of propname2ctype | nobu | 2018-10-16 | 1 | -26/+39
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65094 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* unicode.c: moved additional GCB ranges | nobu | 2018-10-15 | 1 | -44/+8
  * enc/unicode.c: moved additional Grapheme Cluster Break ranges which depend on the Unicode version.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c: Suppress duplicated range warning by mere \X | nobu | 2018-10-15 | 1 | -2/+11
  * regparse.c (node_extended_grapheme_cluster): as Unicode 10 has added Grapheme_Cluster_Break properties to some characters, remove duplicated ranges for Unicode 9.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c: warn all duplicated ranges when debugging | nobu | 2018-10-15 | 1 | -4/+8
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Fix typos. | hsbt | 2018-01-18 | 1 | -1/+1
  * rememberd -> remembered
  * refered -> referred
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e