path: root/re.c
Commit message (Author, Age, Files, Lines)
* Introduce encoding check macro (S-H-GAMELINKS, 2022-12-02, 1 file, -1/+2)
|
* Prevent segfault in String#scan with ObjectSpace.each_object (Yusuke Endoh, 2022-12-01, 1 file, -0/+7)
|
| Calling `String#scan` without a block creates an incomplete MatchData
| object whose `RMATCH(match)->str` is Qfalse. Usually this object is not
| leaked, but it was possible to pull it by using ObjectSpace.each_object.
|
| This change hides the internal MatchData object by using rb_obj_hide.
|
| Fixes [Bug #19159]
* Using UNDEF_P macro (S-H-GAMELINKS, 2022-11-16, 1 file, -2/+2)
|
* Suppress false warning by a bug of gcc (Nobuyoshi Nakada, 2022-11-08, 1 file, -4/+5)
|
| GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before
| `match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`.
|
| [Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
* Refactor timeout-setting code to a function (Yusuke Endoh, 2022-10-24, 1 file, -13/+12)
|
* Refactor timeout-related code in re.c a little (Yusuke Endoh, 2022-10-24, 1 file, -9/+9)
|
* Fix per-instance Regexp timeout (#6621) (Yusuke Endoh, 2022-10-24, 1 file, -2/+8)
|
| Fix per-instance Regexp timeout
|
| This makes it follow what was decided in [Bug #19055]:
|
| * `Regexp.new(str, timeout: nil)` should respect the global timeout
| * `Regexp.new(str, timeout: huge_val)` should use the maximum value that
|   can be represented in the internal representation
| * `Regexp.new(str, timeout: 0 or negative value)` should raise an error
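The three rules above can be checked from Ruby code. The following is a minimal sketch, assuming Ruby 3.2 or later (where `Regexp.timeout=` and the `timeout:` keyword exist); `timeout_rules` is a hypothetical helper name chosen for illustration and is not part of re.c.

```ruby
# Sketch only: assumes Ruby 3.2+. `timeout_rules` is a hypothetical helper.
def timeout_rules
  return :unsupported unless Regexp.respond_to?(:timeout=)

  saved = Regexp.timeout
  Regexp.timeout = 2.0 # global default, in seconds

  # Returns true when Regexp.new rejects the given per-instance timeout.
  rejects = lambda do |value|
    begin
      Regexp.new("a+", timeout: value)
      false
    rescue ArgumentError
      true
    end
  end

  {
    # timeout: nil leaves the per-instance value unset, so the global one applies.
    instance_timeout_for_nil: Regexp.new("a+", timeout: nil).timeout,
    huge_value_rejected: rejects.call(Float::MAX),
    zero_rejected: rejects.call(0),
    negative_rejected: rejects.call(-1),
  }
ensure
  # Restore the global setting so the sketch has no lasting side effect.
  Regexp.timeout = saved if Regexp.respond_to?(:timeout=)
end

p timeout_rules
```

The helper returns `:unsupported` on older Rubies instead of failing, and restores the global timeout in `ensure` because `Regexp.timeout=` is process-wide state.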
* Fix argument & Remove enum (S-H-GAMELINKS, 2022-10-23, 1 file, -9/+3)
|
* Introduce rb_memsearch_with_char_size function (S-H-GAMELINKS, 2022-10-23, 1 file, -10/+14)
|
* * expand tabs. [ci skip] (git, 2022-10-10, 1 file, -2/+2)
|
| Tabs were expanded because the file did not have any tab indentation in
| unedited lines. Please update your editor config, and use
| misc/expand_tabs.rb in the pre-commit hook.
* Should use dedicated function `Check_Type` (Nobuyoshi Nakada, 2022-10-10, 1 file, -12/+4)
|
* Add MatchData#deconstruct/deconstruct_keys (Vladimir Dementyev, 2022-10-10, 1 file, -0/+85)
|
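These two methods are what let a MatchData take part in pattern matching. A minimal sketch follows, assuming Ruby 3.2 or later; `parse_date` and `split_pair` are hypothetical helpers written for illustration only.

```ruby
# Sketch only: assumes Ruby 3.2+ for MatchData#deconstruct and #deconstruct_keys.
# Both helper names are hypothetical.

# Hash pattern: uses MatchData#deconstruct_keys (named captures as Symbol keys).
def parse_date(text)
  return :unsupported unless MatchData.method_defined?(:deconstruct_keys)

  case /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.match(text)
  in { year:, month:, day: }
    [year.to_i, month.to_i, day.to_i]
  in nil
    nil # Regexp#match returned nil, so there is nothing to destructure
  end
end

# Array pattern: uses MatchData#deconstruct (the captures, in order).
def split_pair(text)
  return :unsupported unless MatchData.method_defined?(:deconstruct)

  case /(\w+)=(\w+)/.match(text)
  in [key, value]
    { key => value }
  in nil
    nil
  end
end

p parse_date("2022-10-10")
p split_pair("lang=ruby")
```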
* [DOC] `offset` argument of Regexp#match (Nobuyoshi Nakada, 2022-08-18, 1 file, -1/+6)
|
* Speed up setting the backref match object (Aaron Patterson, 2022-08-02, 1 file, -3/+1)
|
| This patch speeds up setting the backref match object by avoiding some
| memcopies. Take the following code for example:
|
| ```ruby
| "hello world" =~ /hello/
| p $~
| ```
|
| When the RE matches the string, we have to set the Match object in the
| backref global. So we would allocate a match object[^1] and use
| `rb_reg_region_copy`[^2] to make a deep copy of the stack allocated
| `re_registers` struct[^3] in to the newly created Ruby object. This could
| possibly trigger GC[^4], and would allocate new memory.
|
| This patch makes a shallow copy of the `re_registers` struct on to the
| Match object allowing the match object to manage the `re_registers`
| pointer and also avoiding some calls to `xmalloc` and some manual memcopy.
|
| Benchmark looks like this:
|
| ```ruby
| require "benchmark/ips"
|
| def test_re thing
|   thing =~ /hello/
| end
|
| Benchmark.ips do |x|
|   x.report("re hit") do
|     test_re "hello world"
|   end
|
|   x.report("re miss") do
|     test_re "world"
|   end
| end
| ```
|
| Before this patch:
|
| ```
| $ ruby -v test.rb
| ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
| Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
| Warming up --------------------------------------
|               re hit   345.401k i/100ms
|              re miss   673.584k i/100ms
| Calculating -------------------------------------
|               re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
|              re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s
| ```
|
| After this patch:
|
| ```
| $ ./ruby -v test.rb
| ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
| Warming up --------------------------------------
|               re hit   419.578k i/100ms
|              re miss   673.251k i/100ms
| Calculating -------------------------------------
|               re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
|              re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s
| ```
|
| Matches get faster and misses maintain the same speed
|
| [^1]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
| [^2]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
| [^3]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
| [^4]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981
* Expand tabs [ci skip] (Takashi Kokubun, 2022-07-21, 1 file, -636/+636)
|
| [Misc #18891]
* [DOC] Fix a typo [ci skip] (Kazuhiro NISHIYAMA, 2022-06-26, 1 file, -1/+1)
|
* Document that Regexp#source does not retain lexer escapes (Jeremy Evans, 2022-06-20, 1 file, -1/+5)
|
| Related to [Feature #18838]
* [Feature #18788] [DOC] String options to `Regexp.new` (Nobuyoshi Nakada, 2022-06-20, 1 file, -0/+5)
|
| Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org>
* [Feature #18788] Support options as `String` to `Regexp.new` (Nobuyoshi Nakada, 2022-06-20, 1 file, -0/+21)
|
| `Regexp.new` now supports passing the regexp flags not only as an
| `Integer`, but also as a `String`. Unknown flags raise errors.
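A short sketch of the String form, assuming Ruby 3.2 or later. `STRING_FLAGS_SUPPORTED` and `options_from_string` are hypothetical names introduced here; the version check is only a guard so the example does nothing surprising on older Rubies, where a String second argument was treated as a plain truthy value.

```ruby
# Sketch only: assumes Ruby 3.2+ for String flags. Both names below are hypothetical.
STRING_FLAGS_SUPPORTED =
  (RUBY_VERSION.split(".").map(&:to_i) <=> [3, 2]) >= 0

# Builds a Regexp from flag letters and returns its option bits.
def options_from_string(flags)
  Regexp.new("abc", flags).options
end

if STRING_FLAGS_SUPPORTED
  p options_from_string("i") == Regexp::IGNORECASE
  p options_from_string("mx") == (Regexp::MULTILINE | Regexp::EXTENDED)

  begin
    options_from_string("z") # not a known flag letter
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```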
* Warn suspicious flag to `Regexp.new` (Nobuyoshi Nakada, 2022-06-20, 1 file, -1/+3)
|
| Now the second argument should be `true`, `false`, `nil` or an Integer.
| This flag is sometimes confused with the third argument.
* [DOC] Refine Regexp.new argument descriptions (Nobuyoshi Nakada, 2022-06-20, 1 file, -6/+19)
|
* [DOC] Regexp timeout is float or nil (Nobuyoshi Nakada, 2022-06-20, 1 file, -3/+3)
|
* [DOC] Fixed omissions in Regexp.new arguments (Nobuyoshi Nakada, 2022-06-20, 1 file, -2/+6)
|
* Ignore invalid escapes in regexp comments (Jeremy Evans, 2022-06-06, 1 file, -8/+63)
|
| Invalid escapes are handled at multiple levels. The first level is in
| parse.y, so skip invalid unicode escape checks for regexps in parse.y.
|
| Make rb_reg_preprocess and unescape_nonascii accept the regexp options.
| In unescape_nonascii, if the regexp is an extended regexp, when "#" is
| encountered, ignore all characters until the end of line or end of regexp.
|
| Unfortunately, in extended regexps, you can use "#" as a non-comment
| character inside a character class, so also parse "[" and "]" specially
| for extended regexps, and only skip comments if "#" is not inside a
| character class. Handle nested character classes as well.
|
| This issue doesn't just affect extended regexps, it also affects "(?#"
| comments inside all regexps. So for those comments, scan until trailing
| ")" and ignore content inside.
|
| I'm not sure if there are other corner cases not handled. A better fix
| would be to redesign the regexp parser so that it unescaped during
| parsing instead of before parsing, so you already know the current
| parsing state.
|
| Fixes [Bug #18294]
|
| Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
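The parsing states described above correspond to three places where "#" can appear in a pattern. The sketch below only illustrates those three cases from the Ruby side; it does not reproduce the invalid-escape handling itself, and `hash_contexts` is a hypothetical helper name.

```ruby
# Sketch only: shows the three "#" contexts the commit message distinguishes.
# `hash_contexts` is a hypothetical helper.
def hash_contexts
  # 1. Extended mode: "#" starts a comment that runs to the end of the line.
  extended_comment = /a   # trailing comment
                      b/x

  # 2. Extended mode, inside a character class: "#" is a literal character.
  char_class_hash = /[#]/x

  # 3. Any mode: a (?#...) group is a comment up to the closing parenthesis.
  group_comment = /a(?# ignored )b/

  {
    extended_comment_matches_ab: extended_comment.match?("ab"),
    char_class_matches_hash: char_class_hash.match?("#"),
    group_comment_matches_ab: group_comment.match?("ab"),
  }
end

p hash_contexts
```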
* [DOC] Enhanced RDoc for MatchData (#5822) (Burdette Lamar, 2022-04-18, 1 file, -50/+69)
|
| Treats:
|
|     #to_s
|     #named_captures
|     #string
|     #inspect
|     #hash
|     #==
* Enhanced RDoc for MatchData (#5821) (Burdette Lamar, 2022-04-18, 1 file, -32/+41)
|
| Treats:
|
|     #[]
|     #values_at
* Enhanced RDoc for MatchData (#5820) (Burdette Lamar, 2022-04-18, 1 file, -33/+41)
|
| Treats:
|
|     #pre_match
|     #post_match
|     #to_a
|     #captures
* [DOC] Enhanced RDoc for MatchData (#5819) (Burdette Lamar, 2022-04-18, 1 file, -45/+47)
|
| Treats:
|
|     #begin
|     #end
|     #match
|     #match_length
* [DOC] Enhanced RDoc for MatchData (#5818) (Burdette Lamar, 2022-04-18, 1 file, -31/+32)
|
| Treats:
|
|     #regexp
|     #names
|     #size
|     #offset
* [DOC] Enhanced RDoc for Regexp (#5815) (Burdette Lamar, 2022-04-18, 1 file, -100/+136)
|
| Treats:
|
|     ::new
|     ::escape
|     ::try_convert
|     ::union
|     ::last_match
* [DOC] Enhanced RDoc for Regexp (#5812) (Burdette Lamar, 2022-04-16, 1 file, -91/+105)
|
| Treats:
|
|     #fixed_encoding?
|     #hash
|     #==
|     #=~
|     #match
|     #match?
|
| Also, in regexp.rdoc:
|
|     Changes heading from 'Special Global Variables' to 'Regexp Global Variables'.
|     Add tiny section 'Regexp Interpolation'.
* [DOC] Enhanced RDoc for Regexp (#5807) (Burdette Lamar, 2022-04-15, 1 file, -78/+84)
|
| Treats:
|
|     #source
|     #inspect
|     #to_s
|     #casefold?
|     #options
|     #names
|     #named_captures
* Return only captured range in `MatchData` [Bug #18670] (Nobuyoshi Nakada, 2022-03-31, 1 file, -1/+1)
|
* re.c: stop a wrong warning of "flags ignored" on Regexp.new(//) (Yusuke Endoh, 2022-03-31, 1 file, -1/+1)
|
| [Bug #18669]
* internal/ractor.h: Added (Yusuke Endoh, 2022-03-30, 1 file, -1/+1)
|
| Currently it has only one function prototype.
* re.c: raise Regexp::TimeoutError instead of RuntimeError (Yusuke Endoh, 2022-03-30, 1 file, -2/+3)
|
* re.c: Add `timeout` keyword for Regexp.new and Regexp#timeout (Yusuke Endoh, 2022-03-30, 1 file, -14/+63)
|
* re.c: Add Regexp.timeout= and Regexp.timeout (Yusuke Endoh, 2022-03-30, 1 file, -0/+88)
|
| [Feature #17837]
* Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518) (Shugo Maeda, 2022-02-19, 1 file, -0/+33)
|
| * Add String#byteindex, String#byterindex, and MatchData#byteoffset
|   [Feature #13110]
|
| Co-authored-by: NARUSE, Yui <naruse@airemix.jp>
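The byte-oriented methods differ from their character-oriented counterparts only when the string contains multibyte characters. A minimal sketch, assuming Ruby 3.2 or later and a UTF-8 string; `offsets_for` is a hypothetical helper name.

```ruby
# Sketch only: assumes Ruby 3.2+ for the byte-oriented methods.
# `offsets_for` is a hypothetical helper.
def offsets_for(text, needle, pattern)
  match = pattern.match(text)
  result = {
    char_index: text.index(needle),   # counted in characters
    char_offsets: match.offset(0),
  }
  if text.respond_to?(:byteindex)
    result[:byte_index]   = text.byteindex(needle)  # counted in bytes
    result[:byte_rindex]  = text.byterindex(needle)
    result[:byte_offsets] = match.byteoffset(0)
  end
  result
end

# "é" (U+00E9) takes two bytes in UTF-8, so every byte offset after it is
# one larger than the matching character offset.
p offsets_for("h\u00e9llo", "l", /l+/)
```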
* LONG2NUM() should be used for rmatch_offset::{beg,end} (Shugo Maeda, 2022-02-18, 1 file, -4/+4)
|
| https://github.com/ruby/ruby/pull/5518#discussion_r809645406
* [DOC] Fix broken links to literals.rdoc (Nobuyoshi Nakada, 2022-02-08, 1 file, -1/+1)
|
* Replace to RBOOL macro (S-H-GAMELINKS, 2022-01-17, 1 file, -4/+1)
|
* Adding links to literals and Kernel (#5192) (Burdette Lamar, 2021-12-03, 1 file, -0/+4)
|
| * Adding links to literals and Kernel
* Using NIL_P macro instead of `== Qnil` (S.H, 2021-10-03, 1 file, -5/+5)
|
* Avoid race condition in Regexp#match (Jeremy Evans, 2021-10-01, 1 file, -27/+19)
|
| In certain conditions, Regexp#match could return a MatchData with
| missing captures. This seems to require, at the least, multiple threads
| calling a method that calls the same block/proc/lambda which calls
| Regexp#match.
|
| The race condition happens because the MatchData is passed indirectly
| via the backref, and other threads can modify the backref.
|
| Fix the issue by:
|
| 1. Not reusing the existing MatchData from the backref, and always
|    allocating a new MatchData.
| 2. Passing the MatchData directly to the caller using a VALUE*, instead
|    of indirectly through the backref.
|
| It's likely that variants of this issue exist for other Regexp methods.
| Anywhere that MatchData is passed implicitly through the backref is
| probably vulnerable to this issue.
|
| Fixes [Bug #17507]
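The fix itself is in C, but the same idea applies on the calling side: use the MatchData that `Regexp#match` returns rather than reading it back through `$~`. The sketch below is a usage-level illustration of the scenario described (several threads calling one shared lambda), not a reproduction of the bug; `EXTRACT_ID` and `ids_from_threads` are hypothetical names.

```ruby
# Sketch only: a usage-level illustration, not the C change in re.c.
# EXTRACT_ID and ids_from_threads are hypothetical names.

# A single lambda shared by every thread, as in the reported scenario.
EXTRACT_ID = lambda do |line|
  m = /id=(\d+)/.match(line) # keep the returned MatchData in a local
  m && m[1].to_i             # read captures from the local, not from $~
end

def ids_from_threads(count)
  threads = count.times.map do |i|
    Thread.new { EXTRACT_ID.call("id=#{i}") }
  end
  threads.map(&:value) # collect results in thread creation order
end

p ids_from_threads(4)
```

Holding the result in a local mirrors point 2 of the fix: the value travels directly from callee to caller instead of through shared backref state.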
* [Feature #18172] Add MatchData#match_length (Nobuyoshi Nakada, 2021-09-16, 1 file, -0/+37)
|
| The method to return the length of the matched substring
| corresponding to the given argument.
* [Feature #18172] Add MatchData#match (Nobuyoshi Nakada, 2021-09-16, 1 file, -0/+34)
|
| The method to return the single matched substring corresponding to
| the given argument.
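A brief sketch of both methods from [Feature #18172], assuming Ruby 3.1 or later. `match_summary` is a hypothetical helper; the pattern includes an optional group so that the unmatched case is visible too.

```ruby
# Sketch only: assumes Ruby 3.1+ for MatchData#match and #match_length.
# `match_summary` is a hypothetical helper.
def match_summary(text)
  return :unsupported unless MatchData.method_defined?(:match_length)

  m = /(.)(.)(\d+)(\d)(\w)?/.match(text)
  {
    whole: m.match(0),                 # the entire matched substring
    whole_length: m.match_length(0),
    fourth: m.match(4),                # a single capture group
    fourth_length: m.match_length(4),
    fifth: m.match(5),                 # optional group that did not take part
    fifth_length: m.match_length(5),
  }
end

p match_summary("THX1138.")
```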
* Refactor and Using RBOOL macro (S.H, 2021-09-15, 1 file, -6/+2)
|
* Extract backref_number_check (Nobuyoshi Nakada, 2021-09-12, 1 file, -6/+10)
|
* Preserve the encoding of the argument in IndexError [Bug #18160] (Nobuyoshi Nakada, 2021-09-12, 1 file, -10/+10)
|