path: root/string.c
Commit message Author Age Files Lines
* Refined the warning message for $, and $;Nobuyoshi Nakada2019-12-201-1/+1
| [Bug #16438]
* Added Symbol#start_with? and Symbol#end_with? method. [Feature #16348]NARUSE, Yui2019-11-281-0/+43
|
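A minimal sketch of the two methods this commit adds, assuming they behave like their String counterparts (the symbol name `:content_type` is only an illustration):

```ruby
# Symbol#start_with? and Symbol#end_with? mirror the String methods
# of the same name (available from Ruby 2.7 onward).
sym = :content_type

starts = sym.start_with?("content")       # plain string prefix
ends   = sym.end_with?("type", "length")  # true if any listed suffix matches
regex  = sym.start_with?(/con+/)          # start_with? also accepts a Regexp

puts starts, ends, regex
```

Before this change the same check needed an intermediate string, as in `sym.to_s.start_with?("content")`.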
* delete unused codes卜部昌平2019-11-181-2/+0
| Suppress compiler warnings.
* rb_tainted_str_new_with_enc is no longer usedNobuyoshi Nakada2019-11-181-7/+0
|
* Deprecate taint/trust and related methods, and make the methods no-opsJeremy Evans2019-11-181-101/+19
| This removes the related tests, and puts the related specs behind version guards.
|
| This affects all code in lib, including some libraries that may want to support older versions of Ruby.
* delete unused functions卜部昌平2019-11-141-7/+0
| Looking at the list of symbols inside of libruby-static.a, I found hundreds of functions that are defined, but used from nowhere.
|
| There can be reasons for each of them (e.g. some functions are specific to some platform, some are useful when debugging, etc). However it seems the functions deleted here exist for no reason.
|
| This changeset reduces the size of ruby binary from 26,671,456 bytes to 26,592,864 bytes on my machine.
* Revert "[EXPERIMENTAL] Make Symbol#to_s return a frozen String [Feature #16150]"NARUSE, Yui2019-11-051-3/+2
| This reverts commit 6ffc045a817fbdf04a6945d3c260b55b0fa1fd1e.
* Documentation improvements for Ruby corezverok2019-10-261-20/+31
| * Top-level `return`;
| * Documentation for comments syntax;
| * `rescue` inside blocks;
| * Enhance `Object#to_enum` docs;
| * Make `chomp:` option more obvious for `String#each_line` and `#lines`;
| * Enhance `Proc#>>` and `#<<` docs;
| * Enhance `Process` class docs.
* Reduce the minimum string buffer size from 127 to 63 bytesLourens Naudé2019-10-111-1/+1
|
* avoid overflow in integer multiplication卜部昌平2019-10-091-1/+1
| This changeset basically replaces `ruby_xmalloc(x * y)` with `ruby_xmalloc2(x, y)`. Some convenience functions are also provided, for instance `rb_xmalloc_mul_add(x, y, z)`, which allocates x * y + z bytes.
* [EXPERIMENTAL] Make Symbol#to_s return a frozen StringBenoit Daloze2019-09-261-2/+3
| * Always the same frozen String for a given Symbol.
| * Avoids extra allocations whenever calling Symbol#to_s.
| * See [Feature #16150]
* Rename STR_IS_SHARED_M to STR_BORROWEDAlan Wu2019-09-261-6/+7
| Since the introduction of STR_SHARED_ROOT, the word "shared" has become very overloaded with respect to String's internal states. Use a different name for STR_IS_SHARED_M and explain its purpose.
* Tag string shared roots to fix use-after-freeAlan Wu2019-09-261-4/+16
| The buffer deduplication codepath in rb_fstring can be used to free the buffer of shared string roots, which leads to use-after-free.
|
| Introduce a new flag to tag strings that at one point have been a shared root. Check for it in rb_fstring to avoid freeing buffers that are shared by multiple strings. This change is based on nobu's idea in [ruby-core:94838].
|
| The included test case tests for the sequence of calls to internal functions that lead to this bug. See attached ticket for Ruby level repros.
|
| [Bug #16151]
* Make Symbol#to_proc calls handle keyword argumentsJeremy Evans2019-09-051-2/+2
| Make rb_sym_proc_call take a flag for whether a keyword argument is used, and use the new rb_funcall_with_block_kw function to pass that information.
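Seen from Ruby code, the effect is that keywords passed to a proc created by Symbol#to_proc reach the target method as keywords. A small sketch, where the `Greeter` class and its `greet` method are hypothetical names used only for illustration:

```ruby
# Hypothetical class whose method requires a keyword argument.
class Greeter
  def greet(greeting, name:)
    "#{greeting}, #{name}!"
  end
end

# The first argument to the proc becomes the receiver; the remaining
# positional and keyword arguments are forwarded to Greeter#greet.
greet = :greet.to_proc
result = greet.call(Greeter.new, "Hi", name: "Matz")

puts result
```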
* drop-in type check for rb_define_singleton_method卜部昌平2019-08-291-1/+23
| We can check the function pointer passed to rb_define_singleton_method in the same way as we do in rb_define_method. Doing so revealed many arity mismatches.
* Fixed heap-use-after-freeNobuyoshi Nakada2019-08-151-1/+2
| * string.c (rb_str_sub_bang): retrieves a pointer to the replacement string buffer just before using it, for the case of replacement with the receiver string itself. [Bug #16105]
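The case described here, where the replacement is the receiver itself, can be written in Ruby as follows. This is an illustrative sketch and not the regression test from the ticket:

```ruby
# The replacement argument is the very string being modified in place.
str = "abc"
result = str.sub!("b", str)

# "b" is replaced by the original content "abc": "a" + "abc" + "c".
puts result

# sub! returns the receiver, so both names refer to the same object.
puts result.equal?(str)
```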
* * expand tabs. [ci skip]git2019-08-151-2/+2
|
* Fold to lowercase instead of uppercase for String#casecmpJeremy Evans2019-08-141-4/+4
| strcasecmp(3) and String#casecmp? both fold to lowercase.
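The folding direction only matters for characters that sort between the uppercase and lowercase ASCII letters, such as `_` (byte 95, between `Z` at 90 and `a` at 97). A minimal sketch, assuming a Ruby version that includes this change:

```ruby
a = "A"
b = "_"

# Folding to lowercase compares "a" (97) with "_" (95).
folded_lower = a.downcase <=> b.downcase
# Folding to uppercase would compare "A" (65) with "_" (95) instead.
folded_upper = a.upcase <=> b.upcase

actual = a.casecmp(b)

puts "lowercase fold: #{folded_lower}, uppercase fold: #{folded_upper}, casecmp: #{actual}"
```

With lowercase folding, `casecmp` agrees with comparing the downcased strings, which is also consistent with `casecmp?`.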
* Update docs to use more natural EnglishAaron Patterson2019-08-121-10/+10
| Just a few updates to make the English sound a bit more natural
* string.c (rb_str_sub, _gsub): improve the rdocYusuke Endoh2019-08-121-21/+58
| This change:
|
| * Added an explanation about back references except \n and \k<n> (\` \& \' \+ \0)
| * Added an explanation about an escape (\\)
| * Added some rdoc references
| * Rephrased and clarified the reason why double escape is needed, added some examples, and moved the note to the end (because it is not specific to the method itself).
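The back references and the escape mentioned in this commit can be tried out as below. This is a sketch written for this log, not text taken from the rdoc; single-quoted replacement strings are used so that each backslash reaches `sub`/`gsub` unchanged:

```ruby
text = "John Smith"

# \1, \2: numbered groups; \k<name>: named groups.
swapped = text.sub(/(\w+) (\w+)/, '\2, \1')
named   = text.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')

# \0 (same as \&): the whole match.
whole = "hello".gsub(/l/, '[\0]')

# \`: text before the match; \': text after the match.
pre  = "hello".sub(/ll/, '<\`>')
post = "hello".sub(/ll/, "<\\'>")

# \\ in the replacement stands for one literal backslash. The Ruby literal
# '\\\\' holds two backslash characters, hence the "double escape".
backslash = "a.b".sub(".", '\\\\')

puts swapped, named, whole, pre, post, backslash
```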
* leafify opt_plus卜部昌平2019-08-061-0/+31
| Inspired by 346aa557b31fe96760e505d30da26eb7a846bac9
|
| Closes: https://github.com/ruby/ruby/pull/2321
* Make opt_eq and opt_neq insns leafTakashi Kokubun2019-08-041-18/+2
| # Benchmark zero?
|
| ```
| require 'benchmark/ips'
|
| Numeric.class_eval do
|   def ruby_zero?
|     self == 0
|   end
| end
|
| Benchmark.ips do |x|
|   x.report('0.zero?') { 0.ruby_zero? }
|   x.report('1.zero?') { 1.ruby_zero? }
|   x.compare!
| end
| ```
|
| ## VM
| No significant impact for VM.
|
| ### before
| ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]
|
| 0.zero?: 21855445.5 i/s
| 1.zero?: 21770817.3 i/s - same-ish: difference falls within error
|
| ### after
| ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]
|
| 1.zero?: 21958912.3 i/s
| 0.zero?: 21881625.9 i/s - same-ish: difference falls within error
|
| ## JIT
| The performance improves about 1.23x.
|
| ### before
| ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]
|
| 0.zero?: 36343111.6 i/s
| 1.zero?: 36295153.3 i/s - same-ish: difference falls within error
|
| ### after
| ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]
|
| 0.zero?: 44740467.2 i/s
| 1.zero?: 44363616.1 i/s - same-ish: difference falls within error
|
| # Benchmark str == str / str != str
|
| ```
| # frozen_string_literal: true
| require 'benchmark/ips'
|
| Benchmark.ips do |x|
|   x.report('a == a') { 'a' == 'a' }
|   x.report('a == b') { 'a' == 'b' }
|   x.report('a != a') { 'a' != 'a' }
|   x.report('a != b') { 'a' != 'b' }
|   x.compare!
| end
| ```
|
| ## VM
| No significant impact for VM.
|
| ### before
| ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]
|
| a == a: 27286219.0 i/s
| a != a: 24892389.5 i/s - 1.10x slower
| a == b: 23623635.8 i/s - 1.16x slower
| a != b: 21800958.0 i/s - 1.25x slower
|
| ### after
| ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]
|
| a == a: 27224016.2 i/s
| a != a: 24490109.5 i/s - 1.11x slower
| a == b: 23391052.4 i/s - 1.16x slower
| a != b: 21811321.7 i/s - 1.25x slower
|
| ## JIT
| The performance improves on JIT a little.
|
| ### before
| ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]
|
| a == a: 42010674.7 i/s
| a != a: 38920311.2 i/s - same-ish: difference falls within error
| a == b: 32574262.2 i/s - 1.29x slower
| a != b: 32099790.3 i/s - 1.31x slower
|
| ### after
| ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]
|
| a == a: 46902738.8 i/s
| a != a: 43097258.6 i/s - 1.09x slower
| a == b: 35822018.4 i/s - 1.31x slower
| a != b: 33377257.8 i/s - 1.41x slower
|
| This is needed towards Bug#15589.
|
| Closes: https://github.com/ruby/ruby/pull/2318
* Reuse match dataNobuyoshi Nakada2019-07-281-2/+5
| * string.c (rb_str_split_m): reuse occupied match data. [Bug #16024]
* Occupy match dataNobuyoshi Nakada2019-07-271-1/+3
| * string.c (rb_str_split_m): occupy match data so that it is not modified while yielding to the block. [Bug #16024]
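The two rb_str_split_m commits above concern String#split called with a block that itself runs a regexp match. A minimal sketch of that usage pattern (the strings and patterns are arbitrary examples):

```ruby
parts = []

# Block form of split: each piece is yielded instead of collected
# into an array.
"a1b2c".split(/\d/) do |part|
  part =~ /[a-z]/   # a nested match inside the block
  parts << part
end

p parts
```

The nested match must not disturb the match data that `split` is still using for the outer pattern, which is what "occupying" the match data guarantees.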
* string.c (str_succ): refactoringYusuke Endoh2019-07-141-3/+3
| Use a more communicative variable name
* string.c (str_succ): remove an unnecessary assignmentYusuke Endoh2019-07-141-1/+0
| This change will suppress Coverity Scan warnings
* * expand tabs.git2019-07-141-1/+1
|
* Prefer `rb_error_arity` to `rb_check_arity` when it can be usedYusuke Endoh2019-07-141-1/+1
|
* Check that String#scrub block does not modify receiverJeremy Evans2019-07-021-7/+12
| Similar to the check used for String#gsub. Can fix possible segfault.
|
| Fixes [Bug #15941]
* Make String#-@ not freeze receiver if called on unfrozen subclass instanceJeremy Evans2019-07-021-0/+3
| rb_fstring behavior in this case is to freeze the receiver. I'm not sure if that should be changed, so this takes the conservative approach of duping the receiver in String#-@ before passing to rb_fstring.
|
| Fixes [Bug #15926]
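The behavior after this fix can be observed from Ruby as follows. The subclass name `Label` is hypothetical and used only for illustration:

```ruby
# Hypothetical String subclass.
class Label < String; end

label = Label.new("status")
interned = -label   # String#-@

# The receiver stays mutable; only the returned string is frozen.
puts label.frozen?
puts interned.frozen?

label << "!"        # still allowed, because label was not frozen
puts label
```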
* * expand tabs.git2019-06-291-2/+2
|
* Fixed String#grapheme_clusters with wide encodingsNobuyoshi Nakada2019-06-291-2/+23
| * string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources for wide-char encodings. [Bug #15965]
|
| * regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.
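A short sketch of what "wide encodings" means in practice, assuming a Ruby version that includes this fix. The sample text (an `e` followed by a combining acute accent, then `x`) is an arbitrary choice:

```ruby
utf8 = "e\u0301x"                        # "e" + COMBINING ACUTE ACCENT, then "x"
wide = utf8.encode(Encoding::UTF_16LE)   # two bytes per code unit

# Split the UTF-16LE string into grapheme clusters, then convert each
# cluster back to UTF-8 so it can be compared and printed.
clusters = wide.grapheme_clusters.map { |c| c.encode(Encoding::UTF_8) }

p clusters.size
p clusters == utf8.grapheme_clusters
```

The base letter and its combining mark should stay together as one cluster regardless of the string's encoding.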
* Resize capacity for fstringJohn Hawthorn2019-06-261-0/+3
| When a string is #frozen, its capacity is resized to fit (if it is much larger), since we know it will no longer be mutated.
|
|     > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
|     {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
|
|     > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
|     {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...
|
| (ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)
|
| Previously, if we dedup into an fstring, using String#-@, capacity would not be reduced.
|
|     > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
|     {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...
|
| This commit makes rb_fstring call rb_str_resize, the same as rb_str_freeze does.
|
| Closes: https://github.com/ruby/ruby/pull/2256
* * expand tabs.git2019-06-211-1/+1
|
* Get rid of undefined behaviorNobuyoshi Nakada2019-06-211-1/+1
| * string.c (rb_str_sub_bang): str and repl can be same. [Bug #15946]
* New buffer for shared stringNobuyoshi Nakada2019-06-191-0/+9
| * string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937]
* Preserve the string content at self-copyingNobuyoshi Nakada2019-06-191-1/+4
| * string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937]
* Fix memory leakNobuyoshi Nakada2019-06-181-1/+4
| * string.c (str_make_independent_expand): free independent buffer. [Bug #15935]
|
| Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
* * expand tabs.git2019-06-181-1/+1
|
* String#b: Don't depend on dependent stringAlan Wu2019-06-181-4/+11
| Registering a string that depends on a dependent string as fstring can lead to use-after-free. See c06ddfe and 3f95620 for details.
|
| The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering a fstring.
|
| ```ruby
| a = ('j' * 24).b.b
| eval('', binding, a)
| p a
| 4.times { GC.start }
| p a
| ```
|
| - string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string.
|
| [Bug #15934]
* Fix memory leakNobuyoshi Nakada2019-06-161-0/+7
| * string.c (str_replace_shared_without_enc): free previous buffer before it is replaced.
|
| * parse.y (gettable): make sure in advance that the `__FILE__` object shares a fstring, to get rid of replacement with the fstring later. TODO: this hack may be needed in other places.
|
| [Bug #15916]
|
| Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
* Symbol just represents a nameNobuyoshi Nakada2019-05-141-2/+2
|
* str_duplicate: Don't share with a frozen shared stringAlan Wu2019-05-091-9/+7
| This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a. Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`.
|
| Such a string looks like:
|
| ```
|  --------                    -----------------
|  | root | ------ owns -----> | root's buffer |
|  --------                    -----------------
|      ^                             ^   ^
|  -----------                       |   |
|  | shared1 | ------ references -----   |
|  -----------                           |
|      ^                                 |
|  -----------                           |
|  | shared2 | ------ references ---------
|  -----------
| ```
|
| This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`:
|
| ```c
| /* from fstr_update_callback() */
| str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
| if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
|     str_make_independent(str);  /* no frozen check */
| }
| ```
|
| If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state:
|
| ```
|  -----------                         --------------------
|  | shared1 | -------- owns --------> | shared1's buffer |
|  -----------                         --------------------
|       ^
|       |
|  -----------                         -------------------------
|  | shared2 | ------ references ----> | root's buffer (freed) |
|  -----------                         -------------------------
| ```
|
| Here is a reproduction script for the situation this commit fixes.
|
| ```ruby
| a = ('a' * 24).strip.freeze.strip
| -a
| p a
| 4.times { GC.start }
| p a
| ```
|
| - string.c (str_duplicate): always share with the root string when the original is a shared string.
| - test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string.
|
| [Bug #15792]
|
| Closes: https://github.com/ruby/ruby/pull/2159
* Revert "UTF-8 is one of byte based encodings"Nobuyoshi Nakada2019-05-061-1/+1
| This reverts commit 5776ae347540ac19c40d146a3566a806cd176bf1.
|
| Mistook `max` for `min`.
* Improve documentation for String#{dump,undump}Marcus Stollsteimer2019-05-051-4/+6
|
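For readers unfamiliar with the pair of methods whose documentation this commit touches, here is a minimal round-trip sketch (the sample string is arbitrary):

```ruby
original = "caf\u00e9\tbar\n"

# dump returns a printable, quoted form in which non-ASCII and special
# characters are escaped.
dumped = original.dump

# undump is the inverse operation: it parses the dumped form back.
restored = dumped.undump

puts dumped
puts restored == original
```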
* * expand tabs.git2019-05-031-2/+2
|
* Improve performance of case-conversion methodsNobuyoshi Nakada2019-05-031-57/+160
|
* UTF-8 is one of byte based encodingsNobuyoshi Nakada2019-05-031-2/+2
|
* * expand tabs.git2019-05-021-2/+2
|
* Fix potential memory leakNobuyoshi Nakada2019-05-021-17/+32
|