path: root/string.c
Commit message | Author | Age | Files | Lines
* Reuse match data | Nobuyoshi Nakada | 2019-07-28 | 1 | -2/+5
    * string.c (rb_str_split_m): reuse occupied match data. [Bug #16024]
* Occupy match data | Nobuyoshi Nakada | 2019-07-27 | 1 | -1/+3
    * string.c (rb_str_split_m): occupy match data so that it is not modified while yielding to the block. [Bug #16024]
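The two entries above concern `String#split` called with a block. As a minimal sketch of the kind of call involved (the helper name `split_with_inner_match` is hypothetical and not taken from the commit), the block below runs its own regexp match while `split` is still iterating:

```ruby
# Hypothetical helper: collect the pieces yielded by String#split while the
# block performs an unrelated regexp match of its own.
def split_with_inner_match(text, separator)
  pieces = []
  text.split(separator) do |piece|
    piece =~ /\w/      # inner match; updates the match data visible in this scope
    pieces << piece
  end
  pieces
end

p split_with_inner_match("a,b,c", /,/)
```

On a Ruby that includes these changes, the pieces are expected to be unaffected by the inner match.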
* string.c (str_succ): refactoring | Yusuke Endoh | 2019-07-14 | 1 | -3/+3
    Use a more communicative variable name
* string.c (str_succ): remove an unnecessary assignment | Yusuke Endoh | 2019-07-14 | 1 | -1/+0
    This change will suppress Coverity Scan warnings
* * expand tabs. | git | 2019-07-14 | 1 | -1/+1
* Prefer `rb_error_arity` to `rb_check_arity` when it can be used | Yusuke Endoh | 2019-07-14 | 1 | -1/+1
* Check that String#scrub block does not modify receiver | Jeremy Evans | 2019-07-02 | 1 | -7/+12
    Similar to the check used for String#gsub. Can fix possible segfault.

    Fixes [Bug #15941]
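Since the commit makes `String#scrub` check that its block leaves the receiver alone, a safe pattern is to build the replacement only from the block argument. The sketch below assumes a hypothetical helper name, `scrub_with_marker`, and does not show the failure case, because the commit message does not state which exception is raised:

```ruby
# Hypothetical helper: replace each invalid byte sequence with a hex marker.
# The block reads only its argument and never touches the receiver.
def scrub_with_marker(text)
  text.scrub { |bad_bytes| "<#{bad_bytes.unpack1('H*')}>" }
end

p scrub_with_marker("abc\xE3def")
```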
* Make String#-@ not freeze receiver if called on unfrozen subclass instance | Jeremy Evans | 2019-07-02 | 1 | -0/+3
    rb_fstring behavior in this case is to freeze the receiver. I'm not sure if that should be changed, so this takes the conservative approach of duping the receiver in String#-@ before passing to rb_fstring.

    Fixes [Bug #15926]
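A minimal sketch of the behavior described above, assuming a Ruby that includes this change. `Label` is a hypothetical `String` subclass introduced only for illustration:

```ruby
# Hypothetical String subclass used only for this illustration.
class Label < String; end

# Returns [receiver frozen?, deduplicated copy frozen?, contents equal?].
def dedup_report
  label = Label.new("abc")   # unfrozen subclass instance
  interned = -label          # String#-@ returns a frozen, deduplicated string
  [label.frozen?, interned.frozen?, interned == "abc"]
end

p dedup_report
```

The receiver should stay mutable, while the value returned by `-label` is frozen.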
* * expand tabs. | git | 2019-06-29 | 1 | -2/+2
* Fixed String#grapheme_clusters with wide encodings | Nobuyoshi Nakada | 2019-06-29 | 1 | -2/+23
    * string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources for wide-char encodings. [Bug #15965]
    * regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.
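One way to exercise this fix is to compare grapheme cluster counts for the same text in a byte-based and a wide encoding. This is only a sketch: the helper name `cluster_count` and the sample text are assumptions, not taken from the bug report.

```ruby
# Hypothetical helper: number of grapheme clusters after transcoding.
def cluster_count(text, encoding)
  text.encode(encoding).grapheme_clusters.size
end

sample = "a\u0308b"   # "a" + combining diaeresis, then "b"
p cluster_count(sample, "UTF-8")
p cluster_count(sample, "UTF-16LE")
```

With the fix in place, both encodings should report the same number of clusters.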
* Resize capacity for fstring | John Hawthorn | 2019-06-26 | 1 | -0/+3
    When a string is #frozen, its capacity is resized to fit (if it is much larger), since we know it will no longer be mutated.

        > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
        {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...

        > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
        {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...

    (ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)

    Previously, if we dedup into an fstring, using String#-@, capacity would not be reduced.

        > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
        {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...

    This commit makes rb_fstring call rb_str_resize, the same as rb_str_freeze does.

    Closes: https://github.com/ruby/ruby/pull/2256
* * expand tabs. | git | 2019-06-21 | 1 | -1/+1
* Get rid of undefined behavior | Nobuyoshi Nakada | 2019-06-21 | 1 | -1/+1
    * string.c (rb_str_sub_bang): str and repl can be same. [Bug #15946]
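The case named in this entry, where the receiver of `sub!` is also passed as the replacement, can be sketched as follows. The helper name `substitute_with_self` is hypothetical.

```ruby
# Hypothetical helper: call sub! with the receiver itself as the replacement.
def substitute_with_self(text, pattern)
  copy = text.dup            # work on an unfrozen copy
  copy.sub!(pattern, copy)   # str and repl are the same object
  copy
end

p substitute_with_self("abc", "b")
```

The first match of the pattern is replaced by the string's own earlier contents, so `"abc"` with pattern `"b"` becomes `"a"`, then `"abc"`, then `"c"`.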
* New buffer for shared string | Nobuyoshi Nakada | 2019-06-19 | 1 | -0/+9
    * string.c (rb_str_init): allocate new buffer if the string is shared. [Bug #15937]
* Preserve the string content at self-copying | Nobuyoshi Nakada | 2019-06-19 | 1 | -1/+4
    * string.c (rb_str_init): preserve the embedded content when self-copying with a capacity. [Bug #15937]
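The self-copying case can be reproduced from Ruby by re-running `initialize` on an existing string with itself as the source. This is a sketch under the assumption that the interpreter includes the fix. The helper name `reinitialize_with_capacity` is hypothetical, and `send` is used because `initialize` is private.

```ruby
# Hypothetical helper: re-initialize a string from itself with a new capacity.
def reinitialize_with_capacity(text, capacity)
  buffer = text.dup
  buffer.send(:initialize, buffer, capacity: capacity)  # self-copy
  buffer
end

p reinitialize_with_capacity("mystring", 1000)
```

The content is expected to survive the call even though the buffer is reallocated for the larger capacity.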
* Fix memory leak | Nobuyoshi Nakada | 2019-06-18 | 1 | -1/+4
    * string.c (str_make_independent_expand): free independent buffer. [Bug #15935]

    Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
* * expand tabs. | git | 2019-06-18 | 1 | -1/+1
* String#b: Don't depend on dependent string | Alan Wu | 2019-06-18 | 1 | -4/+11
    Registering a string that depends on a dependent string as fstring can lead to use-after-free. See c06ddfe and 3f95620 for details.

    The following script triggers use-after-free on trunk, 2.4.6, 2.5.5 and 2.6.3. Credits to @wanabe for using eval as a cross-version way of registering a fstring.

    ```ruby
    a = ('j' * 24).b.b
    eval('', binding, a)
    p a
    4.times { GC.start }
    p a
    ```

    - string.c (str_replace_shared_without_enc): when given a dependent string, depend on the root of the dependent string.

    [Bug #15934]
* Fix memory leak | Nobuyoshi Nakada | 2019-06-16 | 1 | -0/+7
    * string.c (str_replace_shared_without_enc): free previous buffer before replaced.
    * parse.y (gettable): make sure in advance that the `__FILE__` object shares a fstring, to get rid of replacement with the fstring later. TODO: this hack may be needed in other places.

    [Bug #15916]

    Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
* Symbol just represents a name | Nobuyoshi Nakada | 2019-05-14 | 1 | -2/+2
* str_duplicate: Don't share with a frozen shared string | Alan Wu | 2019-05-09 | 1 | -9/+7
    This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a. Before this commit, it was possible to create a shared string which shares with another shared string by passing a frozen shared string to `str_duplicate`.

    Such string looks like:

    ```
     --------                    -----------------
     | root | ------ owns -----> | root's buffer |
     --------                    -----------------
         ^                             ^   ^
     -----------                       |   |
     | shared1 | ------ references -----   |
     -----------                           |
         ^                                 |
     -----------                           |
     | shared2 | ------ references ---------
     -----------
    ```

    This is bad news because `rb_fstring(shared2)` can make `shared1` independent, which severs the reference from `shared1` to `root`:

    ```c
    /* from fstr_update_callback() */
    str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
    if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
        str_make_independent(str);  /* no frozen check */
    }
    ```

    If `shared1` was the only reference to `root`, then `root` can be reclaimed by the GC, leaving `shared2` in a corrupted state:

    ```
     -----------                         --------------------
     | shared1 | -------- owns --------> | shared1's buffer |
     -----------                         --------------------
          ^
          |
     -----------                         -------------------------
     | shared2 | ------ references ----> | root's buffer (freed) |
     -----------                         -------------------------
    ```

    Here is a reproduction script for the situation this commit fixes.

    ```ruby
    a = ('a' * 24).strip.freeze.strip
    -a
    p a
    4.times { GC.start }
    p a
    ```

    - string.c (str_duplicate): always share with the root string when the original is a shared string.
    - test_rb_str_dup.rb: specifically test `rb_str_dup` to make sure it does not try to share with a shared string.

    [Bug #15792]

    Closes: https://github.com/ruby/ruby/pull/2159
* Revert "UTF-8 is one of byte based encodings" | Nobuyoshi Nakada | 2019-05-06 | 1 | -1/+1
    This reverts commit 5776ae347540ac19c40d146a3566a806cd176bf1.

    Mistaken `max` as `min`.
* Improve documentation for String#{dump,undump} | Marcus Stollsteimer | 2019-05-05 | 1 | -4/+6
* * expand tabs. | git | 2019-05-03 | 1 | -2/+2
* Improve performance of case-conversion methods | Nobuyoshi Nakada | 2019-05-03 | 1 | -57/+160
* UTF-8 is one of byte based encodings | Nobuyoshi Nakada | 2019-05-03 | 1 | -2/+2
* * expand tabs. | git | 2019-05-02 | 1 | -2/+2
* Fix potential memory leak | Nobuyoshi Nakada | 2019-05-02 | 1 | -17/+32
* this variable is not guaranteed aligned | Urabe, Shyouhei | 2019-04-29 | 1 | -1/+1
    No problem for unaligned-ness because we never dereference.
* fix typo | Urabe, Shyouhei | 2019-04-29 | 1 | -1/+1
* Get rid of indirect sharing | Nobuyoshi Nakada | 2019-04-27 | 1 | -3/+8
    * string.c (str_duplicate): share the root shared string if the original string is already sharing, so that all shared strings refer to the root shared string directly. Indirect sharing can cause a dangling pointer. [Bug #15792]
* string.c: warn non-nil $; | nobu | 2019-04-18 | 1 | -0/+6
    * string.c (rb_str_split_m): warn use of non-nil $;.
    * string.c (rb_fs_setter): warn when set to non-nil value.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: improve splitting into chars | nobu | 2019-04-17 | 1 | -10/+20
    * string.c (rb_str_split_m): improve splitting into chars by an empty string, without a regexp.

    Comparison:
                           to_chars-1
              built-ruby:   1273527.6 i/s
            compare-ruby:    189423.3 i/s - 6.72x  slower

                          to_chars-10
              built-ruby:    120993.5 i/s
            compare-ruby:     37075.8 i/s - 3.26x  slower

                         to_chars-100
              built-ruby:     15646.4 i/s
            compare-ruby:      4012.1 i/s - 3.90x  slower

                        to_chars-1000
              built-ruby:      1295.1 i/s
            compare-ruby:       408.5 i/s - 3.17x  slower

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
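The call pattern this entry optimizes is splitting by an empty string, which yields one element per character. A minimal sketch (the helper name `chars_via_split` is hypothetical) compares it with `String#chars`:

```ruby
# Hypothetical helper: split a string into characters using an empty separator.
def chars_via_split(text)
  text.split("")
end

word = "h\u00e9llo"
p chars_via_split(word)
p chars_via_split(word) == word.chars
```

Both forms split on character boundaries rather than bytes, so multibyte characters stay intact.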
* string.c: [DOC] fix reference to sprintf [ci skip] | nobu | 2019-03-20 | 1 | -1/+1
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] remove unnecessary markups [ci skip] | nobu | 2019-03-20 | 1 | -97/+98
    * string.c: remove <code> markups, which are not only unnecessary but also prevented cross-references.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] fix indent [ci skip] | nobu | 2019-03-20 | 1 | -43/+43
    * string.c (rb_str_crypt): fix indent so that the whole list is not rendered verbatim.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: respect the actual encoding | nobu | 2019-03-05 | 1 | -2/+3
    * string.c (rb_enc_str_coderange): respect the actual encoding if a BOM is present, and scan for the actual code range. [ruby-core:91662] [Bug #15635]

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * string.c (chopped_length): early return for empty strings | nobu | 2019-02-07 | 1 | -1/+1
    [Bug #11391]

    From: Franck Verrot <franck@verrot.fr>

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Add more examples of `String#dump` | kazu | 2019-01-22 | 1 | -2/+3
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Improvements to documentation. | samuel | 2019-01-21 | 1 | -4/+4
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66897 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_dump): Fix the rdoc | mame | 2019-01-21 | 1 | -1/+4
    * Officially state that String#dump is intended for round-trip.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
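The round-trip property mentioned here pairs `String#dump` with `String#undump`. A minimal sketch, with a hypothetical helper name `dump_round_trip?`:

```ruby
# Hypothetical helper: true when dump followed by undump restores the text.
def dump_round_trip?(text)
  text.dump.undump == text
end

sample = "caf\u00e9\n\t\"quoted\""
puts sample.dump               # printable, escaped representation
p dump_round_trip?(sample)
```

`dump` escapes non-printing and non-ASCII characters, and `undump` reverses those escapes.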
* Use `&` instead of `modulo` | nobu | 2019-01-15 | 1 | -1/+1
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66830 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* setbyte / ungetbyte allow out-of-range integers | shyouhei | 2019-01-15 | 1 | -5/+4
    * string.c: String#setbyte to accept arbitrary integers [Bug #15460]
    * io.c: ditto for IO#ungetbyte
    * ext/stringio/stringio.c: ditto for StringIO#ungetbyte

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
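A short sketch of `String#setbyte` with values outside 0..255, assuming a Ruby that includes this change together with the `&` masking from the entry above it. The helper name `set_first_byte` is hypothetical.

```ruby
# Hypothetical helper: return a copy of text whose first byte is set to value.
def set_first_byte(text, value)
  copy = text.dup
  copy.setbyte(0, value)   # only the low 8 bits of value are stored
  copy
end

p set_first_byte("abc", 0x141)            # 0x141 & 0xff == 0x41, the byte for "A"
p set_first_byte("abc", -1).getbyte(0)    # -1 & 0xff == 255
```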
* Defer escaping control char in error messages | nobu | 2019-01-08 | 1 | -15/+22
    * eval_error.c (print_errinfo): defer escaping control char in error messages until writing to stderr, instead of quoting at building the message. [ruby-core:90853] [Bug #15497]

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: remove the deprecation warnings of `String#bytes` with block | mame | 2018-12-26 | 1 | -18/+1
    And its friends: lines, chars, grapheme_clusters, and codepoints.

    [Feature #6670] [ruby-core:90728]

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Revert "string.c: remove the deprecation warnings of `String#bytes` with block" | mame | 2018-12-26 | 1 | -1/+18
    Forgot to write the ticket number in the commit log...

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: remove the deprecation warnings of `String#bytes` with block | mame | 2018-12-26 | 1 | -18/+1
    And its friends: lines, chars, grapheme_clusters, and codepoints.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] fix typos | stomar | 2018-12-12 | 1 | -5/+5
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* implement special behavior for Georgian for String#capitalize | duerst | 2018-12-09 | 1 | -0/+2
    The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, there is no usage of capitalizing the first letter in a word or a sentence. Words with mixed capitalization are not used at all.

    We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase for the whole string, then using titlecase on the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), this results in String#capitalize being equivalent to String#downcase for Georgian. This avoids undesirable mixed case.

    * enc/unicode.c: Actual implementation
    * string.c: Add mention of this special case for documentation
    * test/ruby/enc/test_case_mapping.rb: Add two tests, a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC.
    * test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data.

    Together with r65933, this closes issue #14839.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
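The equivalence stated above, that `String#capitalize` behaves like `String#downcase` for Georgian, can be checked with a small sketch. The helper name `capitalize_matches_downcase?` and the sample words are assumptions, and a Ruby whose Unicode tables include MTAVRULI is required.

```ruby
# Hypothetical helper: true when capitalize and downcase agree for a word.
def capitalize_matches_downcase?(word)
  word.capitalize == word.downcase
end

mtavruli  = "\u1C93\u1C90"   # MTAVRULI DON, AN ('uppercase')
mkhedruli = "\u10D3\u10D0"   # Mkhedruli DON, AN (lowercase)
mixed     = "\u1C93\u10D0"   # nonsensical mixed-case combination

p [mtavruli, mkhedruli, mixed].map { |word| capitalize_matches_downcase?(word) }
```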
* suppress warning: unused variable 'vbits' | naruse | 2018-12-06 | 1 | -1/+0
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e