path: root/string.c
Commit message | Author | Age | Files | Lines
* Guard match from GC when scanning string | Peter Zhu | 2023-11-27 | 1 | -11/+17
  We need to guard match from GC because otherwise it could end up being reclaimed or moved in compaction.
* Specialize String#dup | Jean Boussier | 2023-11-20 | 1 | -0/+13
  `String#+@` is 2-3 times faster than `String#dup` because it can go directly through `rb_str_dup` instead of the generic, much slower `rb_obj_dup`. This fact led to the existence of the ugly `Performance/UnfreezeString` RuboCop performance rule, which encourages users to rewrite the much more readable and convenient `"foo".dup` into the ugly `(+"foo")`. Let's make that RuboCop rule useless.

  ```
  compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22]
  last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
  built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
  warming up..

  |       |compare-ruby|built-ruby|
  |:------|-----------:|---------:|
  |uplus  |     16.312M|   16.332M|
  |       |           -|     1.00x|
  |dup    |      5.912M|   16.329M|
  |       |           -|     2.76x|
  ```
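A minimal Ruby sketch of the two spellings this commit compares, assuming any Ruby with unary `String#+@` (2.3 or later). Both return a mutable string from a frozen one; the throughput difference in the table comes from the C-level `rb_str_dup` versus `rb_obj_dup` paths and is not observable from this snippet.

```ruby
# Both spellings turn a frozen String into a mutable one.
frozen = "foo".freeze

via_dup   = frozen.dup # the readable form this commit speeds up
via_uplus = +frozen    # the form Performance/UnfreezeString recommends

via_dup << "bar"
via_uplus << "baz"

puts via_dup        # foobar
puts via_uplus      # foobaz
puts frozen.frozen? # true

# Unary plus returns the receiver itself when it is already mutable.
mutable = String.new("qux")
puts((+mutable).equal?(mutable)) # true
```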
* String#force_encoding don't clear coderange if encoding is unchanged | Jean Boussier | 2023-11-09 | 1 | -1/+17
  Some code out there blindly calls `force_encoding` without checking what the original encoding was, which needlessly clears the coderange. If the String is big, this can be a rather costly mistake. For instance, the `rack-utf8_sanitizer` gem does this on request bodies.
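On Rubies that predate this change, callers can avoid the needless coderange reset by checking the encoding first. A minimal sketch of that guard; `ensure_utf8` is a hypothetical helper, not part of Ruby or `rack-utf8_sanitizer`, and the coderange cache itself is internal and is not inspected here.

```ruby
# Hypothetical helper: skip force_encoding when the encoding already matches,
# so the string's cached coderange is not discarded on older Rubies.
def ensure_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.force_encoding(Encoding::UTF_8) # relabels and returns str itself
end

body = "caf\u00e9".b                    # binary copy of the UTF-8 bytes
puts body.encoding == Encoding::BINARY  # true
ensure_utf8(body)
puts body.encoding == Encoding::UTF_8   # true
puts body.valid_encoding?               # true
puts ensure_utf8(body).equal?(body)     # true: already UTF-8, returned as-is
```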
* String for string literal is not resizable | Nobuyoshi Nakada | 2023-11-08 | 1 | -1/+1
* Make String.new size pools aware. | Jean Boussier | 2023-11-02 | 1 | -0/+93
  If the required capacity would fit in an embedded string, return one. This can reduce malloc churn for code that uses string buffers.
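From Ruby, the required capacity is what the `capacity:` keyword of `String.new` requests. A minimal buffer-building sketch; whether this particular buffer is served by an embedded string depends on the interpreter build and its size pools, which the snippet does not check.

```ruby
# Request room for 64 bytes up front, so the appends below should fit
# without growing the buffer.
buf = String.new(capacity: 64, encoding: Encoding::UTF_8)
3.times { |i| buf << "line #{i}\n" }

print buf         # "line 0", "line 1", "line 2", one per line
puts buf.bytesize # 21
```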
* [DOC] Missing comment markers | Nobuyoshi Nakada | 2023-09-27 | 1 | -1/+1
* [Bug #19902] Update the coderange regarding the changed region | Nobuyoshi Nakada | 2023-09-26 | 1 | -0/+27
* Use end of char boundary in start_with? | John Hawthorn | 2023-09-01 | 1 | -2/+2
  Previously, we used the next character following the found prefix to determine whether the match ended on a broken character. This caused surprising behaviour when a valid character was followed by a UTF-8 continuation byte.

  This commit changes the behaviour to instead look for the end of the last character in the prefix.

  [Bug #19784]

  Co-authored-by: ywenc <ywenc@github.com>
  Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
* [Bug #19784] Fix behaviors against prefix with broken encoding | Nobuyoshi Nakada | 2023-08-26 | 1 | -13/+43
  - String#start_with?
  - String#delete_prefix
  - String#delete_prefix!
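The methods listed above can be exercised as follows. This sketch asserts only the well-formed cases; for a prefix that stops partway through a multibyte character it prints the result instead of stating one, since that behavior is exactly what Bug #19784 changed across releases.

```ruby
s = "\u3042\u3044"                          # two characters, 3 bytes each in UTF-8
puts s.start_with?("\u3042")                # true
puts s.delete_prefix("\u3042") == "\u3044"  # true

t = s.dup
t.delete_prefix!("\u3042")                  # mutating variant
puts t == "\u3044"                          # true

# A prefix made of only the first byte of "\u3042" (0xE3): invalid UTF-8.
broken = "\xE3"
puts broken.valid_encoding?                 # false
p s.start_with?(broken)                     # depends on the Ruby version (Bug #19784)
```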
* Introduce `at_char_boundary` function | Nobuyoshi Nakada | 2023-08-26 | 1 | -5/+4
* Fix premature string collection during append | Alan Wu | 2023-08-23 | 1 | -0/+2
  Previously, the following crashed due to use-after-free with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl):

  ```ruby
  str = 'a' * (32*1024*1024)
  p({z: str})
  ```

  32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values. Under a debugger, one can see the `str2` of rb_str_buf_append() getting prematurely collected while str_buf_cat4() allocates capacity.

  Add GC guards so the buffer of `str2` lives across the GC run initiated in str_buf_cat4().

  [Bug #19792]
* Use STR_EMBED_P instead of testing STR_NOEMBED | Peter Zhu | 2023-08-22 | 1 | -9/+9
* Don't check for STR_NOEMBED in rb_fstring | Peter Zhu | 2023-08-18 | 1 | -1/+2
  We don't need to check for STR_NOEMBED because the check above for STR_EMBED_P means that it can never be false.
* [DOC] Don't suppress autolinks (#8208) | Burdette Lamar | 2023-08-11 | 1 | -31/+31
* No computing embed_capa_max in str_subseq | Kunshan Wang | 2023-08-03 | 1 | -4/+19
  Fix str_subseq so that it does not attempt to predict the size of the object returned by str_alloc_heap.
* Fill terminator properly | Nobuyoshi Nakada | 2023-07-28 | 1 | -1/+3
* [Bug #19769] Fix range of size 1 in `String#tr` | alexandre184 | 2023-07-15 | 1 | -2/+4
* Make the string index functions closer to symmetric | Nobuyoshi Nakada | 2023-07-09 | 1 | -52/+32
  So that irregular parts may be more noticeable.
* Make `rb_str_rindex` return byte index | Nobuyoshi Nakada | 2023-07-09 | 1 | -3/+7
  Leave it to callers to convert the byte index to a char index, as with `rb_str_index`, so that `rb_str_rpartition` does not need to re-convert the char index to a byte index.
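`rb_str_rindex` and `rb_str_index` are C functions, but the char-index versus byte-index distinction they deal with is visible from Ruby whenever a string contains multibyte characters. A minimal illustration that converts a character index into a byte offset by hand:

```ruby
s = "h\u00e9llo"                   # "\u00e9" takes 2 bytes in UTF-8
char_idx = s.rindex("l")           # position counted in characters
byte_idx = s[0, char_idx].bytesize # same position counted in bytes

puts char_idx                      # 3
puts byte_idx                      # 4

# rpartition splits around the last match of the separator.
head, sep, tail = s.rpartition("l")
puts head == "h\u00e9l"            # true
```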
* [Bug #19763] Raise same message exception for regexp | Nobuyoshi Nakada | 2023-07-09 | 1 | -2/+3
* Ensure the byte position is a valid boundary | Nobuyoshi Nakada | 2023-06-28 | 1 | -20/+11
* [Bug #19748] Fix out-of-bound access in `String#byteindex` | Nobuyoshi Nakada | 2023-06-28 | 1 | -10/+7
* [Bug #19746] `String#index` with regexp should clear `$~` unless matched | Nobuyoshi Nakada | 2023-06-28 | 1 | -2/+6
* [DOC] Regexp doc (#7923) | Burdette Lamar | 2023-06-20 | 1 | -3/+3
* Assign into optimal size pools using String#split("") | Matt Valentine-House | 2023-06-09 | 1 | -1/+4
  When String#split is used with an empty string as the field separator, it effectively splits the original string into chars, and there is a pre-existing fast path for this using SPLIT_TYPE_CHARS.

  However, this path creates an empty array in the smallest size pool and grows from there, despite already knowing the size of the desired array.

  This commit pre-allocates an array of the correct size in this case, in order to allow the arrays to be embedded and avoid being allocated in the transient heap.
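A quick Ruby check of the case this fast path targets: splitting on an empty separator yields one single-character string per character, the same result as `String#chars`. Whether the resulting array is embedded is an internal detail and is not observable here.

```ruby
word = "hello"
parts = word.split("")   # empty separator: one element per character
p parts                  # ["h", "e", "l", "l", "o"]
puts parts == word.chars # true
```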
* Unify length field for embedded and heap strings (#7908) | Peter Zhu | 2023-06-06 | 1 | -56/+48
  * Unify length field for embedded and heap strings

    The length field is of the same type and position in RString for both embedded and heap allocated strings, so we can unify it.

  * Remove RSTRING_EMBED_LEN
* [DOC] Update flags doc for strings | Peter Zhu | 2023-06-05 | 1 | -1/+0
  The length of an embedded string is no longer in the flags.
* Simplify duplicated code | Peter Zhu | 2023-06-01 | 1 | -7/+3
  The capacity of the string can be calculated using the str_capacity function.
* Don't refetch ptr and len | Peter Zhu | 2023-06-01 | 1 | -4/+0
  The call to RSTRING_GETMEM already fetched the pointer and length, so we don't need to fetch them again.
* Remove dead code in string.c | Peter Zhu | 2023-05-26 | 1 | -11/+0
  The STR_DEC_LEN macro is not used.
* [Feature #19474] Refactor NEWOBJ macros | Matt Valentine-House | 2023-04-06 | 1 | -8/+8
  NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec
* [Feature #19579] Remove !USE_RVARGC code (#7655) | Peter Zhu | 2023-04-04 | 1 | -74/+7
  Remove !USE_RVARGC code

  [Feature #19579]

  The Variable Width Allocation feature was turned on by default in Ruby 3.2. Since then, we haven't received bug reports or backports to the non-Variable Width Allocation code paths, so we assume that nobody is using it. We also don't plan on maintaining the non-Variable Width Allocation code, so we are going to remove it.
* RJIT: Optimize String#bytesize | Takashi Kokubun | 2023-03-18 | 1 | -1/+1
* Stop exporting symbols for MJIT | Takashi Kokubun | 2023-03-06 | 1 | -7/+7
* Optimize String#getbyte | Takashi Kokubun | 2023-03-05 | 1 | -1/+1
* rb_str_modify_expand: clear the string coderange | Rômulo Ceccon | 2023-03-03 | 1 | -0/+1
  [Bug #19468]

  b0b9f7201acab05c2a3ad92c3043a1f01df3e17f erroneously stopped clearing the coderange. Since `rb_str_modify` clears it, `rb_str_modify_expand` should too.
* Fix spelling (#7389) | John Bampton | 2023-02-27 | 1 | -1/+1
* Symbol#end_with? accepts Strings only | Adam Daniels | 2023-02-27 | 1 | -1/+1
  Regular expressions are not supported (same as String#end_with?).
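A short illustration of the restriction described above: `Symbol#end_with?` accepts one or more string suffixes, while a `Regexp` argument is rejected with a `TypeError`, as with `String#end_with?`.

```ruby
puts :foobar.end_with?("bar")        # true
puts :foobar.end_with?("baz", "bar") # true: any listed suffix may match

begin
  :foobar.end_with?(/bar/)
rescue TypeError => e
  puts "TypeError: #{e.message}"     # a Regexp is not converted to a String
end
```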
* Remove (newly unneeded) remarks about aliases | BurdetteLamar | 2023-02-19 | 1 | -19/+0
* [DOC] Small adjustment for String method docs | zverok | 2023-02-19 | 1 | -1/+13
  * Hide freeze method (no useful docs, same as Object#freeze)
  * Add dedup to call-seq of str_uminus
* Rename rb_str_splice_{0,1} -> rb_str_update_{0,1} | Matt Valentine-House | 2023-02-09 | 1 | -6/+6
* Remove alias macro rb_str_splice | Matt Valentine-House | 2023-02-09 | 1 | -7/+5
* Merge gc.h and internal/gc.h | Matt Valentine-House | 2023-02-09 | 1 | -1/+0
  [Feature #19425]
* Mark "mapping_buffer" as write barrier protected | Jean Boussier | 2023-02-03 | 1 | -1/+2
  It doesn't hold any references, so it can be marked as protected.
* [Feature #19314] Add new arguments of String#bytesplice | Shugo Maeda | 2023-01-20 | 1 | -31/+76
  bytesplice(index, length, str, str_index, str_length) -> string
  bytesplice(range, str, str_range) -> string

  In these forms, the content of +self+ is replaced by str.byteslice(str_index, str_length) or str.byteslice(str_range); however, the substring of +str+ is not allocated as a new string.
* String#bytesplice should return self | Shugo Maeda | 2023-01-19 | 1 | -2/+2
  In Feature #19314, we concluded that the return value of String#bytesplice should be changed from the source string to the receiver, because the source string is useless and confusing when extra arguments are added.

  This change should be included in Ruby 3.2.1.
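A sketch of the extended `bytesplice` forms and the return-self behavior from the two commits above. It assumes the five-argument and `str_range` forms first ship in Ruby 3.3, so the calls are guarded by a version check and the block only prints a notice on older interpreters.

```ruby
# Version check without RubyGems: compare [major, minor] numerically.
supports_extended = (RUBY_VERSION.split(".").map(&:to_i) <=> [3, 3]) >= 0

s = +"hello world"
if supports_extended
  src = "HELLO THERE"
  # Replace bytes 0...5 of s with src.byteslice(0, 5); that substring of src
  # is not allocated as a separate String.
  ret = s.bytesplice(0, 5, src, 0, 5)
  puts s             # HELLO world
  puts ret.equal?(s) # true: bytesplice returns the receiver

  # Range form: replace bytes 6..10 of s with src.byteslice(6..10).
  s.bytesplice(6..10, src, 6..10)
  puts s             # HELLO THERE
else
  puts "the extended bytesplice forms are assumed to need Ruby 3.3+"
end
```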
* Use str_enc_copy_direct to improve performance | Matt Valentine-House | 2023-01-13 | 1 | -1/+1
  str_enc_copy_direct copies the string encoding over without checking the frozen status of the string. Because we know that we're safe here (we only use this function when interpolating strings on the stack via a concatstrings instruction), we can safely skip this check.
* Remove MIN_PRE_ALLOC_SIZE from Strings. | Matt Valentine-House | 2023-01-13 | 1 | -13/+4
  This optimisation is no longer helpful now that we use VWA to allocate strings in larger size pools where they can be embedded.
* Add str_enc_copy_direct | Peter Zhu | 2023-01-12 | 1 | -5/+19
  This commit adds str_enc_copy_direct, which is like str_enc_copy but does not check the frozen status of str1 and does not check the validity of the encoding of str2. This makes certain string operations ~5% faster.

  ```ruby
  require "benchmark"

  puts(Benchmark.measure do
    100_000_000.times do
      "a".downcase
    end
  end)
  ```

  Before this patch:

  ```
  7.587598 0.040858 7.628456 ( 7.669022)
  ```

  After this patch:

  ```
  7.133128 0.039809 7.172937 ( 7.183124)
  ```
* Set STR_SHARED_ROOT flag on root of string | Peter Zhu | 2023-01-09 | 1 | -0/+1