path: root/string.c
Commit message | Author | Age | Files | Lines
* * transcode.c: new file to provide encoding conversion features.
    code contributed by Martin Duerst.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14172 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-12-10, Files: 1, Lines: -2/+2
* * re.c (rb_reg_search): return byte offset. [ruby-dev:32452]
  * re.c (rb_reg_match, rb_reg_match2, rb_reg_match_m): convert byte offset to char index.
  * string.c (rb_str_index): return byte offset. [ruby-dev:32472]
  * string.c (rb_str_split_m): calculate in byte offset.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14171 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-12-10, Files: 1, Lines: -17/+31
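The difference between a byte offset and a character index, which the entry above moves between, can be seen from the Ruby side. This is a minimal sketch assuming a current Ruby and a UTF-8 string; it uses only the public String API and does not show the internals of rb_str_index or rb_reg_search.

```ruby
# Three 3-byte UTF-8 characters followed by ASCII text.
s = "\u{65E5 672C 8A9E}abc"

# String#index reports a character index.
char_index = s.index("abc")

# The byte offset of the same position is the byte size of the prefix.
byte_offset = s[0, char_index].bytesize

puts char_index   # 3
puts byte_offset  # 9
```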
* * re.c (rb_reg_expr_str): use \xHH instead of \OOO.
  * regerror.c (to_ascii): ditto.
    (onig_snprintf_with_pattern): ditto.
    (onig_snprintf_with_pattern): ditto.
  * string.c (rb_str_inspect): ditto.
    (rb_str_dump): ditto.
  * parse.y (parser_yylex): ditto.
  * ruby.c (proc_options): ditto.
  * file.c (rb_f_test): ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-12-09, Files: 1, Lines: -2/+2
* * string.c (tr_find): returns true if no characters to be removed is specified.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-12-09, Files: 1, Lines: -2/+2
* * string.c (tr_trans): get rid of segfaults when has multibytes but source sets have no multibytes.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-12-09, Files: 1, Lines: -2/+2
* * encoding.c (rb_enc_mbclen): make it never fail.
    (rb_enc_nth): don't check the return value of rb_enc_mbclen.
    (rb_enc_strlen): ditto.
    (rb_enc_precise_mbclen): return needmore(1) if e <= p.
    (rb_enc_get_ascii): new function for extracting ASCII character.
  * include/ruby/encoding.h (rb_enc_get_ascii): declared.
  * include/ruby/regex.h (ismbchar): removed.
  * re.c (rb_reg_expr_str): use rb_enc_get_ascii.
    (unescape_escaped_nonascii): use rb_enc_precise_mbclen to determine the termination of escaped non-ASCII character.
    (unescape_nonascii): use rb_enc_precise_mbclen.
    (rb_reg_quote): use rb_enc_get_ascii.
    (rb_reg_regsub): use rb_enc_get_ascii.
  * string.c (rb_str_reverse): don't check the return value of rb_enc_mbclen.
    (rb_str_split_m): don't call rb_enc_mbclen with e <= p.
  * parse.y (is_identchar): use ISASCII.
    (parser_ismbchar): removed.
    (parser_precise_mbclen): new macro.
    (parser_isascii): new macro.
    (parser_tokadd_mbchar): use parser_precise_mbclen to check invalid character precisely.
    (parser_tokadd_string): use parser_isascii.
    (parser_yylex): ditto.
    (is_special_global_name): don't call is_identchar with e <= p.
    (rb_enc_symname_p): ditto. [ruby-dev:32455]
  * ext/tk/sample/tkextlib/vu/canvSticker2.rb: remove coding cookie because the encoding is not UTF-8. [ruby-dev:32475]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-12-08, Files: 1, Lines: -4/+4
* * encoding.c (rb_enc_precise_mbclen): new function for mbclen with validation.
  * include/ruby/encoding.h (rb_enc_precise_mbclen): declared.
    (MBCLEN_CHARFOUND): new macro.
    (MBCLEN_INVALID): new macro.
    (MBCLEN_NEEDMORE): new macro.
  * include/ruby/oniguruma.h (OnigEncodingTypeST): replace mbc_enc_len by precise_mbc_enc_len.
    (ONIGENC_PRECISE_MBC_ENC_LEN): new macro.
    (ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND): new macro.
    (ONIGENC_CONSTRUCT_MBCLEN_INVALID): new macro.
    (ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE): new macro.
    (ONIGENC_MBCLEN_CHARFOUND): new macro.
    (ONIGENC_MBCLEN_INVALID): new macro.
    (ONIGENC_MBCLEN_NEEDMORE): new macro.
    (ONIGENC_MBC_ENC_LEN): use ONIGENC_PRECISE_MBC_ENC_LEN.
  * enc/euc_jp.c: validation implemented.
  * enc/sjis.c: ditto.
  * enc/utf8.c: ditto.
  * string.c (rb_str_inspect): use rb_enc_precise_mbclen for invalid encoding.
    (rb_str_valid_encoding_p): new method String#valid_encoding?.
  * io.c (rb_io_getc): use rb_enc_precise_mbclen.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14119 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-12-06, Files: 1, Lines: -11/+43
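The String#valid_encoding? method introduced by this entry can be exercised as follows. This is a minimal sketch assuming a current Ruby; the truncated byte sequence is an illustrative input chosen here, not one taken from the commit.

```ruby
# A well-formed UTF-8 string.
valid = "abc".dup.force_encoding(Encoding::UTF_8)

# The first two bytes of a three-byte UTF-8 sequence: the character is cut short.
broken = "\xE3\x81".dup.force_encoding(Encoding::UTF_8)

puts valid.valid_encoding?   # true
puts broken.valid_encoding?  # false
```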
* * include/ruby/encoding.h, encoding.c, re.c, string.c, parse.y:
    rename ENC_CODERANGE_SINGLE to ENC_CODERANGE_7BIT.
    rename ENC_CODERANGE_MULTI to ENC_CODERANGE_8BIT.
    Because single byte 8bit character, such as Shift_JIS 1byte katakana, is represented by ENC_CODERANGE_MULTI even if it is not multi byte.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-11-27, Files: 1, Lines: -5/+5
* * include/ruby/encoding.h (rb_enc_str_asciionly_p): declared.
    (rb_enc_str_asciicompat_p): defined.
  * re.c (rb_reg_initialize_str): use rb_enc_str_asciionly_p.
    (rb_reg_quote): return ascii-8bit string if the argument is ascii-only to generate encoding generic regexp if possible.
    (rb_reg_s_union): fix encoding handling. [ruby-dev:32094]
  * string.c (rb_enc_str_asciionly_p): defined.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14013 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-11-25, Files: 1, Lines: -0/+17
* * include/ruby/ruby.h: introduce 2 macros:
    RFLOAT_VALUE(v), DOUBLE2NUM(dbl).
    Rename RFloat#value -> RFloat#double_value.
    Do not touch RFloat#double_value directly.
  * bignum.c, insns.def, marshal.c, math.c, numeric.c, object.c, pack.c, parse.y, process.c, random.c, sprintf.c, string.c, time.c: apply above changes.
  * ext/dl/mkcallback.rb, ext/json/ext/generator/generator.c: ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: ko1, Age: 2007-11-13, Files: 1, Lines: -1/+1
* * string.c (tr_trans): cast to unsigned char after dereference a pointer to a char to avoid SEGV with "\377".tr("a", "b"). on FreeBSD/amd64.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: akr, Age: 2007-11-10, Files: 1, Lines: -1/+1
* * string.c (rb_str_squeeze_bang): initialize squeezing table if no arguments given.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-11-09, Files: 1, Lines: -1/+2
* * string.c (tr_setup_table, tr_trans): fix test failures in test/ruby/test_string.rb
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13834 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: davidflanagan, Age: 2007-11-07, Files: 1, Lines: -18/+20
* * string.c (tr_setup_table): use C array for characters that fit in a byte to gain performance.
  * string.c (rb_str_delete_bang): ditto.
  * string.c (rb_str_squeeze_bang): ditto.
  * string.c (rb_str_count): ditto.
  * string.c (tr_trans): ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13812 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-11-03, Files: 1, Lines: -37/+106
* * string.c (rb_str_substr): performance improvement. [ruby-dev:31806]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13791 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-29, Files: 1, Lines: -11/+34
* * string.c (rb_str_ord): use encoding.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13726 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-16, Files: 1, Lines: -1/+1
* * string.c (rb_str_new4): should copy encoding. a patch from NARUSE, Yui <naruse AT airemix.com>. [ruby-dev:32076]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13714 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-16, Files: 1, Lines: -0/+1
* * encoding.c (rb_cEncoding): new Encoding class.
  * encoding.c (rb_to_encoding, rb_to_encoding_index): helper functions.
  * encoding.c (rb_obj_encoding): return Encoding object now.
  * gc.c (garbage_collect): mark Encoding objects.
  * string.c (rb_str_force_encoding): accept Encoding object as well as encoding name.
  * include/ruby/encoding.h (rb_to_encoding_index, rb_to_encoding): prototypes.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-13, Files: 1, Lines: -11/+2
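From Ruby code, the two argument forms that rb_str_force_encoding accepts after this entry look like this. This is a minimal sketch assuming a current Ruby; the name "no-such-encoding" is a hypothetical invalid name used only to show the rejection path.

```ruby
# Same result whether the encoding is given by name or as an Encoding object.
by_name   = "abc".dup.force_encoding("EUC-JP")
by_object = "abc".dup.force_encoding(Encoding::EUC_JP)
puts by_name.encoding == by_object.encoding  # true

# An unknown encoding name is rejected with ArgumentError.
rejected = false
begin
  "abc".dup.force_encoding("no-such-encoding")
rescue ArgumentError
  rejected = true
end
puts rejected  # true
```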
* * string.c (rb_enc_str_coderange): fixed check for non-ascii.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-10, Files: 1, Lines: -1/+1
* * string.c (rb_str_to_i): update RDoc since base can be any value between 2 and 36. [ruby-talk:272879]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13645 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-10-06, Files: 1, Lines: -1/+1
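The base range documented by this entry can be checked directly with String#to_i. This is a minimal sketch assuming a current Ruby; the sample inputs are illustrative.

```ruby
binary = "101".to_i(2)   # base 2
hex    = "ff".to_i(16)   # base 16
max    = "z".to_i(36)    # base 36, the largest accepted base
puts binary  # 5
puts hex     # 255
puts max     # 35

# A base outside 2..36 is rejected with ArgumentError.
out_of_range = false
begin
  "1".to_i(37)
rescue ArgumentError
  out_of_range = true
end
puts out_of_range  # true
```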
* * encoding.c (rb_enc_register): returns new index or -1 if failed.
  * encoding.c (rb_enc_alias): check if original name is registered.
  * encoding.c (rb_enc_init): register in same order as kcode options in re.c. added new aliases.
  * string.c (rb_str_force_encoding): check if valid encoding name.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13643 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-06, Files: 1, Lines: -1/+10
* * insns.def (opt_eq): get rid of gcc bug.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13641 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-06, Files: 1, Lines: -7/+7
* revert rb_memcmp() change to pacify GCC optimizer
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13623 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-10-04, Files: 1, Lines: -6/+6
* * re.c (rb_memcmp): no longer useful without ruby_ignorecase.
  * re.c (rb_reg_prepare_re): revert recompile condition.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13622 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-10-04, Files: 1, Lines: -6/+6
* * re.c (ignorecase_setter): change warning message.
  * re.c (ignorecase_getter): now gives warning.
  * string.c (rb_str_cmp_m): update RDoc document.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13620 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-10-04, Files: 1, Lines: -3/+1
* * encoding.c (rb_obj_encoding): returns encoding of the given object.
  * re.c (Init_Regexp): new method Regexp#encoding.
  * string.c (str_encoding): moved to encoding.c
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-10-04, Files: 1, Lines: -16/+2
* * string.c (rb_str_append): always set encoding, and coderange cache bits.
  * include/ruby/encoding.h (ENC_CODERANGE_SET): fixed a bug not to set cache bits.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-30, Files: 1, Lines: -1/+5
* * array.c (rb_ary_combination): new method to give all combination of elements from an array. [ruby-list:42671]
  * array.c (rb_ary_product): a new method to get all combinations of elements from two arrays. can be extended to combinations of n-arrays, e.g. a.product(b,c,d). anyone volunteer?
  * array.c (rb_ary_permutation): empty function body to calculate permutations of array elements. need volunteer.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13568 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-09-29, Files: 1, Lines: -1/+1
* * encoding.c (rb_enc_alias): allow encodings multiple aliases.
  * encoding.c (rb_enc_find_index): search the encoding which has the given name and return its index if found, or -1.
  * st.c (type_strcasehash): case-insensitive string hash type.
  * string.c (rb_str_force_encoding): force encoding of self. this name comes from [ruby-dev:31894] by Martin Duerst. [ruby-dev:31744]
  * include/ruby/encoding.h (rb_enc_find_index, rb_enc_associate_index): prototyped.
  * include/ruby/encoding.h (rb_enc_isctype): direct interface to ctype.
  * include/ruby/st.h (st_init_strcasetable): prototyped.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13556 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-28, Files: 1, Lines: -2/+26
* * string.c (rb_str_comparable): need not to check asciicompat here.
  * encoding.c (rb_enc_check): ditto.
  * string.c (rb_enc_str_coderange): tuned a bit; no broken check.
  * encoding.c (rb_enc_check): new encoding comparison criteria.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-09-28, Files: 1, Lines: -18/+20
* * string.c (rb_str_associate_encoding): commit miss.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13530 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-26, Files: 1, Lines: -16/+0
* * encoding.c (rb_enc_associate_index): deal with ASCII compatible flags.
  * encoding.c (rb_enc_check): allow ASCII compatible strings.
  * parse.y (rb_intern_str): use ASCII encoding for ASCII string.
  * string.c (rb_enc_str_coderange): check for code-range.
  * string.c (rb_str_modify): clear code-range flags.
  * string.c (rb_str_hash, rb_str_eql): ASCII compatible strings are comparable.
  * include/ruby/encoding.h: added code-range flags.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13529 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-26, Files: 1, Lines: -19/+70
* * encoding.c (rb_enc_check): check for ASCII-compatibilities.
  * parse.y (parser_tokadd_string, parser_parse_string, parser_here_document, parser_yylex): set encoding to US-ASCII.
  * parse.y (rb_enc_symname_p): check if valid with encoding.
  * parse.y (rb_intern3): let symbols have encoding.
  * string.c (rb_str_hash): add encoding index.
  * string.c (rb_str_comparable, rb_str_equal, rb_str_eql): check if compatible encoding.
  * string.c (sym_inspect): made encoding aware.
  * insns.def (opt_eq): compare with encoding.
  * include/ruby/encoding.h (rb_enc_asciicompat): check if ASCII compatible.
  * include/ruby/encoding.h (rb_enc_get_index): added prototype.
  * include/ruby/intern.h (rb_str_comparable, rb_str_equal): ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13518 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-26, Files: 1, Lines: -5/+37
* * re.c (rb_reg_match_m): evaluate a block if match. it would make condition statement much shorter, if no else clause is needed.
  * string.c (rb_str_match_m): ditto.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13475 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-09-20, Files: 1, Lines: -2/+19
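The block form of String#match described in this entry can be used as follows. This is a minimal sketch assuming a current Ruby; the file names and the pattern are illustrative inputs.

```ruby
pattern = /\.(\w+)\z/

# On a match, the block receives the MatchData and its value is returned.
hit = "string.c".match(pattern) { |m| m[1] }

# Without a match, the block is not called and nil is returned.
miss = "string".match(pattern) { |m| m[1] }

p hit   # "c"
p miss  # nil
```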
* * string.c (rb_str_rstrip_bang): fixed too much rstrip. [ruby-dev:31786]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: kou, Age: 2007-09-15, Files: 1, Lines: -0/+1
* * encoding.c (rb_enc_associate_index, rb_enc_get_index): check if object is encoding capable. [ruby-dev:31780]
  * string.c (rb_str_subpat_set): check for if the argument is a String.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-15, Files: 1, Lines: -0/+1
* * array.c (rb_ary_cycle): typo in rdoc. a patch from Yugui <yugui@yugui.sakura.ne.jp>. [ruby-dev:31748]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13348 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-09-06, Files: 1, Lines: -6/+9
* * string.c (rb_str_succ, rb_str_chop_bang, rb_str_chop): m17n support. [ruby-dev:31734]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13347 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-09-06, Files: 1, Lines: -26/+100
* * string.c (rb_str_splice): integer overflow for length. [ruby-dev:31739]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-09-05, Files: 1, Lines: -1/+1
* * string.c (tr_trans, rb_str_squeeze_bang, rb_str_split_m): suppress warnings.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13315 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-08-30, Files: 1, Lines: -2/+3
* * string.c (str_gsub): should not use mbclen2() which has broken API.
  * re.c: remove rb_reg_mbclen2().
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-29, Files: 1, Lines: -2/+4
* * string.c (rb_str_subseq): retrieve substring based on byte offset.
  * string.c (rb_str_rindex_m): was confusing character offset and byte offset.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-28, Files: 1, Lines: -15/+27
* * string.c (rb_str_splice_0): should check to modify. [ruby-dev:31665]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-08-28, Files: 1, Lines: -0/+1
* * string.c (rb_str_each_line): should swallow sequence of newlines if rs (optional argument) is an empty string. [ruby-dev:31652]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-27, Files: 1, Lines: -1/+7
* * string.c (rb_str_rstrip_bang): wrong strip point. [ruby-dev:31652]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13288 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-27, Files: 1, Lines: -2/+2
* * string.c (sym_encoding): return the encoding of a Symbol.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13283 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-08-27, Files: 1, Lines: -1/+7
* * string.c (tr_trans): wrong condition for mbmaxlen==1 strings. [ruby-dev:31652]
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13281 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-27, Files: 1, Lines: -4/+5
* * string.c, include/ruby/intern.h: export rb_str_length().
  * insns.def: use rb_str_length() in opt_length.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: ko1, Age: 2007-08-25, Files: 1, Lines: -1/+1
* * string.c (rb_str_splice): return from void function.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13268 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: nobu, Age: 2007-08-25, Files: 1, Lines: -3/+3
* * encoding.c: provide basic features for M17N.
  * parse.y: encoding aware parsing.
  * parse.y (pragma_encoding): encoding specification pragma.
  * parse.y (rb_intern3): encoding specified symbols.
  * string.c (rb_str_length): length based on characters. for older behavior, bytesize method added.
  * string.c (rb_str_index_m): index based on characters. rindex as well.
  * string.c (succ_char): encoding aware succeeding string.
  * string.c (rb_str_reverse): reverse based on characters.
  * string.c (rb_str_inspect): encoding aware string description.
  * string.c (rb_str_upcase_bang): encoding aware case conversion. downcase, capitalize, swapcase as well.
  * string.c (rb_str_tr_bang): tr based on characters. delete, squeeze, tr_s, count as well.
  * string.c (rb_str_split_m): split based on characters.
  * string.c (rb_str_each_line): encoding aware each_line.
  * string.c (rb_str_each_char): added. iteration based on characters.
  * string.c (rb_str_strip_bang): encoding aware whitespace stripping. lstrip, rstrip as well.
  * string.c (rb_str_justify): encoding aware justifying (ljust, rjust, center).
  * string.c (str_encoding): get encoding attribute from a string.
  * re.c (rb_reg_initialize): encoding aware regular expression
  * sprintf.c (rb_str_format): formatting (i.e. length count) based on characters.
  * io.c (rb_io_getc): getc to return one-character string. for older behavior, getbyte method added.
  * ext/stringio/stringio.c (strio_getc): ditto.
  * io.c (rb_io_ungetc): allow pushing arbitrary string at the current reading point.
  * ext/stringio/stringio.c (strio_ungetc): ditto.
  * ext/strscan/strscan.c: encoding support.
  git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13261 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
  Author: matz, Age: 2007-08-25, Files: 1, Lines: -393/+718
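The character-based semantics listed in this entry can be observed from Ruby code. This is a minimal sketch assuming a current Ruby and a UTF-8 string; it reflects present-day behavior of the public String API, which may differ in detail from the behavior at this revision.

```ruby
# Three characters, each encoded as three bytes in UTF-8.
s = "\u{65E5 672C 8A9E}"

char_count = s.length          # counts characters
byte_count = s.bytesize        # counts bytes (the older length behavior)
reversed   = s.reverse         # reverses by character, not by byte
chars      = s.each_char.to_a  # iterates one character at a time
last_index = s.index("\u8A9E") # index is a character position

puts char_count  # 3
puts byte_count  # 9
puts chars.size  # 3
puts last_index  # 2
puts reversed.valid_encoding?  # true
```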