aboutsummaryrefslogtreecommitdiffstats
path: root/enc
Commit message (Collapse)AuthorAgeFilesLines
* * unicode.c (PROPERTY_NAME_MAX_SIZE): use MAX_WORD_LENGTH.naruse2009-08-261-1/+1
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24677 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode.c (onigenc_unicode_mbc_case_fold): balanced braces.nobu2009-08-261-2/+3
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/name2ctype.h: updated.nobu2009-08-251-365/+9744
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24657 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Update Oniguruma's UnicodeData to 5.1.naruse2009-08-256-9062/+49628
| | | | | | | | | | | | | | | | | | | | * tool/enc-unicode.rb: added for generate name2ctype.kwd. contributed by Run Paint Run Run [ruby-core:24775] use like following: ruby19 tool/enc-unicode.rb enc/unicode/UnicodeData.txt \ enc/unicode/Scripts.txt > enc/unicode/name2ctype.kwd * enc/unicode.c (CodeRanges): move definitions to name2ctype.h. * enc/unicode/name2ctype.h.blt, enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src: updated to v5.1. * enc/unicode/UnicodeData.txt, enc/unicode/Scripts.txt: added v5.1. * Makefile.in: add rule to generate name2ctype.kwd from UnicodeData.txt and Scripts.txt. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/name2ctype.h: split from enc/unicode.c and made anobu2009-08-215-150/+1448
| | | | | | | perfect hash. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/utf_8.c (code_to_mbc): suppressed a warning.nobu2009-08-211-1/+1
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24607 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode.c (CodeRanges): initialized statically.nobu2009-08-191-163/+133
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/Makefile.in (MKDIRS): revert r24525.naruse2009-08-141-1/+1
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24538 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * configure.in, Makefile.in (MAKEDIRS): used MKDIR_P instead ofnobu2009-08-131-1/+1
| | | | | | | as_mkdir_p. [ruby-dev:39063] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24525 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/encdb.c (ENC_SET_BASE): fix typo. patch by ujihisa [ruby-dev:39004]naruse2009-08-041-1/+1
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* More strict for Big5 series.naruse2009-08-041-5/+74
| | | | | | | | | | | | | | * enc/big5.c (EncLen_Big5): back to original Big5 table. (EncLen_Big5_HKSCS): for Big5-HKSCS. (trans): add the lead byte table for Big5-HKSCS. (big5_mbc_enc_len): abstract function for Big5 series. (big5_mbc_enc_len): for Big5. (big5_hkscs_mbc_enc_len): for Big5-HKSCS. (BIG5_HKSCS_P): added. (BIG5_ISMB_FIRST): add routine for Big5-HKSCS. (big5_hkscs): add for Big5-HKSCS. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Add functions and macros for second encoding definitions.naruse2009-08-041-0/+2
| | | | | | | | | | | | | * encoding.c (rb_enc_set_base): Add for setting base encoding with their names. this is internal function. * template/encdb.h.tmpl: specify ENC_SET_BASE for second encodings in each encoding files. * enc/encdb.c (rb_enc_set_base): add a declaration. (ENC_SET_BASE): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/big5.c: not executable.nobu2009-07-251-0/+0
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/big5.c: Fix EncLen_BIG5 for Big5-HKSCS. see [ruby-core:24390]naruse2009-07-241-18/+34
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/big5.trans, big5-hkscs-tbl.rb:duerst2009-07-243-0/+18390
| | | | | | | | | | | | | | | | new Chinese BIG5-HKSCS transcoding (with Tatsuya Mizuno) * test/ruby/test_transcode.rb: added tests for the above (with Tatsuya Mizuno) * enc/big5.c: Added BIG5-HKSCS as a replicate encoding of BIG5 (short term solution, needs more work; with Tatsuya Mizuno) * tool/transcode-tblgen.rb: made 'pat' directly accessible in class StrSet git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24264 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * ruby.c (process_options), enc/prelude.rb: encdb and transdb arenobu2009-06-221-1/+1
| | | | | | | extension libraries. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/utf8_mac.trans: remove wrong optimization.naruse2009-06-131-9/+0
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Fix: DON'T move in_p because before in_p is replaced by buffered data.naruse2009-05-051-7/+7
| | | | | | | | | | | | | | | | * transcode.c: NOMAP is now multibyte direct map. * transcode.c: remove ASIS. * transcode_data.h: ditto. * tool/transcode-tb (ActionMap#generate_info): remove :asis. * tool/transcode-tb (ActionMap#generate_info): add :nomap0. * enc/trans/utf8_mac.trans: replace :asis by :nomap0. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23344 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/utf8_mac-tbl.rb: don't use Unicode escape.naruse2009-05-022-955/+949
| | | | | | * enc/trans/utf8_mac.trans: follow above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23325 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/utf8_mac.trans: get rid of a 1.9 feature for crossnobu2009-04-301-2/+2
| | | | | | | compile. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23309 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Add new transcoder: CP51932 <-> CP50221.naruse2009-04-292-0/+201
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23307 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/utf8_mac.trans: Add converter for UTF8-MAC.naruse2009-04-262-0/+1202
| | | | | | | | * enc/trans/utf8_mac-tbl.rb: ditto. * test/ruby/test_econv.rb: tests for above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/depend (link_so): replaces $(TARGET) with basename of thenobu2009-03-221-2/+9
| | | | | | | target. [ruby-talk:330286] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23035 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/depend: extract comile rules to each target for VC++.usa2009-01-301-11/+4
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk (distclean-enc, realclean-enc): do not call clean ofnobu2009-01-271-0/+1
| | | | | | | | | | | | | | | | | | | enc.mk twice or more. * enc/depend (cleanobjs): added deffile. * lib/mkmf.rb (create_makefile): removes deffile at clean instead of distclean. * win32/Makefile.sub (miniruby, LIBRUBY_SO): removes lib and exp files. * win32/Makefile.sub (clean, distclean): have moved to common.mk. * win32/rmdirs.bat: omits `not empty' message. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21790 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/gb18030.trans: get rid of a 1.9 feature for crossnobu2009-01-141-5/+4
| | | | | | | compile. [ruby-core:21345] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21512 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/gb18030.trans, gb18030-tbl.rb:duerst2009-01-142-0/+63415
| | | | | | | | | | | | | | | new Chinese GB18030 transcoding (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) * transcode_data.h, transcode.c, tool/transcode_tblgen.rb: added support for GB18030-specific 4-byte sequences (with Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * template/{encdb,transdb}.h.tmpl: moved enc/make_encdb.rb andnobu2009-01-132-138/+0
| | | | | | | | | enc/trans/make_transdb.rb using tool/generic_erb.rb. * common.mk (encdb.h, transdb.h): generates from avobe template. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21490 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/make_transdb.rb (converters): should not depend on thenobu2009-01-131-1/+5
| | | | | | | hash order for cross compile. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21489 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/gbk.trans, gbk-tbl.rb:duerst2009-01-042-0/+21809
| | | | | | | | | | new Chinese GBK transcoding (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21315 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * test/ruby/test_transcode.rb: added tests for GB2312duerst2009-01-041-0/+3
| | | | | | | | | | (from Yoshihiro Kambayashi) * enc/trans/chinese.trans: set valid byte patterns for GB2312 and GB12345 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21314 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/big5.trans, big5-tbl.rb:duerst2009-01-042-0/+13721
| | | | | | | | | | new Chinese Big5 transcoding (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* change encoding name.naruse2009-01-031-2/+2
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21285 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/chinese.trans: added for transcoding EUC-CN and GB12345.naruse2009-01-035-0/+30331
| | | | | | * enc/trans/GB/: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21283 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans, cp850-tbl.rb, cp852-tbl.rb,duerst2008-12-097-0/+745
| | | | | | | | | | | cp855-tbl.rb, koi8-r-tbl.rb, koi8-u-tbl.rb, tis-620-tbl.rb: new single-byte transcodings (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi), small cosmetic fixes git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20599 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/depend (clean-srcs): split out from clean.nobu2008-12-082-3/+9
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/depend (LIBS): fixed for disable-shared. [ruby-dev:37103]nobu2008-11-172-1/+7
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20241 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans, macgreek-tbl.rb, macroman-tbl.rb,duerst2008-11-1118-0/+2210
| | | | | | | | | | | | | | macromania-tbl.rb, macturkish-tbl.rb, macukraine-tbl.rb, ibm437-tbl.rb, ibm852-tbl.rb, ibm855-tbl.rb, ibm857-tbl.rb, ibm860-tbl.rb, ibm861-tbl.rb, ibm862-tbl.rb, ibm863-tbl.rb, ibm865-tbl.rb, ibm866-tbl.rb, ibm869-tbl.rb, ibm775-tbl.rb: new single-byte transcodings (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20178 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans, maccroatioan-tbl.rb,duerst2008-10-314-0/+391
| | | | | | | | | | | maccyrillic-tbl.rb, maciceland-tbl.rb: new single-byte transcodings (from Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added tests for the above (from Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20075 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans: refactoring to make it easierduerst2008-10-302-54/+135
| | | | | | | | | | to add more transcodings (with Yoshihiro Kambayashi) * enc/trans/iso-8859-1-tbl.rb: new file to avoid having to treat ISO-8859-1 as special git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/us_ascii.c (us_ascii_mbc_enc_len): made static. a patch bynobu2008-10-241-1/+1
| | | | | | | Tadashi Saito <shiba AT mail2.accsnet.ne.jp> at [ruby-dev:36916] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans: adding WINDOWS-wwww encodingsduerst2008-10-199-0/+994
| | | | | | | | | | | | | | (wwww = 874/1250/1251/1253/1254/1255/1256/1257) (contributed by Yoshihiro Kambayashi) * enc/trans/windows-wwww-tbl.rb: 8 new files (contributed by Yoshihiro Kambayashi) * test/ruby/test_transcode.rb: added test_windows_wwww (contributed by Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19846 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * tool/transcode-tblgen.rb: added set_valid_byte_patternduerst2008-10-181-0/+1
| | | | | | | | | | to reduce coupling between table generation script and specific encodings. * enc/trans/single_byte.trans: using set_valid_byte_pattern git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19831 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk, enc/depend (enc, trans): targets for sources.nobu2008-10-161-1/+4
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans (transcode_tblgen_singlebyte): renamedakr2008-10-141-11/+4
| | | | | | | | from transcode_tblgen_windows. (transcode_tblgen_iso8859): use transcode_tblgen_singlebyte. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/trans/single_byte.trans: added windows-1252duerst2008-10-142-0/+136
| | | | | | | | | | | | | * enc/trans/windows-1252-tbl.rb: new file (contributed by Yoshihiro Kambayashi) * tool/transcode-tblgen.rb: listed windows-1252 as '1byte' * test/ruby/test_transcode.rb: added test_windows_1252 (contributed by Yoshihiro Kambayashi) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * grapheme cluster implementation reverted. [ruby-dev:36375]akr2008-09-1815-745/+84
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19417 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix typos.akr2008-09-161-7/+7
| | | | git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * include/ruby/oniguruma.h (OnigEncodingTypeST): add precise_retakr2008-09-1615-84/+745
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument for mbc_to_code. (ONIGENC_MBC_TO_CODE): provide NULL for precise_ret. (ONIGENC_MBC_PRECISE_CODEPOINT): defined. * include/ruby/encoding.h (rb_enc_mbc_precise_codepoint): defined. * regenc.h (onigenc_single_byte_mbc_to_code): precise_ret argument added. (onigenc_mbn_mbc_to_code): ditto. * regenc.c (onigenc_single_byte_mbc_to_code): precise_ret argument added. (onigenc_mbn_mbc_to_code): ditto. * string.c (count_utf8_lead_bytes_with_word): removed. (str_utf8_nth): removed. (str_utf8_offset): removed. (str_strlen): UTF-8 codepoint oriented optimization removed. (rb_str_substr): ditto. (enc_succ_char): use rb_enc_mbc_precise_codepoint. (enc_pred_char): ditto. (rb_str_succ): ditto. * encoding.c (rb_enc_ascget): check length with rb_enc_mbc_precise_codepoint. (rb_enc_codepoint): use rb_enc_mbc_precise_codepoint. * regexec.c (string_cmp_ic): add text_end argument. (match_at): check end of character after exact string matches. * enc/utf_8.c (graphme_table): defined for extended graphme cluster boundary. (grapheme_cmp): defined. (get_grapheme_properties): defined. (grapheme_boundary_p): defined. (MAX_BYTES_LENGTH): defined. (comb_char_enc_len): defined. (mbc_to_code0): extracted from mbc_to_code. (mbc_to_code): use mbc_to_code0. (left_adjust_combchar_head): defined. (utf_8): use a extended graphme cluster as a unit. * enc/unicode.c (onigenc_unicode_mbc_case_fold): use ONIGENC_MBC_PRECISE_CODEPOINT to extract codepoints. (onigenc_unicode_get_case_fold_codes_by_str): ditto. * enc/euc_jp.c (mbc_to_code): follow mbc_to_code field change. use onigenc_mbn_mbc_to_code. * enc/shift_jis.c (mbc_to_code): ditto. * enc/emacs_mule.c (mbc_to_code): ditto. * enc/gbk.c (gbk_mbc_to_code): follow mbc_to_code field and onigenc_mbn_mbc_to_code change. * enc/cp949.c (cp949_mbc_to_code): ditto. * enc/big5.c (big5_mbc_to_code): ditto. * enc/euc_tw.c (euctw_mbc_to_code): ditto. * enc/euc_kr.c (euckr_mbc_to_code): ditto. * enc/gb18030.c (gb18030_mbc_to_code): ditto. * enc/utf_32be.c (utf32be_mbc_to_code): follow mbc_to_code field change. * enc/utf_16be.c (utf16be_mbc_to_code): ditto. * enc/utf_32le.c (utf32le_mbc_to_code): ditto. * enc/utf_16le.c (utf16le_mbc_to_code): ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19389 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * transcode_data.h (rb_transcoder): resetsize_func and resetstate_funcakr2008-09-151-1/+1
| | | | | | | | | | also returns ssize_t. * enc/trans/iso2022.trans: follow the type change. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19354 b2dd03c8-39d4-4d8f-98ff-823fe69b080e