path: root/enc/unicode
Commit message | Author | Age | Files | Lines
* remove special processing for U+03B9/U+03BC/U+A64B | duerst | 2016-12-04 | 1 | -5/+1
    * enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B (GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK) from onigenc_unicode_case_map and simplify code.
    * enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B.
    This and the previous few related commits make sure that we won't hit the equivalent of bug #12990 anymore for future updates of Unicode versions.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Reorder codepoints in some entries of CaseUnfold_11_Table | duerst | 2016-12-04 | 2 | -8/+17
    * enc/unicode/case-folding.rb: Reorder codepoints so that the upper-case mapping comes first.
    * enc/unicode/9.0.0/casefold.h: Codepoints reordered, upper-case mapping flag added.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56975 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Use offsetof macro and shrink table size | nobu | 2016-12-01 | 1 | -786/+786
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* constify CaseMappingSpecials | nobu | 2016-12-01 | 2 | -2/+2
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Regexp supports Unicode 9.0.0's \X | naruse | 2016-11-30 | 1 | -1889/+3139
    * meta character \X matches Unicode 9.0.0 characters with some workarounds for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences. [Feature #12831] [ruby-core:77586]
    The term "character" can have many meanings: bytes, codepoints, combined characters, and so on. "Grapheme cluster" is the highest-level of these terms and means a user-perceived character. Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION specifies how to handle grapheme clusters (extended grapheme clusters). But some specs have not been updated to the current situation, because Unicode Emoji is being extended rapidly without a precise definition. This breaks the precondition of UTR#29, "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters" (the sentence will be removed in the next version). Some of the details are described in Unicode Technical Report #51 UNICODE EMOJI, but they have not been merged into UTR#29 yet.
    http://unicode.org/reports/tr29/
    http://unicode.org/reports/tr51/
    http://unicode.org/Public/emoji/4.0/
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK | duerst | 2016-11-30 | 1 | -4/+6
    * enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC at the end of onigenc_unicode_case_map (Bug #12990).
    * enc/unicode/case-folding.rb: Add U+A64B to the special cases 03B9 and 03BC. Add a comment pointing to enc/unicode.c. Change warnings to exceptions for unpredicted cases, because this would have been more easily noticed (the warning was not noticed when upgrading to Unicode 9.0.0).
    * test/ruby/enc/test_case_comprehensive.rb: Remove temporary exclusion of U+A64B from testing.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * unicode/8.0.0/casefold.h, name2ctype.h, unicode/data/8.0.0: | duerst | 2016-09-07 | 2 | -40414/+0
    removing directories/files related to Unicode version 8.0.0
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk: Updated Unicode version to 9.0.0 [Feature #12513] | duerst | 2016-09-07 | 2 | -0/+42457
    * unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0: new directories/files for Unicode version 9.0.0
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: separate unicode headers | nobu | 2016-08-16 | 2 | -0/+0
    * common.mk (UNICODE_HDR_DIR): separate unicode header files from unicode data files. [ruby-core:76879] [Bug #12677]
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55942 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Move generated headers to unicode data directory | nobu | 2016-07-17 | 2 | -0/+0
    * common.mk, enc/depend (casefold.h, name2ctype.h): move to unicode data directory per version.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode: check Unicode versions | nobu | 2016-07-15 | 3 | -28/+44
    * enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if Unicode versions are consistent with each other.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk: update enc/unicode/name2ctype.h | nobu | 2016-07-14 | 3 | -85779/+0
    * Makefile.in (enc/unicode/name2ctype.h): remove stale recipe, which did not support Unicode age properties.
    * common.mk (enc/unicode/name2ctype.h): update by --header option of tool/enc-unicode.rb. enc/unicode/name2ctype.kwd file has not been used.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55678 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* case-folding.rb: define version numbers | nobu | 2016-06-30 | 2 | -0/+11
    * enc/unicode/case-folding.rb: define Unicode version numbers.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55546 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* case-folding.rb: check version numbers | nobu | 2016-06-30 | 1 | -6/+18
    * enc/unicode/case-folding.rb: check if the version numbers in the data files match.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55545 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Revert "Use gperf 3.0.4" | naruse | 2016-06-28 | 1 | -8/+8
    It was a wrong commit.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55518 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Use gperf 3.0.4 | naruse | 2016-06-27 | 1 | -8/+8
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55514 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Read CaseFolding.txt in binary mode | nobu | 2016-06-24 | 1 | -1/+1
    * enc/unicode/case-folding.rb (CaseFolding#load): read in binary mode to deal with non-ASCII characters in CaseFolding.txt.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55496 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* touch | nobu | 2016-06-24 | 1 | -1/+3
    * enc/unicode/case-folding.rb: touch the destination file.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55494 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Updating casefold.h | nobu | 2016-06-24 | 1 | -3/+14
    * common.mk (lib/unicode_normalize/tables.rb): should not depend on Unicode data files unless ALWAYS_UPDATE_UNICODE=yes, to avoid downloading Unicode data unnecessarily. [ruby-dev:49681]
    * common.mk (enc/unicode/casefold.h): update Unicode files in a sub-make, so that the header does not always depend on the files.
    * enc/unicode/case-folding.rb: if gperf is not usable, assume the existing file is OK.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Data generation to implement | duerst | 2016-04-01 | 2 | -149/+158
    swapcase functionality for titlecase characters. Swapcase isn't defined by Unicode, because the purpose/usage of swapcase is unclear anyway. The implementation follows a proposal from Nobu, swapping the case of each component of a titlecase character individually. This means that the titlecase characters have to be decomposed.
    * enc/unicode.c: Code using the above data.
    * test/ruby/enc/test_case_mapping.rb: Tests for the above.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Tweaked handling of 6 | duerst | 2016-03-29 | 2 | -37/+54
    special cases in CaseUnfold_11_Table.
    * enc/unicode.c: Adjustments for above.
    * test/ruby/enc/test_case_mapping.rb: Tests for the above: Some tests in test_titlecase activated; test_greek added. A test in test_cherokee fixed.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Removing data for idempotent | duerst | 2016-03-29 | 2 | -159/+159
    titlecasing.
    * enc/unicode.c: Adjust code to data removal.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54347 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2016-03-22 | 1 | -1/+1
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54230 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * include/ruby/oniguruma.h: Additional flag for characters that are titlecase. | duerst | 2016-03-22 | 2 | -35/+39
    * enc/unicode/case-folding.rb, casefold.h: Using above flag in data.
    * enc/unicode.c: Marking capitalized character as unmodified if it is already titlecase.
    * test/ruby/enc/test_case_mapping.rb: Tests for above functionality.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54229 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Streamlining approach to | duerst | 2016-03-11 | 2 | -197/+289
    case mapping data not available from case folding by unifying all three cases (special title, special upper, special lower).
    * enc/unicode.c: Adjust macro names for above (macros are currently inactive).
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Reducing size of TitleCase | duerst | 2016-02-27 | 2 | -125/+92
    table by eliminating duplicates.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb: Adding possibility for debugging output | duerst | 2016-02-25 | 1 | -2/+11
    for TitleCase table in casefold.h.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53930 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Outputting actual titlecase | duerst | 2016-02-23 | 2 | -94/+198
    data (new table, with indices from other tables).
    * enc/unicode.c: Ignoring titlecase data indices for the moment.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Reading casing data from | duerst | 2016-02-23 | 2 | -80/+91
    SpecialCasing.txt.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53904 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Adding flag for title-case, | duerst | 2016-02-22 | 2 | -13/+14
    not yet operational.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Fixed bug that prevented inclusion | duerst | 2016-02-22 | 2 | -157/+156
    of compatibility characters in upper-/lower-case mappings.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53890 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, casefold.h: Used only first element | duerst | 2016-02-16 | 2 | -13/+14
    (rather than all) of target in CaseUnfold_11 array.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53843 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb: Added debugging option | duerst | 2016-02-15 | 1 | -1/+15
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53833 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2016-02-08 | 1 | -2/+2
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb, enc/unicode/casefold.h: Flags for | duerst | 2016-02-08 | 2 | -2236/+2242
    upper/lower conversion added (titlecase and SpecialCasing still missing)
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53779 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode.c: Shortened macros for enc/unicode/casefold.h to | duerst | 2016-02-08 | 2 | -1328/+1333
    single-letter; use flags in casefold.h for logic.
    * enc/unicode/case-folding.rb: Added flag for case folding. Changed parameter passing.
    * enc/unicode/casefold.h: New flags added.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2016-02-07 | 1 | -3/+3
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53768 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk: Added two more precondition files for enc/unicode/casefold.h | duerst | 2016-02-07 | 1 | -4/+31
    * enc/unicode.c: Added shortening macros for enc/unicode/casefold.h
    * enc/unicode/case-folding.rb: Fixed file encoding for CaseFolding.txt to ASCII-8BIT (should fix some ci errors). Clarified usage. Created class MapItem. Partially implemented class CaseMapping.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53767 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb: Fixing parameter passing. | duerst | 2016-02-07 | 1 | -15/+15
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53765 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * enc/unicode/case-folding.rb: New classes CaseMapping/CaseMappingDummy | duerst | 2016-02-07 | 1 | -8/+29
    to pass as parameters; not yet implemented or used.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * common.mk: using new option in recipe for enc/unicode/casefold.h | duerst | 2016-02-07 | 1 | -1/+1
    * enc/unicode/case-folding.rb: Correctly specify argument to new option.
    (with Kimihito Matsui)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53762 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * remove trailing spaces. | svn | 2016-02-07 | 1 | -1/+1
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53760 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e | duerst | 2016-02-07 | 1 | -1/+17
* reverting accidental commit at r53124 by re-committing version from r52612 | duerst | 2015-12-15 | 1 | -2792/+4140
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53127 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/ebcdic.h, enc/trans/ebcdic.trans, | duerst | 2015-12-15 | 1 | -4140/+2792
    test/ruby/test_transcode.rb: Fixed encoding name to the correct one in the IANA registry (IBM037) and added an alias (ebcdic-cp-us)
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53124 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/casefold.h, name2ctype.h: Change Unicode Version for | duerst | 2015-11-17 | 2 | -5209/+7186
    regular expressions from 7.0.0 to 8.0.0 (with help from Kimihito Matsui) [Feature #11563]
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52612 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/name2ctype.h.blt: update for r46831 | nobu | 2015-01-17 | 1 | -2979/+6936
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49292 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* * regcomp.c: Merge Onigmo 5.14.1 25a8a69fc05ae3b56a09. | naruse | 2014-07-16 | 3 | -5267/+9571
    this includes Support for Unicode 7.0 [Bug #9092].
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46831 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* fix usage | kazu | 2014-06-02 | 1 | -1/+1
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46317 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* case-folding.rb: perfect hash for case unfolding3 | nobu | 2014-05-30 | 2 | -6/+96
    * enc/unicode/case-folding.rb (lookup_hash): make perfect hash to lookup case unfolding table 3.
    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e