ruby.git - rhe's working repository

	Commit message (Collapse)	Author	Age	Files	Lines
*	implement special behavior for Georgian for String#capitalize	duerst	2018-12-09	1	-1/+5
	The modern Georgian script is special in that it has an 'uppercase' variant called MTAVRULI which can be used for emphasis of whole words, for screamy headlines, and so on. However, in contrast to all other bicameral scripts, there is no usage of capitalizing the first letter in a word or a sentence. Words with mixed capitalization are not used at all. We therefore implement special behavior for String#capitalize. Formally, we define String#capitalize as first applying String#downcase for the whole string, then using titlecase on the first letter. Because Georgian defines titlecase as the identity function both for MTAVRULI ('uppercase') and Mkhedruli (lowercase), this results in String#capitalize being equivalent to String#downcase for Georgian. This avoids undesirable mixed case. * enc/unicode.c: Actual implementation * string.c: Add mention of this special case for documentation * test/ruby/enc/test_case_mapping.rb: Add two tests, a general one that uses String#capitalize on some (including nonsensical) combinations of MTAVRULI and Mkhedruli, and a canary test to detect the potential assignment of characters to the currently open slots (holes) at U+1CBB and U+1CBC. * test/ruby/enc/test_case_comprehensive.rb: Tweak generation of expectation data. Together with r65933, this closes issue #14839. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	remove obsolete data from unicode.c	duerst	2018-12-06	1	-26/+0
	* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ, onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji, because they are not needed anymore for Unicode 11.0.0. * regparse.c: Remove external declarations for above arrays. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	solve the genie/zombie/wrestlers bug	duerst	2018-12-02	1	-8/+10
	enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF to onigenc_unicode_GCB_ranges_E_Base. - Add comments with character names. test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers. This closes issue #15343. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	Added words in the comment at r65088 [ci skip]	nobu	2018-11-30	1	-2/+2
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66103 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	deal with ONIGENC_CASE_IS_TITLECASE flag on lowercase characters	duerst	2018-11-25	1	-4/+9
	In the function onigenc_unicode_case_map() in enc/unicode.c, deal with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on lowercase characters. This is in preparation for Georgian Mtavruli, which are uppercase but not titlecase, in Unicode 11.0.0. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	enc/unicode.c: 'a' is bigger than 'A'	shyouhei	2018-11-16	1	-2/+4
	In ASCII, 'a' is bigger than 'A', which means 'A' - 'a' is a negative number (-32, to be precise). In C, the type of both 'a' and 'A' is signed int (cf. ISO/IEC 9899:1990 section 6.1.3.4), so 'A' - 'a' is also a signed int: `(signed int)-32`. The problem is that OnigCodePoint is unsigned int. Adding a negative number to a variable of type OnigCodePoint (`code` here) introduces an unintentional conversion, `(unsigned)(signed)-32`, which is 4,294,967,264. Adding this value to `code` then wraps around, and the result ends up as the expected codepoint. This series of operations is not a serious problem, but because `code >= 'a'` holds, we can write `(code - 'a') + 'A'` to avoid it. See also: https://github.com/k-takata/Onigmo/pull/107 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	revert r65091, r65090 because ci fails	duerst	2018-10-16	1	-9/+4
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	update to Unicode 11.0.0 (basic step, not complete yet)	duerst	2018-10-16	1	-4/+9
	- common.mk: Change Unicode version to 11.0.0 - enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with Georgian Mtavruli. This should bring us up to the same level as e.g. Python 3.7, by following the Unicode tables exactly. But it will produce undesirable (mixed-case) results for String#capitalize. This will be addressed in a later commit. - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. - lib/unicode_normalize/tables.rb: Updated table. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	Removed data for old Unicode [ci skip]	nobu	2018-10-16	1	-28/+2
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	unicode.c: moved additional GCB ranges	nobu	2018-10-15	1	-0/+52
	* enc/unicode.c: moved additional Grapheme Cluster Break ranges which depend on the Unicode version. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	regparse.c: Suppress duplicated range warning by mere \X	nobu	2018-10-15	1	-2/+0
	* regparse.c (node_extended_grapheme_cluster): as Unicode 10 has added Grapheme_Cluster_Break properties to some characters, remove duplicated ranges for Unicode 9. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	Merge Onigmo 6.0.0	naruse	2016-12-10	1	-123/+125
	* https://github.com/k-takata/Onigmo/blob/Onigmo-6.0.0/HISTORY * fix for ruby 2.4: https://github.com/k-takata/Onigmo/pull/78 * suppress warning: https://github.com/k-takata/Onigmo/pull/79 * include/ruby/oniguruma.h: include onigmo.h. * template/encdb.h.tmpl: ignore duplicated definition of EUC-CN in enc/euc_kr.c. It is defined in enc/gb2312.c with CRuby macro. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	remove special processing for U+03B9/U+03BC/U+A64B	duerst	2016-12-04	1	-13/+4
	* enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B (GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK) from onigenc_unicode_case_map and simplify code. * enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B. This and the previous few related commits make sure that we won't hit the equivalent of bug #12990 anymore for future updates of Unicode versions. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	constify CaseMappingSpecials	nobu	2016-12-01	1	-1/+1
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK	duerst	2016-11-30	1	-2/+2
	* enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC at the end of onigenc_unicode_case_map (Bug #12990). * enc/unicode/case-folding.rb: Add U+A64B to the special cases 03B9 and 03BC. Add a comment pointing to enc/unicode.c. Change warnings to exceptions for unpredicted cases, because this would have been more easily noticed (the warning was not noticed when upgrading to Unicode 9.0.0). * test/ruby/enc/test_case_comprehensive.rb: Remove temporary exclusion of U+A64B from testing. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* regenc.h/c, include/ruby/oniguruma.h, enc/ascii.c, big5.c, cp949.c,	duerst	2016-07-24	1	-8/+0
	emacs_mule.c, euc_jp.c, euc_kr.c, euc_tw.c, gb18030.c, gbk.c, iso_8859_1|2|3|4|5|6|7|8|9|10|11|13|14|15|16.c, koi8_r.c, koi8_u.c, shift_jis.c, unicode.c, us_ascii.c, utf_16|32be|le.c, utf_8.c, windows_1250|51|52|53|54|57.c, windows_31j.c, unicode.c: Remove conditional compilation macro ONIG_CASE_MAPPING. [Feature #12386]. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55740 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	Move generated headers to unicode data directory	nobu	2016-07-17	1	-2/+20
	* common.mk, enc/depend (casefold.h, name2ctype.h): move to unicode data directory per version. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* string.c: Raise ArgumentError when invalid string is detected in	duerst	2016-06-02	1	-1/+7
	case mapping methods. * enc/unicode.c: Check for invalid string and signal with negative length value. * test/ruby/enc/test_case_mapping.rb: Add tests for above. * test/ruby/test_m17n_comb.rb: Add a message to clarify test failure. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Handle DOTLESS_i by hand because it isn't involved in folding.	duerst	2016-05-25	1	-1/+5
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Fix flag error for switch from titlecase to lowercase.	duerst	2016-05-24	1	-1/+4
	* test/ruby/enc/test_case_mapping.rb: Tests for above error. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55153 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.h: Additional uses of ONIG_CASE_MAPPING compilation switch	duerst	2016-05-16	1	-0/+4
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55020 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* append newline at EOF.	svn	2016-05-16	1	-1/+1
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55019 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* include/ruby/oniguruma.h: Introducing ONIG_CASE_MAPPING compilation	duerst	2016-05-16	1	-0/+4
	switch * include/ruby/oniguruma.h, enc/unicode.h: Using ONIG_CASE_MAPPING compilation switch git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode/case-folding.rb, casefold.h: Data generation to implement	duerst	2016-04-01	1	-1/+8
	swapcase functionality for titlecase characters. Swapcase isn't defined by Unicode, because the purpose/usage of swapcase is unclear anyway. The implementation follows a proposal from Nobu, swapping the case of each component of a titlecase character individually. This means that the titlecase characters have to be decomposed. * enc/unicode.c: Code using the above data. * test/ruby/enc/test_case_mapping.rb: Tests for the above. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	fix a typo [ci skip]	kazu	2016-03-29	1	-1/+1
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54400 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode/case-folding.rb, casefold.h: Tweaked handling of 6	duerst	2016-03-29	1	-5/+10
	special cases in CaseUnfold_11_Table. * enc/unicode.c: Adjustments for above. * test/ruby/enc/test_case_mapping.rb: Tests for the above: Some tests in test_titlecase activated; test_greek added. A test in test_cherokee fixed. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Cleaned up some comments.	duerst	2016-03-29	1	-7/+6
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54349 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode/case-folding.rb, casefold.h: Removing data for idempotent	duerst	2016-03-29	1	-8/+6
	titlecasing. * enc/unicode.c: Adjust code to data removal. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54347 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Refactoring in preparation for data reduction for	duerst	2016-03-28	1	-5/+8
	titlecase. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Minor refactoring for I WITH DOT ABOVE.	duerst	2016-03-28	1	-4/+3
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Removed code now covered by data from table.	duerst	2016-03-28	1	-6/+0
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Adding comments. [ci skip]	duerst	2016-03-28	1	-7/+7
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* include/ruby/oniguruma.h: Additional flag for characters that are titlecase.	duerst	2016-03-22	1	-1/+6
	* enc/unicode/case-folding.rb, casefold.h: Using above flag in data. * enc/unicode.c: Marking capitalized character as unmodified if it is already titlecase. * test/ruby/enc/test_case_mapping.rb: Tests for above functionality. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54229 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Fixed two macro definitions.	duerst	2016-03-17	1	-2/+2
	* test/ruby/enc/test_case_mapping.rb: Test cases that detected the above bugs. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54140 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Eliminating common code.	duerst	2016-03-15	1	-29/+13
	(with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54118 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Expansion of some code repetition in preparation for	duerst	2016-03-15	1	-9/+13
	elimination of common code pieces. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54117 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* remove trailing spaces.	svn	2016-03-15	1	-1/+1
	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Additional macros and code to use mapping data in	duerst	2016-03-15	1	-21/+67
	CaseMappingSpecials array. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* include/ruby/oniguruma.h, enc/unicode.c: Adjusting flag assignments	duerst	2016-03-14	1	-0/+7
	and macros to work with unified CaseMappingSpecials array. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54101 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	unicode.c: off-by-one error	nobu	2016-03-12	1	-1/+1
	* enc/unicode.c (CodePointListValidP): fix off-by-one error. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	unicode.c: boundary check	nobu	2016-03-12	1	-6/+14
	* enc/unicode.c (CodePointListValidP): add pathological boundary check, for gcc 4.9. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode/case-folding.rb, casefold.h: Streamlining approach to	duerst	2016-03-11	1	-2/+10
	case mapping data not available from case folding by unifying all three cases (special title, special upper, special lower). * enc/unicode.c: Adjust macro names for above (macros are currently inactive). (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* include/ruby/oniguruma.h: Rearranging flag assignments and making	duerst	2016-02-24	1	-5/+1
	space for titlecase indices; adding additional macros to add or extract titlecase index; adding comments for better documentation. * enc/unicode.c: Moving some macros to include/ruby/oniguruma.h; activating use of titlecase indices. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode/case-folding.rb, casefold.h: Outputting actual titlecase	duerst	2016-02-23	1	-2/+2
	data (new table, with indices from other tables). * enc/unicode.c: Ignoring titlecase data indices for the moment. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Activated use of case mapping data in CaseUnfold_11 array.	duerst	2016-02-19	1	-0/+9
	(with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53870 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* string.c, enc/unicode.c: Disassociating ONIGENC_CASE_FOLD flag from	duerst	2016-02-08	1	-1/+1
	ONIGENC_CASE_DOWNCASE. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	unicode.c: magic numbers	nobu	2016-02-08	1	-33/+37
	* enc/unicode.c (I_WITH_DOT_ABOVE, DOTLESS_i, DOT_ABOVE): name magic numbers. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53776 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* enc/unicode.c: Shortened macros for enc/unicode/casefold.h to	duerst	2016-02-08	1	-10/+11
	single-letter; use flags in casefold.h for logic. * enc/unicode/case-folding.rb: Added flag for case folding. Changed parameter passing. * enc/unicode/casefold.h: New flags added. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* common.mk: Added two more precondition files for enc/unicode/casefold.h	duerst	2016-02-07	1	-0/+10
	* enc/unicode.c: Added shortening macros for enc/unicode/casefold.h * enc/unicode/case-folding.rb: Fixed file encoding for CaseFolding.txt to ASCII-8BIT (should fix some ci errors). Clarified usage. Created class MapItem. Partially implemented class CaseMapping. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53767 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
*	* test/ruby/enc/test_regex_casefold.rb: Added data-based testing for	duerst	2016-02-06	1	-1/+1
	String#downcase :fold. * enc/unicode.c: Fixed a range error (lowest non-ASCII character affected by case operations is U+00B5, MICRO SIGN) * test/ruby/enc/test_case_mapping.rb: Explicit test for case folding of MICRO SIGN to Greek mu. (with Kimihito Matsui) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e