author: Hiroya Fujinami <make.just.on@gmail.com> 2023-11-10 01:24:15 +0900
committer: GitHub <noreply@github.com> 2023-11-10 01:24:15 +0900
commit: c49adfab5d269942c44ebfd83e8c107299fc8015 (patch)
tree: f7570841a89c2222f5eab19a1a2dff0dfdec317c /doc
parent: ad3db6711c4aa48c82f4091342aab7394ee45736 (diff)
Add "Optimization" section to regexp.rdoc (#8849)
* Add "Optimization" section to regexp.rdoc
* Apply the suggestions by @BurdetteLamar

---------

Co-authored-by: Burdette Lamar <BurdetteLamar@Yahoo.com>
Diffstat (limited to 'doc')
-rw-r--r-- doc/regexp.rdoc | 27
1 files changed, 27 insertions, 0 deletions
diff --git a/doc/regexp.rdoc b/doc/regexp.rdoc
index 6b4b435746..309e109afd 100644
--- a/doc/regexp.rdoc
+++ b/doc/regexp.rdoc
@@ -1228,6 +1228,33 @@ when regexp.timeout is non-+nil+, that value controls timing out:
| nil | Float | Times out in Float seconds. |
| Float | Any | Times out in Float seconds. |
+== Optimization
+
+For certain values of the pattern and target string,
+matching time can grow polynomially or exponentially in relation to the input size;
+the potential vulnerability arising from this is the {regular expression denial-of-service}[https://en.wikipedia.org/wiki/ReDoS] (ReDoS) attack.
+
+\Regexp matching can apply an optimization to prevent ReDoS attacks.
+When the optimization is applied, matching time increases linearly (not polynomially or exponentially)
+in relation to the input size, and a ReDoS attack is not possible.
+
+This optimization is applied if the pattern meets these criteria:
+
+- No backreferences.
+- No subexpression calls.
+- No nested lookaround anchors or atomic groups.
+- No nested quantifiers with counting (i.e. no nested <tt>{n}</tt>,
+ <tt>{min,}</tt>, <tt>{,max}</tt>, or <tt>{min,max}</tt> style quantifiers).
+
+You can use method Regexp.linear_time? to determine whether a pattern meets these criteria:
+
+ Regexp.linear_time?(/a*/) # => true
+ Regexp.linear_time?('a*') # => true
+ Regexp.linear_time?(/(a*)\1/) # => false
+
+However, matching input from an untrusted source may not be safe even if the method returns +true+,
+because the optimization uses memoization (which may consume a large amount of memory).
+
== References
Read (online PDF books):
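
As an illustration of the "Optimization" section added by this patch, the following sketch runs a few patterns through Regexp.linear_time?. It assumes Ruby 3.2 or later; on older versions the method is missing and the helper reports +nil+. The names +CANDIDATES+ and +linear_time_report+ are hypothetical and used only for this example. Going by the criteria listed in the patch, the backreference and subexpression-call patterns would be expected not to qualify.

```ruby
# Hypothetical sample patterns, one per criterion mentioned in the patch.
# Single-quoted strings keep the backslashes literal for Regexp.new.
CANDIDATES = {
  "plain repetition"   => 'a*',
  "backreference"      => '(a*)\1',
  "subexpression call" => '(?<x>a*)\g<x>',
}.freeze

# Returns a Hash mapping each label to true/false as reported by
# Regexp.linear_time?, or nil when the method is not available.
def linear_time_report(candidates)
  candidates.to_h do |label, source|
    verdict =
      if Regexp.respond_to?(:linear_time?)
        Regexp.linear_time?(Regexp.new(source))
      end
    [label, verdict]
  end
end

linear_time_report(CANDIDATES).each do |label, verdict|
  shown = verdict.nil? ? "unknown (linear_time? unavailable)" : verdict
  puts format("%-20s %s", label, shown)
end
```

A +true+ result only says that the linear-time optimization applies. As the patch notes, memory use from memoization can still be a concern for untrusted input.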