Part 12: Strings and Pattern Matching — The DSA Handbook

12.0intermediate
Naive string matching
The O(nm) double loop hidden inside str.find and str.indexOf. Average-case behavior on real text, and the adversarial inputs that motivate KMP and Rabin-Karp.
10 min
12.1intermediate
Rabin-Karp and rolling hashes
Rolling hashes give O(1)-per-slide hash maintenance for fixed-length windows. Double hashing defeats adversarial collisions and unlocks the duplicate-substring problem family.
20 min
12.2advanced
KMP and the failure function
Knuth-Morris-Pratt: deterministic O(n+m) substring search via the failure function, and the prefix-suffix table that solves a small family of problems on its own.
15 min
12.3advanced
Z-algorithm
Z-array as the prefix-match table on a string; the [l, r) window invariant that makes it linear; how it compares with KMP for exact pattern matching.
20 min
12.4advanced
Aho-Corasick: many patterns, one pass
A trie with failure links: how Aho-Corasick scans a text once and reports every occurrence of every pattern in a fixed dictionary in O(n + m + z) time.
20 min
12.5advanced
Suffix arrays: a sorted index of every suffix
A sorted index of every suffix in roughly 4n bytes, plus the LCP array. The substring-search, longest-repeated-substring, and BWT applications built on these two arrays.
20 min