Strings and Pattern Matching
Naive matching, KMP, Rabin-Karp, Z-algorithm, suffix arrays. Strings as first-class objects, not character arrays.
- Chapters
- 6
- Hours
- 2
- Difficulty
- Intermediate to Advanced
- 12.0intermediate
Naive string matching
The O(nm) double loop hidden inside str.find and str.indexOf. Average-case behavior on real text, and the adversarial inputs that motivate KMP and Rabin-Karp.
- 12.1intermediate
Rabin-Karp and rolling hashes
Rolling hashes give O(1)-per-slide hash maintenance for fixed-length windows. Double hashing defeats adversarial collisions and unlocks the duplicate-substring problem family.
- 12.2advanced
KMP and the failure function
Knuth-Morris-Pratt: deterministic O(n+m) substring search via the failure function, and the prefix-suffix table that solves a small family of problems on its own.
- 12.3advanced
Z-algorithm
Z-array as the prefix-match table on a string; the [l, r) window invariant that makes it linear; how it compares with KMP for exact pattern matching.
- 12.4advanced
Aho-Corasick: many patterns, one pass
A trie with failure links: how Aho-Corasick scans a text once and reports every occurrence of every pattern in a fixed dictionary in O(n + m + z) time.
- 12.5advanced
Suffix arrays: a sorted index of every suffix
A sorted index of every suffix in roughly 4n bytes, plus the LCP array. The substring-search, longest-repeated-substring, and BWT applications built on these two arrays.