Part 12 of 14

Strings and Pattern Matching

Naive matching, KMP, Rabin-Karp, Z-algorithm, suffix arrays. Strings as first-class objects, not character arrays.

Chapters
6
Hours
2
Difficulty
Intermediate to Advanced
  1. 12.0intermediate

    Naive string matching

    The O(nm) double loop hidden inside str.find and str.indexOf. Average-case behavior on real text, and the adversarial inputs that motivate KMP and Rabin-Karp.

    10 min
  2. 12.1intermediate

    Rabin-Karp and rolling hashes

    Rolling hashes give O(1)-per-slide hash maintenance for fixed-length windows. Double hashing defeats adversarial collisions and unlocks the duplicate-substring problem family.

    20 min
  3. 12.2advanced

    KMP and the failure function

    Knuth-Morris-Pratt: deterministic O(n+m) substring search via the failure function, and the prefix-suffix table that solves a small family of problems on its own.

    15 min
  4. 12.3advanced

    Z-algorithm

    Z-array as the prefix-match table on a string; the [l, r) window invariant that makes it linear; how it compares with KMP for exact pattern matching.

    20 min
  5. 12.4advanced

    Aho-Corasick: many patterns, one pass

    A trie with failure links: how Aho-Corasick scans a text once and reports every occurrence of every pattern in a fixed dictionary in O(n + m + z) time.

    20 min
  6. 12.5advanced

    Suffix arrays: a sorted index of every suffix

    A sorted index of every suffix in roughly 4n bytes, plus the LCP array. The substring-search, longest-repeated-substring, and BWT applications built on these two arrays.

    20 min