KMP Algorithm
How to search a text in O(N)?
Given strings s
and w
, return the positions in s
that match to w
. Let the length of s
and w
be N
and M
, respectively. The time complexity of the naive approach is $O(NM)$.
On the other hand, the KMP (Knuth-Morris-Pratt) algorithm runs in $O(N)$ to do the same task.
As we can see from the above example, the KMP algorithm utilizes the structure of w
. Regardless of whether it finds a match or a mismatch in an iteration, it shifts w
using that knowledge.
Let's take a closer look at the knowledge here.
Check 5 letters and find the 5th character does not match.
Because w[3:5] = w[0:2] = AA, we slide w by 5-2 = 3.
0123456789012
s = AABAAABAABAAA
w = AABAAB
012345
Matching has proceeded to the 5th letter in w
means s
and w
matched up to the 4th letter in w
. Examining w[0:5] (=AABAA)
, we can see that the longest proper prefix of w[0:5]
that is also a suffix of w[0:5]
is AA
. That teaches us that if we slide w
by 5-2=3, we can have w[0:2]
matched and start matching from w[2]
.
Creating LPS Array
As we have seen, we need to prepare an array of the length of the longest proper prefix that is also a suffix (LPS) for all w[0:i]
$(0 \lt i \le M)$. Here is how we calculate the LPS array by hand.
How to create an LPS array in O(M)
===================================
w[0:1] = A
There is no proper prefix in A.
LPS[0] = 0
w[0:2] = AA
The suffix A is also a proper prefix of AA.
LPS[1] = 1
w[0:3] = AAB
There is no suffix of AAB that is also a proper prefix of AAB.
LPS[2] = 0
W[0:4] = AABA
The suffix A is also a proper prefix of AABA.
LPS[3] = 1
W[0:5] = AABAA
The suffix AA is also a proper prefix of AABAA.
LPS[4] = 2
W[0:6] = AABAAB
The suffix AAB is also a proper prefix of AABAAB.
LPS[5] = 3
LPS = [0, 1, 0, 1, 2, 3]
Implementation
Here is my implementation of the KMP algorithm in TypeScript.
Proof of len=lps[len-1]
Time Complexity
The preparation of the LPS array takes $O(M)$ and the search takes $O(N)$. Assuming $N > M$, we can say that the whole algorithm runs in $O(N)$.