Have you ever imagined, how google or any search engine performs searching of keywords that you have entered in the bucket of millions, trillions of words and sentences?
Do you how the implementation of detecting plagiarism is done, due to which all students are scared of their assignments?
Yes, behind all these findings and searching for a pattern either in a text or sequence is called String Matching particular to finding a pattern in a given text.
“String Matching” algorithms can be very helpful for text-editing programs as these are also used, for example, to search for particular patterns in DNA sequences.
Applications of String Matching Algorithms:
- Digital Forensics
- Spelling Checker
- Spam Filters
- Intrusion Detection System
There are various algorithms that are useful in the case of searching a string within another string which helps in performing time-efficient tasks in multiple domains. In this article, we’ll be going to concentrate on the Naive String Algorithm(Character comparison) and Rabin Karp Algorithm(Hashing String Matching).
Naive String-Matching Algorithm
A very simple and inefficient way to check whether a pattern occurs inside a given text at a particular index or not, we see if there’s a replica of the needle starting at the first character of our Haystack. If it’s not we move ahead and compare the same with the next character, and so forth.
The number of comparisons in the worst-case scenario for this algorithm is O(m*(n-m+1)), where m is the length of the pattern and n is the length of the text. Remember, we always assume that m≤n as the pattern to be searched in the text must always be smaller than the main text.
Because the information gathered about the text for one value of s is completely ignored when examining other values of s, the Naive string match is not an efficient strategy. The worst case in Naive is further improved by Rabin and Karp, let’s have a look upon that👀
The Rabin-Karp Algorithm
Rabin and Karp devised a string-matching algorithm that works well in reality and generalises to other methods for similar issues like two-dimensional pattern matching. The Rabin-Karp algorithm uses Θ(m) preprocessing time, and its worst-case running time is Θ((n — m +1)m). However, it has a better average-case running time based on specific assumptions.
The following procedure of Rabin-Karp-Matcher works as follows:
The RABIN-KARP-MATCHER takes Θ(m) preprocessing time, and its matching time is Θ((n — m + 1)m) in the worst case, since (just like the naive string-matching algorithm) the Rabin-Karp algorithm explicitly verifies every valid shift ‘s’. If P = a^m and T = a^n, then the verification takes time Θ((n — m + 1)m), since each of the n — m + 1 possible shifts is valid.
Limitations of Rabin-Karp Algorithm
Spurious Hit
It is a condition when the hash value of the window of the text matches, the hash value of the pattern but the window is not the actual pattern.
The time complexity of this algorithm is increased due to the spurious hits. In order to minimize it, we use modulus. It reduces the spurious hits.
When spurious hits occur a number for all the windows, the worst-case complexity occurs.
There are various other algorithms as well like Knuth-Morris-Pratt Algorithm, Boyer Moore Algorithm, etc.
However, because both computations are modulo 13, the value computed for the first window is 7, and the value computed for the next window is 8.
To get the gist, see this explanatory video of mine: https://youtu.be/OdMtqRruMfg