Java Program to Implement Bitap Algorithm for String Matching
Last Updated :
22 Mar, 2023
Improve
The Bitap Algorithm is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern. Here approximately equal states that if the substring and pattern are within a given distance k of each other. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. This does most of the bitwise operations, which are quick.
The Bitap Algorithm is also known as shift-or, shift-and, or Baeza Yates Gonnet Algorithm.
Example:
Input: Text: geeksforgeeks Pattern: geeks Output: Pattern found at index: 0 Input: Text: Youareawesome Pattern: Youareamazing Output: No Match.
Approach:
- Input the text pattern as a string.
- We will convert this String to a simple Char Array
- If the length is 0 or exceeding 63, we return “No Match”.
- Now, by taking array R which complements 0.
- Taking an array “pattern_mask” which complements 0, to 1
- Taking the pattern as an index of "pattern_mask" then by using the and operator we and it with the result of the complement of 1L(long integer) by shifting it to left side by i times.
- Now by running the loop till text length.
- We now or it with R and pattern mask.
- Now by left shifting the 1L by the length of the pattern and then the result is an and with R
- If it is equal to zero we print it by I-len+1 else return -1
- Output is the index of the text, where the pattern matches.
Code:
// Java Program to implement Bitap Algorithm.
import java.io.*;
import java.io.IOException;
public class GFG {
public static void main(String[] args)
throws IOException
{
System.out.println("Bitap Algorithm!");
String text = "geeksforgeeks";
String pattern = "geeks";
// This is an object created of the class for
// extension of functions.
GFG g = new GFG();
// Now here we go to findPattern functions , we two
// arguments.
g.findPattern(text, pattern);
}
public void findPattern(String t, String p)
{
// we convert the String text to Character Array for
// easy indexing
char[] text = t.toCharArray();
// we convert the String pattern to Character Array
// to access each letter in the String easily.
char[] pattern = p.toCharArray();
// Index shows the function bitap search if they are
// equal at a particular index or not
int index = bitap_search(text, pattern);
// If the pattern is not equal to the text of the
// string approximately Then we tend to return -1 If
// index is -1 Then we print there is No match
if (index == -1) {
System.out.println("\nNo Match\n");
}
else {
// Else if there is a match
// Then we print the position of the index at
// where the pattern and the text matches.
System.out.println(
"\nPattern found at index: \n" + index);
}
}
private int bitap_search(char[] text, char[] pattern)
{
// Here the len variable is taken
// This variable accepts the pattern length as its
// value
int len = pattern.length;
// This is an array of pattern_mask of all
// character values in it.
long pattern_mask[]
= new long[Character.MAX_VALUE + 1];
// Here the variable of being long type is
// complemented with 1;
long R = ~1;
// Now if the length of the pattern is 0
// we would return -1
if (len == 0) {
return -1;
}
// Or if the length of the pattern exceeds the
// length of the character array Then we would
// declare that the pattern is too long. We would
// return -1
if (len > 63) {
System.out.println("Pattern too long!");
return -1;
}
// Now filling the values in the pattern mask
// We would run th eloop until the max value of
// character Initially all the values of Character
// are put up inside the pattern mask array And
// initially they are complemented with zero
for (int i = 0; i <= Character.MAX_VALUE; ++i)
pattern_mask[i] = ~0;
// Now len being the variable of pattern length ,
// the loop is set till there Now the pattern being
// the index of the pattern_mask 1L means the long
// integer is shifted to left by i times The result
// of that is now being complemented the result of
// the above is now being used as an and operator We
// and the pattern_mask and the result of it
for (int i = 0; i < len; ++i)
pattern_mask[pattern[i]] &= ~(1L << i);
// Now the loop is made to run until the text length
// Now what we do this the R array is used
// as an Or function with pattern_mask at index of
// text of i
for (int i = 0; i < text.length; ++i) {
R |= pattern_mask[text[i]];
// Now result of the r after the above
// operation
// we shift it to left side by 1 time
R <<= 1;
// If the 1L long integer if shifted left of the
// len And the result is used to and the result
// and R array
// If that result is equal to 0
// We return the index value
// Index=i-len+1
if ((R & (1L << len)) == 0)
return i - len + 1;
}
// if the index is not matched
// then we return it as -1
// stating no match found.
return -1;
}
}
Output
Bitap Algorithm! Pattern found at index: 0