C++ Program to Implement Suffix Array
A suffix array is a data structure that represents all suffixes of a given string in sorted order. Rather than storing the suffixes themselves, it stores their starting indices in lexicographical order. It is similar to a trie built over the suffixes but is more space-efficient than tries.
Example
Let the given string be "banana".
Suffixes by position      Suffixes sorted alphabetically
0 banana                  5 a
1 anana                   3 ana
2 nana                    1 anana
3 ana                     0 banana
4 na                      4 na
5 a                       2 nana
So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}.
In this article, we will learn how to implement a suffix array for a given string in C++.
How to Create Suffix Array?
Following are the steps involved in creating the suffix array:
- Create a vector of strings to store all the suffixes and a vector of integers to store the starting position of each suffix.
- Generate every suffix with a loop and store it in the vector of strings.
- Sort all the suffixes alphabetically.
- Walk through the sorted suffixes and record the starting position of each one (the length of the string minus the length of the suffix) to form the suffix array.
Code Implementation
Below is the program for creating the suffix array:
// C++ Program to illustrate how to create the
// suffix array
#include <bits/stdc++.h>
using namespace std;

vector<int> buildSufArr(string &s) {
    int n = s.length();
    vector<int> sufArr(n);
    vector<string> suf(n);

    // Generating all the suffixes
    for (int i = 0; i < n; i++)
        suf[i] = s.substr(i);

    // Sort all suffixes alphabetically
    sort(suf.begin(), suf.end());

    // Create the suffix array: the starting position
    // of each suffix is the length of the original
    // string minus the length of that suffix
    for (int i = 0; i < n; i++)
        sufArr[i] = n - suf[i].length();

    return sufArr;
}

int main() {
    string s = "banana";
    vector<int> sufArr = buildSufArr(s);

    for (int i : sufArr)
        cout << i << " ";

    return 0;
}
Output
5 3 1 0 4 2
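Time Complexity: O(k * n log n), where n is the length of the string and k is the maximum length of a suffix. Since the longest suffix is the string itself (k = n), this is O(n^2 log n) in the worst case.
Auxiliary Space: O(n^2), where n is the length of the string, because the vector of strings holds a copy of every suffix and the suffix lengths add up to n(n + 1)/2.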
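The program above copies every suffix into its own string, so its memory use grows with the total length of all suffixes. A minimal sketch of one common alternative is to sort the starting indices directly and compare the suffixes in place. The function name buildSufArrByIndex is hypothetical and not part of the program above; the sort still compares characters one by one, so the worst-case running time is unchanged and only the extra memory shrinks to O(n).

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): builds the suffix
// array by sorting starting indices instead of copies of the suffixes.
std::vector<int> buildSufArrByIndex(const std::string &s) {
    const int n = static_cast<int>(s.length());

    // Start with the indices 0, 1, ..., n - 1
    std::vector<int> sufArr(n);
    std::iota(sufArr.begin(), sufArr.end(), 0);

    // Order the indices by the suffixes they point to. compare() looks at
    // the characters of s directly, so no suffix is ever copied.
    std::sort(sufArr.begin(), sufArr.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });

    return sufArr;
}
```

Calling buildSufArrByIndex("banana") gives the same array as the program above, {5, 3, 1, 0, 4, 2}.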
Applications of Suffix Arrays
Suffix arrays can be used in various problems, some of which are given below:
- Pattern Matching: A suffix array quickly finds a substring within a larger string.
- Data Compression: It helps in algorithms that reduce the size of data, for example when computing the Burrows-Wheeler transform.
- Bioinformatics: Suffix arrays are used in analysing DNA sequences.
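As an illustration of the pattern matching use case, here is a minimal sketch of searching with a suffix array that has already been built. Because the suffixes are sorted, all suffixes that begin with the pattern sit next to each other, so two binary searches find the range. The function name findOccurrences and its parameters are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): returns the starting
// positions, in increasing order, at which pattern occurs in text.
// sufArr must be the suffix array of text.
std::vector<int> findOccurrences(const std::string &text,
                                 const std::vector<int> &sufArr,
                                 const std::string &pattern) {
    assert(sufArr.size() == text.size());
    const std::size_t m = pattern.size();

    // First suffix whose first m characters are not smaller than pattern
    auto lo = std::lower_bound(
        sufArr.begin(), sufArr.end(), pattern,
        [&](int start, const std::string &p) {
            return text.compare(start, m, p) < 0;
        });

    // First suffix whose first m characters are greater than pattern
    auto hi = std::upper_bound(
        lo, sufArr.end(), pattern,
        [&](const std::string &p, int start) {
            return text.compare(start, m, p) > 0;
        });

    // Every suffix in [lo, hi) starts with pattern
    std::vector<int> positions(lo, hi);
    std::sort(positions.begin(), positions.end());
    return positions;
}
```

For the text "banana" with suffix array {5, 3, 1, 0, 4, 2}, searching for "ana" returns the positions 1 and 3. Each binary search step compares at most m characters, where m is the pattern length, so locating the range takes O(m log n) time.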
Advantages
- Once built, it allows quick substring searches: a pattern of length m can be located by binary search in O(m log n) time.
- It uses less space compared to other structures like suffix trees.