Construct a String from another String using Suffix Trie
Last Updated : 23 Nov, 2023
A suffix trie is a trie that stores all the suffixes of a given string; the suffix tree is its compressed form. Apart from detecting patterns in strings, these structures have a range of applications in fields like bioinformatics, string algorithms, and data compression.
Features of Suffix Trie:
A suffix trie is a tree-shaped data structure used to store the suffixes of a string and to search for patterns in it. The suffix tree is its compressed form, in which each chain of single-child nodes is merged into one edge.
Every suffix of the string is stored as a path that starts at the root, so each path from the root spells out a substring of the string.
To build a suffix trie for a string, we start with an empty tree and insert each suffix of the string in turn.
The root node represents the empty string, and each leaf node represents a suffix of the input string.
Each internal node with more than one child represents a substring that is shared by at least two suffixes.
A key benefit of a suffix trie is fast substring search.
To search for a pattern, we start at the root and follow the edges labelled with the characters of the pattern, one character at a time.
If the pattern occurs in the string, the walk ends at a node, and every leaf below that node represents a suffix that begins with the pattern. If some character has no matching edge, the pattern does not occur.
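The build and search steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: it uses nested dictionaries instead of fixed 26-slot arrays, and the names `build_suffix_trie` and `contains` are made up for this example.

```python
def build_suffix_trie(text):
    """Return the root of a dict-based suffix trie for `text` (hypothetical helper)."""
    root = {}
    for start in range(len(text)):
        node = root
        # Insert the suffix text[start:] one character at a time.
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """Return True if `pattern` is a substring of the indexed text."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False  # no edge for this character
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

The search touches at most one node per pattern character, so its cost depends on the pattern length and not on the length of the indexed text.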
Explanation: In this solution, we construct the suffix trie for the string "str1". Then we split "str2" into substrings and check whether each one exists in the suffix trie of "str1". If it exists, we record the starting and ending indices of the substring of "str1" that forms the given substring of "str2".
In the example given, the first substring of "str2" is "g". We search for this substring in the suffix trie of "str1" and find it at index 3. Therefore, we record the starting and ending indices of this substring in "str1" as (3, 3). Similarly, we find that the substring "am" in "str2" can be constructed from the suffix trie of "str1" using indices (5, 6) in "str1". Finally, we find that the substring "ing" in "str2" can be constructed from the suffix trie of "str1" using indices (8, 10) in "str1".
Therefore, the output of the program is [(3, 3), (5, 6), (8, 10)], which represents the starting and ending indices of each substring of "str1" that can be used to construct the corresponding substring in "str2".
Explanation: A suffix trie is a data structure that stores all the suffixes of a given string in a tree-like structure. To construct str2 from str1 using a suffix trie, we first build a suffix trie for str1. Then, we search for str2 in the suffix trie by traversing down the tree, following the edges labeled with the characters of str2.
To construct "ana" from "banana", we start at the root of the suffix trie and follow the edges labeled "a", "n", and "a", respectively, until we reach the end of the string. The indices of the characters we traverse are (1, 3), which correspond to the substring "ana" in str1.
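To make the "banana" example concrete, here is a small sketch that returns the index pair directly. It assumes each trie node also remembers the start index of the first suffix that created it; that field (`first`) and the helper names are illustrative choices, not part of the original code.

```python
def build_indexed_trie(text):
    """Suffix trie whose nodes record the first suffix start passing through them."""
    root = {"children": {}, "first": 0}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            children = node["children"]
            if ch not in children:
                # Suffixes are inserted left to right, so the creating
                # suffix gives the earliest occurrence of this path.
                children[ch] = {"children": {}, "first": start}
            node = children[ch]
    return root


def locate(root, pattern):
    """Return (start, end) of the first occurrence of `pattern`, or None."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return None
    start = node["first"]
    return (start, start + len(pattern) - 1)


print(locate(build_indexed_trie("banana"), "ana"))  # (1, 3)
```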
Approach: This can be solved with the following idea:
Step 1: Create a Suffix Trie for the Original String
The first step is to construct a trie that represents all the suffixes of the original string. This data structure is called a suffix trie. The simplest way to build it is to insert each suffix of the string into an ordinary trie, one suffix at a time.
Step 2: Identify Suffixes Beginning with the Initial Substring
After constructing the suffix trie, the next step is to locate all the suffixes that start with the initial substring of interest. This can be achieved by traversing the trie from the root to the node that corresponds to the initial substring. By following the edges that match the characters of the initial substring, we reach a node whose subtree contains exactly the suffixes that begin with it.
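As a rough sketch of this step, each node can keep a list of the suffix start positions that pass through it, so reaching the node for the substring immediately yields all matching suffixes. The `starts` list and the function names are assumptions made for this example; the listings in this article do not store them.

```python
def build_trie_with_starts(text):
    """Suffix trie whose nodes list the start index of every suffix passing through."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def suffixes_with_prefix(root, prefix):
    """Return the start indices of all suffixes that begin with `prefix`."""
    node = root
    for ch in prefix:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


trie = build_trie_with_starts("banana")
print(suffixes_with_prefix(trie, "an"))  # [1, 3]
print(suffixes_with_prefix(trie, "a"))   # [1, 3, 5]
```

Storing the lists costs extra memory; collecting them on demand by walking the subtree is the usual trade-off.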
Step 3: Determine the Longest Common Prefix (LCP) of the Suffixes
Once we have identified all the suffixes that begin with the initial substring, we need to determine their LCP. To accomplish this, we identify the lowest common ancestor (LCA) of the nodes at which those suffixes end. The path from the root to the LCA spells out the longest common prefix of the suffixes.
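One possible way to find that common ancestor is sketched below. It assumes each node counts how many suffixes pass through it (a hypothetical `count` field): starting at the node for the substring, we keep descending while there is a single child that carries every one of those suffixes.

```python
def build_counting_trie(text):
    """Suffix trie whose nodes count the suffixes passing through them."""
    root = {"children": {}, "count": 0}
    for start in range(len(text)):
        node = root
        node["count"] += 1
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "count": 0})
            node["count"] += 1
    return root


def lcp_of_suffixes(root, prefix):
    """Longest common prefix of all suffixes that begin with `prefix`, or None."""
    node = root
    for ch in prefix:
        node = node["children"].get(ch)
        if node is None:
            return None
    lcp = prefix
    while len(node["children"]) == 1:
        (ch, child), = node["children"].items()
        if child["count"] != node["count"]:
            break  # one of the suffixes ends at this node
        lcp += ch
        node = child
    return lcp


trie = build_counting_trie("banana")
print(lcp_of_suffixes(trie, "an"))  # ana
print(lcp_of_suffixes(trie, "a"))   # a
```

For "banana", the suffixes beginning with "an" are "anana" and "ana", whose longest common prefix is "ana"; the suffixes beginning with "a" also include the one-character suffix "a", so the walk stops immediately.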
Step 4: Add the LCP to the Output String
After determining the LCP of the suffixes, we can add it to the output string.
Step 5: Repeat for Additional Substrings
To find the LCP for every additional substring, we repeat steps 2-4, beginning at the end of the previous substring. We identify all the suffixes that begin with the additional substring, determine their LCP, and add it to the output string.
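Before the full listings, here is a compact sketch of the repeat loop as a single greedy pass. It is a variant under stated assumptions, not the article's exact code: nodes store the earliest start index of their path (a hypothetical `first` field), the walk continues from the current node instead of restarting each lookup from the root, and an empty list signals that the target cannot be built.

```python
def build_indexed_trie(text):
    """Suffix trie whose nodes record the earliest start index of their path."""
    root = {"children": {}, "first": 0}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            children = node["children"]
            if ch not in children:
                children[ch] = {"children": {}, "first": start}
            node = children[ch]
    return root


def decompose(source, target):
    """Greedily split `target` into pieces of `source`; return (start, end) pairs."""
    root = build_indexed_trie(source)
    pieces = []
    node, length, i = root, 0, 0
    while i < len(target):
        child = node["children"].get(target[i])
        if child is not None:
            node, length, i = child, length + 1, i + 1  # extend current piece
        elif length == 0:
            return []  # this character never occurs in source
        else:
            start = node["first"]
            pieces.append((start, start + length - 1))
            node, length = root, 0  # start the next piece at target[i]
    if length:
        start = node["first"]
        pieces.append((start, start + length - 1))
    return pieces


print(decompose("ssrtssr", "rsstsr"))  # [(2, 2), (0, 1), (3, 4), (2, 2)]
```

Each character of the target is examined at most twice, so the scan itself is linear in the length of the target once the trie exists.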
Below is the code for the above approach:
C++
// C++ implementation of the above approach
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Implementing Trie using TrieNode class
class TrieNode {
public:
    vector<TrieNode*> children;
    bool isEndOfWord;
    TrieNode()
    {
        children = vector<TrieNode*>(26, nullptr);
        isEndOfWord = false;
    }
};

// Trie data structure class
class Trie {
private:
    TrieNode* root;
    TrieNode* getNode() { return new TrieNode(); }
    int charToIndex(char ch) { return ch - 'a'; }

public:
    Trie() { root = getNode(); }

    // Insert a key (here: one suffix of P) into the trie
    void insert(string key)
    {
        TrieNode* word = root;
        int length = key.length();
        for (int level = 0; level < length; level++) {
            int index = charToIndex(key[level]);
            if (!word->children[index]) {
                word->children[index] = getNode();
            }
            word = word->children[index];
        }
        word->isEndOfWord = true;
    }

    // Return true if key is a prefix of some inserted suffix,
    // i.e. a substring of the indexed string
    bool search(string key)
    {
        TrieNode* word = root;
        int length = key.length();
        int level = 0;
        while (level < length) {
            int index = charToIndex(key[level]);
            if (!word->children[index]) {
                return false;
            }
            word = word->children[index];
            level++;
        }
        if (level == length) {
            return true;
        }
        else {
            return false;
        }
    }

    static vector<pair<int, int> > buildViaSubstrings(string P, string Q)
    {
        // Handling the case when the length of P is 1
        if (P.length() == 1) {
            for (int i = 0; i < Q.length(); i++) {
                if (Q[i] != P[0]) {
                    return {};
                }
            }
            vector<pair<int, int> > substrings(Q.length(), make_pair(0, 0));
            return substrings;
        }
        else {
            // Creating the suffix trie of P
            Trie x;
            for (int i = 0; i < P.length(); i++) {
                x.insert(P.substr(i));
            }
            int startPos = 0;
            vector<pair<int, int> > substrings;
            bool y = true;
            int k = 1;

            // Search substrings of Q in the trie
            while (k <= Q.length()) {
                y = x.search(Q.substr(startPos, k - startPos));
                if (!y) {
                    // Unsuccessful search for a single-letter substring
                    if (k == startPos + 1) {
                        return {};
                    }
                    else {
                        // Search failed for a substring longer than 1:
                        // record the piece that matched so far
                        string sub = Q.substr(startPos, k - 1 - startPos);
                        int lt = sub.length();
                        int m = P.find(sub);
                        substrings.push_back(make_pair(m, m + lt - 1));
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                }
                else if (y && k == Q.length()) {
                    // Reached the last letter of Q
                    string sub = Q.substr(startPos);
                    int lt = sub.length();
                    int m = P.find(sub);
                    substrings.push_back(make_pair(m, m + lt - 1));
                }
                k++;
            }
            if (y && substrings.empty()) {
                return { make_pair(P.find(Q), Q.length() - 1) };
            }
            else {
                return substrings;
            }
        }
    }
};

int main()
{
    string str1 = "ssrtssr";
    string str2 = "rsstsr";
    vector<pair<int, int> > ans = Trie::buildViaSubstrings(str1, str2);
    for (auto p : ans) {
        cout << "(" << p.first << ", " << p.second << ") ";
    }
    cout << endl;
    return 0;
}
// This code is contributed by Tapesh(tapeshdua420)
Python
# Implementing Trie using Trie and
# TrieNode classes

""" The Trie_Node class is defined with a __init__ method that
creates a list of None values with length 26 to represent children
nodes and a Boolean variable isEndOfWord to mark the end of a word
in the trie."""


class Trie_Node:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26

        # Property to represent end
        # of a word in trie
        self.isEndOfWord = False


"""The Trie class is defined with a __init__ method that creates a
root node using the getNode method and a _charToIndex private helper
method to convert a character to its index in the children list."""


class Trie(Trie_Node):
    # Trie data structure class
    def __init__(self):
        self.root = self.getNode()

    def getNode(self):
        # Returns new trie node with
        # Null values
        return Trie_Node()

    def _charToIndex(self, ch):
        # Private helper function
        return ord(ch) - ord('a')

    """The insert method is defined to insert a new key (a string)
    into the trie. It iterates over each character of the key and
    checks if the character is already present in the trie. If it's
    not present, it creates a new node and adds it to the children
    list of the current node. The method marks the last node as the
    end of the word."""

    def insert(self, key):
        # Start from the root; existing nodes are reused
        # when part of the key is already present in trie
        word = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])

            # If character is not present
            # in trie
            if not word.children[index]:
                word.children[index] = self.getNode()
            word = word.children[index]
        word.isEndOfWord = True

    """The search method is defined to search for a key (a string)
    in the trie. It iterates over each character of the key and
    checks if the character is present in the trie. If it's not
    present, the method returns False. If the method reaches the end
    of the key, it returns True; isEndOfWord is not checked, so any
    prefix of an inserted suffix (any substring) matches."""

    def search(self, key):
        # Search substring in the trie
        word = self.root
        length = len(key)
        level = 0
        while level < length:
            index = self._charToIndex(key[level])
            if not word.children[index]:
                return False
            word = word.children[index]
            level += 1
        if level == length:
            return True
        else:
            return False

    """The build_via_substrings method is defined to build a suffix
    trie for a given input string P and search for all substrings of
    another input string Q in the trie."""

    def build_via_substrings(P, Q):
        # handling when length of P is 1
        if len(P) == 1:
            for i in range(len(Q)):
                if Q[i] != P:
                    return False
            return [(0, 0)] * len(Q)
        else:
            # creating suffix trie
            x = Trie()
            for i in range(len(P)):
                x.insert(P[i:])
            start_pos = 0
            substrings = []
            y = True
            k = 1

            # Search substrings in trie
            while k <= len(Q):
                y = x.search(Q[start_pos:k])
                if y == False:
                    # Unsuccessful search
                    # for a single lettered
                    # substring.
                    if k == start_pos + 1:
                        return False
                    elif k != start_pos + 1:
                        # When search fails
                        # for a substring
                        # greater than
                        # length = 1
                        sub = Q[start_pos:k - 1]
                        lt = len(sub)
                        m = P.find(sub)
                        substrings.append((m, m + lt - 1))
                        start_pos = k - 1
                        k = k - 1
                        y = True
                elif y == True and k == len(Q):
                    # We check whether we
                    # have reached the
                    # last letter
                    sub = Q[start_pos:]
                    lt = len(sub)
                    m = P.find(sub)
                    substrings.append((m, m + lt - 1))
                k = k + 1
            if y == True and substrings == []:
                return [(P.find(Q), len(Q) - 1)]
            else:
                return substrings


# Driver code
if __name__ == "__main__":
    str1 = "ssrtssr"
    str2 = "rsstsr"

    # Function call
    ans = Trie.build_via_substrings(str1, str2)
    print(ans)
C#
// C# implementation for the above approach
using System;
using System.Collections.Generic;

// Implementing Trie using TrieNode class
class TrieNode
{
    public TrieNode[] Children;
    public bool IsEndOfWord;

    public TrieNode()
    {
        Children = new TrieNode[26];
        IsEndOfWord = false;
    }
}

// Trie data structure class
class Trie
{
    private TrieNode root;

    private TrieNode GetNode()
    {
        return new TrieNode();
    }

    private int CharToIndex(char ch)
    {
        return ch - 'a';
    }

    public Trie()
    {
        root = GetNode();
    }

    // Insert a key (here: one suffix of P) into the trie
    public void Insert(string key)
    {
        TrieNode word = root;
        int length = key.Length;
        for (int level = 0; level < length; level++)
        {
            int index = CharToIndex(key[level]);
            if (word.Children[index] == null)
            {
                word.Children[index] = GetNode();
            }
            word = word.Children[index];
        }
        word.IsEndOfWord = true;
    }

    // Return true if key is a prefix of some inserted suffix
    public bool Search(string key)
    {
        TrieNode word = root;
        int length = key.Length;
        int level = 0;
        while (level < length)
        {
            int index = CharToIndex(key[level]);
            if (word.Children[index] == null)
            {
                return false;
            }
            word = word.Children[index];
            level++;
        }
        return (level == length);
    }

    public static List<Tuple<int, int>> BuildViaSubstrings(string P, string Q)
    {
        if (P.Length == 1)
        {
            for (int i = 0; i < Q.Length; i++)
            {
                if (Q[i] != P[0])
                {
                    return new List<Tuple<int, int>>();
                }
            }
            List<Tuple<int, int>> substrings = new List<Tuple<int, int>>();
            for (int i = 0; i < Q.Length; i++)
            {
                // P has a single character, so every piece of Q
                // maps to index 0 of P (as in the C++ and Python versions)
                substrings.Add(new Tuple<int, int>(0, 0));
            }
            return substrings;
        }
        else
        {
            // Creating the suffix trie of P
            Trie x = new Trie();
            for (int i = 0; i < P.Length; i++)
            {
                x.Insert(P.Substring(i));
            }
            int startPos = 0;
            List<Tuple<int, int>> substrings = new List<Tuple<int, int>>();
            bool y = true;
            int k = 1;

            // Search substrings of Q in the trie
            while (k <= Q.Length)
            {
                y = x.Search(Q.Substring(startPos, k - startPos));
                if (!y)
                {
                    // Unsuccessful search for a single-letter substring
                    if (k == startPos + 1)
                    {
                        return new List<Tuple<int, int>>();
                    }
                    else
                    {
                        // Record the piece that matched so far
                        string sub = Q.Substring(startPos, k - 1 - startPos);
                        int lt = sub.Length;
                        int m = P.IndexOf(sub);
                        substrings.Add(new Tuple<int, int>(m, m + lt - 1));
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                }
                else if (y && k == Q.Length)
                {
                    // Reached the last letter of Q
                    string sub = Q.Substring(startPos);
                    int lt = sub.Length;
                    int m = P.IndexOf(sub);
                    substrings.Add(new Tuple<int, int>(m, m + lt - 1));
                }
                k++;
            }
            if (y && substrings.Count == 0)
            {
                return new List<Tuple<int, int>> { new Tuple<int, int>(P.IndexOf(Q), Q.Length - 1) };
            }
            else
            {
                return substrings;
            }
        }
    }
}

class GFG
{
    static void Main(string[] args)
    {
        string str1 = "ssrtssr";
        string str2 = "rsstsr";
        List<Tuple<int, int>> ans = Trie.BuildViaSubstrings(str1, str2);
        foreach (var p in ans)
        {
            Console.Write("(" + p.Item1 + ", " + p.Item2 + ") ");
        }
        Console.WriteLine();
    }
}
JavaScript
// Implementing Trie using TrieNode class
class TrieNode {
    constructor() {
        // Initialize an array to store child TrieNodes for each character
        this.children = new Array(26).fill(null);
        this.isEndOfWord = false;
    }
}

// Trie data structure class
class Trie {
    constructor() {
        this.root = new TrieNode();
    }

    // Helper function to get a new TrieNode
    getNode() {
        return new TrieNode();
    }

    // Helper function to get the index of a character
    charToIndex(ch) {
        return ch.charCodeAt(0) - 'a'.charCodeAt(0);
    }

    // Insert a word into the Trie
    insert(key) {
        let word = this.root;
        const length = key.length;
        for (let level = 0; level < length; level++) {
            const index = this.charToIndex(key[level]);
            if (!word.children[index]) {
                word.children[index] = this.getNode();
            }
            word = word.children[index];
        }
        word.isEndOfWord = true;
    }

    // Search for a word in the Trie
    search(key) {
        let word = this.root;
        const length = key.length;
        let level = 0;
        while (level < length) {
            const index = this.charToIndex(key[level]);
            if (!word.children[index]) {
                return false;
            }
            word = word.children[index];
            level++;
        }
        return level === length;
    }

    // Build substrings of Q that can be formed using non-overlapping substrings of P
    static buildViaSubstrings(P, Q) {
        if (P.length === 1) {
            for (let i = 0; i < Q.length; i++) {
                if (Q[i] !== P[0]) {
                    return [];
                }
            }
            // P has a single character, so every piece of Q maps to
            // index 0 of P (as in the C++ and Python versions)
            const substrings = Array(Q.length).fill().map(() => [0, 0]);
            return substrings;
        } else {
            // Creating the suffix trie of P
            const x = new Trie();
            for (let i = 0; i < P.length; i++) {
                x.insert(P.substr(i));
            }
            let startPos = 0;
            const substrings = [];
            let y = true;
            let k = 1;

            // Search substrings of Q in the trie
            while (k <= Q.length) {
                y = x.search(Q.substr(startPos, k - startPos));
                if (!y) {
                    // Unsuccessful search for a single-letter substring
                    if (k === startPos + 1) {
                        return [];
                    } else {
                        // Record the piece that matched so far
                        const sub = Q.substr(startPos, k - 1 - startPos);
                        const lt = sub.length;
                        const m = P.indexOf(sub);
                        substrings.push([m, m + lt - 1]);
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                } else if (y && k === Q.length) {
                    // Reached the last letter of Q
                    const sub = Q.substr(startPos);
                    const lt = sub.length;
                    const m = P.indexOf(sub);
                    substrings.push([m, m + lt - 1]);
                }
                k++;
            }
            if (y && substrings.length === 0) {
                return [[P.indexOf(Q), Q.length - 1]];
            } else {
                return substrings;
            }
        }
    }
}

// Main function
function main() {
    const str1 = "ssrtssr";
    const str2 = "rsstsr";
    const ans = Trie.buildViaSubstrings(str1, str2);
    for (const p of ans) {
        console.log(`(${p[0]}, ${p[1]})`);
    }
}

// Run the main function
main();
Output
[(2, 2), (0, 1), (3, 4), (2, 2)]
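Time Complexity: O(n^2 + m), where n is the length of str1 and m is the length of str2: O(n^2) to insert the n suffixes, plus the scan of str2. Note that the listings above restart every lookup from the root and run a linear find for each recorded piece, so their scan can cost more than O(m).
Auxiliary Space: O(n^2 * 26) in the worst case, since an uncompressed suffix trie can hold O(n^2) nodes, each with 26 child slots.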
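The space cost depends on how many nodes the trie ends up holding. As a rough check, the sketch below counts the nodes created while inserting every suffix; the function name `count_trie_nodes` and the dict-based nodes are illustrative assumptions, not part of the listings above.

```python
def count_trie_nodes(text):
    """Count the non-root nodes of the uncompressed suffix trie of `text`."""
    root = {}
    count = 0
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                node[ch] = {}
                count += 1  # a new node is created for this character
            node = node[ch]
    return count


print(count_trie_nodes("abcd"))  # 10
print(count_trie_nodes("aaaa"))  # 4
```

A string of n distinct characters produces n(n+1)/2 nodes (10 for "abcd"), while a highly repetitive string such as "aaaa" shares almost every path and needs only n nodes.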
Applications of Suffix Trie:
Pattern matching: a suffix trie can find all occurrences of a pattern in a text, because every occurrence corresponds to a suffix of the text that begins with the pattern.
Bioinformatics: it can help assemble genome sequences from short DNA sequences by matching and aligning the short reads against a reference genome.
Spell checking: it can be used to check whether a word is spelled correctly by looking up substrings of the input word.
Compilers and code optimization tools: it can be used to identify frequently repeated code patterns so that they can be optimized.
Natural language processing: it can be used to match and categorize words and phrases based on their morphological and syntactic properties.