2

In C, how exactly can I compare a string that contains * (that can be any combination of characters) with a two dimensional matrix of strings?

For example, I have the word go*s. It can generate the words "gorgeous" (* is "orgeou"), "goodness", "goats", "goes" etc. I am supposed to read the words from a whole dictionary and compare them with a word that contains one or more asterisks (*). Each word that can be generated from the word with the asterisk will have to be printed. If the two words have the same length then it is easy to compare because the * can only be one letter.

int fq(char *s1, char *s2){
int i, a=0, b=0, s=0;
while (1){
    if (s1[a]=='\0')
        break;
    a++;
}
if (strlen(s1)==strlen(s2)){
    for(i=0; i<a; i++){
        if (s1[i]=='*'){
            b++;
            }
        if (s1[i]==s2[i]){
            b++;
            }
        }
    }
if (b==a)
    return 1;
4
  • 1
    You really should check out regular expression library, such as regcomp(3). Commented May 4, 2014 at 14:25
  • man is your friend. glob is your friend. % man 3 glob Commented May 4, 2014 at 14:48
  • 1
    @CharlieBurns You mean fnmatch(3). glob(3) is for file names only. Commented May 4, 2014 at 21:43
  • Huh. That's weird. I could have sworn I used glob to match strings. But that was years ago. Maybe I had a special version in my own library. Commented May 4, 2014 at 21:54

6 Answers 6

7

You can fairly easily write a recursive function for comparing a sting to another string with a wildcard in it by examining the pattern string character by character and applying these rules as follows:

  • if pattern[p] == '\0': the patterns match if candidate[c] == '\0'
  • if pattern[p] == '*': try to match candidate[c]...candidate[c+n] with pattern[p+1]
  • if pattern[p] != '?' and pattern[p] != candidate[c]: No match
  • otherwise, match pattern[p+1] with candidate[c+1]

These few rules can easily be written as a recursive function for matching:

#include <stdbool.h>

bool match(const char *pattern, const char *candidate, int p, int c) {
  if (pattern[p] == '\0') {
    return candidate[c] == '\0';
  } else if (pattern[p] == '*') {
    for (; candidate[c] != '\0'; c++) {
      if (match(pattern, candidate, p+1, c))
        return true;
    }
    return match(pattern, candidate, p+1, c);
  } else if (pattern[p] != '?' && pattern[p] != candidate[c]) {
    return false;
  }  else {
    return match(pattern, candidate, p+1, c+1);
  }
}

then, you can do:

match("f*o", "foo", 0, 0);

This is not an effective method, but I think it is easy to understand and implement. If you need something more efficient, you can start from these: http://en.wikipedia.org/wiki/String_searching_algorithm

Sign up to request clarification or add additional context in comments.

10 Comments

(Very minor note) bool, true and false are C++. Make sure to #include <stdbool.h> for C.
Thank you for this. I've got another question. Would this work if a word also contains any number of (?) along with any number of (). The question mark can be any character from the English alphabet. For example, "?os" can generate the word "poets" and thus the function should return 1.
Not directly, however you can easily modify the last else if statement like this: (pattern[p] != '?' && pattern[p] != candidate[c]). (now it does)
@imreal It seems you flipped the "pattern" and "candidate" parameters of the two last cases. Currently, you ask if *cdxcd matches the pattern zcdxzcdxcd, which it does not, and not the other way around.
@imreal Approximately O(n^m) where "n" is the number of characters in the string, and "m" is the number of stars in the pattern. If you care about efficiency, you probably want something more efficient, this is written for simplicity, not for speed.
|
1

I liked Robert James Mieta's approach in his single-pass answer, just that the wildcard logic didn't quite work out for me. I've tweaked his solution as follows. It's probably not perfect either, might have some unforeseen edge cases, but does the job for my present needs:

int is_match(char* line, char* pattern)
{
  int wildcard = 0;

  do
  {
    if ((*pattern == *line) || (*pattern == '?'))
    {
      line++;
      pattern++;
    }
    else if (*pattern == '*')
    {
      if (*(++pattern) == '\0')
      {
        return 1;
      }
      wildcard = 1;
    }
    else if (wildcard)
    {
      if (*line == *pattern)
      {
        wildcard = 0;
        line++;
        pattern++;
      }
      else
      {
        line++;
      }
    } 
    else
    {
      return 0;
    }
  } while (*line);

  if (*pattern == '\0')
  {
    return 1;
  }
  else
  {
    return 0;
  }
}

Comments

0

The following will work with an input string containing a single *.

  1. Start comparing one character at a time at the start of your input word.
  2. If you encounter a *, compare both strings from the last character, backwards.
  3. When you have a non-match, return 0
  4. If you encounter the * you are done.
  5. On a non-match, return 0.

(Add.: This algorithm would also work with the * in the very first or last position.)

3 Comments

According to OP, the pattern may contain one or more asterisks. Will this algorithm work for multiple asterisk patterns?
@user4815162342: fair question. Calling this function recursively may work, but that's just a guess. In that case one needs to check forwards only (although in the case of a single asterisk, starting at the end is way faster).
I have thought about doing that but it didn't seem right in my head at the time. But now that I think of it it's pretty much the best way to do it I think. What happens when a word contains more than one asterisks though?
0

You can check if the first two chars are "go" and if the last one is 's'.

To simplest way to do this I see is with a simple condition:

if (strncmp(str, "go", 2) == 0 && str[strlen(str) - 1] == 's')

strncmp returns 0 if the strings match.

You should also ensure you the string is at least 2 chars long, otherwise you'll have segmentation fault as you're comparing 2 chars of your string, you can add in the above condition:

strlen(str) >= 2

Comments

0

I also liked the code but it does not pass some of my unit tests. So here is my improvement. Like Gurce said it might not be perfect but it is improved.

int is_match(const char* line, const char* pattern)
{
    // returns 1 (true) if there is a match
    // returns 0 if the pattern is not whitin the line
    int wildcard = 0;

    const char* last_pattern_start = 0;
    const char* last_line_start = 0;
    do
    {
        if (*pattern == *line)
        {
            if(wildcard == 1)
                last_line_start = line + 1;

            line++;
            pattern++;
            wildcard = 0;
        }
        else if (*pattern == '?')
        {
            if(*(line) == '\0') // the line is ended but char was expected
                return 0;
            if(wildcard == 1)
                last_line_start = line + 1;
            line++;
            pattern++;
            wildcard = 0;
        }
        else if (*pattern == '*')
        {
            if (*(pattern+1) == '\0')
            {
                return 1;
            }

            last_pattern_start = pattern;
            //last_line_start = line + 1;
            wildcard = 1;

            pattern++;
        }
        else if (wildcard)
        {
            if (*line == *pattern)
            {
                wildcard = 0;
                line++;
                pattern++;
                last_line_start = line + 1 ;
            }
            else
            {
                line++;
            }
        }
        else
        {
            if ((*pattern) == '\0' && (*line) == '\0')  // end of mask
                return 1; // if the line also ends here then the pattern match
            else
            {
                if (last_pattern_start != 0) // try to restart the mask on the rest
                {
                    pattern = last_pattern_start;
                    line = last_line_start;
                    last_line_start = 0;
                }
                else
                {
                    return 0;
                }
            }
        }

    } while (*line);



    if (*pattern == '\0')

    {
        return 1;
    }
    else
    {
        return 0;
    }
}

Here is the test cases it does however pass:

TEST_CASE("is_match") {

    REQUIRE( is_match("abcdefg", "abcdefg")           == 1);
    REQUIRE( is_match("abcdefg", "aacdefg")           == 0);
    REQUIRE( is_match("abcdefg", "*cde*")             == 1);
    REQUIRE( is_match("abcdefg", "cde*")              == 0);
    REQUIRE( is_match("abcdefg", "*cde")              == 0);
    REQUIRE( is_match("abcdfcdefg", "**cde")          == 0);
    REQUIRE( is_match("abcdfcdefg", "**cde*")         == 1);
    REQUIRE( is_match("abcdexxxcdefg", "*cdef*cdef")  == 0);
    REQUIRE( is_match("abcxxdexxxcdefg", "*xxx*?g")   == 1);
    REQUIRE( is_match("abcdexxxcdefg", "*cdef*cdef*") == 0);
    REQUIRE( is_match("abcdefg", "*cdf*cdef*")        == 0);
    REQUIRE( is_match("abcdefg", "*")                 == 1);
    REQUIRE( is_match("", "*")                        == 1);
    REQUIRE( is_match("d", "?")                       == 1);
    CHECK  ( is_match("ddd", "*?")                    == 1);
    CHECK  ( is_match("ddd", "?*")                    == 1);
    CHECK  ( is_match("ddd", "???")                   == 1);
    CHECK  ( is_match("ddd", "*?*")                   == 1);
    CHECK  ( is_match("", "?")                        == 0);
    REQUIRE( is_match("abcdefg", "*?cde*")            == 1);
    REQUIRE( is_match("abcdefg", "??cde??")           == 1);
    REQUIRE( is_match("abcdefg", "??cde?")            == 0);
    REQUIRE( is_match("abcdefg", "ppppp")             == 0);
    REQUIRE( is_match("abcdefg", "??c??")             == 0);
    REQUIRE( is_match("abcdefxxxcdefg", "*cdef*cdef?") == 1);
    REQUIRE( is_match("abcdefxxxcdefg", "*cdef*cdef*") == 1);
    REQUIRE( is_match("abcdefxxxcdefg", "*cdef*cdef") == 0);
}

Comments

0

Had the same question a while ago, and came up with this. Does a single pass of the main string for efficiency without using recursive functions. If you've stumbled across this thread, hopefully you find this helpful.

#include <stdio.h>

_Bool wildcard_strcmp(char *line, char *pattern)
{
    _Bool wildcard = 0;
    char *placeholder;

    do
    {
        if ((*pattern == *line) || (*pattern == '?'))
        {
            line++;
            pattern++;
        }
        else if (*pattern == '*')
        {
            if (*(++pattern) == '\0')
                return 1;
            else
            {
                placeholder = pattern;
                wildcard = 1;
            }
        }
        else if (wildcard)
        {
            if (pattern == placeholder)
                line++;
            else
                pattern = placeholder;
        } 
        else
            return 0;
    } while (*line);

    if (*pattern == '\0')
        return 1;
    else
        return 0;
}

int main()
{
    char string[200] = "foobarfoobar";
    char pattern[200] = "fo?*barfoo*";

    if (wildcard_strcmp(string, pattern))
        printf("Match\n");

    else
        printf("No Match\n");

    return 0;
}

2 Comments

I've tried to give this approach a go, but not quite sure the purpose of the 'placeholder' var. My compiler warns it gets used without being given a value. So I thought to perhaps set it to NULL. Still, the wildcard-checking logic in relation to this 'placeholder' var doesn't seem quite right to me.
Thank you @Gurce! Forgot one line of code, placeholder = pattern if a '*' character is found

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.