
I have a list of strings and regexes, and I want to check which of them match an input string.
Let's say I have this list:

$list = [ // an array of strings/regexes that I want to check
  "lorem ipsum", // a phrase
  "example", // another word
  "/(nulla)/", // a regex
];

And the string:

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

And so, I want it to check like this:

if( $matched_string >= 1 ){ // check if at least 1 string matched or something...
 // do something...
 // output matched string: "lorem ipsum", "nulla"
}else{
 // nothing matched
}

How can I do something like that?

3 Answers

Try the following:

<?php
$input_string = "assasins: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$list = [ // an array of strings/regexes to check
"ass", // not matched: the \b word boundaries prevent a match inside "assasins"
"Lorem ipsum", // a words
"consectetur", // another word
"/(nu[a-z]{2}a)/", // a regex
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex = '(?:' . substr($line, 1, -1) . ')';
    else
        $regex = '\\b' . preg_quote($line, $delimiter='/') . '\\b';
    $regex_list[] = $regex;
}
$regex = '/' . implode('|', $regex_list) . '/';
echo "$regex\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as &$match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/\bass\b|\bLorem ipsum\b|\bconsectetur\b|(?:(nu[a-z]{2}a))/
Array
(
    [0] => Array
        (
            [0] => Lorem ipsum
        )

    [1] => Array
        (
            [0] => consectetur
        )

    [2] => Array
        (
            [0] => nulla
            [1] => nulla
        )

)
Matched strings: "Lorem ipsum","consectetur","nulla"

Discussion and Limitations

In processing each element of $list, a string that begins and ends with '/' is assumed to be a regular expression, and the '/' characters are removed from its start and end. Anything that does not begin and end with these characters is treated as a plain string. This implies that if the OP wanted to match a plain string that just happens to begin and end with '/', e.g. '/./', they would have to write it as a regular expression instead: '/\/\.\//'. A plain string is replaced by the result of calling preg_quote on it, which escapes the characters that have special meaning in regular expressions and so converts it into a regex without the opening and closing '/' delimiters (the first example additionally wraps it in \b word boundaries). Finally, all the strings are joined with the regular expression alternation ("or") character, '|', and the result is wrapped in '/' characters to create a single regular expression from the input.
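The classification step described above can be sketched as a small helper. The function name `to_fragment` and the length guard are assumptions added for illustration; they are not part of the answer's code.

```php
<?php
// Hypothetical helper: converts one $list entry into a fragment that can be
// joined with '|' and wrapped in '/' delimiters.
function to_fragment(string $entry): string
{
    // Assumption: a regex entry starts and ends with '/' and has no trailing modifiers.
    $isRegex = strlen($entry) >= 2
        && $entry[0] === '/'
        && substr($entry, -1) === '/';

    if ($isRegex) {
        // Strip the delimiters and isolate the pattern in a non-capturing group.
        return '(?:' . substr($entry, 1, -1) . ')';
    }

    // Plain string: escape metacharacters (and the '/' delimiter), then
    // require a word boundary on both sides.
    return '\\b' . preg_quote($entry, '/') . '\\b';
}

$fragments = array_map('to_fragment', ['Lorem ipsum', 'a.b', '/(nu[a-z]{2}a)/']);
$regex = '/' . implode('|', $fragments) . '/';
echo $regex, "\n";
```

The plain entry 'a.b' ends up with its dot escaped, so it only matches a literal dot, while the regex entry is passed through unchanged apart from its delimiters.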

The main limitation is that this does not automatically adjust backreference numbers if multiple regular expressions in the input list have capture groups, since the group numbering is affected when the regular expressions are combined. Therefore each such regex pattern must take into account the capture groups of the patterns that precede it and adjust its backreferences accordingly (see demo below).

Regex flags (i.e. pattern modifiers) must be embedded within the regex itself. Since flags set in one regex string of $list will affect the processing of the regex strings that follow it, flags that do not apply to a subsequent regex must be explicitly turned off:

<?php
$input_string = "This is an example by Booboo.";

$list = [ // an array of strings/regexes to check
"/(?i)booboo/", // case insensitive
"/(?-i)EXAMPLE/" // case insensitivity explicitly turned off
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex_list[] = substr($line, 1, -1);
    else
        $regex_list[] = preg_quote($line, $delimiter='/');
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as &$match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/(?i)booboo|(?-i)EXAMPLE/
Array
(
    [0] => Array
        (
            [0] => Booboo
        )

)
Matched strings: "Booboo"

This shows how to correctly handle backreferences by manually adjusting the group numbers:

<?php
$input_string = "This is the 22nd example by Booboo.";

$list = [ // an array list of string/regex that i want to check
"/([0-9])\\1/", // two consecutive identical digits
"/(?i)([a-z])\\2/" // two consecutive identical alphas
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex_list[] = substr($line, 1, -1);
    else
        $regex_list[] = preg_quote($line, $delimiter='/');
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as &$match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/([0-9])\1|(?i)([a-z])\2/
Array
(
    [0] => Array
        (
            [0] => 22
            [1] => 2
        )

    [1] => Array
        (
            [0] => oo
            [1] =>
            [2] => o
        )

    [2] => Array
        (
            [0] => oo
            [1] =>
            [2] => o
        )

)
Matched strings: "22","oo","oo"
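
If one combined regex is not a requirement, a possible alternative is to apply each entry on its own, so that capture-group numbers and inline flags stay local to their pattern. This is only a sketch: the function name `find_all_matches` is hypothetical, it keeps the same '/' delimiter convention as above, and it returns matches grouped by pattern rather than in the order they occur in the input.

```php
<?php
// Sketch: run every entry separately instead of joining them with '|'.
function find_all_matches(array $list, string $input): array
{
    $found = [];
    foreach ($list as $entry) {
        // Assumption: a regex entry starts and ends with '/' (no trailing modifiers).
        $isRegex = strlen($entry) >= 2
            && $entry[0] === '/'
            && substr($entry, -1) === '/';
        $pattern = $isRegex ? $entry : '/' . preg_quote($entry, '/') . '/';

        if (preg_match_all($pattern, $input, $matches) > 0) {
            // $matches[0] holds the full text of every match for this pattern.
            $found = array_merge($found, $matches[0]);
        }
    }
    return $found;
}

$list = [
    '/([0-9])\\1/',     // each regex numbers its own groups from 1
    '/(?i)([a-z])\\1/', // so \1 is correct here as well
];
print_r(find_all_matches($list, 'This is the 22nd example by Booboo.'));
```

The trade-off is one pass over the input per entry instead of a single pass.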
  • I do not recommend this answer because it makes the mistake of implementing preg_quote() without declaring a slash as the second function parameter.
    – mickmackusa
    Commented Nov 28, 2022 at 0:57
  • @mickmackusa You make a good point and I have updated my answer accordingly.
    – Booboo
    Commented Nov 28, 2022 at 3:09
  • This answer may not be reliable if pattern delimiters other than a forward slash are used. This answer may not be reliable if pattern modifiers are added after the ending pattern delimiter.
    – mickmackusa
    Commented Nov 28, 2022 at 3:27
  • @mickmackusa See revised Limitations section on how regex pattern modifiers are to be handled.
    – Booboo
    Commented Nov 28, 2022 at 12:16
  • It is not necessary to declare $match as "modifiable by reference" inside of foreach(), you are not modifying it. To comply with PSR-12 guidelines, curly braces should be used with if and else. I avoid using and in PHP to prevent unintended "precedence" bugs -- not that I suspect a problem here.
    – mickmackusa
    Commented Nov 28, 2022 at 19:58

I'm not sure if this approach would work for your case, but you could treat them all like regexes.

$list = [ // an array list of string/regex that i want to check
  "lorem ipsum", // a words
  "Donec mattis",
  "example", // another word
  "/(nulla)/", // a regex
  "/lorem/i"
];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$is_regex = '/^\/.*\/[igm]*$/';
$list_matches = [];
foreach($list as $str){
    // create a regex from the string if it isn't already
    $patt = (preg_match($is_regex, $str))? $str: "/$str/";
    $item_matches = [];
    preg_match($patt, $input_string, $item_matches);
    if(!empty($item_matches)){
        // only add to the list if matches
        $list_matches[$str] = $item_matches;
    }
}
if(empty($list_matches)){
    echo 'No matches from the list found';
}else{
    var_export($list_matches);
}

The above will output the following:

array (
  'Donec mattis' => 
  array (
    0 => 'Donec mattis',
  ),
  '/(nulla)/' => 
  array (
    0 => 'nulla',
    1 => 'nulla',
  ),
  '/lorem/i' => 
  array (
    0 => 'Lorem',
  ),
)

Sandbox
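
One caveat when a plain string is wrapped as "/$str/" without escaping: any regex metacharacters it contains keep their special meaning. The following sketch uses a made-up input and needle to show the difference that preg_quote() makes.

```php
<?php
// Hypothetical data, chosen so that the needle contains a metacharacter.
$input  = 'Total: 1+1 = 2 (approx.)';
$needle = '1+1';

// Wrapped as-is, '+' is a quantifier: the pattern means "one or more 1s, then a 1".
$unquoted = preg_match("/$needle/", $input, $m1);

// Escaped first, the pattern matches the literal text "1+1".
$quoted = preg_match('/' . preg_quote($needle, '/') . '/', $input, $m2);

var_dump($unquoted, $quoted, $m2[0] ?? null);
```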

  • I do not recommend this answer because it does not implement preg_quote().
    – mickmackusa
    Commented Nov 28, 2022 at 0:56
  • empty() is not necessary when a variable is unconditionally declared -- !$list_matches will do.
    – mickmackusa
    Commented Nov 28, 2022 at 20:06

Typically, I scream bloody murder if someone dares to stink up their code with error suppressors. If your input data is so out-of-your-control that you are allowing a mix of regex and non-regex input strings, then I guess you'll probably condone @ in your code as well.

Validate the search string to be regex or not as demonstrated here. If it is not a valid regex, then wrap it in delimiters and call preg_quote() to form a valid regex pattern before matching it against the actual haystack string.

Code: (Demo)

$list = [ // an array of strings/regexes to check
  "lorem ipsum", // a phrase
  "example", // another word
  "/(nulla)/", // a valid regex
  "/[,.]/", // a valid regex
  "^dolor^", // a valid regex (^ is used as the delimiter)
  "/path/to/dir/", // not a valid regex
  "[integer]i", // a valid regex: the brackets are delimiters, not a character class
];

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, /path/to/dir/ nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$result = [];
foreach($list as $v) {
    if (@preg_match($v, '') === false) {
        // not a regex, make into one
        $v = '/' . preg_quote($v, '/') . '/';
    }
    preg_match($v, $input_string, $m);
    $result[$v] = $m[0] ?? null;
}
var_export($result);

Or you could write the same thing this way, but I don't know if there is any performance penalty in checking the pattern against a non-empty string: (Demo)

$result = [];
foreach($list as $v) {
    if (@preg_match($v, $input_string, $m) === false) {
        preg_match('/' . preg_quote($v, '/') . '/', $input_string, $m);
    }
    $result[$v] = $m[0] ?? null;
}
var_export($result);
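
If every occurrence is wanted rather than only the first, the same validity probe can be combined with preg_match_all(). This is a sketch under that assumption; the function name `collect_all` and the sample input are made up for illustration.

```php
<?php
// Sketch: collect all matches per search string instead of only the first one.
function collect_all(array $list, string $input): array
{
    $result = [];
    foreach ($list as $v) {
        // An invalid pattern makes preg_match() return false (warning suppressed by @).
        $pattern = @preg_match($v, '') === false
            ? '/' . preg_quote($v, '/') . '/'
            : $v;
        preg_match_all($pattern, $input, $m);
        $result[$v] = $m[0]; // every full match, or an empty array
    }
    return $result;
}

var_export(collect_all(['lorem', '/[,.]/', '^dolor^'], 'Lorem ipsum dolor, lorem dolor.'));
```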
  • The OP wanted all matched strings so what if a given regex matched multiple occurrences in the input? So I think you want to be using preg_match_all.
    – Booboo
    Commented Nov 28, 2022 at 11:04
  • There is a lack of specificity in the problem definition, so it's not unreasonable to assume that the OP consistently uses '/' as the regex delimiters and therefore anything else that does not begin and end with these characters must be a plain string. This implies that if the OP wanted to match a plain string that just happens to begin and end with '/', e.g. '/./', they would have to do it instead as a regular expression: '/\\/\\.\\//'. Furthermore, this implies that you will erroneously consider '|.|' to be a regex because of the way you are testing for a regex.
    – Booboo
    Commented Nov 28, 2022 at 11:57
  • I would not consider |.| to be erroneously considered regex -- it is valid regex and can logically be treated as such within the scope of this question. For an input that may or may not be a regex pattern, it would be a flaw in the application if it did not respect a valid pattern. If the input does not give the result that the user/developer wanted, then the onus is on them to craft a better search string.
    – mickmackusa
    Commented Nov 28, 2022 at 20:02
