I'm having a bit of trouble getting my pattern to validate the string entry correctly. The PHP portion of this assignment is working correctly, so I've left it out to keep this easier to read. Can someone tell me why this pattern isn't matching what I intend?

This pattern has these validation requirements:

  1. Starts with 3-6 lowercase letters
  2. Immediately followed by either a hyphen or a space
  3. Followed by 1-3 digits

    $codecheck = '/^([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?)\d{1,3}$/';
    

Currently this catches most of the requirements, but it only seems to enforce the minimum lengths: it doesn't return false when more than 6 letters or more than 3 digits are entered.

Thanks in advance for any assistance!

  • Instead of the "or" pipe, use [\s-] to look for a - or a space.
    – kainaw
    Commented Mar 22, 2016 at 19:45
  • You don't need to duplicate the [[:lower:]] part: /^[[:lower:]]{3,6}[\s-]\d{1,3}$/ Commented Mar 22, 2016 at 19:55
  • I don't think the design-patterns tag is appropriate here. You should maybe use regex instead.
    – TerraPass
    Commented Mar 22, 2016 at 19:55
  • @AbraCadaver, please write it as an answer and not a comment.
    – ndnenkov
    Commented Mar 22, 2016 at 20:11

2 Answers

The problem here lies in how you group the alternatives. Right now, the regex matches a string that

  • ^([[:lower:]]{3,6}-) - starts with 3-6 lowercase letters followed by a hyphen
  • | - or
  • ([[:lower:]]{3,6} ?)\d{1,3}$ - ends with 3-6 lowercase letters followed by an optional space and then 1-3 digits.

In fact, you can get rid of the alternation altogether:

$codecheck = '/^\p{Ll}{3,6}[- ]\d{1,3}$/';

See the regex demo

Explanation:

  • ^ - start of string
  • \p{Ll}{3,6} - 3-6 lowercase letters
  • [- ] - a positive character class matching one character, either a hyphen or a space
  • \d{1,3} - 1-3 digits
  • $ - end of string
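
As a quick sanity check, here is a minimal sketch that runs the pattern above against a few inputs. The helper name `isValidCode` and the sample strings are my own assumptions for illustration, not part of the original assignment.

```php
<?php
// Pattern from the answer above: 3-6 lowercase letters, one hyphen or space, 1-3 digits.
$codecheck = '/^\p{Ll}{3,6}[- ]\d{1,3}$/';

// Hypothetical helper, named only for this illustration.
function isValidCode(string $input, string $pattern): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $input) === 1;
}

// Sample inputs (assumed): the first two satisfy the rules, the rest break one rule each.
$samples = ['abc-1', 'abcdef 123', 'ab-1', 'abcdefg-1', 'abc-1234', 'abc123', 'ABC-1'];

foreach ($samples as $sample) {
    printf("%-14s %s\n", "'$sample'", isValidCode($sample, $codecheck) ? 'valid' : 'invalid');
}
```

One thing to keep in mind: without the `D` modifier, `$` also matches just before a trailing newline, so if the input may end in "\n" you may want `\z` or `/D` instead.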

  • @Shaw: Just to clarify that your regex can work, too, once the alternatives are placed into a group: ^(?:([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?))\d{1,3}$. However, it is not efficient. Commented Mar 22, 2016 at 20:21
  • Thanks for pointing that out. As I'm just learning this, it's good to know I was on the right track, even if I'm not yet seeing the whole picture with regard to efficiency.
    – Shaw
    Commented Mar 23, 2016 at 0:49
  • When using alternation like this, backtracking increases the number of steps needed to return a match or to detect a non-matching string. See the green box showing the number of steps with your regex and mine. Although the step count is not a direct indicator of performance, when it doubles as in this case, it hints at a performance difference between the two patterns. Commented Mar 23, 2016 at 7:27

You need to delimit the scope of the | operator in the middle of your regex.

As it is now:

  • the right-hand operand of that OR runs all the way to the end of your regex, including the $. So neither the digits nor the end-of-string condition applies to the left side of the |.

  • the left-hand operand of the OR starts with ^, so the start-of-string anchor applies only to the left side.

That is why you get a match when you supply 7 lowercase characters. The first character is ignored, and the rest matches the right side of the regex pattern.
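
To see this concretely, here is a small sketch that runs the original pattern from the question against a few inputs that should be rejected. The sample strings are my own assumptions; printing the matched substring shows which branch of the | let each one through.

```php
<?php
// The original pattern from the question, unchanged.
$original = '/^([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?)\d{1,3}$/';

// Each input violates the stated requirements, yet the pattern still matches.
$inputs = [
    'abcdefg 12', // 7 letters: the unanchored right branch matches "bcdefg 12"
    'abc-1234',   // 4 digits: the left branch matches "abc-" and needs nothing more
    'abc-',       // no digits at all: the left branch alone is enough
    'abc123',     // no separator: the space in the right branch is optional
];

foreach ($inputs as $input) {
    // $m[0] holds the substring that actually matched.
    $matched = preg_match($original, $input, $m) === 1;
    printf("%-14s %s\n", "'$input'", $matched ? "matches via '{$m[0]}'" : 'no match');
}
```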

  • Thanks for breaking it down!
    – Shaw
    Commented Mar 23, 2016 at 0:41
