
I would like to validate whether a user input string is in the correct form before further processing / a database update.

Form:

elephant1:elephant2:elephant3;cat1:cat2:cat3;unicorn1:unicorn2:unicorn3

: is the separator between siblings, and ; is the separator between groups of siblings.

Rules: There are ALWAYS 3 siblings per group. Since this is meant just for personal bulk import, I just want to avoid mistakes with very long strings. There can be one or more groups, so the group separator is not obligatory. Sibling names are letters only, with the exception of underscore (_) in place of spaces when a name has two or more words.

I was thinking of regex, but I am not very familiar with it. If there are any other, simpler ways to achieve this, please suggest them.

Valid examples

N groups, separated by semicolons, each containing exactly three (3) members separated by colons. As mentioned before, names are letters only, with the exception of underscore standing in for a space in names with multiple words.

VALIDS:

john:mike:dave;jenny:helen:jessica

dog:cat:frog;car:boat:ship;house:flat:shack

meat:vegetable:fruit

UPDATE:

This is what I came up with while trying to understand your answers; it works fine so far:

"/^(([a-z]+:[a-z]+:[a-z]+;?)+)$/"

Upgraded to Roman's answer

"/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i"

This lets the function ignore leading and trailing spaces and tabs, and allows underscores where items have multiple words.

  • You've shown a form but no rules. What if there is only 1 group, then no ;. What if elephant1 has no siblings, then no : in that group, etc... What is valid and what is not? Commented Feb 23, 2017 at 22:40
  • if user input string is in correct form - what are the rules of "correct" form? Commented Feb 23, 2017 at 22:40
  • Please see edit
    – Biker John
    Commented Feb 23, 2017 at 22:46
  • And rules for the sibling names? Numbers and letters, periods, dashes? Commented Feb 23, 2017 at 22:58
  • @AbraCadaver sibling names are simple, letters only, exception is underscore (_) for space, when there are two words for a name.
    – Biker John
    Commented Feb 23, 2017 at 23:04

2 Answers


A solution using the preg_match function with a specific regex pattern:

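// Sample input: three groups of three siblings each
$str = 'og:cat:frog;car:boat:ship;house:flat:shack';

// The pattern is unanchored, so it succeeds if a valid group appears anywhere in $str.
// The i flag makes the match case-insensitive.
if (preg_match("/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i", $str)) {
    echo 'valid';
} else {
    echo 'invalid';
}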
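
Because the pattern above has no anchors, preg_match succeeds as soon as one well-formed group appears anywhere in the input. The following is a minimal sketch of that difference, not part of the original answer: the anchored pattern and the helper names matchesAnywhere and matchesWhole are assumptions made for illustration.

```php
<?php
// Pattern from the answer above, unchanged: no ^ or $ anchors.
const UNANCHORED = '/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i';

// Assumed variant for illustration: same groups, anchored at both ends.
// \A and \z match only at the very start and very end of the string.
const ANCHORED = '/\A(?:[a-z_]+:[a-z_]+:[a-z_]+;?)+\z/i';

// Hypothetical helper: true if a valid group occurs anywhere in the input.
function matchesAnywhere(string $input): bool
{
    return preg_match(UNANCHORED, $input) === 1;
}

// Hypothetical helper: true only if the whole trimmed input consists of groups.
function matchesWhole(string $input): bool
{
    // trim() keeps the tolerance for stray spaces and tabs around the input.
    return preg_match(ANCHORED, trim($input)) === 1;
}

$inputs = [
    'dog:cat:frog;car:boat:ship', // well-formed
    'dog:cat:frog;car:boat',      // second group has only two siblings
    '123 dog:cat:frog !!!',       // junk around one valid group
];

foreach ($inputs as $input) {
    printf(
        "%-28s anywhere=%s whole=%s\n",
        $input,
        var_export(matchesAnywhere($input), true),
        var_export(matchesWhole($input), true)
    );
}
```

Note that even the anchored variant still accepts a trailing ; and, because ;? is optional, does not force a ; between groups. The lookahead-based pattern in the other answer addresses the trailing ; case.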
  • Sibling names inside groups can be anything, i just used root names to visualise it better :)
    – Biker John
    Commented Feb 23, 2017 at 23:07
  • it would be good if you presented actual valid and invalid examples Commented Feb 23, 2017 at 23:09
  • great! this is what i came up with in the meanwhile... can you please tell me the differences between yours and mine so i would understand better? "/^(([a-z]+:[a-z]+:[a-z]+;?)+)$/"
    – Biker John
    Commented Feb 24, 2017 at 16:43
  • @BikerJohn, hi, as for the difference: this version /^(([a-z]+:[a-z]+:[a-z]+;?)+)$/ allows only sequences of 3 siblings from the start to the end of the initial string. It won't allow any whitespace characters (or anything other than a letter) at the string boundaries. For example, it won't allow <tab> og:cat:frog;car:boat:ship;<space>. Also, your version won't allow input such as og:cat:big_frog;car:small_boat:ship;house:flat:shack (with underscores). Commented Feb 24, 2017 at 20:51

^(?:[a-zA-Z_]+:[a-zA-Z_]+:[a-zA-Z_]+(?:;(?!$)|$))+$ (demo, with multiline flag on)

^               # Anchors to beginning of string
(?:             # Opens non-capturing group
  [a-zA-Z_]+    # One or more letters or underscores
  :             # Literal :
  [a-zA-Z_]+    # One or more letters or underscores
  :             # Literal :
  [a-zA-Z_]+    # One or more letters or underscores
  (?:           # Opens non-capturing group
    ;           # Literal ;
    (?!$)       # Negative Lookahead, ensuring that semi-colons are not at the end of line
  |             # Or
    $           # End of string
  )             # Closes non-capturing group
)+              # Repeats overall non-capturing-group one or more times
$               # Anchors to end of string

You didn't specify whether sibling names can be 0 characters long; if they can, change each [a-zA-Z_]+ to [a-zA-Z_]*

// PHP Code generated by Regex101.
$re = '/^(?:[a-zA-Z_]+:[a-zA-Z_]+:[a-zA-Z_]+(?:;(?!$)|$))+$/m';
$str = 'a_b:bread:stack_overflow;test:this_thing:jane;Get_me:h:down
 ab:bread:stack_overflow;test:this_thing:jane;Get_me:h:down
a_b:any other characters break it:stack_overflow;test:this_thing:jane;Get_me:h:down
a_b:bread:format_messed_up-test:this_thing:jane;Get_me:h:down
a_b:bread:stack_overflow;test:this_thing:jane;semi_colon_at_end;';

preg_match_all($re, $str, $matches);

// Print the entire match result
print_r($matches);
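
Since the question also asks about simpler alternatives to one large regex, here is a minimal sketch that splits the string with explode and checks each piece. It is not from either answer: the function name isValidImport is hypothetical, and it assumes names must be non-empty and that surrounding whitespace is not allowed.

```php
<?php
// Hypothetical helper: validates the import string by splitting it
// instead of matching it with a single large pattern.
function isValidImport(string $input): bool
{
    // Groups are separated by ';'. A stray or trailing ';' yields an
    // empty piece, which fails the sibling count check below.
    foreach (explode(';', $input) as $group) {
        $siblings = explode(':', $group);

        // Each group must contain exactly three siblings.
        if (count($siblings) !== 3) {
            return false;
        }

        foreach ($siblings as $name) {
            // Letters and underscores only, at least one character.
            if (preg_match('/\A[a-zA-Z_]+\z/', $name) !== 1) {
                return false;
            }
        }
    }

    return true;
}

var_dump(isValidImport('john:mike:dave;jenny:helen:jessica'));
var_dump(isValidImport('meat:vegetable:fruit;'));
```

Splitting first makes it easy to report which group or name is wrong, at the cost of a few more lines than the single pattern.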
  • great explanation!
    – Biker John
    Commented Feb 24, 2017 at 16:44
  • @BikerJohn Thanks. I just wanted to let you know that without a lookahead (preferably a negative lookahead) after the ;, the other regex permits the string to end with ;, which could conceivably cause you some issues. Also, if you want to show code in comments, wrap the code in backticks (the grave accent character on the key next to 1).
    – Regular Jo
    Commented Feb 24, 2017 at 22:23
