This is the string that I want to parse: 2 Sep 27 Sep 28 SOME TEXT HERE 35.00

I want to parse it into a list so that the values look like:

list[0] = 'Sep 28'
list[1] = 'SOME TEXT HERE'
list[2] = '35.00'

The RegEx that I've been working on:

^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)

My values are:

list[0] = 'Sep 28'
list[1] = 'HERE'
list[2] = '35.00' 

The list[1] value is off. I'm also probably not parsing the spaces right, but I couldn't find any guidance in the "Pickaxe" book or online.

2 Answers

4

Your problem is in your second capture group:

([a-zA-Z0-9]*\s{1})+

The parenthesized group is repeated, matching each of the words 'SOME', 'TEXT', and 'HERE' individually, leaving your second capture group with only the final match, 'HERE'.

You need to put the + inside the capturing parentheses, and use non-capturing parentheses (?:...) to enclose your existing group. Non-capturing parentheses start with (?: and end with ); they group part of a pattern without saving what it matched. You can therefore apply a repetition operator (+, *, {n}, or {n,m}) to a non-capturing group and then wrap the whole repeated expression in ordinary capturing parentheses:

((?:[a-zA-Z0-9]*\s{1})+)
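To make the difference concrete, here is a small sketch that runs both versions of the group against just the tail of the sample string. The variable names (text, last_only, whole_run) are made up for illustration, and \s is written without the redundant {1}.

```ruby
# Tail of the sample line from the question (illustrative input).
text = "SOME TEXT HERE 35.00"

# Repeated capturing group: every repetition overwrites the capture,
# so only the last word (plus its trailing space) is kept.
repeated = text.match(/([a-zA-Z0-9]*\s)+/)
last_only = repeated[1]

# Non-capturing group repeated inside one capturing group:
# the capture now spans all of the repetitions.
wrapped = text.match(/((?:[a-zA-Z0-9]*\s)+)/)
whole_run = wrapped[1]

puts last_only.inspect  # "HERE "
puts whole_run.inspect  # "SOME TEXT HERE "
```

Note that both captures keep the trailing space, so the value may still need a call to strip afterwards.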

In total (with the decimal point escaped as \., because an unescaped . matches any character):

/^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+\.\d+)/

As a side note, this is a pretty clunky regex. You never need to specify {1}, since matching once is the default. Similarly, \d\d is one character shorter than \d{2}. You probably also want \w instead of [a-zA-Z0-9]; be aware that \w matches the underscore as well. Since you don't seem to care about case, you can use the /i option and simplify the letter character classes. Finally, the decimal point should be written \., because an unescaped . matches any character. Something like this is a more idiomatic regular expression:

/^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i
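As a rough sketch of how the simplified pattern could be used to build the list from the question: the constant LINE_PATTERN and the helper parse_line are hypothetical names, the decimal point is escaped on the assumption that a literal dot is intended, and strip is applied because the second group keeps the space that follows the last word.

```ruby
# Simplified pattern with the decimal point escaped (assumed intent).
LINE_PATTERN = /^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i

# Hypothetical helper: returns the three captured fields as an array
# of strings, or nil when the line does not match the pattern.
def parse_line(line)
  match = LINE_PATTERN.match(line)
  return nil unless match

  # The second capture ends with a space, so strip every field.
  match.captures.map(&:strip)
end

fields = parse_line("2 Sep 27 Sep 28 SOME TEXT HERE 35.00")
p fields  # ["Sep 28", "SOME TEXT HERE", "35.00"]
```

MatchData#captures returns only the groups, without the full match, which lines up with the list[0] through list[2] layout asked for.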

Finally, though the Ruby documentation for regular expressions is a little thin, Ruby's regular expression syntax is largely Perl-like, and you can find more information about regular expressions in general at regular-expressions.info.

  • The code works perfectly, but can you please explain what this means? " and use non-capturing parentheses (?:...) to enclose your existing group:"
    – tsurantino
    Commented Aug 26, 2012 at 17:51
1

You may have already tried this tool, but I would highly recommend Rubular. It lets you test a Ruby regular expression against sample strings interactively and see what each group captures.

It looks like you already got the specific answer to your question, so I just wanted to drop this in for other people coming by so they can know where to go test their regex or just practice.

  • This would have been more appropriate as a comment, instead of an answer. Commented Aug 27, 2012 at 6:08
  • Indeed it would have, now that I think about it. Still learning the ropes of Stack Overflow.
    – Matt
    Commented Sep 9, 2012 at 15:14
