(Ruby) parsing a string with RegEx

Question

This is the string that I want to parse: 2 Sep 27 Sep 28 SOME TEXT HERE 35.00

I want to parse it into a list so that the values look like:

list[0] = 'Sep 28'
list[1] = 'SOME TEXT HERE'
list[2] = '35.00'

The RegEx that I've been working on:

^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)

My values are:

list[0] = 'Sep 28'
list[1] = 'HERE'
list[2] = '35.00'

The list[1] value is off. I'm also probably not parsing the spaces right, but I couldn't find any guidance in the "Pickaxe" book or online.

the Tin Man · Accepted Answer · 2012-08-27 06:07:52Z

Your problem is in your second capture group:

([a-zA-Z0-9]*\s{1})+

The parenthesized group is repeated, matching each of the words 'SOME', 'TEXT', and 'HERE' individually, leaving your second capture group with only the final match, 'HERE'.
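
You can see this for yourself in irb. The snippet below is only a minimal sketch: it runs the pattern from the question against the sample string, and the variable names are made up for illustration.

```ruby
line = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'

# The pattern from the question: the + sits outside the second capture group,
# so the group is overwritten on every repetition.
original = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)/

repeated_captures = line.match(original).captures
p repeated_captures
# => ["Sep 28", "HERE ", "35.00"]
```

Only the last repetition survives in the second capture, and it carries the trailing space that the group itself matched.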

You need to put the + inside the capturing parentheses, and use non-capturing parentheses (?:...) to enclose your existing group. Non-capturing parentheses start with (?: and end with ). They let you group parts of a pattern together without capturing what the group matches. You can apply a repetition operator (+, *, {n}, or {n,m}) to a non-capturing group and then capture the entire repeated expression:

((?:[a-zA-Z0-9]*\s{1})+)

In total:

/^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/
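
As a rough check, here is the corrected pattern run against the sample string. This is a sketch, not the only way to do it. The strip call is my own addition, because the second group also captures the space after the last word.

```ruby
line = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'

# Same pattern, but the repetition now happens inside a non-capturing group
# (?:...) that is wrapped by a single capturing group.
fixed = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/

# The raw second capture is "SOME TEXT HERE " (with a trailing space),
# so strip each capture before using it.
list = line.match(fixed).captures.map(&:strip)
p list
# => ["Sep 28", "SOME TEXT HERE", "35.00"]
```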

As a side note, this is a pretty clunky regex. You never need to specify {1}, since a single match is the default. Similarly, \d\d is one character shorter than \d{2}. You probably want \w instead of [a-zA-Z0-9], keeping in mind that \w also matches the underscore. Since you don't seem to care about case, you can use the /i option and simplify the letter character classes. You should also escape the dot in the amount (\.), because an unescaped . matches any character. Something like this is a more idiomatic regular expression:

/^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i
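
Here is one way the shorter pattern might be used, as a sketch under a few assumptions: the names date, description and amount are hypothetical, the captures are stripped to drop the trailing space in the text group, and the dot in the amount is written as \. so that only a literal decimal point matches.

```ruby
line = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'

# Shorter pattern: literal spaces, \w for word characters, /i for case.
pattern = /^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i

# match returns nil when the line does not fit the pattern, so guard for it.
if (match = pattern.match(line))
  date, description, amount = match.captures.map(&:strip)
end

p [date, description, amount]
# => ["Sep 28", "SOME TEXT HERE", "35.00"]
```

If the line does not match, all three variables stay nil, so check for that before using them.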

Finally, though the Ruby documentation for regular expressions is a little thin, Ruby's regular expressions are largely compatible with Perl's, and you can find more general information about regular expressions at regular-expressions.info.

The code works perfectly, but can you please explain what this means? " and use non-capturing parentheses (?:...) to enclose your existing group:" — tsurantino, Commented Aug 26, 2012 at 17:51

Matt · Answer · 2012-08-26 19:39:03Z


You may have already tried this tool, but I would highly recommend Rubular. It lets you test a Ruby regular expression against sample strings and shows the match groups as you type.

It looks like you already got a specific answer to your question, so I'm adding this for other people who come by, so they know where to test their regex or just practice.


This would have been more appropriate as a comment, instead of an answer. – the Tin Man, Commented Aug 27, 2012 at 6:08

Indeed it would have, now that I think about it. Still learning the ropes of Stack Overflow. – Matt, Commented Sep 9, 2012 at 15:14
