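Collections and Regular Expressions: grep

The grep method selects the elements of a list that match a filter. The script below prints every line of the given files that contains a match for the regular expression passed as first argument; since grep requires the pattern to match the whole element, the expression is wrapped in .* on both sides: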

generaciondecodigos@nereida:~/src/groovy/strings$ cat grep.groovy
regexp = ~ ".*${args[0]}.*"

args[1 .. -1].each { filename ->

  file = new File(filename)
  list = file.readLines()

  select = list.grep(regexp)
  select.each {
    println it
  }
}
A sample run follows:
generaciondecodigos@nereida:~/src/groovy/strings$ groovy grep.groovy '\d+' c2f.groovy  grep.groovy
print "Enter a temperature (i.e. 32F, 100C): ";
      def (num, type) = m[0][1..2]
        farenheit = (celsius * 9/5)+32
      celsius = (farenheit -32)*5/9;
      printf "%.2f C = %.2f F\n", celsius, farenheit;
    print "Enter a temperature (i.e. 32F, 100C): ";
regexp = ~ ".*${args[0]}.*"
args[1 .. -1].each { filename ->
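
The .* wrapper in the script matters because grep with a pattern keeps only the elements that the pattern matches in full. A minimal sketch of that behaviour, assuming a hypothetical in-memory list in place of file.readLines():

```groovy
// hypothetical sample lines standing in for file.readLines()
def lines = ['def x = 42', 'println x', 'x += 1']

// a bare pattern has to match the whole line, so nothing is selected
assert lines.grep(~/\d+/) == []

// wrapped in .* the digits may appear anywhere in the line
assert lines.grep(~/.*\d+.*/) == ['def x = 42', 'x += 1']

// findAll with the find operator =~ is an alternative that needs no wrapper
assert lines.findAll { it =~ /\d+/ } == ['def x = 42', 'x += 1']
```

The findAll variant is probably the safer choice when the user-supplied expression contains anchors such as ^ or $, since those keep their usual meaning there.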

Collections and Regular Expressions, by Ted Naleid

The following text is also taken from Groovy: Don’t Fear the RegExp, by Ted Naleid:

Groovy also makes significant additions to what you can do with Collections. In addition to each, collect, inject, etc, there is a regular expression aware iterator called grep that will pass each item in the Collection through a filter and return a subset of items that match the filter. We can use a regular expression as a filter:

// regular expression says 0 or more characters (".*") followed by the string "bar" that is at the end of the string ("$")
assert ["foobar", "bazbar"] == ["foobar", "bazbar", "barquux"].grep(~/.*bar$/)

You can achieve the same thing with findAll but it takes a little more typing:

assert ["foobar", "bazbar"] == ["foobar", "bazbar", "barquux"].findAll { it ==~ /.*bar$/ }
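
As a complement, grep is not limited to patterns: the filter can be any object that knows how to test an item, such as a closure or a range. The sketch below uses sample values that are not part of the original text:

```groovy
def words = ['foobar', 'bazbar', 'barquux']

// grep with a pattern gives the same result as asking the pattern about each item
assert words.grep(~/.*bar$/) == words.findAll { (~/.*bar$/).isCase(it) }

// a closure works as a filter too
assert words.grep { it.size() > 6 } == ['barquux']

// and so does a range
assert [3, 12, 7, 40].grep(5..20) == [12, 7]
```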

Working with Matchers

As we’ve seen, using the =~ operator will return a Matcher object. Many of the existing regular expression examples on the web work by treating the Matcher as a list and getting the first (zero-based) element out of the list:

def matcher = "foobazaarquux" =~ "o(b.*r)q"
assert ["obazaarq", "bazaar"] == matcher[0]
assert "bazaar" == matcher[0][1] // get the first grouping of the first match

This is a little fragile, as matcher[0] throws an IndexOutOfBoundsException if there was no match. Calling matches() doesn’t help, as matches() only checks whether the regular expression matches the WHOLE string:

("foobazaarquux" =~ "o(b.*r)q").matches()  // returns false!
("foobazaarquux" =~ ".*(b.*r).*").matches()  // returns true, ".*" matches 0 or more chars of any type
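
The difference between a partial and a whole-string match can be checked directly. A small sketch, reusing the string from the example above, that contrasts find() with matches() and the ==~ operator:

```groovy
def text = 'foobazaarquux'

// find(): is there a match anywhere in the string?
assert (text =~ /o(b.*r)q/).find()

// matches() and the ==~ operator: does the pattern cover the whole string?
assert !(text =~ /o(b.*r)q/).matches()
assert !(text ==~ /o(b.*r)q/)
assert text ==~ /.*o(b.*r)q.*/
```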

You can check getCount() to see how many matches there were for some safety:

def m = "foobar" =~ /quux/
if (m.getCount()) {
        // example won't get here as "quux" doesn't exist in "foobar", the count is 0
        println m[0]
}
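
A few other ways to guard against a missing match, sketched here with sample strings; the variable names miss, hit and firstGroup are only illustrative:

```groovy
def miss = 'foobar' =~ /quux/
assert miss.count == 0
assert !miss                       // Groovy truth on a Matcher runs find()

// String.find returns the first match, or null when there is none
assert 'foobar'.find(/quux/) == null
assert 'foobar'.find(/o+/) == 'oo'

// index into the Matcher only after checking the count
def hit = 'foobazaarquux' =~ /o(b.*r)q/
def firstGroup = hit.count ? hit[0][1] : null
assert firstGroup == 'bazaar'
```

Note that Groovy truth on a Matcher advances it with find(), so testing the same Matcher object twice may give different answers; checking count avoids that surprise.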

A groovier way to work with Matchers leverages collection iterators and the built-in closures that Groovy provides to them. Matcher supports the iterator() method and with that, gets everything else that any Groovy List or Collection would have, including collect, inject, findAll, etc.

def paragraph = """
    Lorem ipsum dolor 12:30 AM sit amet, 
    consectetuer adipiscing 1:15 AM elit. 
    Nunc rutrum diam sagittis nisi 9:22 PM.
"""
 
def HOUR = /10|11|12|[0-9]/
def MINUTE = /[0-5][0-9]/
def AM_PM = /AM|PM/
def time = /($HOUR):($MINUTE) ($AM_PM)/
 
// the pattern has capture groups, so recent Groovy versions hand each match to the
// iterator-based methods as a list [fullMatch, hour, minute, amPm]; it[0] is the full match
assert ["12:30 AM", "1:15 AM", "9:22 PM"] == (paragraph =~ time).collect { it[0] }
 
assert ["12:30 AM", "1:15 AM"] == (paragraph =~ time).collect { it[0] }.grep(~/.*AM$/)

Older Groovy releases passed only the full matched string ("12:30 AM") to the iterator-based methods, with no access to the individual groups (hour, minute, am/pm). The each method is convenient in either case: given a closure with several parameters, it passes the full match as well as each of the individual groups into the closure.

("foo1 bar30 foo27 baz9 foo600" =~ /foo(\d+)/).each { match, digit -> println "+$digit" }
 
// result:
// +1
// +27
// +600
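
To gather the captured groups instead of printing them, the closure can append to a list. A minimal sketch on the same input; the list name numbers is an illustrative choice:

```groovy
def numbers = []

// each match arrives as (full match, first group)
('foo1 bar30 foo27 baz9 foo600' =~ /foo(\d+)/).each { match, digit ->
    numbers << digit.toInteger()
}

assert numbers == [1, 27, 600]
assert numbers.sum() == 628
```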

Another example (using the paragraph and time Matcher from above) showing how to pretty print all of the timestamps:

(paragraph =~ time).each {match, hour, minute, amPm -> 
    println "$hour:$minute ${amPm == 'AM' ? 'this morning' : 'this evening' }"
}
 
// result: 
// 12:30 this morning
// 1:15 this morning
// 9:22 this evening
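
The groups can also feed a computation. The sketch below is self-contained, so it repeats the time pattern in a single line and uses a shortened sample text; it converts each timestamp to minutes since midnight, under the usual convention that 12 AM is hour 0:

```groovy
// hypothetical sample text and a one-line version of the time pattern
def text = 'Lorem 12:30 AM ipsum 1:15 AM dolor 9:22 PM'
def time = /(10|11|12|[0-9]):([0-5][0-9]) (AM|PM)/

def minutesSinceMidnight = []
(text =~ time).each { match, hour, minute, amPm ->
    def h = hour.toInteger() % 12      // 12 AM becomes hour 0, 12 PM becomes hour 12 below
    if (amPm == 'PM') h += 12
    minutesSinceMidnight << (h * 60 + minute.toInteger())
}

assert minutesSinceMidnight == [30, 75, 1282]
```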

Regular expressions are a powerful tool that Groovy makes as accessible as any other top-tier scripting language. Using techniques to break more complicated regular expressions into their component pieces can make them much more readable (as in the time example above).

If you’re doing any sort of string processing beyond a simple contains or split, regular expressions in Groovy can turn mountains of Java into a couple of lines of code.



Casiano Rodríguez León
2010-04-30