Subsecciones

Casando con las claves de un hash

In some situations a grammar may need a rule that matches dozens, hundreds, or even thousands of one-word alternatives. For example, when matching command names, or valid userids, or English words. In such cases it is often impractical (and always inefficient) to list all the alternatives between | alterators:

    <rule: shell_cmd>
        a2p | ac | apply | ar | automake | awk | ...
        # ...and 400 lines later
        ... | zdiff | zgrep | zip | zmore | zsh

    <rule: valid_word>
        a | aa | aal | aalii | aam | aardvark | aardwolf | aba | ...
        # ...and 40,000 lines later... 
        ... | zymotize | zymotoxic | zymurgy | zythem | zythum

To simplify such cases, Regexp::Grammars provides a special construct that allows you to specify all the alternatives as the keys of a normal hash. The syntax for that construct is simply to put the hash name inside angle brackets (with no space between the angles and the hash name).

Which means that the rules in the previous example could also be written:

    <rule: shell_cmd>
        <%cmds>

    <rule: valid_word>
        <%dict>

provided that the two hashes (%cmds and %dict) are visible in the scope where the grammar is created.

Internally, the construct is converted to something equivalent to:

    <rule: shell_cmd>
        (<.hk>)  <require: exists $cmds{$CAPTURE}>

    <rule: valid_word>
        (<.hk>)  <require: exists $dict{$CAPTURE}>

The special <hk> rule is created automatically, and defaults to \S+, but you can also define it explicitly to handle other kinds of keys. For example:

    <rule: hk>
        .+            # Key may be any number of chars on a single line

    <rule: hk>
        [ACGT]{10,}   # Key is a base sequence of at least 10 pairs

Matching a hash key in this way is typically significantly faster than matching a full set of alternations. Specifically, it is O(length of longest potential key), instead of O(number of keys).

Ejemplo de uso de la directiva hash

Sigue un ejemplo:

pl@nereida:~/Lregexpgrammars/demo$ cat -n hash.pl
 1  #!/usr/bin/env perl5.10.1
 2  use strict;
 3  use warnings;
 4  use 5.010;
 5  use Data::Dumper;
 6  $Data::Dumper::Deparse = 1;
 7
 8  my %cmd = map { ($_ => undef ) } qw( uname pwd date );
 9
10  my $rbb = do {
11      use Regexp::Grammars;
12
13      qr{
14        ^<command>$
15
16        <rule: command>
17          <cmd=%cmd> (?: <[arg]> )*
18
19        <token: arg> [^\s<>`&]+
20      }xms;
21  };
22
23  while (my $input = <>) {
24      chomp($input);
25      if ($input =~ m{$rbb}) {
26          say("matches: <$&>");
27          say Dumper \%/;
28          system $/{''}
29      }
30      else {
31          say("does not match");
32      }
33  }

Sigue un ejemplo de ejecución:

pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 hash.pl
a2p f1 f2
matches: <a2p f1 f2>
$VAR1 = {
          '' => 'a2p f1 f2',
          'command' => {
                         '' => 'a2p f1 f2',
                         'cmd' => 'a2p',
                         'arg' => [
                                    'f1',
                                    'f2'
                                  ]
                       }
        };

pocho 2 5
does not match

Casiano Rodríguez León
2009-12-09