Subsecciones

Llamadas a subreglas desmemoriadas

By default, every subrule call saves its result into the result-hash, either under its own name, or under an alias.

However, sometimes you may want to refactor some literal part of a rule into one or more subrules, without having those submatches added to the result-hash. The syntax for calling a subrule, but ignoring its return value is:

    <.SUBRULE>

(which is stolen directly from Perl 6).

For example, you may prefer to rewrite a rule such as:

    <rule: paren_pair> 

        \( 
            (?: <escape> | <paren_pair> | <brace_pair> | [^()] )*
        \)

without any literal matching, like so:

    <rule: paren_pair> 

        <.left_paren>
            (?: <escape> | <paren_pair> | <brace_pair> | <.non_paren> )*
        <.right_paren>
    
    <token: left_paren>   \(
    <token: right_paren>  \)
    <token: non_paren>    [^()]

Moreover, as the individual components inside the parentheses probably aren't being captured for any useful purpose either, you could further optimize that to:

    <rule: paren_pair> 

        <.left_paren>
            (?: <.escape> | <.paren_pair> | <.brace_pair> | <.non_paren> )*
        <.right_paren>

Note that you can also use the dot modifier on an aliased subpattern:

    <.Alias= (SUBPATTERN) >

This seemingly contradictory behaviour (of giving a subpattern a name, then deliberately ignoring that name) actually does make sense in one situation. Providing the alias makes the subpattern visible to the debugger, while using the dot stops it from affecting the result-hash. See Debugging non-grammars for an example of this usage.

Ejemplo: Números entre comas

Por ejemplo, queremos reconocer listas de números separados por comas. Supongamos también que queremos darle un nombre a la expresión regular de separación. Quizá, aunque no es el caso, porque la expresión regular de separación sea suficientemente compleja. Si no usamos la notación punto la coma aparecerá en la estructura:

pl@nereida:~/Lregexpgrammars/demo$ cat -n numberscomma.pl
     1  use strict;
     2  use warnings;
     3  use 5.010;
     4  use Data::Dumper;
     5  $Data::Dumper::Indent = 1;
     6
     7  my $rbb = do {
     8      use Regexp::Grammars;
     9
    10      qr{
    11        <numbers>
    12
    13        <objrule: numbers>
    14          <[number]> (<comma> <[number]>)*
    15
    16        <objtoken: number> \s*\d+
    17        <token: comma>  \s*,
    18      }xms;
    19  };
    20
    21  while (my $input = <>) {
    22      if ($input =~ m{$rbb}) {
    23          say("matches: <$&>");
    24          say Dumper \%/;
    25      }
    26  }
En efecto, aparece la clave comma:
pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 numberscomma.pl
2, 3, 4
matches: <2, 3, 4>
$VAR1 = {
  '' => '2, 3, 4',
  'numbers' => bless( {
    '' => '2, 3, 4',
    'number' => [
      bless( { '' => '2' }, 'number' ),
      bless( { '' => '3' }, 'number' ),
      bless( { '' => '4' }, 'number' )
    ],
    'comma' => ','
  }, 'numbers' )
};
Si cambiamos la llamada a la regla <comma> por <.comma>

pl@nereida:~/Lregexpgrammars/demo$ diff numberscomma.pl numberscomma2.pl
14c14
<         <[number]> (<comma> <[number]>)*
---
>         <[number]> (<.comma> <[number]>)*
eliminamos la aparición de la innecesaria clave:

pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 numberscomma2.pl
2, 3, 4
matches: <2, 3, 4>
$VAR1 = {
  '' => '2, 3, 4',
  'numbers' => bless( {
    '' => '2, 3, 4',
    'number' => [
      bless( { '' => '2' }, 'number' ),
      bless( { '' => '3' }, 'number' ),
      bless( { '' => '4' }, 'number' )
    ]
  }, 'numbers' )
};

Casiano Rodríguez León
2009-12-09