
The Parse::CLex Class

Closely related to Parse::Lex is the class Parse::CLex, which advances by consuming the string being analyzed with the substitution operator (s///). Lexers produced with this second class do not allow anchors in their regular expressions, and they have no access to the Parse::Token subclasses. Here is the same example, this time using the Parse::CLex class:
> cat -n ctokenizer.pl
 1  #!/usr/local/bin/perl -w
 2
 3  require 5.000;
 4  BEGIN {  unshift @INC, "../lib"; }
 5  use Parse::CLex;
 6
 7  @token = (
 8            qw(
 9               ADDOP    [-+]
10               LEFTP    [\(]
11               RIGHTP   [\)]
12               INTEGER  [1-9][0-9]*
13               NEWLINE  \n
14              ),
15            qw(STRING),   [qw(" (?:[^"]+|"")* ")],
16            qw(ERROR  .*), sub {
17              die qq!can\'t analyze: "$_[1]"!;
18            }
19           );
20
21  Parse::CLex->trace;
22  $lexer = Parse::CLex->new(@token);
23
24  $lexer->from(\*DATA);
25  print "Tokenization of DATA:\n";
26
27  TOKEN:while (1) {
28    $token = $lexer->next;
29    if (not $lexer->eoi) {
30      print "Record number: ", $lexer->line, "\n";
31      print "Type: ", $token->name, "\t";
32      print "Content:->", $token->getText, "<-\n";
33    } else {
34      last TOKEN;
35    }
36  }
37
38  __END__
39  1+2-5
40  "This is a multiline
41  string with an embedded "" in it"
42  this is an invalid string with a "" in it"
43
44
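
The STRING rule on line 15 of the listing is written as three pieces: an opening quote, a body, and a closing quote. The sketch below is plain Perl and does not use Parse::CLex. It assumes the three pieces behave like one concatenated pattern, which is how the trace output prints the rule. The helper name leading_string and the variables $good and $bad are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Assumption: the three array elements (opening quote, body, closing quote)
# act like one concatenated pattern, as the trace output suggests.
my ($open, $body, $close) = ('"', '(?:[^"]+|"")*', '"');
my $string_re = qr/$open$body$close/;

# leading_string($text): hypothetical helper; returns the STRING token found
# at the very start of $text, or undef if the text does not begin with one.
sub leading_string {
    my ($text) = @_;
    return $text =~ /^($string_re)/ ? $1 : undef;
}

# The two string-like inputs from the DATA section of the listing.
my $good = qq{"This is a multiline\nstring with an embedded "" in it"\n};
my $bad  = qq{this is an invalid string with a "" in it"\n};

for my $text ($good, $bad) {
    my $tok = leading_string($text);
    print defined $tok ? "STRING: ->$tok<-\n" : "no STRING at start of input\n";
}
```

Since [^"] also matches a newline, the body can run across line boundaries, and a doubled "" is accepted as an embedded quote. The second input does not start with a quote, so the STRING rule cannot apply to it and only a catch-all rule such as ERROR can.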

> ctokenizer.pl
Trace is ON in class Parse::CLex
Tokenization of DATA:
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 1
Record number: 1
Type: INTEGER   Content:->1<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): +
Record number: 1
Type: ADDOP     Content:->+<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 2
Record number: 1
Type: INTEGER   Content:->2<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): -
Record number: 1
Type: ADDOP     Content:->-<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 5
Record number: 1
Type: INTEGER   Content:->5<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):

Record number: 1
Type: NEWLINE   Content:->
<-
[main::lexer|Parse::CLex] Token read (STRING, \"(?:[^\"]+|\"\")*\"): "This is a multiline
string with an embedded "" in it"
Record number: 3
Type: STRING    Content:->"This is a multiline
string with an embedded "" in it"<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):

Record number: 3
Type: NEWLINE   Content:->
<-
[main::lexer|Parse::CLex] Token read (ERROR, .*): this is an invalid string with a "" in it"
can't analyze: "this is an invalid string with a "" in it"" at ctokenizer.pl line 17, <DATA> line 4.
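
The consuming strategy described in the opening paragraph can be sketched in plain Perl. This is not the Parse::CLex implementation, only a minimal illustration of tokenizing by repeatedly deleting a matched prefix with s///. The rule table @rules, the function next_token and the sample input are assumptions made for the example, and whitespace skipping is left out.

```perl
use strict;
use warnings;

# Hypothetical rule table (name, pattern), tried in order.
my @rules = (
    [ ADDOP   => qr/[-+]/        ],
    [ LEFTP   => qr/\(/          ],
    [ RIGHTP  => qr/\)/          ],
    [ INTEGER => qr/[1-9][0-9]*/ ],
    [ NEWLINE => qr/\n/          ],
);

# next_token(\$buffer): delete one token from the front of the buffer and
# return (name, text); return an empty list when no rule matches.
sub next_token {
    my ($buf) = @_;
    for my $rule (@rules) {
        my ($name, $re) = @$rule;
        # The sketch supplies its own ^ anchor; s/// removes the matched prefix.
        if ($$buf =~ s/^($re)//) {
            return ($name, $1);
        }
    }
    return;
}

my $input = "1+2-5\n";
my @tokens;
while (length $input) {
    my ($name, $text) = next_token(\$input)
        or die qq{can't analyze: "$input"};
    push @tokens, [$name, $text];
}
printf "%-8s ->%s<-\n", @$_ for @tokens;
```

In this sketch every pattern is pinned to the front of the remaining text by the tokenizer itself, and the buffer shrinks after each token. A lexer built this way has little use for anchors inside the user's patterns, which may be why Parse::CLex does not allow them.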


Casiano Rodríguez León
2006-02-21