Saturday, April 25, 2015

How to handle x*, x+, or x? regex-like operators in an LR parser?


I have implemented recursive descent and PEG-like parsers in the past, where you could do things like this:

Path -> Segment+
Segment -> Slash Name
Segment -> /
Name -> /\w+/
Slash -> /

  • where Segment+ means "match one or more Segment"
  • and there's a plain old regular expression for matching one or more word characters with \w+
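For comparison, here is a minimal sketch of how the grammar above maps onto a PEG-style recursive descent parser in Python. It is an illustration rather than code from any particular library, and the function names (`parse_path`, `parse_segment`, `parse_slash`, `parse_name`) and the tuple-based tree shape are hypothetical. The point is that `Segment+` becomes an ordinary loop and the `Name` rule becomes a regex match at the current position.

```python
import re

# Hypothetical PEG-style recursive descent parser for the grammar above.
# Each parse_* function takes (text, pos) and returns (node, new_pos),
# or None if it fails to match at pos.

NAME_RE = re.compile(r"\w+")

def parse_name(text, pos):
    # Name -> /\w+/
    m = NAME_RE.match(text, pos)
    if m is None:
        return None
    return ("Name", m.group()), m.end()

def parse_slash(text, pos):
    # Slash -> /
    if text.startswith("/", pos):
        return ("Slash", "/"), pos + 1
    return None

def parse_segment(text, pos):
    # Ordered choice: try "Slash Name" first, then fall back to a bare "/".
    slash = parse_slash(text, pos)
    if slash is not None:
        name = parse_name(text, slash[1])
        if name is not None:
            return ("Segment", slash[0], name[0]), name[1]
    if text.startswith("/", pos):
        return ("Segment", "/"), pos + 1
    return None

def parse_path(text, pos=0):
    # Path -> Segment+ : greedily match Segments in a loop, requiring at least one.
    segments = []
    while True:
        result = parse_segment(text, pos)
        if result is None:
            break
        node, pos = result
        segments.append(node)
    if not segments:
        return None
    return ("Path", segments), pos
```

For example, `parse_path("/usr/bin")` yields a `Path` with two `Segment` nodes and stops at position 8, while `parse_path("usr")` returns `None` because no segment can start there.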

How do you typically accomplish the same thing with LR grammars/parsers? All the LR parser examples I have seen are very basic, such as parsing 1 + 2 * 3 or (())(), and none of them seem to involve "one or more" repetition (or "zero or more" with *, or "optional" with ?). How is that generally done in an LR parser?
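A common approach with yacc/Bison-style LR parser generators is to rewrite each repetition operator into plain BNF rules that use a fresh nonterminal. The sketch below is a hypothetical helper that performs this rewrite; the dictionary-based grammar format and the `_plus`/`_star`/`_opt` naming scheme are assumptions made up for this example.

```python
# Hypothetical helper: rewrite EBNF-style suffixes (X+, X*, X?) into plain BNF
# rules that an LR parser generator can accept. Assumed grammar format:
# {nonterminal: [alternative, ...]}, where each alternative is a list of symbols.
# Assumes no existing nonterminal already uses a generated name like "X_plus".

SUFFIXES = {"+": "_plus", "*": "_star", "?": "_opt"}

def desugar(grammar):
    out = {}

    def helper_for(symbol):
        base, op = symbol[:-1], symbol[-1]
        name = base + SUFFIXES[op]
        if name not in out:
            if op == "+":
                # X+  ==>  X_plus -> X_plus X | X      (left-recursive)
                out[name] = [[name, base], [base]]
            elif op == "*":
                # X*  ==>  X_star -> X_star X | (empty)
                out[name] = [[name, base], []]
            else:
                # X?  ==>  X_opt -> X | (empty)
                out[name] = [[base], []]
        return name

    def rewrite(symbol):
        # Single-character symbols such as "/" are left alone.
        if len(symbol) > 1 and symbol[-1] in SUFFIXES:
            return helper_for(symbol)
        return symbol

    for lhs, alternatives in grammar.items():
        out[lhs] = [[rewrite(s) for s in alt] for alt in alternatives]
    return out
```

Applied to `{"Path": [["Segment+"]], "Segment": [["Slash", "Name"], ["/"]]}`, this produces `Path -> Segment_plus` and `Segment_plus -> Segment_plus Segment | Segment`, leaving `Segment` unchanged. Left recursion is the conventional choice for LR parsers: each new element can be reduced as soon as it is read, whereas a right-recursive list pushes every element onto the parse stack before any reduction happens.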

Or does LR parsing require a lexing phase first (i.e., does an LR parser require its input to already be a stream of terminal "tokens")? I'm hoping there is a way to do LR parsing without two phases like that.
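As a rough sketch, LR parsing does not strictly need a separate lexer: the terminals can simply be individual characters (an approach often called scannerless parsing). Below, the Path grammar is rewritten at the character level, with `\w+` expanded into a left-recursive `Name` rule, and driven by a small SLR(1) table. The table was built by hand for this example (a generator such as yacc or Bison would normally compute it), and treating `str.isalnum()` plus `_` as a word character is only an approximation of `\w`. The trade-off is that decisions a lexer usually makes, such as longest match, have to be resolved by the grammar and its one-character lookahead.

```python
# Character-level grammar (production index: lhs -> rhs):
PRODUCTIONS = [
    ("Path'", ["Path"]),        # 0 augmented start
    ("Path", ["Segs"]),         # 1
    ("Segs", ["Segs", "Seg"]),  # 2  Segment+ as left recursion
    ("Segs", ["Seg"]),          # 3
    ("Seg", ["/", "Name"]),     # 4
    ("Seg", ["/"]),             # 5
    ("Name", ["Name", "w"]),    # 6  \w+ as left recursion
    ("Name", ["w"]),            # 7
]

# Hand-built SLR(1) tables: ("s", state) = shift, ("r", prod) = reduce.
ACTION = {
    0: {"/": ("s", 4)},
    1: {"$": ("acc",)},
    2: {"/": ("s", 4), "$": ("r", 1)},
    3: {"/": ("r", 3), "$": ("r", 3)},
    4: {"w": ("s", 6), "/": ("r", 5), "$": ("r", 5)},
    5: {"/": ("r", 2), "$": ("r", 2)},
    6: {"w": ("r", 7), "/": ("r", 7), "$": ("r", 7)},
    7: {"w": ("s", 8), "/": ("r", 4), "$": ("r", 4)},
    8: {"w": ("r", 6), "/": ("r", 6), "$": ("r", 6)},
}
GOTO = {0: {"Path": 1, "Segs": 2, "Seg": 3}, 2: {"Seg": 5}, 4: {"Name": 7}}

def terminal(ch):
    # Map each input character directly to a grammar terminal (no lexer).
    if ch == "/":
        return "/"
    if ch.isalnum() or ch == "_":
        return "w"
    raise SyntaxError(f"unexpected character {ch!r}")

def parse(text):
    tokens = [(terminal(ch), ch) for ch in text] + [("$", None)]
    states, values, i = [0], [], 0
    while True:
        kind, ch = tokens[i]
        action = ACTION[states[-1]].get(kind)
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at position {i}")
        if action[0] == "s":
            # Shift: push the new state and the raw character.
            states.append(action[1])
            values.append(ch)
            i += 1
        elif action[0] == "r":
            # Reduce: pop len(rhs) entries and push a (lhs, children) node.
            lhs, rhs = PRODUCTIONS[action[1]]
            n = len(rhs)
            children = values[-n:]
            del states[-n:]
            del values[-n:]
            values.append((lhs, children))
            states.append(GOTO[states[-1]][lhs])
        else:
            return values[-1]

def text_of(node):
    # Reassemble the characters covered by a parse-tree node.
    if isinstance(node, str):
        return node
    return "".join(text_of(c) for c in node[1])

def names(node):
    # Collect the text of each outermost Name node, left to right.
    if isinstance(node, str):
        return []
    if node[0] == "Name":
        return [text_of(node)]
    return [n for c in node[1] for n in names(c)]
```

With these tables, `names(parse("/usr/bin"))` gives `["usr", "bin"]`, and `parse("usr")` raises `SyntaxError` because a path must start with `/`.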

