Writing a Grammar

This section begins with a discussion of how to divide work between a lexical analyzer and a parser. We then consider several transformations that could be applied to get a grammar more suitable for parsing. One technique can eliminate ambiguity in the grammar, and other techniques, left-recursion elimination and left factoring, are useful for rewriting grammars so they become suitable for top-down parsing.

Lexical Versus Syntactic Analysis

We typically use regular expressions to construct lexical analyzers while using grammars to construct parsers. In fact, everything that can be described by a regular expression can also be described by a grammar. We may ask, why use regular expressions to define the lexical syntax. There are several reasons:

The lexical rules of a language are frequently quite simple, and to describe them we do not need a notation as powerful as grammars.
Regular expressions generally provide a more concise and easier-to-understand notation for tokens than grammars.

Eliminating Ambiguity

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. As an example, we shall eliminate the ambiguity from the following "dangling-else" grammar:

is an example showing Grammar (1) is ambiguous since it has two parse trees

ambiguity-1 ambiguity-2

In all programming languages with conditional statements of this form, the first parse tree is preferred. The general rule is, Match each with the closest unmatched . This disambiguating rule can be incorporated directly into a grammar by using the following observations.

A statement appearing between a and a must be matched ( pairs).
Thus statements must split into kinds: and .
The unambiguous grammar for statements can be described as

Elimination of Left Recursion

A grammar is left recursive if it has a nonterminal such that there is a derivation for some string . Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion.

A simple left recursion

The left-recursive production could be replaced by non-left-recursive productions without changing the strings derivable from .

Exercise

Consider the following grammar for arithmetic expressions.

Eliminating the immediate left recursion (production of the form ) to the production for E and then for T.

Solution

are replaced by

The general case

Here, we shall deduce the general form for this formula. Immediate left recursion can be eliminated by the following technique, which works for any number of -productions. First, group the productions as

where no begins with an . Then replace the -productions by

This eliminates all immediate left recursion, but it doesn't eliminate left recursion involving derivations of two ore more steps (not immediate). For example

Eliminating the immediate left recursion among these -productions yields the following grammar.

Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing. When the choice between two alternative -productions is not clear, we may be able to rewrite the productions to defer the decision until enough of the input has been seen that we can make the right choice.

For example, if we have the two productions

(1)

on seeing the input token , we cannot immediately tell which production to choose to expand . In general, suppose a grammar where represents all alternatives that do not begin with , and the input begin with a nonempty string derived from , we do not know which production to use.

However, we may defer the decision by changing the grammar into

Here is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.

Left factoring conditional statement

The grammar (1) becomes

Writing a Grammar #

Lexical Versus Syntactic Analysis #

Eliminating Ambiguity #

Elimination of Left Recursion #

A simple left recursion #

Exercise #

The general case #

Left Factoring #

Left factoring conditional statement #