ImpParser: Lexing and Parsing in Coq

The development of the Imp language in Imp.v completely ignores issues of concrete syntax -- how an ascii string that a programmer might write gets translated into abstract syntax trees defined by the datatypes aexp, bexp, and com. In this chapter, we illustrate how the rest of the story can be filled in by building a simple lexical analyzer and parser using Coq's functional programming facilities.

It is not important to understand all the details here (and accordingly, the explanations are fairly terse and there are no exercises). The main point is simply to demonstrate that it can be done. You are invited to look through the code -- most of it is not very complicated, though the parser relies on some monadic programming idioms that may require a little work to make out -- but most readers will probably want to just skim down to the Examples section at the very end to get the punchline.

Internals

Lexical Analysis

Parsing

Options With Errors

An option type with error messages:

Some syntactic sugar to make writing nested match-expressions on optionE more convenient.

Generic Combinators for Building Parsers

A (step-indexed) parser that expects zero or more ps:

A parser that expects a given token, followed by p:

A parser that expects a particular token:

A Recursive-Descent Parser for Imp

Identifiers:

Numbers:

Parse arithmetic expressions

Parsing boolean expressions:

Parsing commands:

Examples