In an Inductive type definition, each constructor can take any number of arguments -- none (as with true and O), one (as with S), or more than one (as with nybble, and here):
This declaration can be read: There is just one way to
construct a pair of numbers: by applying the constructor pair to
two arguments of type nat.
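For reference, a declaration matching this description might look like the sketch below. The module name NatProdSketch is only a wrapper we introduce so that the example stands on its own; the argument names n1 and n2 are likewise assumptions.

```coq
Module NatProdSketch.

(* A pair of numbers: the single constructor [pair] takes two [nat]s. *)
Inductive natprod : Type :=
  | pair (n1 n2 : nat).

(* Building one element of the type. *)
Check (pair 3 5) : natprod.

End NatProdSketch.
```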
Here are simple functions for extracting the first and second components of a pair.
Since pairs will be used heavily, it is nice to be able to write them with the standard mathematical notation (x,y) instead of pair x y. We can tell Coq to allow this with a Notation declaration.
The new pair notation can be used both in expressions and in pattern matches.
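The projections and the notation just described can be sketched as follows. The names swap_pair, fst_example, and swap_example, and the enclosing module, are illustrative assumptions; natprod is redeclared so the block can be checked on its own.

```coq
Module PairFunctionsSketch.

Inductive natprod : Type :=
  | pair (n1 n2 : nat).

(* First and second projections, written with the constructor name. *)
Definition fst (p : natprod) : nat :=
  match p with
  | pair x y => x
  end.

Definition snd (p : natprod) : nat :=
  match p with
  | pair x y => y
  end.

(* Allow (x,y) as shorthand for pair x y. *)
Notation "( x , y )" := (pair x y).

(* The notation works in expressions ... *)
Example fst_example : fst (3,5) = 3.
Proof. reflexivity. Qed.

(* ... and in patterns. *)
Definition swap_pair (p : natprod) : natprod :=
  match p with
  | (x,y) => (y,x)
  end.

Example swap_example : swap_pair (3,5) = (5,3).
Proof. reflexivity. Qed.

End PairFunctionsSketch.
```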
Note that pattern-matching on a pair (with parentheses: (x, y)) is not to be confused with the "multiple pattern" syntax (with no parentheses: x, y) that we have seen previously.
The above examples illustrate pattern matching on a pair with elements x and y, whereas minus below (taken from Basics) performs pattern matching on the values n and m.
Fixpoint minus (n m : nat) : nat :=
  match n, m with
  | O   , _    => O
  | S _ , O    => n
  | S n', S m' => minus n' m'
  end.
The distinction is minor, but it is worth knowing that they are not the same. For instance, the following definitions are ill-formed:
(* Can't match on a pair with multiple patterns: *)
Definition bad_fst (p : natprod) : nat :=
  match p with
  | x, y => x
  end.

(* Can't match on multiple values with pair patterns: *)
Definition bad_minus (n m : nat) : nat :=
  match n, m with
  | (O   , _   ) => O
  | (S _ , O   ) => n
  | (S n', S m') => bad_minus n' m'
  end.
Let's try to prove a few simple facts about pairs.
If we state things in a slightly peculiar way, we can complete proofs with just reflexivity (and its built-in simplification):
But reflexivity is not enough if we state the lemma in a more natural way:
We have to expose the structure of p so that simpl can perform the pattern match in fst and snd. We can do this with destruct.
Notice that, unlike its behavior with nats, where it generates two subgoals, destruct generates just one subgoal here. That's because natprods can only be constructed in one way.
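The two ways of stating the fact can be compared in a short sketch. The theorem names surjective_pairing' and surjective_pairing are assumptions, as is the enclosing module; the definitions of natprod, fst, and snd are repeated so the block stands alone.

```coq
Module PairProofsSketch.

Inductive natprod : Type :=
  | pair (n1 n2 : nat).

Definition fst (p : natprod) : nat :=
  match p with | pair x y => x end.
Definition snd (p : natprod) : nat :=
  match p with | pair x y => y end.

Notation "( x , y )" := (pair x y).

(* Stated over an explicit pair, [reflexivity] suffices. *)
Theorem surjective_pairing' : forall (n m : nat),
  (n,m) = (fst (n,m), snd (n,m)).
Proof.
  reflexivity.
Qed.

(* Stated over an arbitrary [p], we must first expose its structure. *)
Theorem surjective_pairing : forall (p : natprod),
  p = (fst p, snd p).
Proof.
  (* [destruct] yields a single subgoal: [p] can only be [pair n m]. *)
  intros p. destruct p as [n m]. simpl. reflexivity.
Qed.

End PairProofsSketch.
```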
Generalizing the definition of pairs, we can describe the
type of lists of numbers like this: A list is either the empty
list or else a pair of a number and another list.
For example, here is a three-element list:
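A sketch of such a declaration and of a three-element list follows. The name mylist and the enclosing module are assumptions made for illustration.

```coq
Module NatListSketch.

(* A list is either empty, or a number paired with another list. *)
Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

(* The list containing 1, 2, and 3, in that order. *)
Definition mylist := cons 1 (cons 2 (cons 3 nil)).

End NatListSketch.
```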
As with pairs, it is more convenient to write lists in familiar programming notation. The following declarations allow us to use :: as an infix cons operator and square brackets as an "outfix" notation for constructing lists.
It is not necessary to understand the details of these declarations, but here is roughly what's going on in case you are interested. The right associativity annotation tells Coq how to parenthesize expressions involving multiple uses of :: so that, for example, the next three declarations mean exactly the same thing:
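One plausible form of these declarations, together with three equivalent spellings of the same list, is sketched here. The names mylist1, mylist2, mylist3, same_12, and same_23 and the enclosing module are assumptions; the notation levels follow the description in the text.

```coq
Module ListNotationSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

(* Infix cons, then the empty list, then the n-ary bracket form. *)
Notation "x :: l" := (cons x l)
                     (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

(* Three spellings of the same list. *)
Definition mylist1 := 1 :: (2 :: (3 :: nil)).
Definition mylist2 := 1 :: 2 :: 3 :: nil.
Definition mylist3 := [1;2;3].

(* Coq accepts these by computation alone, so the terms coincide. *)
Example same_12 : mylist1 = mylist2.
Proof. reflexivity. Qed.
Example same_23 : mylist2 = mylist3.
Proof. reflexivity. Qed.

End ListNotationSketch.
```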
The at level 60 part tells Coq how to parenthesize expressions that involve both :: and some other infix operator. For example, since we defined + as infix notation for the plus function at level 50,
Notation "x + y" := (plus x y)
                    (at level 50, left associativity).
the + operator will bind tighter than ::, so 1 + 2 :: [3] will be parsed, as we'd expect, as (1 + 2) :: [3] rather than 1 + (2 :: [3]).
(Expressions like 1 + 2 :: [3] can be a little confusing when you read them in a .v file. The inner brackets, around 3, indicate a list, but the outer brackets, which are invisible in the HTML rendering, are there to instruct the coqdoc tool that the bracketed part should be displayed as Coq code rather than running text.)
The second and third Notation declarations above introduce the standard square-bracket notation for lists; the right-hand side of the third one illustrates Coq's syntax for declaring n-ary notations and translating them to nested sequences of binary constructors.
A number of functions are useful for manipulating lists. For example, the repeat function takes a number n and a count and returns a list of length count where every element is n.
The length function calculates the length of a list.
The app function concatenates (appends) two lists.
Since app will be used extensively in what follows, it is again convenient to have an infix operator for it.
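The three functions and the infix operator can be sketched as below. The test names and the enclosing module are assumptions, and the natlist declarations are repeated so that the block can be checked independently.

```coq
Module ListFunctionsSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

(* A list of [count] copies of [n]. *)
Fixpoint repeat (n count : nat) : natlist :=
  match count with
  | O => nil
  | S count' => n :: (repeat n count')
  end.

Fixpoint length (l : natlist) : nat :=
  match l with
  | nil => O
  | h :: t => S (length t)
  end.

(* Recursion is on the first list; the second is reused unchanged. *)
Fixpoint app (l1 l2 : natlist) : natlist :=
  match l1 with
  | nil => l2
  | h :: t => h :: (app t l2)
  end.

Notation "x ++ y" := (app x y) (right associativity, at level 60).

Example test_repeat : repeat 7 3 = [7;7;7].
Proof. reflexivity. Qed.
Example test_length : length [1;2;3] = 3.
Proof. reflexivity. Qed.
Example test_app1 : [1;2;3] ++ [4;5] = [1;2;3;4;5].
Proof. reflexivity. Qed.
Example test_app2 : nil ++ [4;5] = [4;5].
Proof. reflexivity. Qed.

End ListFunctionsSketch.
```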
Here are two smaller examples of programming with lists.
The hd function returns the first element (the "head") of the list, while tl returns everything but the first element (the "tail"). Since the empty list has no first element, we must pass a default value to be returned in that case.
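A minimal sketch of hd and tl follows, assuming the argument order (default first, list second) and the test names shown; the enclosing module is only there to keep the example self-contained.

```coq
Module HeadTailSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

(* The head, or [default] when the list is empty. *)
Definition hd (default : nat) (l : natlist) : nat :=
  match l with
  | nil => default
  | h :: t => h
  end.

(* Everything after the head; the tail of [nil] is taken to be [nil]. *)
Definition tl (l : natlist) : natlist :=
  match l with
  | nil => nil
  | h :: t => t
  end.

Example test_hd1 : hd 0 [1;2;3] = 1.
Proof. reflexivity. Qed.
Example test_hd2 : hd 0 [] = 0.
Proof. reflexivity. Qed.
Example test_tl : tl [1;2;3] = [2;3].
Proof. reflexivity. Qed.

End HeadTailSketch.
```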
Complete the definitions of nonzeros, oddmembers, and countoddmembers below. Have a look at the tests to understand what these functions should do.
Complete the definition of alternate, which interleaves two lists into one, alternating between elements taken from the first list and elements from the second. See the tests below for more specific examples.
(Note: one natural and elegant way of writing alternate will fail to satisfy Coq's requirement that all Fixpoint definitions be "obviously terminating." If you find yourself in this rut, look for a slightly more verbose solution that considers elements of both lists at the same time. One possible solution involves defining a new kind of pairs, but this is not the only way.)
A bag (or multiset) is like a set, except that each element can appear multiple times rather than just once. One possible representation for a bag of numbers is as a list.
Complete the following definitions for the functions count, sum, add, and member for bags.
All these proofs can be done just by reflexivity.
Multiset sum is similar to set union: sum a b contains all the elements of a and of b. (Mathematicians usually define union on multisets a little bit differently -- using max instead of sum -- which is why we don't use that name for this operation.) For sum we're giving you a header that does not give explicit names to the arguments. Moreover, it uses the keyword Definition instead of Fixpoint, so even if you had names for the arguments, you wouldn't be able to process them recursively. The point of stating the question this way is to encourage you to think about whether sum can be implemented in another way -- perhaps by using functions that have already been defined.
Here are some more bag functions for you to practice with.
When remove_one is applied to a bag without the number to remove, it should return the same bag unchanged. (This exercise is optional, but students following the advanced track will need to fill in the definition of remove_one for a later exercise.)
Write down an interesting theorem bag_theorem about bags involving the functions count and add, and prove it. Note that, since this problem is somewhat open-ended, it's possible that you may come up with a theorem which is true, but whose proof requires techniques you haven't learned yet. Feel free to ask for help if you get stuck!
As for numbers, simple facts about list-processing functions can sometimes be proved entirely by simplification. For example, the simplification performed by reflexivity is enough for this theorem...
...because the [] is substituted into the "scrutinee" (the expression whose value is being "scrutinized" by the match) in the definition of app, allowing the match itself to be simplified.
Also, as with numbers, it is sometimes helpful to perform case analysis on the possible shapes (empty or non-empty) of an unknown list.
Here, the nil case works because we've chosen to define tl nil = nil. Notice that the as annotation on the destruct tactic here introduces two names, n and l', corresponding to the fact that the cons constructor for lists takes two arguments (the head and tail of the list it is constructing).
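Both kinds of proof, by simplification alone and by case analysis, can be sketched together. The theorem names nil_app and tl_length_pred and the enclosing module are assumptions; the supporting definitions are repeated so the block can be checked on its own.

```coq
Module ListCasesSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.

Fixpoint length (l : natlist) : nat :=
  match l with
  | nil => O
  | h :: t => S (length t)
  end.

Fixpoint app (l1 l2 : natlist) : natlist :=
  match l1 with
  | nil => l2
  | h :: t => h :: (app t l2)
  end.

Notation "x ++ y" := (app x y) (right associativity, at level 60).

Definition tl (l : natlist) : natlist :=
  match l with
  | nil => nil
  | h :: t => t
  end.

(* [app] matches on its first argument, which here is a literal []. *)
Theorem nil_app : forall l : natlist, [] ++ l = l.
Proof. reflexivity. Qed.

(* Case analysis on the shape of [l]; [n] and [l'] name the
   two arguments of [cons] in the second case. *)
Theorem tl_length_pred : forall l : natlist,
  pred (length l) = length (tl l).
Proof.
  intros l. destruct l as [| n l'].
  - (* l = nil *) reflexivity.
  - (* l = cons n l' *) reflexivity.
Qed.

End ListCasesSketch.
```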
Usually, though, interesting theorems about lists require induction for their proofs.
Simply reading proof scripts will not get you very far! It is important to step through the details of each one using Coq and think about what each step achieves. Otherwise it is more or less guaranteed that the exercises will make no sense when you get to them. 'Nuff said.
Proofs by induction over datatypes like natlist are a little less familiar than standard natural number induction, but the idea is equally simple. Each Inductive declaration defines a set of data values that can be built up using the declared constructors: a boolean can be either true or false; a number can be either O or S applied to another number; a list can be either nil or cons applied to a number and a list.
Moreover, applications of the declared constructors to one another are the only possible shapes that elements of an inductively defined set can have, and this fact directly gives rise to a way of reasoning about inductively defined sets: a number is either O or else it is S applied to some smaller number; a list is either nil or else it is cons applied to some number and some smaller list; etc. So, if we have in mind some proposition P that mentions a list l and we want to argue that P holds for all lists, we can reason as follows:
First, show that P is true of l when l is nil.
Then show that P is true of l when l is cons n l' for some number n and some smaller list l', assuming that P is true for l'.
Since larger lists can only be built up from smaller ones, eventually reaching nil, these two arguments together establish the truth of P for all lists l. Here's a concrete example:
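As one concrete instance, associativity of ++ can be proved by induction on the first list. This is a sketch: the theorem name app_assoc and the enclosing module are assumptions, and natlist and app are redeclared for self-containment.

```coq
Module AppAssocSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).

Fixpoint app (l1 l2 : natlist) : natlist :=
  match l1 with
  | nil => l2
  | h :: t => h :: (app t l2)
  end.

Notation "x ++ y" := (app x y) (right associativity, at level 60).

Theorem app_assoc : forall l1 l2 l3 : natlist,
  (l1 ++ l2) ++ l3 = l1 ++ (l2 ++ l3).
Proof.
  intros l1 l2 l3. induction l1 as [| n l1' IHl1'].
  - (* l1 = nil *)
    reflexivity.
  - (* l1 = cons n l1'; IHl1' is the claim for the smaller list l1' *)
    simpl. rewrite -> IHl1'. reflexivity.
Qed.

End AppAssocSketch.
```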
Notice that, as when doing induction on natural numbers, the as... clause provided to the induction tactic gives a name to the induction hypothesis corresponding to the smaller list l1' in the cons case. Once again, this Coq proof is not especially illuminating as a static document -- it is easy to see what's going on if you are reading the proof in an interactive Coq session and you can see the current goal and context at each point, but this state is not visible in the written-down parts of the Coq proof. So a natural-language proof -- one written for human readers -- will need to include more explicit signposts; in particular, it will help the reader stay oriented if we remind them exactly what the induction hypothesis is in the second case.
For comparison, here is an informal proof of the same theorem.
Theorem: For all lists l1, l2, and l3, (l1 ++ l2) ++ l3 = l1 ++ (l2 ++ l3).
Proof: By induction on l1.
First, suppose l1 = []. We must show
    ([] ++ l2) ++ l3 = [] ++ (l2 ++ l3),
which follows directly from the definition of ++.
Next, suppose l1 = n :: l1', with
    (l1' ++ l2) ++ l3 = l1' ++ (l2 ++ l3)
(the induction hypothesis). We must show
    ((n :: l1') ++ l2) ++ l3 = (n :: l1') ++ (l2 ++ l3).
By the definition of ++, this follows from
    n :: ((l1' ++ l2) ++ l3) = n :: (l1' ++ (l2 ++ l3)),
which is immediate from the induction hypothesis.
For a slightly more involved example of inductive proof over lists, suppose we use app to define a list-reversing function rev:
Now, for something a bit more challenging than the proofs we've seen so far, let's prove that reversing a list does not change its length. Our first attempt gets stuck in the successor case...
So let's take the equation relating ++ and length that would have enabled us to make progress and state it as a separate lemma.
Note that, to make the lemma as general as possible, we quantify over all natlists, not just those that result from an application of rev. This should seem natural, because the truth of the goal clearly doesn't depend on the list having been reversed. Moreover, it is easier to prove the more general property.
Now we can complete the original proof.
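The whole development (rev, the separate lemma, and the completed proof) might be sketched as follows. The names app_length and rev_length and the enclosing module are assumptions. We also assume the standard library lemma Nat.add_comm, made available by Require Import Arith, in place of a commutativity lemma proved by hand.

```coq
Require Import Arith.

Module RevLengthSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

Fixpoint length (l : natlist) : nat :=
  match l with
  | nil => O
  | h :: t => S (length t)
  end.

Fixpoint app (l1 l2 : natlist) : natlist :=
  match l1 with
  | nil => l2
  | h :: t => h :: (app t l2)
  end.

Notation "x ++ y" := (app x y) (right associativity, at level 60).

(* Reverse the tail, then append the head at the end. *)
Fixpoint rev (l : natlist) : natlist :=
  match l with
  | nil => nil
  | h :: t => rev t ++ [h]
  end.

(* The general lemma: it holds for all lists, reversed or not. *)
Theorem app_length : forall l1 l2 : natlist,
  length (l1 ++ l2) = length l1 + length l2.
Proof.
  intros l1 l2. induction l1 as [| n l1' IHl1'].
  - reflexivity.
  - simpl. rewrite -> IHl1'. reflexivity.
Qed.

Theorem rev_length : forall l : natlist,
  length (rev l) = length l.
Proof.
  intros l. induction l as [| n l' IHl'].
  - reflexivity.
  - simpl. rewrite -> app_length.
    simpl. rewrite -> IHl'.
    (* Goal: length l' + 1 = S (length l') *)
    rewrite Nat.add_comm. reflexivity.
Qed.

End RevLengthSketch.
```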
For comparison, here are informal proofs of these two theorems:
Theorem: For all lists l1 and l2, length (l1 ++ l2) = length l1 + length l2.
Proof: By induction on l1.
First, suppose l1 = []. We must show
    length ([] ++ l2) = length [] + length l2,
which follows directly from the definitions of length and ++.
Next, suppose l1 = n :: l1', with
    length (l1' ++ l2) = length l1' + length l2.
We must show
    length ((n :: l1') ++ l2) = length (n :: l1') + length l2.
This follows directly from the definitions of length and ++ together with the induction hypothesis.
Theorem: For all lists l, length (rev l) = length l.
Proof: By induction on l.
First, suppose l = []. We must show
    length (rev []) = length [],
which follows directly from the definitions of length and rev.
Next, suppose l = n :: l', with
    length (rev l') = length l'.
We must show
    length (rev (n :: l')) = length (n :: l').
By the definition of rev, this follows from
    length ((rev l') ++ [n]) = S (length l')
which, by the previous lemma, is the same as
    length (rev l') + length [n] = S (length l').
This follows directly from the induction hypothesis and the definition of length.
The style of these proofs is rather longwinded and pedantic. After the first few, we might find it easier to follow proofs that give fewer details (which we can easily work out in our own minds or on scratch paper if necessary) and just highlight the non-obvious steps. In this more compressed style, the above proof might look like this:
Theorem: For all lists l, length (rev l) = length l.
Proof: First, observe that length (l ++ [n]) = S (length l) for any l (this follows by a straightforward induction on l). The main property again follows by induction on l, using the observation together with the induction hypothesis in the case where l = n'::l'.
Which style is preferable in a given situation depends on the sophistication of the expected audience and how similar the proof at hand is to ones that the audience will already be familiar with. The more pedantic style is a good default for our present purposes.
We've seen that proofs can make use of other theorems we've already proved, e.g., using rewrite. But in order to refer to a theorem, we need to know its name! Indeed, it is often hard even to remember what theorems have been proven, much less what they are called.
Coq's Search command is quite helpful with this. Typing Search foo into your .v file and evaluating this line will cause Coq to display a list of all theorems involving foo. For example, try uncommenting the following line to see a list of theorems that we have proved about rev:
Keep Search in mind as you do the following exercises and throughout the rest of the book; it can save you a lot of time!
If you are using ProofGeneral, you can run Search with C-c C-a C-a. Pasting its response into your buffer can be accomplished with C-c C-;.
More practice with lists:
There is a short solution to the next one. If you find yourself getting tangled up, step back and try to look for a simpler way.
An exercise about your implementation of nonzeros:
Fill in the definition of eqblist, which compares lists of numbers for equality. Prove that eqblist l l yields true for every list l.
Here are a couple of little theorems to prove about your bag definitions above.
The following lemma about leb might help you in the next exercise.
Before doing the next exercise, make sure you've filled in the definition of remove_one above.
Write down an interesting theorem bag_count_sum about bags involving the functions count and sum, and prove it using Coq. (You may find that the difficulty of the proof depends on how you defined count!)
Prove that the rev function is injective -- that is,
forall (l1 l2 : natlist), rev l1 = rev l2 -> l1 = l2.
(There is a hard way and an easy way to do this.)
Suppose we want to write a function that returns the nth element of some list. If we give it type nat -> natlist -> nat, then we'll have to choose some number to return when the list is too short...
This solution is not so good: If nth_bad returns 42, we can't tell whether that value actually appears in the input without further processing. A better alternative is to change the return type of nth_bad to include an error value as a possible outcome. We call this type natoption.
We can then change the above definition of nth_bad to return None when the list is too short and Some a when the list has enough members and a appears at position n. We call this new function nth_error to indicate that it may result in an error.
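A sketch of natoption and nth_error along these lines is given below. The test names and the enclosing module are assumptions, and the list declarations are repeated for self-containment.

```coq
Module NthErrorSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

(* Either a number, or an explicit "no answer". *)
Inductive natoption : Type :=
  | Some (n : nat)
  | None.

Fixpoint nth_error (l : natlist) (n : nat) : natoption :=
  match l with
  | nil => None
  | a :: l' => match n with
               | O => Some a
               | S n' => nth_error l' n'
               end
  end.

Example test_nth_error1 : nth_error [4;5;6;7] 0 = Some 4.
Proof. reflexivity. Qed.
Example test_nth_error2 : nth_error [4;5;6;7] 3 = Some 7.
Proof. reflexivity. Qed.
Example test_nth_error3 : nth_error [4;5;6;7] 9 = None.
Proof. reflexivity. Qed.

End NthErrorSketch.
```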
This example is also an opportunity to introduce one more small feature of Coq's programming language: conditional expressions...
Coq's conditionals are exactly like those found in any other language, with one small generalization. Since the boolean type is not built in, Coq actually supports conditional expressions over any inductively defined type with exactly two constructors. The guard is considered true if it evaluates to the first constructor in the Inductive definition and false if it evaluates to the second.
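The sketch below shows a conditional over bool and one over another two-constructor type. The names nth_error', updown, Up, Down, and to_nat are hypothetical, introduced only for illustration, and we assume the standard library's Nat.eqb as the equality test on numbers.

```coq
Module ConditionalSketch.

Inductive natlist : Type :=
  | nil
  | cons (n : nat) (l : natlist).

Notation "x :: l" := (cons x l) (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).

Inductive natoption : Type :=
  | Some (n : nat)
  | None.

(* The inner match on [n] is replaced by a conditional. *)
Fixpoint nth_error' (l : natlist) (n : nat) : natoption :=
  match l with
  | nil => None
  | a :: l' => if Nat.eqb n O then Some a
               else nth_error' l' (pred n)
  end.

Example test_if1 : nth_error' [4;5;6;7] 3 = Some 7.
Proof. reflexivity. Qed.
Example test_if2 : nth_error' [4;5;6;7] 9 = None.
Proof. reflexivity. Qed.

(* A hypothetical two-constructor type: [Up] plays the role of
   "true" because it is listed first. *)
Inductive updown : Type :=
  | Up
  | Down.

Definition to_nat (d : updown) : nat :=
  if d then 1 else 0.

Example test_up : to_nat Up = 1.
Proof. reflexivity. Qed.
Example test_down : to_nat Down = 0.
Proof. reflexivity. Qed.

End ConditionalSketch.
```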
The function below pulls the nat out of a natoption, returning a supplied default in the None case.
Using the same idea, fix the hd function from earlier so we don't have to pass a default element for the nil case.
This exercise relates your new hd_error to the old hd.
As a final illustration of how data structures can be defined in Coq, here is a simple partial map data type, analogous to the map or dictionary data structures found in most programming languages.
First, we define a new inductive datatype id to serve as the "keys" of our partial maps.
Internally, an id is just a number. Introducing a separate type by wrapping each nat with the tag Id makes definitions more readable and gives us the flexibility to change representations later if we wish.
We'll also need an equality test for ids:
Now we define the type of partial maps:
This declaration can be read: There are two ways to construct a
partial_map: either using the constructor empty to represent an
empty partial map, or by applying the constructor record to
a key, a value, and an existing partial_map to construct a
partial_map with an additional key-to-value mapping.
The update function overrides the entry for a given key in a partial map by shadowing it with a new one (or simply adds a new entry if the given key is not already present).
Last, the find function searches a partial_map for a given key. It returns None if the key was not found and Some val if the key was associated with val. If the same key is mapped to multiple values, find will return the first one it encounters.
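Putting the pieces of this section together, a partial map might be sketched as follows. The name eqb_id, the argument names, the two examples, and the enclosing module are assumptions, and Nat.eqb from the standard library stands in for the equality test on numbers.

```coq
Module PartialMapSketch.

Inductive natoption : Type :=
  | Some (n : nat)
  | None.

(* Keys: a [nat] wrapped in the tag [Id]. *)
Inductive id : Type :=
  | Id (n : nat).

Definition eqb_id (x1 x2 : id) : bool :=
  match x1, x2 with
  | Id n1, Id n2 => Nat.eqb n1 n2
  end.

Inductive partial_map : Type :=
  | empty
  | record (i : id) (v : nat) (m : partial_map).

(* Shadow any existing entry by placing the new one in front. *)
Definition update (d : partial_map) (x : id) (value : nat)
                  : partial_map :=
  record x value d.

(* Return the first matching entry, searching from the front. *)
Fixpoint find (x : id) (d : partial_map) : natoption :=
  match d with
  | empty => None
  | record y v d' => if eqb_id x y
                     then Some v
                     else find x d'
  end.

(* The later update shadows the earlier one for the same key. *)
Example find_shadowed :
  find (Id 1) (update (update empty (Id 1) 10) (Id 1) 20) = Some 20.
Proof. reflexivity. Qed.

Example find_missing :
  find (Id 2) (update empty (Id 1) 10) = None.
Proof. reflexivity. Qed.

End PartialMapSketch.
```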
Consider the following inductive definition:
How many elements does the type baz have? (Explain in words, in a comment.)