Poly: Polymorphism and Higher-Order Functions

Polymorphism

In this chapter we continue our development of basic concepts of functional programming. The critical new ideas are polymorphism (abstracting functions over the types of the data they manipulate) and higher-order functions (treating functions as data). We begin with polymorphism.

Polymorphic Lists

For the last couple of chapters, we've been working just with lists of numbers. Obviously, interesting programs also need to be able to manipulate lists with elements from other types -- lists of strings, lists of booleans, lists of lists, etc. We could just define a new inductive datatype for each of these, for example...
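A list of booleans, for instance, might be declared as in the following sketch (the names boollist, bool_nil, bool_cons, and sample_boollist are illustrative):

```coq
(* A monomorphic list type whose elements are all booleans. *)
Inductive boollist : Type :=
  | bool_nil
  | bool_cons (b : bool) (l : boollist).

(* A two-element list of booleans built with these constructors. *)
Definition sample_boollist : boollist :=
  bool_cons true (bool_cons false bool_nil).
```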

... but this would quickly become tedious, partly because we have to make up different constructor names for each datatype, but mostly because we would also need to define new versions of all our list manipulating functions (length, rev, etc.) for each new datatype definition.

To avoid all this repetition, Coq supports polymorphic inductive type definitions. For example, here is a polymorphic list datatype.
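The declaration can be sketched as follows, with the type parameter X bound in the header; the final Check is an added sanity check rather than part of the definition.

```coq
(* A list whose element type X is a parameter of the definition. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* list by itself maps a type to a type. *)
Check list : Type -> Type.
```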

This is exactly like the definition of natlist from the previous chapter, except that the nat argument to the cons constructor has been replaced by an arbitrary type X, a binding for X has been added to the header, and the occurrences of natlist in the types of the constructors have been replaced by list X. (We can re-use the constructor names nil and cons because the earlier definition of natlist was inside of a Module definition that is now out of scope.)

What sort of thing is list itself? One good way to think about it is that list is a function from Types to Inductive definitions; or, to put it another way, list is a function from Types to Types. For any particular type X, the type list X is an Inductively defined set of lists whose elements are of type X.

The parameter X in the definition of list automatically becomes a parameter to the constructors nil and cons -- that is, nil and cons are now polymorphic constructors; when we use them, we must now provide a first argument that is the type of the list they are building. For example, nil nat constructs the empty list of type list nat, that is, the empty list of natural numbers.

Similarly, cons nat adds an element of type nat to a list of type list nat. Here is an example of forming a list containing just the natural number 3.

What might the type of nil be? We can read off the type list X from the definition, but this omits the binding for X, which is the parameter to list. Type -> list X does not explain the meaning of X. (X : Type) -> list X comes closer. Coq's notation for this situation is forall X : Type, list X.

Similarly, the type of cons from the definition looks like X -> list X -> list X, but using this convention to explain the meaning of X results in the type forall X, X -> list X -> list X.
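These claims can be checked directly. The sketch below repeats the polymorphic list declaration so that it stands alone, then asks Coq to confirm the types of a few constructor applications and of the constructors themselves.

```coq
(* The polymorphic list type, repeated so this snippet stands alone. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* Constructors take the element type as an explicit first argument. *)
Check (nil nat) : list nat.
Check (cons nat 3 (nil nat)) : list nat.

(* The full types of the constructors quantify over X. *)
Check nil : forall X : Type, list X.
Check cons : forall X : Type, X -> list X -> list X.
```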

(Side note on notation: In .v files, the forall quantifier is spelled out in letters. In the generated HTML files and in the way various IDEs show .v files (with certain settings of their display controls), forall is usually typeset as the standard mathematical upside-down A (∀), but you'll still see the spelled-out forall in a few places. This is just a quirk of typesetting: there is no difference in meaning.)

Having to supply a type argument for each use of a list constructor may seem an awkward burden, but we will soon see ways of reducing that burden.

(We've written nil and cons explicitly here because we haven't yet defined the [] and :: notations for the new version of lists. We'll do that in a bit.)

We can now go back and make polymorphic versions of all the list-processing functions that we wrote before. Here is repeat, for example:

As with nil and cons, we can use repeat by applying it first to a type and then to an element of this type (and a number):

To use repeat to build other kinds of lists, we simply instantiate it with an appropriate type parameter:
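A sketch of such a polymorphic repeat, with one use at nat and one at bool, might look as follows. The list declaration is repeated so the snippet stands alone, and the example names are illustrative.

```coq
(* The polymorphic list type, repeated so this snippet stands alone. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* Build a list containing count copies of x. *)
Fixpoint repeat (X : Type) (x : X) (count : nat) : list X :=
  match count with
  | 0 => nil X
  | S count' => cons X x (repeat X x count')
  end.

(* Instantiating X with nat ... *)
Example test_repeat1 :
  repeat nat 4 2 = cons nat 4 (cons nat 4 (nil nat)).
Proof. reflexivity. Qed.

(* ... and with bool. *)
Example test_repeat2 :
  repeat bool false 1 = cons bool false (nil bool).
Proof. reflexivity. Qed.
```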

Exercise: 2 stars, standard (mumble_grumble)

Consider the following two inductively defined types.

Which of the following are well-typed elements of grumble X for some type X? (Add YES or NO to each line.)

  • d (b a 5)
  • d mumble (b a 5)
  • d bool (b a 5)
  • e bool true
  • e mumble (b c 0)
  • e bool (b c 0)
  • c

Type Annotation Inference

Let's write the definition of repeat again, but this time we won't specify the types of any of the arguments. Will Coq still accept it?

Indeed it will. Let's see what type Coq has assigned to repeat':

It has exactly the same type as repeat. Coq was able to use type inference to deduce what the types of X, x, and count must be, based on how they are used. For example, since X is used as an argument to cons, it must be a Type, since cons expects a Type as its first argument; matching count with 0 and S means it must be a nat; and so on.
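The unannotated definition and the type check can be sketched like this (the list declaration is repeated so the snippet stands alone):

```coq
(* The polymorphic list type, repeated so this snippet stands alone. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* No argument types are given; Coq infers them from the body. *)
Fixpoint repeat' X x count : list X :=
  match count with
  | 0        => nil X
  | S count' => cons X x (repeat' X x count')
  end.

(* The inferred type matches the fully annotated version. *)
Check repeat' : forall X : Type, X -> nat -> list X.
```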

This powerful facility means we don't always have to write explicit type annotations everywhere, although explicit type annotations are still quite useful as documentation and sanity checks, so we will continue to use them most of the time. You should try to find a balance in your own code between too many type annotations (which can clutter and distract) and too few (which forces readers to perform type inference in their heads in order to understand your code).

Type Argument Synthesis

To use a polymorphic function, we need to pass it one or more types in addition to its other arguments. For example, the recursive call in the body of the repeat function above must pass along the type X. But since the second argument to repeat is an element of X, it seems entirely obvious that the first argument can only be X -- why should we have to write it explicitly?

Fortunately, Coq permits us to avoid this kind of redundancy. In place of any type argument we can write a hole _, which can be read as "Please try to figure out for yourself what belongs here." More precisely, when Coq encounters a _, it will attempt to unify all locally available information -- the type of the function being applied, the types of the other arguments, and the type expected by the context in which the application appears -- to determine what concrete type should replace the _.

This may sound similar to type annotation inference -- indeed, the two procedures rely on the same underlying mechanisms. Instead of simply omitting the types of some arguments to a function, like

repeat' X x count : list X :=

we can also replace the types with _

repeat' (X : _) (x : _) (count : _) : list X :=

to tell Coq to attempt to infer the missing information.

Using holes, the repeat function can be written like this:

In this instance, we don't save much by writing _ instead of X. But in many cases the difference in both keystrokes and readability is nontrivial. For example, suppose we want to write down a list containing the numbers 1, 2, and 3. Instead of writing this...

...we can use holes to write this:
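Both uses of holes can be sketched together. The names repeat'', list123, and list123' are illustrative, and the last example is an added check that the two spellings denote the same list.

```coq
(* The polymorphic list type, repeated so this snippet stands alone. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* repeat with every type argument in the body replaced by a hole. *)
Fixpoint repeat'' X x count : list X :=
  match count with
  | 0        => nil _
  | S count' => cons _ x (repeat'' _ x count')
  end.

(* The list 1, 2, 3 with explicit type arguments ... *)
Definition list123 :=
  cons nat 1 (cons nat 2 (cons nat 3 (nil nat))).

(* ... and with holes that Coq fills in with nat. *)
Definition list123' :=
  cons _ 1 (cons _ 2 (cons _ 3 (nil _))).

Example list123_same : list123 = list123'.
Proof. reflexivity. Qed.
```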

Implicit Arguments

We can go further and even avoid writing _'s in most cases by telling Coq always to infer the type argument(s) of a given function.

The Arguments directive specifies the name of the function (or constructor) and then lists its argument names, with curly braces around any arguments to be treated as implicit. (If some arguments of a definition don't have a name, as is often the case for constructors, they can be marked with a wildcard pattern _.)

Now, we don't have to supply type arguments at all:

Alternatively, we can declare an argument to be implicit when defining the function itself, by surrounding it in curly braces instead of parens. For example:

(Note that we didn't even have to provide a type argument to the recursive call to repeat'''; indeed, it would be invalid to provide one!)
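The two styles can be sketched side by side. The snippet below repeats the list declaration, marks the constructors' type argument as implicit with Arguments, and then defines a version of repeat whose type argument is declared implicit in its header; list123'' and repeat''' are illustrative names.

```coq
(* The polymorphic list type, repeated so this snippet stands alone. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).

(* Make the type argument of each constructor implicit. *)
Arguments nil {X}.
Arguments cons {X} _ _.

(* No type arguments are needed any more. *)
Definition list123'' := cons 1 (cons 2 (cons 3 nil)).

(* Curly braces in the header make X implicit from the start. *)
Fixpoint repeat''' {X : Type} (x : X) (count : nat) : list X :=
  match count with
  | 0        => nil
  | S count' => cons x (repeat''' x count')
  end.
```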

We will use the latter style whenever possible, but we will continue to use explicit Arguments declarations for Inductive constructors. The reason for this is that marking the parameter of an inductive type as implicit causes it to become implicit for the type itself, not just for its constructors. For instance, consider the following alternative definition of the list type:

Because X is declared as implicit for the entire inductive definition including list' itself, we now have to write just list' whether we are talking about lists of numbers or booleans or anything else, rather than list' nat or list' bool or whatever; this is a step too far.

Let's finish by re-implementing a few other standard list functions on our new polymorphic lists...
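For instance, append, reverse, and length might be written as in this sketch. The list declaration and Arguments directives are repeated so the snippet stands alone, and the example names are illustrative.

```coq
(* The polymorphic list type with implicit constructor arguments. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).
Arguments nil {X}.
Arguments cons {X} _ _.

(* Append l2 to the end of l1. *)
Fixpoint app {X : Type} (l1 l2 : list X) : list X :=
  match l1 with
  | nil      => l2
  | cons h t => cons h (app t l2)
  end.

(* Reverse a list by appending each head after the reversed tail. *)
Fixpoint rev {X : Type} (l : list X) : list X :=
  match l with
  | nil      => nil
  | cons h t => app (rev t) (cons h nil)
  end.

(* Count the elements of a list. *)
Fixpoint length {X : Type} (l : list X) : nat :=
  match l with
  | nil       => 0
  | cons _ l' => S (length l')
  end.

Example test_rev1 :
  rev (cons 1 (cons 2 nil)) = cons 2 (cons 1 nil).
Proof. reflexivity. Qed.

Example test_length1 :
  length (cons 1 (cons 2 (cons 3 nil))) = 3.
Proof. reflexivity. Qed.
```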

Supplying Type Arguments Explicitly

One small problem with declaring arguments implicit is that, occasionally, Coq does not have enough local information to determine a type argument; in such cases, we need to tell Coq that we want to give the argument explicitly just this time. For example, suppose we write this:

(The Fail qualifier that appears before Definition can be used with any command, and is used to ensure that that command indeed fails when executed. If the command does fail, Coq prints the corresponding error message, but continues processing the rest of the file.)

Here, Coq gives us an error because it doesn't know what type argument to supply to nil. We can help it by providing an explicit type declaration (so that Coq has more information available when it gets to the application of nil):

Alternatively, we can force the implicit arguments to be explicit by prefixing the function name with @.
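The failing definition and both remedies can be sketched as follows. The list declaration is repeated so the snippet stands alone; mynil and mynil' are illustrative names.

```coq
(* The polymorphic list type with implicit constructor arguments. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).
Arguments nil {X}.
Arguments cons {X} _ _.

(* Nothing here determines X, so this definition is rejected. *)
Fail Definition mynil := nil.

(* Remedy 1: a type annotation tells Coq which X is meant. *)
Definition mynil : list nat := nil.

(* Remedy 2: @ makes the implicit argument explicit again. *)
Check @nil : forall X : Type, list X.
Definition mynil' := @nil nat.
```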

Using argument synthesis and implicit arguments, we can define convenient notation for lists, as before. Since we have made the constructor type arguments implicit, Coq will know to automatically infer these when we use the notations.

Now lists can be written just the way we'd hope:
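A sketch of such notations, followed by a list written with them, is given below. The list declaration and an append function are repeated so the snippet stands alone; the notation levels are the conventional ones for :: and ++, and list123''' is an illustrative name.

```coq
(* The polymorphic list type with implicit constructor arguments. *)
Inductive list (X : Type) : Type :=
  | nil
  | cons (x : X) (l : list X).
Arguments nil {X}.
Arguments cons {X} _ _.

(* Append, needed for the ++ notation. *)
Fixpoint app {X : Type} (l1 l2 : list X) : list X :=
  match l1 with
  | nil      => l2
  | cons h t => cons h (app t l2)
  end.

(* Notations; the type arguments are inferred at each use. *)
Notation "x :: y" := (cons x y)
                     (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y []) ..).
Notation "x ++ y" := (app x y)
                     (at level 60, right associativity).

Definition list123''' := [1; 2; 3].
```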

Exercises

Exercise: 2 stars, standard, optional (poly_exercises)

Here are a few simple exercises, just like ones in the Lists chapter, for practice with polymorphism. Complete the proofs below.

Exercise: 2 stars, standard, optional (more_poly_exercises)

Here are some slightly more interesting ones...

Polymorphic Pairs

Following the same pattern, the type definition we gave in the last chapter for pairs of numbers can be generalized to polymorphic pairs, often called products:

As with lists, we make the type arguments implicit and define the familiar concrete notation.

We can also use the Notation mechanism to define the standard notation for product types:

(The annotation : type_scope tells Coq that this abbreviation should only be used when parsing types. This avoids a clash with the multiplication symbol.)

It is easy at first to get (x,y) and X*Y confused. Remember that (x,y) is a value built from two other values, while X*Y is a type built from two other types. If x has type X and y has type Y, then (x,y) has type X*Y.

The first and second projection functions now look pretty much as they would in any functional programming language.
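Putting the pieces together, the product type, its notations, and the two projections can be sketched as follows; the example names are illustrative.

```coq
(* A pair holds one value of type X and one of type Y. *)
Inductive prod (X Y : Type) : Type :=
  | pair (x : X) (y : Y).

(* Both type arguments of pair are inferred from its values. *)
Arguments pair {X} {Y} _ _.

(* (x, y) builds a value; X * Y names the type of such values. *)
Notation "( x , y )" := (pair x y).
Notation "X * Y" := (prod X Y) : type_scope.

(* First projection. *)
Definition fst {X Y : Type} (p : X * Y) : X :=
  match p with
  | (x, y) => x
  end.

(* Second projection. *)
Definition snd {X Y : Type} (p : X * Y) : Y :=
  match p with
  | (x, y) => y
  end.

Example test_fst : fst (3, true) = 3.
Proof. reflexivity. Qed.

Example test_snd : snd (3, true) = true.
Proof. reflexivity. Qed.
```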

The following function takes two lists and combines them into a list of pairs. In other functional languages, it is often called zip; we call it combine for consistency with Coq's standard library.
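One way to write it is sketched below. To keep the snippet self-contained it uses the standard library's list and pair types and notations, on the assumption that they behave like the ones defined in this chapter; the local definition shadows the library's own combine.

```coq
Require Import List.
Import ListNotations.

(* Pair up elements position by position; stop as soon as
   either list runs out. *)
Fixpoint combine {X Y : Type} (lx : list X) (ly : list Y)
  : list (X * Y) :=
  match lx, ly with
  | [], _ => []
  | _, [] => []
  | x :: tx, y :: ty => (x, y) :: (combine tx ty)
  end.
```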

Exercise: 1 star, standard, optional (combine_checks)

Try answering the following questions on paper and checking your answers in Coq:

  • What is the type of combine (i.e., what does Check @combine print)?
  • What does

    Compute (combine [1;2] [false;false;true;true]).

    print?

    Exercise: 2 stars, standard, recommended (split)

    The function split is the right inverse of combine: it takes a list of pairs and returns a pair of lists. In many functional languages, it is called unzip.

    Fill in the definition of split below. Make sure it passes the given unit test.

    Polymorphic Options

    One last polymorphic type for now: polymorphic options, which generalize natoption from the previous chapter. (We put the definition inside a module because the standard library already defines option and it's this one that we want to use below.)

    We can now rewrite the nth_error function so that it works with any type of lists.
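The option type and a polymorphic nth_error can be sketched as follows. The module name OptionPlayground and the example names are illustrative, the snippet uses the standard library's list notations so that it stands alone, and the function is written with a nested match on n, which should behave the same as a version that tests n against 0.

```coq
Require Import List.
Import ListNotations.

(* Kept in a module so the standard option remains in use below. *)
Module OptionPlayground.
Inductive option (X : Type) : Type :=
  | Some (x : X)
  | None.
Arguments Some {X} _.
Arguments None {X}.
End OptionPlayground.

(* Return the element at index n, or None if l is too short. *)
Fixpoint nth_error {X : Type} (l : list X) (n : nat) : option X :=
  match l with
  | [] => None
  | a :: l' =>
      match n with
      | O => Some a
      | S n' => nth_error l' n'
      end
  end.

Example test_nth_error1 : nth_error [4;5;6;7] 0 = Some 4.
Proof. reflexivity. Qed.

Example test_nth_error2 : nth_error [[1];[2]] 1 = Some [2].
Proof. reflexivity. Qed.

Example test_nth_error3 : nth_error [true] 2 = None.
Proof. reflexivity. Qed.
```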

    Exercise: 1 star, standard, optional (hd_error_poly)

    Complete the definition of a polymorphic version of the hd_error function from the last chapter. Be sure that it passes the unit tests below.

    Once again, to force the implicit arguments to be explicit, we can use @ before the name of the function.

    Functions as Data

    Like many other modern programming languages -- including all functional languages (ML, Haskell, Scheme, Scala, Clojure, etc.) -- Coq treats functions as first-class citizens, allowing them to be passed as arguments to other functions, returned as results, stored in data structures, etc.

    Higher-Order Functions

    Functions that manipulate other functions are often called higher-order functions. Here's a simple one:

    The argument f here is itself a function (from X to X); the body of doit3times applies f three times to some value n.
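A sketch of doit3times and two uses of it follows; the example names are illustrative, and the successor constructor S and boolean negation negb stand in for whatever functions one wants to iterate.

```coq
(* Apply f three times to n. *)
Definition doit3times {X : Type} (f : X -> X) (n : X) : X :=
  f (f (f n)).

Check @doit3times : forall X : Type, (X -> X) -> X -> X.

(* Three successors of 0. *)
Example test_doit3times : doit3times S 0 = 3.
Proof. reflexivity. Qed.

(* Negating true three times. *)
Example test_doit3times' : doit3times negb true = false.
Proof. reflexivity. Qed.
```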

    Filter

    Here is a more useful higher-order function, taking a list of Xs and a predicate on X (a function from X to bool) and filtering the list, returning a new list containing just those elements for which the predicate returns true.

    For example, if we apply filter to the predicate evenb and a list of numbers l, it returns a list containing just the even members of l.

    We can use filter to give a concise version of the countoddmembers function from the Lists chapter.
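The following sketch shows filter and a few uses, including a helper length_is_1 and the filter-based countoddmembers'. It relies on the standard library's list notations and on Nat.even, Nat.odd, and Nat.eqb, which are assumed here to play the role of this book's own evenness, oddness, and equality tests; the local filter shadows the library's.

```coq
Require Import List.
Import ListNotations.

(* Keep exactly the elements on which test returns true. *)
Fixpoint filter {X : Type} (test : X -> bool) (l : list X) : list X :=
  match l with
  | [] => []
  | h :: t =>
      if test h then h :: (filter test t)
      else filter test t
  end.

Example test_filter1 : filter Nat.even [1;2;3;4] = [2;4].
Proof. reflexivity. Qed.

(* A predicate on lists: does l have exactly one element? *)
Definition length_is_1 {X : Type} (l : list X) : bool :=
  Nat.eqb (length l) 1.

Example test_filter2 :
  filter length_is_1 [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]
  = [ [3]; [4]; [8] ].
Proof. reflexivity. Qed.

(* Counting odd members is filtering followed by length. *)
Definition countoddmembers' (l : list nat) : nat :=
  length (filter Nat.odd l).

Example test_countoddmembers'1 : countoddmembers' [1;0;3;1;4;5] = 4.
Proof. reflexivity. Qed.
```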

    Anonymous Functions

    It is arguably a little sad, in the example just above, to be forced to define the function length_is_1 and give it a name just to be able to pass it as an argument to filter, since we will probably never use it again. Moreover, this is not an isolated example: when using higher-order functions, we often want to pass as arguments one-off functions that we will never use again; having to give each of these functions a name would be tedious.

    Fortunately, there is a better way. We can construct a function on the fly without declaring it at the top level or giving it a name.

    The expression (fun n => n * n) can be read as "the function that, given a number n, yields n * n."

    Here is the filter example, rewritten to use an anonymous function.
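Both uses of anonymous functions can be sketched in a few lines. The snippet defines doit3times locally and uses the standard library's filter, length, and Nat.eqb, assumed here to match the chapter's versions; the example names are illustrative.

```coq
Require Import List.
Import ListNotations.

(* Apply f three times to n. *)
Definition doit3times {X : Type} (f : X -> X) (n : X) : X :=
  f (f (f n)).

(* Squaring three times: 2, 4, 16, 256. *)
Example test_anon_fun' : doit3times (fun n => n * n) 2 = 256.
Proof. reflexivity. Qed.

(* The predicate is written inline instead of being named. *)
Example test_filter2' :
  filter (fun l => Nat.eqb (length l) 1)
         [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]
  = [ [3]; [4]; [8] ].
Proof. reflexivity. Qed.
```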

    Exercise: 2 stars, standard (filter_even_gt7)

    Use filter (instead of Fixpoint) to write a Coq function filter_even_gt7 that takes a list of natural numbers as input and returns a list of just those that are even and greater than 7.

    Exercise: 3 stars, standard (partition)

    Use filter to write a Coq function partition:

    partition : forall X : Type, (X -> bool) -> list X -> list X * list X

    Given a type X, a test function of type X -> bool and a list X, partition should return a pair of lists. The first member of the pair is the sublist of the original list containing the elements that satisfy the test, and the second is the sublist containing those that fail the test. The order of elements in the two sublists should be the same as their order in the original list.

    Map

    Another handy higher-order function is called map.

    It takes a function f and a list l = [n1, n2, n3, ...] and returns the list [f n1, f n2, f n3, ...], where f has been applied to each element of l in turn. For example:

    The element types of the input and output lists need not be the same, since map takes two type arguments, X and Y; it can thus be applied to a list of numbers and a function from numbers to booleans to yield a list of booleans:

    It can even be applied to a list of numbers and a function from numbers to lists of booleans to yield a list of lists of booleans:
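A sketch of map together with the three kinds of use just described follows. It uses the standard library's list notations and Nat.even and Nat.odd, assumed to correspond to the chapter's evenness and oddness tests; the local map shadows the library's, and the example names are illustrative.

```coq
Require Import List.
Import ListNotations.

(* Apply f to every element of l, preserving order. *)
Fixpoint map {X Y : Type} (f : X -> Y) (l : list X) : list Y :=
  match l with
  | [] => []
  | h :: t => (f h) :: (map f t)
  end.

(* Numbers to numbers. *)
Example test_map1 : map (fun x => 3 + x) [2;0;2] = [5;3;5].
Proof. reflexivity. Qed.

(* Numbers to booleans. *)
Example test_map2 :
  map Nat.odd [2;1;2;5] = [false;true;false;true].
Proof. reflexivity. Qed.

(* Numbers to lists of booleans. *)
Example test_map3 :
  map (fun n => [Nat.even n; Nat.odd n]) [2;1;2;5]
  = [[true;false];[false;true];[true;false];[false;true]].
Proof. reflexivity. Qed.
```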

    Exercises

    Exercise: 3 stars, standard (map_rev)

    Show that map and rev commute. You may need to define an auxiliary lemma.

    Exercise: 2 stars, standard, recommended (flat_map)

    The function map maps a list X to a list Y using a function of type X -> Y. We can define a similar function, flat_map, which maps a list X to a list Y using a function f of type X -> list Y. Your definition should work by 'flattening' the results of f, like so:

    flat_map (fun n => [n;n+1;n+2]) [1;5;10] = [1; 2; 3; 5; 6; 7; 10; 11; 12].

    Lists are not the only inductive type for which map makes sense. Here is a map for the option type:
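A sketch of such a function is shown below. It is written against the standard option type, and the local definition shadows the library function of the same name; the example names are illustrative.

```coq
(* Apply f inside Some; leave None untouched. *)
Definition option_map {X Y : Type} (f : X -> Y) (xo : option X)
  : option Y :=
  match xo with
  | None => None
  | Some x => Some (f x)
  end.

Example test_option_map1 : option_map S (Some 2) = Some 3.
Proof. reflexivity. Qed.

Example test_option_map2 : option_map S None = None.
Proof. reflexivity. Qed.
```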

    Exercise: 2 stars, standard, optional (implicit_args)

    The definitions and uses of filter and map use implicit arguments in many places. Replace the curly braces around the implicit arguments with parentheses, and then fill in explicit type parameters where necessary and use Coq to check that you've done so correctly. (This exercise is not to be turned in; it is probably easiest to do it on a copy of this file that you can throw away afterwards.)

    Fold

    An even more powerful higher-order function is called fold. This function is the inspiration for the reduce operation that lies at the heart of Google's map/reduce distributed programming framework.

    Intuitively, the behavior of the fold operation is to insert a given binary operator f between every pair of elements in a given list. For example, fold plus [1;2;3;4] intuitively means 1+2+3+4. To make this precise, we also need a starting element that serves as the initial second input to f. So, for example,

    fold plus [1;2;3;4] 0

    yields

    1 + (2 + (3 + (4 + 0))).

    Some more examples:
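A sketch of fold with this behavior, together with a few uses, is given below. It uses the standard library's list notations, with Nat.add and Nat.mul standing in for plus and mult; the example names are illustrative.

```coq
Require Import List.
Import ListNotations.

(* Combine the elements of l from the right, starting from b. *)
Fixpoint fold {X Y : Type} (f : X -> Y -> Y) (l : list X) (b : Y)
  : Y :=
  match l with
  | nil => b
  | h :: t => f h (fold f t b)
  end.

Check (fold andb) : list bool -> bool -> bool.

(* 1 + (2 + (3 + (4 + 0))) *)
Example fold_example0 : fold Nat.add [1;2;3;4] 0 = 10.
Proof. reflexivity. Qed.

Example fold_example1 : fold Nat.mul [1;2;3;4] 1 = 24.
Proof. reflexivity. Qed.

Example fold_example2 :
  fold andb [true;true;false;true] true = false.
Proof. reflexivity. Qed.

(* Folding with append flattens a list of lists. *)
Example fold_example3 :
  fold (fun l acc => l ++ acc) [[1];[];[2;3];[4]] [] = [1;2;3;4].
Proof. reflexivity. Qed.
```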

    Exercise: 1 star, advanced (fold_types_different)

    Observe that the type of fold is parameterized by two type variables, X and Y, and the parameter f is a binary operator that takes an X and a Y and returns a Y. Can you think of a situation where it would be useful for X and Y to be different?

    Functions That Construct Functions

    Most of the higher-order functions we have talked about so far take functions as arguments. Let's look at some examples that involve returning functions as the results of other functions. To begin, here is a function that takes a value x (drawn from some type X) and returns a function from nat to X that yields x whenever it is called, ignoring its nat argument.
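Such a function might be sketched as follows; the names constfun and ftrue and the example names are illustrative.

```coq
(* Given x, return a function that ignores its nat argument
   and always yields x. *)
Definition constfun {X : Type} (x : X) : nat -> X :=
  fun (k : nat) => x.

(* A function from nat to bool that always answers true. *)
Definition ftrue := constfun true.

Example constfun_example1 : ftrue 0 = true.
Proof. reflexivity. Qed.

Example constfun_example2 : (constfun 5) 99 = 5.
Proof. reflexivity. Qed.
```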

    In fact, the multiple-argument functions we have already seen are also examples of passing functions as data. To see why, recall the type of plus.

    Each -> in this expression is actually a binary operator on types. This operator is right-associative, so the type of plus is really a shorthand for nat -> (nat -> nat) -- i.e., it can be read as saying that plus is a one-argument function that takes a nat and returns a one-argument function that takes another nat and returns a nat. In the examples above, we have always applied plus to both of its arguments at once, but if we like we can supply just the first. This is called partial application.
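Partial application can be sketched in a few lines. The snippet uses Nat.add, the standard library's name for addition on nat, in place of plus; plus3 and the example name are illustrative.

```coq
(* Addition takes its two arguments one at a time. *)
Check Nat.add : nat -> nat -> nat.

(* Supplying only the first argument yields a new function. *)
Definition plus3 := Nat.add 3.
Check plus3 : nat -> nat.

Example test_plus3 : plus3 4 = 7.
Proof. reflexivity. Qed.
```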

    Additional Exercises

    Exercise: 2 stars, standard (fold_length)

    Many common functions on lists can be implemented in terms of fold. For example, here is an alternative definition of length:
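A sketch of this definition follows. A fold function is repeated so the snippet stands alone, the standard library's list notations are used, and the example name is illustrative.

```coq
Require Import List.
Import ListNotations.

(* Combine the elements of l from the right, starting from b. *)
Fixpoint fold {X Y : Type} (f : X -> Y -> Y) (l : list X) (b : Y)
  : Y :=
  match l with
  | nil => b
  | h :: t => f h (fold f t b)
  end.

(* Ignore each element and add one to the running count. *)
Definition fold_length {X : Type} (l : list X) : nat :=
  fold (fun _ n => S n) l 0.

Example test_fold_length1 : fold_length [4;7;0] = 3.
Proof. reflexivity. Qed.
```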

    Prove the correctness of fold_length. (Hint: It may help to know that reflexivity simplifies expressions a bit more aggressively than simpl does -- i.e., you may find yourself in a situation where simpl does nothing but reflexivity solves the goal.)

    Exercise: 3 stars, standard (fold_map)

    We can also define map in terms of fold. Finish fold_map below.

    Write down a theorem fold_map_correct in Coq stating that fold_map is correct, and prove it. (Hint: again, remember that reflexivity simplifies expressions a bit more aggressively than simpl.)

    Exercise: 2 stars, advanced (currying)

    In Coq, a function f : A -> B -> C really has the type A -> (B -> C). That is, if you give f a value of type A, it will give you a function f' : B -> C. If you then give f' a value of type B, it will return a value of type C. This allows for partial application, as in plus3. Processing a list of arguments with functions that return functions is called currying, in honor of the logician Haskell Curry.

    Conversely, we can reinterpret the type A -> B -> C as (A * B) -> C. This is called uncurrying. With an uncurried binary function, both arguments must be given at once as a pair; there is no partial application.

    We can define currying as follows:
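One possible definition is sketched here, written against the standard library's pair type and projections; the example name is illustrative.

```coq
(* Turn a function on pairs into one that takes its two
   arguments one at a time. *)
Definition prod_curry {X Y Z : Type}
  (f : X * Y -> Z) (x : X) (y : Y) : Z := f (x, y).

(* A function on pairs, used in curried form. *)
Example test_prod_curry :
  prod_curry (fun p : nat * nat => fst p + snd p) 1 2 = 3.
Proof. reflexivity. Qed.
```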

    As an exercise, define its inverse, prod_uncurry. Then prove the theorems below to show that the two are inverses.

    As a (trivial) example of the usefulness of currying, we can use it to shorten one of the examples that we saw above:

    Thought exercise: before running the following commands, can you calculate the types of prod_curry and prod_uncurry?

    Exercise: 2 stars, advanced (nth_error_informal)

    Recall the definition of the nth_error function:

    Fixpoint nth_error {X : Type} (l : list X) (n : nat) : option X :=
      match l with
      | [] => None
      | a :: l' => if n =? O then Some a else nth_error l' (pred n)
      end.

    Write an informal proof of the following theorem:

    forall X n l, length l = n -> @nth_error X l n = None

    The following exercises explore an alternative way of defining natural numbers, using the so-called Church numerals, named after mathematician Alonzo Church. We can represent a natural number n as a function that takes a function f as a parameter and returns f iterated n times.

    Let's see how to write some numbers with this notation. Iterating a function once should be the same as just applying it. Thus:

    Similarly, two should apply f twice to its argument:

    Defining zero is somewhat trickier: how can we apply a function zero times? The answer is actually simple: just return the argument untouched.

    More generally, a number n can be written as fun X f x => f (f ... (f x) ...), with n occurrences of f. Notice in particular how the doit3times function we've defined previously is actually just the Church representation of 3.
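The numerals described so far can be sketched as follows. The type name cnat matches the hint at the end of this section, while the names one, two, zero, and three and the example names are illustrative; doit3times is repeated so the snippet stands alone.

```coq
(* A Church numeral iterates any function f on any type X. *)
Definition cnat := forall X : Type, (X -> X) -> X -> X.

(* Apply f once. *)
Definition one : cnat :=
  fun (X : Type) (f : X -> X) (x : X) => f x.

(* Apply f twice. *)
Definition two : cnat :=
  fun (X : Type) (f : X -> X) (x : X) => f (f x).

(* Apply f zero times: return the argument untouched. *)
Definition zero : cnat :=
  fun (X : Type) (f : X -> X) (x : X) => x.

(* doit3times, made explicit in X, is the numeral for 3. *)
Definition doit3times {X : Type} (f : X -> X) (n : X) : X :=
  f (f (f n)).
Definition three : cnat := @doit3times.

(* Iterating the successor from 0 recovers the ordinary number. *)
Example zero_as_nat : zero nat S 0 = 0.
Proof. reflexivity. Qed.

Example three_as_nat : three nat S 0 = 3.
Proof. reflexivity. Qed.
```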

    Complete the definitions of the following functions. Make sure that the corresponding unit tests pass by proving them with reflexivity.

    Exercise: 1 star, advanced (church_succ)

    Successor of a natural number: given a Church numeral n, the successor succ n is a function that iterates its argument once more than n.

    Exercise: 1 star, advanced (church_plus)

    Addition of two natural numbers:

    Exercise: 2 stars, advanced (church_mult)

    Multiplication:

    Exercise: 2 stars, advanced (church_exp)

    Exponentiation:

    (Hint: Polymorphism plays a crucial role here. However, choosing the right type to iterate over can be tricky. If you hit a Universe inconsistency error, try iterating over a different type. Iterating over cnat itself is usually problematic.)