Final Project Ideas
Done in 2015 by Charlie Murphy and Patrick Gray:
Verified Machine Learning in Coq: Perceptron
The Perceptron is a simple linear classifier that learns, from training data,
how to classify real-valued feature vectors V as "positive" or "negative".
Potential tasks:
- Implement Perceptron in Coq. You might assume, first, that feature
  vectors are integer-valued. Then work up to operating with real
  numbers in Coq, using a standard library like Coq.Reals.
- Prove some property of your Perceptron implementation. For example,
  prove that Perceptron training converges assuming the input training
  set is linearly separable (i.e., the positive examples can be
  distinguished from the negative ones by some hyperplane).
- As a bonus, learn about extraction in Coq in order to generate, from
  your Coq Perceptron implementation, executable code in the functional
  programming language OCaml. Benchmark your implementation to see how
  fast it is!
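To give a feel for the integer-valued starting point, here is a minimal sketch of the core Perceptron operations over lists of Z. The names (dot, predict, update) are illustrative, not from any standard library:

```coq
Require Import ZArith List.
Import ListNotations.
Open Scope Z_scope.

(* Dot product of two integer feature vectors. *)
Fixpoint dot (v w : list Z) : Z :=
  match v, w with
  | x :: v', y :: w' => x * y + dot v' w'
  | _, _ => 0
  end.

(* Predict +1 or -1 from weights w and bias b on features v. *)
Definition predict (w : list Z) (b : Z) (v : list Z) : Z :=
  if Z.leb (dot w v + b) 0 then -1 else 1.

(* One training step on example (v, label), where label is +1 or -1:
   if the example is misclassified, nudge the weights toward it. *)
Definition update (w : list Z) (b : Z) (v : list Z) (label : Z)
  : list Z * Z :=
  if Z.eqb (predict w b v) label
  then (w, b)
  else (map (fun p => fst p + label * snd p) (combine w v), b + label).
```

Training would then iterate update over the training set until no example is misclassified; the convergence theorem mentioned above bounds how many such updates can occur when the data is linearly separable.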
Verified Algorithms and Data Structures: 2-3 Trees
2-3 Trees, invented by John Hopcroft in 1970, are a balanced
search-tree data structure, similar in purpose to AVL or Red-Black
trees, in which each internal node has two or three children.
Potential tasks:
- Implement 2-3 trees in Coq (lookup, insert, delete).
- Prove your 2-3 tree implementation correct with respect to a set
  abstraction of the data structure, as in the midterm on splay trees.
- Interesting but difficult: prove a theorem stating that lookup,
  insert, and delete each run in O(log n) time.
- As a bonus, learn about extraction in Coq in order to generate, from
  your 2-3 tree implementation, executable code in the functional
  programming language OCaml. Benchmark your implementation to see how
  fast it is!
One possible reference: Sozeau, Finger Trees in Coq
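As a starting point, here is one possible shape for the datatype and lookup, sketched over Z keys. The names (tree23, lookup) are illustrative, and this simple inductive type does not by itself enforce the invariant that all leaves sit at the same depth; you would state that separately (or bake it into a more refined type):

```coq
Require Import ZArith.
Open Scope Z_scope.

(* A 2-3 tree: internal nodes hold one key and two children,
   or two keys and three children. *)
Inductive tree23 : Type :=
| Leaf  : tree23
| Node2 : tree23 -> Z -> tree23 -> tree23
| Node3 : tree23 -> Z -> tree23 -> Z -> tree23 -> tree23.

(* Membership test, guided by the keys at each node. *)
Fixpoint lookup (x : Z) (t : tree23) : bool :=
  match t with
  | Leaf => false
  | Node2 l k r =>
      if Z.ltb x k then lookup x l
      else if Z.ltb k x then lookup x r
      else true
  | Node3 l k1 m k2 r =>
      if Z.ltb x k1 then lookup x l
      else if Z.eqb x k1 then true
      else if Z.ltb x k2 then lookup x m
      else if Z.eqb x k2 then true
      else lookup x r
  end.
```

Insert and delete are where the balancing work happens: insertion may split an overfull node and propagate a key upward, which is what keeps every leaf at the same depth.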
Inverse Transform Sampling in Coq
A discrete distribution D over outcomes of type A can be represented in
Coq as a function that maps each value of type A to a (rational, Q, or
real) probability in the range [0,1], subject to \sum_{a:A} D a = 1.
The CDF (cumulative distribution function) of D is the function F(a)
that gives the probability that an outcome is "less than or equal to"
a, for some ordering of the possible outcomes in A. The inverse of the
CDF, a function that maps values in [0,1] to associated outcomes of
type A, can be used to sample from the distribution, assuming access to
a source of uniform random values in the range [0,1].
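As a tiny concrete instance of the idea, consider a hypothetical biased coin over bool (the names D and invF are illustrative):

```coq
Require Import QArith.
Open Scope Q_scope.

(* Hypothetical biased coin: P(true) = 3/10, P(false) = 7/10. *)
Definition D (b : bool) : Q := if b then 3#10 else 7#10.

(* With the ordering false < true, the CDF is
   F(false) = 7/10 and F(true) = 1, so the inverse CDF maps a
   uniform draw u in [0,1] to false when u <= 7/10, else true. *)
Definition invF (u : Q) : bool :=
  if Qle_bool u (7#10) then false else true.
```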
Potential tasks:
- Define a notion of discrete probability distribution over values of
  type A in Coq, as a function that maps each a : A to a real (or
  perhaps rational) number.
- Define cumulative distribution functions (CDFs).
- Define inverse CDFs.
- Axiomatize a function that produces uniform random values in the
  range [0,1], in order to define a sampling function over inverse
  CDFs.
- Prove that your sampling procedure produces values distributed
  according to the original distribution D. The difficulty of this last
  part is probably high!
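For a finite outcome type given by an enumeration, the sampling function from the tasks above might be sketched like this (the names Dist and sample are illustrative; in the project the draw u would come from the axiomatized uniform source):

```coq
Require Import QArith List.
Import ListNotations.
Open Scope Q_scope.

(* A discrete distribution over A, as a probability-mass function. *)
Definition Dist (A : Type) := A -> Q.

(* Inverse-transform sampling relative to a fixed enumeration of the
   outcomes: walk the enumeration, subtracting each outcome's mass
   from the draw u, and return the first outcome whose mass covers
   what remains of u. Returns None if u exceeds the total mass. *)
Fixpoint sample {A : Type} (D : Dist A) (enum : list A) (u : Q)
  : option A :=
  match enum with
  | [] => None
  | a :: rest =>
      if Qle_bool u (D a) then Some a
      else sample D rest (u - D a)
  end.
```

The correctness statement would then say: for u uniform on [0,1], the probability that sample D enum u returns Some a equals D a.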
Monte Carlo in Coq: Estimating Pi
As this page demonstrates, it's possible to estimate pi=3.14159...
using a Monte Carlo-style random simulation: choose points uniformly at
random within the bounding box (-1,-1), (1,1); then pi can be estimated
from the ratio of the number of points within the circle of radius R=1
centered at (0,0) [proportional to the circle's area, pi*R^2] to the
total number of points in the bounding box [proportional to the box's
area, 4R^2], giving pi ~ 4 * (points inside) / (total points).
Potential tasks:
- Define a notion of discrete probability distribution over values of
  type A in Coq, as a function that maps each a : A to a real (or
  perhaps rational) number.
- Define a notion of Monte Carlo simulation over probability
  distributions, as a function that operates over a sequence of random
  draws.
- Using your formulation of Monte Carlo, define the approximation of pi
  described above.
- Prove that your approximation is equal to the real value of pi up to
  some approximation factor. The difficulty of this last part is
  probably high!
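Once the sequence of random draws is available, the estimator itself is simple. Here is a minimal sketch over a given list of rational-coordinate points; the names inside and estimate_pi are illustrative:

```coq
Require Import ZArith QArith List.
Import ListNotations.
Open Scope Q_scope.

(* A point (x, y) lies inside the unit circle when x^2 + y^2 <= 1. *)
Definition inside (p : Q * Q) : bool :=
  let (x, y) := p in Qle_bool (x*x + y*y) 1.

(* pi ~ 4 * (#points inside the circle) / (#points total). *)
Definition estimate_pi (pts : list (Q * Q)) : Q :=
  let hits := Z.of_nat (length (filter inside pts)) in
  (4 * hits)%Z # Pos.of_nat (length pts).
```

The correctness task above would then relate estimate_pi, applied to draws from the uniform distribution on the box, to the real number pi, with an error bound that shrinks as the number of points grows.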