A5: SSA

Due: Sunday, 4/15 at 11:59pm

`exp` to Static Single Assignment RTL

Your job in this assignment: translate the Grumpy expression language exp.mli to a new intermediate representation called RTL ("Register Transfer Language"). The definition of this new RTL IL is given (and heavily commented) in ssa.mli. We call the file ssa.mli (rather than, e.g., rtl.mli) because your translation function will generate RTL programs in Static Single Assignment (SSA) form.

Download the entire set of A5 files here. You won't rely very heavily on the Grumpy language specification for this assignment but we link to it just in case.

Before doing any actual programming, read through all the instructions below. And as always, ask early on Piazza if something's unclear!

Pair Programming

On this assignment (in contrast with previous assignments), you may — if you like — pair-program with up to one other person. If you do so, write that person's name in a comment at the top of your ssa.ml. For example:

  (* NAME: Your name
     OUID: Ohio University ID

     I worked with ... on this assignment. *)

Only one student in a pair needs to turn in tycheck.ml. However, the submitter should make sure they clearly indicate their partner in the README as above (so we can assign credit).

Pair programming does not mean each student does half of the assignment. Instead, it means the two of you construct tycheck.ml collaboratively, while both sitting at the same computer screen.

1. Download the assignment files

First, download the assignment files and untar the resulting tarfile into a new directory.

$ tar xvf a5.tar

In the resulting directory src you'll find the following file structure:

  src/               -- compiler source files
    Makefile         -- the project Makefile
    _tags            -- the tags file for ocamlbuild
    AST.mli          -- language-independent abstract syntax stuff
    AST.ml           -- associated helper functions
    exp.mli          -- the definition of Grumpy's abstract syntax
    exp.ml           -- associated functions
    lexer.mll        -- ocamllex source file (stub)
    parser.mly       -- Menhir source file (stub)
    tycheck.mli      -- The type-checker interface
    tycheck.ml       -- The type-checker (stub)
    ssa.mli          -- Defines the RTL intermediate language
    ssa.ml           -- (Part II)
    grumpy.ml        -- the toplevel compiler program    
    tests/           -- test cases

To build the project, type

  $ make

As in a3 and a4, you'll see a bunch of warnings at this point. That's OK. The lexer, parser, and type-checker files are the same stubs that were given to you in the last assignment. Before you get started on this assignment, copy your own lexer.mll, parser.mly, and tycheck.ml in their place.

If your type-checker doesn't work quite right, you can request a working version from Alex. Just shoot him an email. We only ask that you don't share this file with others or post it on the internet.

Tests

Run the tests by doing

  $ make test

or by typing ./run.sh from within the tests directory. (Here's sample passing test output and sample failing test output.)

For this assignment, the *.expected files in the tests directory contain the values we expect each Grumpy program to return. Building the test target does the following:

Build grumpy.native
Run grumpy.native on each .gpy file in tests/. This will generate, for each X.gpy file:
- X.gpy.tychecked
- X.gpy.rtl
- X.gpy.rtl.interp

The last two files are most relevant to this assignment.

X.gpy.rtl is a textual representation of X.gpy after translation to RTL.
X.gpl.rtl.interp contains the result of interpreting the RTL program you produced for X.gpy using an interpreter for the RTL language defined at the end of ssa.ml.

You won't have to edit the RTL interpreter (ssa_interp) for this assignment; however, it may benefit you to read through it, especially as you're diagnosing errors in your compilation function (generally, these errors will result in testcase exceptions caught at "runtime" by ssa_interp).

2. ssa.ml

TL;DR Complete the definitions of fresh_ids, instrs_of_exp, and instrs_of_explist, which together map type-annotated Grumpy source programs of type (ty, ty exp) prog to RTL programs of type (ty, instr list) prog.

Your job in this part is to implement the compilation functions sketched out as stubs in src/ssa.ml. The top-level compilation function

  val ssa_of_prog : (ty, ty exp) prog -> (ty, instr list) prog

maps type-annotated Grumpy expression ASTs to programs in which function bodies and the program result are lists of RTL instructions; it's implemented for you. The main functions you need to implement are described below.

a.)

Start by opening ssa.mli. Read the commented descriptions of the types iexp and instr, which collectively define the RTL language. For example:

  (** [ty]-annotated identifiers *)
  type iid = ty tid

  (** As in the [exp] language, we define two versions of RTL
      expressions, which together correspond to a subset of the                  
      expressions of type [exp] in [exp.mli]. The first, [raw_iexp],             
      defines the RTL language's expression constructors. The second,            
      [iexp], wraps [raw_iexp]s with a type [ty]. *)
  type raw_iexp =
    | IInt of int32                 (** 32-bit integers *)
    | IFloat of float               (** Double-precision floats *)
    | IId of id                     (** Identifiers *)

  ...

You may need to reference other files, such as AST.mli, to recall the definitions of types such as tid (typed identifiers).

b.)

Once you've thoroughly read the comments describing RTL, define the provided stub functions:

  let fresh_ids (p : gensym_pkg) (n : int) : id list = ...
  let rec instrs_of_exp (p : gensym_pkg) (out : id) (e : ty exp) 
    : instr list = ...
  and instrs_of_explist (p : gensym_pkg) (out : id) (el : (ty exp) list) 
    : instr list = ...

Informal specifications for each of these functions are given in comments in the file. In short:

fresh_ids generates, given a gensym_pkg p (for generating fresh names; see gensym.mli for details), a list of n fresh identifiers.
Implement this function in terms of the function fresh_id defined right above it in the file. This should be reasonably straightforward.
instrs_of_exp (p: gensym_pkg) (out: id) (e: ty exp): instr list is the main function in this assignment. It takes as parameters:
- A gensym_pkg p
- An identifier, out, intended to hold the result of expression e
- The expression e : ty exp to be compiled.
The result of instrs_of_exp is a list of RTL instructions corresponding to expression e. It may not be immediately obvious how to compile some expressions to instruction lists. If you get stuck, consider taking a break to think about it away from your computer.
instrs_of_explist extends instrs_of_exp to lists of expressions (as might appear in ESeq). This function should be implemented mutually recursively with instrs_of_exp (instrs_of_exp should call instrs_of_explist and vice versa).

A few of the cases of instrs_of_exp are given for you. For example:

  | EInt n -> [IAssign(out, mk_iexp (IInt n) e.ety_of)]
  | EFloat f -> [IAssign(out, mk_iexp (IFloat f) e.ety_of)]
  | EId x -> [IAssign(out, mk_iexp (IId x) e.ety_of)]

define the integer, float, and id cases. In each, you'll note that we convert an expression into an assignment instruction (not every expression is compiled to just an assignment of course), enforcing the following invariant: In each call to instrs_of_exp, the ultimate result of expression e is always stored in variable out.

To maintain this programming discipline in other cases, you'll need to introduce fresh variable names, using fresh_id or fresh_ids, binding these vars to intermediate results.

Other Requirements

Your implementation of instrs_of_exp must additionally ensure that

the same identifier is never assigned twice (SSA).
To enforce this property, you may find yourself generating fresh variables at a number of points. Think hard, in particular, about ELet. (The new function subst_var described in exp.mli may come in handy while implementing this case.)
type information is properly preserved in the instructions you generate.
Pay attention to the compilation of type information, which will be used in a later stage of the compiler. In particular, make sure that you're correctly compiling types through from decorated expressions to generated instructions (the types themselves are not compiled, but should be properly labeled in each iid and iexp that appears in your generated code). Read ssa.mli for pointers on how the types correspond across the two languages.

Hints

As in many languages, the Grumpy operators && and || have short-circuit behavior (as explained in the Grumpy spec). Think about how to compile these operators (perhaps by desugaring to other existing expressions) in order to get the right semantics.
The RTL interpreter expects that phi nodes
```
      IPhi(phi_lbl, res, (x,l1), (y,l2)
```
are entered via either label l1 or label l2, with no intervening labels. You may need to think a bit about this when implementing case EIf, the only case of instrs_of_exp in which phi nodes need appear...

3. Describe Your Implementation

1) In a detailed paragraph, describe any difficulties you encountered while completing this assignment, along with how you resolved them (or not :-)).

2) Give an English explanation of the purpose of each line of your implementations of the EIf and EWhile cases of instrs_of_exp.

Do both 1) and 2) in a comment at the top of your ssa.ml.

For 2), you can use a format like:
EIf:
line 1: ... English explanation of line 1 of EIf implementation
line 2: ... English explanation of line 2 of EIf implementation
...
EWhile:
line 1: ... English explanation of line 1 of EWhile implementation
line 2: ... English explanation of line 2 of EWhile implementation
...

4. Submit

Submit your ssa.ml on or before the due date, via Blackboard.

5. Piazza

Finally: if any of these instructions are unclear, ask for clarification early and often on Piazza!