```
def fact(x : int) : int {
  if x == 0 then 1
  else if x == 1 then 1
  else x * fact(x - 1)
}
fact(4)
```
The code above defines a Grumpy function `def fact(x : int) : int { ... }` that computes the factorial of x. The final line of the program calls fact on the integer 4, with result 24. You'll notice that function-call syntax in Grumpy follows C style rather than OCaml style.
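For comparison, here's how the same function might look in OCaml (an illustrative sketch only, not part of the assignment code); note the OCaml-style call `fact 4` versus Grumpy's C-style `fact(4)`:

```ocaml
(* OCaml analogue of the Grumpy factorial above (illustration only) *)
let rec fact (x : int) : int =
  if x = 0 then 1
  else if x = 1 then 1
  else x * fact (x - 1)

let () = print_int (fact 4)  (* prints 24 *)
```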
In general, every Grumpy program consists of a number of function definitions followed by an expression (which may call one or more of the defined functions).
```
def f(...) : ... { ... }
def g(...) : ... { ... }
...
def z(...) : ... { ... }
... some result expression here ...
```
Here's a second example that illustrates a couple of additional Grumpy features: mutable references and variable scope:
```
def f(x:int, y:bool) : int {  // -+ x and y go into scope
  let z = ref x in            //  | z goes into scope
  {                           //  |
    let w = !z in             // ---+ w goes into scope
    z := w + 1                //  |  |
  };                          // ---+ w goes out of scope
  !z + 1                      //  |
}                             // -+ x, y, and z go out of scope
f(3, false)
```
The code above defines a function f that takes an int x and a bool y as arguments and returns an integer (type int). The first line of the body (let z = ref x in) defines a let-bound mutable reference z initialized to x.
The function's second line introduces a new block scope with braces { ... }. What's the effect of this scope? Any let-bound variables we declare inside it won't be accessible outside the { ... } (for example, it would be illegal to refer to w in the expression !z + 1, by rewriting it to something like !z + w).
The overall result of the program is 5: reference z is initialized to 3; variable w equals 3, so the update z := w + 1 sets z to 3 + 1 = 4; finally, the result of the function is the last sequenced expression in its body, !z + 1 = 4 + 1 = 5.
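Since Grumpy's references mirror OCaml's, the program above can be transliterated almost directly into OCaml (a sketch for building intuition, not part of the assignment code):

```ocaml
(* OCaml transliteration of the Grumpy scope/reference example *)
let f (x : int) (_y : bool) : int =
  let z = ref x in
  (let w = !z in
   z := w + 1);   (* w goes out of scope after this parenthesized block *)
  !z + 1

let () = print_int (f 3 false)  (* prints 5 *)
```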
```
$ tar xzvf a3.tgz
```
In the resulting directory src you'll find the following file structure:
```
src/            -- compiler source files
  Makefile      -- the project Makefile
  _tags         -- the tags file for ocamlbuild
  AST.mli       -- language-independent abstract syntax stuff
  AST.ml        -- associated helper functions
  exp.mli       -- the definition of Grumpy's abstract syntax
  exp.ml        -- associated functions
  lexer.mll     -- ocamllex source file (Part 2)
  parser.mly    -- Menhir source file (Part 3)
  grumpy.ml     -- the toplevel compiler program
  tests/        -- test cases
```
To build the project, type
```
$ make
```
At this point, you may see a bunch of warnings of the form
```
...
File "parser.mly", line 13, characters 15-20:
Warning: the token WHILE is unused.
Finished, 22 targets (0 cached) in 00:00:00.
```
That's OK -- it's just Menhir telling you that the token WHILE (defined in parser.mly), and likewise all the other token kinds, are unused.
Now try running
```
$ make test
```
The tests won't pass yet, of course (you haven't yet completed the assignment), so at this point you'll see a bunch of error messages of the form:
```
$ make test
ocamlbuild -use-menhir -use-ocamlfind grumpy.native
Finished, 22 targets (22 cached) in 00:00:00.
cd tests && ./run.sh
test01-unary-negation.gpy:1:2: Unexpected char: -
*** test01-unary-negation.gpy FAILED ***
test02-boolean-negation.gpy:1:2: Unexpected char: n
*** test02-boolean-negation.gpy FAILED ***
... followed by many more ...
```
To run the tests manually, you can do ./run.sh from within the tests directory. Within that directory, you'll also find a bunch of sample Grumpy programs, for example:
```
...
test50-fractal.gpy
test50-fractal.gpy.expected
test51-loopref.gpy
test51-loopref.gpy.expected
```
Each Grumpy source program (extension .gpy) is paired with a second file (extension .expected) that gives that program's expected output. You won't use the expected output in this assignment (you're just lexing and parsing), but the output files may be useful for understanding what each program does.
Start by opening lexer.mll, then navigate to the (mostly empty) definition of rule token. You'll see that, initially, it contains only one rule:
```
rule token = parse
  | _ { raise (Syntax_err ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
```
As it stands, no matter what initial character the input file contains, token always raises a syntax exception "Unexpected char: ...". The wildcard "_" is the catch-all pattern; the code within the braces beginning raise (Syntax_err ...) defines the action to perform in this case (raise an exception).
In general, each rule in the definition of token pairs a regular expression (on the left) with a chunk of OCaml code (in braces on the right). For example, the following few rules
```
rule token = parse
    "//" { comment lexbuf }
  | ['0'-'9']+ as lxm { INTCONST(Int32.of_string lxm) }
  | ...

and comment = parse
  | ... { ... do something ... }
  | ...
```
These rules lex comments (the definition of the comment rule is elided above -- you'll have to implement it) and 32-bit integer constants. The comment rule is defined mutually recursively with token -- you're free to add as many additional mutually recursive rules as you like. Note that when comment is called, we pass it the special argument lexbuf, which gives the current state of the lexer buffer.
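To give a feel for the shape such a rule might take, here's one possible sketch, assuming // comments extend to the end of the line (the details, including end-of-file handling, are up to you):

```
and comment = parse
  | '\n' { token lexbuf }    (* end of line ends the comment; resume lexing *)
  | _    { comment lexbuf }  (* skip any other character *)
```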
In the regexp-style pattern
```
['0'-'9']+ as lxm
```
lxm is bound to the string that matches the regexp ['0'-'9']+ at lex time (that is, a nonempty string of characters 0, 1, ..., 9), and can be used within the braces on the right-hand side of the rule. For example, INTCONST(Int32.of_string lxm) returns an INTCONST token (standing for "integer constant") containing the integer interpretation of lxm (Int32.of_string lxm converts lxm to the corresponding 32-bit integer, e.g., Int32.of_string "45" = 45l).
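You can try the Int32 conversion in isolation, in plain OCaml, independent of the lexer:

```ocaml
(* Converting a lexed digit string to a 32-bit integer *)
let () =
  let lxm = "45" in
  let i = Int32.of_string lxm in    (* i : int32 = 45l *)
  print_string (Int32.to_string i)  (* prints 45 *)
```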
The definition of the tokens themselves is given in parser.mly. Here are the first few:
```
%token <int32> INTCONST
%token <float> FLOATCONST
%token <bool> BOOLCONST
%token <string> ID
%token DEF LET WHILE IF THEN ELSE REF INT FLOAT BOOL UNIT TT IN
```
The first four declarations define token types that carry values of OCaml types. For example, %token <string> ID defines a new token type ID that contains OCaml strings. The last line defines a bunch of token types that carry no OCaml data.
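Behind the scenes, Menhir turns these declarations into an OCaml variant type roughly like the following (a simplified sketch, with the describe helper added purely for illustration; the exact generated type is Menhir's business):

```ocaml
(* Simplified sketch of the token type Menhir generates *)
type token =
  | INTCONST of int32         (* carries an OCaml int32 *)
  | FLOATCONST of float
  | BOOLCONST of bool
  | ID of string              (* carries an OCaml string *)
  | DEF | LET | WHILE | IF    (* no attached data *)

(* Hypothetical helper showing how attached data can be inspected *)
let describe (t : token) : string =
  match t with
  | INTCONST n -> "INTCONST(" ^ Int32.to_string n ^ ")"
  | ID s -> "ID(" ^ s ^ ")"
  | _ -> "keyword token"
```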
```
...
and nested_comment level = parse
  | ...
```
...that takes a "nesting level" as its first argument....
```
| ['0'-'9']+ as lxm { INTCONST(Int32.of_string lxm) }
```
but as:
```
| ['0'-'9']+ as lxm
    { print_string "INT(";
      let i = Int32.of_string lxm in
      print_int (Int32.to_int i);
      print_string ")";
      INTCONST(i) }
```
and likewise for the other rules you add (the debug print statements for tokens that don't carry OCaml values will be less complicated).
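If the inline printing gets noisy, one option (purely a suggestion, not required by the assignment) is to factor it into a small polymorphic helper and call that from each rule's action:

```ocaml
(* Print a label, then return the token unchanged *)
let debug (label : string) (tok : 'a) : 'a =
  print_string label;
  tok
```

A data-free rule's action then becomes, e.g., { debug "IF " IF }.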
```
(** Is type [t] an arithmetic type? *)
val is_arith_ty : ty -> bool
```
This declaration (and all other such declarations) must be accompanied by a corresponding function definition in AST.ml.
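As an illustration of the expected shape, here's a sketch (the ty constructor names below are hypothetical; pattern-match on the actual constructors declared in AST.mli):

```ocaml
(* Hypothetical ty definition; see AST.mli for the real one *)
type ty = TInt | TFloat | TBool | TUnit

(* Is type [t] an arithmetic type? *)
let is_arith_ty (t : ty) : bool =
  match t with
  | TInt | TFloat -> true
  | TBool | TUnit -> false
```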
In the assignment code, interface files AST.mli and exp.mli define the abstract syntax of the Grumpy source language. The Grumpy lexer and parser convert concrete Grumpy programs into values of type (AST.ty, unit Exp.exp) AST.prog.
In general, your job in this file is to add a number of new nonterminal rules to the Menhir grammar, corresponding to the Grumpy syntax given in the Grumpy spec.
Each such rule will look something like the following:
```
unop:
  | MINUS { UMinus }
  | NOT   { UNot }
  | DEREF { UDeref }
```
This defines a new nonterminal called unop (unary operation) with 3 productions, one for each unary operation in the language. The rule
```
| MINUS { UMinus }
```
says that the token MINUS is an acceptable unary operator; when a MINUS is parsed, the rule returns, as defined by the code within the braces { ... }, the abstract syntax UMinus. The whole abstract syntax of unary and binary operations (and of identifiers, function definitions, and whole programs) is given in file AST.mli. That file is quite heavily documented; see it for additional details.
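On the OCaml side, the unop abstract syntax is just a variant type, so a semantic action like { UMinus } simply returns a constructor. The sketch below illustrates the idea; the pretty-printer and the concrete spellings it assumes are hypothetical (check the Grumpy spec and AST.mli for the real ones):

```ocaml
(* The unop variant, per the productions above: UMinus, UNot, UDeref *)
type unop = UMinus | UNot | UDeref

(* Hypothetical helper mapping each unop back to assumed concrete syntax *)
let string_of_unop (u : unop) : string =
  match u with
  | UMinus -> "-"
  | UNot   -> "not"   (* assumed spelling of boolean negation *)
  | UDeref -> "!"
```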
Menhir rules can be more complicated. Here's a second example:
```
exp_list:
  | l = separated_list(COMMA, exp) { l }
```
Assuming we've defined a nonterminal rule for expressions, called exp, this rule defines a new nonterminal exp_list that parses lists of expressions separated by COMMA tokens.
In the rule, the result of parsing this list (a list of abstract syntax expressions) is bound to variable l, which may then appear within the braces { ... }. In this case, our exp_list rule just returns l. However, in general, a rule might do more interesting things with intermediate expressions. For more information on Menhir's special separated_list function (and other useful functions), see Section 5.4 of the Menhir manual, which describes Menhir's "Standard Library" (a collection of useful, pre-defined parsing functions).
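For intuition, separated_list(COMMA, exp) behaves roughly like the following hand-written pair of nonterminals (a sketch only; in practice just use the standard-library version):

```
exp_list:
  | (* empty *)                           { [] }
  | l = exp_list_nonempty                 { l }

exp_list_nonempty:
  | e = exp                               { [e] }
  | e = exp COMMA l = exp_list_nonempty   { e :: l }
```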
Here's one more:
```
arg:
  | arg_id = id COLON arg_ty = mytype { mk_tid arg_id arg_ty }
```
This rule defines a nonterminal arg that parses function parameters of the form id COLON ty, e.g., x : int. It assumes we've already defined rules for the nonterminals id and mytype. Within the braces, we return the expression mk_tid arg_id arg_ty, which constructs the abstract syntax corresponding to a typed identifier (an identifier together with its type; see AST.mli for details).
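Conceptually, mk_tid just pairs an identifier with its type; something like the following (a hypothetical model with made-up field names; the real definition and its types are in AST.mli):

```ocaml
(* Hypothetical model of typed identifiers; see AST.mli for the real one *)
type id = string
type 'ty tid = { tid_id : id; tid_ty : 'ty }

let mk_tid (x : id) (t : 'ty) : 'ty tid =
  { tid_id = x; tid_ty = t }
```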
To encode precedence and associativity in Menhir, add precedence and associativity directives to the top of your parser.mly file (before the %%) as documented in Section 4 of the Menhir manual.
For example, the following pair of directives encodes that TIMES and DIV bind tighter than PLUS and MINUS:
```
%left PLUS MINUS
%left TIMES DIV
```
Precedence directives lower in the file (at higher line numbers) have higher priority (bind tighter) than those appearing earlier. The %left indicates left associativity.
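The effect on parse trees can be illustrated with a toy expression type: with TIMES and DIV declared below (tighter than) PLUS and MINUS, an input like 1 + 2 * 3 parses as an addition whose right child is the multiplication. The type and evaluator below are illustrative only, not part of the assignment:

```ocaml
(* Toy expression type for illustrating precedence *)
type exp =
  | EInt of int
  | EPlus of exp * exp
  | ETimes of exp * exp

(* With %left TIMES DIV below %left PLUS MINUS, "1 + 2 * 3" parses as: *)
let parsed = EPlus (EInt 1, ETimes (EInt 2, EInt 3))

let rec eval (e : exp) : int =
  match e with
  | EInt n -> n
  | EPlus (a, b) -> eval a + eval b
  | ETimes (a, b) -> eval a * eval b

let () = assert (eval parsed = 7)  (* not (1 + 2) * 3 = 9 *)
```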
```
$ make test
```
We won't grade additional test cases, but you're very welcome to add some if you like. If you come up with what you think are particularly nasty tests (e.g., exploiting corner cases), please email them to me (gstewart) or to Sam.