CS 331 Spring 2025
Assignment 4 (Writing a Parser)

Assignment 4 is due at 5 pm on Thursday, February 27. It is worth 90 points.

Procedures

This assignment is to be done individually.

Turn in answers to the exercises below on the UA Canvas site, under Assignment 4 for this class.

Exercises (A only, 90 pts)

Exercise A — Predictive Recursive-Descent Parser in Lua

Purpose

In this exercise you will write a Lua module that does parsing for a simple programming language called Fulmar. Your parser will determine syntactic correctness; when the input is correct, the parser will output an abstract syntax tree.

The parser will use the Predictive Recursive-Descent method. It will be built on top of the lexer from the previous assignment.

In a later assignment, you will build an interpreter that takes an AST in the form returned by your parser and executes the code.

Instructions

Write a Lua module parseit, contained in the file parseit.lua. Your module must parse Fulmar programs using Predictive Recursive Descent.

Be sure to follow the Coding Standards.

Fulmar Programming Language: Syntax & AST

The syntax of Fulmar is specified here, along with the format of an abstract syntax tree. The semantics of Fulmar will be covered in a later assignment.

Introduction—Fulmar is a very small programming language. Here is an example Fulmar program.

[Fulmar]

# Fulmar Example #1
# Glenn G. Chappell
# 2025-02-18
nn = 3
println(nn+4)

If function parseit.parse is called, passing the source of the above program as the argument, then the return value must be the following triple.

true, true, {1, {5, {17, "nn"}, {14, "3"}}, {3, {{12, "+"}, {17, "nn"}, {14, "4"}}}}
Above, the first true indicates that the parser found a syntactically correct program. The second true indicates that the parser reached the end of the input. The table is the abstract syntax tree (AST) of the program. The AST may make more sense if we replace the numbers by symbolic constants (defined later), giving the following.

{PROGRAM,
  {ASSN_STMT, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "3"}},
  {PRINTLN_STMT, {{BIN_OP, "+"}, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "4"}}}}
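When debugging, it can help to print an AST with symbolic names in place of the numbers, as above. The following is a minimal sketch; the names NAMES and astToString are illustrative and not part of the assignment. It assumes, as the AST specification below states, that every number in an AST is a node-type constant and every lexeme is stored as a string.

```lua
-- Symbolic names, indexed by the numeric constants of the AST specification.
local NAMES = {
  "PROGRAM", "PRINT_STMT", "PRINTLN_STMT", "RETURN_STMT", "ASSN_STMT",
  "FUNC_CALL", "FUNC_DEF", "IF_STMT", "WHILE_LOOP", "STRLIT_OUT",
  "CHR_CALL", "BIN_OP", "UN_OP", "NUMLIT_VAL", "READNUM_CALL",
  "RND_CALL", "SIMPLE_VAR", "ARRAY_VAR",
}

-- Convert an AST to a string, writing node-type numbers as symbolic names
-- and lexeme strings in double quotes.
-- Assumption: every number in the tree is a node type.
local function astToString(ast)
  if type(ast) == "number" then
    return NAMES[ast] or tostring(ast)
  elseif type(ast) == "string" then
    return string.format("%q", ast)
  end
  local parts = {}
  for i = 1, #ast do
    parts[i] = astToString(ast[i])
  end
  return "{" .. table.concat(parts, ", ") .. "}"
end

-- The AST from Example #1.
local ast = {1, {5, {17, "nn"}, {14, "3"}},
                {3, {{12, "+"}, {17, "nn"}, {14, "4"}}}}
print(astToString(ast))
```

The printed form matches the symbolic AST shown above, which makes it easy to compare against the examples in this handout.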

Here is a longer Fulmar program—previously seen on the lecture slides.

[Fulmar]

# fibo.fmar
# Glenn G. Chappell
# 2025-01-10
#
# For CS 331 Spring 2025
# Compute Fibonacci Numbers


# The Fibonacci number F(n), for n >= 0, is defined by F(0) = 0,
# F(1) = 1, and for n >= 2, F(n) = F(n-2) + F(n-1).


# fibo (param in variable n)
# Return Fibonacci number F(n).
func fibo()
    currfib = 0
    nextfib = 1
    i = 0  # Loop counter
    while i < n
        # Advance (currfib, nextfib)
        tmp = currfib + nextfib
        currfib = nextfib
        nextfib = tmp
        i = i+1
    end
    return currfib
end


# Main program
# Print some Fibonacci numbers
how_many_to_print = 20

println("Fibonacci Numbers")

j = 0  # Loop counter
while j < how_many_to_print
    n = j  # Set param for fibo
    ff = fibo()
    println("F(", j, ") = ", ff)
    j = j+1
end

Lexical Structure—Lexemes are as on Assignment 3.

Grammar—A grammar for Fulmar follows. Lower-case words are nonterminals. Upper-case WORDS are terminals representing lexeme categories from module lexit. A single-quoted ‘word’ is a terminal that must be exactly that string. There is no end-of-input lexeme.

The start symbol is program. Lines are numbered for later reference.

1.   program   →   { statement }
2.   statement   →   ( ‘print’ | ‘println’ ) ‘(’ [ print_arg { ‘,’ print_arg } ] ‘)’
3.     |   ‘return’ expr
4.     |   ID ( ‘(’ ‘)’ | [ ‘[’ expr ‘]’ ] ‘=’ expr )
5.     |   ‘func’ ID ‘(’ ‘)’ program ‘end’
6.     |   ‘if’ expr program { ‘elif’ expr program } [ ‘else’ program ] ‘end’
7.     |   ‘while’ expr program ‘end’
8.   print_arg   →   STRLIT
9.     |   ‘chr’ ‘(’ expr ‘)’
10.     |   expr
11.   expr   →   compare_expr { ( ‘&&’ | ‘||’ ) compare_expr }
12.   compare_expr   →   arith_expr { ( ‘==’ | ‘!=’ | ‘<’ | ‘<=’ | ‘>’ | ‘>=’ ) arith_expr }
13.   arith_expr   →   term { ( ‘+’ | ‘-’ ) term }
14.   term   →   factor { ( ‘*’ | ‘/’ | ‘%’ ) factor }
15.   factor   →   NUMLIT
16.     |   ‘(’ expr ‘)’
17.     |   ( ‘+’ | ‘-’ | ‘!’ ) factor
18.     |   ‘readnum’ ‘(’ ‘)’
19.     |   ‘rnd’ ‘(’ expr ‘)’
20.     |   ID [ ‘(’ ‘)’ | ‘[’ expr ‘]’ ]

The above grammar is already in a form that is usable by a Predictive Recursive-Descent parser; you will not need to transform it.
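As a rough illustration of how the grammar drives a Predictive Recursive-Descent parser, here is a sketch covering grammar lines 13, 14, 15, 16, 17, and 20. It is not the required parseit module and makes several simplifying assumptions: it reads from a hypothetical pre-lexed array of {string, category} pairs instead of calling lexit, it treats expr as arith_expr, and it omits readnum and rnd. Every choice between alternatives is made by looking only at the current lexeme, and each { ... } repetition becomes a loop that nests the AST to the left.

```lua
-- Sketch only: tokens come from a hypothetical pre-lexed array of
-- {string, category} pairs, NOT from lexit.
local BIN_OP, UN_OP, NUMLIT_VAL = 12, 13, 14
local FUNC_CALL, SIMPLE_VAR, ARRAY_VAR = 6, 17, 18

local tokens, pos  -- token array and index of the current token

-- Current lexeme string and category; ("", "") past the end of input.
local function cur()
  local t = tokens[pos]
  if t then return t[1], t[2] end
  return "", ""
end

-- If the current lexeme is the string s, consume it and return true.
local function matchString(s)
  if cur() == s then pos = pos + 1; return true end
  return false
end

local parse_expr, parse_term, parse_factor

-- Parse  sub { op sub }  for op in the set ops, building a left-nested AST.
local function parseLeftAssoc(sub, ops)
  local good, ast = sub()
  if not good then return false, nil end
  while ops[cur()] do
    local op = cur()
    pos = pos + 1
    local good2, rhs = sub()
    if not good2 then return false, nil end
    ast = {{BIN_OP, op}, ast, rhs}
  end
  return true, ast
end

-- Simplification for this sketch: expr is just arith_expr (grammar line 13).
function parse_expr()
  return parseLeftAssoc(parse_term, {["+"] = true, ["-"] = true})
end

-- Grammar line 14.
function parse_term()
  return parseLeftAssoc(parse_factor, {["*"] = true, ["/"] = true, ["%"] = true})
end

-- Grammar lines 15, 16, 17, 20: the current lexeme selects the alternative.
function parse_factor()
  local str, cat = cur()
  if cat == "NUMLIT" then
    pos = pos + 1
    return true, {NUMLIT_VAL, str}
  elseif matchString("(") then
    local good, ast = parse_expr()
    if not good or not matchString(")") then return false, nil end
    return true, ast
  elseif str == "+" or str == "-" or str == "!" then
    pos = pos + 1
    local good, ast = parse_factor()
    if not good then return false, nil end
    return true, {{UN_OP, str}, ast}
  elseif cat == "ID" then
    pos = pos + 1
    if matchString("(") then
      if not matchString(")") then return false, nil end
      return true, {FUNC_CALL, str}
    elseif matchString("[") then
      local good, ast = parse_expr()
      if not good or not matchString("]") then return false, nil end
      return true, {ARRAY_VAR, str, ast}
    end
    return true, {SIMPLE_VAR, str}
  end
  return false, nil
end

-- Returns good, done, ast; done means every token was consumed.
local function parse(toks)
  tokens, pos = toks, 1
  local good, ast = parse_expr()
  return good, pos > #tokens, ast
end

-- Tokens for:  -x[2] * 4
local good, done, ast = parse({
  {"-", "OP"}, {"x", "ID"}, {"[", "OP"}, {"2", "NUMLIT"},
  {"]", "OP"}, {"*", "OP"}, {"4", "NUMLIT"},
})
-- ast: {{BIN_OP, "*"}, {{UN_OP, "-"}, {ARRAY_VAR, "x", {NUMLIT_VAL, "2"}}}, {NUMLIT_VAL, "4"}}
```

Because a unary operator is followed by a factor (line 17), the “-” above binds more tightly than the “*”; that precedence comes entirely from the grammar, not from any extra logic in the code.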

Precedence & Associativity—There are 13 binary operators that may appear in an expression: &&, ||, ==, !=, <, <=, >, >=, binary +, binary -, *, /, %. All are left-associative.

All other associativity, and all operator precedence, is completely encoded in the grammar.
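Left-associativity means that a sequence of same-level operators groups from the left, so “1 - 2 - 3” is read as “(1 - 2) - 3”. In AST terms, each new operator takes the tree built so far as its left operand. The hypothetical helper below (foldLeft is illustrative, not part of the assignment) shows that shape by combining already-built operand ASTs with their operator strings; its result for “1 && 2 || 3” matches the Expression example in the AST specification.

```lua
local BIN_OP, NUMLIT_VAL = 12, 14

-- Combine operand ASTs x1..xn and operator strings op1..op(n-1) into the
-- AST for  x1 op1 x2 ... op(n-1) xn,  grouping from the left.
local function foldLeft(operands, operators)
  assert(#operands == #operators + 1, "need one more operand than operator")
  local ast = operands[1]
  for i, op in ipairs(operators) do
    ast = {{BIN_OP, op}, ast, operands[i + 1]}
  end
  return ast
end

-- 1 && 2 || 3  groups as  (1 && 2) || 3
local ast = foldLeft(
  {{NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}, {NUMLIT_VAL, "3"}},
  {"&&", "||"})
-- ast: {{BIN_OP, "||"}, {{BIN_OP, "&&"}, {NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}}, {NUMLIT_VAL, "3"}}
```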

AST Specification—This subsection will use the following named constants—which I suggest defining in your code.

[Lua]

local PROGRAM       = 1
local PRINT_STMT    = 2
local PRINTLN_STMT  = 3
local RETURN_STMT   = 4
local ASSN_STMT     = 5
local FUNC_CALL     = 6
local FUNC_DEF      = 7
local IF_STMT       = 8
local WHILE_LOOP    = 9
local STRLIT_OUT    = 10
local CHR_CALL      = 11
local BIN_OP        = 12
local UN_OP         = 13
local NUMLIT_VAL    = 14
local READNUM_CALL  = 15
local RND_CALL      = 16
local SIMPLE_VAR    = 17
local ARRAY_VAR     = 18

Every AST is a Lua array. We will specify the format of an AST for each of the lines in the above grammar. Lines are referred to by number.
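Because ASTs are nested Lua arrays, the == operator cannot be used to compare them: two separately built tables are never ==, even when their contents match. When checking parser output by hand, a structural comparison such as the following sketch may help. astEqual is a hypothetical helper, and it assumes ASTs contain only arrays, numbers, and strings.

```lua
-- Return true if a and b are structurally equal ASTs: same length and
-- equal items at every index. Numbers and strings are compared with ==.
local function astEqual(a, b)
  if type(a) ~= "table" or type(b) ~= "table" then
    return a == b
  end
  if #a ~= #b then return false end
  for i = 1, #a do
    if not astEqual(a[i], b[i]) then return false end
  end
  return true
end

local x = {1, {5, {17, "nn"}, {14, "3"}}}
local y = {1, {5, {17, "nn"}, {14, "3"}}}
print(x == y)          -- false: two different tables
print(astEqual(x, y))  -- true: same structure and contents
```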

  1. Program. The AST for the program is an array whose first item is PROGRAM. The following items are the ASTs for the various statements, if any, in order.

    For example, the AST for the program “tt=3 print(tt)” is {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "tt"}, {NUMLIT_VAL, "3"}}, {PRINT_STMT, {SIMPLE_VAR, "tt"}}}.

    If there are no statements, then the AST is {PROGRAM}.

  2. Statement: Print/Println. The AST for the statement is an array whose first item is either PRINT_STMT or PRINTLN_STMT, depending on the opening keyword. The following items are the ASTs for the print_args, if any, in order.

    For example, the AST for the statement “print('abc', 123)” is {PRINT_STMT, {STRLIT_OUT, "'abc'"}, {NUMLIT_VAL, "123"}}.

    As another example, the AST for the statement “println()” is {PRINTLN_STMT}.

  3. Statement: Return. The AST for the statement is {RETURN_STMT, EEE}, where EEE is the AST for the expr.

    For example, the AST for the statement “return x” is {RETURN_STMT, {SIMPLE_VAR, "x"}}.

  4. Statement: Beginning with Identifier.

    • If the ID is followed by parentheses, then the AST for the statement is {FUNC_CALL, II}, where II is the string form of the ID lexeme.

      For example, the AST for the statement “foo()” is {FUNC_CALL, "foo"}.

      (Note that a function call may also occur as part of an expression. The AST of such a function call is identical. See Factor: Beginning with Identifier.)

    • If the ID is followed by “=” without the brackets, then the AST for the statement is {ASSN_STMT, {SIMPLE_VAR, II}, EEE}, where II is the string form of the ID lexeme, and EEE is the AST for the expr.

      For example, the AST for the statement “nn = 3” is {ASSN_STMT, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "3"}}.

    • If the ID is followed by “[”, then the AST for the statement is {ASSN_STMT, {ARRAY_VAR, II, EEE}, FFF}, where II is the string form of the ID lexeme, EEE is the AST for the expr between the brackets, and FFF is the AST for the expr after the “=”.

      For example, the AST for the statement “aa[22] = 3” is {ASSN_STMT, {ARRAY_VAR, "aa", {NUMLIT_VAL, "22"}}, {NUMLIT_VAL, "3"}}.

      As another example, the AST for the statement “xx[n+1] = yy” is {ASSN_STMT, {ARRAY_VAR, "xx", {{BIN_OP, "+"}, {SIMPLE_VAR, "n"}, {NUMLIT_VAL, "1"}}}, {SIMPLE_VAR, "yy"}}.

  5. Statement: Function Definition. The AST for the statement is {FUNC_DEF, II, PPP}, where II is the string form of the ID lexeme, and PPP is the AST for the program.

    For example, the AST for the statement “func foo() print('xy') end” is {FUNC_DEF, "foo", {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'xy'"}}}}.

    The AST for the statement “func nuthin() end” is {FUNC_DEF, "nuthin", {PROGRAM}}.

  6. Statement: If. The AST for the statement is an array whose first item is IF_STMT. The following items are the ASTs for the various exprs and programs, in order.

    For example, the AST for the statement “if aa bb = readnum() else print('x') end” is {IF_STMT, {SIMPLE_VAR, "aa"}, {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "bb"}, {READNUM_CALL}}}, {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'x'"}}}}.

  7. Statement: While. The AST for the statement is {WHILE_LOOP, EEE, PPP}, where EEE is the AST for the expr, and PPP is the AST for the program.

    For example, the AST for the statement “while 1 print('Hello') end” is {WHILE_LOOP, {NUMLIT_VAL, "1"}, {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'Hello'"}}}}.

  8. Print-Argument: String Literal. The AST for the print_arg is {STRLIT_OUT, SS}, where SS is the string form of the STRLIT lexeme, including its opening and closing quotes.

    For example, the AST for the print_arg “'abc'” is {STRLIT_OUT, "'abc'"}.

  9. Print-Argument: Character Code. The AST for the print_arg is {CHR_CALL, EEE}, where EEE is the AST for the expr between the parentheses.

    For example, the AST for the print_arg “chr(65)” is {CHR_CALL, {NUMLIT_VAL, "65"}}.

  10. Print-Argument: Expression. The AST for the print_arg is the AST for the expr.

  11. Expression. This is handled much as in rdparser3. If there is a single compare_expr, then the AST for the expr is the AST for the compare_expr. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last compare_expr.

    For example, the AST for the expr “1 && 2” is {{BIN_OP, "&&"}, {NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}}.

    And the AST for the expr “1 && 2 || 3” is {{BIN_OP, "||"}, {{BIN_OP, "&&"}, {NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}}, {NUMLIT_VAL, "3"}}.

  12. Comparison-Expression. Again, this is handled much as in rdparser3. If there is a single arith_expr, then the AST for the compare_expr is the AST for the arith_expr. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last arith_expr.

    (See Expression for an example of an AST involving multiple operators.)

  13. Arithmetic-Expression. Once again, this is handled much as in rdparser3. If there is a single term, then the AST for the arith_expr is the AST for the term. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last term.

    (See Expression for an example of an AST involving multiple operators.)

  14. Term. Yet again, this is handled much as in rdparser3. If there is a single factor, then the AST for the term is the AST for the factor. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last factor.

    (See Expression for an example of an AST involving multiple operators.)

  15. Factor: Numeric Literal. The AST for the factor is {NUMLIT_VAL, NN}, where NN is the string form of the NUMLIT lexeme.

    For example, the AST for the factor “42” is {NUMLIT_VAL, "42"}.

  16. Factor: Parenthesized Expression. The AST for the factor is the AST for the expr between the parentheses.

  17. Factor: Unary Operator. The AST is {{UN_OP, OO}, FFF}, where OO is the string form of the operator, and FFF is the AST for the factor on the right-hand side.

    For example, the AST for the factor “-xx” is {{UN_OP, "-"}, {SIMPLE_VAR, "xx"}}.

    As another example, the AST for the factor “!42” is {{UN_OP, "!"}, {NUMLIT_VAL, "42"}}.

  18. Factor: Read Number. The AST for the factor is {READNUM_CALL}.

  19. Factor: Random Number. The AST for the factor is {RND_CALL, EEE}, where EEE is the AST for the expr between the parentheses.

    For example, the AST for the factor “rnd(42)” is {RND_CALL, {NUMLIT_VAL, "42"}}.

  20. Factor: Beginning with Identifier.

    • If the ID is followed by parentheses, then the AST for the factor is {FUNC_CALL, II}, where II is the string form of the ID lexeme.

      For example, the AST for the factor “foo()” is {FUNC_CALL, "foo"}.

      (Note that a function call may also occur as a separate statement. The AST of such a statement is identical. See Statement: Beginning with Identifier.)

    • If the ID is followed by neither parentheses nor a left bracket, then the AST for the factor is {SIMPLE_VAR, II}, where II is the string form of the ID lexeme.

      For example, the AST for the factor “nn” is {SIMPLE_VAR, "nn"}.

    • If the ID is followed by “[”, then the AST for the factor is {ARRAY_VAR, II, EEE}, where II is the string form of the ID lexeme, and EEE is the AST for the expr between the brackets.

      For example, the AST for the factor “aa[22]” is {ARRAY_VAR, "aa", {NUMLIT_VAL, "22"}}.

      As another example, the AST for the factor “xx[n+1]” is {ARRAY_VAR, "xx", {{BIN_OP, "+"}, {SIMPLE_VAR, "n"}, {NUMLIT_VAL, "1"}}}.
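The If AST (item 6) is a flat array: after IF_STMT come the expr and program ASTs for the if branch and for each elif branch, in order, followed by the else program if there is one. The sketch below assembles that layout from already-built ASTs and reproduces the item 6 example; makeIfStmt is a hypothetical helper, not something the assignment requires.

```lua
local PROGRAM, PRINT_STMT, ASSN_STMT = 1, 2, 5
local IF_STMT, STRLIT_OUT, READNUM_CALL, SIMPLE_VAR = 8, 10, 15, 17

-- Build an IF_STMT AST from a list of {condition, body} pairs (the 'if'
-- branch followed by any 'elif' branches) and an optional else body.
local function makeIfStmt(branches, elseBody)
  local ast = {IF_STMT}
  for _, branch in ipairs(branches) do
    ast[#ast + 1] = branch[1]  -- AST for the expr
    ast[#ast + 1] = branch[2]  -- AST for the program
  end
  if elseBody ~= nil then
    ast[#ast + 1] = elseBody   -- AST for the else program
  end
  return ast
end

-- if aa bb = readnum() else print('x') end
local branches = {
  { {SIMPLE_VAR, "aa"},
    {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "bb"}, {READNUM_CALL}}} },
}
local ast = makeIfStmt(branches, {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'x'"}}})
```

Since IF_STMT is followed by expr/program pairs plus at most one else program, such an AST has even length exactly when an else branch is present, which is one way later code could tell the cases apart.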

Test Program

A test program is available in the Git repository: parseit_test.lua. If you compile and run this program (unmodified!) with your code, then it will test whether your code works properly.

Do not turn in the test program.

Notes