CS 331 Spring 2025
Assignment 4 (Writing a Parser)
Assignment 4 is due at 5 pm on Thursday, February 27. It is worth 90 points.
Procedures
This assignment is to be done individually.
Turn in answers to the exercises below on the UA Canvas site, under Assignment 4 for this class.
- Your answers must consist of the source code for Exercise A (file parseit.lua), along with file lexit.lua from Assignment 3.
- Turn in lexit.lua even if you have turned it in before.
- I may not look at your homework submission immediately. If you have questions, e-mail me.
Exercises (A only, 90 pts)
Exercise A — Predictive Recursive-Descent Parser in Lua
Purpose
In this exercise you will write a Lua module that does parsing for a simple programming language called Fulmar. Your parser will determine syntactic correctness; when the input is correct, the parser will output an abstract syntax tree.
The parser will use the Predictive Recursive-Descent method. It will be built on top of the lexer from the previous assignment.
In a later assignment, you will build an interpreter that takes an AST in the form returned by your parser and executes the code.
Instructions
Write a Lua module parseit, contained in the file parseit.lua. Your module must parse Fulmar programs using Predictive Recursive Descent.
Be sure to follow the Coding Standards.
- The interface of module parseit is very similar to that of rdparser3, which was written in class. In particular, the interface of module parseit consists of a single function parseit.parse.
  - Function parse takes a string, holding the source code of a proposed Fulmar program.
  - Function parse returns three values: good, done, ast.
    - good is a boolean. It is true if parsing gave a result of syntactically correct (even if the end of the input was not reached), and false otherwise.
    - done is a boolean. It is true if the end of the input was reached, and false otherwise.
    - If good and done are both true, then ast is the abstract syntax tree of the Fulmar program; the format of this is discussed later. If either of good or done is false, then ast can be anything (including nil).
  - Function parse must parse based on the grammar of the Fulmar programming language, and it must return an AST formatted as in the Fulmar specification. See Fulmar Programming Language: Syntax & AST, below.
  - Function parse must parse using the Predictive Recursive-Descent method.
- Module parseit must export nothing other than function parse.
- Use module lexit, in file lexit.lua, as your lexer. This file must still meet the requirements of Assignment 3. You may fix problems with lexit.lua, but you may not add any additional functionality to it.
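As a rough illustration of how a caller might read the first two return values, here is a minimal sketch. The helper name describeResult and its message strings are hypothetical and are not part of the required interface; only the meanings of good and done come from the description above.

```lua
-- describeResult: hypothetical helper, NOT part of module parseit.
-- Turns the first two return values of parseit.parse into a message.
local function describeResult(good, done)
    if good and done then
        return "correct program; ast is its AST"
    elseif good and not done then
        return "correct so far, but input remains; ast is not meaningful"
    elseif done then
        return "syntax error; end of input was reached"
    else
        return "syntax error; end of input was not reached"
    end
end

-- Typical call site (commented out, since it needs your parseit.lua):
-- local parseit = require "parseit"
-- local good, done, ast = parseit.parse("nn = 3 println(nn+4)")
-- print(describeResult(good, done))

print(describeResult(true, true))
print(describeResult(true, false))
```

The point of the sketch is that ast should be used only in the first case, when good and done are both true.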
Fulmar Programming Language: Syntax & AST
The syntax of Fulmar is specified here, along with the format of an abstract syntax tree. The semantics of Fulmar will be covered in a later assignment.
Introduction—Fulmar is a very small programming language. Here is an example Fulmar program.
[Fulmar]
# Fulmar Example #1
# Glenn G. Chappell
# 2025-02-18
nn = 3
println(nn+4)
If function parseit.parse is called, passing the source of the above program as the argument, then the return value must be the following triple.
true, true, {1, {5, {17, "nn"}, {14, "3"}}, {3, {{12, "+"}, {17, "nn"}, {14, "4"}}}}
Above, the first true indicates that the parsing function returned syntactically correct. The second true indicates that the parser reached the end of the input. The table is the abstract syntax tree of the program.
The AST may make more sense if we replace the numbers
by symbolic constants (defined later),
giving the following.
{PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "3"}}, {PRINTLN_STMT, {{BIN_OP, "+"}, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "4"}}}}
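When debugging, it can help to print an AST with symbolic names rather than numbers. The following is a minimal sketch of such a printer. The names NAMES and astToString are hypothetical, and the table of names assumes the numeric values given under AST Specification.

```lua
-- Names for the numeric codes, indexed by value. These are assumed to
-- match the constants listed under AST Specification (PROGRAM = 1, ...).
local NAMES = {
    "PROGRAM", "PRINT_STMT", "PRINTLN_STMT", "RETURN_STMT", "ASSN_STMT",
    "FUNC_CALL", "FUNC_DEF", "IF_STMT", "WHILE_LOOP", "STRLIT_OUT",
    "CHR_CALL", "BIN_OP", "UN_OP", "NUMLIT_VAL", "READNUM_CALL",
    "RND_CALL", "SIMPLE_VAR", "ARRAY_VAR",
}

-- astToString: hypothetical helper. Returns a printable form of an AST,
-- with numbers shown as symbolic names and strings in double quotes.
local function astToString(x)
    if type(x) == "number" then
        return NAMES[x] or tostring(x)
    elseif type(x) == "string" then
        return '"' .. x .. '"'
    elseif type(x) == "table" then
        local parts = {}
        for i = 1, #x do
            parts[i] = astToString(x[i])
        end
        return "{" .. table.concat(parts, ", ") .. "}"
    else
        return tostring(x)
    end
end

-- The numeric AST from the example program above
local ast = {1, {5, {17, "nn"}, {14, "3"}},
                {3, {{12, "+"}, {17, "nn"}, {14, "4"}}}}
print(astToString(ast))
```

For the example AST, this should print the same symbolic form as shown above. The sketch relies on the fact that, in this AST format, every number is a node-type code; numeric literals from the source are stored as strings.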
Here is a longer Fulmar program—previously seen on the lecture slides.
[Fulmar]
# fibo.fmar
# Glenn G. Chappell
# 2025-01-10
#
# For CS 331 Spring 2025
# Compute Fibonacci Numbers
# The Fibonacci number F(n), for n >= 0, is defined by F(0) = 0,
# F(1) = 1, and for n >= 2, F(n) = F(n-2) + F(n-1).
# fibo (param in variable n)
# Return Fibonacci number F(n).
func fibo()
    currfib = 0
    nextfib = 1
    i = 0  # Loop counter
    while i < n
        # Advance (currfib, nextfib)
        tmp = currfib + nextfib
        currfib = nextfib
        nextfib = tmp
        i = i+1
    end
    return currfib
end
# Main program
# Print some Fibonacci numbers
how_many_to_print = 20
println("Fibonacci Numbers")
j = 0  # Loop counter
while j < how_many_to_print
    n = j  # Set param for fibo
    ff = fibo()
    println("F(", j, ") = ", ff)
    j = j+1
end
Lexical Structure—Lexemes are as on Assignment 3.
Grammar—A
grammar for Fulmar follows. Italic lower-case words are nonterminals. Roman upper-case WORDS are terminals representing lexeme categories from module lexit. A single-quoted ‘word’ in a typewriter font is a terminal that is required to be a specific string. There is no end-of-input lexeme. The start symbol is program. Lines are numbered for later reference.
 1. program      → { statement }
 2. statement    → ( ‘print’ | ‘println’ ) ‘(’ [ print_arg { ‘,’ print_arg } ] ‘)’
 3.              | ‘return’ expr
 4.              | ID ( ‘(’ ‘)’ | [ ‘[’ expr ‘]’ ] ‘=’ expr )
 5.              | ‘func’ ID ‘(’ ‘)’ program ‘end’
 6.              | ‘if’ expr program { ‘elif’ expr program } [ ‘else’ program ] ‘end’
 7.              | ‘while’ expr program ‘end’
 8. print_arg    → STRLIT
 9.              | ‘chr’ ‘(’ expr ‘)’
10.              | expr
11. expr         → compare_expr { ( ‘&&’ | ‘||’ ) compare_expr }
12. compare_expr → arith_expr { ( ‘==’ | ‘!=’ | ‘<’ | ‘<=’ | ‘>’ | ‘>=’ ) arith_expr }
13. arith_expr   → term { ( ‘+’ | ‘-’ ) term }
14. term         → factor { ( ‘*’ | ‘/’ | ‘%’ ) factor }
15. factor       → NUMLIT
16.              | ‘(’ expr ‘)’
17.              | ( ‘+’ | ‘-’ | ‘!’ ) factor
18.              | ‘readnum’ ‘(’ ‘)’
19.              | ‘rnd’ ‘(’ expr ‘)’
20.              | ID [ ‘(’ ‘)’ | ‘[’ expr ‘]’ ]
The above grammar is already in a form that is usable by a Predictive Recursive-Descent parser; you will not need to transform it.
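One way to see why no transformation is needed: each alternative for statement begins with a different lexeme, so a parser can choose the grammar line by looking only at the current lexeme. The sketch below illustrates that choice. The helper statementLine and the category values KEY and ID are placeholders invented for this example; real code would use the category constants exported by your lexit module.

```lua
-- Lexeme categories: placeholder values for this sketch only. In real
-- code, use the category constants exported by your lexit module.
local KEY, ID = 1, 2

-- Maps the opening keyword of a statement to its grammar line number.
local keywordLine = {
    ["print"] = 2, ["println"] = 2, ["return"] = 3,
    ["func"] = 5, ["if"] = 6, ["while"] = 7,
}

-- statementLine: hypothetical helper. Given the string and category of
-- the current lexeme, returns the grammar line (2-7) of the statement
-- alternative to parse, or nil if no statement can begin here.
local function statementLine(lexstr, cat)
    if cat == ID then
        return 4
    elseif cat == KEY then
        return keywordLine[lexstr]
    end
    return nil
end

print(statementLine("while", KEY))  -- 7
print(statementLine("nn", ID))      -- 4
print(statementLine("end", KEY))    -- nil
```

The nil case matters for line 1 as well: the loop for program → { statement } can stop as soon as the current lexeme cannot begin a statement, for example at ‘end’, ‘elif’, ‘else’, or the end of the input.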
Precedence & Associativity—There
are 13 binary operators that may appear in an expression: &&, ||, ==, !=, <, <=, >, >=, binary +, binary -, *, /, %.
All are left-associative.
All other associativity, and all operator precedence, is completely encoded in the grammar.
AST Specification—This subsection will use the following named constants—which I suggest defining in your code.
[Lua]
local PROGRAM      = 1
local PRINT_STMT   = 2
local PRINTLN_STMT = 3
local RETURN_STMT  = 4
local ASSN_STMT    = 5
local FUNC_CALL    = 6
local FUNC_DEF     = 7
local IF_STMT      = 8
local WHILE_LOOP   = 9
local STRLIT_OUT   = 10
local CHR_CALL     = 11
local BIN_OP       = 12
local UN_OP        = 13
local NUMLIT_VAL   = 14
local READNUM_CALL = 15
local RND_CALL     = 16
local SIMPLE_VAR   = 17
local ARRAY_VAR    = 18
Every AST is a Lua array. We will specify the format of an AST for each of the lines in the above grammar. Lines are referred to by number.
1. Program. The AST for the program is an array whose first item is PROGRAM. The following items are the ASTs for the various statements, if any, in order.
   For example, the AST for the program “tt=3 print(tt)” is {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "tt"}, {NUMLIT_VAL, "3"}}, {PRINT_STMT, {SIMPLE_VAR, "tt"}}}.
   If there are no statements, then the AST is {PROGRAM}.
2. Statement: Print/Println. The AST for the statement is an array whose first item is either PRINT_STMT or PRINTLN_STMT, depending on the opening keyword. The following items are the ASTs for the print_args, if any, in order.
   For example, the AST for the statement “print('abc', 123)” is {PRINT_STMT, {STRLIT_OUT, "'abc'"}, {NUMLIT_VAL, "123"}}.
   As another example, the AST for the statement “println()” is {PRINTLN_STMT}.
3. Statement: Return. The AST for the statement is {RETURN_STMT, EEE}, where EEE is the AST for the expr.
   For example, the AST for the statement “return x” is {RETURN_STMT, {SIMPLE_VAR, "x"}}.
4. Statement: Beginning with Identifier.
   - If the ID is followed by parentheses, then the AST for the statement is {FUNC_CALL, II}, where II is the string form of the ID lexeme.
     For example, the AST for the statement “foo()” is {FUNC_CALL, "foo"}.
     (Note that a function call may also occur as part of an expression. The AST of such a function call is identical. See Factor: Beginning with Identifier.)
   - If the ID is followed by “=” without the brackets, then the AST for the statement is {ASSN_STMT, {SIMPLE_VAR, II}, EEE}, where II is the string form of the ID lexeme, and EEE is the AST for the expr.
     For example, the AST for the statement “nn = 3” is {ASSN_STMT, {SIMPLE_VAR, "nn"}, {NUMLIT_VAL, "3"}}.
   - If the ID is followed by “[”, then the AST for the statement is {ASSN_STMT, {ARRAY_VAR, II, EEE}, FFF}, where II is the string form of the ID lexeme, EEE is the AST for the expr between the brackets, and FFF is the AST for the expr after the “=”.
     For example, the AST for the statement “aa[22] = 3” is {ASSN_STMT, {ARRAY_VAR, "aa", {NUMLIT_VAL, "22"}}, {NUMLIT_VAL, "3"}}.
     As another example, the AST for the statement “xx[n+1] = yy” is {ASSN_STMT, {ARRAY_VAR, "xx", {{BIN_OP, "+"}, {SIMPLE_VAR, "n"}, {NUMLIT_VAL, "1"}}}, {SIMPLE_VAR, "yy"}}.
5. Statement: Function Definition. The AST for the statement is {FUNC_DEF, II, PPP}, where II is the string form of the ID lexeme, and PPP is the AST for the program.
   For example, the AST for the statement “func foo() print('xy') end” is {FUNC_DEF, "foo", {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'xy'"}}}}.
   The AST for the statement “func nuthin() end” is {FUNC_DEF, "nuthin", {PROGRAM}}.
6. Statement: If. The AST for the statement is an array whose first item is IF_STMT. The following items are the ASTs for the various exprs and programs, in order.
   For example, the AST for the statement “if aa bb = readnum() else print('x') end” is {IF_STMT, {SIMPLE_VAR, "aa"}, {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "bb"}, {READNUM_CALL}}}, {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'x'"}}}}.
7. Statement: While. The AST for the statement is {WHILE_LOOP, EEE, PPP}, where EEE is the AST for the expr, and PPP is the AST for the program.
   For example, the AST for the statement “while 1 print('Hello') end” is {WHILE_LOOP, {NUMLIT_VAL, "1"}, {PROGRAM, {PRINT_STMT, {STRLIT_OUT, "'Hello'"}}}}.
8. Print-Argument: String Literal. The AST for the print_arg is {STRLIT_OUT, SS}, where SS is the string form of the STRLIT lexeme, which includes its opening and closing quotes.
   For example, the AST for the print_arg “'abc'” is {STRLIT_OUT, "'abc'"}.
9. Print-Argument: Character Code. The AST for the print_arg is {CHR_CALL, EEE}, where EEE is the AST for the expr between the parentheses.
   For example, the AST for the print_arg “chr(65)” is {CHR_CALL, {NUMLIT_VAL, "65"}}.
10. Print-Argument: Expression. The AST for the print_arg is the AST for the expr.
11. Expression. This is handled much as in rdparser3. If there is a single compare_expr, then the AST for the expr is the AST for the compare_expr. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last compare_expr.
    For example, the AST for the expr “1 && 2” is {{BIN_OP, "&&"}, {NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}}.
    And the AST for the expr “1 && 2 || 3” is {{BIN_OP, "||"}, {{BIN_OP, "&&"}, {NUMLIT_VAL, "1"}, {NUMLIT_VAL, "2"}}, {NUMLIT_VAL, "3"}}.
12. Comparison-Expression. Again, this is handled much as in rdparser3. If there is a single arith_expr, then the AST for the compare_expr is the AST for the arith_expr. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last arith_expr.
    (See Expression for an example of an AST involving multiple operators.)
13. Arithmetic-Expression. Once again, this is handled much as in rdparser3. If there is a single term, then the AST for the arith_expr is the AST for the term. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last term.
    (See Expression for an example of an AST involving multiple operators.)
14. Term. Yet again, this is handled much as in rdparser3. If there is a single factor, then the AST for the term is the AST for the factor. Otherwise the AST is {{BIN_OP, OO}, AAA, BBB}, where OO is the string form of the last operator, AAA is the AST for everything that precedes it, and BBB is the AST for the last factor.
    (See Expression for an example of an AST involving multiple operators.)
15. Factor: Numeric Literal. The AST for the factor is {NUMLIT_VAL, NN}, where NN is the string form of the NUMLIT lexeme.
    For example, the AST for the factor “42” is {NUMLIT_VAL, "42"}.
16. Factor: Parenthesized Expression. The AST for the factor is the AST for the expr between the parentheses.
17. Factor: Unary Operator. The AST is {{UN_OP, OO}, FFF}, where OO is the string form of the operator, and FFF is the AST for the factor on the right-hand side.
    For example, the AST for the factor “-xx” is {{UN_OP, "-"}, {SIMPLE_VAR, "xx"}}.
    As another example, the AST for the factor “!42” is {{UN_OP, "!"}, {NUMLIT_VAL, "42"}}.
18. Factor: Read Number. The AST for the factor is {READNUM_CALL}.
19. Factor: Random Number. The AST for the factor is {RND_CALL, EEE}, where EEE is the AST for the expr between the parentheses.
    For example, the AST for the factor “rnd(42)” is {RND_CALL, {NUMLIT_VAL, "42"}}.
20. Factor: Beginning with Identifier.
    - If the ID is followed by parentheses, then the AST for the factor is {FUNC_CALL, II}, where II is the string form of the ID lexeme.
      For example, the AST for the factor “foo()” is {FUNC_CALL, "foo"}.
      (Note that a function call may also occur as a separate statement. The AST of such a statement is identical. See Statement: Beginning with Identifier.)
    - If the ID is followed by neither parentheses nor a left bracket, then the AST for the factor is {SIMPLE_VAR, II}, where II is the string form of the ID lexeme.
      For example, the AST for the factor “nn” is {SIMPLE_VAR, "nn"}.
    - If the ID is followed by “[”, then the AST for the factor is {ARRAY_VAR, II, EEE}, where II is the string form of the ID lexeme, and EEE is the AST for the expr between the brackets.
      For example, the AST for the factor “aa[22]” is {ARRAY_VAR, "aa", {NUMLIT_VAL, "22"}}.
      As another example, the AST for the factor “xx[n+1]” is {ARRAY_VAR, "xx", {{BIN_OP, "+"}, {SIMPLE_VAR, "n"}, {NUMLIT_VAL, "1"}}}.
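The binary-operator rules above (Expression through Term) all follow one pattern: parse an operand, then loop while the current lexeme is one of the allowed operators, each time wrapping the AST built so far as the left child. Below is a minimal sketch of that pattern for a small subset of the grammar (lines 13 to 16, with the parenthesized case reduced to an arith_expr). It is not the code from rdparser3.lua or the posted parseit.lua. The function makeParser, the prebuilt token list, and the category values NUMLIT and OP are all assumptions made so the example is self-contained; a real parser would get its lexemes from lexit.

```lua
local BIN_OP, NUMLIT_VAL = 12, 14

-- Placeholder lexeme categories for this sketch only.
local NUMLIT, OP = 3, 5

-- makeParser: hypothetical. Takes a list of {string, category} pairs and
-- returns a function that parses an arith_expr from it, returning
-- good, done, ast in the style of parseit.parse.
local function makeParser(tokens)
    local pos = 1
    local parse_arith_expr, parse_term, parse_factor

    -- Current lexeme string, or "" when past the end of the list.
    local function cur()
        return tokens[pos] and tokens[pos][1] or ""
    end

    -- If the current lexeme string is s, consume it and return true.
    local function matchString(s)
        if cur() == s then
            pos = pos + 1
            return true
        end
        return false
    end

    -- factor: NUMLIT, or a parenthesized arith_expr (subset of 15-16)
    function parse_factor()
        local tok = tokens[pos]
        if tok and tok[2] == NUMLIT then
            pos = pos + 1
            return true, {NUMLIT_VAL, tok[1]}
        elseif matchString("(") then
            local good, ast = parse_arith_expr()
            if not good or not matchString(")") then
                return false, nil
            end
            return true, ast
        end
        return false, nil
    end

    -- Shared loop for "operand { op operand }"; builds a left-leaning
    -- tree, which is what makes the operators left-associative.
    local function parseLeftAssoc(parseOperand, ops)
        local good, ast = parseOperand()
        if not good then return false, nil end
        while ops[cur()] do
            local op = cur()
            pos = pos + 1
            local good2, rhs = parseOperand()
            if not good2 then return false, nil end
            ast = {{BIN_OP, op}, ast, rhs}
        end
        return true, ast
    end

    function parse_term()        -- grammar line 14
        return parseLeftAssoc(parse_factor,
                              {["*"]=true, ["/"]=true, ["%"]=true})
    end

    function parse_arith_expr()  -- grammar line 13
        return parseLeftAssoc(parse_term, {["+"]=true, ["-"]=true})
    end

    return function()
        local good, ast = parse_arith_expr()
        return good, pos > #tokens, ast
    end
end

-- Tokens for "1 - 2 - 3"
local parse = makeParser({{"1", NUMLIT}, {"-", OP}, {"2", NUMLIT},
                          {"-", OP}, {"3", NUMLIT}})
local good, done, ast = parse()
print(good, done, ast[1][2], ast[2][1][2])
```

For the input “1 - 2 - 3” the resulting AST has the second “-” at the root and the AST for “1 - 2” as its left child, matching the “1 && 2 || 3” example under Expression. Precedence falls out of which function calls which: parse_arith_expr asks parse_term for its operands, so “*” binds more tightly than “+”.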
Test Program
A test program is available in the Git repository: parseit_test.lua. If you run this program (unmodified!) with your code, then it will test whether your code works properly.
Do not turn in the test program.
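For your own spot checks, it may be handy to compare an AST returned by your parser against one you wrote out by hand. Lua's == compares tables by identity, so a recursive comparison is needed. The following is a minimal sketch; the helper astEqual is hypothetical and is not taken from parseit_test.lua.

```lua
local PROGRAM, ASSN_STMT, NUMLIT_VAL, SIMPLE_VAR = 1, 5, 14, 17

-- astEqual: hypothetical helper. Returns true if a and b are equal,
-- where tables are compared item by item as arrays, recursively.
local function astEqual(a, b)
    if type(a) ~= "table" or type(b) ~= "table" then
        return a == b
    end
    if #a ~= #b then
        return false
    end
    for i = 1, #a do
        if not astEqual(a[i], b[i]) then
            return false
        end
    end
    return true
end

-- Expected AST for the program "nn = 3", written with named constants
local expected = {PROGRAM, {ASSN_STMT, {SIMPLE_VAR, "nn"},
                                       {NUMLIT_VAL, "3"}}}
-- The same AST written with raw numbers, standing in for parser output
local actual = {1, {5, {17, "nn"}, {14, "3"}}}

print(astEqual(expected, actual))
print(astEqual(expected, {PROGRAM}))
```

Note that the value of a numeric literal is compared as a string, since the AST stores "3" rather than 3.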
Notes
- I have already done some of the work for you. See parseit.lua, in the Git repository. This is incomplete, but what is there is correct. Also see rdparser3.lua, which is very similar to the expression parsing you will need to write in this assignment.
- To see what output your parser produces for some input, try use_parseit.lua, in the Git repository. You can change the check calls at the end of that file and send your parser any Fulmar code you want.
- See Thoughts on Assignment 4, in the lecture slides for February 19.