CS 331 Spring 2025
Exam Review Problems, Set B
This is the second set of exam review problems.
Problems
Review problems are given below. Answers are in the Answers section of this document. Do not turn these in.
Answers
-
- Parsing means determining whether input is syntactically correct, and, if so, finding its structure.
- Parsing is also called syntax analysis.
- Software that does parsing is called a parser.
-
- Lexing means splitting input into lexemes and determining the category of each. Usually, lexing involves skipping things like whitespace and comments.
- Lexing is also called lexical analysis.
- Software that does lexing is called a lexer.
- When lexing is split off as a separate step, a parser takes a lexeme stream as input.
- Here are five common categories of lexemes: keyword, identifier, operator, literal, punctuation.
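
For example, here is the lexeme stream a lexer might produce for the Lua statement local n = 42 + n. This is a minimal sketch; the representation and the exact category names are assumptions for illustration, and the categories used in class may be named differently.

```lua
-- Hypothetical lexeme stream for the Lua statement:  local n = 42 + n
-- Each entry pairs a lexeme with its category; whitespace produces no lexeme.
local lexemeStream = {
    { "local", "Keyword"        },
    { "n",     "Identifier"     },
    { "=",     "Operator"       },
    { "42",    "NumericLiteral" },
    { "+",     "Operator"       },
    { "n",     "Identifier"     },
}
```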
-
- An identifier is a name that a program gives to some entity.
- A keyword is an identifier-looking word that has special meaning in a programming language.
Examples of C++ keywords include class, for, and public. Examples of Lua keywords include function, for, and local.
- A literal is a representation of a fixed value in source code. The value itself is represented, not an identifier bound to the value, and not a computation whose result is the value. Examples of C++ literals include 42, 42.5f, and "abc". Examples of Lua literals include 42, 42.5, and { 3, 4 }.
-
- An operator is a symbol or word that provides an alternate syntax for something like a function call; see the sketch after this list. The arguments to the call are its operands.
- The arity of an operator is the number of operands.
- An operator of arity 1 is a unary operator.
- An operator of arity 2 is a binary operator.
- An operator of arity 3 is a ternary operator.
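
The following minimal Lua sketch (a hypothetical metatable-based example) shows an operator acting as alternate syntax for a function call: applying + to these tables calls the __add metamethod, which is an ordinary function.

```lua
-- Sketch: the + operator on these tables calls the __add metamethod.
local mt = {}
mt.__add = function(a, b)             -- the function behind the + operator
    return setmetatable({ value = a.value + b.value }, mt)
end

local x = setmetatable({ value = 3 }, mt)
local y = setmetatable({ value = 4 }, mt)

local z = x + y        -- operands: x and y; arity 2, so + is a binary operator
print(z.value)         --> 7
```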
-
- A reserved word is a word that has the general form of an identifier, but is not allowed as an identifier.
- In a number of programming languages, including Lua, the reserved words are the same as the keywords.
- The maximal munch rule says that a lexeme is always the longest substring, beginning at its starting point, that can be interpreted as a lexeme (see the sketch after this list).
- A state machine is an abstract device that is always in some state. In each of a series of steps, the machine looks at its input and its current state. Based on these, it transitions to a new state. It may make other decisions as well.
- A state machine can handle two situations with the same state if they will react identically to all future input.
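
Here is the sketch referred to above: a brute-force illustration of the maximal munch rule for a small, hypothetical set of operators. A real lexer would more likely use a state machine, but the rule is the same: take the longest match beginning at the current position.

```lua
-- Maximal munch for operators, starting at position pos in str:
-- among all operators that match there, return the LONGEST one.
local OPERATORS = { "<=", "<", "==", "=", "+", "-" }   -- hypothetical set

local function munchOperator(str, pos)
    local best = nil
    for _, op in ipairs(OPERATORS) do
        if str:sub(pos, pos + #op - 1) == op then
            if best == nil or #op > #best then
                best = op                  -- keep the longest match so far
            end
        end
    end
    return best                            -- nil if no operator starts at pos
end

print(munchOperator("x<=y", 2))            --> <=   (not just <)
```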
-
- We avoid backtracking in lexers and parsers because it tends to lead to slow execution, and we know of other methods that can deal with the same issues more quickly.
- Examining more than one input symbol at each step is called multi-symbol look-ahead.
- We can avoid writing any sophisticated error-handling code in a lexer by simply passing the error on to the parser. (In our lexers, we did this through the use of a Malformed lexeme. Other lexers may use different terminology.)
- Most programming languages do not include a leading “-” in a numeric literal, instead treating it as a unary operator. This is done because it simplifies lexing, allowing the maximal munch rule to be used, and deferring to the parser any decisions about what role is played by “-” (see the sketch after this answer).
-
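
Illustrating the last point above: with no special-casing of the minus sign, the input -42 produces two lexemes, and the parser later decides what role the “-” plays. (An illegal character, by contrast, could be passed on as a Malformed lexeme, leaving error handling to the parser.) The representation and category names below are assumptions for illustration.

```lua
-- Hypothetical lexeme stream for the input:  -42
-- The "-" is an operator lexeme; whether it acts as negation (or as binary
-- subtraction after some other expression) is decided later, by the parser.
local lexemeStream = {
    { "-",  "Operator"       },
    { "42", "NumericLiteral" },
}
```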
- When given syntactically correct input, a parser will typically return an abstract syntax tree (AST).
- An AST is typically much smaller and less complicated than a parse tree, because it does not include nodes for things like parentheses, statement-ending semicolons, and various nonterminal symbols, which exist only to guide the parser in producing its output.
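
As an illustration, here is one possible AST for the expression (2 + 3) * x, written as nested Lua tables. This representation is an assumption for illustration only; the one used in class may differ.

```lua
-- One possible AST for the expression:  (2 + 3) * x
-- There are no nodes for the parentheses; they only guided the parser.
local ast = { "*",
              { "+", { "NumericLiteral", "2" }, { "NumericLiteral", "3" } },
              { "Identifier", "x" } }
```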
-
- A parser that is based on a grammar (as most are) must go through the steps of producing a derivation, since that is what we do with grammars. A top-down parser goes through these steps from top to bottom, expanding nonterminals by running productions forward. A bottom-up parser goes through these steps from bottom to top, contracting strings to nonterminals by running productions backward.
- Top-down parsers are almost always hand-written. A bottom-up parser may be hand-written or automatically generated.
- Recursive-Descent is a top-down parsing method. Shift-Reduce is a bottom-up parsing method.
-
- A grammar that can be used by a Predictive Recursive-Descent parser that does not do multi-symbol look-ahead is an LL(1) grammar.
- A grammar that can be used by a Shift-Reduce parser that does not do multi-symbol look-ahead is an LR(1) grammar.
- The category of LR(1) grammars is larger. Every LL(1) grammar is an LR(1) grammar, while the reverse is not true; there are LR(1) grammars that are not LL(1) grammars. (An example is the grammar used by the Shift-Reduce parser we examined in class.)
- A Recursive-Descent parser is predictive if it does no backtracking. Thus, it must be able to correctly predict which production to use based only on the input symbols it is allowed to examine at each step.
- A Recursive-Descent parser has one parsing function for each nonterminal in the grammar.
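
Below is a minimal sketch of a Predictive Recursive-Descent parser in Lua, for a small hypothetical grammar (not one of the grammars used in class). It has one parsing function per nonterminal, and each decision is made by looking only at the current lexeme, so no backtracking is needed.

```lua
-- Hypothetical LL(1) grammar:
--     expr  ->  term { "+" term }
--     term  ->  NUMLIT | "(" expr ")"
-- One parsing function per nonterminal; no backtracking, no multi-symbol
-- look-ahead. This version only checks syntactic correctness (no AST).

local lexemes   -- array of lexeme strings, set by parse()
local pos       -- index of the current lexeme

local function current() return lexemes[pos] end
local function advance() pos = pos + 1 end

local parseExpr, parseTerm   -- forward declarations (mutual recursion)

-- Parsing function for nonterminal expr:  expr -> term { "+" term }
function parseExpr()
    if not parseTerm() then return false end
    while current() == "+" do
        advance()
        if not parseTerm() then return false end
    end
    return true
end

-- Parsing function for nonterminal term:  term -> NUMLIT | "(" expr ")"
function parseTerm()
    if current() ~= nil and current():match("^%d+$") ~= nil then
        advance()                       -- numeric literal
        return true
    elseif current() == "(" then
        advance()
        if not parseExpr() then return false end
        if current() ~= ")" then return false end
        advance()
        return true
    else
        return false                    -- no production can be predicted
    end
end

-- Succeed only if the entire lexeme stream forms a valid expr.
local function parse(lexemeList)
    lexemes, pos = lexemeList, 1
    return parseExpr() and current() == nil
end

print(parse({ "3", "+", "(", "4", "+", "5", ")" }))  --> true
print(parse({ "3", "+", "+" }))                      --> false
```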
-
- In the context of grammars, left recursion means that the right-hand side of a production begins with the nonterminal on its left-hand side.
- A Predictive Recursive-Descent parser cannot be based directly on a grammar that has left recursion. Such a grammar is not LL(k) for any k. From a more practical standpoint, left recursion means that a parsing function will begin with a recursive call to itself. Since this recursion has no base case, executing the parsing function will result in the parser hanging and then crashing with a stack overflow.
- In a Recursive-Descent parser, we write the code for a parsing function by translating the right-hand side of the appropriate grammar production into code.
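
Finally, a sketch of the practical problem with left recursion; the production and function names here are hypothetical.

```lua
-- For the hypothetical left-recursive production
--     expr -> expr "+" term
-- translating the right-hand side directly into code gives a parsing
-- function whose first action is to call itself, before consuming any input.

local function parseTerm() return true end   -- stub, for illustration only

local function parseExpr()
    parseExpr()      -- recursive call with no base case and no input consumed
    -- ... match "+" ...
    parseTerm()
end

-- Calling parseExpr() would never terminate normally; it would crash with a
-- stack overflow, which is why such a grammar cannot be used directly.
```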