CS 331 Spring 2025
Exam Review Problems, Set B

This is the second set of exam review problems.

Problems

Review problems are given below. Answers are in the Answers section of this document. Do not turn these in.

    1. What does parsing mean?
    2. What is a longer term for parsing?
    3. What do we call software that does parsing?
    1. What does lexing mean?
    2. What is a longer term for lexing?
    3. What do we call software that does lexing?
  1. When lexing is split off as a separate step, what does a parser take as input?
  2. List three common categories of lexemes.
    1. What is an identifier?
    2. What is a keyword? Give an example of a keyword, specifying which programming language.
    3. What is a literal? Give an example of a literal, specifying which programming language.
    1. Explain operator and operand.
    2. What is the arity of an operator?
    3. What do we call an operator of arity 1?
    4. What do we call an operator of arity 2?
    5. What do we call an operator of arity 3?
    1. What is a reserved word?
    2. In a number of programming languages, including Lua, the reserved words are the same as what?
  3. What is the maximal munch rule?
  4. Explain what a state machine is.
  5. When can a state machine handle two situations with the same state?
    1. Why do we generally avoid backtracking in lexers and parsers?
    2. What is it called when a lexer or parser examines more than one input symbol at each step?
  6. It is not uncommon for a lexer to identify an error in the input text. Nonetheless, we can generally avoid writing any sophisticated error-handling code in a lexer, by employing a simple strategy. Explain.
  7. Most programming languages do not consider a leading “-” to be part of a numeric literal that follows it. Why not?
    1. When given syntactically correct input, a parser will return a representation of the structure of its input. What form does this representation typically take?
    2. The form from the previous part could be a parse tree, but it usually is not. Why not?
    1. Most parsers fall into one of two big categories. Explain.
    2. Explain how these two categories relate to whether the code for a parser is hand-written or automatically generated.
    3. Give an example of a parsing method in each of the two big categories.
    1. What do we call a grammar that can be used by a Predictive Recursive-Descent parser that does not do multi-symbol look-ahead?
    2. What do we call a grammar that can be used by a Shift-Reduce parser that does not do multi-symbol look-ahead?
    3. One of these two categories is larger than the other. Explain.
  8. What does it mean to say that a Recursive-Descent parser is predictive?
  9. How many parsing functions does a Recursive-Descent parser have?
    1. In the context of grammars, what is left recursion?
    2. How is left recursion an issue when writing a Predictive Recursive-Descent parser?
  10. In a Recursive-Descent parser, how, roughly speaking, do we write the code for a parsing function?

Answers

    1. Parsing means determining whether input is syntactically correct, and, if so, finding its structure.
    2. Parsing is also called syntax analysis.
    3. Software that does parsing is called a parser.
    1. Lexing means splitting input into lexemes and determining the category of each. Usually, lexing involves skipping things like whitespace and comments.
    2. Lexing is also called lexical analysis.
    3. Software that does lexing is called a lexer.
  1. When lexing is split off as a separate step, a parser takes a lexeme stream as input.
  2. Here are five common categories of lexemes: keyword, identifier, operator, literal, punctuation.
    1. An identifier is a name that a program gives to some entity.
    2. A keyword is an identifier-looking lexeme that has special meaning in a programming language. Examples of C++ keywords include class, for, and public. Examples of Lua keywords include function, for, and local.
    3. A literal is a representation of a fixed value in source code. The value itself is represented, not an identifier bound to the value, and not a computation whose result is the value. Examples of C++ literals include 42, 42.5f, and "abc". Examples of Lua literals include 42, 42.5, and { 3, 4 }.
    1. An operator is a word that gives an alternate method for making something like a function call. The arguments to the call are its operands.
    2. The arity of an operator is the number of operands.
    3. An operator of arity 1 is a unary operator.
    4. An operator of arity 2 is a binary operator.
    5. An operator of arity 3 is a ternary operator.
    1. A reserved word is a word that has the general form of an identifier, but is not allowed as an identifier.
    2. In a number of programming languages, including Lua, the reserved words are the same as the keywords.
  3. The maximal munch rule says that a lexeme is always the longest substring beginning from its starting point that can be interpreted as a lexeme.
  4. A state machine is an abstract device that is always in some state. In each of a series of steps, the machine looks at its input and its current state. Based on these, it transitions to a new state. It may make other decisions as well.
  5. A state machine can handle two situations with the same state if they will react identically to all future input.
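The maximal munch rule and the state-machine idea can be sketched together in a few lines of Python. This is only an illustration under assumptions: the state names, the keyword set, and the operator set below are hypothetical and are not taken from any lexer used in class. The identifier state is reused for letters, digits, and underscores after the first character, since all of them react identically to future input.

```python
# Hypothetical state names, keyword set, and operator set for this sketch.
# Each non-start state name doubles as the category of the finished lexeme.
START, IDENT, NUMBER, OP = "start", "identifier", "numeric literal", "operator"
KEYWORDS = {"if", "while"}
TWO_CHAR_OPS = {"<=", ">=", "==", "~="}


def lex(text):
    """Return a list of (category, lexeme) pairs, following maximal munch."""
    lexemes = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():              # skip whitespace between lexemes
            pos += 1
            continue
        start, state = pos, START
        while True:
            ch = text[pos] if pos < len(text) else ""   # "" marks end of input
            if state == START:
                # The first character decides which state we move to.
                if ch.isalpha() or ch == "_":
                    state = IDENT
                elif ch.isdigit():
                    state = NUMBER
                else:
                    state = OP               # no error detection in this sketch
                pos += 1
            elif state == IDENT and (ch.isalnum() or ch == "_"):
                pos += 1                     # keep extending the identifier
            elif state == NUMBER and ch.isdigit():
                pos += 1                     # keep extending the number
            elif state == OP and text[start:pos] + ch in TWO_CHAR_OPS:
                pos += 1                     # prefer "<=" over "<" then "="
                break
            else:
                break                        # the lexeme cannot grow further
        lexeme = text[start:pos]
        category = "keyword" if state == IDENT and lexeme in KEYWORDS else state
        lexemes.append((category, lexeme))
    return lexemes
```

With these assumptions, lex("iffy<=42") gives one identifier iffy, one operator <=, and one numeric literal 42. The input is not split into the keyword if followed by fy, because the lexer always takes the longest lexeme available at each starting point.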
    1. We avoid backtracking in lexers and parsers because it tends to lead to slow execution, and we know of other methods that can deal with the same issues more quickly.
    2. Examining more than one input symbol at each step is called multi-symbol look-ahead.
  6. We can avoid writing any sophisticated error-handling code in a lexer by simply passing the error on to the parser. (In our lexers, we did this through the use of a Malformed lexeme. Other lexers may use different terminology.)
  7. Most programming languages do not include a leading “-” in a numeric literal, instead treating it as a unary operator. This simplifies lexing: the maximal munch rule can be applied uniformly, and any decision about the role played by “-” is deferred to the parser.
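A small experiment may make the problem with a leading “-” concrete. The two regular expressions below are hypothetical lexeme patterns written for this sketch. For these particular patterns, Python's left-to-right alternation with greedy quantifiers gives the same result as maximal munch.

```python
import re

# Hypothetical lexeme patterns for this sketch.
SIGNED = re.compile(r"-?\d+|[-+*/]")      # "-" may be part of a numeric literal
UNSIGNED = re.compile(r"\d+|[-+*/]")      # "-" is always a separate lexeme


def lex(pattern, text):
    """Return the lexemes found in text; unmatched characters are skipped."""
    return pattern.findall(text)


signed_result = lex(SIGNED, "3-4")        # "-4" is swallowed as one literal
unsigned_result = lex(UNSIGNED, "3-4")    # "-" is left for the parser to interpret
```

Under the first pattern, 3-4 becomes the two lexemes 3 and -4, with no operator between them, so the parser can no longer see a subtraction. Under the second pattern it becomes 3, -, 4, and the parser decides from context whether “-” is unary or binary.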
    1. When given syntactically correct input, a parser will typically return an abstract syntax tree (AST).
    2. An AST is typically much smaller and less complicated than a parse tree, because it does not include nodes for things like parentheses, statement-ending semicolons, and various nonterminal symbols, which exist only to guide the parser in producing its output.
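To give a rough sense of the size difference, here is a hand-built comparison for the input (1+2). The grammar is a hypothetical one chosen for this sketch (expr → term "+" term | term, term → factor, factor → NUM | "(" expr ")"), and both trees are written out by hand as nested tuples rather than produced by a parser.

```python
# Each tuple is (label, child, child, ...); a bare string is a leaf.
# Parse tree for "(1+2)" under the hypothetical grammar described above.
parse_tree = (
    "expr",
    ("term",
     ("factor",
      "(",
      ("expr",
       ("term", ("factor", "1")),
       "+",
       ("term", ("factor", "2"))),
      ")")),
)

# One possible AST for the same input: just the operator and its operands.
ast = ("+", "1", "2")


def count_nodes(tree):
    """Count every node in a nested-tuple tree, leaves included."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])
```

Here count_nodes(parse_tree) is 13 and count_nodes(ast) is 3. The parentheses and the chain of expr, term, and factor nodes are what the AST leaves out.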
    1. A parser that is based on a grammar (as most are) must go through the steps to produce a derivation, since that is what we do with grammars. A top-down parser goes through these steps from top to bottom, expanding nonterminals by running productions forward. A bottom-up parser goes through these steps from bottom to top, contracting strings to nonterminals by running productions backward.
    2. Top-down parsers are almost always hand-written. A bottom-up parser may be hand-written or automatically generated.
    3. Recursive-Descent is a top-down parsing method. Shift-Reduce is a bottom-up parsing method.
    1. A grammar that can be used by a Predictive Recursive-Descent parser that does not do multi-symbol look-ahead is an LL(1) grammar.
    2. A grammar that can be used by a Shift-Reduce parser that does not do multi-symbol look-ahead is an LR(1) grammar.
    3. The category of LR(1) grammars is larger. Every LL(1) grammar is an LR(1) grammar, while the reverse is not true; there are LR(1) grammars that are not LL(1) grammars. (An example is the grammar used by the Shift-Reduce parser we examined in class.)
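As an illustration of a grammar that a bottom-up parser can handle but a Predictive Recursive-Descent parser cannot use directly, consider the left-recursive grammar E → E "+" n | n. This grammar is an assumption made for the sketch and is not necessarily the one examined in class. The recognizer below makes its reduce decisions with hand-written checks instead of a generated parsing table, so it should be read as a simplified Shift-Reduce sketch only.

```python
def shift_reduce_accepts(tokens):
    """Return True if tokens can be derived from E, where E -> E "+" n | n."""
    stack = []
    for tok in tokens:
        stack.append(tok)                     # shift the next input symbol
        # Reduce: run a production backward when its right-hand side is on top.
        if stack == ["n"]:
            stack = ["E"]                     # E -> n (only the first n)
        elif stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]                # E -> E "+" n
    # Accept only if the whole input was contracted to the start symbol.
    return stack == ["E"]
```

For example, shift_reduce_accepts(["n", "+", "n", "+", "n"]) returns True, while an input that ends in "+" or begins with "+" is rejected because the stack never contracts to a single E.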
  8. A Recursive-Descent parser is predictive if it does no backtracking. Thus, it must be able to correctly predict what production to use based only on the input symbols examined at one time.
  9. A Recursive-Descent parser has one parsing function for each nonterminal in the grammar.
    1. In the context of grammars, left recursion means that the right-hand side of a production begins with the nonterminal on its left-hand side.
    2. A Predictive Recursive-Descent parser cannot be based directly on a grammar that has left recursion. Such a grammar is not LL(k) for any k. From a more practical standpoint, left recursion means that a parsing function will begin with a recursive call to itself, before consuming any input. This recursion never reaches a base case, so executing the parsing function results in a stack overflow.
  10. In a Recursive-Descent parser, we write the code for a parsing function by translating the right-hand side of the appropriate grammar production into code.
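The last few answers can be tied together with a minimal Predictive Recursive-Descent parser. Everything named here is an assumption for illustration: the grammar, the tokenize helper, the Parser class, and the tuple-based AST. The grammar is written with repetition (braces) so that no production is left-recursive; a production such as expr → expr "+" term would make parse_expr call itself before consuming any input. Each nonterminal gets one parsing function, and each function is a direct translation of the right-hand side of its production, using a single lexeme of look-ahead.

```python
import re


def tokenize(text):
    """Hypothetical mini-lexer: numbers, operators, parentheses.
    Characters that match nothing are silently skipped in this sketch."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current lexeme without consuming it (None at end)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the current lexeme (None at end)."""
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        # expr -> term { ("+" | "-") term }
        tree = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            tree = (op, tree, self.parse_term())   # loop gives left associativity
        return tree

    def parse_term(self):
        # term -> factor { ("*" | "/") factor }
        tree = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            tree = (op, tree, self.parse_factor())
        return tree

    def parse_factor(self):
        # factor -> NUM | "(" expr ")"
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            tree = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected lexeme: {tok!r}")


def parse(text):
    """Parse text as an expr and return its AST as nested tuples."""
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"extra input: {parser.peek()!r}")
    return tree
```

Under these assumptions, parse("1-2-3") returns ("-", ("-", 1, 2), 3) and parse("2*(3+4)") returns ("*", 2, ("+", 3, 4)). The parentheses guide the parse but do not appear in the result.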