CS 331 Spring 2009  >  Assignment 2

CS 331 Spring 2009
Assignment 2

Assignment 2 is due at 5 p.m. Tuesday, February 24. It is worth 25 points.

Procedures

E-mail answers to the exercises below to ffggc@uaf.edu, using the subjectPA2”.

Exercises (25 pts total)

Exercise A — Lexical Analyzer

Purpose

In this exercise, you will write a state-machine-based lexical analyzer for a C-like language.

Instructions

Write, and turn in source code for:

General requirements.

Requirements for the lexical analyzer.

Requirements for the main program.

Lexemes

The lexical analyzer should analyze according to the following description.

Whitespace (as in the analyzer we wrote in class: blank, tab, vertical-tab, new-line, carriage-return, form-feed) is generally a lexeme separator. However, whitespace that occurs in a comment or a STRING is part of the comment or STRING. Lexemes are not required to be separated by whitespace.

Comments are C-style multiline comments. They begin with “/*”, not inside a STRING, and end with “*/” or the end of the file. The beginning of a comment ends a lexeme. The characters “/*” occuring inside a STRING are part of the STRING, and are not considered to begin a comment.

There are 6 kinds of lexemes: STRING, KEYWORD, IDENTIFIER, INTEGER, FLOAT, OPERATOR.

STRING
A double-quoted string. Begins with a double quote (“"”), and continues until the next double quote or the end of the file. A STRING may contain new-lines. (Note that there are no backslash escapes, or anything like that.)

The text of a STRING lexeme does not include the quotes. For example, if x"abc"y is encountered in the file, then the analyzer should indicate three lexemes:

KEYWORD
One of “for”, “while”, “if”, or “else”.
IDENTIFIER
Any C-style identifier that is not a keyword. Begins with a letter or underscore (“_”). Contains only letters, underscores, and digits, and is not equal to any of the KEYWORDs.

Examples: “main”, “_1_2_3_4_”, “abc12345”, “___”.

INTEGER
An optional “+” or “-”, followed by one or more digits.

Examples: “1”, “-2345”, “+12”.

FLOAT
An optional “+” or “-” followed by a sequence of characters consisting only of digits and a dot (“.”), which contains exactly one dot and at least one digit, followed by an optional EXPONENT. An EXPONENT is an “e” or “E” followed by a sequence of characters conforming to the description of an INTEGER.

Examples: “.123”, “-34.2”, “-34.e45”, “.01E-24”, “+123.456e+123”, “00.E00”.

These are not FLOATs: “-.”, “+e23”, “E10”, “.e10”, “00E00”.

OPERATOR
++”, “--”, “==”, “!=”, “<=”, “>=”, or any single character which is not whitespace and not the beginning of a comment or a non-OPERATOR lexeme.

A lexeme continues as far as it legally can. Thus, “===” parses as “==” followed by “=”.

Example

Suppose your program is given the following file as input.

for+23++23e01 23.e01e01=====//comment
"/*++"/* "hello"

The output should be very much like the following.

Lexemes:
KEYWORD    for
INTEGER    +23
OPERATOR   ++
INTEGER    23
IDENTIFIER e01
FLOAT      23.e01
IDENTIFIER e01
OPERATOR   ==
OPERATOR   ==
OPERATOR   =
OPERATOR   /
OPERATOR   /
IDENTIFIER comment
STRING     /*++

Symbol table:
e01
comment


CS 331 Spring 2009: Assignment 2 / Updated: 22 Feb 2009 / Glenn G. Chappell / ffggc@uaf.edu Valid HTML 4.01!