CS 331 Spring 2013: Assignment 2
Assignment 2 is due at 5 p.m. Tuesday, February 12. It is worth 25 points.
E-mail answers to the exercises below to ggchappell@alaska.edu, using the subject “PA2”.
Your answers should consist of two files: lex2.h and lex2.cpp, from Exercise A. The two files (or a single archive file containing them) should be attached to your e-mail message.

In this exercise, you will write a C++ class that does lexical analysis.
In the next assignment, you will be building a parser on top of your lexer.
Implement a C++ class that performs lexical analysis, according to the Lexeme Description found later in this document.
Name the class Lex2, and implement it in files lex2.h and lex2.cpp.
The interface of Lex2 is essentially the same as that of Lex, which was written in class (with some differences, to be listed shortly).
In particular, Lex2 has:
- Member types Token and Lexeme.
- A constructor taking a string object.
- Member functions set, done, advance, current.
The interface of Lex2 differs from that of Lex as follows:
- Token has possible values NONE, ILLEGAL, KEY, ID, OP, PUNCT, INT, and FLOAT, representing a past-the-end lexeme, an illegal character, and the six kinds of lexemes described below.
Lex2 should have no public members other than those listed above. You may write any private members you want. Similarly, the only things declared in lex2.h should be class Lex2 and its members. You may declare anything you want in lex2.cpp.
The following properties of class Lex should hold for Lex2 as well.
- Repeated calls to current, with no intervening calls to advance or set, should return the same lexeme.
- A past-the-end lexeme is Lexeme("", NONE).
- An illegal character has token ILLEGAL.
- Passing a string to set should give the same results as constructing the object with that string.
[…] Lex.
Aside from sticking 2s in the names of things, you might only need to modify functions advance and skipSpace, and type Token.
[…] Lex. Indeed, feel free to add a bit of modularity to the gigantic hunk of code making up function advance.

Lexeme Description

No lexeme contains a whitespace character (blank, tab, vertical-tab, new-line, carriage-return, form-feed). A whitespace character, or any contiguous group of whitespace characters, is treated as a separator between lexemes. However, pairs of lexemes are not required to be separated by whitespace.
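The whitespace rule above can be sketched as a small helper. This is only an illustration: the name skipWhitespace and its signature are assumptions made for this example, not part of the required Lex2 interface or of the Lex code written in class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the required Lex2 interface).
// Returns the index of the first non-whitespace character at or after pos,
// or text.size() if only whitespace remains.
std::size_t skipWhitespace(const std::string & text, std::size_t pos)
{
    // In the default "C" locale, std::isspace is true for exactly the six
    // characters listed above: blank, tab, vertical-tab, new-line,
    // carriage-return, form-feed.
    // The cast avoids undefined behavior for negative char values.
    while (pos < text.size()
           && std::isspace(static_cast<unsigned char>(text[pos])))
    {
        ++pos;
    }
    return pos;
}
```

Because whitespace only separates lexemes, a helper like this would typically run before each attempt to read a lexeme.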
Comments are old-fashioned C-style comments only. They begin with “/*” and end with either “*/” or the end of the input. Any characters at all may occur inside a comment. Note that “/*/” is not a complete comment. It does begin a comment, which may end with a later “*/”.
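One possible way to handle comments, including the “/*/” case, is sketched here. The function name skipComment and its signature are hypothetical, chosen for this example only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the required Lex2 interface).
// Requires pos <= text.size().
// If a comment starts at pos, returns the index just past the comment;
// otherwise returns pos unchanged.
std::size_t skipComment(const std::string & text, std::size_t pos)
{
    // A comment must start with the two characters "/*".
    if (text.compare(pos, 2, "/*") != 0)
        return pos;

    // Search for "*/" starting AFTER the opening "/*", so that the '*'
    // in "/*/" is not reused as part of a closing "*/".
    std::size_t close = text.find("*/", pos + 2);
    if (close == std::string::npos)
        return text.size();   // Unterminated: comment runs to end of input
    return close + 2;
}
```

Starting the search two characters past the opening delimiter is what makes “/*/” begin a comment without also ending it.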
Legal characters outside comments are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments are illegal.
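The legality test can be written directly from the sentence above. The sketch below assumes an ASCII-compatible character set and the default "C" locale; the name isLegalOutsideComment is made up for this example.

```cpp
#include <cctype>

// Hypothetical helper (not part of the required Lex2 interface).
// True if c may appear outside a comment: printable ASCII (32..126)
// or one of the whitespace characters.
bool isLegalOutsideComment(char c)
{
    // Convert first, so that char values above 127 are not negative.
    unsigned char u = static_cast<unsigned char>(c);
    bool printable = (u >= 32 && u <= 126);
    bool whitespace = std::isspace(u) != 0;
    return printable || whitespace;
}
```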
Once a lexeme has begun, the complete lexeme is considered to be the longest substring beginning from the starting point that can be interpreted as a lexeme, except in the following special case. If the lexeme being read is not the first lexeme in the input, and the previous lexeme was an Identifier, Integer, or Float, and the current lexeme begins with either “+” or “-”, then a single-character operator is returned (“+” or “-”, as appropriate). Otherwise, the above longest-lexeme rule is followed.
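The special case above amounts to a decision based on the previous lexeme's token. Here is a minimal sketch of that decision. The enum is a stand-in for the Lex2 Token type, the function name is hypothetical, and using NONE to mean "there was no previous lexeme" is an assumption of this example.

```cpp
// Stand-in for the token categories named in this assignment; in the real
// class these would be the values of Lex2's Token type.
enum Token { NONE, ILLEGAL, KEY, ID, OP, PUNCT, INT, FLOAT };

// Hypothetical helper: may a '+' or '-' at the start of the current lexeme
// be read as the sign of an Integer or Float?
// prev is the token of the previous lexeme, or NONE if there was none
// (an assumption of this sketch).
bool signMayStartNumber(Token prev)
{
    // After an Identifier, Integer, or Float, a leading '+' or '-' is
    // always a single-character Operator, so "a-1" gives "a", "-", "1".
    return !(prev == ID || prev == INT || prev == FLOAT);
}
```

When this returns true, the longest-lexeme rule applies as usual, so “-1” at the start of the input would be read as a single Integer.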
There are six kinds of lexemes: Keyword, Identifier, Operator, Punctuation, Integer, Float.
- Keyword. One of “begin”, “end”, “print”.
- Identifier. Begins with a letter or underscore (“_”), contains only letters, digits, underscores, and is not a Keyword.
- Operator. One of “+”, “-”, “*”, “/”, “=”, “==”, “!=”.
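- Punctuation. A single legal character that is not a letter, a digit, an underscore (“_”), or whitespace, and that is not an Operator.
- Integer. A sequence of one or more digits, with an optional “+” or “-” at the beginning. Note that no decimal point is allowed.
- Float. A sequence of one or more digits and exactly one decimal point (“.”), with an optional “+” or “-” at the beginning, and an optional exponent at the end. An exponent is the letter “e” or “E” followed by a sequence of characters that meets the requirements for an Integer.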
For example, here are some valid Float lexemes.
1. +.23 1.23e+37 -00.0E00
The following are not valid Float lexemes.
123 e +. 1.23e+ -00..00
Note: The first string above is an Integer, and the second is an Identifier, while the last two begin with valid Float lexemes.
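The examples above can be checked with a small prefix matcher. This is a sketch under stated assumptions, not a reference solution. It assumes that a Float needs at least one digit (inferred from “+.” not being valid) and that an incomplete exponent is simply left out of the lexeme. It also ignores the special case for “+” and “-” after an Identifier, Integer, or Float. All names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {

// True if s[i] exists and is a decimal digit.
bool isDigitAt(const std::string & s, std::size_t i)
{
    return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
}

// True if s[i] exists and is '+' or '-'.
bool isSignAt(const std::string & s, std::size_t i)
{
    return i < s.size() && (s[i] == '+' || s[i] == '-');
}

}  // end anonymous namespace

// Hypothetical helper: returns the length of the longest Float lexeme
// starting at s[pos], or 0 if no Float starts there.
std::size_t floatLength(const std::string & s, std::size_t pos)
{
    std::size_t i = pos;
    if (isSignAt(s, i)) ++i;                     // optional sign

    std::size_t digits = 0;
    while (isDigitAt(s, i)) { ++i; ++digits; }   // digits before the point

    if (i >= s.size() || s[i] != '.') return 0;  // required decimal point
    ++i;
    while (isDigitAt(s, i)) { ++i; ++digits; }   // digits after the point

    if (digits == 0) return 0;                   // "." or "+." has no digits

    // Optional exponent: 'e' or 'E', optional sign, one or more digits.
    // Accept it only if it is complete; otherwise the Float ends before it.
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E'))
    {
        std::size_t j = i + 1;
        if (isSignAt(s, j)) ++j;
        if (isDigitAt(s, j))
        {
            while (isDigitAt(s, j)) ++j;
            i = j;
        }
    }
    return i - pos;
}
```

For the strings listed above, this sketch accepts all four valid examples in full, rejects “123”, “e”, and “+.”, and matches only the prefixes “1.23” and “-00.” of the last two invalid examples.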
A test program is available: lex2_test.cpp. If you compile and run your package with this program, unmodified, then it will test whether your package works properly. Do not turn in lex2_test.cpp.
The following are standards for all programming assignments in this class.
The above requirement is absolute; if your code does not compile, then there is no point in turning it in.
In addition, to receive full credit, submitted code should satisfy the following conditions.
[…] const-correctness, etc.).