Databases and SQL

CS 321 2007 Lecture, Dr. Lawlor

There are lots of stupid things about ordinary binary and ASCII files:
Databases fix every one of these shortcomings.
So databases provide awesome, incredible features that often come in handy in real programs.

Accessing Database Data

Unfortunately, the only standard way to access data in a database is via SQL (pronounced either one letter at a time like "ess-que-ell", or as one word like "see-kwel"), which stands for "HORRIBLE 1970'S-ERA TELETYPE DATABASE INTERFACE".  No, that's not right.  It's "Structured Query Language".  Anyway, it's an ASCII-type language--you store the operation you want done on the database as a string, and the database server parses the string to figure out what to do.

For example, if you've set up a table (see below) called "log", you can read everything out of the table with this SQL command:
The "*" stands for the fields of the table you want to extract; in this case, all the fields.  "SELECT item FROM log;" would pull out just the "item" field from the "log" table.

You can also use a "WHERE" clause to extract a subset of the data from a table.  This would extract all the rows of the "log" table where the "importance" field is greater than or equal to 3:
	SELECT * FROM log WHERE importance>=3;
You can string together "WHERE" clauses with logical operators like "AND".  Keep in mind--the database usually does not have to read the whole database to find these rows, because it can use its index.
	SELECT * FROM log WHERE importance>=3 AND name='cool';
SQL can even transform the data before comparing it in a "WHERE" clause.  For example, you could use the SOUNDEX builtin function to grab all the log items whose name *sounds like* 'cool':
Builtin functions in MySQL include UCASE (make uppercase), SOUNDEX (map to soundex codes), CONCAT (paste together two strings), and a zillion date functions.  You can apply these functions anywhere, even as part of the "*", to uppercase the output of the SELECT command:
	SELECT UCASE(*) FROM log WHERE name='cool';

Inserting Database Data

You add rows to a database using the "INSERT" SQL command.
	INSERT INTO log (user,item) VALUES ('bob',17);
This adds a row to the "log" table, setting the fields "user" and "item" to the values 'bob' and "17".

You can also modify existing rows with the "UPDATE" command.  For example, if 'bob' changes his name to 'bobina', you can update all his "log" table entries with:
	UPDATE log SET user='bobina' WHERE user='bob';
Warning!  If you forget the "WHERE" clause, this will UPDATE every row of the "log" table!  (Don't ask me how I know this...)

Important businesses like banks need the ability to make a sequence of SQL operations into a single indivisible unit called a "transaction".  For example, to deposit a check you might do a SELECT on the source bank account number, UPDATE the source balance, SELECT the destination account, and UPDATE its balance.  It's crucially important that all of these things happen together, so that for example a crash doesn't result in the money being lost in limbo, debited from one account but not credited to the other.  So a bank will do a "BEGIN WORK;" command to start a transaction, issue their SQL, and finally do a "COMMIT;" to complete the transaction.

Setting up Databases and Tables (in MySQL)

One popular database server in the open-source world is MySQL. 

MySQL stores the database in a file (some commercial databases prefer to take over an entire disk).  This file is accessed exclusively by the "database server program" called "mysqld".  This is not your code, and it doesn't run your code.  All it does is take SQL queries from your program, and give you back the results.  This is like the "reserve desk" at a library, where they don't trust you to run around in the stacks--instead, you (the program) have to ask the librarian (the mysqld server) to get your data for you.

MySQL is network-accessible, and hence has its own conception of account names and passwords--these have no relationship to the normal OS accounts on the actual machine, they exist only for the database.  You can set your MySQL "root" password to "funker" like this:
	mysqladmin -u root password funker
You can then create a database, for example, called "testdb":
	mysqladmin -u root -pfunker create testdb
Finally, you can issue SQL commands to this database by running:
	mysql -u root -pfunker -Dtestdb
This will drop you into the MySQL command line.

There are SQL commands to create tables.  For example:
num INT NOT NULL auto_increment,
importance INT DEFAULT '0' NOT NULL,
user VARCHAR(10) DEFAULT '',
item VARCHAR(40),
KEY xdate(date),
KEY ximportance(importance)

This creates the "log" table with a bunch of fields.  "num" and "importance" are integer fields.  "date" is a TIMESTAMP field.  "user" is a string with up to 10 characters.  "item" is a string with up to 40 characters.  "xdate" and "ximportance" are indices built to allow fast searching on the "date" and "importance" fields.

Now you can apply a bunch of INSERT commands to fill the new "log" table with data.
INSERT INTO log (importance,user,item) 
VALUES (1,"olawlor",'Created table; start of debugging');
INSERT INTO log (importance,user,item)
VALUES (0,"olawlor",'Checking out tables...');
INSERT INTO log (importance,user,item)
VALUES (0,"nobody",'Silly crap');
INSERT INTO log (importance,user,item)
VALUES (1,"nobody",'Not so silly no more!');
And then you can apply SELECT commands to pull data out of the "log" table.
SELECT * FROM log WHERE importance=1;
SELECT * FROM log WHERE importance=1 AND user="nobody";
Cool, yes?  There's also a set of C functions you can call to do queries (as well as C#, Perl, Java, PHP, ...).  For example:
  MYSQL *mysql;
mysql=(MYSQL *)malloc(sizeof(MYSQL));
MYSQL_RES *result=mysql_query(mysql,"INSERT INTO log (user,item) VALUES ('bill','Stuff');");
Connecting to the database server is quite slow--it seems to take milliseconds on my box.  So you want to connect to the database once, and then issue a bunch of queries.  Each query reasonably fast (assuming the data's still cached in memory!), in my tests taking around 64 microseconds per access.  This is a thousand times slower than a single memory access, largely due to the overhead of switching to the separate database server process and parsing your ASCII request, but it's still way faster than a disk access.  And unlike memory or even most disk I/O, database operations will survive across multiple runs of the program, and scale nicely to even very large databases--multi-terabyte databases are common in the commercial world!

Learn more by trying out some of the many tutorials on the net ("mysql tutorial" has 38 million hits):