· Turning in your work: The project for CS 180, Fall 2002 will consist of a sequence of lab assignments involving database programming (including such tasks as create databases, populate them with data, modify data, write SQL queries). Unless otherwise specified, this lab assignment and all subsequent ones will be turned in electronically. We will use submit. Submit is accessed from your cats account. For information on using submit, enter "submit -m" from the console. The class directory is unknown at this time. We should have this information by the first section this week, at which point this document will be changed. The folder for each assignment will be "lab" followed by the assignment number. For lab assignment 1 the folder will be "lab1".
Please keep in mind that the deadlines for submitting lab assignments are separate from the deadlines for written homework assignments. The following late policy applies to this and subsequent lab assignments:
Lab assignment work must be submitted electronically by midnight at the end of the day that it is due. Programming work submitted after the deadline but before the "late deadline" -- midnight 48 hours after the deadline -- will be accepted but penalized 50%. No lab assignment work will be accepted after the late deadline.
One caveat: students must use ssh (both versions 1 and 2 will work), as telnet is not enabled. SSH is an encrypted (safer) method of connecting to a remote host, as opposed to telnet which transmits and receives data unencrypted. Free and legitimate SSH software is available for almost all computing platforms.
Once connected to linux1.ic.ucsc.edu, students will have full access to their CATS home directories via AFS, much the same way as when logged onto teach.ic.ucsc.edu and hawking.ic.ucsc.edu. The major different is that the binaries (including those for PostgreSQL) are Linux binaries, and thus will not work on the Sun hosts such as teach/learn/hawking/curie.
The goal of this assignment is to develop a familiarity with PostgreSQL, create tables and populate them with data.
(a) Familiarize yourself with the PostgreSQL relational DBMS by reading the document Managing a Database in the Interactive PostgreSQL documentation, logging into PostgreSQL, trying some of the examples in the document, and experimenting with the various commands. You don't need to turn anything in for this part. The PostgreSQL web page has links to documentation and other information. See PostgreSQL: Introduction and Concepts, by Bruce Momjian, which is available in printed form at the bookstore and on reserve at the library. There's also a PostgreSQL At A Glance document describing PostgreSQL's features.
Create your first database using the shell command:
createdb username
where "username" is your cats login username.
Then you can access your database with the command:
psql -a
which will start a command line interface to PostgreSQL connected to the database with your username. (The -a is so input is echoed to the console for saving a script log. If you don't want your input echoed, you can omit the -a, but please do include it when you are running an execution script to hand in.)
You may create other database names, but they must all start with your username. (By the way, username "wan" should not create databases starting with "wang"!)
To start a command line interface to PostgreSQL connecting to a different database, use the command:
psql -a databasename
where "databasename" is the name of your database (which should begin with your cats username).
(b) Choose a domain for a database. This will be the subject of some (but not all) future assignments, so you should try to choose something that you find interesting. Once you have chosen a domain, design a relational database schema. You are free to choose any relational database schema you like, but make sure that it contains at least three relational schemas and that at least one of the relational schemas has at least three attributes. Moreover, make sure that the relational schemas contain interrelated data, so that interesting tables can be created by joining appropriately some of the tables in your database. Finally, choose a key for each relational schema.
For this lab assignment, do not worry about how good your design is.
The goal this week is to
learn the methods for constructing a database in PostgreSQL. You will revisit
this design in a later lab assignment, once you have learned some tools for
design. For this part of the project, turn in a text file containing a brief
description of the data that your database will hold, and the relational
database schema, including the keys .
(c) Create relations for your database based on your relational database schema from part (b). Use the CREATE TABLE command to specify each relation its attributes and attribute types; see CREATE TABLE from the PostgreSQL interactive documentation. Many of the attribute types supported by PostgreSQL are listed on page 168 of the textbook. If you have an attribute that represents a date and/or time, you may want to look at the page on FAQ: Working with Dates and Times in PostgreSQL.
Turn in a script log showing an PostgreSQL session in which your relations are created successfully. Please see Recording Your Session below for details.
(d) For each relation in your database, create an execution script file containing a Copy Command and a few (approximately 5-10) records of "realistic" data. Then execute the script file from the psql command line using the \i option. Please see Creating execution script files below for more details.
Turn in a listing showing the contents of the files you created, the successful loading of the data into PostgreSQL, and the execution of "SELECT *" commands to show the contents of each relation.
(e) Write a program in any programming language you like that creates large files of records for each of your database relations. If you have available real data for your database, then your program will need to transform the data into files of records conforming to your relational database schema and to PostgreSQL's load format. The rest of you will need to write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values - strings especially - to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of:
· At least two relations with hundreds of tuples
If your application naturally includes relations that are expected to be relatively small (e.g., schools within a university), then it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:
Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script log showing the successful loading of your data into PostgreSQL.
Please see Recording Your Session below for a guide to preparing output to be submitted for this and subsequent project parts.
Components (c), (d), and (e) of this project part each tell you what should be recorded in the script log that you turn in. In this and all subsequent project parts, the material you turn in should be clearly formatted and delineated, and should include comments for any aspects that are not crystal clear. Poorly assembled or documented material will not receive full credit, even if it is correct. You also will not receive full credit if you turn in your entire large data files (or large query results in later assignments) when we ask for small samples. Other than comments, truncation, and simple formatting, it is Academic Dishonesty to edit scripts before turning them in.
For this assignment the following files should be turned in electronically using your cats account and the submit program:
|
README |
Please give your name, project part number, course number, date, the title of your project, a list and description of the files you are submitting, and any other information that will be useful for the grader. |
|
create.log |
A record of your session creating your tables. |
|
data.script |
The execution script file(s) you used to populate your small database. If you used a separate file for each table use the table name to begin the file name: like product_data.script (for relation product) |
|
data.log |
script log (i.e., a recording) of your session using data.script to populate your DB and your execution of SELECT queries on each table. |
|
generate.x |
The program code to generate your large data (x = c, or java or perl...) |
|
bigdata.script |
Snippets from each of the the script file(s) produced by your generate program and used to populate your big database. If you used a separate file for each table use the table name to begin the file name: like product_bigdata.script (for relation product) |
|
bigdata.log-part |
Representative portions of the record of your use of script files to populate your big data base. Do not include the entire script log file. It will waste space and annoy the grader. Instead include only the commands and a few records for each relation as well as the psql messages. |
To submit these files (assuming you used java to generate your data) use the syntax:
submit cmps180-pk.f02 lab1 create.log data.script data.log generate.java bigdata.script bigdata.log-part
or lab1-late if your project part is late.
You will be using both your small (part c) and large (part d) databases for the rest of the course. The idea is to use the small database to experiment on meaningful-looking data, and the large one to experiment on data of more realistic size. We suggest that you keep the two databases going in parallel by creating two sets of files and two sets of relations with slightly different names.
For the duration of the project, we suggest that you establish some kind of routine that includes reloading your database from the files created in this project part each time you want to get a "fresh" start with PostgreSQL. Remember to delete the contents of each relation (or destroy and recreate the relations) before reloading. Otherwise, unless there is a declared key (or you take APPEND out of your control file), PostgreSQL will happily append the new data to your old relation, causing your relation size to double, triple, quadruple, etc. To get rid of a table called T, issue the command:
drop table T;
If you want to get rid of all tuples in T without deleting the table itself, issue the command:
delete from T;
There are several methods for creating a typescript to turn in for your programming assignments. The most primitive way is to cut and paste your terminal output and save it in a file (if you have windowing capabilities). Another method is to use the Unix command script to record the terminal interaction. The script command records everything printed on your screen. The syntax for the command is
script [ -a ] [ filename ]
The record is written to filename. If no file name is given, the record is saved in the file typescript. The -a option allows you to append the session record to filename, rather than overwrite it. To end the recording, type
exit
For more information about script, check out its man page.
You will be using the psql command line interface to interact with your data base. See Managing a Database for information on starting using psql. Script files are text files of psql commands which can be executed like a batch file using the the \i command in psql. The syntax is \i filename entered following the psql prompt, where file name is the complete (case sensitive) name of the script file you desire to run.
To run your execution script data.script and save the script log in the file data.log, do the following:
|
script data.log |
to start saving the script log |
|
psql -a databasename |
to run PostgreSQL's command line inteface using your database |
|
\i data.script |
to import an execution script |
|
\q |
to exit pqsl (PostgreSQL's command line interface) |
|
exit |
to stop saving the script log |
The execution script file you create can consist of most any series of commands which you could enter following the psql prompt. This includes all of the SQL commands which you will be using to create, modify and test your data base. Examples include the CREATE TABLE and SELECT commands. Just like when using the psql command line interface you must terminate each SQL command in your script file with a semicolon. \i and \q do not need semicolons.
If you are recording your session using the script command described above into a script log then it is useful to start psql using the -a option so that all commands included in your execution script file will be echoed to the console and thus to your script log file. Here is an example script file that creates a table (relation) named products:
CREATE TABLE products (
productID INT, name VARCHAR(80), price NUMERIC(10,2), retailPrice NUMERIC(10,2) );
Here is an example script file that loads four tuples into the table (relation) named products created using the previous script file:
COPY products FROM stdin USING DELIMITERS '|';
1419|American Greetings CreataCard Gold V4.0|21.49|25.24
1424|Barbie(R) Nail Designer(TM)|20.74|25.99
1427|Panzer Commander|21.99|30.24
1431|Riven: The Sequel to Myst|31.99|40.24
\.
This is the format you will use for the files that load data into your tables. The USING DELIMITERS '|' and the use of '|' as a delimiter is optional. The default delimiter is the tab character. The delimited data on each line must match the attributes and their types in your table in a one to one manner and in the order they were defined in your CREATE TABLE commands. The COPY data must be terminated with '\.'
For testing, some students have found it convenient to have a separate script file to populate each of the tables. I also find it convenient have a single script file to create all of my tables and another one to drop all of my tables. Examples follow.
-- Sample Script file to Create and
-- Populate a BEERS DB -- print out the current time
SELECT timeofday();
CREATE TABLE Beers (
name VARCHAR(30),
manf VARCHAR(50)
);
COPY Beers FROM stdin USING DELIMITERS '|';
Coors|Adolph Coors
Coors Lite|Adolph Coors
Miller|Miller Brewing
Miller Lite|Miller Brewing
MGD|Miller Brewing
Bud|Anheuser-Busch
Bud Lite|Anheuser-Busch
Michelob|Anheuser-Busch
Anchor Steam|Anchor Brewing
\.
CREATE TABLE Bars (
name VARCHAR(30),
addr VARCHAR(50),
license VARCHAR(50)
);
COPY Bars FROM stdin USING DELIMITERS '|';
Joe's|123 Any Street|B7462A
Sue's|456 My Way|C5473S
\.
CREATE TABLE Sells (
bar VARCHAR(20),
beer VARCHAR(30),
price REAL
);
COPY Sells FROM stdin USING DELIMITERS '|';
Joe's|Coors|2.50
Joe's|Bud|2.50
Joe's|Bud Lite|2.50
Joe's|Michelob|2.50
Joe's|Anchor Steam|3.50
Sue's|Coors|2.00
Sue's|Miller|2.00
\.
CREATE TABLE Drinkers (
name VARCHAR(30),
addr VARCHAR(50),
phone CHAR(16)
);
COPY Drinkers FROM stdin USING DELIMITERS '|';
Bill Jones|180 Saint St.|831-459-1812
Kelly Arthur|180 Alto Pl.|650-856-2002
Fred|1234 Fifth St.|831-426-1956
\.
CREATE TABLE Likes (
drinker VARCHAR(30),
beer VARCHAR(30)
);
COPY Likes FROM stdin USING DELIMITERS '|';
Bill Jones|Miller
Bill Jones|Michelob
Kelly Arthur|Anchor Steam
Fred|MGD
\.
CREATE TABLE Frequents (
drinker VARCHAR(30),
bar VARCHAR(30)
);
COPY Frequents FROM stdin USING DELIMITERS '|';
Bill Jones|Joe's
Bill Jones|Sue's
Kelly Arthur|Joe's
\.
-- Execute some SELECT queries--
SELECT * FROM Bars;
SELECT * FROM Drinkers;
-- print out the current time
SELECT timeofday();
DROP TABLE Beers;
DROP TABLE Bars;
DROP TABLE Sells;
DROP TABLE Likes;
DROP TABLE Frequents;
DROP TABLE Drinkers;