|
In this assignment, you will learn how to design a DB schema for the beer sales management system.
The DB schema consists of five relations: Frequents, Bar, Serves, Likes and Drinker.
//-- Frequents Relation --
//drinker: name of the drinker
//barid: name for the bar
//day: date on which a drinker goes to the bar
Frequents(drinker, barid, day)
//-- Bar Relation --
//barid: name for the bar
//address: address of the bar
//capacity: maximum number of people the bar can host
//owner: name of the person who owns the bar
Bar(barid, address, capacity, owner)
//-- Serves Relation --
//barid: name for the bar
//beer: brand name for a beer
//price: price for a bottle of beer in a bar
Serves(barid, beer, price)
//-- Likes Relation --
//drinker: name of the drinker
//beer: brand name for a beer that the drinker drank on a particular day
//day: date
Likes(drinker, beer, day)
//-- Drinker Relation --
//drinker: name of the drinker
//age: age of the drinker
//occupation: drinker's occupation
//salary: yearly salary of the drinker
Drinker(drinker, age, occupation, salary)
Assumptions about the schema:
1) The primary keys are underlined, note that in some cases they consists of several attributes. A primary key on table A should be defined as a foreign key on table B if it appears as an attribute in table B, e.g. drinker is a primary key on the Drinker table and a foreign key on the Likes table.
3) Assume each bar has only one owner and same owner may have multiple bars.
4) Salary is the yearly salary for the drinker.
5) Data type definition:
- capacity, age are non-negative integers; day is an integer between 1 and 7 (including 1 and 7).
- drinker, owner, beer, occupation, barid and address are of string type.
- price and salary are of float type and they are non-negative.
6) All data fields should be not null.
- Part 1: setup the DB: Read Postgresql Manual Chapter 5 (5.1-5.4) for a detailed description about how to define the DB tables. Also read Chapter 8 for all the supported data types in Postgresql.
Create five tables with appropriate data types. Some useful commands can be searched through Postgresql interactive user manual, such as CREATE TABLE.
Define constraints of primary keys, foreign keys, as well as proper field values, e.g. age should be a positive integer. 5.4 explains about the constraints. Also note that because the tables have referential dependencies, you should follow order when you create the tables, e.g. drinker in the Frenquents table is a foreign key on the Drinker table, so you'd better create the Drinker table first, otherwise, psql might complain that "relation Drinker does not exist". The above give table definitions are not in the right order, you should find out the order by yourself.
Submissions
- Justification about the data type selections. Include this in a README file. There is no standard answer for your data type selection, we just want to see how you make design decisions with a real world problem. Note that the selection of the data types are very important because it can affect DB storage utilization, query processing performance, etc. Please design your string length, and float type.
- Script files that you use to create the tables, each table should have one script with the name create_tablename.script, e.g. create_frequents.script for creating Frequents table. You will submit create_frequents.script, create_bar.script, create_serves.script, create_likes.script, and create_drinker.script in their right order.
- Execute the five script files and record the session in create.log, if you dont know how to record your session, please read "record your session" in the lab policies document. Just combine the five scripts into one and record the session to create this log file.
- Part 2: populate the DB with real data: After we've set up the DB, we need to populate data without violating the constraints. You will practise three different ways here. For people's name, you can use your friends' names; for bar names, you can use google to search "Santa Cruz bars".
- Use INSERT command: you need to mannually add at least 2 record to each table using INSERT.
- Use COPY command to import data from stdin: COPY command provides a way to import data into DB from external data sources, either a data file, or stdin. For details you can read man page for COPY. You need to use COPY command to import data from stdin with the delimiter of "*". Add at least 6 records for each table.
- The 3rd way to populate the data is use COPY to import from an external data file. But since this requires superuser permission, we do not practise here. But it is very useful in future when you are a DBA yourself, e.g. export/dump data into an external data file as backup and later import/reload those data into the same DB or another DB.
Submissions
- Five script files, each of which populates data for one table. The naming convention for the script files is like this: populate_tablename.script. Each script file should include the usage of the first two ways of populating the data, using INSERT and using COPY from stdin. A sample script file below shows the basic idea.
- Execute the five script files and record your session of execution in populate.log.
Sample Script File and Data File for Your Reference
>> Create a Beers table and populate it with data from stdin:
-- Sample Script file to Create and Populate a Beers table
SELECT timeofday(); -- print out the current time (you can omit this line if you want)
CREATE TABLE Beers ( -- create Beers table that has two columns: name, a string, and manf, anther string
name VARCHAR(30),
manf VARCHAR(50)
);
INSERT INTO Beers VALUES('Asahi Draft Beer', 'Asahi Japan'); --use INSERT to populate data
COPY Beers FROM stdin USING DELIMITERS '|'; -- copy data from stdin using delimiter '|'
Coors|Adolph Coors
Coors Lite|Adolph Coors
Miller|Miller Brewing
Miller Lite|Miller Brewing
MGD|Miller Brewing
Bud|Anheuser-Busch
Bud Lite|Anheuser-Busch
Michelob|Anheuser-Busch
Anchor Steam|Anchor Brewing
\. -- end of beers.script
| Items to submit |
Description |
| README |
Your name, CATS username, timestamp + justification for data type selection. |
create_frequents.script,create_bar.script,create_serves.script, create_likes.script, create_drinker.script
|
Five script files, each of which creates one table. |
| create.log |
Session log when you combine the five create scripts and run them together. |
| populate_frequents.script, populate_bar.script, populate_serves.script, populate_likes.script, populate_drinker.script |
Five script files, each of which populates data for one table. |
| populate.log |
Session log when you combine the five popuate scripts and run them together. |
You should use submit program to submit the assignments. If you dont know how to use submit, please read "how to use submit" for details. Sample submit commands for this lab assignment:
submit cmps180-wt.s04 lab1 README //to submit README
submit cmps180-wt.s04 lab1 create_frequents.script create_bar.script, create_serves.script //to submit three files together
peek cmps180-wt.s04 lab1 README //to examine the README file you submitted
Common Problems and Discussion in the Answers
- Justification on the data type selections
SMALLINT, INT, and BIGINT: SMALLINT fields can store numbers ranging from -32768 to +32767
(this actual size may vary slightly depending on your computer type; the previous is the most common system integer size).INT fields can store larger numbers from -2147483648 to +2147483647. BIGINT field types are for anything bigger and have no size limit. So, SMALLINT would be enough for most integer values in our schema, e.g. age, day, capacity.
- CHAR, VARCHAR, and TEXT(from pgsql mailing list): CHAR(n) uses fixed length, i.e. for a string that is shorter than n, it will be padded using the spaces; VARCHAR(n) is best for storing a string at its actual length with a fixed upper bound; TEXT is best for text that has no predefined upper limit(the physical limit for each table field still applies, e.g. 1GB for Postgresql). So, it seems VARCHAR is most suitable for our case. Someone on the mailing list suggests that "there is no performance difference for CHAR and VARCHAR in Postgresql, unless you think about those extra I/O caused by the padding spaces". If you want to read more about CHAR and VARCHAR, read this post and a tutorial about SQL Server Performance tuning.
- MONEY, NUMERIC/DECIMAL, REAL, DOUBLE PRECISION, and FLOAT: MONEY is depreciated and not encouraged;
The type NUMERIC/DECIMAL can store numbers with up to 1000 digits of precision and perform calculations exactly. It is especially recommended for storing monetary amounts and other quantities where exactness is required;
On most platforms, the REAL type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits; The DOUBLE PRECISION type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits.
PostgreSQL accepts FLOAT(1) to FLOAT(24) as selecting the REAL type, while FLOAT(25) to FLOAT(53) select DOUBLE PRECISION. So, it is reasonable to use NUMERIC/DECIMAL(n, 2) to store price and salary because we have data like 2.50 for beer price and 23000.45 for yearly salary.
- Common problems
(ranked according to popularity)
-
- Forget NOT NULL constraint on some table attributes: we need this constraint to make sure about data integrity.
- Forget salary>0 or salary>=0 (0 for no income), and price>0 or price>=0 (0 for free beer): salary and price should be non-negative in the real world.
- Used wrong delimiter for populate_table.script files: you are asked to use * instead of |.
- Wrong definition of foreign keys on beer and day: since we dont have a table for Beer information (although in the real world, we usually have a seperate table for Beer) where beer is the primary key, we dont need a foreign key definition on beer. This can be a problem since we cannot make sure a beer in the Likes table MUST appear in the Serves table. But we just loose this constraint here. Similar reason for not having a foreign key on day.
- How to interpret your grade
- This assignment will have 3 points towards the 15 points for all 4 lab
assignments. The grade is based on a full score of 100.
An example is like this:
ggz (65) //65 out of 100
-error1 -x //minus x points because of reason error1
-comment1 +y //add y points because it is a wise design choice
the user ggz got final point of 0.65*3=1.95 points towards his final grade
-
|