CS245 Winter 2000 ñ Solution Set for Homework 2
This solution set was assembled from actual answers that students
provided. For many of the questions there may have been one or more approaches
to solving the problem ñ the solution set simply provides an example of answers
that received full credit for each problem. In addition, there were many high
quality solutions that were submitted and which received perfect credit. The
ones we used to assemble this solution set were chosen for their correctness
and out of convenience of file format.
Solution set provided courtesy of Sandra Ganino
Size of a disk block = 8 KB= 8 *1024 bytes = 8192 bytes
Available space per block = B = 8192 - 100 bytes (for header) = 8092 bytes
Problem 1
Sequential
For a table consisting of N records each of size b bytes, the
number of disk blocks required to store the table (when records are not
spanned) is given by the expression:
Ceiling(N/Floor(B/b))
- Customer relation: N = 10000 and b = 3000.
Therefore applying the above expression, we get
Ceiling(10000/Floor(8092/3000)) = 5000
- Accounts relation: N = 20000 and b = 1000.
Therefore applying the above expression, we get
Ceiling(20000/Floor(8092/1000)) = 2500
- Transactions relation: N = 60000 and b= 500.
Therefore applying the above expression, we get
Ceiling(60000/Floor(8092/500)) = 3750
This give us a total of 5000+2500+3750 = 11250 blocks.
Clustered
In a given block, we have 1 C, 2 A, and 6 T records giving us a total of
3000 + 2 * 1000 + 6 * 500 = 8000 bytes.
With available space of only 8092 bytes per block, we can
store only one such set of 9 records in a block. Hence, we will need 10000
blocks to store all the records in the three tables.
Random
Even though the distribution of the
records among the blocks might be different, this organization also stores only
1 C, 2 A, and 6 T records per block ñ same as the clustered organization.
Hence, the number of blocks required is once again 10000.
Problem 2
Probe
In all the three organization, only 1 I/O will be required because of an
index on the search key C.N.
Ordered Scan
In all three organizations, the index on C.N can be used to sequentially scan the
entire customer relation reading in only those blocks that contain one or more
customer records (since we ignore disk accesses to load index blocks assuming
them to be in memory):
- Sequential: 5000 I/0s because there are 5000 blocks storing
customer record
- Clustered: 10000 I/0s because there are 10000 blocks storing
customer records
- Random: 10000 I/0s because there are 10000 blocks storing
customer records
Plain Scan
As in the ordered scan, the index on C.N can be used to
read in the customer records. Since the index will be sorted on the key value
(namely C.N), the customer records will automatically be retrieved in sorted
order. Therefore the number of I/O for this kind of query is the same as for
the ordered scan.
Join
A common mistake observed while grading the solutions was a repeated
misinterpretation of the [join] operation as requiring the complete information
relating to all customers. Even though this is true in general, the particular
definition of [join] used in the question requires you to retrieve only the
information pertaining to a specific customer identified by his name.
Clustered: The clustered organization is most efficient in performing
this operation since the a customer record and all itís associated account and
transaction records are stored together in a single disk block. So, given a
customer name, the index on C.N can be used to retrieve the relevant customer
block and generate the tuples of the join. ñ 1 I/O
Random and Sequential: In the case of both the sequential and the random
organizations, the steps to compute the join are as follows:
- Use the index on C.N to
retrieve the block containing the relevant customer record ñ 1 I/O
- Move the customer
record to the working area.
- Use the index on A.N to
retrieve the block containing one of the two account records associated
with that particular customer ñ 1 I/O
- Move the retrieved
account record into the working area.
- Using the account
identifier (A.I) in that account record and the index on T.I, retrieve a
block containing one of the 3 transactions records associated with that
account ñ 1 I/O
- ìJoinî the customer
record in the working area with this transaction record and return one of
the tuples of the [join].
- Repeat steps 5 and 6
two more times to retrieve the remaining two transaction records
associated with that account. Generate two more tuples of the result ñ 2
I/Os
- Repeat steps 3-7 once
more using the second account record to generate the remaining tuples of
the result ñ 4 I/O
This gives us a total of 9 I/Os.
|
storage cost (Blks) |
I/Os for [probe] |
I/Os for [ord scan] |
I/Os for [pln scan] |
I/Os for [join] |
| [sequential] |
11250 |
1 |
5000 |
5000 |
9 |
| [clustered] |
10000 |
1 |
10000 |
10000 |
1 |
| [random] |
10000 |
1 |
10000 |
10000 |
9 |