State all assumptions.
Tuples stored as fixed length, fixed format records, length = 3000 bytes
Tuples contain key attribute C.N (customer number), length 10 bytes;
other fields and record header make up rest
There are 10,000 C tuples
There is an index on attribute C.N.
Tuples stored as fixed length, fixed format records, length = 1000 bytes
Tuples contain attribute A.N (number of customer who owns account),
length 10 bytes;
Tuples also contain attribute A.I (account identifier), lenght 10 bytes;
other fields and header make up rest of the record
There are 20,000 A tuples
There is an index on attribute A.N.
Tuples stored as fixed length, fixed format records, length = 500 bytes
Tuples contain attribute T.I (the identifier of the account involved),
length 10 bytes; other fields (+header) make up rest
There are 60,000 T tuples.
There is an index on attribute T.I.
While the number of accounts associated with each customer vary,
for evaluation purposes we may assume that each customer has 2 accounts,
and each account has 3 transactions records associated with it.
The records are to be placed in a collection of 8 KB disk blocks that
have been reserved to exclusively hold C, A, or T records, or
combinations of those records. (That is, there are no other types of
records in the blocks we are discussing in this problem.) Each block
uses 100 bytes for its header; records are not spanned.
Three disk organization strategies are being considered:
We are also told there are four types of queries that constitute the vast majority of the workload:
Assume you only have one buffer page (8 KB) in memory; thus, each time you need a block it counts as one IO (unless the request is for exactly the same block you already have in the buffer). You also have enough memory to hold a single working copy of a C record, of an A record, and of a T record. So, for example, to do a join, you first read in a block containing a C record, and copy the record to the working C area. Then you read in the block containing an A record into your 8K buffer, and join the two records.
Also, ignore any IOs due to index accesses. That is, you may assume the indexes reside in memory.
| storage || IOs for | IOs for | IOs for | IOs for |
| cost(Blks)|| [probe] | [ord scan]| [pln scan]| [join] |
----------------------------------------------------------------------------
[sequential] | || | | | |
----------------------------------------------------------------------------
[clustered] | || | | | |
----------------------------------------------------------------------------
[random] | || | | | |
----------------------------------------------------------------------------
HINT: Do not concern yourself with events that are very unlikely.
For example, say you want to get the three account records for a particular
customer, and that these records are randomly placed on a large
number of disk blocks. There is some chance that the account
records you want happen to be in the same block (just by luck).
However, for this problem you may assume this probability is negligible
and it will take you exactly three IOs to get those three records.