Homework #4

Assigned: December 4th
Due: Friday, December 7th at 11:59 PM

Please read the information on how to submit homework online. Hard copies of homework will not be accepted.

All work on this homework must be your own. Please read (and follow!) the academic honesty policy for this class.

  1. On the Mac and Windows, double-clicking a file automatically launches the program that knows how to handle the file and passes the file to that program as a parameter. List two different ways the operating system could know which program to run. Which approach do you believe is better? How could you change the mapping? For example, suppose you wanted to open text files with emacs rather than Microsoft Word.
  2. Defragmentation is the process of reorganizing the data on a disk to improve performance.
    1. Clearly, defragmentation is necessary for contiguous allocation. How can it improve performance for file systems that use indexed allocation?
    2. Describe (in high-level outline form) how you might defragment an ext2 / ext3 file system. How could you make sure the file system remains consistent throughout the whole process? In other words, how could you prevent a crash during defragmentation from corrupting the file system and losing data? HINT: think about the order in which you might want to do disk operations to move a file to a new location on the disk.
  3. Consider a Unix (FFS / ext2 / ext3) file system with 12 direct pointers in each inode and 4 KB file blocks, and a maximum file system size of 2 terabytes (2000 GB).
    1. How much overhead would be needed to store index blocks for a 100 MB file, not including the inode itself? What percentage of total file size would the overhead be? Remember that blocks must be allocated in their entirety if any part of the block is needed.
    2. On average, how many 4 KB blocks would have to be read to get a random block from the (100 MB) file? Again, assume the inode is already in memory and doesn't need to be read from disk. HINT: figure out how many of the file's blocks are reached through the direct, single indirect, double indirect, and triple indirect pointers, and compute the average from this information.
  4. Recently, some file systems have begun to include continuous data protection (CDP), a technique that keeps a copy of all data ever written to the file system. The most recently written data for a file is in the current view of the file system; older data (including files that might have been deleted) is made available via time travel access to the file system. A similar technique is used with the OldFiles directory on unix.ic.
    1. What are the advantages to such an approach?
    2. What are the disadvantages?
    3. How could you reduce the space used by such a system? Are there simple criteria you might use to select some files that should be permanently erased?
  5. [From Stallings' OS textbook, 5th edition] Directories can be implemented as either special files that can only be accessed in limited ways (inserting a file, scanning for a file, etc.) or as ordinary data files. What are the advantages and disadvantages of each approach?
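For checking the hand arithmetic in problem 3, here is a minimal Python sketch. It assumes 4-byte (32-bit) block pointers, which is consistent with the 2 TB limit since 2000 GB / 4 KB is about 5.2 × 10^8 blocks, below 2^32. It also assumes that 100 MB means 100 × 2^20 bytes and that no index blocks are cached, so reaching a data block through an n-level indirect pointer costs n + 1 reads. The names ceil_div, index_blocks, and average_reads are hypothetical helpers for illustration, not part of any file system API.

```python
BLOCK_SIZE = 4096                   # 4 KB file blocks (given in problem 3)
DIRECT = 12                         # direct pointers per inode (given in problem 3)
POINTER_SIZE = 4                    # assumption: 32-bit block numbers
PTRS = BLOCK_SIZE // POINTER_SIZE   # block pointers per index block (1024)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division: whole blocks needed to hold a units at b per block."""
    return -(-a // b)


def index_blocks(n_data: int) -> int:
    """Index (indirect) blocks needed for n_data data blocks, excluding the inode."""
    remaining = max(0, n_data - DIRECT)
    total = 0
    for level in (1, 2, 3):  # single, double, triple indirect
        covered = min(remaining, PTRS ** level)
        if covered == 0:
            break
        # A level-L tree covering `covered` data blocks needs
        # ceil(covered / PTRS**k) index blocks in each layer k = 1 .. L
        # (k = 1 points at data blocks, k = L is the single top block).
        total += sum(ceil_div(covered, PTRS ** k) for k in range(1, level + 1))
        remaining -= covered
    if remaining:
        raise ValueError("file too large for this inode layout")
    return total


def average_reads(n_data: int) -> float:
    """Mean disk reads to fetch one random data block, inode already in memory."""
    # (blocks reachable at this tier, reads per block: index blocks + data block)
    tiers = [(DIRECT, 1), (PTRS, 2), (PTRS ** 2, 3), (PTRS ** 3, 4)]
    remaining, reads = n_data, 0
    for capacity, cost in tiers:
        n = min(remaining, capacity)
        reads += n * cost
        remaining -= n
    return reads / n_data


if __name__ == "__main__":
    file_bytes = 100 * 2 ** 20          # assumption: 100 MB = 100 * 2**20 bytes
    n = ceil_div(file_bytes, BLOCK_SIZE)
    idx = index_blocks(n)
    overhead = idx * BLOCK_SIZE
    print(f"{n} data blocks, {idx} index blocks, "
          f"{overhead} bytes ({100 * overhead / file_bytes:.4f}% of file size)")
    print(f"average reads per random block: {average_reads(n):.4f}")
```

Changing BLOCK_SIZE, DIRECT, or POINTER_SIZE lets you see how the overhead and average read count shift under other inode layouts.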

Last updated 4 Dec 2007 by Ethan L. Miller