Programming Project #4
CMPS 111, Spring 2006
Assigned: June 1st
Due: Friday, June 9th at 8:00 PM
(automatic extension to Sunday, June 11th at 8:00 PM)
Remember: your project must be turned in online.
Purpose
The main goal for this project is to use a combination of system calls and user program to maintain Merkle hash trees in MINIX 3. You must implement:
- A system call that returns the path to a file that has been modified.
- A user program that recalculates the MD5 hash value for a file or directory, and recursively adjusts the hash values for parent directories (see below for details).
As with the previous project, you will experiment with operating system kernels, and to do work in such a way that may very well crash the (simulated) computer. You'll get experience with modifying a kernel, and may end up with an OS that doesn't work, so you'll learn how to manage multiple kernels, at least one of which works.
You should also read over the general project information page before you start this project. In them, you will find information about MINIX 3 as well as general guidelines and hints for projects in this class.
Basics
The goal of this assignment is to give you additional experience in modifying MINIX 3 and to gain some familiarity with file systems, system calls, and Merkle hash trees. You should make minimal changes to the kernel, with most of your code being written in a user-level program.
This assignment requires you to implement Merkle hash trees, which can be used to easily detect files that have been modified. The hash of the files and subdirectories in a directory are stored in a file named .hashes in the directory. Calculating the hash of a file is relatively straightforward—simply use the functions in md5sum.c to step through all of the bytes, generating a hash value. Calculating a hash value of a directory is similar; the hash of a directory is the hash of the .hashes file in the directory. If a file's content changes, its hash value will change as well. This will cause a change to the .hashes file in its directory, which will result in the directory's hash value changing. This change will then "ripple" up the directory tree until it reaches the root. The user process you write will have to keep the hash tree up to date, recursively going up the directory tree (using the .. link to a directory's parent) to keep the hash values correct. The program should take as an argument a "root" directory below which it initializes the hash tree (checks correctness on startup), and then go into a loop calling the system for the names of changed files or directories, ignoring names that don't start with the "root," and making updates to the tree below the root when they occur. Note that the .hashes file must be kept sorted numerically (alphabetically) and should include both hash values (in hexadecimal) and file names. A sample .hashes file is available.
So far, all of the code can (and should!) be implemented at user level. However, the user process that manages these changes needs to know when it should recheck a file. To do this, you need to implement a system call that returns the name of each file that is closed (after being written), created, unlinked, truncated or renamed, or when a directory is created (mkdir) or deleted (rmdir). You only need a single system call to do this; all of the changes can be reported to the same call, with the user process figuring out the difference if necessary. The system call should buffer up file names, returning one name per call to the system call. It may be difficult to figure out the file name when the file is closed; you should consider putting the name into a buffer when the file is opened for writing, and then returning the name when the file is closed. Don't worry about files that are opened for writing but never actually written; they can be returned and will be ignored by the user program if needed. However, your system call should ignore .hashes files. You may use a set of fixed-size buffers if you like; 200 buffers of 150 characters each should be plenty of space (we won't test the program by sending lots of files at it). If you run out of buffer space, you may throw out a randomly-selected buffered name. This way, your code will work correctly even if nobody calls the system call.
Deliverables
You must hand in a compressed tar file of your project directory, including your design document. You must do a "make clean" before creating the tar file. In addition, you should include a README file to explain anything unusual to the teaching assistant. Your code and other associated files must be in a single directory; the TA will copy them to his MINIX installation and compile and run them there. You should have two subdirectories in your tar file below your main directory: one containing the kernel source files from the servers/fs directory, and the other containing your user program.
Do not submit object files, assembler files, or executables. Every file in the tar file that could be generated automatically by the compiler or assembler will result in a 5 point deduction from your programming assignment grade.
Your design document should be called design.txt (if plain ASCII text, with a maximum line length of 75 characters) or design.pdf (if in Adobe PDF), and should reside in the project directory with the rest of your code. Formats other than plain text or PDF are not acceptable; please convert other formats (MS Word, LaTeX, HTML, etc.) to PDF. Your design document should describe the design of your assignment in enough detail that a knowledgeable programmer could duplicate your work. This includes descriptions of the data structures you use, all non-trivial algorithms and formulas, and a description of each function including its purpose, inputs, outputs, and assumptions it makes about the inputs or outputs. A sample design document is available on the course web page.
Hints
- START EARLY! You should start with your design, and check it over with the course staff.
- Experiment! You're running in an emulated system—you can't crash the whole computer (and if you can, let us know...).
- You may want to edit your code outside of MINIX (using
your favorite text editor) and copy it into MINIX to
compile and run it. This has several advantages:
- Crashes in MINIX don't harm your source code (by not writing changes to disk, perhaps).
- Most OSes have better editors than what's available in MINIX.
- START EARLY!
- Look over the operating system code before writing your design document (not to mention your code!). Leverage existing code as much as possible, and modify as little as possible.
- A good place to modify the system calls is right before they exit. You'll want to modify do_close, do_mkdir, etc. right before they return to the caller. Don't worry about returning a file that wasn't really modified—your user program can simply ignore it if it discovers no changes were made.
- For this assignment, you should write fewer than 200 lines of kernel code. You might want to look at the kernel code to learn how to implement a system call by seeing how it's done already.
- Waiting for something to happen and then resuming the process afterwards is very similar to what the select() system call does (in fs/select.c). You can probably reuse much of that code for your system call. In particular, look at what suspend() and revive() do. Use the same mechanisms you used in Project #3 to return strings from the kernel.
- Write the hash tree generator first, without the system call. You can still test whether the generator works without any system calls.
- Use RCS to keep multiple revisions of your files. RCS is very space-efficient, and allows you to keep multiple "coherent" versions of your source code and other files (such as Makefiles).
- Did we mention that you should START EARLY!
We assume that you are already familiar with makefiles and debugging techniques from earlier classes such as CMPS 101 or from the sections held the first week of class. If not, this will be a considerably more difficult project because you will have to learn to use these tools as well.
This project doesn't require a lot of coding (typically fewer than 200 lines of code), but does require that you understand how to use MINIX and how to use basic system calls. You're encouraged to go to the class discussion section or talk with the course staff during office hours to get help if you need it.
You should do your design first, before writing your code. To do this, experiment with the existing shell template (if you like), inserting debugging print statements if it'll help. It may be more "fun" to just start coding without a design, but it'll also result in spending more time than you need to on the project.
IMPORTANT: As with all of the projects this quarter, the key to success is starting early. You can always take a break if you finish early, but it's impossible to complete a 20 hour project in the remaining 12 hours before it's due....
Project groups
You may do this project with a project partner of your choice. However, you can't switch partners if you already had a partner for Project #2. If you choose to work with a partner (and we encourage it), you both receive the same grade for the project. One of you should turn in a single file called partner.txt with the name and CATS account of your partner. The other partner should turn in files as above. Please make sure that both partners' names and accounts appear on all project files.
Last updated 1 Jun 2006 by Ethan L. Miller