Programming Project #4- NOT UPDATED FOR SPRING 07

CMPS 111, Spring 2006

Assigned: June 1st
Due: TBA

Remember: your project must be turned in online.

Purpose

The main goal for this project is to use a combination of system calls and user program to maintain Merkle hash trees in MINIX 3. You must implement:

As with the previous project, you will experiment with operating system kernels, and to do work in such a way that may very well crash the (simulated) computer. You'll get experience with modifying a kernel, and may end up with an OS that doesn't work, so you'll learn how to manage multiple kernels, at least one of which works.

You should also read over the general project information page before you start this project. In them, you will find information about MINIX 3 as well as general guidelines and hints for projects in this class.

Basics

The goal of this assignment is to give you additional experience in modifying MINIX 3 and to gain some familiarity with file systems, system calls, and Merkle hash trees. You should make minimal changes to the kernel, with most of your code being written in a user-level program.

This assignment requires you to implement Merkle hash trees, which can be used to easily detect files that have been modified. The hash of the files and subdirectories in a directory are stored in a file named .hashes in the directory. Calculating the hash of a file is relatively straightforward—simply use the functions in md5sum.c to step through all of the bytes, generating a hash value. Calculating a hash value of a directory is similar; the hash of a directory is the hash of the .hashes file in the directory. If a file's content changes, its hash value will change as well. This will cause a change to the .hashes file in its directory, which will result in the directory's hash value changing. This change will then "ripple" up the directory tree until it reaches the root. The user process you write will have to keep the hash tree up to date, recursively going up the directory tree (using the .. link to a directory's parent) to keep the hash values correct. The program should take as an argument a "root" directory below which it initializes the hash tree (checks correctness on startup), and then go into a loop calling the system for the names of changed files or directories, ignoring names that don't start with the "root," and making updates to the tree below the root when they occur. Note that the .hashes file must be kept sorted numerically (alphabetically) and should include both hash values (in hexadecimal) and file names. A sample .hashes file is available.

So far, all of the code can (and should!) be implemented at user level. However, the user process that manages these changes needs to know when it should recheck a file. To do this, you need to implement a system call that returns the name of each file that is closed (after being written), created, unlinked, truncated or renamed, or when a directory is created (mkdir) or deleted (rmdir). You only need a single system call to do this; all of the changes can be reported to the same call, with the user process figuring out the difference if necessary. The system call should buffer up file names, returning one name per call to the system call. It may be difficult to figure out the file name when the file is closed; you should consider putting the name into a buffer when the file is opened for writing, and then returning the name when the file is closed. Don't worry about files that are opened for writing but never actually written; they can be returned and will be ignored by the user program if needed. However, your system call should ignore .hashes files. You may use a set of fixed-size buffers if you like; 200 buffers of 150 characters each should be plenty of space (we won't test the program by sending lots of files at it). If you run out of buffer space, you may throw out a randomly-selected buffered name. This way, your code will work correctly even if nobody calls the system call.

Deliverables

You must hand in a compressed tar file of your project directory, including your design document. You must do a "make clean" before creating the tar file. In addition, you should include a README file to explain anything unusual to the teaching assistant. Your code and other associated files must be in a single directory; the TA will copy them to his MINIX installation and compile and run them there. You should have two subdirectories in your tar file below your main directory: one containing the kernel source files from the servers/fs directory, and the other containing your user program.

Do not submit object files, assembler files, or executables. Every file in the tar file that could be generated automatically by the compiler or assembler will result in a 5 point deduction from your programming assignment grade.

Your design document should be called design.txt (if plain ASCII text, with a maximum line length of 75 characters) or design.pdf (if in Adobe PDF), and should reside in the project directory with the rest of your code. Formats other than plain text or PDF are not acceptable; please convert other formats (MS Word, LaTeX, HTML, etc.) to PDF. Your design document should describe the design of your assignment in enough detail that a knowledgeable programmer could duplicate your work. This includes descriptions of the data structures you use, all non-trivial algorithms and formulas, and a description of each function including its purpose, inputs, outputs, and assumptions it makes about the inputs or outputs. A sample design document is available on the course web page.

Hints

We assume that you are already familiar with makefiles and debugging techniques from earlier classes such as CMPS 101 or from the sections held the first week of class. If not, this will be a considerably more difficult project because you will have to learn to use these tools as well.

This project doesn't require a lot of coding (typically fewer than 200 lines of code), but does require that you understand how to use MINIX and how to use basic system calls. You're encouraged to go to the class discussion section or talk with the course staff during office hours to get help if you need it.

You should do your design first, before writing your code. To do this, experiment with the existing shell template (if you like), inserting debugging print statements if it'll help. It may be more "fun" to just start coding without a design, but it'll also result in spending more time than you need to on the project.

IMPORTANT: As with all of the projects this quarter, the key to success is starting early. You can always take a break if you finish early, but it's impossible to complete a 20 hour project in the remaining 12 hours before it's due....

Project groups

You may do this project with a project partner of your choice. However, you can't switch partners if you already had a partner for Project #2. If you choose to work with a partner (and we encourage it), you both receive the same grade for the project. One of you should turn in a single file called partner.txt with the name and CATS account of your partner. The other partner should turn in files as above. Please make sure that both partners' names and accounts appear on all project files.


Last updated 23 Apr 2007 by Ethan L. Miller