Here are some text mining project ideas from Yi Zhang.

1. Sentiment classification: classifying review documents according to the polarity of their prevailing opinion (favorable or unfavorable). (Corpus: Epinions review data; text classification)
2. Email classification: automatically classify each person's emails into that person's folders. (Corpus: Enron email; hierarchical text classification)
3. Study how to choose text classification settings (feature selection method, number of features, classification algorithm) for different data sets. (Corpus: several data sets with very different characteristics and class levels)
4. Topic detection and tracking: discovering and threading together topically related material in streams of data such as newswire and broadcast news. (Online text clustering)
5. Assign Gene Ontology (GO) terms to full-text documents from two years of three MEDLINE journals. (Text classification)
6. Favorite page recommendation. (Text classification)
7. Extract diseases and treatments from bioscience literature. (Entity classification using a classification algorithm such as NN, or an information extraction algorithm such as HMM)
8. Using the web to improve text classification accuracy.

Except for task 6, I have the corpus and the right answers for students to evaluate the algorithms.

I hope this list is useful for your class. Please let me know if you need more information.

Regards,
Yi