Information Retrieval (Under Construction)

Class location: Porter Acad 250    Time: 2pm-4pm T/TH

Instructor: Yi Zhang  (yiz+ism260 @ soe.ucsc.edu)

Office hours: 1:00-1:30pm Tuesday or email for appointment

Office: Room 565, Engineering Building 2

WebCT (for homework submission)

 

The large amount of unstructured text created every minute poses great opportunities and challenges, because it is difficult for machines to understand natural language and users' information needs. This course covers the basic principles and practical algorithms used for information retrieval and text mining, including statistical characteristics of text, several important retrieval models, text clustering, text classification, text filtering, web analysis, information extraction, and other related topics.

The intended audience is graduate students. Students will learn how search engines work, how to build their own search engine, and how to improve existing search engines. Students will also get hands-on project experience developing real-world applications, such as intelligent software tools for personalized search, learning from user feedback, website enhancement, and summarization and mining of emails, blogs, scientific literature, news, or call center data. The course serves as a first course for students who want to do leading-edge research in information retrieval or text mining. It also opens the door to the increasing number of job positions at search technology companies, such as Google and Yahoo, as well as in the knowledge management divisions of major companies.

The course is lecture based. Students are expected to read some book chapters and research papers. Students will design and implement text retrieval and text filtering systems. This project will be divided into several steps, each evaluated as bi-weekly homework. Students will be evaluated based on homework, final exam, course project, and course participation.

Required Textbook:

MIR: Modern Information Retrieval (errata)

Recommended Readings:

MG: Managing Gigabytes

FSNLP: Foundations of Statistical Natural Language Processing  (UCSC online copy)

ESL: The Elements of Statistical Learning

IR: Information Retrieval (online)

Grading: Grades will be based on:

Item                   Due Date    Value
Class Participation                5%
Assignments                        15%
Reading abstracts                  20%
Presentation                       5%
Course project                     35%
Mid. Exam                          20%

 

 

FAQ:

1. How do I access course material from outside the UCSC intranet?