Information Retrieval (Under Construction)

Final exam date: 4-7pm, Thursday June 12

Location:

UCSC Main campus: Baskin Engineering 156

SVC-2069, UCSC Silicon Valley Center at NASA (Mountain View)

Time: 6-7:45pm T/Th

Instructor: Yi Zhang  (yiz  + 260 @ soe.ucsc.edu)

Office hours at NASA: 5-5:45pm Thursday (at the instructor’s office)

Office hours at UCSC: 5-5:45pm Tuesday (E2-565)

TA: Anita KrishnaKumar (anita (at) soe.ucsc.edu)

TA office hour: Wed 11am-12pm (E2-475 or E2-477)

Online video lecture recordings

The large amount of unstructured text created every minute presents great opportunities and challenges, because it is difficult for machines to understand natural language and users’ information needs. This course covers the basic principles and practical algorithms used for information retrieval and text mining, including statistical characteristics of text, several important retrieval models, text clustering, text classification, text filtering, web analysis, information extraction, and other related topics.

The intended audience is graduate students. Students will learn how search engines work, how to build their own search engine, and how to improve existing search engines. Students will also get hands-on project experience developing real-world applications, such as intelligent software tools for personalized search, learning from user feedback, website enhancement, and summarization and mining of emails, blogs, scientific literature, news, or call center data. The course serves as a first course for students who want to do leading-edge research in information retrieval or text mining. It also opens the door to the increasing number of positions at search technology companies, such as Google and Yahoo, as well as in the knowledge management divisions of major companies.

The course is lecture based. Students are expected to read selected book chapters and research papers. Students will design and implement text retrieval and text filtering systems. This project will be divided into several steps, each evaluated as a bi-weekly homework assignment. Students will be evaluated based on homework, the final exam, the course project, and course participation.

 

WebCT (for homework submission)

FSNLP: Foundations of Statistical Natural Language Processing  (UCSC online copy)

IR: Information Retrieval (online)

IIR: Introduction to Information Retrieval

III: Information Retrieval Interaction

MG: Managing Gigabytes

ESL: The Elements of Statistical Learning

Grading: Grades will be based on:

 

Item                   Due Date   Value
Class Participation               5%
Assignments                       25%
Presentation                      5%
Course project                    25%
Final Exam                        40%

 

 

FAQ:

1. How can I access course material from outside the UCSC intranet?