Stay Informed:
Baskin Engineering COVID-19 Information and Resources
Campus Roadmap to Recovery
Zoom Links: Zoom Help | Teaching with Zoom | Zoom Quick Guide

The Case for Sort-Based Query Processing

Speaker Name: 
Goetz Graefe
Speaker Title: 
Principal Researcher
Speaker Organization: 
Google
Start Time: 
Friday, January 29, 2021 - 12:00pm
End Time: 
Friday, January 29, 2021 - 1:05pm
Location: 
https://ucsc.zoom.us/j/94297752372?pwd=WnNyS1RGWjYyUHpRZ0VHTUxzclVzZz09
Organizer: 
Shel Finkelstein

Abstract: 

Common wisdom holds that hash-based query execution algorithms are the best choice for unsorted inputs, e.g., intermediate query results, and that any efficient database query processor must include hash-based query execution algorithms, e.g., hybrid hash join and hash aggregation. In contrast, this presentation argues that sort-based query execution can be as efficient as hash-based query execution, even for large unsorted inputs; requires less memory, less overflow, and less CPU effort for sorted and partially sorted inputs; exploits sort orders commonly found in index structures, in column stores, and in intermediate query results; and provides operational advantages, e.g., for progress estimation, for resource management, and for pause-and-resume or pause-migrate-and-resume with minimal wasted effort. In other words, hash-based algorithms are not required for database query processing because equivalent sort-based algorithms, carefully designed and implemented, are always just as efficient and very often more efficient. Parallel external merge sort can serve as the only stop-and-go operation, also known as a pipeline breaker, in a database query execution engine.

Bio: 

Goetz Graefe has worked on database query optimization, query execution, indexing algorithms, database utilities, logging and recovery, and transactional concurrency control. He has published survey papers on query optimization, query execution, sorting, b-tree concurrency control, and b-tree recovery; and monographs on modern b-tree techniques, instant recovery based on write-ahead logging, and transactional concurrency control. He has invented and written the Cascades optimizer framework adopted in many products as well as the exchange operator to encapsulate query parallelism, also found in many products. In 2017, he received the ACM SIGMOD Edgar F. Codd Innovations Award. He graduated from UW-Madison with a Ph.D. in computer science in 1987.

Note:  Zoom discussion with the speaker may continue after presentation until about 2:00pm.

Event Type: 
Event