Advancement: Estimating Graph Properties Through Sampling

Speaker Name: 
Shweta Jain
Speaker Title: 
PhD Student
Speaker Organization: 
Computer Science
Start Time: 
Tuesday, March 20, 2018 - 10:30am
End Time: 
Tuesday, March 20, 2018 - 12:30pm
Engineering 2, Room 475
Seshadhri Comandur

Abstract:  Increased connectivity amongst people and devices and new advancements in recording and storing data have resulted in large graphs with millions (if not billions) of nodes and edges becoming commonplace. Properties like degree distribution, clustering coefficients, counts of patterns provide a concise way of characterizing these graphs. Due to the sheer size of these graphs, we no more have the luxury of reading an entire graph before processing it, and at times, the access model of the graph itself may not permit it. Consequently, sampling is indispensable when estimating graph properties. In this work, we study graph sampling methods and demonstrate their use through two applications. The first of these tries to answer questions of the form: given a large graph, and only restricted access to the graph, what can we say about the graph as a whole? We answer this question for the instance of estimating the degree distribution of a graph.

Even if we have access to the whole graph, the combinatorial property we may be interested in measuring may be too large for exact counting to be practical. Our second application of sampling in graphs addresses the question: can we leverage the properties of real-world graphs to design algorithms for some computationally hard problems, that will work well in practice? We demonstrate this for the problem of estimating the number of k-cliques in a graph, for any given k.

This work combines techniques from Property Testing and Graph Theory to give practical algorithms with provable error bounds. The techniques used are generic enough that they can be extended to solving other problems like counting triangles and near-cliques, which we hope to explore in future work.