Advancement: Hierarchical Bayesian Modeling for Clustering Sparse Sequences in the Context of Group Profiling

Speaker Name: 
Ishani Chakraborty
Speaker Title: 
PhD Student (Advisor: Ram Akella)
Speaker Organization: 
Computer Science
Start Time: 
Wednesday, November 28, 2018 - 10:00am
End Time: 
Wednesday, November 28, 2018 - 12:00pm
Location: 
Engineering 2, Room 475
Organizer: 
Ram Akella

Abstract: We propose a sparse sequence clustering algorithm based on hierarchical Bayesian modeling. The algorithm can cluster very sparse non-negative integer sequences of unequal length that do not necessarily belong to well-defined classes. Such sequences are generated by the vast majority of normal human actions; one example is user behavior data for Wikipedia contributors, expressed as a count of updates per day. The algorithm and modeling technique are therefore well suited to non-negative integer sequences generated by real-life human actions, such as user visits to websites or shops. The data model is a mixture model in which every sequence is generated by a mixture of distributions associated with several clusters. It does not require the data to be represented by a Gaussian mixture, the most commonly used representation of sequences, and this gives significant modeling freedom. The algorithm also generates an interpretable profile for the discovered clusters, i.e., the latent groups of users. The cluster profile, in this case, contains the representative visit counts of a cluster, or group of users, to a restaurant. The data is a real-life collection of sparse sequences of user visits to a restaurant, where each entry of a user's sequence is the number of visits by that user per week to the given restaurant.
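
As a rough illustration of the kind of non-Gaussian mixture the abstract describes, the sketch below clusters sparse count sequences of unequal length with a plain Poisson mixture fitted by EM. This is an assumed, much simpler stand-in, not the hierarchical Bayesian model presented in the talk: the function name fit_poisson_mixture, the single-rate-per-cluster "profile", and the toy visit data are all hypothetical.

```python
import math

def fit_poisson_mixture(sequences, n_clusters=2, n_iter=50):
    """EM for a finite Poisson mixture over count sequences of unequal length.

    Simplifying assumption: each cluster has one rate (expected visits per
    week), and every entry of a sequence is drawn i.i.d. from its cluster.
    """
    # Sufficient statistics per sequence: total count and length.
    totals = [sum(s) for s in sequences]
    lengths = [len(s) for s in sequences]
    means = [t / n for t, n in zip(totals, lengths)]
    lo, hi = min(means), max(means)
    # Deterministic start: rates spread evenly over the observed mean range.
    rates = [max(lo + (hi - lo) * (k + 0.5) / n_clusters, 1e-6)
             for k in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities in log space. The factorial term of the
        # Poisson pmf is the same for every cluster, so it cancels.
        resp = []
        for t, n in zip(totals, lengths):
            logp = [math.log(w) + t * math.log(r) - n * r
                    for w, r in zip(weights, rates)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            z = sum(p)
            resp.append([v / z for v in p])
        # M-step: update mixing weights and per-cluster rates.
        for k in range(n_clusters):
            rk = [r[k] for r in resp]
            weights[k] = max(sum(rk) / len(sequences), 1e-12)
            num = sum(a * t for a, t in zip(rk, totals))
            den = max(sum(a * n for a, n in zip(rk, lengths)), 1e-12)
            rates[k] = max(num / den, 1e-6)
    # Hard assignment: most responsible cluster for each sequence.
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return rates, weights, labels

# Hypothetical weekly visit counts for six users (unequal lengths, mostly zeros
# for the occasional visitors).
visits = [[0, 0, 1, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0],
          [3, 2, 4], [2, 3, 3, 4, 2], [4, 3]]
rates, weights, labels = fit_poisson_mixture(visits, n_clusters=2)
print(rates, weights, labels)
```

In this toy setting the fitted rate of each cluster plays the role of an interpretable profile: it is the representative number of visits per week for that group of users. A hierarchical Bayesian treatment would additionally place priors on the rates and mixing weights rather than estimating them by maximum likelihood.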