Advancement: Probabilistic Approaches for Data Alignment and Model Discovery in Knowledge Graphs

Speaker Name: 
Varun Embar
Speaker Title: 
PhD Student (Advisor: Lise Getoor)
Speaker Organization: 
Computer Science
Start Time: 
Wednesday, November 14, 2018 - 9:00am
End Time: 
Wednesday, November 14, 2018 - 11:00am
Location: 
Baskin Engineering, Room 330
Organizer: 
Lise Getoor

Abstract: Knowledge graphs (KGs) provide a structured representation of entities, their attributes, and their relationships using a graph. KGs have become ubiquitous due to the ease of constructing them from structured and unstructured sources, and due to efficient storage and retrieval. A wide range of systems such as search engines, intelligent agents, contextual recommender systems and fake news detection applications use KGs as a knowledge source. To perform effectively, these systems need to extract latent patterns in the KG that are novel, valid and useful, a task known as knowledge discovery. In my research I examine two key tasks of knowledge discovery: data alignment, which involves inferring relationships between entities, and model discovery, a data-driven way of discovering and combining rules to reason in a KG. These tasks are challenging because (1) a large variety of alignment relations coupled with a diverse range of entity and relationship types makes it hard for a single, generic data alignment approach to capture all the attributes of data, and (2) KGs are large, containing millions of entities, and at the same time incomplete, missing many relationships, making the task of model discovery hard. In this proposal, I develop robust and scalable probabilistic knowledge discovery algorithms to address these challenges. I propose novel approaches for aligning entities with a large set of alignment relations, and for discovering models that can infer missing relationships in the KG. I illustrate the effectiveness of these approaches in capturing various features, inherent in the data, that help in performing these tasks. Extensive experimental evaluations show the importance of different algorithmic choices for each of these tasks. While the evaluation illustrates the effectiveness of my approaches, they can be improved in key ways: (1) capturing the interactions between different alignment relations, which the data alignment approaches currently fail to do, and (2) the handling of missing data by the model discovery framework. In my future work, I propose new algorithms that overcome these drawbacks by (1) jointly inferring the alignment relations taking into account their interactions, (2) interleaving the model discovery algorithm with inference of missing data, and (3) efficiently searching the space of models.