LAB42 Talk: Chat with the Data Benchmark – Dr. Juan Sequeda

STARTS AT 13:00
LAB42, L3.35

Chat with the Data Benchmark: Understanding Synergies between Large Language Models and Knowledge Graphs for Enterprise Conversations

Abstract

With the advent of Generative AI and Large Language Models (LLMs), the prospect of conversing with SQL databases has garnered significant attention in the enterprise. However, the accuracy of LLMs in this context remains unclear. Many (including us) claim that Knowledge Graphs (KGs) are the missing link for providing essential enterprise context to LLMs. This raises the question: to what extent do Knowledge Graphs increase accuracy? A comprehensive benchmark is required to understand these capabilities and limitations.

Although there are many question answering and Text-to-SQL benchmarks, we observe that they are disconnected from enterprise settings because they lack 1) enterprise relational schemas, 2) questions that span a range of business complexity, and 3) well-defined and governed metadata that serves as business context in the form of a knowledge graph.

In this talk, we will introduce the Chat with the Data benchmark, which 1) addresses enterprise needs, 2) can be used to assess the accuracy of LLMs in answering natural language questions over enterprise SQL databases, and 3) investigates how Knowledge Graphs can enhance accuracy. We will present our ongoing work, share preliminary results, and discuss how organizations are adopting the benchmark framework for their internal use.

Bio

Juan Sequeda is the Principal Scientist and Head of the AI Lab at data.world. He holds a PhD in Computer Science from The University of Texas at Austin. Juan’s research and industry work has been at the intersection of data and AI, with the goal of reliably creating knowledge from inscrutable data, specifically designing and building Knowledge Graphs for enterprise data and metadata management. Juan is the co-author of the book “Designing and Building Enterprise Knowledge Graphs” and the co-host of Catalog and Cocktails, an honest, no-bs, non-salesy data podcast. Juan has researched and developed technology on semantic data virtualization, graph data modeling, schema mapping, and data integration methodologies. He pioneered technology to construct knowledge graphs from relational databases, resulting in W3C standards, research awards, patents, software, and his startup Capsenta, which was acquired by data.world in 2019. Juan strives to build bridges between academia and industry as former co-chair of the LDBC Property Graph Schema Working Group, member of the LDBC Graph Query Languages task force, and standards editor at the World Wide Web Consortium (W3C). Juan continues to be an active member of the scientific community through academic research partnerships, advising students, and serving on data and AI scientific conference committees.