HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
Gong Cheng, Cheng Jin, Yuzhong Qu
The rapid growth of open data on the Web promotes the development of data portals that help users find useful datasets. To help users quickly inspect a dataset found in a portal, we propose to summarize its contents and generate a hierarchical grouping of entities connected by relations. Our generic approach, called HIEDS, considers coverage of the dataset, height of the hierarchy, cohesion within groups, overlap between groups, and homogeneity of groups, and integrates these configurable factors into a combinatorial optimization problem. We present an efficient solution to this problem, so that users can be served dynamically configured summaries with acceptable latency. We systematically evaluate our approach on real-world RDF datasets.