BIGDATA: F: DKM: Plato: A model-based database for sensor data, enabling declarative statistical queries and performance via compression

Supported by NSF 1447943

Project web page: http://www.db.ucsd.edu/NSF14Plato

The NSF project abstract page

PI: Yannis Papakonstantinou

Co-PI: Yoav Freund

Students:

· Julaiti Alafate (PhD student)

· Yannis Katsis (PostDoc, finished)

· Chunbin Lin (PhD student)

· Jianguo Wang (PhD student)

· Ben Mandel (MS student, graduated)

· Sai (Spoorthy) Padigi (MS student)

· Etienne Boursier (research visitor)

· Jacque Brito (research visitor)

Abstract:

Sensor data of diverse types and large volumes need to be combined with the current standard SQL databases, which provide context and metadata for the sensor data. The combination will lead to a new generation of analytics in a number of areas, such as smart buildings, where the analytics are based on building and environmental data collected by sensors. The project argues that this new generation of analytics must be based on the same healthy database technology cornerstones that the prior (non-sensor) business intelligence platforms were based on: declarative queries, automatic optimization, efficient storage representations and multiple layers of abstraction, which together lead to high productivity for the developer and the analyst. Such productivity is currently absent from sensor data analytics because database technology and sensor data processing do not mix well. Productivity is especially low in cases involving (a) many types of sensor data, (b) combinations of sensor data with conventional database data that provide context and (c) many types of analyses. Besides low productivity, the current (limited) state of the art imposes very high expertise requirements on the analysts: they must simultaneously be experts in signal processing, statistics and big data management. The project will deliver a database system for sensor data, where the analyst can rapidly develop declarative queries that are automatically optimized. By doing so, the project will deliver the envisioned productivity gains and will lower the bar of technical sophistication needed to work in this space, thereby enabling many scientists and domain specialists to engage in analytics.

This project argues that at the core of the failure of SQL databases in the management and analytics of spatiotemporal sensor data is the lack of a critical abstraction: real world models, which capture the stochastic processes that generate the measurements. The proposed Plato database system will bring the real world model concept into SQL databases by using models (spatiotemporal continuous functions) as first class citizens. The delivery of Plato requires innovative solutions to multiple problems. The project will design and implement (a) a model-aware data model and respective query language features that allow seamless combination of conventional SQL querying with statistical signal processing; (b) learning algorithms that learn the model components of reduced-noise, additive model representations, which are naturally compressions of the original data; (c) query processing algorithms that operate directly on the compressed representations and utilize the relatively few bits necessary for the required confidence of the analytics; and (d) semiautomated algorithms that further compress the model representations by considering the dependencies (mutual entropy) between the models. Finally, the project will exercise the resulting system on large scale statistical sensor data processing cases, such as the ones presented by the UCSD Energy Dashboard. The exercise will measure the lines of code as well as the runtime efficiency of the analyses.
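To make items (b) and (c) concrete, the following is a minimal illustrative sketch in Python and not Plato's actual algorithm or API. It assumes the simplest possible model, a piecewise-constant function, and a hypothetical per-sample tolerance. The names `Segment`, `compress` and `approx_sum` are invented for this illustration. The sketch compresses a series of readings into a few segments and then answers a SUM query from the segments alone, returning a deterministic bound on the error of the answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: int      # index of the first sample covered (inclusive)
    end: int        # index one past the last sample covered (exclusive)
    value: float    # constant model value for the whole segment
    max_err: float  # largest |sample - value| inside the segment


def compress(samples: List[float], tolerance: float) -> List[Segment]:
    """Greedy piecewise-constant compression (illustrative assumption).

    A segment is extended while the midrange of its samples stays
    within `tolerance` of every sample it covers.
    """
    segments: List[Segment] = []
    i, n = 0, len(samples)
    while i < n:
        lo = hi = samples[i]
        j = i + 1
        while j < n:
            new_lo, new_hi = min(lo, samples[j]), max(hi, samples[j])
            if (new_hi - new_lo) / 2 > tolerance:
                break  # adding sample j would violate the tolerance
            lo, hi = new_lo, new_hi
            j += 1
        segments.append(Segment(i, j, (lo + hi) / 2, (hi - lo) / 2))
        i = j
    return segments


def approx_sum(segments: List[Segment], a: int, b: int) -> Tuple[float, float]:
    """Estimate sum(samples[a:b]) using only the segments.

    Returns (estimate, bound), where |true sum - estimate| <= bound.
    """
    estimate = 0.0
    bound = 0.0
    for s in segments:
        overlap = min(s.end, b) - max(s.start, a)
        if overlap > 0:
            estimate += overlap * s.value
            bound += overlap * s.max_err
    return estimate, bound


if __name__ == "__main__":
    readings = [20.0, 20.5, 21.0, 25.0, 25.5, 30.0]  # made-up sensor values
    segs = compress(readings, tolerance=0.5)
    est, err = approx_sum(segs, 1, 4)
    print(len(segs), est, err, sum(readings[1:4]))
```

The query never touches the raw readings: it works only on the segment values and their recorded maximum errors, which is the sense in which processing "operates directly on the compressed representation". A looser tolerance yields fewer segments and a wider error bound, so the storage cost can be traded against the confidence the analysis requires.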

Publications

[Plato15] Yannis Katsis, Yoav Freund, Yannis Papakonstantinou “Combining Databases and Signal Processing in Plato”, in CIDR 2015.

[GQFast17] Chunbin Lin, Benjamin Mandel, Yannis Papakonstantinou, Matthias Springer “Fast In-Memory SQL Analytics on Typed Graphs”, in PVLDB, 10(3), 2016.

[HippogriffDB16] Jing Li, Hung-Wei Tseng, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson “HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data Analytics”, in PVLDB, 9(14), 2016, p. 1647.

[GQFastDemo17] Chunbin Lin, Jianguo Wang, Yannis Papakonstantinou “GQFast: Fast Graph Exploration with Context-Aware Autocompletion.” ICDE 2017 (demo track)

[PlatoHDMS17] Jaqueline Brito, Korhan Demirkaya, Etienne Boursier, Yannis Katsis, Chunbin Lin, Yannis Papakonstantinou “Efficient Approximate Query Answering over Sensor Data with Deterministic Error Guarantees”, in CoRR abs/1707.01414.

[MILCVLDB2017] Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, Steven Swanson “MILC: Inverted List Compression in Memory”, in PVLDB, 10(8), p. 853.

[BitmapSIGMOD2017] Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson “An Experimental Study of Bitmap Compression vs. Inverted List Compression”, in ACM SIGMOD, p. 993.

[WaldoSIGMOD2017] Vasilis Verroios, Hector Garcia-Molina, Yannis Papakonstantinou “Waldo: An Adaptive Human Interface for Crowd Entity Resolution”, in ACM SIGMOD Conference, p. 1133.