A Framework for Multidimensional Indexes on Distributed and Highly-available Data Stores

Title A Framework for Multidimensional Indexes on Distributed and Highly-available Data Stores
Author Cesare Cugnasco
Publisher
Pages 169
Release 2019
Genre
ISBN

Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks for this kind of data is limiting possible applications and probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, covering molecular dynamics, neuroscience analysis, and physics simulations, where we experienced firsthand the limits of the existing technologies. Drawing on this experience, we define the desirable missing functionalities, and we focus on two features that, when combined, significantly improve the way scientific data is analyzed.

On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x,y,z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and organize similar data together. On the other side, approximate analytics has often been indicated as a smart and flexible way to explore large datasets in a short period. Approximate analytics includes a broad family of algorithms that aim to speed up analytical workloads by relaxing the precision of the results within a specific confidence interval. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want fewer I/O operations, we need efficient data sampling, which means organizing data so that we do not need to scan the whole dataset to generate a random sample of it.

According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital feature missing from current distributed data management solutions. This thesis aims to address this shortcoming and provides novel scalable solutions. First, we describe the existing data management alternatives and motivate our preference for NoSQL key-value databases. Second, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Third, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves the state of the art for approximate spatial queries on static and mostly-read datasets. Later, we enhanced the data ingestion capability of our approach by introducing the AOTree, an algorithm that provides the query performance of the D8tree even for HPC write-intensive applications. We compare our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability. Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications by providing a scalable and integrated solution for data analysis and management.
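
The average-age example in the abstract above can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `approximate_mean`, the synthetic ages, and the sample size are all assumptions made for illustration, and the error margin uses a plain normal approximation. It also samples from an in-memory list, so it shows only the reduction in calculation, not the reduction in I/O that the indexes described above target.

```python
import math
import random


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = sum(sample) / sample_size
    # Unbiased sample variance, used to size the confidence interval.
    variance = sum((v - mean) ** 2 for v in sample) / (sample_size - 1)
    margin = 1.96 * math.sqrt(variance / sample_size)
    return mean, margin


# Synthetic ages, purely illustrative (hypothetical data).
data_rng = random.Random(42)
ages = [data_rng.randint(0, 90) for _ in range(100_000)]

# Read 5% of the records instead of all of them.
estimate, margin = approximate_mean(ages, sample_size=5_000)
exact = sum(ages) / len(ages)
print(f"estimate={estimate:.2f} +/- {margin:.2f}, exact={exact:.2f}")
```

Under these assumptions the margin falls below one year while only a fraction of the records is touched, which is the trade-off the abstract describes.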

M-Grid : A Distributed Framework for Multidimensional Indexing and Querying of Location Based Big Data

Title M-Grid : A Distributed Framework for Multidimensional Indexing and Querying of Location Based Big Data
Author Shashank Kumar
Publisher
Pages 56
Release 2014
Genre Databases
ISBN

"The widespread use of mobile devices and the real time availability of user-location information is facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, handling of high access scalability, support for millions of users, real time querying capability and analysis of large volumes of data. Cloud computing aided a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much needed features to support LBSs. However complex queries on multidimensional data cannot be processed efficiently as they do not provide means to access multiple attributes. In this thesis we present MGrid, a unifying indexing framework which enables key-value stores to support multidimensional queries. We organize a set of nodes in a P-Grid overlay network which provides fault-tolerance and efficient query processing. We use Hilbert Space Filling Curve based linearization technique which preserves the data locality to efficiently manage multi-dimensional data in a key-value store. We propose algorithms to dynamically process range and k nearest neighbor (kNN) queries on linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent from the underlying storage layer and can be implemented on any cloud infrastructure. Experiments on Amazon EC2 show that MGrid achieves a performance improvement of three orders of magnitude in comparison to MapReduce and four times to that of MDHBase scheme"--Abstract, page iii.
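
As a rough illustration of the linearization idea mentioned in the abstract, the sketch below maps 2-D grid cells to positions along a Hilbert curve using the widely published textbook conversion. It is not M-Grid's implementation: the function name `hilbert_index`, the grid size, the sample points, and the use of a plain dictionary as a stand-in for a key-value store are all assumptions.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has a standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical points stored under their one-dimensional Hilbert key.
points = [(1, 1), (1, 2), (6, 6), (2, 1)]
store = {hilbert_index(8, x, y): (x, y) for x, y in points}

# Scanning keys in order visits the points along the curve.
ordered = [store[key] for key in sorted(store)]
print(ordered)
```

Because consecutive curve positions are always neighbouring cells, a multidimensional range query can be answered by scanning a small number of contiguous key ranges rather than the whole store.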

Database and Expert Systems Applications

Title Database and Expert Systems Applications
Author Qiming Chen
Publisher Springer
Pages 591
Release 2015-08-10
Genre Computers
ISBN 3319228498

This two-volume set LNCS 9261 and LNCS 9262 constitutes the refereed proceedings of the 26th International Conference on Database and Expert Systems Applications, DEXA 2015, held in Valencia, Spain, September 1-4, 2015. The 40 revised full papers presented together with 32 short papers, and 2 keynote talks, were carefully reviewed and selected from 125 submissions. The papers discuss a range of topics including: temporal, spatial and high dimensional databases; semantic Web and ontologies; modeling, linked open data; NoSQL, NewSQL, data integration; uncertain data and inconsistency tolerance; database system architecture; data mining, query processing and optimization; indexing and decision support systems; modeling, extraction, social networks; knowledge management and consistency; mobility, privacy and security; data streams, Web services; distributed, parallel and cloud databases; information retrieval; XML and semi-structured data; data partitioning, indexing; data mining, applications; WWW and databases; data management algorithms. These volumes also include accepted papers of the 8th International Conference on Data Management in Cloud, Grid and P2P Systems, Globe 2015, held in Valencia, Spain, September 2, 2015. The 8 full papers presented were carefully reviewed and selected from 13 submissions. The papers discuss a range of topics including: MapReduce framework: load balancing, optimization and classification; security, data privacy and consistency; query rewriting and streaming.

Advances in Spatial and Temporal Databases

Title Advances in Spatial and Temporal Databases
Author Michael Gertz
Publisher Springer
Pages 454
Release 2017-08-07
Genre Computers
ISBN 3319643673

This book constitutes the refereed proceedings of the 15th International Symposium on Spatial and Temporal Databases, SSTD 2017, held in Arlington, VA, USA, in August 2017. The 19 full papers presented together with 8 demo papers and 5 vision papers were carefully reviewed and selected from 90 submissions. The papers are organized around the current research on concepts, tools, and techniques related to spatial and temporal databases.

High Performance Computing

Title High Performance Computing
Author Michèle Weiland
Publisher Springer Nature
Pages 682
Release 2019-12-02
Genre Computers
ISBN 3030343561

This book constitutes the refereed post-conference proceedings of 13 workshops held at the 34th International ISC High Performance 2019 Conference, in Frankfurt, Germany, in June 2019: HPC I/O in the Data Center (HPC-IODC), Workshop on Performance & Scalability of Storage Systems (WOPSSS), 13th Workshop on Virtualization in High-Performance Cloud Computing (VHPC '18), 3rd International Workshop on In Situ Visualization: Introduction and Applications, ExaComm: Fourth International Workshop on Communication Architectures for HPC, Big Data, Deep Learning and Clouds at Extreme Scale, International Workshop on OpenPOWER for HPC (IWOPH18), IXPUG Workshop: Many-core Computing on Intel Processors: Applications, Performance and Best-Practice Solutions, Workshop on Sustainable Ultrascale Computing Systems, Approximate and Transprecision Computing on Emerging Technologies (ATCET), First Workshop on the Convergence of Large Scale Simulation and Artificial Intelligence, 3rd Workshop for Open Source Supercomputing (OpenSuCo), First Workshop on Interactive High-Performance Computing, Workshop on Performance Portable Programming Models for Accelerators (P^3MA). The 48 full papers included in this volume were carefully reviewed and selected. They cover all aspects of research, development, and application of large-scale, high performance experimental and commercial systems. Topics include HPC computer architecture and hardware; programming models, system software, and applications; solutions for heterogeneity, reliability, power efficiency of systems; virtualization and containerized environments; big data and cloud computing; and artificial intelligence.

Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017

Title Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017
Author Aboul Ella Hassanien
Publisher Springer
Pages 932
Release 2017-08-30
Genre Technology & Engineering
ISBN 3319648616

This book gathers the proceedings of the 3rd International Conference on Advanced Intelligent Systems and Informatics 2017 (AISI2017), which took place in Cairo, Egypt from September 9 to 11, 2017. This international and interdisciplinary conference, which highlighted essential research and developments in the field of informatics and intelligent systems, was organized by the Scientific Research Group in Egypt (SRGE). The book’s content is divided into five main sections: Intelligent Language Processing, Intelligent Systems, Intelligent Robotics Systems, Informatics, and the Internet of Things.

Peer-to-Peer Query Processing over Multidimensional Data

Title Peer-to-Peer Query Processing over Multidimensional Data
Author Akrivi Vlachou
Publisher Springer Science & Business Media
Pages 93
Release 2012-04-13
Genre Technology & Engineering
ISBN 1461421101

Applications that require a high degree of distribution and loosely-coupled connectivity are ubiquitous in various domains, including scientific databases, bioinformatics, and multimedia retrieval. In all these applications, data is typically voluminous and multidimensional, and support for advanced query operators is required for effective querying and efficient processing. To address this challenge, we adopt a hybrid P2P architecture and propose novel indexing and query processing algorithms. We present a scalable framework that relies on data summaries that are distributed and maintained as multidimensional routing indices. Different types of data summaries enable efficient processing of a variety of advanced query operators.