Instant Recovery with Write-Ahead Logging
Title | Instant Recovery with Write-Ahead Logging PDF eBook |
Author | Goetz Graefe |
Publisher | Springer Nature |
Pages | 113 |
Release | 2022-05-31 |
Genre | Computers |
ISBN | 3031018575 |
Traditional theory and practice of write-ahead logging and of database recovery focus on three failure classes: transaction failures (typically due to deadlocks) resolved by transaction rollback; system failures (typically power or software faults) resolved by restart with log analysis, "redo," and "undo" phases; and media failures (typically hardware faults) resolved by restore operations that combine multiple types of backups and log replay. The recent addition of single-page failures and single-page recovery has opened new opportunities far beyond the original aim of immediate, lossless repair of single-page wear-out in novel or traditional storage hardware. In the contexts of system and media failures, efficient single-page recovery enables on-demand incremental "redo" and "undo" as part of system restart or media restore operations. This can give the illusion of practically instantaneous restart and restore: instant restart permits processing new queries and updates seconds after system reboot and instant restore permits resuming queries and updates on empty replacement media as if those were already fully recovered. In the context of node and network failures, instant restart and instant restore combine to enable practically instant failover from a failing database node to one holding merely an out-of-date backup and a log archive, yet without loss of data, updates, or transactional integrity. In addition to these instant recovery techniques, the discussion introduces self-repairing indexes and much faster offline restore operations, which impose no slowdown in backup operations and hardly any slowdown in log archiving operations. The new restore techniques also render differential and incremental backups obsolete, complete backup commands on a database server practically instantly, and even permit taking full up-to-date backups without imposing any load on the database server. Compared to the first version of this book, this second edition adds sections on applications of single-page repair, instant restart, single-pass restore, and instant restore. Moreover, it adds sections on instant failover among nodes in a cluster, applications of instant failover, recovery for file systems and data files, and the performance of instant restart and instant restore.
Advances in Databases and Information Systems
Title | Advances in Databases and Information Systems PDF eBook |
Author | Jaroslav Pokorný |
Publisher | Springer |
Pages | 358 |
Release | 2016-08-13 |
Genre | Computers |
ISBN | 331944039X |
This book constitutes the thoroughly refereed proceedings of the 20th East European Conference on Advances in Databases and Information Systems, ADBIS 2016, held in Prague, Czech Republic, in August 2016. The 21 full papers presented together with two keynote papers and one keynote abstract were carefully selected and reviewed from 85 submissions. The papers are organized in topical sections such as data quality, mining, analysis and clustering; model-driven engineering, conceptual modeling; data warehouse and multidimensional modeling, recommender systems; spatial and temporal data processing; distributed and parallel data processing; internet of things and sensor networks.
Answering Queries Using Views
Title | Answering Queries Using Views PDF eBook |
Author | Foto Afrati |
Publisher | Morgan & Claypool Publishers |
Pages | 277 |
Release | 2019-04-15 |
Genre | Computers |
ISBN | 168173463X |
The topic of using views to answer queries has been popular for a few decades now, as it cuts across domains such as query optimization, information integration, data warehousing, website design and, recently, database-as-a-service and data placement in cloud systems. This book assembles foundational work on answering queries using views in a self-contained manner, with an effort to choose material that constitutes the backbone of the research. It presents efficient algorithms and covers the following problems: query containment; rewriting queries using views in various logical languages; equivalent rewritings and maximally contained rewritings; and computing certain answers in the data-integration and data-exchange settings. Query languages that are considered are fragments of SQL, in particular select-project-join queries, also called conjunctive queries (with or without arithmetic comparisons or negation), and aggregate SQL queries. This second edition includes two new chapters that refer to tree-like data and respective query languages. Chapter 8 presents the data model for XML documents and the XPath query language, and Chapter 9 provides a theoretical presentation of tree-like data model and query language where the tuples of a relation share a tree-structured schema for that relation and the query language is a dialect of SQL with evaluation techniques appropriately modified to fit the richer schema.
Data Management in Machine Learning Systems
Title | Data Management in Machine Learning Systems PDF eBook |
Author | Matthias Boehm |
Publisher | Springer Nature |
Pages | 157 |
Release | 2022-05-31 |
Genre | Computers |
ISBN | 3031018699 |
Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques. In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, as well as pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers.
Querying Graphs
Title | Querying Graphs PDF eBook |
Author | Angela Bonifati |
Publisher | Springer Nature |
Pages | 166 |
Release | 2022-06-01 |
Genre | Computers |
ISBN | 3031018648 |
Graph data modeling and querying arises in many practical application domains such as social and biological networks where the primary focus is on concepts and their relationships and the rich patterns in these complex webs of interconnectivity. In this book, we present a concise unified view on the basic challenges which arise over the complete life cycle of formulating and processing queries on graph databases. To that purpose, we present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model—the pre-dominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook for future developments. Our presentation is self-contained, covering the relevant topics from: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.
Scalable Processing of Spatial-Keyword Queries
Title | Scalable Processing of Spatial-Keyword Queries PDF eBook |
Author | Ahmed R. Mahmood |
Publisher | Springer Nature |
Pages | 98 |
Release | 2022-05-31 |
Genre | Computers |
ISBN | 3031018672 |
Text data that is associated with location data has become ubiquitous. A tweet is an example of this type of data, where the text in a tweet is associated with the location where the tweet has been issued. We use the term spatial-keyword data to refer to this type of data. Spatial-keyword data is being generated at massive scale. Almost all online transactions have an associated spatial trace. The spatial trace is derived from GPS coordinates, IP addresses, or cell-phone-tower locations. Hundreds of millions or even billions of spatial-keyword objects are being generated daily. Spatial-keyword data has numerous applications that require efficient processing and management of massive amounts of spatial-keyword data. This book starts by overviewing some important applications of spatial-keyword data, and demonstrates the scale at which spatial-keyword data is being generated. Then, it formalizes and classifies the various types of queries that execute over spatial-keyword data. Next, it discusses important and desirable properties of spatial-keyword query languages that are needed to express queries over spatial-keyword data. As will be illustrated, existing spatial-keyword query languages vary in the types of spatial-keyword queries that they can support. There are many systems that process spatial-keyword queries. Systems differ from each other in various aspects, e.g., whether the system is batch-oriented or stream-based, and whether the system is centralized or distributed. Moreover, spatial-keyword systems vary in the types of queries that they support. Finally, systems vary in the types of indexing techniques that they adopt. This book provides an overview of the main spatial-keyword data-management systems (SKDMSs), and classifies them according to their features. Moreover, the book describes the main approaches adopted when indexing spatial-keyword data in the centralized and distributed settings. Several case studies of {SKDMSs} are presented along with the applications and query types that these {SKDMSs} are targeted for and the indexing techniques they utilize for processing their queries. Optimizing the performance and the query processing of {SKDMSs} still has many research challenges and open problems. The book concludes with a discussion about several important and open research-problems in the domain of scalable spatial-keyword processing.
Cloud-Based RDF Data Management
Title | Cloud-Based RDF Data Management PDF eBook |
Author | Zoi Kaoudi |
Publisher | Morgan & Claypool Publishers |
Pages | 105 |
Release | 2020-02-26 |
Genre | Computers |
ISBN | 1681730340 |
Resource Description Framework (or RDF, in short) is set to deliver many of the original semi-structured data promises: flexible structure, optional schema, and rich, flexible Universal Resource Identifiers as a basis for information sharing. Moreover, RDF is uniquely positioned to benefit from the efforts of scientific communities studying databases, knowledge representation, and Web technologies. As a consequence, the RDF data model is used in a variety of applications today for integrating knowledge and information: in open Web or government data via the Linked Open Data initiative, in scientific domains such as bioinformatics, and more recently in search engines and personal assistants of enterprises in the form of knowledge graphs. Managing such large volumes of RDF data is challenging due to the sheer size, heterogeneity, and complexity brought by RDF reasoning. To tackle the size challenge, distributed architectures are required. Cloud computing is an emerging paradigm massively adopted in many applications requiring distributed architectures for the scalability, fault tolerance, and elasticity features it provides. At the same time, interest in massively parallel processing has been renewed by the MapReduce model and many follow-up works, which aim at simplifying the deployment of massively parallel data management tasks in a cloud environment. In this book, we study the state-of-the-art RDF data management in cloud environments and parallel/distributed architectures that were not necessarily intended for the cloud, but can easily be deployed therein. After providing a comprehensive background on RDF and cloud technologies, we explore four aspects that are vital in an RDF data management system: data storage, query processing, query optimization, and reasoning. We conclude the book with a discussion on open problems and future directions.