CONCEPT HIERARCHY-BASED PATTERN DISCOVERY IN TIME SERIES DATABASE: A CASE STUDY ON FINANCIAL DATABASE

Author Yan-Ping Huang
Publisher 黃燕萍工作室
Pages 73
Release 2014-07-25

Data mining, a contemporary research topic in computing, is the process of automatically searching large volumes of data for patterns. Pattern discovery is a field within data mining. Financial databases typically contain large volumes of time series data, and these data hold useful patterns that are not easily found. Many financial studies of time series data use linear regression models to estimate the variation and trend of the data, and traditional methods of time series analysis generally use special types of linear models to describe the data. Linear models achieve high accuracy when the variation of the data is small; however, once the variation range exceeds a certain limit, their estimation accuracy drops. Among the traditional methods, the Self-Organizing Map (SOM) is a well-known non-linear model for extracting patterns from numeric data. Much existing research extracts patterns from numeric attributes rather than from categorical or mixed data, and it does not extract the major values from pattern rules. This study proposes a novel architecture for mining patterns from mixed data that takes a systematic approach to information mining in financial databases, with the aim of finding patterns that estimate trends or the occurrence of special events. The study employs the ESA algorithm, which integrates the EViSOM and EAOI algorithms. EViSOM calculates the distance between categorical and numeric data to find patterns, whereas EAOI generalizes major values using concept hierarchies to extract patterns and major values from the financial database. The ESA algorithm is used to discover patterns within the Concept Hierarchy-based Pattern Discovery (CHPD) architecture.
Specifically, this architecture directly handles mixed data, including categorical and numeric values. It can simulate human intelligence to discover patterns automatically, and it demonstrates knowledge pattern discovery and rule extraction.
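The description names two ingredients without giving formulas: a distance that mixes categorical and numeric attributes (EViSOM) and the generalization of values through concept hierarchies (EAOI). The Python sketch below illustrates both ideas under stated assumptions; it is not the author's ESA, EViSOM, or EAOI algorithm. It assumes a hypothetical sector hierarchy, measures categorical distance as the number of hierarchy edges between two values, averages that with a range-normalized numeric gap, and generalizes values to a chosen level in the style of attribute-oriented induction so that the most frequent concept can be read off as a major value. All names (`PARENT`, `hierarchy_distance`, `mixed_distance`, `generalize`, `major_value`) are illustrative.

```python
# Hypothetical concept hierarchy (child -> parent). The sector names are
# illustrative assumptions, not data from the study.
PARENT = {
    "semiconductors": "electronics",
    "displays": "electronics",
    "electronics": "technology",
    "software": "technology",
    "banks": "finance",
    "insurance": "finance",
    "technology": "ANY",
    "finance": "ANY",
}


def ancestors(value):
    """Path from a value up to the root, starting with the value itself."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def hierarchy_distance(a, b):
    """Number of edges between two categorical values via their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    lca = next(node for node in pa if node in pb)  # every value shares the root "ANY"
    return pa.index(lca) + pb.index(lca)


def mixed_distance(x, y, num_range, max_depth):
    """Average of a range-normalized numeric gap and a depth-normalized hierarchy distance.

    x and y are (numeric_value, categorical_value) pairs; max_depth is the depth of
    the deepest leaf, so 2 * max_depth bounds the hierarchy distance.
    """
    numeric = abs(x[0] - y[0]) / num_range
    categorical = hierarchy_distance(x[1], y[1]) / (2 * max_depth)
    return (numeric + categorical) / 2


def generalize(values, depth):
    """Replace each value by its ancestor `depth` levels below the root (0 = root) and count."""
    counts = {}
    for v in values:
        from_root = list(reversed(ancestors(v)))
        concept = from_root[min(depth, len(from_root) - 1)]
        counts[concept] = counts.get(concept, 0) + 1
    return counts


def major_value(counts):
    """The most frequent generalized concept."""
    return max(counts, key=counts.get)
```

For example, `generalize(["semiconductors", "displays", "software", "banks"], 1)` yields `{"technology": 3, "finance": 1}`, so `major_value` picks `"technology"`. Weighting the numeric and categorical parts equally is an arbitrary choice made only for this sketch.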

Data Abstraction and Pattern Identification in Time-series Data

Author Prithiviraj Muthumanickam
Publisher Linköping University Electronic Press
Pages 58
Release 2019-11-25
ISBN 9179299652

Data sources such as simulations and sensor networks across many application domains generate large volumes of time-series data whose characteristics evolve over time. Visual data analysis methods can help in exploring and understanding the underlying patterns in time-series data, but as these data keep growing in size, the visual analysis process can become complex. Large data sets can be handled with data abstraction techniques that transform the raw data into a simpler format while preserving the features that are significant to the user. When dealing with time-series data, abstraction techniques should also take the underlying temporal characteristics into account. This thesis focuses on data abstraction and pattern identification methods, particularly for large 1D time-series and 2D spatio-temporal time-series data that exhibit spatio-temporal discontinuity. Based on the dimensionality and characteristics of the data, the thesis proposes a variety of efficient data-adaptive and user-controlled data abstraction methods that transform the raw data into a symbol sequence. This symbol sequence can serve as input to sequence analysis methods from the data mining and machine learning communities to identify interesting patterns of user behavior. For very long 1D time-series, locally adaptive and user-controlled data approximation methods are presented that simplify the data while retaining perceptually important features. The simplified data are converted into a symbol sequence, and sketch-based pattern identification then finds patterns in the symbolic data using regular-expression-based pattern matching. The method was applied to financial time-series, where patterns such as head-and-shoulders, double-top, and triple-top patterns were identified interactively from hand-drawn sketches.
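To make the symbolization-plus-regex idea concrete, here is a minimal Python sketch. It is not the thesis's method: it swaps in fixed equal-width piecewise aggregate approximation (PAA) for the locally adaptive, user-controlled approximation, uses a three-letter up/down/flat alphabet, and hard-codes a regular expression in place of a hand-drawn sketch query. The function names, the tolerance, and the price series are hypothetical.

```python
import re


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` near-equal chunks.

    Requires 1 <= segments <= len(series) so that no chunk is empty.
    """
    n = len(series)
    means = []
    for i in range(segments):
        chunk = series[i * n // segments:(i + 1) * n // segments]
        means.append(sum(chunk) / len(chunk))
    return means


def to_symbols(values, flat_tol):
    """Encode each step between consecutive values as 'u' (up), 'd' (down) or 'f' (flat)."""
    out = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        out.append("u" if delta > flat_tol else "d" if delta < -flat_tol else "f")
    return "".join(out)


# Hypothetical query standing in for a hand-drawn double top:
# rise, fall, rise again, fall again, with optional flat steps in between.
DOUBLE_TOP = re.compile(r"u+f*d+f*u+f*d+")

prices = [10, 11, 12, 13, 12, 11, 12, 13, 12, 11, 10, 9]
symbols = to_symbols(paa(prices, 6), flat_tol=0.25)
print(symbols, bool(DOUBLE_TOP.search(symbols)))
```

With these inputs the script prints `ududd True`: the six segment means rise, fall, rise, and then fall for two steps, which the double-top expression accepts.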
Through data smoothing, the data approximation step also enables visualization of inherent patterns in the time-series representation while retaining perceptually important points. Very long 2D spatio-temporal eye-tracking data sets that exhibit spatio-temporal discontinuity were transformed into symbolic data using scalable clustering and hierarchical cluster merging processes, each of which can be parallelized. Each symbol in the resulting sequence represents a region of interest in the eye-gaze data. The identified regions of interest can also be displayed in a Space-Time Cube (STC) that captures both temporal and contextual information. Through interactive filtering, zooming, and geometric transformation, the STC representation and its linked views enable interactive data exploration. The symbol sequences are then analyzed further with different sequence analysis methods to identify temporal patterns in the data set. Data collected from air traffic control officers were used as application examples to demonstrate the results.

High Performance Discovery In Time Series

Author New York University
Publisher Springer Science & Business Media
Pages 195
Release 2013-11-09
Genre Computers
ISBN 1475740468

This monograph is a technical survey of concepts and techniques for describing and analyzing large-scale time-series data streams. Some topics covered are algorithms for query by humming, gamma-ray burst detection, pairs trading, and density detection. Included are self-contained descriptions of wavelets, fast Fourier transforms, and sketches as they apply to time-series analysis. Detailed applications are built on a solid scientific basis.
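Burst detection, as named above, asks which windows of a stream contain unusually many events. The Python sketch below is only the naive baseline for that problem, shown to pin down the problem statement: for each requested window size it slides a running sum across a list of per-tick event counts and reports the windows whose total reaches that size's threshold. It is not the book's algorithm, which the description presents as efficient; the function names and thresholds are hypothetical.

```python
def bursts(counts, window, threshold):
    """Start indices of windows of length `window` whose event total reaches `threshold`."""
    if window <= 0 or window > len(counts):
        return []
    total = sum(counts[:window])
    hits = [0] if total >= threshold else []
    for start in range(1, len(counts) - window + 1):
        # Slide the window one tick: add the entering count, drop the leaving one.
        total += counts[start + window - 1] - counts[start - 1]
        if total >= threshold:
            hits.append(start)
    return hits


def elastic_bursts(counts, thresholds):
    """Check several window sizes, each with its own threshold ({size: threshold})."""
    return {w: bursts(counts, w, t) for w, t in thresholds.items()}


print(elastic_bursts([0, 1, 0, 5, 6, 0, 1, 0], {2: 10, 3: 11}))
```

This prints `{2: [3], 3: [2, 3]}`. The running sum makes each window size linear in the stream length, but the total cost still grows with the number of window sizes checked.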

High Performance Discovery In Time Series

Author Dennis Elliott Shasha
Publisher Springer Science & Business Media
Pages 210
Release 2004-06-03
Genre Computers
ISBN 9780387008578

Time-series data (data arriving in time order, or a data stream) can be found in fields such as physics, finance, music, networking, and medical instrumentation. Designing fast, scalable algorithms for analyzing single or multiple time series can lead to scientific discoveries, medical diagnoses, and perhaps profits. High Performance Discovery in Time Series presents rapid-discovery techniques for finding portions of time series with many events (e.g., gamma-ray scatterings) and for finding closely related time series (e.g., highly correlated price and return histories, or musical melodies). A typical time-series technique may compute a "consensus" time series from a collection of time series and apply regression analysis to predict future time points. By contrast, this book aims at efficient discovery in time series rather than prediction, and its novelty lies in its algorithmic contributions and its simple, practical algorithms and case studies. It presumes familiarity with only basic calculus and some linear algebra.
Topics and Features:
* Presents efficient algorithms for discovering unusual bursts of activity in large time-series databases
* Describes the mathematics and algorithms for finding correlation relationships between thousands or millions of time series across fixed or moving windows
* Demonstrates strong, relevant applications built on a solid scientific basis
* Outlines how readers can adapt the techniques for their own needs and goals
* Describes algorithms for query by humming, gamma-ray burst detection, pairs trading, and density detection
* Offers self-contained descriptions of wavelets, fast Fourier transforms, and sketches as they apply to time-series analysis
This new monograph provides a technical survey of concepts and techniques for describing and analyzing large-scale time-series data streams.
It offers essential coverage of the topic for computer scientists, physicists, medical researchers, financial mathematicians, musicologists, and researchers and professionals who must analyze massive time series. In addition, it can serve as an ideal text/reference for graduate students in many data-rich disciplines.
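Finding highly correlated series can rest on a standard identity: if two length-n series are z-normalized with the population standard deviation, their squared Euclidean distance equals 2n(1 - r), where r is the Pearson correlation. By Parseval's theorem the unnormalized discrete Fourier transform preserves that distance up to a factor of n, so discarding coefficients can only shrink the computed distance, giving an upper bound on correlation that can rule out uncorrelated pairs cheaply. The Python sketch below illustrates that principle with a naive DFT. It is an illustration under these assumptions, not the book's implementation, which the description says also draws on wavelets and sketches; the function names and sample series are hypothetical.

```python
import cmath
import math


def znorm(x):
    """Shift to mean 0 and scale to population standard deviation 1 (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def pearson(x, y):
    """Pearson correlation as the mean product of z-normalized values."""
    zx, zy = znorm(x), znorm(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dft(x):
    """Naive O(n^2) unnormalized DFT; a real system would use an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def corr_upper_bound(x, y, keep):
    """Upper bound on Pearson correlation using only DFT coefficients 1..keep.

    For real input, coefficient n-k is the conjugate of coefficient k, so each kept
    coefficient is counted twice; coefficient 0 is zero after z-normalization.
    Dropping the remaining coefficients can only shrink the distance, so the
    implied correlation 1 - d^2 / (2n) can only grow.
    """
    n = len(x)
    if not 0 < keep < n / 2:
        raise ValueError("keep must satisfy 0 < keep < n/2")
    fx, fy = dft(znorm(x)), dft(znorm(y))
    partial = sum(2 * abs(fx[k] - fy[k]) ** 2 for k in range(1, keep + 1)) / n
    return 1 - partial / (2 * n)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
print(pearson(x, y), corr_upper_bound(x, y, 2))
```

A pair whose bound already falls below a correlation threshold can be discarded without computing the exact correlation; pairs that pass still need the exact check.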

Data Analysis and Pattern Recognition in Multiple Databases

Author Animesh Adhikari
Publisher Springer
Pages 0
Release 2013-12-18
Genre Computers
ISBN 9783319034096

Pattern recognition in data is a well-known classical problem that falls under the ambit of data analysis. As the data we handle change, the nature of patterns, their recognition, and the types of data analysis are bound to change too. Because data collection channels have recently grown in number and diversity, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources that cannot be amalgamated. Even if we are allowed to place different data together, we cannot analyze them jointly when the local identities of patterns must be retained. Thus, pattern recognition in multiple databases gives rise to a suite of new, challenging problems different from those encountered before. Association rule mining, global pattern discovery, and mining patterns of select items provide different pattern discovery techniques for multiple data sources. Some interesting item-based data analyses are also covered in this book. Interesting patterns, such as exceptional patterns, icebergs, and periodic patterns, have recently been reported. The book presents a thorough influence analysis between items in time-stamped databases. The recent research on mining multiple related databases is covered, while some previous contributions to the area are highlighted and contrasted with the most recent developments.
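When sources cannot pool raw data, one simple option is for each source to report only itemset supports and its transaction count, from which a global support can be estimated. The Python sketch below shows that aggregation together with a deliberately simplified, hypothetical test for exceptional patterns (itemsets reported by few sources yet strongly supported wherever they appear). Neither function reproduces the book's definitions; the branch names and figures are invented for illustration, and treating an unreported itemset as having zero count at that source is an assumption that can underestimate support.

```python
def synthesize(local, sizes, min_support):
    """Estimate global support of each itemset from per-source reports.

    local: {source: {itemset: local_support_fraction}}; sizes: {source: transaction_count}.
    An itemset a source does not report counts as zero there (an assumption).
    """
    total = sum(sizes.values())
    counts = {}
    for source, itemsets in local.items():
        for itemset, support in itemsets.items():
            counts[itemset] = counts.get(itemset, 0.0) + support * sizes[source]
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def exceptional(local, max_sources, min_local_support):
    """Hypothetical criterion: reported by at most `max_sources` sources,
    with every reported local support at least `min_local_support`."""
    seen = {}
    for itemsets in local.values():
        for itemset, support in itemsets.items():
            seen.setdefault(itemset, []).append(support)
    return sorted(s for s, supports in seen.items()
                  if len(supports) <= max_sources and min(supports) >= min_local_support)


# Invented example: three branches share only supports and sizes, never transactions.
local = {
    "branch_a": {("bread",): 0.6, ("bread", "milk"): 0.4},
    "branch_b": {("bread",): 0.5, ("caviar",): 0.7},
    "branch_c": {("bread",): 0.7},
}
sizes = {"branch_a": 1000, "branch_b": 500, "branch_c": 500}
print(synthesize(local, sizes, 0.19))
print(exceptional(local, 1, 0.6))
```

With the sample data, `synthesize` keeps `("bread",)` at about 0.6 and `("bread", "milk")` at about 0.2, while `exceptional` returns `[("caviar",)]`.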

Data Mining in Time Series Databases

Author Abraham Kandel
Publisher World Scientific
Pages 205
Release 2004
Genre Computers
ISBN 981256540X

Adding the time dimension to real-world databases produces Time Series Databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. This book covers the state-of-the-art methodology for mining time series databases. The novel data mining methods presented in the book include techniques for efficient segmentation, indexing, and classification of noisy and dynamic time series. A graph-based method for anomaly detection in time series is described, and the book also studies the implications of a novel and potentially useful representation of time series as strings. The problem of detecting changes in data mining models that are induced from temporal databases is additionally discussed.
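The segmentation and string-representation ideas can be illustrated with the classic greedy sliding-window approach to piecewise linear segmentation, shown below in Python. This is a generic technique chosen for illustration, not necessarily one of the book's methods: a segment grows from its anchor while every point stays within `max_error` of the straight line joining the segment's endpoints, and each finished segment is then mapped to a letter by the sign of its slope. The function names, the error threshold, and the U/D/F alphabet are assumptions.

```python
def interp_error(series, start, end):
    """Max absolute deviation of series[start..end] from the line joining its endpoints (end > start)."""
    y0, y1 = series[start], series[end]
    span = end - start
    return max(abs(series[i] - (y0 + (y1 - y0) * (i - start) / span))
               for i in range(start, end + 1))


def sliding_window_segments(series, max_error):
    """Greedy sliding-window segmentation into (start, end) pairs that share endpoints."""
    segments = []
    anchor, n = 0, len(series)
    while anchor < n - 1:
        end = anchor + 1
        # Extend the segment while the straight-line fit stays within max_error.
        while end + 1 < n and interp_error(series, anchor, end + 1) <= max_error:
            end += 1
        segments.append((anchor, end))
        anchor = end
    return segments


def segments_to_string(series, segments, flat_tol=1e-9):
    """Map each segment to 'U', 'D' or 'F' by the sign of its slope."""
    letters = []
    for start, end in segments:
        slope = (series[end] - series[start]) / (end - start)
        letters.append("U" if slope > flat_tol else "D" if slope < -flat_tol else "F")
    return "".join(letters)


s = [0, 1, 2, 3, 2, 1, 0, 0, 0]
segs = sliding_window_segments(s, max_error=0.1)
print(segs, segments_to_string(s, segs))
```

For this series the segments are `(0, 3)`, `(3, 6)`, and `(6, 8)`, and the string is `"UDF"`. Because `interp_error` rescans the whole segment on every extension, this version is quadratic in the worst case.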

Data Mining: Concepts and Techniques

Author Jiawei Han
Publisher Elsevier
Pages 740
Release 2011-06-09
Genre Computers
ISBN 0123814804

Data Mining: Concepts and Techniques provides the concepts and techniques for processing gathered data or information for use in various applications. Specifically, it explains data mining and the tools used to discover knowledge from the collected data; data mining is also referred to as knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After describing data mining, this edition explains the methods for getting to know, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Next, it describes the methods involved in mining frequent patterns, associations, and correlations in large data sets. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
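Frequent pattern mining is among the algorithms the book presents in pseudo-code. The compact Python sketch below is an Apriori-style level-wise search written for this listing, not transcribed from the book: it counts candidate itemsets, keeps those meeting an absolute `min_count`, joins frequent itemsets into candidates one item larger, and prunes any candidate that has an infrequent subset (the Apriori property). The transactions are made up.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise frequent itemset mining; returns {frozenset: support_count}."""
    transactions = [frozenset(t) for t in transactions]
    current = list({frozenset([item]) for t in transactions for item in t})
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in sorted(apriori(baskets, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here `{milk, butter}` occurs only once, so it is dropped, and the single 3-item candidate is pruned because it contains that infrequent pair; five itemsets survive. Rescanning every transaction at each level keeps the code short at the cost of repeated database scans.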