A Submodular Optimization Framework for Never-ending Learning

Title A Submodular Optimization Framework for Never-ending Learning PDF eBook
Author Wael Emara
Publisher
Pages 0
Release 2012
Genre Data mining
ISBN

The revolution in information technology and the explosion in the use of computing devices in people's everyday activities have forever changed the perspective of the data mining and machine learning fields. The enormous amount of easily accessible, information-rich data is pushing the data analysis community toward a paradigm shift. In the new paradigm, data comes in the form of a stream of billions of records received every day. The dynamic nature of the data and its sheer size make it impossible to use the traditional notion of offline learning, where the whole dataset is accessible at any point in time. Moreover, no amount of human resources is enough to get expert feedback on the data. In this work we develop a unified optimization-based learning framework that addresses many of these challenges. Specifically, we develop a Never-Ending Learning framework which combines incremental/online, semi-supervised, and active learning under a unified optimization framework. The established framework is based on the class of submodular optimization methods. At the core of this work we provide a novel formulation of Semi-Supervised Support Vector Machines (S3VM) in terms of submodular set functions. The new formulation overcomes the non-convexity issues of the S3VM and provides a state-of-the-art solution that is orders of magnitude faster than the cutting-edge algorithms in the literature. Next, we provide a stream summarization technique via exemplar selection. This technique makes it possible to keep a fixed-size exemplar representation of a data stream that can be used by any label-propagation-based semi-supervised learning technique. The compact data stream representation allows a wide range of algorithms to be extended to the incremental/online learning scenario. Under the same optimization framework, we provide an active learning algorithm that constitutes the feedback loop between the learning machine and an oracle.
Finally, the developed Never-Ending Learning framework is essentially transductive in nature. Our last contribution is therefore an inductive technique for incremental training of SVMs using the properties of local kernels. Throughout this work we demonstrate the importance and wide applicability of the proposed methodologies.
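The exemplar-selection idea above can be sketched in code. The sketch below is not the thesis's algorithm; it is a minimal illustration under stated assumptions: a facility-location utility (a standard monotone submodular choice) and a simple swap rule that keeps the exemplar set at a fixed size as the stream arrives.

```python
import numpy as np

def utility(exemplars, points):
    """Facility-location utility: each point seen so far is credited
    with its RBF similarity to the closest exemplar (monotone submodular)."""
    if not exemplars:
        return 0.0
    E = points[exemplars]                                  # (k, d)
    d2 = ((points[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2)                                      # similarity in (0, 1]
    return float(sim.max(axis=1).sum())

def stream_exemplars(points, k):
    """Single pass over the stream; keep at most k exemplars,
    swapping a new item in only when it raises the utility."""
    chosen = []
    for t in range(len(points)):
        if len(chosen) < k:                 # buffer not yet full
            chosen.append(t)
            continue
        base = utility(chosen, points[: t + 1])
        best, best_u = None, base
        for i in range(k):                  # try replacing each exemplar
            trial = chosen[:i] + chosen[i + 1:] + [t]
            u = utility(trial, points[: t + 1])
            if u > best_u:
                best, best_u = trial, u
        if best is not None:
            chosen = best
    return chosen
```

Re-evaluating the utility over the full prefix is only for clarity; a practical streaming variant would maintain marginal gains incrementally.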

Submodular Optimization and Data Processing

Title Submodular Optimization and Data Processing PDF eBook
Author Kai Wei
Publisher
Pages 271
Release 2016
Genre
ISBN

Data sets are large and getting larger. Two common paradigms, data summarization and data partitioning, are often used to handle big data. Data summarization aims at identifying a small subset of the data that attains the maximum utility or information, while the goal of data partitioning is to split the data across multiple compute nodes so that the data block residing on each node becomes manageable. In this dissertation, we investigate how to apply submodularity to these two data processing paradigms. In the first part of this thesis, we study the connection of submodularity to the data summarization paradigm. First we show that data summarization subsumes a number of applications, including acoustic data subset selection for training speech recognizers [Wei et al., 2014], genomics assay panel selection [Wei et al., 2016], batch active learning [Wei et al., 2015], image summarization [Tschiatschek et al., 2014], document summarization [Lin and Bilmes, 2012], and feature subset selection [Liu et al., 2013]. Among these tasks, we perform case studies on the first three applications. We show how to apply appropriate submodular set functions to model the utility for these tasks, and formulate the corresponding data summarization task as a constrained submodular maximization, which admits an efficient greedy heuristic for optimization [Nemhauser et al., 1978]. To better model the utility function for an underlying data summarization task, we also propose a novel "interactive" setting for learning mixtures of submodular functions. For this interactive learning setting, we propose an algorithmic framework and show that it is effective for both the acoustic data selection and the image summarization tasks. While the simple greedy heuristic already efficiently and near-optimally solves the constrained submodular maximization, data summarization tasks may still be computationally challenging in large-scale scenarios.
To this end, we introduce a novel multistage algorithmic framework called MultGreed to significantly scale the greedy algorithm to even larger problem instances. We theoretically show that MultGreed performs very closely to the greedy algorithm, and we empirically demonstrate its significant speedup over the standard greedy algorithm on a number of real-world data summarization tasks. In the second part of this thesis, we connect submodularity to data partitioning. We first propose two novel submodular data partitioning problems that we collectively call Submodular Partitioning. To solve submodular partitioning, we propose several novel algorithmic frameworks (including greedy, majorization-minimization, minorization-maximization, and relaxation algorithms) that not only scale to large datasets but also achieve theoretical approximation guarantees comparable to the state of the art. We show that submodular partitioning subsumes a number of machine learning applications, including load balancing for parallel systems, intelligent data partitioning for parallel training of statistical models, and unsupervised image segmentation. We perform a case study on the last application, demonstrating the appropriate choice of submodular utility model and the corresponding submodular partitioning formulation. Empirical evidence suggests that the submodular partitioning framework is effective for the intelligent data partitioning task.
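The greedy heuristic of Nemhauser et al. [1978] referenced above is simple enough to sketch. The coverage function below is an illustrative stand-in, not one of the thesis's utility models; for any monotone submodular f, this loop achieves a (1 − 1/e) approximation under a cardinality constraint.

```python
def greedy_max(f, ground, k):
    """Greedy for max f(S) subject to |S| <= k, with f monotone
    submodular: repeatedly add the element of largest marginal gain."""
    S = set()
    for _ in range(k):
        best, gain = None, 0.0
        for e in ground - S:
            g = f(S | {e}) - f(S)        # marginal gain of e
            if g > gain:
                best, gain = e, g
        if best is None:                 # no element adds value
            break
        S.add(best)
    return S

# Illustrative utility: coverage f(S) = |union of the sets indexed by S|
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
picked = greedy_max(f, set(sets), 2)     # picks {0, 2}, covering all of 1..6
```

MultGreed's contribution is to run this selection in cheaper stages on pruned ground sets; the loop above is the baseline it accelerates.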

Submodular Optimization and Machine Learning

Title Submodular Optimization and Machine Learning PDF eBook
Author Rishabh Iyer
Publisher
Pages 282
Release 2015
Genre
ISBN

In this dissertation, we explore a class of unifying and scalable algorithms for a number of submodular optimization problems, and connect them to several machine learning applications. These optimization problems include:
1. Constrained and unconstrained submodular minimization
2. Constrained and unconstrained submodular maximization
3. Difference of submodular optimization
4. Submodular optimization subject to submodular constraints
The main focus of this thesis is to study these problems theoretically, in light of the machine learning problems where they naturally occur. We provide scalable, practical and unifying algorithms for all of the above optimization problems, which retain good theoretical guarantees, and we investigate the underlying hardness of these problems. We also study natural subclasses of submodular functions, along with theoretical constructs that help connect theory to practice by providing tighter worst-case guarantees. While the focus of this thesis is mainly theoretical, we also empirically demonstrate the applicability of these techniques on several synthetic and real-world problems.

Learning with Submodular Functions

Title Learning with Submodular Functions PDF eBook
Author Francis Bach
Publisher
Pages 228
Release 2013
Genre Convex functions
ISBN 9781601987570

Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this monograph, we present the theory of submodular functions from a convex analysis perspective, establishing tight links between certain polyhedra, combinatorial optimization, and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning, and subset selection, as well as a family of structured sparsity-inducing norms that can be derived from submodular functions.
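The Lovász extension mentioned in reason (2) has a short closed-form evaluation: sort the coordinates of w in decreasing order and charge each element its marginal gain, weighted by its coordinate. A minimal sketch, with illustrative set functions of our own choosing:

```python
import numpy as np

def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a normalized set function F at
    w: visit elements in decreasing order of w, charging each element
    its marginal gain F(S+j) - F(S) weighted by w[j]."""
    order = np.argsort(-w)               # indices, largest coordinate first
    S, val = set(), 0.0
    for j in order:
        j = int(j)
        val += w[j] * (F(S | {j}) - F(S))
        S.add(j)
    return val

# F(S) = 1 iff S is nonempty: its Lovász extension is the largest coordinate
F = lambda S: 1.0 if S else 0.0
w = np.array([0.2, 0.9, 0.5])
val = lovasz_extension(F, w)             # 0.9, i.e. max(w)
```

For the cardinality function F(S) = |S| the same routine returns the coordinate sum, and for graph cut functions it yields the total variation, which is where the regularization viewpoint comes from.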

Active Learning and Submodular Functions

Title Active Learning and Submodular Functions PDF eBook
Author Andrew Guillory
Publisher
Pages 128
Release 2012
Genre Submodular functions
ISBN

Active learning is a machine learning setting where the learning algorithm decides what data is labeled. Submodular functions are a class of set functions for which many optimization problems have efficient exact or approximate algorithms. We examine the connections between the two:
1. We propose a new class of interactive submodular optimization problems which connect and generalize submodular optimization and active learning over a finite query set. We derive greedy algorithms with approximately optimal worst-case cost. These analyses apply to exact learning, approximate learning, learning in the presence of adversarial noise, and applications that mix learning and covering.
2. We consider active learning in a batch, transductive setting where the learning algorithm selects a set of examples to be labeled at once. In this setting we derive new error bounds which use symmetric submodular functions for regularization, and we give algorithms which approximately minimize these bounds.
3. We consider a repeated active learning setting where the learning algorithm solves a sequence of related learning problems. We propose an approach to this problem based on a new online-prediction version of submodular set cover.
A common theme in these results is the use of tools from submodular optimization to extend the breadth and depth of learning theory, with an emphasis on non-stochastic settings.
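The submodular set cover problem behind point 3 admits a classic greedy algorithm: keep adding the element with the largest marginal gain until a coverage quota is met. A minimal sketch with a hypothetical coverage function, not the book's online-prediction variant:

```python
def greedy_set_cover(f, ground, quota):
    """Greedy for submodular set cover: grow S by the element of
    largest marginal gain until f(S) reaches the quota (Wolsey-style
    greedy; logarithmic approximation for integer monotone f)."""
    S = set()
    while f(S) < quota:
        best, gain = None, 0.0
        for e in ground - S:
            g = f(S | {e}) - f(S)
            if g > gain:
                best, gain = e, g
        if best is None:          # no element helps; quota unreachable
            break
        S.add(best)
    return S

# Cover the universe {1,...,6} using as few of these sets as possible
sets = {'a': {1, 2, 3, 4}, 'b': {3, 4, 5}, 'c': {5, 6}, 'd': {1, 6}}
cover = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
chosen = greedy_set_cover(cover, set(sets), quota=6)   # {'a', 'c'}
```

Read "elements" as queries and f as a version-space-shrinking objective and this loop is the shape of the covering-style active learners analyzed above.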

Automated Machine Learning

Title Automated Machine Learning PDF eBook
Author Frank Hutter
Publisher Springer
Pages 223
Release 2019-05-17
Genre Computers
ISBN 3030053180

This open access book presents the first comprehensive overview of general methods in Automated Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first series of international challenges of AutoML systems. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. However, many of the recent machine learning successes crucially rely on human experts, who manually select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters. To overcome this problem, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself. This book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as a reference for practitioners aiming to use AutoML in their work.

Active Learning

Title Active Learning PDF eBook
Author Burr Settles
Publisher Springer Nature
Pages 100
Release 2022-05-31
Genre Computers
ISBN 3031015606

The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This sort of approach is well-motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms which have been organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address these open challenges and opportunities. Table of Contents: Automating Inquiry / Uncertainty Sampling / Searching Through the Hypothesis Space / Minimizing Expected Error and Variance / Exploiting Structure in Data / Theory / Practical Considerations
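The simplest of the query selection frameworks surveyed here, uncertainty sampling, can be sketched in a few lines. The least-confidence variant below assumes the learner exposes class-posterior estimates for the unlabeled pool; the array P is hypothetical toy data, not an example from the book.

```python
import numpy as np

def uncertainty_sample(probs, batch=1):
    """Least-confidence uncertainty sampling: query the unlabeled
    instances whose most likely label has the lowest probability."""
    confidence = probs.max(axis=1)           # (n,) top-class probability
    return np.argsort(confidence)[:batch]    # least confident first

# Toy posteriors over 3 classes for 4 unlabeled instances
P = np.array([[0.90, 0.05, 0.05],
              [0.40, 0.35, 0.25],
              [0.60, 0.30, 0.10],
              [0.34, 0.33, 0.33]])
query = uncertainty_sample(P, batch=2)       # near-uniform rows come first
```

Other frameworks in the book swap the `confidence` score for margin, entropy, expected error reduction, or density-weighted criteria; the selection loop stays the same.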