Dynamic Task Execution on Shared and Distributed Memory Architectures

Author: Asim Yarkhan
Pages: 122
Release: 2012

Multicore architectures with high core counts have come to dominate the world of high performance computing, from shared memory machines to the largest distributed memory clusters. The multicore route to increased performance has a simpler design and better power efficiency than the traditional approach of increasing processor frequencies, but standard programming techniques are not well adapted to this change in computer architecture design. In this work, we study dynamic runtime environments executing data-driven applications as a solution to programming multicore architectures. The goals of our runtime environments are productivity, scalability and performance. We demonstrate productivity by defining a simple programming interface for expressing algorithms as serially presented tasks. Our runtime environments are experimentally shown to be scalable and to give competitive performance on large multicore and distributed memory machines.

This work is driven by linear algebra algorithms, where state-of-the-art libraries (e.g., LAPACK and ScaLAPACK) use a fork-join or block-synchronous execution style that does not use the available resources in the most efficient manner. Research in linear algebra has reformulated these algorithms as tasks acting on tiles of data, with data-dependency relationships between the tasks. The reformulated algorithms yield a task DAG that can be executed along asynchronous, data-driven execution paths analogous to dataflow execution.

We study an API and runtime environment for shared memory architectures that efficiently executes serially presented tile-based algorithms. This runtime is used to enable linear algebra applications and is shown to deliver performance competitive with state-of-the-art commercial and research libraries. We then develop a runtime environment for distributed memory multicore architectures, extended from our shared memory implementation. It takes serially presented algorithms designed for the shared memory environment and schedules and executes them on distributed memory architectures in a scalable, high-performance manner. We design a distributed data-coherency protocol and a distributed task-scheduling mechanism that avoid global coordination. Experimental results with linear algebra applications show the scalability and performance of our runtime environment.
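
The core mechanism the abstract describes, serial task insertion with dependencies inferred from data access modes, can be sketched compactly. Below is a minimal illustrative sketch, not the thesis API: the names Runtime, insert_task, IN, OUT and INOUT are hypothetical. It shows how a runtime can derive read-after-write, write-after-read and write-after-write dependencies from serially inserted tasks and execute them in dataflow order:

```python
# Minimal sketch of serially presented task insertion with data-driven
# execution. All names here are hypothetical, not the thesis API.
from concurrent.futures import ThreadPoolExecutor

IN, OUT, INOUT = "in", "out", "inout"

class Runtime:
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.last_writer = {}   # data key -> future of the last writing task
        self.readers = {}       # data key -> futures reading since that write

    def insert_task(self, fn, *accesses):
        # accesses: (data_key, mode) pairs. Serial insertion order plus the
        # access modes determine RAW, WAR and WAW dependencies.
        deps = []
        for key, mode in accesses:
            if mode in (IN, INOUT) and key in self.last_writer:
                deps.append(self.last_writer[key])        # read-after-write
            if mode in (OUT, INOUT):
                deps.extend(self.readers.get(key, []))    # write-after-read
                if key in self.last_writer:
                    deps.append(self.last_writer[key])    # write-after-write
        def run():
            for d in deps:
                d.result()          # wait for every predecessor task
            fn(*(key for key, _ in accesses))
        fut = self.pool.submit(run)
        for key, mode in accesses:
            if mode in (OUT, INOUT):
                self.last_writer[key] = fut
                self.readers[key] = []
            else:
                self.readers.setdefault(key, []).append(fut)
        return fut

    def wait_all(self):
        self.pool.shutdown(wait=True)

# Tasks are inserted in plain serial order; the runtime extracts the DAG.
rt = Runtime()
rt.insert_task(lambda a: print("factor", a), ("A00", INOUT))
rt.insert_task(lambda a, b: print("update", b, "with", a),
               ("A00", IN), ("A10", INOUT))
rt.wait_all()
```

Because dependencies only ever point at earlier-inserted tasks, the second task here runs strictly after the first, while independent tasks would proceed in parallel.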

Dynamic Task Discovery in a Data-flow, Task-based Runtime System

Author: Reazul Hoque
Release: 2019
Genre: Application software

Successfully utilizing modern heterogeneous many-core architectures with complex memory hierarchies is a challenge for many application developers. Portability and performance of existing and new applications are the key challenges scientific application developers continuously face. Many evolutionary solutions have been proposed, including ones that extend the capabilities of the current message passing paradigm with intra-node features (MPI+X). A different, more revolutionary, solution explores data-flow task-based runtime systems as a substitute for both local and distributed data-dependency management. How such a runtime is programmed matters, as it directly affects the productivity of developers and the performance of applications.

This work extends one such runtime, the Parallel Runtime Scheduling and Execution Controller (PaRSEC), with a new programming model that lets users insert tasks into the runtime by writing sequential code. The model, called Dynamic Task Discovery (DTD), discovers tasks dynamically at runtime and uses optimized graph-unrolling techniques to accommodate applications with large task graphs. Bottlenecks of this programming model are identified and solutions to overcome its limitations are proposed.

The performance of DTD is analyzed at scale on dense linear algebra workloads, where it shows excellent results: in distributed memory, 1.3x to 2.3x better performance than ScaLAPACK for QR factorization at 128 nodes, and in shared memory, 4x to 5x better performance than the StarPU and QUARK runtimes for Cholesky factorization. DTD was also evaluated on the coupled-cluster method of the state-of-the-art quantum chemistry application NWCHEM, where it performed remarkably well among all considered runtimes at a scale of 128 nodes. The hope is that the concept and development of DTD, the detailed evaluation of its practical performance at scale, the analysis of its theoretical limitations, the thorough study and classification of various task-based runtime systems, and the design, implementation and evaluation of the chosen runtimes on micro-benchmarks will help the broad community of scientific application developers.
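
One implementation idea the abstract alludes to, accommodating large task graphs by unrolling only a bounded portion of the graph at a time, can be illustrated with a small hedged sketch. The names WINDOW and insert_task below are mine, not PaRSEC's API; this shows only the throttling concept:

```python
# Sketch of the bounded task window behind DTD-style dynamic task
# discovery. Hypothetical names; this is not PaRSEC code.
import threading
from concurrent.futures import ThreadPoolExecutor

WINDOW = 8                          # max tasks unrolled but not yet retired
slots = threading.Semaphore(WINDOW)
pool = ThreadPoolExecutor(max_workers=4)

def insert_task(fn, *args):
    slots.acquire()                 # block discovery while the window is full
    def run():
        try:
            fn(*args)
        finally:
            slots.release()         # a retiring task frees a window slot
    pool.submit(run)

# The user writes ordinary sequential code; tasks are discovered in program
# order, but the graph for all 100 iterations is never materialized at once.
for i in range(100):
    insert_task(print, "task", i)
pool.shutdown(wait=True)
```

Throttling the discovery thread this way keeps memory use bounded even when the sequential code would otherwise generate a task graph far larger than what fits in memory.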

Hierarchical Scheduling in Parallel and Cluster Systems

Author: Sivarama Dandamudi
Publisher: Springer Science & Business Media
Pages: 284
Release: 2003-06-30
Genre: Computers
ISBN: 9780306477614

Multiple processor systems are an important class of parallel systems. Over the years, several architectures have been proposed to build such systems to satisfy the requirements of high performance computing. These architectures span a wide variety of system types.

At the low end of the spectrum, we can build a small, shared-memory parallel system with tens of processors. These systems typically use a bus to interconnect the processors and memory, and are, for example, becoming commonplace in high-performance graphics workstations. They are called uniform memory access (UMA) multiprocessors because they provide uniform access to memory for all processors. These systems provide a single address space, which is preferred by programmers. This architecture, however, cannot be extended even to medium systems with hundreds of processors due to bus bandwidth limitations.

To scale systems to the medium range, i.e., to hundreds of processors, non-bus interconnection networks have been proposed. These systems, for example, use a multistage dynamic interconnection network. Such systems also provide global, shared memory like the UMA systems. However, they introduce local and remote memories, which lead to a non-uniform memory access (NUMA) architecture.

Distributed-memory architecture is used for systems with thousands of processors. These systems differ from the shared-memory architectures in that there is no globally accessible shared memory. Instead, they use message passing to facilitate communication among the processors. As a result, they do not provide a single address space.
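
The two programming models the passage contrasts can be made concrete with a short illustrative sketch of my own (not from the book): in a single address space, workers communicate by writing shared memory, while in a distributed-memory system results must be sent as explicit messages.

```python
# Hedged sketch contrasting a shared address space with message passing.
import threading
import multiprocessing as mp

def shared_worker(counts, i):
    counts[i] = i * i               # a plain store into shared memory

def message_worker(q, i):
    q.put((i, i * i))               # an explicit send replaces the store

if __name__ == "__main__":
    # Shared address space (UMA-style): all threads see the same list.
    counts = [0] * 4
    threads = [threading.Thread(target=shared_worker, args=(counts, i))
               for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("shared memory:", counts)

    # Message passing (distributed-memory style): no shared state at all.
    q = mp.Queue()
    procs = [mp.Process(target=message_worker, args=(q, i)) for i in range(4)]
    for p in procs: p.start()
    received = dict(q.get() for _ in range(4))
    for p in procs: p.join()
    print("message passing:", [received[i] for i in range(4)])
```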

Virtual Shared Memory for Distributed Architectures

Author: Eva Kühn
Publisher: Nova Publishers
Pages: 138
Release: 2001
Genre: Architecture
ISBN: 9781590331019


Dynamic task allocation on shared memory multiprocessor systems

Author: Jiahuang Ji
Pages: 22
Release: 1990

Languages, Compilers and Run-time Environments for Distributed Memory Machines

Author: J. Saltz
Publisher: Elsevier
Pages: 323
Release: 2014-06-28
Genre: Computers
ISBN: 1483295389

Papers presented within this volume cover a wide range of topics related to programming distributed memory machines. Distributed memory architectures, although having the potential to supply the very high levels of performance required to support future computing needs, present awkward programming problems. The major issue is to design methods that enable compilers to generate efficient distributed memory programs from relatively machine-independent program specifications. This book is a compilation of papers describing a wide range of research efforts aimed at easing the task of programming distributed memory machines.
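
The kind of transformation such a compiler performs can be suggested with a small sketch of my own (not from the book): a machine-independent loop over a global array is turned into per-process loops over locally owned blocks, following the common "owner computes" rule, with explicit index arithmetic replacing shared memory.

```python
# Hedged sketch of compiling a machine-independent loop for distributed
# memory: block-distribute the iteration space, one block per process.
N = 16          # global array size
P = 4           # number of processes

def owned_range(rank, n=N, p=P):
    # Block distribution: process `rank` owns one contiguous slice.
    block = (n + p - 1) // p
    lo = rank * block
    return range(lo, min(lo + block, n))

# Machine-independent specification: for i in range(N): a[i] = i * i.
# Generated per-process code: each rank computes only the iterations
# whose results it owns; together the pieces form the global array.
a = [None] * N
for rank in range(P):                 # stands in for P separate processes
    for i in owned_range(rank):
        a[i] = i * i                  # local computation on owned data
print(a)
```

In a real distributed program, each rank would allocate only its own slice and exchange boundary data by message passing; the loop here simulates all ranks in one process for clarity.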

Parallel Algorithm Derivation and Program Transformation

Author: Robert Paige
Publisher: Springer Science & Business Media
Pages: 228
Release: 2007-08-28
Genre: Computers
ISBN: 0585273308

This book contains selected papers from the ONR Workshop on Parallel Algorithm Design and Program Transformation that took place at New York University, Courant Institute, from Aug. 30 to Sept. 1, 1991. The aim of the workshop was to bring together computer scientists in transformational programming and parallel algorithm design in order to encourage a sharing of ideas that might benefit both communities. It was hoped that exposure to algorithm design methods developed within the algorithm community would stimulate progress in software development for parallel architectures within the transformational community. It was also hoped that exposure to syntax-directed methods and pragmatic programming concerns developed within the transformational community would encourage more realistic theoretical models of parallel architectures and more systematic and algebraic approaches to parallel algorithm design within the algorithm community.

The workshop organizers were Robert Paige, John Reif, and Ralph Wachter. The workshop was sponsored by the Office of Naval Research under grant number N00014-90-J-1421. There were 44 attendees, 28 presentations, and 5 system demonstrations. All attendees were invited to submit a paper for publication in the book. Each submitted paper was refereed by participants from the workshop, and the final decision on publication was made by the editors. There were several motivations for holding the workshop and for publishing papers contributed by its participants: transformational programming and parallel computation are two emerging fields that may ultimately depend on each other for success.