Cache Coherence Techniques for Multicore Processors

Title: Cache Coherence Techniques for Multicore Processors
Author: Michael R. Marty
Pages: 232
Release: 2008

A Primer on Memory Consistency and Cache Coherence

Title: A Primer on Memory Consistency and Cache Coherence
Author: Daniel Sorin
Publisher: Morgan & Claypool Publishers
Pages: 214
Release: 2011-03-02
Genre: Technology & Engineering
ISBN: 1608455653

Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved and a variety of solutions. We present both high-level concepts and specific, concrete examples from real-world systems. Table of Contents: Preface / Introduction to Consistency and Coherence / Coherence Basics / Memory Consistency Motivation and Sequential Consistency / Total Store Order and the x86 Memory Model / Relaxed Memory Consistency / Coherence Protocols / Snooping Coherence Protocols / Directory Coherence Protocols / Advanced Topics in Coherence / Author Biographies
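
To make the kind of load/store rule described above concrete, the sketch below (not taken from the book; the variable names and memory orderings are illustrative choices) shows the classic store-buffering litmus test: each thread stores to its own flag and then loads the other thread's flag. Sequential consistency forbids both loads returning 0, while weaker models such as TSO, which the primer also covers, permit that outcome.

// Hypothetical litmus test illustrating why the memory consistency model matters.
// Two threads each store to their own flag, then load the other thread's flag.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

void thread1() {
    x.store(1, std::memory_order_relaxed);   // W x = 1
    r1 = y.load(std::memory_order_relaxed);  // R y
}

void thread2() {
    y.store(1, std::memory_order_relaxed);   // W y = 1
    r2 = x.load(std::memory_order_relaxed);  // R x
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    // With relaxed ordering, "r1 = 0, r2 = 0" is a legal outcome; replacing the
    // relaxed operations with std::memory_order_seq_cst forbids it.
    std::printf("r1 = %d, r2 = %d\n", r1, r2);
    return 0;
}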

Cache Coherence Strategies in a Many-core Processor

Title: Cache Coherence Strategies in a Many-core Processor
Author: Christopher P. Celio
Pages: 55
Release: 2009

Caches are widely employed in memory systems, exploiting memory locality to achieve high performance and low latency. However, as processor core counts increase, maintaining coherence between caches becomes increasingly difficult. Current methods of cache coherence work well in small-scale multi-core processors; however, the viability of cache coherence as processors scale to thousands of cores remains an open question. A novel many-core execution-driven performance simulator, called Graphite and implemented by the Carbon group, has been used to study a variety of cache coherence strategies for processors with up to 256 cores. The results suggest that cache coherence may be possible in future many-core processors, but that software developers will have to exercise great care to match their algorithms to the target architecture to avoid sub-optimal performance.

Multi-Core Cache Hierarchies

Title: Multi-Core Cache Hierarchies
Author: Rajeev Balasubramonian
Publisher: Morgan & Claypool Publishers
Pages: 155
Release: 2011-06-06
Genre: Technology & Engineering
ISBN: 1598297546

A key determinant of overall system performance and power dissipation is the cache hierarchy, since accesses to off-chip memory consume far more cycles and energy than on-chip accesses. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory access by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints. The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers. Table of Contents: Basic Elements of Large Cache Design / Organizing Data in CMP Last Level Caches / Policies Impacting Cache Hit Rates / Interconnection Networks within Large Caches / Technology / Concluding Remarks
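
As a baseline for the data-placement problem mentioned above, the sketch below shows the simplest possible policy: static interleaving of cache lines across banks. The line size and bank count are assumed values rather than figures from the book, and the placement and migration policies the book surveys exist precisely because this baseline ignores which core actually uses a line.

// Hypothetical static (S-NUCA style) address-to-bank mapping: cache lines are
// interleaved across banks using the address bits just above the line offset.
#include <cstdint>
#include <cstdio>

constexpr uint64_t kLineBytes = 64;   // assumed cache line size
constexpr uint64_t kNumBanks  = 16;   // assumed number of shared cache banks

// Bank index = line address modulo the bank count.
uint64_t BankOf(uint64_t addr) {
    return (addr / kLineBytes) % kNumBanks;
}

int main() {
    // Consecutive cache lines land in consecutive banks, spreading load evenly,
    // but a given line's bank may be far from the core that uses it most.
    for (uint64_t addr = 0; addr < 4 * kLineBytes; addr += kLineBytes)
        std::printf("line 0x%llx -> bank %llu\n",
                    (unsigned long long)addr,
                    (unsigned long long)BankOf(addr));
    return 0;
}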

Locality-aware Cache Hierarchy Management for Multicore Processors

Title: Locality-aware Cache Hierarchy Management for Multicore Processors
Pages: 194
Release: 2015

Next-generation multicore processors and applications will operate on massive data with significant sharing. A major challenge in their implementation is the storage required to track the sharers of data: the bit overhead of such storage scales quadratically with the number of cores in conventional directory-based cache coherence protocols. Another major challenge is limited cache capacity and the data movement incurred by conventional cache hierarchy organizations when dealing with massive data scales. Both factors adversely impact memory access latency and energy consumption. This thesis proposes scalable, efficient mechanisms that improve effective cache capacity (i.e., improve utilization) and reduce data movement by exploiting locality and controlling replication.

First, a limited directory-based protocol, ACKwise, is proposed to track the sharers of data in a cost-effective manner. ACKwise leverages broadcasts to implement scalable cache coherence. Broadcast support can be implemented in a 2-D mesh network by making simple changes to its routing policy, without requiring any additional virtual channels.

Second, a locality-aware replication scheme that better manages the private caches is proposed. This scheme controls replication based on data reuse information and seamlessly adapts between private and logically shared caching of on-chip data at the fine granularity of cache lines. A low-overhead runtime profiling capability that measures the locality of each cache line is built into hardware, and private caching is allowed only for data blocks with high spatio-temporal locality.

Third, a timestamp-based memory ordering validation scheme is proposed that enables the locality-aware private cache replication scheme to be implemented in processors that execute memory operations out of order and employ popular memory consistency models. This method does not rely on cache coherence messages to detect speculation violations, and hence is applicable to the locality-aware protocol. The timestamp mechanism is efficient because consistency violations only arise from conflicting accesses with temporal proximity (i.e., within a few cycles of each other), so timestamps need to be stored only for a small time window.

Fourth, a locality-aware last-level cache (LLC) replication scheme that better manages the LLC is proposed. This scheme adapts replication at runtime based on fine-grained cache line reuse information and thereby balances data locality against off-chip miss rate for optimized execution.

Finally, all the above schemes are combined into a cache hierarchy replication scheme that provides optimal data locality and miss rates at all levels of the cache hierarchy. The design of this scheme is motivated by the experimental observation that locality-aware private cache and LLC replication enable varying performance improvements across benchmarks. These techniques enable optimal use of the on-chip cache capacity and provide low-latency, low-energy memory access, while retaining the convenience of shared memory and preserving the same memory consistency model. On a 64-core multicore processor with out-of-order cores, Locality-aware Cache Hierarchy Replication improves completion time by 15% and energy by 22% over a state-of-the-art baseline, while incurring a storage overhead of 30.7 KB per core (i.e., 10% of the aggregate cache capacity of each core).
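
The limited-directory idea behind ACKwise can be pictured with the rough sketch below. It is not the thesis's actual design: the pointer count K, the field names, and the interface are illustrative assumptions. The point it captures is that a handful of sharers are tracked by ID, and once that capacity overflows only an exact sharer count is kept, so invalidations fall back to broadcast while the requester still knows precisely how many acknowledgements to wait for.

// Rough sketch of a limited-directory entry in the spirit of ACKwise:
// up to K sharers are tracked by ID; beyond that the entry keeps only an
// exact sharer count, so invalidations are broadcast but the number of
// expected acknowledgements is still known.
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int K = 4;  // assumed number of hardware sharer pointers

struct DirectoryEntry {
    std::array<uint16_t, K> sharer{};  // exact sharer IDs while count <= K
    uint16_t count = 0;                // total sharers (exact even when > K)

    void AddSharer(uint16_t core) {
        if (count < K) sharer[count] = core;  // track by ID while room remains
        ++count;                              // always keep an exact count
    }

    // Cores that must receive an invalidation for this line. When the pointer
    // set overflows, fall back to broadcasting to all cores; the ack count the
    // requester waits for is still `count`, not the number of messages sent.
    std::vector<uint16_t> InvalidationTargets(uint16_t num_cores) const {
        std::vector<uint16_t> targets;
        if (count <= K) {
            targets.assign(sharer.begin(), sharer.begin() + count);
        } else {
            for (uint16_t c = 0; c < num_cores; ++c) targets.push_back(c);
        }
        return targets;
    }
};

int main() {
    DirectoryEntry e;
    for (uint16_t core : {3, 7, 11, 20, 42}) e.AddSharer(core);  // 5 sharers > K
    std::printf("acks expected: %u, messages sent: %zu\n",
                (unsigned)e.count, e.InvalidationTargets(64).size());
    return 0;
}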

A Primer on Memory Consistency and Cache Coherence

Title: A Primer on Memory Consistency and Cache Coherence
Author: Vijay Nagarajan
Publisher: Morgan & Claypool Publishers
Pages: 296
Release: 2020-02-04
Genre: Computers
ISBN: 1681737108

Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved and a variety of solutions. We present both high-level concepts and specific, concrete examples from real-world systems. This second edition reflects a decade of advancements since the first edition and includes, among other more modest changes, two new chapters: one on consistency and coherence for non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on consistency and coherence.
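
As a taste of what a coherence protocol specifies, here is a minimal sketch of the stable states of an MSI-style protocol for a single cache line. It is not code from the book: the event names are illustrative, and real protocols (as the primer explains) also require transient states and an ordering mechanism such as a snooping bus or a directory.

// Minimal sketch of the stable states of an MSI-style coherence protocol for
// one cache line, reacting to local loads/stores and observed remote requests.
#include <cstdio>

enum class State { Invalid, Shared, Modified };
enum class Event { LocalLoad, LocalStore, RemoteGetS, RemoteGetM };

State Next(State s, Event e) {
    switch (e) {
        case Event::LocalLoad:  return s == State::Invalid ? State::Shared : s;
        case Event::LocalStore: return State::Modified;  // obtain write permission
        case Event::RemoteGetS: return s == State::Modified ? State::Shared : s;  // supply data, downgrade
        case Event::RemoteGetM: return State::Invalid;   // another core wants to write
    }
    return s;
}

int main() {
    State s = State::Invalid;
    s = Next(s, Event::LocalLoad);   // I -> S on a read miss
    s = Next(s, Event::LocalStore);  // S -> M after gaining write permission
    s = Next(s, Event::RemoteGetS);  // M -> S when another core reads the line
    std::printf("final state: %d\n", static_cast<int>(s));
    return 0;
}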

Multicore Technology

Title: Multicore Technology
Author: Muhammad Yasir Qadri
Publisher: CRC Press
Pages: 492
Release: 2013-07-26
Genre: Computers
ISBN: 1439880646

The saturation of design complexity and clock frequencies for single-core processors has resulted in the emergence of multicore architectures as an alternative design paradigm. Nowadays, multicore/multithreaded computing systems are not only a de facto standard for high-end applications but are also gaining popularity in the field of embedded computing. The start of the multicore era has altered the concepts relating to almost all areas of computer architecture design, including core design, memory management, thread scheduling, application support, inter-processor communication, debugging, and power management. This book gives readers a holistic overview of the field and guides them to further avenues of research by covering the state of the art in this area. It includes contributions from industry as well as academia.