Implementation of a Fault Tolerant Computing Testbed

Title Implementation of a Fault Tolerant Computing Testbed PDF eBook
Author David C. Summers
Publisher
Pages 185
Release 2000-06-01
Genre
ISBN 9781423536611

With spacecraft designs placing more emphasis on reduced cost, faster design time, and higher performance, it is easy to understand why more commercial-off-the-shelf (COTS) devices are being used in space-based applications. COTS devices offer spacecraft designers shorter design-to-orbit times, lower system costs, orders-of-magnitude better performance, and much better software availability than their radiation-hardened (radhard) counterparts. The major drawback of using COTS devices in space is their increased susceptibility to the effects of radiation, single event upsets (SEUs) in particular. This thesis focuses on the implementation of a fault tolerant computer system. The hardware design presented here has two benefits. First, the system can act as a software testbed, which allows software fault tolerance techniques to be tested in the presence of radiation-induced SEUs. The software algorithms can thus be tested in the environment they were designed to operate in, without the expense of placing them in orbit. Second, the design can be used as a hybrid fault tolerant computer system. By combining the masking ability of the hardware with supporting software, the system can mask out and reset processor errors in real time. The design layout is presented using OrCAD schematics.
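As a rough illustration of what testing software in the presence of SEUs involves, the sketch below models an upset as a single inverted bit in a memory word and checks that a simple detector notices it. This is a software-only simulation, not the thesis's hardware design; the names `flip_bit`, `inject_seu`, and `checksum`, the 32-bit word width, and the XOR checksum are all assumptions made for the example.

```python
import random


def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Return `word` with one bit inverted, modelling a single event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def inject_seu(memory: list[int], rng: random.Random, width: int = 32) -> tuple[int, int]:
    """Flip one randomly chosen bit of one randomly chosen word, in place.

    Returns the (address, bit) that was upset so a test harness can log it.
    """
    addr = rng.randrange(len(memory))
    bit = rng.randrange(width)
    memory[addr] = flip_bit(memory[addr], bit, width)
    return addr, bit


def checksum(memory: list[int]) -> int:
    """XOR of all words; any single bit flip changes the result."""
    acc = 0
    for word in memory:
        acc ^= word
    return acc


if __name__ == "__main__":
    rng = random.Random(1)          # fixed seed so a run is repeatable
    memory = [0] * 8                # hypothetical protected data region
    reference = checksum(memory)
    addr, bit = inject_seu(memory, rng)
    detected = checksum(memory) != reference
    print(f"upset at word {addr}, bit {bit}; detected={detected}")
```

A harness built this way can replay the same upset sequence by reusing the seed, which makes it easier to compare how different software techniques respond to identical faults.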

Fault Tolerant Computing Testbed

Title Fault Tolerant Computing Testbed PDF eBook
Author John C. Payne, Jr.
Publisher
Pages 184
Release 1998-12-01
Genre
ISBN 9781423554882

Operating computers in space requires the use of very expensive radiation-hardened microelectronic devices. Unfortunately, the United States radiation-hardened market is rapidly shrinking and makes up a very small percentage of the commercial market. For these reasons, and because commercial-off-the-shelf (COTS) devices are cheaper, more capable, more readily available, and supported by far more software, the use of COTS devices in future space systems is fast becoming a reality. A significant disadvantage of COTS devices is their susceptibility to radiation-induced single event upsets (SEUs), among other radiation effects that are detrimental to electronic systems. This thesis focuses on the board-level design of a tool that enables the analysis of fault tolerant computing techniques in a laboratory environment in the presence of radiation-induced SEUs. When implemented, this tool will benefit the study of using COTS devices in space. It will make it possible to analyze the performance of hardware redundancy techniques and software algorithms intended to improve the performance of COTS microprocessors in this environment before they are used in designs intended for actual space applications. Cadence Concept™ design schematics, associated Verilog® code, and simulation results are presented to develop this concept.

Software Fault Tolerance Techniques and Implementation

Title Software Fault Tolerance Techniques and Implementation PDF eBook
Author Laura L. Pullum
Publisher Artech House
Pages 358
Release 2001
Genre Computers
ISBN 1580531377

Look to this innovative resource for the most comprehensive coverage of software fault tolerance techniques available in a single volume. It offers you a thorough understanding of how critical software fault tolerance techniques operate and guides you through their design, operation, and performance. You get an in-depth discussion of the advantages and disadvantages of specific techniques, so you can decide which ones are best suited to your work.

Fault-Tolerant Computing Systems

Title Fault-Tolerant Computing Systems PDF eBook
Author Mario Dal Cin
Publisher Springer
Pages 446
Release 1991-09-11
Genre Computers
ISBN

A survey of the state of research, development and applications of all aspects of fault tolerance and dependability in computing and automation systems (hardware and software) with the topics: reconfiguration and recovery, system level diagnosis, voting and agreement, testing, fault-tolerant circuits, array testing, modelling, applied fault tolerance, fault-tolerant arrays and systems, interconnection networks, fault-tolerant software.

A Testbed for Evaluation of Fault-tolerant Routing in Multiprocessor Interconnection Networks

Title A Testbed for Evaluation of Fault-tolerant Routing in Multiprocessor Interconnection Networks PDF eBook
Author Aniruddha S. Vaidya
Publisher
Pages 20
Release 1998
Genre Computer networks
ISBN

Abstract: "With parallel machines increasingly taking on critical and complex applications, it is important to make them dependable to ensure their commercial success. Fault-tolerance in the network to accommodate link and node failures is an important step towards this goal. This can be achieved by employing cost-effective fault-tolerant algorithms. However, despite substantial efforts on the theoretical front in developing fault-tolerant routing techniques and architectures, these ideas have not manifested themselves in many commercial platforms. The ramifications of providing fault-tolerant routing in terms of cost and performance is still not clear to the computer architect. Such an insight can only be gained through detailed analysis of a design with realistic workloads. Since no current evaluation platform supports this, previous research on fault-tolerant routing has used synthetic workloads for analyzing performance. This paper presents a comprehensive evaluation testbed for interconnection networks and routing algorithms using real applications. The testbed is flexible enough to implement any network topology and fault-tolerant routing algorithm, and allows the system architect to study the cost versus performance tradeoffs for a range of network parameters. We illustrate its use with one fault-tolerant algorithm and analyze the performance of four shared memory applications with different fault conditions. We also show how the testbed can be used to drive future research in fault-tolerant routing algorithms and architectures, by proposing and evaluating novel architectural enhancements to the network router, called path selection heuristics (PSH). We propose three such schemes and the Least Recently Used (LRU) PSH is shown to give the best performance in the presence of faults."
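The abstract names a Least Recently Used path selection heuristic but does not define it. One plausible reading, sketched below purely as an assumption, is that when several output channels are admissible for a packet, the router picks the one that was used longest ago. The class name `LruPathSelector`, the integer channel indices, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
class LruPathSelector:
    """Pick the least recently used channel among the admissible candidates.

    Assumed interpretation of an LRU path selection heuristic; the paper's
    actual router design may differ.
    """

    def __init__(self, num_channels: int) -> None:
        # -1 marks a channel that has never been used.
        self._last_used = [-1] * num_channels
        self._clock = 0

    def select(self, candidates: list[int]) -> int:
        """Return the candidate channel with the oldest last-use time.

        `candidates` holds the channels the routing algorithm allows for the
        current packet, for example those not blocked by a faulty link.
        Ties are broken by the lowest channel index.
        """
        if not candidates:
            raise ValueError("no admissible output channel")
        choice = min(candidates, key=lambda ch: (self._last_used[ch], ch))
        self._last_used[choice] = self._clock
        self._clock += 1
        return choice


if __name__ == "__main__":
    selector = LruPathSelector(num_channels=4)
    # Channel 3 is treated as faulty for the first three packets.
    for _ in range(3):
        print(selector.select([0, 1, 2]))
    # Channel 3 becomes available and has never been used, so it is chosen.
    print(selector.select([0, 3]))
```

Spreading traffic over the admissible channels in this way is one intuitive reason such a heuristic might help when faults shrink the set of usable paths, though the abstract itself does not explain why LRU performed best.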

Completion and Testing of a TMR Computing Testbed and Recommendations for a Flight-Ready Follow-on Design

Title Completion and Testing of a TMR Computing Testbed and Recommendations for a Flight-Ready Follow-on Design PDF eBook
Author Damen O. Hofheinz
Publisher
Pages 172
Release 2000-12-01
Genre
ISBN 9781423532552

This thesis focuses on the completion and hardware testing of a fault tolerant computer system utilizing Triple Modular Redundancy (TMR). Because of the radiation environment in space, electronics in space applications must be designed to accommodate single event phenomena. While radiation-hardened processors are available, they offer lower performance at higher cost than commercial-off-the-shelf (COTS) processors. To make use of non-hardened devices, a fault tolerance scheme such as TMR may be implemented to increase reliability in a radiation environment. The design completed in this effort is one such implementation. Completing the hardware design consisted of programming logic devices, implementing hardware design corrections, and designing an overall system controller. The testing effort ranged from basic power and ground verification checks to programming, executing, and evaluating programs in read-only memory. During this phase, additional design changes were implemented to correct design flaws. This thesis also evaluates the preliminary design changes required for a space implementation of this TMR design, including changes driven by size, power, and weight restrictions. Additionally, a detailed analysis of component survivability was performed based on past radiation testing.
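The core idea of TMR can be sketched in a few lines: three modules compute the same value, and a voter forwards whatever at least two of them agree on. The example below is a generic software model of a bitwise 2-of-3 voter, not the thesis's hardware voter; the function names `majority_vote` and `vote_and_flag` are assumptions made for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def vote_and_flag(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Return the voted word and the indices of modules that disagree with it.

    With a single upset in one module, exactly one index is flagged; a
    controller could use that flag to resynchronize or reset the module.
    """
    voted = majority_vote(a, b, c)
    dissenters = [i for i, value in enumerate((a, b, c)) if value != voted]
    return voted, dissenters


if __name__ == "__main__":
    good = 0b1100
    upset = good ^ 0b1000           # one bit flipped in the third module
    voted, dissenters = vote_and_flag(good, good, upset)
    print(bin(voted), dissenters)
```

Voting bit by bit means the output stays correct as long as no bit position is corrupted in more than one module at the same time.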