Computer Architecture: A Quantitative Approach, 5th Edition, by John L. Hennessy and David A. Patterson
Product details:
ISBN 10: 012383872X
ISBN 13: 9780123838728
Authors: John L. Hennessy, David A. Patterson
The computing world is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the cloud are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change.
- Part of Intel’s 2012 Recommended Reading List for Developers
- Updated to cover the mobile computing revolution
- Emphasizes the two most important topics in architecture today: memory hierarchy and parallelism in all its forms.
- Develops common themes throughout each chapter: power, performance, cost, dependability, protection, programming models, and emerging trends (“What’s Next”)
- Includes three review appendices in the printed text. Additional reference appendices are available online.
- Includes updated Case Studies and completely new exercises.
Computer Architecture: A Quantitative Approach, 5th Edition, Table of Contents:
In Praise of Computer Architecture: A Quantitative Approach, Fifth Edition
Computer Architecture: A Quantitative Approach
Copyright
Dedication
Foreword
Table of Contents
Preface
Why We Wrote This Book
This Edition
Topic Selection and Organization
An Overview of the Content
Navigating the Text
Chapter Structure
Case Studies with Exercises
Supplemental Materials
Helping Improve This Book
Concluding Remarks
Acknowledgments
Contributors to the Fifth Edition
Reviewers
Advisory Panel
Appendices
Case Studies with Exercises
Additional Material
Contributors to Previous Editions
Reviewers
Appendices
Exercises
Case Studies with Exercises
Special Thanks
1 Fundamentals of Quantitative Design and Analysis
1.1 Introduction
1.2 Classes of Computers
Personal Mobile Device (PMD)
Desktop Computing
Servers
Clusters/Warehouse-Scale Computers
Embedded Computers
Classes of Parallelism and Parallel Architectures
1.3 Defining Computer Architecture
Instruction Set Architecture: The Myopic View of Computer Architecture
Genuine Computer Architecture: Designing the Organization and Hardware to Meet Goals and Functional Requirements
1.4 Trends in Technology
Performance Trends: Bandwidth over Latency
Scaling of Transistor Performance and Wires
1.5 Trends in Power and Energy in Integrated Circuits
Power and Energy: A Systems Perspective
Energy and Power within a Microprocessor
1.6 Trends in Cost
The Impact of Time, Volume, and Commoditization
Cost of an Integrated Circuit
Cost versus Price
Cost of Manufacturing versus Cost of Operation
1.7 Dependability
1.8 Measuring, Reporting, and Summarizing Performance
Benchmarks
Desktop Benchmarks
Server Benchmarks
Reporting Performance Results
Summarizing Performance Results
1.9 Quantitative Principles of Computer Design
Take Advantage of Parallelism
Principle of Locality
Focus on the Common Case
Amdahl’s Law
The Processor Performance Equation
1.10 Putting It All Together: Performance, Price, and Power
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspectives and References
Case Studies and Exercises by Diana Franklin
Case Study 1: Chip Fabrication Cost
Concepts illustrated by this case study
Case Study 2: Power Consumption in Computer Systems
Concepts illustrated by this case study
Exercises
2 Memory Hierarchy Design
2.1 Introduction
Basics of Memory Hierarchies: A Quick Review
2.2 Ten Advanced Optimizations of Cache Performance
First Optimization: Small and Simple First-Level Caches to Reduce Hit Time and Power
Second Optimization: Way Prediction to Reduce Hit Time
Third Optimization: Pipelined Cache Access to Increase Cache Bandwidth
Fourth Optimization: Nonblocking Caches to Increase Cache Bandwidth
Fifth Optimization: Multibanked Caches to Increase Cache Bandwidth
Sixth Optimization: Critical Word First and Early Restart to Reduce Miss Penalty
Seventh Optimization: Merging Write Buffer to Reduce Miss Penalty
Eighth Optimization: Compiler Optimizations to Reduce Miss Rate
Loop Interchange
Blocking
Ninth Optimization: Hardware Prefetching of Instructions and Data to Reduce Miss Penalty or Miss Rate
Tenth Optimization: Compiler-Controlled Prefetching to Reduce Miss Penalty or Miss Rate
Cache Optimization Summary
2.3 Memory Technology and Optimizations
SRAM Technology
DRAM Technology
Improving Memory Performance Inside a DRAM Chip
Graphics Data RAMs
Reducing Power Consumption in SDRAMs
Flash Memory
Enhancing Dependability in Memory Systems
2.4 Protection: Virtual Memory and Virtual Machines
Protection via Virtual Memory
Protection via Virtual Machines
Requirements of a Virtual Machine Monitor
(Lack of) Instruction Set Architecture Support for Virtual Machines
Impact of Virtual Machines on Virtual Memory and I/O
An Example VMM: The Xen Virtual Machine
2.5 Crosscutting Issues: The Design of Memory Hierarchies
Protection and Instruction Set Architecture
Coherency of Cached Data
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7
The ARM Cortex-A8
Performance of the Cortex-A8 Memory Hierarchy
The Intel Core i7
Performance of the i7 Memory System
2.7 Fallacies and Pitfalls
2.8 Concluding Remarks: Looking Ahead
2.9 Historical Perspective and References
Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li
Case Study 1: Optimizing Cache Performance via Advanced Techniques
Concepts illustrated by this case study
Case Study 2: Putting It All Together: Highly Parallel Memory Systems
Concept illustrated by this case study
Exercises
3 Instruction-Level Parallelism and Its Exploitation
3.1 Instruction-Level Parallelism: Concepts and Challenges
What Is Instruction-Level Parallelism?
Data Dependences and Hazards
Data Dependences
Name Dependences
Data Hazards
Control Dependences
3.2 Basic Compiler Techniques for Exposing ILP
Basic Pipeline Scheduling and Loop Unrolling
Summary of the Loop Unrolling and Scheduling
3.3 Reducing Branch Costs with Advanced Branch Prediction
Correlating Branch Predictors
Tournament Predictors: Adaptively Combining Local and Global Predictors
The Intel Core i7 Branch Predictor
3.4 Overcoming Data Hazards with Dynamic Scheduling
Dynamic Scheduling: The Idea
Dynamic Scheduling Using Tomasulo’s Approach
3.5 Dynamic Scheduling: Examples and the Algorithm
Tomasulo’s Algorithm: The Details
Tomasulo’s Algorithm: A Loop-Based Example
3.6 Hardware-Based Speculation
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
The Basic VLIW Approach
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
3.9 Advanced Techniques for Instruction Delivery and Speculation
Increasing Instruction Fetch Bandwidth
Branch-Target Buffers
Return Address Predictors
Integrated Instruction Fetch Units
Speculation: Implementation Issues and Extensions
Speculation Support: Register Renaming versus Reorder Buffers
How Much to Speculate
Speculating through Multiple Branches
Speculation and the Challenge of Energy Efficiency
Value Prediction
3.10 Studies of the Limitations of ILP
The Hardware Model
Limitations on ILP for Realizable Processors
Beyond the Limits of This Study
3.11 Cross-Cutting Issues: ILP Approaches and the Memory System
Hardware versus Software Speculation
Speculative Execution and the Memory System
3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
Effectiveness of Fine-Grained Multithreading on the Sun T1
T1 Multithreading Unicore Performance
Effectiveness of Simultaneous Multithreading on Superscalar Processors
3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8
The ARM Cortex-A8
Performance of the A8 Pipeline
The Intel Core i7
Performance of the i7
3.14 Fallacies and Pitfalls
3.15 Concluding Remarks: What’s Ahead?
3.16 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
Case Study: Exploring the Impact of Microarchitectural Techniques
Concepts illustrated by this case study
Exercises
4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures
4.1 Introduction
4.2 Vector Architecture
VMIPS
How Vector Processors Work: An Example
Vector Execution Time
Multiple Lanes: Beyond One Element per Clock Cycle
Vector-Length Registers: Handling Loops Not Equal to 64
Vector Mask Registers: Handling IF Statements in Vector Loops
Memory Banks: Supplying Bandwidth for Vector Load/Store Units
Stride: Handling Multidimensional Arrays in Vector Architectures
Gather-Scatter: Handling Sparse Matrices in Vector Architectures
Programming Vector Architectures
4.3 SIMD Instruction Set Extensions for Multimedia
Programming Multimedia SIMD Architectures
The Roofline Visual Performance Model
4.4 Graphics Processing Units
Programming the GPU
NVIDIA GPU Computational Structures
NVIDIA GPU Instruction Set Architecture
Conditional Branching in GPUs
NVIDIA GPU Memory Structures
Innovations in the Fermi GPU Architecture
Similarities and Differences between Vector Architectures and GPUs
Similarities and Differences between Multimedia SIMD Computers and GPUs
Summary
4.5 Detecting and Enhancing Loop-Level Parallelism
Finding Dependences
Eliminating Dependent Computations
4.6 Crosscutting Issues
Energy and DLP: Slow and Wide versus Fast and Narrow
Banked Memory and Graphics Memory
Strided Accesses and TLB Misses
4.7 Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7
4.8 Fallacies and Pitfalls
4.9 Concluding Remarks
4.10 Historical Perspective and References
Case Study and Exercises by Jason D. Bakos
Case Study: Implementing a Vector Kernel on a Vector Processor and GPU
Concepts illustrated by this case study
Exercises
5 Thread-Level Parallelism
5.1 Introduction
Multiprocessor Architecture: Issues and Approach
Challenges of Parallel Processing
5.2 Centralized Shared-Memory Architectures
What Is Multiprocessor Cache Coherence?
Basic Schemes for Enforcing Coherence
Snooping Coherence Protocols
Basic Implementation Techniques
An Example Protocol
Extensions to the Basic Coherence Protocol
Limitations in Symmetric Shared-Memory Multiprocessors and Snooping Protocols
Implementing Snooping Cache Coherence
5.3 Performance of Symmetric Shared-Memory Multiprocessors
A Commercial Workload
Performance Measurements of the Commercial Workload
A Multiprogramming and OS Workload
Performance of the Multiprogramming and OS Workload
5.4 Distributed Shared-Memory and Directory-Based Coherence
Directory-Based Cache Coherence Protocols: The Basics
An Example Directory Protocol
5.5 Synchronization: The Basics
Basic Hardware Primitives
Implementing Locks Using Coherence
5.6 Models of Memory Consistency: An Introduction
The Programmer’s View
Relaxed Consistency Models: The Basics
Final Remarks on Consistency Models
5.7 Crosscutting Issues
Compiler Optimization and the Consistency Model
Using Speculation to Hide Latency in Strict Consistency Models
Inclusion and Its Implementation
Performance Gains from Using Multiprocessing and Multithreading
5.8 Putting It All Together: Multicore Processors and Their Performance
Performance and Energy Efficiency of the Intel Core i7 Multicore
Putting Multicore and SMT Together
5.9 Fallacies and Pitfalls
5.10 Concluding Remarks
5.11 Historical Perspectives and References
Case Studies and Exercises by Amr Zaky and David A. Wood
Case Study 1: Single-Chip Multicore Multiprocessor
Concepts illustrated by this case study
Case Study 2: Simple Directory-Based Coherence
Concepts illustrated by this case study
Case Study 3: Advanced Directory Protocol
Concepts illustrated by this case study
Exercises
6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
6.1 Introduction
6.2 Programming Models and Workloads for Warehouse-Scale Computers
6.3 Computer Architecture of Warehouse-Scale Computers
Storage
Array Switch
WSC Memory Hierarchy
6.4 Physical Infrastructure and Costs of Warehouse-Scale Computers
Measuring Efficiency of a WSC
Cost of a WSC
6.5 Cloud Computing: The Return of Utility Computing
Amazon Web Services
6.6 Crosscutting Issues
WSC Network as a Bottleneck
Using Energy Efficiently Inside the Server
6.7 Putting It All Together: A Google Warehouse-Scale Computer
Containers
Cooling and Power in the Google WSC
Servers in a Google WSC
Networking in a Google WSC
Monitoring and Repair in a Google WSC
Summary
6.8 Fallacies and Pitfalls
6.9 Concluding Remarks
6.10 Historical Perspectives and References
Case Studies and Exercises by Parthasarathy Ranganathan
Case Study 1: Total Cost of Ownership Influencing Warehouse-Scale Computer Design Decisions
Concepts illustrated by this case study
Case Study 2: Resource Allocation in WSCs and TCO
Concepts illustrated by this case study
Exercises
Appendix A. Instruction Set Principles
A.1 Introduction
A.2 Classifying Instruction Set Architectures
Summary: Classifying Instruction Set Architectures
A.3 Memory Addressing
Interpreting Memory Addresses
Addressing Modes
Displacement Addressing Mode
Immediate or Literal Addressing Mode
Summary: Memory Addressing
A.4 Type and Size of Operands
A.5 Operations in the Instruction Set
A.6 Instructions for Control Flow
Addressing Modes for Control Flow Instructions
Conditional Branch Options
Procedure Invocation Options
Summary: Instructions for Control Flow
A.7 Encoding an Instruction Set
Reduced Code Size in RISCs
Summary: Encoding an Instruction Set
A.8 Crosscutting Issues: The Role of Compilers
The Structure of Recent Compilers
Register Allocation
Impact of Optimizations on Performance
The Impact of Compiler Technology on the Architect’s Decisions
How the Architect Can Help the Compiler Writer
Compiler Support (or Lack Thereof) for Multimedia Instructions
Summary: The Role of Compilers
A.9 Putting It All Together: The MIPS Architecture
Registers for MIPS
Data Types for MIPS
Addressing Modes for MIPS Data Transfers
MIPS Instruction Format
MIPS Operations
MIPS Control Flow Instructions
MIPS Floating-Point Operations
MIPS Instruction Set Usage
A.10 Fallacies and Pitfalls
A.11 Concluding Remarks
A.12 Historical Perspective and References
Exercises by Gregory D. Peterson
Appendix B. Review of Memory Hierarchy
B.1 Introduction
Cache Performance Review
Four Memory Hierarchy Questions
An Example: The Opteron Data Cache
B.2 Cache Performance
Average Memory Access Time and Processor Performance
Miss Penalty and Out-of-Order Execution Processors
B.3 Six Basic Cache Optimizations
First Optimization: Larger Block Size to Reduce Miss Rate
Second Optimization: Larger Caches to Reduce Miss Rate
Third Optimization: Higher Associativity to Reduce Miss Rate
Fourth Optimization: Multilevel Caches to Reduce Miss Penalty
Fifth Optimization: Giving Priority to Read Misses over Writes to Reduce Miss Penalty
Sixth Optimization: Avoiding Address Translation during Indexing of the Cache to Reduce Hit Time
Summary of Basic Cache Optimization
B.4 Virtual Memory
Four Memory Hierarchy Questions Revisited
Techniques for Fast Address Translation
Selecting a Page Size
Summary of Virtual Memory and Caches
B.5 Protection and Examples of Virtual Memory
Protecting Processes
A Segmented Virtual Memory Example: Protection in the Intel Pentium
Adding Bounds Checking and Memory Mapping
Adding Sharing and Protection
Adding Safe Calls from User to OS Gates and Inheriting Protection Level for Parameters
A Paged Virtual Memory Example: The 64-Bit Opteron Memory Management
Summary: Protection on the 32-Bit Intel Pentium vs. the 64-Bit AMD Opteron
B.6 Fallacies and Pitfalls
B.7 Concluding Remarks
B.8 Historical Perspective and References
Exercises by Amr Zaky
Appendix C. Pipelining: Basic and Intermediate Concepts
C.1 Introduction
What Is Pipelining?
The Basics of a RISC Instruction Set
A Simple Implementation of a RISC Instruction Set
The Classic Five-Stage Pipeline for a RISC Processor
Basic Performance Issues in Pipelining
C.2 The Major Hurdle of Pipelining—Pipeline Hazards
Performance of Pipelines with Stalls
Structural Hazards
Data Hazards
Minimizing Data Hazard Stalls by Forwarding
Data Hazards Requiring Stalls
Branch Hazards
Reducing Pipeline Branch Penalties
Performance of Branch Schemes
Reducing the Cost of Branches through Prediction
Static Branch Prediction
Dynamic Branch Prediction and Branch-Prediction Buffers
C.3 How Is Pipelining Implemented?
A Simple Implementation of MIPS
A Basic Pipeline for MIPS
Implementing the Control for the MIPS Pipeline
Dealing with Branches in the Pipeline
C.4 What Makes Pipelining Hard to Implement?
Dealing with Exceptions
Types of Exceptions and Requirements
Stopping and Restarting Execution
Exceptions in MIPS
Instruction Set Complications
C.5 Extending the MIPS Pipeline to Handle Multicycle Operations
Hazards and Forwarding in Longer Latency Pipelines
Maintaining Precise Exceptions
Performance of a MIPS FP Pipeline
C.6 Putting It All Together: The MIPS R4000 Pipeline
The Floating-Point Pipeline
Performance of the R4000 Pipeline
C.7 Crosscutting Issues
RISC Instruction Sets and Efficiency of Pipelining
Dynamically Scheduled Pipelines
Dynamic Scheduling with a Scoreboard
C.8 Fallacies and Pitfalls
C.9 Concluding Remarks
C.10 Historical Perspective and References