
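Programming Massively Parallel Processors (大规模并行处理器程序设计), free shipping
1-star member price: ¥18.7 (48% off)
2-star member price: ¥18.7
List price: ¥36.0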

- ISBN: 9787302229735
- Binding: not available
- Number of volumes: not available
- Weight: not available
- Format: 16开 (16mo)
- Pages: 258
- Publication date: 2010-07-01
- Barcode: 9787302229735 ; 978-7-302-22973-5
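Synopsis
This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the techniques used to build parallel programs, and uses case studies to demonstrate the full development process, from the initial idea of parallel computing through to a practical, efficient parallel program.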
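Features
Introduces the ideas of parallel computing so that readers can carry this way of thinking about problems into high-performance parallel computing.
Introduces the use of CUDA, a software development tool that NVIDIA created specifically for massively parallel environments.
Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.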
Table of contents
preface
acknowledgments
dedication
chapter 1 introduction
1.1 gpus as parallel computers
1.2 architecture of a modern gpu
1.3 why more speed or parallelism?
1.4 parallel programming languages and models
1.5 overarching goals
1.6 organization of the book
chapter 2 history of gpu computing
2.1 evolution of graphics pipelines
2.1.1 the era of fixed-function graphics pipelines
2.1.2 evolution of programmable real-time graphics
2.1.3 unified graphics and computing processors
2.1.4 gpgpu: an intermediate step
2.2 gpu computing
2.2.1 scalable gpus
2.2.2 recent developments
2.3 future trends
chapter 3 introduction to cuda
3.1 data parallelism
3.2 cuda program structure
3.3 a matrix-matrix multiplication example
3.4 device memories and data transfer
3.5 kernel functions and threading
3.6 summary
3.6.1 function declarations
3.6.2 kernel launch
3.6.3 predefined variables
3.6.4 runtime api
chapter 4 cuda threads
4.1 cuda thread organization
4.2 using blockIdx and threadIdx
4.3 synchronization and transparent scalability
4.4 thread assignment
4.5 thread scheduling and latency tolerance
4.6 summary
4.7 exercises
chapter 5 cuda™ memories
5.1 importance of memory access efficiency
5.2 cuda device memory types
5.3 a strategy for reducing global memory traffic
5.4 memory as a limiting factor to parallelism
5.5 summary
5.6 exercises
chapter 6 performance considerations
6.1 more on thread execution
6.2 global memory bandwidth
6.3 dynamic partitioning of sm resources
6.4 data prefetching
6.5 instruction mix
6.6 thread granularity
6.7 measured performance and summary
6.8 exercises
chapter 7 floating point considerations
7.1 floating-point format
7.1.1 normalized representation of m
7.1.2 excess encoding of e
7.2 representable numbers
7.3 special bit patterns and precision
7.4 arithmetic accuracy and rounding
7.5 algorithm considerations
7.6 summary
7.7 exercises
chapter 8 application case study: advanced mri reconstruction
8.1 application background
8.2 iterative reconstruction
8.3 computing F^H d
step 1. determine the kernel parallelism structure
step 2. getting around the memory bandwidth limitation
step 3. using hardware trigonometry functions
step 4. experimental performance tuning
8.4 final evaluation
8.5 exercises
chapter 9 application case study: molecular visualization and analysis
chapter 10 parallel programming and computational thinking
chapter 11 a brief introduction to opencl™
chapter 12 conclusion and future outlook
appendix a matrix multiplication host-only version source code
appendix b gpu compute capabilities
index