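[Hardware Upgrade] A reminder: the full spec of GA102, the big 30-series die, is 10752 SPs, and the 3090 is not the full chip (updated with the latest evidence: the NVIDIA whitepaper)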


hu

2020-08-25T14:05:58+00:00

NVIDIA's officially released Ampere architecture whitepaper:
[url]https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf[/url]

Ampere GPU Architecture In-Depth
GPC, TPC, and SM High-Level Architecture
Like prior NVIDIA GPUs, GA102 is composed of Graphics Processing Clusters (GPCs), Texture
Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPS), and
memory controllers. The full GA102 GPU contains seven GPCs, 42 TPCs, and 84 SMs.
The GPC is the dominant high-level hardware block with all of the key graphics processing units
residing inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes
two ROP partitions (each partition containing eight ROP units), which is a new feature for
NVIDIA Ampere Architecture GA10x GPUs and described in more detail below. The GPC
includes six TPCs that each include two SMs and one PolyMorph Engine.
Note: The GA102 GPU also features 168 FP64 units (two per SM), which are not depicted in this
diagram. The FP64 TFLOP rate is 1/64th the TFLOP rate of FP32 operations. The small number of FP64
hardware units are included to ensure any programs with FP64 code operate correctly, including FP64
Tensor Core code.

Figure 2. GA102 Full GPU with 84 SMs
Each SM in GA10x GPUs contain 128 CUDA Cores, four third-generation Tensor Cores, a 256 KB
Register File, four Texture Units, one second-generation Ray Tracing Core, and 128 KB of
L1/Shared Memory, which can be configured for differing capacities depending on the needs of
the compute or graphics workloads.
The memory subsystem of GA102 consists of twelve 32-bit memory controllers (384-bit total).
512 KB of L2 cache is paired with each 32-bit memory controller, for a total of 6144 KB on the
full GA102 GPU

128 x 84 = 10752 SPs. The evidence is ironclad.
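For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the full-chip totals from the per-unit counts quoted in the whitepaper excerpt. The function name `ga102_full_config` and the dictionary keys are made up for illustration; only the numbers come from the whitepaper.

```python
# Per-unit counts as quoted in the whitepaper excerpt above.
GPCS = 7
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 128
FP64_UNITS_PER_SM = 2
MEMORY_CONTROLLERS = 12
BUS_BITS_PER_CONTROLLER = 32
L2_KB_PER_CONTROLLER = 512


def ga102_full_config():
    """Derive full-chip totals from the per-unit counts (hypothetical helper)."""
    tpcs = GPCS * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    return {
        "tpcs": tpcs,
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "fp64_units": sms * FP64_UNITS_PER_SM,
        "bus_width_bits": MEMORY_CONTROLLERS * BUS_BITS_PER_CONTROLLER,
        "l2_cache_kb": MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER,
    }


if __name__ == "__main__":
    for name, value in ga102_full_config().items():
        print(f"{name}: {value}")
```

The derived totals should line up with the figures the excerpt states directly (42 TPCs, 84 SMs, 168 FP64 units, a 384-bit bus, 6144 KB of L2), which is a quick consistency check on the 10752 number.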

[quote][pid=449978211,23220349,1]Reply[/pid] Post by [uid=39658527]苹果质量效应[/uid] (2020-09-03 22:14):

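How much of a performance gain can 256 more SPs bring?[s:ac:汗][/quote]Look at the 980 Ti vs. the Titan X, the Titan X Pascal/1080 Ti vs. the Titan Xp, and the 2080 Ti vs. the RTX Titan: each was a gain of nearly 10%.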
[quote][pid=449978644,23220349,1]Reply[/pid] Post by [uid=62480205]起司块wii[/uid] (2020-09-03 22:17):

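So they cut down a whole separate card just for this 2.44% gain? That really is splitting hairs. It's even smaller than the gap between the 2080 and the 2080S.[/quote]Look at the original Titan to the 780 Ti/Titan Black, the Titan X Pascal/1080 Ti to the Titan Xp, and the 2080 Ti to the RTX Titan: raise the clocks and add performance on top.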

2022-04-01 10:09

[quote][pid=452160787,23220349,4]Reply[/pid] Post by [uid=7926267]Gseed[/uid] (2020-09-13 18:15):

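Claiming a 10% gain by straight analogy, seriously? The RTX Titan's spec is 8% higher than the 2080 Ti's. The 80 Ti has 40% more CUDA cores than the 80 and only gains about 30%. And you turn this 2% CUDA increase into 10%? I guess where you come from 10% means 1%, right?[/quote][url]https://news.mydrivers.com/1/823/823191_4.htm[/url]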

MyDrivers' review results: it really did gain about 10%.

2022-04-01 22:26

[quote][pid=599899232,23220349,4]Reply[/pid] Post by [uid=61190429]果然这世间不存在法[/uid] (2022-04-01 13:01):

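A prophet[s:ac:goodjob][/quote]Benchmark results from my Colorful 3090 Ti Vulcan:
CPU: 9900KS
Motherboard: ROG Z390 Strix-H
RAM: Kingston DDR4-3733 RGB, 32 GB (2 x 16 GB, dual channel)
Storage: 905P 480 GB
Cooler: Thermalright Silver Arrow IB-E
PSU: Great Wall 猎金部落 G20 (2000 W!)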
[img]https://img.nga.178.com/attachments/mon_202204/01/9aQ8mls-7m0vZlT3cSn7-fa.jpg[/img]
[img]https://img.nga.178.com/attachments/mon_202204/01/9aQ8mls-by1cZaT3cSog-fj.jpg[/img]
[img]https://img.nga.178.com/attachments/mon_202204/01/9aQ8mls-4yqZhT3cSsg-ku.jpg[/img]

2022-07-23 15:52

[quote][pid=450097722,23220349,2]Reply[/pid] Post by [uid=7177419]TurboWalker[/uid] (2020-09-04 12:47):

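If you say there will be a GA102 Titan, then I'm telling you there will definitely be a GA100 Titan with cut-down double precision to hammer it into the ground. Whether it's the 3090 or a GA102 Titan, it's all just a fight for the flagship crown. The ceiling is right there, yet you insist on working out how many tiers get carved out in between.[/quote]The 40 series is almost here, so where exactly is this "GA100 Titan" you were talking about?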

Lemonz

That depends on whether Lisa Su brings out her nuke.

CandyLandCobra

Huh, that works too?[s:ac:汗]

EnigmaX

How much of a performance gain can 256 more SPs bring?[s:ac:汗]

jia

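So they cut down a whole separate card just for this 2.44% gain? That really is splitting hairs. It's even smaller than the gap between the 2080 and the 2080S.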

hu

[quote][pid=449978211,23220349,1]Reply[/pid] Post by [uid=39658527]苹果质量效应[/uid] (2020-09-03 22:14):

How much of a performance gain can 256 more SPs bring?[s:ac:汗][/quote]Look at the 980 Ti vs. the Titan X, the Titan X Pascal/1080 Ti vs. the Titan Xp, and the 2080 Ti vs. the RTX Titan.

hu

[quote][pid=449978644,23220349,1]Reply[/pid] Post by [uid=62480205]起司块wii[/uid] (2020-09-03 22:17):

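So they cut down a whole separate card just for this 2.44% gain? That really is splitting hairs. It's even smaller than the gap between the 2080 and the 2080S.[/quote]Look at the original Titan to the 780 Ti/Titan Black: raise the clocks and add performance on top.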

`NiKiChan ♫💎

The Titan A will probably be 10752 CUDA cores with 48 GB of GDDR6X at 400 W.
Cooling would just be an official liquid cooler.

hu

[quote][pid=449979487,23220349,1]Reply[/pid] Post by [uid=355910]macintosh[/uid] (2020-09-03 22:21):

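The Titan A will probably be 10752 CUDA cores with 48 GB of GDDR6X at 400 W.
Cooling would just be an official liquid cooler.[/quote]Right, Jensen has never released an official AIO liquid-cooled card so far, while AMD has already shipped four: the 295X2, Fury X, Pro Duo, and Vega 64.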

ECHO-3-1

Could the Titan be based on the 7 nm GA100 used in the compute cards?[s:ac:喘]

hu

[quote][pid=449980705,23220349,1]Reply[/pid] Post by [uid=7812783]xdlbh97531[/uid] (2020-09-03 22:27):

Could the Titan be based on the 7 nm GA100 used in the compute cards?[s:ac:喘][/quote]No way, GA100 has no display output.

San_

[img]http://pic2.178.com/1161/11617273/month_1205/7a63a7f41de81d82fd2623e8e85f6a37.png[/img]


If a full-spec part comes out it will be a Titan; a 90 Ti is unlikely.
There is plenty of room for an 80 Ti, though.
[img]http://pic2.178.com/1530/15301045/month_1205/f816db7687760fea934c365f90903e4c.gif[/img]

hu

Who is calling the fleet?

AIcey

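I don't think that's likely.
With the 3090's yields, I expect supply to remain a problem for a long time.
A 100% intact die probably won't be released to the consumer market. It will most likely be reserved first for Quadro and the enterprise market, because they have stricter power-consumption requirements.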

hu

[quote][pid=450056672,23220349,1]Reply[/pid] Post by [uid=60019882]AX.Procyon[/uid] (2020-09-04 10:14):

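I don't think that's likely.
With the 3090's yields, I expect supply to remain a problem for a long time.
A 100% intact die probably won't be released to the consumer market. It will most likely be reserved first for Quadro and the enterprise market, because they have stricter power-consumption requirements.[/quote]Nobody said it has to launch now; a Titan next year would work too.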

Book

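You can't just count it that crudely, can you?

The launch event itself compared against and recapped the Turing die:
[img]https://img.nga.178.com/attachments/mon_202009/04/9aQ5-5zp0K1dT3cSsg-g0.jpg.medium.jpg[/img]

When introducing the RTX concept, the event also roughly labeled the positions and roles of the CUDA, Tensor, and RT cores:
[img]https://img.nga.178.com/attachments/mon_202009/04/9aQ5-db9sK1nT3cSsg-g0.jpg.medium.jpg[/img]

If we just play spot-the-difference, the cores on the left and right sides of the Turing die actually look identical, don't they?

If Jensen's picture is not something thrown together to placate laymen like us, then we simply cannot read the visual differences between the CUDA, Tensor, and RT cores off a schematic like this.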

hu

[quote][pid=450068125,23220349,1]Reply[/pid] Post by [uid=61858332]独角兽之眸[/uid] (2020-09-04 10:56):

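You can't just count it that crudely, can you?

The launch event itself compared against and recapped the Turing die:
[img]https://img.nga.178.com/attachments/mon_202009/04/9aQ5-5zp0K1dT3cSsg-g0.jpg.medium.jpg[/img]

When introducing the RTX concept, the event also roughly labeled the positions and roles of the CUDA, Tensor, and RT cores:
[img]https://img.nga.178.com/attachments/mon_202009/04/9aQ5-db9sK1nT3cSsg-g0.jpg.medium.jpg[/img]

If we just play spot-the-difference, the cores on the left and right sides of the Turing die actually look identical, don't they?

If Jensen[/quote]First look at the 10496 SPs that NVIDIA's website lists for the 3090, and you know the full spec cannot exceed 12,000;
then 10496 divided by 256 is 41, which shows the 3090 enables only 41 groups and is not the full spec;
finally, count how many "groups" the official GA102 diagram has in total: 6 x 7 = 42. Once you know the full spec is 42 groups, the rest is easy.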
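The three steps in that reply can be checked with a rough sketch like the one below. It assumes one "group" is a TPC of 2 SMs x 128 CUDA cores, which is how the whitepaper excerpt describes a TPC; the post itself only gives the 256 figure. The function name `compare_3090_to_full` is made up for illustration.

```python
RTX_3090_SP = 10496        # SP count the post cites from NVIDIA's website
SP_PER_GROUP = 2 * 128     # assumption: one "group" = one TPC (2 SMs x 128 cores)
FULL_GROUPS = 6 * 7        # groups counted on the official GA102 diagram


def compare_3090_to_full():
    """Compare the 3090's SP count with a full 42-group GA102 (hypothetical helper)."""
    enabled_groups, remainder = divmod(RTX_3090_SP, SP_PER_GROUP)
    full_sp = FULL_GROUPS * SP_PER_GROUP
    extra_sp = full_sp - RTX_3090_SP
    return {
        "enabled_groups": enabled_groups,
        "remainder": remainder,          # 0 means the SP count splits evenly
        "full_sp": full_sp,
        "extra_sp": extra_sp,
        # extra SPs of the full chip, as a percentage of the 3090's count
        "extra_pct": round(100 * extra_sp / RTX_3090_SP, 2),
    }


if __name__ == "__main__":
    for name, value in compare_3090_to_full().items():
        print(f"{name}: {value}")
```

The one disabled group works out to 256 SPs, which is where the 2.44% figure quoted earlier in the thread comes from.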

Book

Reply to [pid=450082477,23220349,1]Reply[/pid] Post by [uid=1449721]ddr354[/uid] (2020-09-04 11:46)

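Did you not get my point?
10496/64 only needs to come out to an integer. The "group count" you divide by afterwards was counted from the CUDA layout you imagine, and I gave you Turing as a counterexample.

So logically:
1. If Jensen is not fooling us laymen and this explanatory diagram is accurate, then the CUDA, RT, and Tensor cores all look like the identical "little squares" you mentioned above, and you simply cannot tell them apart by appearance.
2. Jensen's diagram cuts corners and is meant to placate laymen (non-specialists could not read a real integrated-circuit layout anyway). In that case, the same goes for the Ampere diagram.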

Tallos

[quote][pid=449977461,23220349,1]Reply[/pid] Post by [uid=61210972]Jamesmelol[/uid] (2020-09-03 22:11):

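That depends on whether Lisa Su brings out her nuke.[/quote]It's about time Lisa Su pulled out her grenade; if she doesn't, the stock is going to drop.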

Pemex

[quote][pid=450082477,23220349,1]Reply[/pid] Post by [uid=1449721]ddr354[/uid] (2020-09-04 11:46):

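First look at the 10496 SPs that NVIDIA's website lists for the 3090, and you know the full spec cannot exceed 12,000;
then 10496 divided by 256 is 41, which shows the 3090 enables only 41 groups and is not the full spec;
finally, count how many "groups" the official GA102 diagram has in total: 6 x 7 = 42. Once you know the full spec is 42 groups, the rest is easy.[/quote][s:ac:茶]Rumor has it that what NVIDIA currently lists is an "equivalent CUDA" count. For the actual numbers, see this[img]https://img.nga.178.com/attachments/mon_202009/04/9aQ5-k4gsK1aT3cSmi-8d.jpg.medium.jpg[/img]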