Column: GiantPandaLLM


NVidia GPU Instruction Set Architecture: Floating-Point Operations

GiantPandaLLM · WeChat Official Account · 3D · 2024-08-28 21:43


…the representation of nearby points, so that values with very small absolute magnitude near 0 can be represented more accurately.

Figure 4. All the special values of floating point (quoted from Reference 1)

Figure 4 shows what the exponent and mantissa fields look like for zero, normal numbers, subnormals, infinity, and NaN.
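The classification can be sketched in a few lines of host-side C. This is a minimal sketch, assuming the IEEE-754 binary32 layout (1 sign bit, 8 exponent bits, 23 mantissa bits); the names `f32_kind`, `f32_from_bits`, and `classify_f32` are hypothetical and are not taken from the article or from any NVidia API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    F32_ZERO,
    F32_SUBNORMAL,
    F32_NORMAL,
    F32_INFINITY,
    F32_NAN
} f32_kind;

/* Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing issues). */
static float f32_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Classify a binary32 value purely from its exponent and mantissa fields. */
static f32_kind classify_f32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* 8-bit biased exponent */
    uint32_t mantissa = bits & 0x7FFFFFu;       /* 23-bit fraction field */

    if (exponent == 0x00u) {
        /* All-zero exponent: zero if the mantissa is also zero, else subnormal. */
        return mantissa == 0 ? F32_ZERO : F32_SUBNORMAL;
    }
    if (exponent == 0xFFu) {
        /* All-ones exponent: infinity if the mantissa is zero, else NaN. */
        return mantissa == 0 ? F32_INFINITY : F32_NAN;
    }
    return F32_NORMAL;
}
```

For example, `classify_f32(f32_from_bits(0x00000001u))` yields `F32_SUBNORMAL` (the smallest positive subnormal), and `classify_f32(f32_from_bits(0x7F800000u))` yields `F32_INFINITY`. The sign bit is ignored, so +0 and -0 both classify as zero.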

NVidia GPU Floating-Point Instructions

Addition and Multiplication

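NVidia GPUs implement the IEEE-754 add, subtract, and multiply operations together with the different rounding modes. Specifically, the addition (FADD = Float Add) and multiplication (FMUL = Float MULtiply) instructions listed below operate on float32 data and use round-to-nearest by default; a suffix selects one of the other rounding modes. For double precision there are DADD (Double Add) and DMUL (Double MULtiply).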

FADD R0 R1 R2;     // R0 = R1 + R2 with round to NEAREST
FADD.RZ R0 R1 R2;  // R0 = R1 + R2 with round to ZERO
FADD.RP R0 R1 R2;  // R0 = R1 + R2 with round to POSITIVE(+Infinity)
FADD.RM R0 R1 R2;  // R0 = R1 + R2 with round to MINUS(-Infinity)
FMUL R0 R1 R2;     // R0 = R1 * R2 with round to NEAREST
FMUL.RZ R0 R1 R2;  // R0 = R1 * R2 with round to ZERO
FMUL.RP R0 R1 R2;  // R0 = R1 * R2 with round to POSITIVE(+Infinity)
FMUL.RM R0 R1 R2;  // R0 = R1 * R2 with round to MINUS(-Infinity)
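
To see what the four FMUL variants above mean numerically, the rounding modes can be emulated on the host. This is only an illustrative sketch, not how the GPU implements the instruction: it relies on the fact that the product of two float32 values is exactly representable in a double, and it assumes the host uses the default round-to-nearest mode for the double-to-float conversion. The names `round_mode`, `f32_next_up`, `f32_next_down`, and `emulate_fmul` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical enum; the names mirror the instruction suffixes (none, .RZ, .RP, .RM). */
typedef enum { ROUND_RN, ROUND_RZ, ROUND_RP, ROUND_RM } round_mode;

/* Next representable float toward +infinity (x must not be NaN or +infinity). */
static float f32_next_up(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7FFFFFFFu) == 0) bits = 0x00000001u;  /* +/-0 -> smallest positive subnormal */
    else if (bits >> 31)           bits -= 1;           /* negative: shrink the magnitude */
    else                           bits += 1;           /* positive: grow the magnitude */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Next representable float toward -infinity (x must not be NaN or -infinity). */
static float f32_next_down(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7FFFFFFFu) == 0) bits = 0x80000001u;  /* +/-0 -> smallest negative subnormal */
    else if (bits >> 31)           bits += 1;           /* negative: grow the magnitude */
    else                           bits -= 1;           /* positive: shrink the magnitude */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Emulate a float32 multiply under the requested rounding mode. */
static float emulate_fmul(float a, float b, round_mode mode) {
    double exact = (double)a * (double)b;  /* 24-bit x 24-bit significands fit in 53 bits */
    float nearest = (float)exact;          /* round to nearest (default host mode assumed) */

    if (mode == ROUND_RN || (double)nearest == exact) return nearest;

    /* Round toward zero is round-down for positive and round-up for negative values. */
    if (mode == ROUND_RZ) mode = (exact > 0.0) ? ROUND_RM : ROUND_RP;

    if (mode == ROUND_RP) {
        return ((double)nearest < exact) ? f32_next_up(nearest) : nearest;
    }
    return ((double)nearest > exact) ? f32_next_down(nearest) : nearest;
}
```

With `a = b = 0x1.000002p0f` (that is, 1 + 2^-23), the exact product is 1 + 2^-22 + 2^-46. `ROUND_RN`, `ROUND_RZ`, and `ROUND_RM` all return `0x1.000004p0f`, while `ROUND_RP` returns the next float up, `0x1.000006p0f`. Negating one operand swaps the roles of `ROUND_RP` and `ROUND_RM`.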

DADD R0 R2 R4;     // R0.64 = R2.64 + R4.64 with round to NEAREST
DADD.RZ R0 R2 R4;  // R0.64 = R2.64 + R4.64 with round to ZERO
DADD.RP R0 R2 R4;  // R0.64 = R2.64 + R4.64 with round to POSITIVE(+Infinity)
DADD.RM R0 R2 R4;  // R0.64 = R2.64 + R4.64 with round to MINUS(-Infinity)
DMUL R0 R2 R4;     // R0.64 = R2.64 * R4.64 with round to NEAREST
DMUL.RZ R0 R2 R4;  // R0.64 = R2.64 * R4.64 with round to ZERO
DMUL.RP R0 R2 R4;  //