What is the difference between "-arch sm_13" and "-arch sm_20"?

user1281071 · Apr 26, 2012

I need double-precision calculation in my application. According to what I found on Google, I should add the flag "-arch sm_13" or "-arch sm_20".
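To make this concrete, a minimal kernel of the kind affected by this flag might look like the sketch below (the file name, kernel name, and sizes are arbitrary). Without -arch sm_13 or higher, older nvcc targets sm_10 and demotes the doubles to floats with a warning:

    // axpy.cu — a sketch of a double-precision kernel
    #include <cstdio>

    __global__ void axpy(double a, const double *x, double *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];  // double-precision multiply-add
    }

    int main()
    {
        const int n = 1024;
        double *x, *y;
        cudaMalloc((void **)&x, n * sizeof(double));
        cudaMalloc((void **)&y, n * sizeof(double));
        axpy<<<(n + 255) / 256, 256>>>(2.0, x, y, n);
        cudaDeviceSynchronize();
        printf("done\n");
        cudaFree(x);
        cudaFree(y);
        return 0;
    }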

Q1: What is the difference between "-arch sm_13" and "-arch sm_20" ?

Q2: Is there a difference in performance between "-arch sm_13" and "-arch sm_20" ?

My GPU: GTX 570.

Thanks.

Answer

Tom · Apr 26, 2012

SM stands for Streaming Multiprocessor, and the number indicates the Compute Capability of the target architecture: sm_13 targets Compute Capability 1.3 devices, sm_20 targets Compute Capability 2.0 devices. You can find a good description in the CUDA Programming Guide sections 3.1.2-3.1.4, and you can see the features associated with each architecture in the table in appendix F.

From the NVCC manual (also included in the Toolkit):

In order to allow for architectural evolution, NVIDIA GPUs are released in different generations. New generations introduce major improvements in functionality and/or chip architecture, while GPU models within the same generation show minor configuration differences that 'moderately' affect functionality, performance, or both.

Your GPU has Compute Capability 2.0, so you should use sm_20 to enable the compiler to use features not available in older architectures. If you want backward compatibility, you could also target sm_13 (or sm_1x); see the documents above for how to use the -gencode option to nvcc to target multiple architectures in a single call to nvcc.
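For example (a sketch; app.cu is a placeholder file name):

    # Target Compute Capability 2.0 only, which suits a GTX 570:
    nvcc -arch=sm_20 app.cu -o app

    # Build a fat binary covering both sm_13 and sm_20 in one call:
    nvcc -gencode arch=compute_13,code=sm_13 \
         -gencode arch=compute_20,code=sm_20 \
         app.cu -o app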

Regarding performance, one thing to look out for is that sm_1x did not support fully IEEE 754-compliant single-precision floating point (denormals were flushed to zero, and division and square root were not correctly rounded). So if you target sm_13 and run on a device with Compute Capability 2.0 or later, you may find that single-precision code runs faster because it uses the less accurate path. You can also force the less accurate path with sm_20 or later by using the -ftz=true -prec-div=false -prec-sqrt=false options; see section 5.4.1 in the CUDA Programming Guide for more information on this.
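As a sketch (again with app.cu as a placeholder), the following builds for sm_20 while re-enabling the faster, less accurate single-precision operations:

    # Flush denormals to zero and use approximate division and square root:
    nvcc -arch=sm_20 -ftz=true -prec-div=false -prec-sqrt=false app.cu -o app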