I am completely new to terms related to HPC computing, but I just saw that EC2 released its new type of instance on AWS powered by the new Nvidia Tesla V100, which has both kinds of "cores": CUDA cores (5,120) and Tensor cores (640). What is the difference between the two?
At the moment, only the Tesla V100 and Titan V have tensor cores. Both GPUs have 5120 CUDA cores, where each core can perform up to one single-precision multiply-accumulate operation (e.g. in fp32: x += y * z) per GPU clock (for example, the Tesla V100 PCIe boost frequency is 1.38 GHz).
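To make that concrete, here is a minimal sketch of the kind of per-element work a CUDA core does: one fp32 fused multiply-add per thread per clock. The kernel and array names here are just illustrative, not anything from NVIDIA's docs:

```cpp
// Hypothetical example: each thread performs one single-precision
// fused multiply-accumulate, x[i] += y[i] * z[i], on a CUDA core.
__global__ void fma_kernel(const float *y, const float *z, float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = fmaf(y[i], z[i], x[i]);  // fused multiply-add in fp32
    }
}
```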
Each tensor core operates on small 4x4 matrices. A tensor core can perform one matrix multiply-accumulate operation per GPU clock: it multiplies two fp16 4x4 matrices and adds the resulting product matrix to an accumulator that is an fp32 4x4 matrix, i.e. D = A * B + C, where A and B are fp16 and C and D are fp32.
It is called mixed precision because the input matrices are fp16, while the multiplication result and the accumulator are fp32 matrices.
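In CUDA code, tensor cores are exposed through the warp-level WMMA API (`nvcuda::wmma` in `mma.h`), which works on larger tiles (e.g. 16x16x16) that the hardware builds out of these 4x4 operations. A minimal sketch of one mixed-precision tile multiply, assuming 16x16 row-major fp16 inputs, a single warp, and compilation for sm_70 or newer:

```cpp
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16 tile on the tensor cores.
// A and B are fp16, the accumulator/result C is fp32 (mixed precision).
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);        // C starts at zero
    wmma::load_matrix_sync(a_frag, a, 16);      // load fp16 tile A
    wmma::load_matrix_sync(b_frag, b, 16);      // load fp16 tile B
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // D = A*B + C on tensor cores
    wmma::store_matrix_sync(c, acc_frag, 16, wmma::mem_row_major);  // store fp32 result
}
```

Launched with a single warp (32 threads), e.g. `wmma_tile<<<1, 32>>>(a, b, c);`.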
Probably the proper name would simply be "4x4 matrix cores"; however, the NVIDIA marketing team decided to use "tensor cores".