CUDA cudaMemcpy: invalid argument

TonyLic picture TonyLic · May 14, 2012 · Viewed 18.7k times · Source

Here is my code:

struct S {
    int a, b;
    float c, d;
};
class A {
private:
    S* d;
    S h[3];
public:
    A() {
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
    }
void Init();
};

void A::Init() {
    for (int i=0;i<3;i++) {
        h[i].a = 0;
        h[i].b = 1;
        h[i].c = 2;
        h[i].d = 3;
    }
    cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}

A a;

In fact it is a complex program which contain CUDA and OpenGL. When I debug this program, it fails when running at cudaMemcpy with the error information

cudaSafeCall() Runtime API error 11: invalid argument.

Actually, this program is transformed from another one that can run correctly. But in that one, I used two variables S* d and S h[3] in the main function instead of in the class. What is more weird is that I implement this class A in a small program, it works fine. And I've updated my driver, error still exists.

Could anyone give me a hint on why this happen and how to solve it. Thanks.

Answer

phoad picture phoad · Aug 28, 2012

Because the memory operations in CUDA are blocking, they make a synchronization point. So other errors, if not checked with cudaThreadSynchonize, will seem like errors on the memory calls.

So if an error is received on a memory operation, try to place a cudaThreadSynchronize before it and check the result.


Be sure that the first malloc statement is being executed. If it is a problem about initialization of CUDA, like @Harrism indicate, then it would fail in this statement?? Try to place printf statements, and see proper initializations are performed. I think generally invalid argument errors are generated because of using uninitalized memory areas.

  1. Write a printf to your constructor showing the address of the cudaMalloc'ed memory area

    A()
    {
        d = NULL;
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
        printf("D: %p\n", d);
    }
    
  2. Try to make a memory copy for an area that is locally allocated, namely move the cudaMalloc to above of cudaMemcopy (just for testing).

    void A::Init()
    {
        for (int i=0;i<3;i++)
        {
            h[i].a = 0;
            h[i].b = 1;
            h[i].c = 2;
            h[i].d = 3;
        }
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3)); // here!..
        cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
    }
    

Good luck.