Here is my code:
struct S {
    int a, b;
    float c, d;
};

class A {
private:
    S* d;
    S h[3];
public:
    A() {
        cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
    }
    void Init();
};

void A::Init() {
    for (int i = 0; i < 3; i++) {
        h[i].a = 0;
        h[i].b = 1;
        h[i].c = 2;
        h[i].d = 3;
    }
    cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}

A a;
In fact, this is part of a complex program that contains both CUDA and OpenGL. When I debug the program, it fails at the cudaMemcpy call with this error message:
cudaSafeCall() Runtime API error 11: invalid argument.
Actually, this program was adapted from another one that runs correctly. In that one, however, I declared the two variables S* d and S h[3] in the main function instead of in a class. What is even weirder is that when I implement this class A in a small program, it works fine. I have also updated my driver, but the error still exists.
Could anyone give me a hint as to why this happens and how to solve it? Thanks.
Because memory operations in CUDA are blocking, they act as synchronization points. Errors from earlier calls, if not checked with cudaThreadSynchronize, will therefore show up as errors on the memory calls.
So if you get an error on a memory operation, try placing a cudaThreadSynchronize before it and check the result.
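The check could look roughly like the sketch below. It assumes a newer toolkit, where cudaDeviceSynchronize replaces the deprecated cudaThreadSynchronize, and the helper name checkPending is made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA or cutil): waits for all earlier GPU
// work and reports any error raised before this point, so that the error is
// not blamed on the next memory call.
cudaError_t checkPending(const char* where) {
    // On old toolkits this would be cudaThreadSynchronize().
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        // Returns the last recorded error and resets it to cudaSuccess.
        err = cudaGetLastError();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "pending CUDA error %s: %s\n",
                where, cudaGetErrorString(err));
    }
    return err;
}
```

Calling checkPending("before cudaMemcpy") right before the cudaMemcpy in A::Init() should show whether the failure really belongs to the copy or to something that ran earlier.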
Make sure that the first malloc statement is actually being executed. If the problem were in the initialization of CUDA, as @Harrism indicates, wouldn't it already fail in that statement? Try adding printf statements to see whether the proper initializations are performed. I think invalid argument errors are generally caused by using uninitialized memory areas.
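One way to see whether initialization works at all is a small probe like the one below. This is only a sketch: the function name probeCuda is hypothetical, and using cudaFree(0) to force context creation is a common idiom rather than anything taken from your program:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the status of basic CUDA initialization and
// returns the first error encountered (cudaSuccess if everything worked).
cudaError_t probeCuda() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    printf("cudaGetDeviceCount: %s, devices: %d\n",
           cudaGetErrorString(err), count);
    if (err != cudaSuccess) {
        return err;
    }
    // cudaFree(0) frees nothing, but it forces the runtime to create a context.
    err = cudaFree(0);
    printf("context creation: %s\n", cudaGetErrorString(err));
    return err;
}
```

Calling probeCuda() at the start of the constructor and again at the start of Init() would show whether the runtime is in the same state at both points.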
Add a printf to your constructor that shows the address of the cudaMalloc'ed memory area:
A()
{
    d = NULL;
    cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
    printf("D: %p\n", d);
}
Try copying to an area that is allocated locally, that is, move the cudaMalloc to just above the cudaMemcpy (just for testing):
void A::Init()
{
    for (int i = 0; i < 3; i++)
    {
        h[i].a = 0;
        h[i].b = 1;
        h[i].c = 2;
        h[i].d = 3;
    }
    cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3)); // here!..
    cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}
Good luck.