Which inline assembly code is correct for rdtscp?

James · Feb 9, 2013 · Viewed 9.6k times

Disclaimer: Words cannot describe how much I detest AT&T style syntax

I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

The first version I used was

static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}

I notice there is no clobber list in this version. I don't know whether that is a problem; I suppose it depends on whether the compiler inlines the function. Using this version causes me problems that aren't always reproducible.

The next version I found is

static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");

    return tsc;
}

This is reassuringly unreadable and official looking, but like I said my issue isn't always reproducible so I'm merely trying to rule out one possible cause of my problem.

The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

What's correct... version 1, or version 2, or both?

Answer

amdn · Feb 9, 2013

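RDTSCP writes three registers: EDX:EAX receive the TSC and ECX receives the 32-bit auxiliary value (IA32_TSC_AUX), so ECX has to appear as either an output or a clobber. Here's C++ code that returns the TSC and stores the auxiliary 32 bits in the reference parameter: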

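#include <cstdint>

// Returns the 64-bit TSC; stores the auxiliary value (delivered in ECX) in aux.
static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax, rdx;
    // All three registers written by RDTSCP are outputs, so no clobber list is needed.
    asm volatile ( "rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}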

It is better to merge the two 32-bit halves with a shift and add in a C++ statement rather than inside the inline assembly; this lets the compiler schedule those instructions as it sees fit.
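If the auxiliary value isn't needed, the same idea can be applied to the question's version 1. The following is only a sketch, assuming GCC or Clang on x86-64 and a CPU that supports RDTSCP; the names `rdtscp_noaux` and `ticks_for_loop` are made up for illustration. It keeps the two outputs and lists `rcx` as a clobber, so the compiler knows not to keep a live value (such as a function parameter) in that register across the instruction.

```cpp
#include <cstdint>

// Hypothetical variant of the question's version 1: the auxiliary value is
// discarded, but RDTSCP still overwrites ECX, so rcx is declared as clobbered.
static inline std::uint64_t rdtscp_noaux()
{
    std::uint32_t lo, hi;
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "rcx");
    // Merge the halves in C++ so the compiler can schedule the shift and or.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical usage: count TSC ticks spent in a small loop.
static std::uint64_t ticks_for_loop(std::uint64_t iterations)
{
    volatile std::uint64_t sink = 0;  // volatile keeps the loop from being optimized away
    const std::uint64_t start = rdtscp_noaux();
    for (std::uint64_t i = 0; i < iterations; ++i)
        sink = sink + i;
    const std::uint64_t stop = rdtscp_noaux();
    return stop - start;
}
```

The result is in TSC ticks, not nanoseconds, and two readings taken on different cores are only comparable if the platform keeps the TSCs synchronized.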