double to string without scientific notation or trailing zeros, efficiently

tpascale picture tpascale · Mar 1, 2013 · Viewed 7k times · Source

This routine is called a zillion times to create large csv files full of numbers. Is there a more efficient way to to this?

    static std::string dbl2str(double d)
    {
        std::stringstream ss;
        ss << std::fixed << std::setprecision(10) << d;              //convert double to string w fixed notation, hi precision
        std::string s = ss.str();                                    //output to std::string
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);     //remove trailing 000s    (123.1200 => 123.12,  123.000 => 123.)
        return (s[s.size()-1] == '.') ? s.substr(0, s.size()-1) : s; //remove dangling decimal (123. => 123)
    }

Answer

Steve Jessop picture Steve Jessop · Mar 1, 2013

Before you start, check whether significant time is spent in this function. Do this by measuring, either with a profiler or otherwise. Knowing that you call it a zillion times is all very well, but if it turns out your program still only spends 1% of its time in this function, then nothing you do here can possibly improve your program's performance by more than 1%. If that were the case the answer to your question would be "for your purposes no, this function cannot be made significantly more efficient and you are wasting your time if you try".

First thing, avoid s.substr(0, s.size()-1). This copies most of the string and it makes your function ineligible for NRVO, so I think generally you'll get a copy on return. So the first change I'd make is to replace the last line with:

if(s[s.size()-1] == '.') {
    s.erase(s.end()-1);
}
return s;

But if performance is a serious concern, then here's how I'd do it. I'm not promising that this is the fastest possible, but it avoids some issues with unnecessary allocations and copying. Any approach involving stringstream is going to require a copy from the stringstream to the result, so we want a more low-level operation, snprintf.

static std::string dbl2str(double d)
{
    size_t len = std::snprintf(0, 0, "%.10f", d);
    std::string s(len+1, 0);
    // technically non-portable, see below
    std::snprintf(&s[0], len+1, "%.10f", d);
    // remove nul terminator
    s.pop_back();
    // remove trailing zeros
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    // remove trailing point
    if(s.back() == '.') {
        s.pop_back();
    }
    return s;
}

The second call to snprintf assumes that std::string uses contiguous storage. This is guaranteed in C++11. It is not guaranteed in C++03, but is true for all actively-maintained implementations of std::string known to the C++ committee. If performance really is important then I think it's reasonable to make that non-portable assumption, since writing directly into a string saves copying into a string later.

s.pop_back() is the C++11 way of saying s.erase(s.end()-1), and s.back() is s[s.size()-1]

For another possible improvement, you could get rid of the first call to snprintf and instead size your s to some value like std::numeric_limits<double>::max_exponent10 + 14 (basically, the length that -DBL_MAX needs). The trouble is that this allocates and zeros far more memory than is typically needed (322 bytes for an IEEE double). My intuition is that this will be slower than the first call to snprintf, not to mention wasteful of memory in the case where the string return value is kept hanging around for a while by the caller. But you can always test it.

Alternatively, std::max((int)std::log10(d), 0) + 14 computes a reasonably tight upper bound on the size needed, and might be quicker than snprintf can compute it exactly.

Finally, it may be that you can improve performance by changing the function interface. For example, instead of returning a new string you could perhaps append to a string passed in by the caller:

void append_dbl2str(std::string &s, double d) {
    size_t len = std::snprintf(0, 0, "%.10f", d);
    size_t oldsize = s.size();
    s.resize(oldsize + len + 1);
    // technically non-portable
    std::snprintf(&s[oldsize], len+1, "%.10f", d);
    // remove nul terminator
    s.pop_back();
    // remove trailing zeros
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    // remove trailing point
    if(s.back() == '.') {
        s.pop_back();
    }
}

Then the caller can reserve() plenty of space, call your function several times (presumably with other string appends in between), and write the resulting block of data to the file all at once, without any memory allocation other than the reserve. "Plenty" doesn't have to be the whole file, it could be one line or "paragraph" at a time, but anything that avoids a zillion memory allocations is a potential performance boost.