How to parse space-separated floats in C++ quickly?

Question 1

How to parse space-separated floats in C++ quickly?

c++ parsing boost-spirit

OopsUser · Jul 4, 2013 · Viewed 19.6k times · Source

Answer

Answer

UPDATE

Since Spirit X3 is available for testing, I've updated the benchmarks. Meanwhile I've used Nonius to get statistically sound benchmarks.

All charts below are available interactive online

Benchmark CMake project + testdata used is on github: https://github.com/sehe/bench_float_parsing

Summary:

Spirit parsers are fastest. If you can use C++14 consider the experimental version Spirit X3:

The above is measures using memory mapped files. Using IOstreams, it will be slower accross the board,

but not as slow as scanf using C/POSIX FILE* function calls:

What follows is parts from the OLD answer

I implemented the Spirit version, and ran a benchmark comparing to the other suggested answers.

Here's my results, all tests run on the same body of input (515Mb of input.txt). See below for exact specs.

^{(wall clock time in seconds, average of 2+ runs)}

To my own surprise, Boost Spirit turns out to be fastest, and most elegant:
handles/reports errors

supports +/-Inf and NaN and variable whitespace

no problems at all detecting the end of input (as opposed to the other mmap answer)
looks nice:
bool ok = phrase_parse(f,l,               // source iterators
     (double_ > double_ > double_) % eol, // grammar
     blank,                               // skipper
     data);                               // output attribute
Note that boost::spirit::istreambuf_iterator was unspeakably much slower (15s+). I hope this helps!

Benchmark details

All parsing done into vector of struct float3 { float x,y,z; }.

Generate input file using
od -f -A none --width=12 /dev/urandom | head -n 11000000
This results in a 515Mb file containing data like
     -2627.0056   -1.967235e-12  -2.2784738e+33
  -1.0664798e-27  -4.6421956e-23   -6.917859e+20
  -1.1080849e+36   2.8909405e-33   1.7888695e-12
  -7.1663235e+33  -1.0840628e+36   1.5343362e-12
  -3.1773715e-17  -6.3655537e-22   -8.797282e+31
    9.781095e+19   1.7378472e-37        63825084
  -1.2139188e+09  -5.2464635e-05  -2.1235992e-38
   3.0109424e+08   5.3939846e+30  -6.6146894e-20
Compile the program using:
g++ -std=c++0x -g -O3 -isystem -march=native test.cpp -o test -lboost_filesystem -lboost_iostreams
Measure wall clock time using
time ./test < input.txt 

Environment:

Linux desktop 4.2.0-42-generic #49-Ubuntu SMP x86_64
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
32GiB RAM

Full Code

Full code to the old benchmark is in the edit history of this post, the newest version is on github

Question 2

I have a file with millions of lines, each line has 3 floats separated by spaces. It takes a lot of time to read the file, so I tried to read them using memory mapped files only to find out that the problem is not with the speed of IO but with the speed of the parsing.

My current parsing is to take the stream (called file) and do the following

float x,y,z;
file >> x >> y >> z;

Someone in Stack Overflow recommended to use Boost.Spirit but I couldn't find any simple tutorial to explain how to use it.

I'm trying to find a simple and efficient way to parse a line that looks like this:

"134.32 3545.87 3425"

I will really appreciate some help. I wanted to use strtok to split it, but I don't know how to convert strings to floats, and I'm not quite sure it's the best way.

I don't mind if the solution will be Boost or not. I don't mind if it won't be the most efficient solution ever, but I'm sure that it is possible to double the speed.

Thanks in advance.

How to parse space-separated floats in C++ quickly?

Answer

UPDATE

Summary:

Benchmark details

Environment:

Full Code

Related questions