Also, can anyone point me to a good tutorial on the subject? I can't find any.
-fprofile-generate will instrument the application with profiling code. The application will, while actually running, log certain events that could improve performance if this usage pattern was known at compile time. Branches, possibility for inlining, etc, can all be logged, but I'm not sure in detail how GCC implements this.
After the program exits, it will dump all this data into *.gcda files, which are essentially log data for a test run. After rebuilding the application with -fprofile-use flag, GCC will take the *.gcda log data into account when doing its optimizations, usually increasing the performance significantly. Of course, this depends on many factors.