Why is webAssembly function almost 300 time slower than same JS function

Blindman67 picture Blindman67 · Jan 9, 2018 · Viewed 8.7k times · Source

Find length of line 300* slower

First of I have read the answer to Why is my WebAssembly function slower than the JavaScript equivalent?

But it has shed little light on the problem, and I have invested a lot of time that may well be that yellow stuff against the wall.

I do not use globals, I do not use any memory. I have two simple functions that find the length of a line segment and compare them to the same thing in plain old Javascript. I have 4 params 3 more locals and returns a float or double.

On Chrome the Javascript is 40 times faster than the webAssembly and on firefox the wasm is almost 300 times slower than the Javascript.

jsPref test case.

I have added a test case to jsPref WebAssembly V Javascript math

What am I doing wrong?

Either

  1. I have missed an obvious bug, bad practice, or I am suffering coder stupidity.
  2. WebAssembly is not for 32bit OS (win 10 laptop i7CPU)
  3. WebAssembly is far from a ready technology.

Please please be option 1.

I have read the webAssembly use case

Re-use existing code by targeting WebAssembly, embedded in a larger JavaScript / HTML application. This could be anything from simple helper libraries, to compute-oriented task offload.

I was hoping I could replace some geometry libs with webAssembly to get some extra performance. I was hoping that it would be awesome, like 10 or more times faster. BUT 300 times slower WTF.


UPADTE

This is not a JS optimisation issues.

To ensure that optimisation has as little as possible effect I have tested using the following methods to reduce or eliminate any optimisation bias..

  • counter c += length(... to ensure all code is executed.
  • bigCount += c to ensure whole function is executed. Not needed
  • 4 lines for each function to reduce a inlining skew. Not Needed
  • all values are randomly generated doubles
  • each function call returns a different result.
  • add slower length calculation in JS using Math.hypot to prove code is being run.
  • added empty call that return first param JS to see overhead

Results from update.

Because there is a lot more overhead in the test the results are closer but the JS code is still two orders of magnitude faster.

Note how slow the function Math.hypot is. If optimisation was in effect that function would be near the faster len function.

  • WebAssembly 13389µs
  • Javascript 728µs

/*
=======================================
Performance test. : WebAssm V Javascript
Use strict....... : true
Data view........ : false
Duplicates....... : 4
Cycles........... : 147
Samples per cycle : 100
Tests per Sample. : undefined
---------------------------------------------
Test : 'length64'
Mean : 12736µs ±69µs (*) 3013 samples
---------------------------------------------
Test : 'length32'
Mean : 13389µs ±94µs (*) 2914 samples
---------------------------------------------
Test : 'length JS'
Mean : 728µs ±6µs (*) 2906 samples
---------------------------------------------
Test : 'Length JS Slow'
Mean : 23374µs ±191µs (*) 2939 samples   << This function use Math.hypot 
                                            rather than Math.sqrt
---------------------------------------------
Test : 'Empty'
Mean : 79µs ±2µs (*) 2928 samples
-All ----------------------------------------
Mean : 10.097ms Totals time : 148431.200ms 14700 samples
(*) Error rate approximation does not represent the variance.

*/

Whats the point of WebAssambly if it does not optimise

End of update


All the stuff related to the problem.

Find length of a line.

Original source in custom language

   
// declare func the < indicates export name, the param with types and return type
func <lengthF(float x, float y, float x1, float y1) float {
    float nx, ny, dist;  // declare locals float is f32
    nx = x1 - x;
    ny = y1 - y;
    dist = sqrt(ny * ny + nx * nx);
    return dist;
}
// and as double
func <length(double x, double y, double x1, double y1) double {
    double nx, ny, dist;
    nx = x1 - x;
    ny = y1 - y;
    dist = sqrt(ny * ny + nx * nx);
    return dist;
}

Code compiles to Wat for proof read

(module
(func 
    (export "lengthF")
    (param f32 f32 f32 f32)
    (result f32)
    (local f32 f32 f32)
    get_local 2
    get_local 0
    f32.sub
    set_local 4
    get_local 3
    get_local 1
    f32.sub
    tee_local 5
    get_local 5
    f32.mul
    get_local 4
    get_local 4
    f32.mul
    f32.add
    f32.sqrt
)
(func 
    (export "length")
    (param f64 f64 f64 f64)
    (result f64)
    (local f64 f64 f64)
    get_local 2
    get_local 0
    f64.sub
    set_local 4
    get_local 3
    get_local 1
    f64.sub
    tee_local 5
    get_local 5
    f64.mul
    get_local 4
    get_local 4
    f64.mul
    f64.add
    f64.sqrt
)
)

As compiled wasm in hex string (Note does not include name section) and loaded using WebAssembly.compile. Exported functions then run against Javascript function len (in below snippet)

    // hex of above without the name section
    const asm = `0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b`
    const bin = new Uint8Array(asm.length >> 1);
    for(var i = 0; i < asm.length; i+= 2){ bin[i>>1] = parseInt(asm.substr(i,2),16) }
    var length,lengthF;

    WebAssembly.compile(bin).then(module => {
        const wasmInstance = new WebAssembly.Instance(module, {});
        lengthF = wasmInstance.exports.lengthF;
        length = wasmInstance.exports.length;
    });
    // test values are const (same result if from array or literals)
    const a1 = rand(-100000,100000);
    const a2 = rand(-100000,100000);
    const a3 = rand(-100000,100000);
    const a4 = rand(-100000,100000);

    // javascript version of function
    function len(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.sqrt(nx * nx + ny * ny);
    }

And the test code is the same for all 3 functions and run in strict mode.

 tests : [{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
               length(a1,a2,a3,a4);

            }
        },
        name : "length64",
    },{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
                lengthF(a1,a2,a3,a4);
             
            }
        },
        name : "length32",
    },{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
                len(a1,a2,a3,a4);
             
            }
        },
        name : "lengthNative",
    }
]

The test results on FireFox are

 /*
=======================================
Performance test. : WebAssm V Javascript
Use strict....... : true
Data view........ : false
Duplicates....... : 4
Cycles........... : 34
Samples per cycle : 100
Tests per Sample. : undefined
---------------------------------------------
Test : 'length64'
Mean : 26359µs ±128µs (*) 1128 samples
---------------------------------------------
Test : 'length32'
Mean : 27456µs ±109µs (*) 1144 samples
---------------------------------------------
Test : 'lengthNative'
Mean : 106µs ±2µs (*) 1128 samples
-All ----------------------------------------
Mean : 18.018ms Totals time : 61262.240ms 3400 samples
(*) Error rate approximation does not represent the variance.
*/

Answer

ColinE picture ColinE · Jan 10, 2018

Andreas describes a number of good reasons why the JavaScript implementation was initially observed to be x300 faster. However, there are a number of other issues with your code.

  1. This is a classic 'micro benchmark', i.e. the code that you are testing is so small, that the other overheads within your test loop are a significant factor. For example, there is an overhead in calling WebAssembly from JavaScript, which will factor in your results. What are you trying to measure? raw processing speed? or the overhead of the language boundary?
  2. Your results vary wildly, from x300 to x2, due to small changes in your test code. Again, this is a micro benchmark issue. Others have seen the same when using this approach to measure performance, for example this post claims wasm is x84 faster, which is clearly wrong!
  3. The current WebAssembly VM is very new, and an MVP. It will get faster. Your JavaScript VM has had 20 years to reach its current speed. The performance of the JS <=> wasm boundary is being worked on and optimised right now.

For a more definitive answer, see the joint paper from the WebAssembly team, which outlines an expected runtime performance gain of around 30%

Finally, to answer your point:

Whats the point of WebAssembly if it does not optimise

I think you have misconceptions around what WebAssembly will do for you. Based on the paper above, the runtime performance optimisations are quite modest. However, there are still a number of performance advantages:

  1. Its compact binary format mean and low level nature means the browser can load, parse and compile the code much faster than JavaScript. It is anticipated that WebAssembly can be compiled faster than your browser can download it.
  2. WebAssembly has a predictable runtime performance. With JavaScript the performance generally increases with each iteration as it is further optimised. It can also decrease due to se-optimisation.

There are also a number of non-performance related advantages too.

For a more realistic performance measurement, take a look at:

Both are practical, production codebases.