I came across this old question and ran the following experiment with Scala 2.10.3.
I rewrote the Scala version to use explicit tail recursion:
    import scala.annotation.tailrec

    object ScalaMain {
      private val t = 20

      private def run() {
        var i = 10
        while (!isEvenlyDivisible(2, i, t))
          i += 2
        println(i)
      }

      @tailrec private def isEvenlyDivisible(i: Int, a: Int, b: Int): Boolean = {
        if (i > b) true
        else (a % i == 0) && isEvenlyDivisible(i + 1, a, b)
      }

      def main(args: Array[String]) {
        val t1 = System.currentTimeMillis()
        var i = 0
        while (i < 20) {
          run()
          i += 1
        }
        val t2 = System.currentTimeMillis()
        println("time: " + (t2 - t1))
      }
    }
I then compared it to the following Java version. I deliberately made the methods non-static for a fair comparison with Scala:
    public class JavaMain {
        private final int t = 20;

        private void run() {
            int i = 10;
            while (!isEvenlyDivisible(2, i, t))
                i += 2;
            System.out.println(i);
        }

        private boolean isEvenlyDivisible(int i, int a, int b) {
            if (i > b) return true;
            else return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
        }

        public static void main(String[] args) {
            JavaMain o = new JavaMain();
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < 20; ++i)
                o.run();
            long t2 = System.currentTimeMillis();
            System.out.println("time: " + (t2 - t1));
        }
    }
Here are the results on my computer:
> java JavaMain
....
time: 9651
> scala ScalaMain
....
time: 20592
This is Scala 2.10.3 on Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51.
My question is: what is the hidden cost in the Scala version?
Many thanks.
Well, the OP's benchmark is not ideal. Many effects need to be mitigated, including warmup, dead code elimination, forking, etc. Luckily, JMH already takes care of many of these, and it has bindings for both Java and Scala. Follow the instructions on the JMH page to set up a benchmark project, then transplant the benchmarks below into it.
This is the sample Java benchmark:
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @State(Scope.Benchmark)
    @Fork(3)
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    public class JavaBench {

        // Upper bound of the divisor range; JMH runs the benchmark once per value.
        @Param({"1", "5", "10", "15", "20"})
        int t;

        private int run() {
            int i = 10;
            while (!isEvenlyDivisible(2, i, t))
                i += 2;
            return i;
        }

        private boolean isEvenlyDivisible(int i, int a, int b) {
            if (i > b)
                return true;
            else
                return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
        }

        // Returning the result keeps the JIT from eliminating the computation.
        @GenerateMicroBenchmark
        public int test() {
            return run();
        }
    }
...and this is the sample Scala benchmark:
    import java.util.concurrent.TimeUnit
    import org.openjdk.jmh.annotations._
    import scala.annotation.tailrec

    @BenchmarkMode(Array(Mode.AverageTime))
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @State(Scope.Benchmark)
    @Fork(3)
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    class ScalaBench {

      // Upper bound of the divisor range; JMH runs the benchmark once per value.
      @Param(Array("1", "5", "10", "15", "20"))
      var t: Int = _

      private def run(): Int = {
        var i = 10
        while (!isEvenlyDivisible(2, i, t))
          i += 2
        i
      }

      @tailrec private def isEvenlyDivisible(i: Int, a: Int, b: Int): Boolean = {
        if (i > b) true
        else (a % i == 0) && isEvenlyDivisible(i + 1, a, b)
      }

      // Returning the result keeps the JIT from eliminating the computation.
      @GenerateMicroBenchmark
      def test(): Int = {
        run()
      }
    }
If you run these on JDK 8 GA, Linux x86_64, then you'll get:
    Benchmark             (t)  Mode  Samples         Mean  Mean error  Units
    o.s.ScalaBench.test     1  avgt       15        0.005       0.000  us/op
    o.s.ScalaBench.test     5  avgt       15        0.489       0.001  us/op
    o.s.ScalaBench.test    10  avgt       15       23.672       0.087  us/op
    o.s.ScalaBench.test    15  avgt       15     3406.492       9.239  us/op
    o.s.ScalaBench.test    20  avgt       15  2483221.694    5973.236  us/op

    Benchmark             (t)  Mode  Samples         Mean  Mean error  Units
    o.s.JavaBench.test      1  avgt       15        0.002       0.000  us/op
    o.s.JavaBench.test      5  avgt       15        0.254       0.007  us/op
    o.s.JavaBench.test     10  avgt       15       12.578       0.098  us/op
    o.s.JavaBench.test     15  avgt       15     1628.694      11.282  us/op
    o.s.JavaBench.test     20  avgt       15  1066113.157   11274.385  us/op
Notice that we vary t to see whether the effect is local to a particular value of t. It is not: the effect is systematic, with the Java version being about twice as fast.
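To make "about twice as fast" concrete, here is a small sketch that divides the Scala mean by the Java mean for each t. The class name SpeedRatios is made up for illustration; the numbers are copied from the tables above. Note that the t = 1 figures sit at the limit of the printed precision, so that ratio is the least reliable one.

```java
final class SpeedRatios {
    // Values of the t parameter, in the same order as the timings below.
    static final int[] T = {1, 5, 10, 15, 20};

    // Mean times in us/op, copied from the JMH output above.
    static final double[] SCALA = {0.005, 0.489, 23.672, 3406.492, 2483221.694};
    static final double[] JAVA  = {0.002, 0.254, 12.578, 1628.694, 1066113.157};

    // How many times slower the Scala benchmark is at index idx.
    static double ratio(int idx) {
        return SCALA[idx] / JAVA[idx];
    }

    public static void main(String[] args) {
        for (int k = 0; k < T.length; k++)
            System.out.printf("t=%d: Scala/Java = %.2f%n", T[k], ratio(k));
    }
}
```

The ratio stays in roughly the same band for every t, which is what "systematic" means here: the slowdown does not come from one unlucky parameter value.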
PrintAssembly will shed some light on this. This is the hottest block in the Scala benchmark:
    0x00007fe759199d42: test   %r8d,%r8d
    0x00007fe759199d45: je     0x00007fe759199d76  ;*irem
                                                   ; - org.sample.ScalaBench::isEvenlyDivisible@11 (line 52)
                                                   ; - org.sample.ScalaBench::run@10 (line 45)
    0x00007fe759199d47: mov    %ecx,%eax
    0x00007fe759199d49: cmp    $0x80000000,%eax
    0x00007fe759199d4e: jne    0x00007fe759199d58
    0x00007fe759199d50: xor    %edx,%edx
    0x00007fe759199d52: cmp    $0xffffffffffffffff,%r8d
    0x00007fe759199d56: je     0x00007fe759199d5c
    0x00007fe759199d58: cltd
    0x00007fe759199d59: idiv   %r8d
...and this is the similar block in Java:
    0x00007f4a811848cf: movslq %ebp,%r10
    0x00007f4a811848d2: mov    %ebp,%r9d
    0x00007f4a811848d5: sar    $0x1f,%r9d
    0x00007f4a811848d9: imul   $0x55555556,%r10,%r10
    0x00007f4a811848e0: sar    $0x20,%r10
    0x00007f4a811848e4: mov    %r10d,%r11d
    0x00007f4a811848e7: sub    %r9d,%r11d          ;*irem
                                                   ; - org.sample.JavaBench::isEvenlyDivisible@9 (line 63)
                                                   ; - org.sample.JavaBench::isEvenlyDivisible@19 (line 63)
                                                   ; - org.sample.JavaBench::run@10 (line 54)
Notice how, in the Java version, the compiler employed the trick of translating the integer remainder calculation into a multiplication and a right shift (see Hacker's Delight, Ch. 10, Sect. 19). This is possible when the compiler detects that we compute the remainder against a constant, which suggests that the Java version hit that sweet optimization but the Scala version did not. You can dig into the bytecode disassembly to figure out which quirk in scalac intervened, but the point of this exercise is that surprisingly minute differences in code generation are greatly magnified by benchmarks.
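For readers who have not seen the trick, here is a minimal sketch of it in plain Java for the divisor 3. The constant 0x55555556 in the Java assembly above is the usual magic number for signed division by 3; the class and method names (MagicRemainder, div3, rem3) are made up for illustration, and this mirrors the shape of the generated code rather than reproducing what HotSpot does internally.

```java
final class MagicRemainder {
    // Signed division by 3 without a divide instruction.
    static int div3(int n) {
        // Widen to 64 bits, multiply by the magic constant and keep the
        // high 32 bits of the product (the movslq / imul / sar $0x20 steps).
        int q = (int) ((n * 0x55555556L) >> 32);
        // n >> 31 is -1 for negative n and 0 otherwise, so this adds 1 for
        // negative inputs to round toward zero (the sar $0x1f / sub steps).
        return q - (n >> 31);
    }

    // Remainder derived from the quotient: n - (n / 3) * 3.
    static int rem3(int n) {
        return n - div3(n) * 3;
    }

    public static void main(String[] args) {
        int[] samples = {7, -7, 0, 3, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int n : samples)
            System.out.println(n + " % 3: trick=" + rem3(n) + ", builtin=" + (n % 3));
    }
}
```

The Scala block, by contrast, goes through a general idiv with a register divisor, plus the guard for dividing Integer.MIN_VALUE by -1.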
P.S. So much for @tailrec
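For context, @tailrec only asks scalac to verify that the self-call is in tail position; scalac then compiles such a call into a loop. Below is a rough Java rendering of that loop form next to the recursive original. The class and method names are hypothetical, and the loop is an assumption about the shape of the generated code, not actual scalac output. One plausible reading of the assembly above is that in loop form the divisor i is an ordinary loop variable, whereas inlining the recursive calls lets the JIT see i as a constant at each level; the write-up linked below covers the actual cause.

```java
final class LoopForm {
    // Recursive form, as written in the Java benchmark.
    static boolean isEvenlyDivisibleRecursive(int i, int a, int b) {
        if (i > b) return true;
        else return (a % i == 0) && isEvenlyDivisibleRecursive(i + 1, a, b);
    }

    // Loop form: the tail call becomes "update i and jump back to the top".
    static boolean isEvenlyDivisibleLoop(int i, int a, int b) {
        while (true) {
            if (i > b) return true;
            if (a % i != 0) return false;
            i = i + 1;
        }
    }

    // Same search as run() in the benchmarks, using the loop form.
    static int run(int t) {
        int i = 10;
        while (!isEvenlyDivisibleLoop(2, i, t))
            i += 2;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```

Both forms compute the same answer; only the code shape handed to the JIT differs.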
...
UPDATE: A more thorough explanation of the effect: http://shipilev.net/blog/2014/java-scala-divided-we-fail/