D programming without the garbage collector

AbstractDissonance · Nov 26, 2012 · Viewed 8.3k times

I've been looking at D today and on the surface it looks quite amazing. I like how it includes many higher-level constructs directly in the language so silly hacks or terse methods don't have to be used. One thing that really worries me is the GC. I know this is a big issue and have read many discussions about it.

My own simple tests, sprouted from a question here, show that the GC is extremely slow: over 10 times slower than straight C++ doing the same thing. (Obviously the test does not directly translate to the real world, but the performance hit is extreme and would slow down real-world programs that behave similarly, i.e. allocate many small objects quickly.)

I'm looking into writing a real-time, low-latency audio application, and it is possible that the GC will hurt performance enough to make the application nearly useless. In a sense, if it causes any hiccups it will ruin the real-time audio aspect, which is much more crucial since, unlike graphics, audio runs at a much higher frame rate (44000+ vs 30-60). (Due to its low latency it is more demanding than a standard audio player, which can buffer significant amounts of data.)

Disabling the GC improved the results to within about 20% of the C++ code. This is significant. I'll give the code at the end for analysis.

My questions are:

  1. How difficult is it to replace D's GC with a standard smart-pointer implementation so that libraries that rely on the GC can still be used? If I remove the GC completely I'll lose a lot of grunt work, as D already has limited libraries compared to C++.
  2. Does GC.disable only halt garbage collection temporarily (preventing the GC thread from running), and does GC.enable pick back up where it left off? If so, I could potentially disable the GC during high-CPU-usage moments to prevent latency issues.
  3. Is there any way to enforce a pattern of not using the GC consistently? (This is because I've not programmed in D before, and when I start writing my classes that do not use the GC I would like to be sure I don't forget to implement their own cleanup.)
  4. Is it possible to replace the GC in D easily? (not that I want to but it might be fun to play around with different methods of GC one day... this is similar to 1 I suppose)

What I'd like to do is trade memory for speed. I do not need the GC to run every few seconds. In fact, if I can properly implement my own memory management for my data structures, then chances are it will not need to run very often at all. I might need to run it only when memory becomes scarce. From what I've read, though, the longer you wait to call it the slower it will be. Since there generally will be times in my application where I can get away with calling it without issues, this will help alleviate some of the pressure (but then again, there might be hours when I won't be able to call it).
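
Roughly, this is the pattern I have in mind (just a sketch using core.memory; processAudioBlock is a placeholder name and I haven't verified this is the intended usage):

import core.memory;

void processAudioBlock() { /* latency-sensitive work */ }

void main()
{
    GC.disable();               // no automatic collections from here on
                                // (the runtime may still collect if it runs out of memory)

    foreach (i; 0 .. 1000)
        processAudioBlock();    // real-time section: no GC pauses expected

    // A quiet moment: collect explicitly, then re-enable automatic collection.
    GC.collect();
    GC.enable();                // must be called once per GC.disable()
}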

I am not worried about memory constraints as much. I'd prefer to "waste" memory over speed(up to a point, of course). First and foremost is the latency issues.

From what I've read, I can, at the very least, go the route of C/C++ as long as I don't use any libraries or language constructs that rely on the GC. The problem is, I do not know which ones do. I've seen string, new, etc. mentioned, but does that mean I can't use the built-in strings if I don't enable the GC?

I've read in some bug reports that the GC might be really buggy and that could explain its performance problems?

Also, D uses a bit more memory; in fact, D runs out of memory before the C++ program does. I guess it is about 15% more or so in this case. I suppose that is due to the GC.

I realize the following code is not representative of your average program, but what it shows is that when programs instantiate a lot of objects (say, at startup) they will be much slower (10 times is a large factor). If the GC could be "paused" at startup then it wouldn't necessarily be an issue.

What would really be nice is if I could somehow have the compiler automatically GC a local object if I do not specifically deallocate it. This would almost give the best of both worlds.

e.g.,

{
    Foo f = new Foo();
    ....
    dispose f; // Causes f to be disposed of immediately and treats f outside the GC
               // If left out then f is passed to the GC.
               // I suppose this might actually end up creating two kinds of Foo 
               // behind the scenes. 

    Foo g = new manualGC!Foo();   // Maybe something like this will keep GC's hands off 
                                  // g and allow it to be manually disposed of.
}

In fact, it might be nice to be able to associate different types of GCs with different types of data, with each GC being completely self-contained. This way I could tailor the performance of the GC to my types.
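
On that note, std.typecons.scoped looks close to the manualGC!Foo idea: the instance is placed on the stack and its destructor runs deterministically at the end of the scope, without the GC being involved. A sketch (useFoo is just a made-up name and I haven't measured any of this):

import std.typecons : scoped;

class Foo
{
    int x;
    this(int x_) { this.x = x_; }
}

void useFoo()
{
    auto f = scoped!Foo(42);    // instance lives on the stack, not on the GC heap
    f.x += 1;                   // usable like a normal Foo reference
}                               // destructor runs here, deterministically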

Code:

module main;
import std.stdio, std.conv, core.memory;
import core.stdc.time;

class Foo{
    int x;
    this(int _x){x=_x;}
}

void main(string[] args)
{

    clock_t start, end;
    double cpu_time_used;


    //GC.disable();
    start = clock();

    //int n = to!int(args[1]);
    int n = 10000000;
    Foo[] m = new Foo[n];

    foreach(i; 0..n)
    //for(int i = 0; i<n; i++)
    {
        m[i] = new Foo(i);
    }

    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    writeln(cpu_time_used);
    getchar();
}

C++ code

#include <cstdlib>
#include <iostream>
#include <time.h>
#include <math.h>
#include <stdio.h>

using namespace std;
class Foo{
public:
    int x;
    Foo(int _x);

};

Foo::Foo(int _x){
    x = _x;
}

int main(int argc, char** argv) {

    int n = 120000000;
    clock_t start, end;
    double cpu_time_used;




    start = clock();

    Foo** gx = new Foo*[n];
    for(int i=0;i<n;i++){
        gx[i] = new Foo(i);
    }


    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    cout << cpu_time_used;

    std::cin.get();
    return 0;
}

Answer

0b1100110 · Nov 27, 2012
  1. D can use pretty much any C library; just declare the functions needed (a minimal extern(C) sketch is given after this list). D can also use C++ libraries, but D does not understand certain C++ constructs. So... D can use almost as many libraries as C++. They just aren't native D libs.

  2. From D's library reference, core.memory:

    static nothrow void disable();
    

    Disables automatic garbage collections performed to minimize the process footprint. Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition. This function is reentrant, but enable must be called once for each call to disable.

    static pure nothrow void free(void* p);
    

    Deallocates the memory referenced by p. If p is null, no action occurs. If p references memory not originally allocated by this garbage collector, or if it points to the interior of a memory block, no action will be taken. The block will not be finalized regardless of whether the FINALIZE attribute is set. If finalization is desired, use delete instead.

    static pure nothrow void* malloc(size_t sz, uint ba = 0);
    

    Requests an aligned block of managed memory from the garbage collector. This memory may be deleted at will with a call to free, or it may be discarded and cleaned up automatically during a collection run. If allocation fails, this function will call onOutOfMemory which is expected to throw an OutOfMemoryError.

    So yes. Read more here: http://dlang.org/garbage.html

    And here: http://dlang.org/memory.html

    If you really need classes, look at this: http://dlang.org/memory.html#newdelete delete has been deprecated, but I believe you can still free() it.

  3. Don't use classes, use structs. Structs are stack-allocated; classes are heap-allocated. Unless you need polymorphism or other things classes support, they are overhead for what you are doing. You can use malloc and free if you want to.

  4. More or less... fill out the function definitions here: https://github.com/D-Programming-Language/druntime/blob/master/src/gcstub/gc.d . There's a GC proxy system set up to allow you to customize the GC. So it's not like it is something that the designers do not want you to do.
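
Regarding point 1: calling into a C library is mostly a matter of declaring the functions with extern(C) and linking against the library. A minimal sketch (these declarations mirror libc's malloc/free, which you would normally just get from core.stdc.stdlib; cHeapExample is a made-up name):

extern (C) void* malloc(size_t size);   // declaration only; the symbol comes from the C library
extern (C) void  free(void* ptr);

void cHeapExample()
{
    int* p = cast(int*) malloc(int.sizeof * 16);   // C heap: never scanned or freed by the GC
    if (p is null) return;
    p[0] = 42;
    free(p);                                        // manual deallocation
}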

A little GC knowledge here: the garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage collected objects, those references may no longer be valid. This means that destructors cannot reference sub-objects. This rule does not apply to auto objects or objects deleted with the DeleteExpression, as the destructor is not being run by the garbage collector, meaning all references are valid.

import std.c.stdlib; that should have malloc and free.

import core.memory; this has GC.malloc, GC.free, GC.addRoot, GC.addRange (to let the GC know about external memory), etc.
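
One caveat with malloc'ed memory: if it holds references to GC-managed objects, register the block with GC.addRange so the collector can see them, and unregister it before freeing. A rough sketch (the Node struct is made up for illustration):

import core.memory : GC;
import core.stdc.stdlib : malloc, free;

struct Node { Object payload; }          // holds a reference into the GC heap

void example()
{
    auto n = cast(Node*) malloc(Node.sizeof);
    if (n is null) return;
    *n = Node.init;                      // clear the uninitialized C memory
    GC.addRange(n, Node.sizeof);         // let the GC scan this C-heap block
    n.payload = new Object();            // now safe: the GC sees this reference

    // ... use n ...

    GC.removeRange(n);                   // unregister before freeing
    free(n);
}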

strings require the GC because they are dynamic arrays of immutable chars ( immutable(char)[] ). Dynamic arrays require the GC; static arrays do not.
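
To make the static vs. dynamic difference concrete (small sketch; stringsExample is a made-up name):

void stringsExample()
{
    char[32] buf;              // static array: lives on the stack, no GC involvement
    buf[0 .. 5] = "hello";     // element-wise copy into the stack buffer

    string a = "hello";
    string b = a ~ " world";   // ~ allocates a new dynamic array on the GC heap
}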

If you want manual management, go ahead.

import std.c.stdlib;
import core.memory;

char* one = cast(char*) GC.malloc(char.sizeof * 8);
GC.free(one); // pardon me, I'm not used to manual memory management.

Why create a wrapper class for an int? You are doing nothing more than slowing things down and wasting memory.

class Foo { int n; this(int _n){ n = _n; } }
writeln(Foo.sizeof);  // 8 bytes, btw (the size of the class reference; the instance on the heap is larger still)
writeln(int.sizeof);  // It's *half* the size of Foo: 4 bytes.


Foo[] m; // = new Foo[n];   // with new Foo[n]: 8 sec
m.length = n;               // with .length: 7 sec; a minor optimization, at least on my machine
foreach(i; 0..n)
    m[i] = new Foo(i);


int[] m;
m.length = n;               // default-initialized to 0
foreach(i; 0..n)
    m[i] = i;               // 0.145 sec
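
And following up on the struct suggestion in point 3, a struct version of the same loop avoids the per-object heap allocation entirely (a sketch; FooS is a made-up name, same n as above, timings not measured):

struct FooS { int x; }      // value type: elements live inline in the array

FooS[] m;
m.length = n;               // one allocation for the whole array
foreach(i; 0..n)
    m[i].x = i;             // no per-element new, so no GC pressure per object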

If you really need to, then write the time-sensitive function in C and call it from D. Heck, if time is really that big of a deal, use D's inline assembly to optimize everything.