I went through answers on similar topics here on SO but couldn't find a satisfying answer. Since I know this is a rather large topic, I will try to be more specific.
I want to write a program that processes files. The processing is nontrivial, so the best approach is to split the different phases into standalone modules which are then used as necessary (sometimes I will only be interested in the output of module A, sometimes I will need the output of five other modules, etc.). The modules need to cooperate, because the output of one might be the input of another. And it needs to be FAST. Moreover, I want to avoid doing any processing more than once: if module A creates data that then needs to be processed by modules B and C, I don't want to run module A twice just to produce the input for B and C.
The information the modules need to share would mostly be blocks of binary data and/or offsets into the processed files. The task of the main program would be quite simple: just parse the arguments and run the required modules (and perhaps produce some output, or should that be the task of the modules?).
I don't need the modules to be loaded at runtime. It's perfectly fine to have libraries with a .h file and to recompile the program every time a new module is added or an existing one is updated. The point of the modules is mainly code readability, maintainability, and letting several people work on different modules without needing some predefined interface or whatever (on the other hand, some "guidelines" on how to write the modules would probably be required, I know that). We can assume that the file processing is a read-only operation; the original file is not changed.
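To make this more concrete, here is a rough sketch of the kind of data the modules would share and how repeated work could be avoided (all names here are hypothetical, just to illustrate the idea):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// What one module produces and another consumes:
// blocks of binary data and/or offsets into the processed file.
struct ModuleResult
{
    std::vector<unsigned char> bytes;
    std::vector<std::size_t> offsets;
};

// Results cached by module name, so that module A runs only once
// even if both module B and module C need its output.
struct ProcessingContext
{
    std::string inputFile;                        // read-only input
    std::map<std::string, ModuleResult> results;  // keyed by module name
};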
Could someone point me in a good direction on how to do this in C++? Any advice is welcome (links, tutorials, PDF books...).
This looks very similar to a plugin architecture. I recommend starting with an (informal) data-flow chart to identify:
With this information you can start to build generic interfaces that allow one module to bind to another at runtime. Then I would add a factory function to each module to request the actual processing object from it. I don't recommend getting the processing objects directly out of the module interface; instead, return a factory object from which the processing objects can be retrieved. These processing objects are then used to build the entire processing chain.
An oversimplified outline would look like this:
struct Processor
{
    // does one step of the processing chain
    virtual void doSomething(Data) = 0;
    virtual ~Processor() = default;
};

struct Module
{
    virtual std::string name() = 0;
    // factory: hand out a processing object of the requested kind
    virtual Processor* getProcessor(WhichDoIWant) = 0;
    // the module that created a processor also disposes of it
    virtual void deleteProcessor(Processor*) = 0;
    virtual ~Module() = default;
};
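Used together, building a small chain could then look roughly like this (a hedged sketch only; WhichDoIWant and Data are the placeholder types from the outline above, and the concrete signatures depend on your modules):

// Hypothetical wiring of two modules, where A's output feeds B.
void runChain(Module& moduleA, Module& moduleB,
              WhichDoIWant kindA, WhichDoIWant kindB, Data input)
{
    Processor* a = moduleA.getProcessor(kindA);
    Processor* b = moduleB.getProcessor(kindB);

    a->doSomething(input);   // in a real chain, A's result becomes B's input
    b->doSomething(input);

    moduleA.deleteProcessor(a);  // the module that created a processor
    moduleB.deleteProcessor(b);  // is also responsible for deleting it
}

Keeping creation and deletion inside the module avoids mismatched allocation and deallocation if the modules ever end up in separate libraries.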
Off the top of my head, these patterns are likely to appear: