Self Modifying Code [C++]

Gogeta70 picture Gogeta70 · Apr 26, 2011 · Viewed 8.5k times · Source

I was reading a codebreakers journal article on self-modifying code and there was this code snippet:

void Demo(int (*_printf) (const char *,...))
{ 
      _printf("Hello, OSIX!n"); 
      return; 
} 
int main(int argc, char* argv[]) 
{ 
  char buff[1000]; 
  int (*_printf) (const char *,...); 
  int (*_main) (int, char **); 
  void (*_Demo) (int (*) (const char *,...)); 
  _printf=printf; 
  int func_len = (unsigned int) _main ­- (unsigned int) _Demo; 
  for (int a=0; a<func_len; a++) 
    buff[a] = ((char *) _Demo)[a]; 
  _Demo = (void (*) (int (*) (const char *,...))) &buff[0]; 
  _Demo(_printf); 
  return 0; 
}

This code supposedly executed Demo() on the stack. I understand most of the code, but the part where they assign 'func_len' confuses me. As far as i can tell, they're subtracting one random pointer address from another random pointer address.

Someone care to explain?

Answer

Jonathan Leffler picture Jonathan Leffler · Apr 26, 2011

The code is relying on knowledge of the layout of functions from the compiler - which may not be reliable with other compilers.

The func_len line, once corrected to include the - that was originally missing, determines the length of the function Demo by subtracting the address in _Demo (which is is supposed to contain the start address of Demo()) from the address in _main (which is supposed to contain the start address of main()). This is presumed to be the length of the function Demo, which is then copied byte-wise into the buffer buff. The address of buff is then coerced into a function pointer and the function then called. However, since neither _Demo nor _main is actually initialized, the code is buggy in the extreme. Also, it is not clear that an unsigned int is big enough to hold pointers accurately; the cast should probably be to a uintptr_t from <stdint.h> or <inttypes.h>.

This works if the bugs are fixed, if the assumptions about the code layout are correct, if the code is position-independent code, and if there are no protections against executing data space. It is unreliable, non-portable and not recommended. But it does illustrate, if it works, that code and data are very similar.

I remember pulling a similar stunt between two processes, copying a function from one program into shared memory, and then having the other program execute that function from shared memory. It was about a quarter of a century ago, but the technique was similar and 'worked' for the machine it was tried on. I've never needed to use the technique since, thank goodness!