I'm interested in hooking and I decided to see if I could hook some functions. I wasn't interested in using a library like detours because I want to have the experience of doing it on my own. With some sources I found on the internet, I was able to create the code below. It's basic, but it works alright. However when hooking functions that are called by multiple threads it proves to be extremely unstable. If two calls are made at nearly the same time, it'll crash. After some research I think I need to create a trampoline function. After looking for hours all I was not able to find anything other that a general description on what a trampoline was. I could not find anything specifically about writing a trampoline function, or how they really worked. If any one could help me write one, post some sources, or at least point me in the right direction by recommending some articles, sites, books, etc. I would greatly appreciate it.
Below is the code I've written. It's really basic but I hope others might learn from it.
test.cpp
#include "stdafx.h"
Hook hook;
typedef int (WINAPI *tMessageBox)(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType);
DWORD hMessageBox(HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType)
{
hook.removeHook();
tMessageBox oMessageBox = (tMessageBox)hook.funcPtr;
int ret =oMessageBox(hWnd, lpText, "Hooked!", uType);
hook.applyHook(&hMessageBox);
return ret;
}
void hookMessageBox()
{
printf("Hooking MessageBox...\n");
if(hook.findFunc("User32.dll", "MessageBoxA"))
{
if(hook.applyHook(&hMessageBox))
{
printf("hook applied! \n\n");
} else printf("hook could not be applied\n");
}
}
hook.cpp
#include "stdafx.h"
bool Hook::findFunc(char* libName, char* funcName)
{
Hook::funcPtr = (void*)GetProcAddress(GetModuleHandleA(libName), funcName);
return (Hook::funcPtr != NULL);
}
bool Hook::removeHook()
{
DWORD dwProtect;
if(VirtualProtect(Hook::funcPtr, 6, PAGE_EXECUTE_READWRITE, &dwProtect))
{
WriteProcessMemory(GetCurrentProcess(), (LPVOID)Hook::funcPtr, Hook::origData, 6, 0);
VirtualProtect(Hook::funcPtr, 6, dwProtect, NULL);
return true;
} else return false;
}
bool Hook::reapplyHook()
{
DWORD dwProtect;
if(VirtualProtect(funcPtr, 6, PAGE_EXECUTE_READWRITE, &dwProtect))
{
WriteProcessMemory(GetCurrentProcess(), (LPVOID)funcPtr, Hook::hookData, 6, 0);
VirtualProtect(funcPtr, 6, dwProtect, NULL);
return true;
} else return false;
}
bool Hook::applyHook(void* hook)
{
return setHookAtAddress(Hook::funcPtr, hook);
}
bool Hook::setHookAtAddress(void* funcPtr, void* hook)
{
Hook::funcPtr = funcPtr;
BYTE jmp[6] = { 0xE9, //jmp
0x00, 0x00, 0x00, 0x00, //address
0xC3 //retn
};
DWORD dwProtect;
if(VirtualProtect(funcPtr, 6, PAGE_EXECUTE_READWRITE, &dwProtect)) // make memory writable
{
ReadProcessMemory(GetCurrentProcess(), (LPVOID)funcPtr, Hook::origData, 6, 0); // save old data
DWORD offset = ((DWORD)hook - (DWORD)funcPtr - 5); //((to)-(from)-5)
memcpy(&jmp[1], &offset, 4); // write address into jmp
memcpy(Hook::hookData, jmp, 6); // save hook data
WriteProcessMemory(GetCurrentProcess(), (LPVOID)funcPtr, jmp, 6, 0); // write jmp
VirtualProtect(funcPtr, 6, dwProtect, NULL); // reprotect
return true;
} else return false;
}
If you want your hook to be safe when called by multiple threads, you don't want to be constantly unhooking and rehooking the original API.
A trampoline is simply a bit of code you generate that replicates the functionality of the first few bytes of the original API (which you overwrote with your jump), then jumps into the API after the bytes you overwrote.
Rather than unhooking the API, calling it and rehooking it you simply call the trampoline.
This is moderately complicated to do on x86 because you need (a fairly minimal) disassembler to find the instruction boundaries. You also need to check that the code you copy into your trampoline doesn't do anything relative to the instruction pointer (like a jmp, branch or call).
This is sufficient to make calls to the hook thread-safe, but you can't create the hook if multiple threads are using the API. For this, you need to hook the function with a two-byte near jump (which can be written atomically). Windows APIs are frequently preceded by a few NOPs (which can be overwritten with a far jump) to provide a target for this near jump.
Doing this on x64 is much more complicated. You can't simply patch the function with a 64-bit far jump (because there isn't one, and instructions to simulate it are often too long). And, depending on what your trampoline does, you may need to add it to the OS's stack unwind information.
I hope this isn't too general.