I recently worked on a garbage collector that, in order to function properly, needs to be aware of all of the application’s threads. Unlike a managed environment, where the virtual machine knows about every thread, this collector runs in a systems context where new threads can be started from anywhere, including from third-party libraries linked into the program.
Consequently, I had no choice but to intercept pthread_create and add logic that registers the new thread with the garbage collector (GC). Getting this to work in a satisfactory manner was a lengthy quest, so I would like to share the end result here. Furthermore, the solution is not specific to pthread_create and can be applied to intercept most system library calls, such as malloc and free.
The functions we aim to intercept are implemented in shared objects supplied by the system, such as libpthread.so or libc.so. Since our program does not itself implement the function we intend to intercept, we can provide an implementation, and it will take precedence over the system's because the dynamic linker finds our definition first. This is fantastic news, but a significant problem remains: we still need to forward the call to the system's pthread_create so that a new thread is actually created.
To achieve this, we manually resolve the pthread_create function using dlsym. To make sure the lookup does not simply hand us back our own interceptor, we pass the RTLD_NEXT pseudo-handle, which makes dlsym search only the objects that come after the current one in the search order.
#define _GNU_SOURCE  /* glibc only declares RTLD_NEXT and RTLD_DEFAULT with this */
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>

typedef int (*pthread_create_type)(pthread_t* thread,
                                   const pthread_attr_t* attr,
                                   void* (*start_routine)(void*), void* arg);

int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) {
  printf("pthread_create intercepted!\n");
  /* Find the next definition in the search order, i.e. the system's. */
  pthread_create_type real_pthread_create =
      (pthread_create_type) dlsym(RTLD_NEXT, "pthread_create");
  return real_pthread_create(thread, attr, start_routine, arg);
}
This solution still has a couple of problems. First, it resolves the symbol on every call, which can be costly. We could cache the result and check the cache before resolving, but we can do better. We are making an indirect call no matter what, so we might as well take advantage of it: the call goes through a trampoline function pointer that initially points to a resolver, and the resolver overwrites that pointer with the real function once it has found it. While we are at it, we can add some error handling for the case where the real pthread_create fails to resolve.
#define _GNU_SOURCE  /* glibc only declares RTLD_NEXT with this */
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*pthread_create_type)(pthread_t* thread,
                                   const pthread_attr_t* attr,
                                   void* (*start_routine)(void*), void* arg);

/* Declared early so the trampoline can start out pointing at the resolver. */
static int resolve_pthread_create(pthread_t* thread,
                                  const pthread_attr_t* attr,
                                  void* (*start_routine)(void*), void* arg);

/* Every call goes through this pointer: the resolver first, the real function afterwards. */
static pthread_create_type pthread_create_trampoline = resolve_pthread_create;

static int resolve_pthread_create(pthread_t* thread,
                                  const pthread_attr_t* attr,
                                  void* (*start_routine)(void*), void* arg) {
  pthread_create_type real_pthread_create =
      (pthread_create_type) dlsym(RTLD_NEXT, "pthread_create");
  if (!real_pthread_create) {
    fprintf(stderr, "Failed to locate pthread_create!\n");
    exit(1);
  }
  /* Concurrent first calls may each resolve, but they all store the same pointer. */
  pthread_create_trampoline = real_pthread_create;
  return real_pthread_create(thread, attr, start_routine, arg);
}

int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) {
  printf("pthread_create intercepted!\n");
  return pthread_create_trampoline(thread, attr, start_routine, arg);
}
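For comparison, the cache-and-check alternative mentioned earlier might look like the sketch below. It is not the approach this post settles on. The names cached_pthread_create, lookup_count, return_arg and spawn_and_join are made up for illustration, and the code assumes glibc on Linux (hence _GNU_SOURCE). Every call pays for a null check before the indirect call, which is the small cost the trampoline avoids.

```c
#define _GNU_SOURCE  /* assumption: glibc, which needs this for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*pthread_create_type)(pthread_t* thread,
                                   const pthread_attr_t* attr,
                                   void* (*start_routine)(void*), void* arg);

/* Hypothetical cache: NULL until the first successful lookup. */
static pthread_create_type cached_pthread_create = NULL;

/* Illustration only: counts dlsym calls (not synchronized between threads). */
static int lookup_count = 0;

int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) {
  pthread_create_type real_pthread_create = cached_pthread_create;
  if (!real_pthread_create) {  /* this check runs on every call */
    real_pthread_create =
        (pthread_create_type) dlsym(RTLD_NEXT, "pthread_create");
    lookup_count++;
    if (!real_pthread_create) {
      fprintf(stderr, "Failed to locate pthread_create!\n");
      exit(1);
    }
    cached_pthread_create = real_pthread_create;
  }
  return real_pthread_create(thread, attr, start_routine, arg);
}

/* Illustration only: a thread body that hands its argument back. */
static void* return_arg(void* arg) { return arg; }

/* Illustration only: starts one thread through the interceptor and joins it.
   Returns 0 on success and -1 on failure. */
int spawn_and_join(void) {
  pthread_t thread;
  void* result = NULL;
  int value = 0;
  if (pthread_create(&thread, NULL, return_arg, &value) != 0) return -1;
  if (pthread_join(thread, &result) != 0) return -1;
  assert(result == &value);
  return 0;
}
```

Calling spawn_and_join twice should trigger only one dlsym lookup, since the second call finds the cached pointer.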
The trampoline version is good, and we could stop there. However, there is one more major problem: sanitizers. Sanitizers such as asan, tsan, ubsan and so on rely on a similar technique to intercept system library calls. If we stopped there, we would be unable to use them, which would be a serious limitation. Thankfully, they also provide a way to work around this: we can resolve __interceptor_function_we_want_to_intercept instead of the function itself. Detailed explanations can be found in the interception library’s source.
Introducing this two-stage resolution, we have something that does pretty much everything we want.
#define _GNU_SOURCE  /* glibc only declares RTLD_NEXT and RTLD_DEFAULT with this */
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*pthread_create_type)(pthread_t* thread,
                                   const pthread_attr_t* attr,
                                   void* (*start_routine)(void*), void* arg);

static int resolve_pthread_create(pthread_t* thread,
                                  const pthread_attr_t* attr,
                                  void* (*start_routine)(void*), void* arg);

static pthread_create_type pthread_create_trampoline = resolve_pthread_create;

static int resolve_pthread_create(pthread_t* thread,
                                  const pthread_attr_t* attr,
                                  void* (*start_routine)(void*), void* arg) {
  /* Stage 1: if a sanitizer is present, forward to its interceptor. */
  pthread_create_type real_pthread_create = (pthread_create_type)
      dlsym(RTLD_DEFAULT, "__interceptor_pthread_create");
  if (!real_pthread_create) {
    /* Stage 2: no sanitizer, so fall back to the next pthread_create. */
    real_pthread_create =
        (pthread_create_type) dlsym(RTLD_NEXT, "pthread_create");
  }
  if (!real_pthread_create) {
    fprintf(stderr, "Failed to locate pthread_create!\n");
    exit(1);
  }
  pthread_create_trampoline = real_pthread_create;
  return real_pthread_create(thread, attr, start_routine, arg);
}

int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) {
  printf("pthread_create intercepted!\n");
  return pthread_create_trampoline(thread, attr, start_routine, arg);
}
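Since the same two-stage lookup applies to other functions as well, it can be factored into a helper. The sketch below is one possible way to do that, assuming glibc and the __interceptor_ prefix described above. resolve_real_function is a hypothetical name, not part of any library.

```c
#define _GNU_SOURCE  /* assumption: glibc, needed for RTLD_DEFAULT and RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the address an interceptor for `name` should
 * forward to, or NULL if nothing was found. RTLD_NEXT is relative to the
 * object that calls dlsym, so this helper has to live in the same object as
 * the interceptors that use it.
 */
void* resolve_real_function(const char* name) {
  char interceptor_name[256];
  int length = snprintf(interceptor_name, sizeof(interceptor_name),
                        "__interceptor_%s", name);
  assert(length > 0);

  /* Stage 1: prefer a sanitizer's interceptor, if the name fits the buffer
     and such a symbol is loaded. */
  if ((size_t) length < sizeof(interceptor_name)) {
    void* interceptor = dlsym(RTLD_DEFAULT, interceptor_name);
    if (interceptor) {
      return interceptor;
    }
  }

  /* Stage 2: the next definition after this object, normally the system's. */
  return dlsym(RTLD_NEXT, name);
}
```

A resolver such as resolve_pthread_create could then reduce its lookup to a single call, casting the result of resolve_real_function("pthread_create") to pthread_create_type and keeping its own error handling for the NULL case.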
While this snippet of code is rather short and easy to understand, most of this is undocumented and it took a fair amount of research to figure it out.
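The interceptors above only print a message. If registration has to run on the new thread itself (for example to record its stack bounds), one option is to wrap start_routine. The sketch below assumes hypothetical hooks gc_register_thread and gc_unregister_thread, which here only update counters. To stay short, it resolves the real function with a plain RTLD_NEXT lookup on every call instead of the trampoline; return_arg and run_threads exist only to try it out.

```c
#define _GNU_SOURCE  /* assumption: glibc, needed for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical GC hooks; a real collector would record the thread here. */
static atomic_int registered_threads;   /* currently registered */
static atomic_int total_registrations;  /* registered since startup */

static void gc_register_thread(void) {
  atomic_fetch_add(&registered_threads, 1);
  atomic_fetch_add(&total_registrations, 1);
}

static void gc_unregister_thread(void* unused) {
  (void) unused;
  atomic_fetch_sub(&registered_threads, 1);
}

typedef int (*pthread_create_type)(pthread_t*, const pthread_attr_t*,
                                   void* (*)(void*), void*);

/* Carries the caller's entry point and argument over to the new thread. */
struct thread_start {
  void* (*start_routine)(void*);
  void* arg;
};

/* Runs on the new thread: register, call the user's function, unregister. */
static void* thread_entry(void* raw) {
  struct thread_start start = *(struct thread_start*) raw;
  void* result = NULL;
  free(raw);
  gc_register_thread();
  /* The handler also runs if the thread calls pthread_exit or is cancelled. */
  pthread_cleanup_push(gc_unregister_thread, NULL);
  result = start.start_routine(start.arg);
  pthread_cleanup_pop(1);
  return result;
}

int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg) {
  pthread_create_type real_pthread_create =
      (pthread_create_type) dlsym(RTLD_NEXT, "pthread_create");
  struct thread_start* start = malloc(sizeof(*start));
  if (!real_pthread_create || !start) {
    free(start);
    return EAGAIN;
  }
  start->start_routine = start_routine;
  start->arg = arg;
  int status = real_pthread_create(thread, attr, thread_entry, start);
  if (status != 0) {
    free(start);  /* the thread never started, so we still own the struct */
  }
  return status;
}

/* Illustration only: a thread body that hands its argument back. */
static void* return_arg(void* arg) { return arg; }

/* Illustration only: runs `count` threads one after another and returns
   the total number of registrations seen so far. */
int run_threads(int count) {
  for (int i = 0; i < count; i++) {
    pthread_t thread;
    void* result = NULL;
    int status = pthread_create(&thread, NULL, return_arg, &i);
    assert(status == 0);
    status = pthread_join(thread, &result);
    assert(status == 0 && result == &i);
  }
  return atomic_load(&total_registrations);
}
```

The start arguments are heap-allocated because pthread_create may return before the new thread has read them, so a stack-allocated struct could already be gone by then.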
If this is interesting or useful to you, let me know.