Thursday, November 8, 2012

Funny performance characteristics of DBI tools

These days I'm working on dynamic binary instrumentation (DBI) tools built with DynamoRIO, in particular Dr. Memory.  One of the things people always ask is, "So what's the slowdown?  Are you faster than Valgrind?"  The answer is incredibly complicated, as performance questions usually are.  The easy answer is, "On SPEC?  10x slowdown, and yes, we are faster than Valgrind."  Go to the drmemory.org front page and you can see our pretty graph of SPEC slowdowns and how we are twice as fast as Valgrind.

OK, great!  Unfortunately, it turns out that most apps aren't at all like SPEC.

My team's goal is to find bugs in Chrome, so we want to run Chrome and its tests, not SPEC.  So what's different about Chrome?  Many things, but the biggest difference in one word is: V8.  V8 is the JavaScript engine that gives Chrome much of its performance edge, and it loves to optimize, deoptimize, and generally modify its generated code.  This creates a problem for DBI systems like DynamoRIO and Valgrind because they actually execute instrumented code out of a code cache, and not from the original application PC.  DBI systems need to maintain code cache consistency.

Valgrind doesn't actually try to solve this problem.  It requires the application to annotate all of its code modifications before re-executing the modified code.  Search for "VALGRIND_DISCARD_TRANSLATIONS" for more information on how this works.

DynamoRIO was originally a security product designed to run on unmodified Windows applications, so this approach was a non-starter.  Instead, DR uses page protections and code sandboxing to detect modification.  Sandboxing is where we insert extra code that checks, before every instruction, that the code we're about to execute is unmodified.  When we use page protections, we mark all pages that are readable, writable, and executable as read-only.  When the app writes to its code, we catch the page fault, flush the cached copy of that code, and mark the page writable.
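
To make the page-protection half concrete, here is a minimal sketch of the mechanism on Linux.  It is not DynamoRIO's code: watched_page, flush_count, and count_flushes_on_write are names made up for illustration, the "code" is just an anonymous data page that never gets executed, and "flushing" is only a counter.  It also calls mprotect from a signal handler, which works on Linux but is not guaranteed by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state: one page standing in for the app's generated code. */
static unsigned char *watched_page;
static size_t page_size;
/* Stands in for "flush the cached translations of this page". */
static volatile sig_atomic_t flush_count;

static void on_write_fault(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    unsigned char *addr = (unsigned char *)info->si_addr;
    if (addr >= watched_page && addr < watched_page + page_size) {
        flush_count++;
        /* Let the write through; the faulting store is retried on return. */
        mprotect(watched_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    _exit(1); /* A fault we did not cause on purpose. */
}

/* Protects a page, writes to it twice, and reports how many "flushes"
 * the writes triggered.  Returns -1 if setup fails. */
int count_flushes_on_write(void) {
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    watched_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (watched_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* The tool's move: make the page read-only so writes become visible. */
    if (mprotect(watched_page, page_size, PROT_READ) != 0)
        return -1;

    flush_count = 0;
    volatile unsigned char *code = watched_page;
    code[0] = 0xc3; /* First write faults; the handler unprotects the page. */
    code[1] = 0x90; /* Second write goes straight through. */
    assert(code[0] == 0xc3 && code[1] == 0x90);
    return (int)flush_count;
}
```

On Linux I would expect count_flushes_on_write() to report a single flush: only the first write pays for a fault, and everything after it is free until the page is protected again.  A JIT that keeps alternating between writing and running its code pays for a fault and a flush on every round trip.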

In theory, with those two techniques we are able to provide perfect application transparency.  However, they come at a very high performance cost.  I'm currently running a V8 test case that takes 0.1 seconds to execute natively.  The version running under DynamoRIO has been running for 50 minutes while I've been writing this blog post, and it's actually making progress based on the output.  That gives us a slowdown of at least 30,000x (3,000 seconds against 0.1 seconds), and the test hasn't finished yet!

Generally speaking, our slowdown isn't this bad.  This particular test case is stress testing the optimizer.  But it demonstrates how hard it is to answer the question of performance.

Still, there's a lot of room for improvement here.  In particular, we are considering integrating parts of the Valgrind approach where we get help from the app to maintain code cache consistency, but I don't want to give up on the dream of a perfectly transparent DBI system yet.  Our rough idea for how to do this is to have the app tell us which areas of virtual memory it will maintain consistency for, and for the rest of the memory, we'll use our normal cache consistency protocol.  This naturally handles two JITs in the same process, one which is cooperating with us, and one which isn't.
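
As a rough sketch of the bookkeeping this would need, assuming the app hands over address ranges one at a time: the tool keeps a small table of app-managed regions and consults it before deciding whether to apply its own consistency protocol.  None of this is existing DynamoRIO API; register_app_managed_region, app_manages_consistency, and the fixed-size table are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16 /* arbitrary limit for the sketch */

struct region {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
};

static struct region app_managed[MAX_REGIONS];
static size_t num_app_managed;

/* Hypothetical call a cooperating JIT would make for each code region
 * it promises to keep consistent itself. */
bool register_app_managed_region(uintptr_t start, size_t size) {
    assert(num_app_managed <= MAX_REGIONS);
    if (size == 0 || num_app_managed == MAX_REGIONS)
        return false;
    if (start > UINTPTR_MAX - size)
        return false; /* range would wrap around the address space */
    app_managed[num_app_managed].start = start;
    app_managed[num_app_managed].end = start + size;
    num_app_managed++;
    return true;
}

/* True if the app has taken responsibility for this address.  For
 * everything else the tool falls back to page protection or sandboxing. */
bool app_manages_consistency(uintptr_t addr) {
    for (size_t i = 0; i < num_app_managed; i++) {
        if (addr >= app_managed[i].start && addr < app_managed[i].end)
            return true;
    }
    return false;
}
```

With a table like this, a cooperating JIT registers its code space once, and a second, uncooperative JIT in the same process never shows up in the table, so its pages keep getting the normal treatment.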

Hopefully I'll write another blog post when we get this stuff implemented.

Monday, July 16, 2012

The environment is a command line with a standard format

This isn't particularly deep, but I was playing with _start and totally static ELF exes yesterday, and I had this minor realization about the environment.  It lives completely in userspace, and it's basically just a command line with a standard format.

People have varying opinions about environment variables.  Because of the way your shell handles them, they're usually implicit and easily forgotten.  People rarely specify them manually, and when they have to, they grumble.  But, as far as the kernel is concerned, they're just another null-terminated array of strings to pass to userspace when you exec a new process.  In fact, the execve prototype is:

int execve(const char *fname, char *const argv[], char *const envp[]);

There's really nothing special about envp.  You can pass argv there if you like, but if you exec a regular exe that uses glibc, it probably won't understand you.
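
Here is a small sketch of what that looks like in practice, assuming a system that has /usr/bin/env.  The helper run_env_with is a made-up name: it forks, execs env with a hand-built envp and nothing else, and captures what the child prints.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /usr/bin/env with exactly the environment in envp and copies its
 * stdout into out (NUL-terminated, truncated to cap - 1 bytes).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_env_with(char *const envp[], char *out, size_t cap) {
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;

    assert(out != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then exec with our envp. */
        char *const argv[] = { "env", NULL };
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execve("/usr/bin/env", argv, envp);
        _exit(127); /* only reached if execve failed */
    }

    /* Parent: read until the child closes its end of the pipe. */
    close(fds[1]);
    while (used < cap - 1 &&
           (n = read(fds[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
```

Passing an envp of { "GREETING=hello", NULL } should make env print just that one line.  The child sees no PATH and no HOME, because nothing is inherited unless the caller puts it in the array.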

The environment has a standard, agreed-upon format, which is not true for argv.  Imagine if we used environment variables for most arguments.  You'd never have to write another flag parser again.  Different, unrelated components of your program can all read from it without conflicting with each other.
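
The format in question is just NAME=value strings in a NULL-terminated array, so a lookup fits in a few lines.  This is a sketch of roughly what getenv does over the process's own environment; lookup_env is a made-up name and takes the array explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the value of name in a NULL-terminated array of
 * "NAME=value" strings, or NULL if there is no such entry. */
const char *lookup_env(char *const envp[], const char *name) {
    size_t len;

    assert(envp != NULL && name != NULL);
    len = strlen(name);
    for (size_t i = 0; envp[i] != NULL; i++) {
        /* Match the whole name, then require '=' right after it so that
         * "VERB" does not match "VERBOSE=1". */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

Any component can call something like lookup_env(envp, "VERBOSE") without knowing what the other components look for, which is the part a hand-written argv parser never gives you for free.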

On the other hand, it's a totally global and flat namespace, kind of like DOM ids and CSS class names.  Everyone's nervous that they're going to trip over each other.  So, people tend to shy away from using it, and every exe has a slightly different flags format.