Simple Bug Finding Tools: Fugue (I)

2012, Aug 05    

It’s been a while since I started writing, as a personal ‘research’ project, a tool that performs static code analysis to automatically find bugs (ones that could lead to vulnerabilities). Even though it will take a long while before I have something decent to release to the general public, I have some (I hope interesting) thoughts about the tool I’m writing: Fugue.

This tool uses Clang as the parser (as I do not have a rich uncle to pay for an EDG license), and everything else is being written in Python: the translator that converts the Clang AST to my internal representation, the translators that convert ASTs generated by other tools to that same representation, the CFG builder, the SSA code generator, etc. It’s in the very early stages at the moment but, more or less, it already works for writing very simple scanners, as in the following example.
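The post doesn’t show Fugue’s CFG or SSA code. As a rough illustration of what a CFG offers over a plain AST walk, here is a minimal sketch of a forward “maybe uninitialized” data flow analysis. Everything in it is an assumption, not Fugue’s design: basic blocks are hypothetical ordered lists of ("def", var) / ("use", var) statements, edges are a plain dict of successor lists, and the function names are made up for illustration.

```python
# Hypothetical encoding (not Fugue's): a basic block is an ordered list of
# ("def", var) / ("use", var) statements; succs maps a block id to its successors.

def uninit_after(stmts, uninit):
    """Transfer function: an assignment removes a variable from the set."""
    uninit = set(uninit)
    for op, var in stmts:
        if op == "def":
            uninit.discard(var)
    return uninit

def maybe_uninitialized(blocks, succs, entry, declared):
    """Forward 'may' analysis: a variable is possibly uninitialized at a block's
    entry if it is possibly uninitialized along at least one incoming path."""
    state_in = {b: set() for b in blocks}
    state_in[entry] = set(declared)  # no local is initialized at function entry
    worklist = [entry]
    while worklist:
        b = worklist.pop()
        out = uninit_after(blocks[b], state_in[b])
        for s in succs.get(b, []):
            merged = state_in[s] | out  # union at join points
            if merged != state_in[s]:
                state_in[s] = merged
                worklist.append(s)
    # Second pass: report each read of a variable that may still be uninitialized
    reports = []
    for b, stmts in blocks.items():
        uninit = set(state_in[b])
        for op, var in stmts:
            if op == "def":
                uninit.discard(var)
            elif op == "use" and var in uninit:
                reports.append((b, var))
    return reports

# int error; if (cond) error = 0; return error;
blocks = {
    "entry": [("use", "cond")],
    "then": [("def", "error")],
    "ret": [("use", "error")],
}
succs = {"entry": ["then", "ret"], "then": ["ret"]}
print(maybe_uninitialized(blocks, succs, "entry", {"error"}))  # [('ret', 'error')]
```

Only the `then` path assigns `error`, so the read in `ret` is still reported. A checker that simply forgets a variable as soon as it is referenced anywhere, like the AST one in the next section, would stay silent on this function.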

AST based checkers and ‘one’ bug from freetype2

In my opinion, traversing the AST is the wrong approach for writing checkers that look for anything beyond coding-style issues, because the AST has no notion of code paths: it only describes the syntax. However, in some cases it’s more than enough for writing very quick checkers without having to perform complex data flow analysis. Take, for example, the following bug from freetype2:

diff --git a/src/base/ftstroke.c b/src/base/ftstroke.c
index 5399efe..f488e03 100644
--- a/src/base/ftstroke.c
+++ b/src/base/ftstroke.c
@@ -789,7 +789,6 @@
   FT_Stroker_New( FT_Library   library,
                   FT_Stroker  *astroker )
   {
-    FT_Error    error;
     FT_Memory   memory;
     FT_Stroker  stroker = NULL;
 
@@ -809,7 +808,7 @@
 
     *astroker = stroker;
 
-    return error;
+    return FT_Err_Ok;
   }

The developer declares a variable that is never initialized and returns its value at the end of the function. It’s such a basic error that even a checker that merely traverses the AST can easily catch it. The following is the interesting part of the simple checker I wrote to catch this bug:

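def checker_step(self, element):
  # Remember each local variable declared without an initializer
  if isinstance(element, asti.VarDecl):
    if element.init is None:
      self.locals[element.name] = element
    return

  # Naively treat any other reference to a remembered variable as an initialization
  name = getattr(element, "name", None)
  if name in self.locals:
    del self.locals[name]
    return

  # Report "return var;" when var was never referenced after its declaration
  if isinstance(element, asti.ReturnStmt):
    if len(element.children) == 1:
      var = getattr(element.children[0], "name", None)
      if var in self.locals:
        self.report_bug("returning uninitialized variable %s" % repr(var), element.position)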

When the checker runs, it catches variable declarations that have no initializer and remembers the variable’s name. If that variable is then referenced anywhere in the function being analysed, the checker forgets it (it assumes the variable was initialized, even though that may not be the case). Finally, if the tool finds a return statement with exactly one child expression and that expression is a variable it still remembers, it reports a bug. The checker doesn’t take into account the possible code paths the program can follow; I wrote it simply as a test of the framework I’m working on.
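The checker above depends on Fugue’s `asti` module, which isn’t public. To make its behaviour easy to try, here is a self-contained sketch with hypothetical stand-in node classes (`Node`, `VarDecl`, `DeclRef`, `ReturnStmt`, all assumptions that only mirror the fields the checker reads) and an assumed pre-order walker (`walk`, also hypothetical). The second example shows the path-insensitivity mentioned above: a conditional assignment makes the checker forget `error`, so nothing is reported.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for Fugue's asti classes; only the fields the
# checker reads (name, init, children, position) are modelled.
@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    position: int = 0

@dataclass
class VarDecl(Node):
    name: str = ""
    init: Optional[Node] = None

@dataclass
class DeclRef(Node):
    name: str = ""

@dataclass
class ReturnStmt(Node):
    pass

class UninitReturnChecker:
    def __init__(self):
        self.locals = {}  # name -> VarDecl still considered uninitialized
        self.bugs = []

    def report_bug(self, msg, position):
        self.bugs.append((position, msg))

    def checker_step(self, element):
        if isinstance(element, VarDecl):
            if element.init is None:
                self.locals[element.name] = element
            return
        name = getattr(element, "name", None)
        if name in self.locals:  # any reference counts as "initialized"
            del self.locals[name]
            return
        if isinstance(element, ReturnStmt) and len(element.children) == 1:
            var = getattr(element.children[0], "name", None)
            if var in self.locals:
                self.report_bug("returning uninitialized variable %s" % repr(var), element.position)

def walk(node, checker):
    # Pre-order: the ReturnStmt is visited before the DeclRef it returns;
    # a post-order walk would forget the variable before the check runs.
    checker.checker_step(node)
    for child in node.children:
        walk(child, checker)

def run(body):
    checker = UninitReturnChecker()
    walk(body, checker)
    return checker.bugs

# FT_Stroker_New-like body: `error` is declared, never assigned, then returned
buggy = Node(children=[
    VarDecl(name="error", position=1),
    VarDecl(name="stroker", init=Node(), position=2),
    ReturnStmt(children=[DeclRef(name="error")], position=3),
])
# `if (cond) error = 0;` modelled as a generic node referencing `error`
missed = Node(children=[
    VarDecl(name="error", position=1),
    Node(children=[DeclRef(name="cond"), DeclRef(name="error")], position=2),
    ReturnStmt(children=[DeclRef(name="error")], position=3),
])
print(run(buggy))   # [(3, "returning uninitialized variable 'error'")]
print(run(missed))  # [] : a path-dependent bug the checker misses
```

The positions are made-up numbers. Note that the sketch only works because the walk is pre-order; that traversal order is an assumption about how the framework feeds nodes to `checker_step`.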

Well, I was surprised that such a basic checker actually finds a large number of real bugs matching this very same pattern in a widely used code base like freetype2. When I wrote the checker to test my framework, I expected to catch only this specific occurrence of the error in freetype’s code base (the one in FT_Stroker_New); however, many more bugs like it appeared in the same code base:

$ cd freetype2 && autofugue
(...)
BUG:./src/smooth/ftgrays.c:1998:gray_raster_new: returning uninitialized variable 'error'
BUG:./src/cache/ftccmap.c:174:ftc_cmap_node_new: returning uninitialized variable 'error'
BUG:./src/cache/ftcsbits.c:60:ftc_sbit_copy_bitmap: returning uninitialized variable 'error'
BUG:./src/bzip2/ftbzip2.c:241:ft_bzip2_file_reset: returning uninitialized variable 'error'
BUG:./src/gzip/ftgzip.c:351:ft_gzip_file_reset: returning uninitialized variable 'error'
BUG:./src/lzw/ftlzw.c:166:ft_lzw_file_reset: returning uninitialized variable 'error'
BUG:./src/pshinter/pshglob.c:702:psh_globals_new: returning uninitialized variable 'error'
BUG:./src/base/ftutil.c:487:FT_QRealloc: returning uninitialized variable 'error'
BUG:./src/base/ftutil.c:459:FT_QAlloc: returning uninitialized variable 'error'
BUG:./src/base/ftutil.c:446:FT_Alloc: returning uninitialized variable 'error'
BUG:./src/base/ftutil.c:473:FT_Realloc: returning uninitialized variable 'error'
BUG:./src/base/ftstroke.c:812:FT_Stroker_New: returning uninitialized variable 'error'
BUG:./src/base/ftobjs.c:308:ft_glyphslot_alloc_bitmap: returning uninitialized variable 'error'
BUG:./src/base/ftobjs.c:4429:FT_New_Library: returning uninitialized variable 'error'
BUG:./src/base/ftobjs.c:1422:ft_lookup_PS_in_sfnt_stream: returning uninitialized variable 'error'
(...)

I stripped many more reports of this very same bug, found with a dumb-ass AST-based checker that I wrote in about 5 minutes. Why didn’t the freetype2 developers check whether the bug they fixed also appeared in other parts of their code base? I don’t know. Perhaps it’s because few static analysis tools are available to the general public, and grep’ing plus manual analysis is a tedious task.
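Going through dozens of reports like the ones above by hand is tedious too. As a small triage aid, here is a minimal sketch that parses the reports and counts them per file. It assumes every report line has the `BUG:path:line:function: message` shape shown in the output above and skips everything else (such as the `(...)` markers); the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed report shape, taken from the output above:
#   BUG:<path>:<line>:<function>: <message>
REPORT_RE = re.compile(
    r"^BUG:(?P<path>[^:]+):(?P<line>\d+):(?P<func>[^:]+): (?P<msg>.*)$")

def parse_reports(lines):
    """Return (path, line, function, message) tuples, skipping non-report lines."""
    reports = []
    for raw in lines:
        m = REPORT_RE.match(raw.strip())
        if m:
            reports.append((m["path"], int(m["line"]), m["func"], m["msg"]))
    return reports

def count_by_file(reports):
    """Number of reports per source file, to decide where to look first."""
    return Counter(path for path, _line, _func, _msg in reports)

sample = """\
(...)
BUG:./src/base/ftutil.c:487:FT_QRealloc: returning uninitialized variable 'error'
BUG:./src/base/ftutil.c:459:FT_QAlloc: returning uninitialized variable 'error'
BUG:./src/base/ftstroke.c:812:FT_Stroker_New: returning uninitialized variable 'error'
(...)
""".splitlines()

reports = parse_reports(sample)
for path, n in count_by_file(reports).most_common():
    print(path, n)
```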

Conclusions

When I started this project I doubted whether it would be worth it: it’s a lot of work, I started practically from scratch, and I didn’t know if, after all, it would find real bugs in real code bases. However, I noticed that every time I write a new checker (even a very basic one) I always find bugs in real code bases/binaries. That gives me the motivation I need to continue with this project 😉

Well, that’s all for now! I’ll try to write, from time to time, more about the framework I’m working on. Bye!