Friday, March 18, 2011

massdiff - Diff for Massif Snapshots

Massif (part of Valgrind) is a super-useful tool to find out how a program allocates memory. It gives very detailed graphs about exactly which lines of code do allocations that are not free'd later. For example,

96.42% (22,771,482B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->39.75% (9,388,032B) 0x8058AC1: arena_malloc_large (jemalloc.c:3831)
| ->39.75% (9,388,032B) 0x8058DB9: arena_malloc (jemalloc.c:3856)
|   ->34.74% (8,204,288B) 0x8058E98: imalloc (jemalloc.c:3866)
|   | ->34.27% (8,093,696B) 0x805D833: malloc (jemalloc.c:5882)
|   | | ->12.23% (2,887,680B) 0x6BBB63F: sqlite3MemMalloc (sqlite3.c:14221)
  ...

This shows a snapshot at a particular point in time, during which the whole program allocated ~22MB of memory, of which almost 3MB was due to SQLite.

One limitation though is you can't easily see what changes from one snapshot to another, which is important in order to see if there are any gradual memory increases (these may or may not be actual leaks). So I wrote a small python script, massdiff, which diffs two Massif snapshots. Here is an example of the output:


Diffing snapshots 30 50

- (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,286,738
    [ diff: +502,088 ]
+ (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,788,826

-   0x8058927: arena_malloc_small (jemalloc.c:3794) -  7,494,594
      [ diff: +499,552 ]
+   0x8058927: arena_malloc_small (jemalloc.c:3794) -  7,994,146

-     0x8058D9D: arena_malloc (jemalloc.c:3854) -  7,494,594
        [ diff: +499,552 ]
+     0x8058D9D: arena_malloc (jemalloc.c:3854) -  7,994,146

-       0x8058E98: imalloc (jemalloc.c:3866) -  6,055,702
          [ diff: +499,232 ]
+       0x8058E98: imalloc (jemalloc.c:3866) -  6,554,934

-         0x805D833: malloc (jemalloc.c:5882) -  5,552,542
            [ diff: +499,232 ]
+         0x805D833: malloc (jemalloc.c:5882) -  6,051,774

-           0x52D2FA3: js_malloc (jsutil.h:213) -  1,635,252
              [ diff: +432,064 ]
+           0x52D2FA3: js_malloc (jsutil.h:213) -  2,067,316

-             0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) -  1,635,252
                [ diff: +432,064 ]
+             0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) -  2,067,316

-               0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) -  1,600,532
                  [ diff: +431,904 ]
+               0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) -  2,032,436

-                 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) -    161,456
                    [ diff: +344,832 ]
+                 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) -    506,288

-                   0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) -    161,456
                      [ diff: +344,832 ]
+                   0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) -    506,288


The diff only shows what changed between the two revisions, and shows that in in tree format, just like Massif's ms_print does. Each group of three lines shows a line from the first snapshot on top, the corresponding line from the later snapshot on bottom, and between them the difference.

The output here is from loading about:blank 500 times in Fennec. There is overall ~500K of additional allocation (so, ~1K per page load), of which JSObject::growSlots is responsible for ~340K (this is later all deallocated at once, presumably due to GC being run).

So far this has been useful in helping discover one specific case, bug 641663, and I'm still investigating some additional issues. Hopefully it can be a useful tool for other people too. To use it,
  • Run Massif on your program, see here and here
  • Run ms_print on the output
  • Run massdiff on that file, with the two snapshot numbers you want to diff as parameters

8 comments:

  1. The script doesn't work when you take 100 or more snapshot as it matches one or more spaces at the beginning of the snapshot title - instead it should match zero or more spaces.

    i.e.

    # Snapshot title
    - m = re.match(' +([..regex elided...]')
    + m = re.match(' *([..regex elided...]')
    if m:
    snapshot = Snapshot()
    for i in ['n', 'time', 'total', 'useful_heap', 'extra_heap', 'stacks']:

    ReplyDelete
  2. Your script is interesting, but doesn't work when valgrind collects more than 100 snapshots. The problem is on line 42 - when you attempt to match the snapshot title it should have zero or more spaces at the beginning, not one or more.

    ReplyDelete
  3. Very useful script.
    My use case is slightly different: my program loops 100 times over a memory consuming api and I want the details. For that, I use the macro VALGRIND_MONITOR_COMMAND("detailed_snapshot") at the beginning of the api, and it generates 100 snapshot files.
    In the same idea, I want to compare 2 snapshot files. Inspired from your script, here is mine. Thanks for the idea.

    ReplyDelete
  4. '''
    Parse 2 massif snapshot files
    ============================

    Usage: this_script.py SNAPSHOT_1 SNAPSHOT 2

    where

    SNAPSHOT_1 and 2 are massif snapshots (not a ms_print dump)
    typically generated by the macro in the code VALGRIND_MONITOR_COMMAND("detailed_snapshot")
    Here is a sample of such a file:

    >>>>>>>>>>>>>>
    desc: --time-unit=ms --threshold=0.0
    cmd: /my_prog.sh
    time_unit: ms
    #-----------
    snapshot=0
    #-----------
    time=4078176
    mem_heap_B=765203759
    mem_heap_extra_B=58898945
    mem_stacks_B=0
    heap_tree=detailed
    n29988: 765203759 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    n187: 230733840 0x1A303B27: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.390.77)
    n3: 121525532 0x1B7CCA75: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n2: 123451053 0x1B7B7B6B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n1: 123451050 0x1B89A702: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n1: 123451050 0x1B89C079: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n2: 123451050 0x1B4FFE67: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n2: 116592620 0x1B503A2B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    n2: 109734190 0x1B4D387A: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
    >>>>>>>>>>>>>>

    As you can see, there is only one snapshot for the whole file, some header
    lines, followed by n-lines describing the allocations in a tree-like structure
    where indentation is relevant to the call stacks. The first n-line is the
    allocated memory for the whole snapshot.
    '''

    ReplyDelete
  5. import os, sys, re


    # define custom parameters

    # depth of the tree, 0 means only global memory
    max_depth = 1
    print 'maximum depth =', max_depth
    # tolerance in octets for comparison, 1000=1ko
    diff_threshold = 1000000
    print 'tolerance =', diff_threshold, 'bytes'


    # Read files

    class Snapshot: pass
    class SnapshotLine:
    def __str__(self):
    return ('%s%s - %d - %10s' % (' '*self.indent, self.addr, self.mem, self.text))[:130]

    snapshots = []

    def read_file(file):
    print 'reading file', file, '...'
    snapshot = Snapshot()
    snapshot.file = file
    snapshot.lines = []
    snapshots.append(snapshot)

    started = False
    title_read = False
    above_max_depth_ws='^' + ' '*(max_depth+1)
    for line in open(file, 'r').readlines():
    #print line
    line = line.replace('\n', '')
    if 'snapshot=0' in line:
    started = True
    continue
    if not started: continue
    if '#----' in line: continue

    # Snapshot title lines
    if not title_read:
    found = False
    for i in ['time', 'mem_heap_B', 'mem_heap_extra_B', 'mem_stacks_B']:
    expr = '^' + i + '=(?P[\d]+)'
    m = re.match(expr, line)
    if m:
    setattr(snapshot, i, int(m.group('value')))
    found = True
    break
    if found: continue;
    title_read = True

    # Snapshot detail line
    m = re.match(above_max_depth_ws, line)
    if m: continue;

    m = re.match('(?P[ ]*)n(?P[\d]+): (?P[\d]+) (?P0x[0-9A-F]*): (?P.*)', line)
    if not m:
    m = re.match('(?P[ ]*)n(?P[\d]+): (?P[\d]+) (?P.*)', line)
    if m:
    snapshot_line = SnapshotLine()
    snapshot_line.indent = len(m.group('indent'))
    snapshot_line.n = int(m.group('n'))
    snapshot_line.mem = int(m.group('mem'))
    try:
    snapshot_line.addr = m.group('addr')
    except:
    snapshot_line.addr = 0
    snapshot_line.text = m.group('text')
    #print snapshot_line.__dict__
    if snapshot_line.indent <= max_depth:
    snapshot.lines.append(snapshot_line)

    read_file(sys.argv[1])
    read_file(sys.argv[2])


    # Generate tree structure

    print 'generating tree structure...'
    for snapshot in snapshots:
    snapshot.roots = {}
    for i in range(snapshot.lines.__len__()):
    line = snapshot.lines[i]
    line.children = {}
    indent = line.indent
    if indent == 0: snapshot.root = line
    elif indent == 1: snapshot.roots[line.addr] = line
    else:
    # Find parent
    j = i-1
    while snapshot.lines[j].indent != indent-1: j -= 1
    snapshot.lines[j].children[line.text] = line

    #print snapshot.file, snapshot.roots


    # Dump tree

    def dump_lines(lines):
    def dump_line(line):
    print line
    for child in line.children.values():
    dump_line(child)
    for line in lines:
    dump_line(line)

    def dump_tree():
    print 'Tree:'
    for snapshot in snapshots:
    print snapshot.file, snapshot.root, snapshot.roots
    dump_lines(snapshot.roots.values())

    #dump_tree()

    ReplyDelete
  6. # Diff two snapshots

    print 'diffing...'
    def commify(x):
    sign = '+' if x >= 0 else '-'
    ret = list(str(abs(x))[::-1])
    for i in range(len(ret)):
    if i % 3 == 2 and i != len(ret)-1:
    ret[i] += ','
    return sign+(''.join(ret))[::-1]

    def diff_dicts(d1, d2):
    keys = list(set(d1.keys() + d2.keys()))
    data = [[key, 0] for key in keys]

    for datum in data:
    key = datum[0]
    if key not in d2:
    datum[1] = -d1[key].mem
    elif key not in d1:
    datum[1] = d2[key].mem
    else:
    datum[1] = d2[key].mem - d1[key].mem

    data.sort(lambda x, y: y[1]-x[1])

    for datum in data:
    key = datum[0]
    diff = datum[1]
    if key not in d2:
    print "-", d1[key]
    elif key not in d1:
    print "+", d2[key]
    else:
    if abs(diff) > diff_threshold:
    print "-/+", d1[key]
    print '[diff: %s]' % commify(diff)
    diff_dicts(d1[key].children, d2[key].children)

    print '-', snapshots[0].file, snapshots[0].root
    print '[diff: %s]' % commify(snapshots[1].root.mem - snapshots[0].root.mem)
    print '+', snapshots[1].file, snapshots[1].root
    print '-------------'
    diff_dicts(snapshots[0].roots, snapshots[1].roots)

    ReplyDelete
  7. Sorry I had to make several posts because the full script is too large.

    ReplyDelete