Demonstrations of perf-stat-hist, the Linux perf_events version.


Tracing the net:net_dev_xmit tracepoint, and building a power-of-4 histogram
for the "len" variable, for 10 seconds:

# ./perf-stat-hist net:net_dev_xmit len 10
Tracing net:net_dev_xmit, power-of-4, max 1048576, for 10 seconds...

            Range          : Count    Distribution
            0              : 0        |                                      |
            1 -> 3         : 0        |                                      |
            4 -> 15        : 0        |                                      |
           16 -> 63        : 2        |#                                     |
           64 -> 255       : 30       |###                                   |
          256 -> 1023      : 3        |#                                     |
         1024 -> 4095      : 446      |######################################|
         4096 -> 16383     : 0        |                                      |
        16384 -> 65535     : 0        |                                      |
        65536 -> 262143    : 0        |                                      |
       262144 -> 1048575   : 0        |                                      |
      1048576 ->           : 0        |                                      |

This showed that most of the network transmits were between 1024 and 4095 bytes,
with a handful between 64 and 255 bytes.

Cat the format file for the tracepoint to see what other variables are available
to trace. Eg:

# cat /sys/kernel/debug/tracing/events/net/net_dev_xmit/format 
name: net_dev_xmit
ID: 1078
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:void * skbaddr;	offset:8;	size:8;	signed:0;
	field:unsigned int len;	offset:16;	size:4;	signed:0;
	field:int rc;	offset:20;	size:4;	signed:1;
	field:__data_loc char[] name;	offset:24;	size:4;	signed:1;

print fmt: "dev=%s skbaddr=%p len=%u rc=%d", __get_str(name), REC->skbaddr, REC->len, REC->rc

That's where "len" came from.

This works by creating a series of tracepoint and filter pairs for each
histogram bucket, and doing in-kernel counts. The overhead should in many cases
be better than user space post-processing, however, this approach is still
not ideal. I've called it a "perf hacktogram". The overhead is relative to
the frequency of events, multiplied by the number of buckets. You can modify
the script to use power-of-2 instead, or whatever you like, but the overhead
for more buckets will be higher.


Histogram of the returned read() syscall sizes:

# ./perf-stat-hist syscalls:sys_exit_read ret 10
Tracing syscalls:sys_exit_read, power-of-4, max 1048576, for 10 seconds...

            Range          : Count    Distribution
            0              : 90       |#                                     |
            1 -> 3         : 9587     |######################################|
            4 -> 15        : 69       |#                                     |
           16 -> 63        : 590      |###                                   |
           64 -> 255       : 250      |#                                     |
          256 -> 1023      : 389      |##                                    |
         1024 -> 4095      : 296      |##                                    |
         4096 -> 16383     : 183      |#                                     |
        16384 -> 65535     : 12       |#                                     |
        65536 -> 262143    : 0        |                                      |
       262144 -> 1048575   : 0        |                                      |
      1048576 ->           : 0        |                                      |

Most of our read()s were tiny, between 1 and 3 bytes.


Using power-of-2, and a max of 1024:

# ./perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, power-of-2, max 1024, until Ctrl-C...
^C
            Range          : Count    Distribution
              -> -1        : 29       |##                                    |
            0 -> 0         : 1        |#                                     |
            1 -> 1         : 959      |######################################|
            2 -> 3         : 1        |#                                     |
            4 -> 7         : 0        |                                      |
            8 -> 15        : 2        |#                                     |
           16 -> 31        : 14       |#                                     |
           32 -> 63        : 1        |#                                     |
           64 -> 127       : 0        |                                      |
          128 -> 255       : 0        |                                      |
          256 -> 511       : 0        |                                      |
          512 -> 1023      : 1        |#                                     |
         1024 ->           : 1        |#                                     |


Specifying custom bucket sizes:

# ./perf-stat-hist -b "10 50 100 5000" syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C
            Range          : Count    Distribution
              -> 9         : 989      |######################################|
           10 -> 49        : 5        |#                                     |
           50 -> 99        : 0        |                                      |
          100 -> 4999      : 2        |#                                     |
         5000 ->           : 0        |                                      |


Specifying a single value to bifurcate statistics:

# ./perf-stat-hist -b 10 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C
            Range          : Count    Distribution
              -> 9         : 2959     |######################################|
           10 ->           : 7        |#                                     |

This has the lowest overhead for collection, since only two tracepoint 
filter pairs are used.


Use -h to print the USAGE message:

# ./perf-stat-hist -h
USAGE: perf-stat-hist [-h] [-b buckets|-P power] [-m max] tracepoint
                      variable [seconds]
                 -b buckets      # specify histogram bucket points
                 -P power        # power-of (default is 4)
                 -m max          # max value for power-of
                 -h              # this usage message
   eg,
       perf-stat-hist syscalls:sys_enter_read count 5
                 # read() request histogram, 5 seconds
       perf-stat-hist syscalls:sys_exit_read ret 5
                 # read() return histogram, 5 seconds
       perf-stat-hist -P 10 syscalls:sys_exit_read ret 5
                 # ... use power-of-10
       perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret 5
                 # ... use power-of-2, max 1024
       perf-stat-hist -b "10 50 100 500" syscalls:sys_exit_read ret 5
                 # ... histogram based on these bucket ranges
       perf-stat-hist -b 10 syscalls:sys_exit_read ret 5
                 # ... bifurcate by the value 10 (lowest overhead)

See the man page and example file for more info.
