Profiling Linux Kernel Modules

Code profiling is a useful technique for finding sections of code that may be written inefficiently.  Profiling a userspace application is well documented, but I had to piece together the process for profiling a Linux kernel module.  Here is how you do it:

  1. Make sure the oprofile package is installed on your system.  On Arch Linux:
    $ pacman -Ss oprofile
    community/oprofile 1.0.0-1 [installed]
        System-wide profiler for Linux systems
  2. Set the necessary kernel .config options to enable profiling support:
  3. Compile, install, and boot into the new kernel.  Oprofile needs the uncompressed vmlinux image from the root of your kernel source directory to attribute samples to kernel functions, so make sure the machine you are testing on has access to this file:
    $ ls linux/
    vmlinux*
  4. Tell operf to start collecting data on the entire system.  The --vmlinux option points operf to the vmlinux file from the previous step:
    $ sudo operf --system-wide --vmlinux=/boot/vmlinux-3.19.0-rc5-ARCH+

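    operf will then collect data for both kernel and user level code.  You can narrow down what you are interested in using the --events option.  An event is specified as name:count:unitmask:kernel:user, so ending it with 1:0 turns kernel-mode counting on and user-mode counting off.  If you only want kernel statistics: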

    $ sudo operf --system-wide --vmlinux=/boot/vmlinux-3.19.0-rc5-ARCH+ --events CPU_CLK_UNHALTED:900000:0:1:0
  5. Run your test and then stop operf by pressing Ctrl-C in its terminal (or by sending it SIGINT from another one).
  6. Use the opreport tool to view profiling results.  The --image-path option tells opreport where to look for modules:
    $ sudo opreport --image-path=/usr/lib/modules/3.19.0-rc5-ARCH+ -l
    samples  %       image name                app name  symbol name
    3184     2.5018  vmlinux-3.19.0-rc5-ARCH+  0:1H      memcpy
    2025     1.5912  vmlinux-3.19.0-rc5-ARCH+  1:1H      memcpy
    1426     1.1205  vmlinux-3.19.0-rc5-ARCH+  fsx       copy_user_enhanced_fast_string
    1264     0.9932  nfs.ko                    0:2       nfs_page_group_sync_on_bit
    1140     0.8958  vmlinux-3.19.0-rc5-ARCH+  fsx       _raw_spin_lock
    871      0.6844  vmlinux-3.19.0-rc5-ARCH+  0:1H      _raw_spin_lock
    800      0.6286  virtio_ring.ko            0:1H      virtqueue_get_buf
    714      0.5610  nfs.ko                    0:2       nfs_page_group_lock
    686      0.5390  vmlinux-3.19.0-rc5-ARCH+  fsx       get_page_from_freelist
    677      0.5320  vmlinux-3.19.0-rc5-ARCH+  0:1H      pvclock_clocksource_read
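
Since the point of the exercise is the module rather than the whole kernel, it can help to cut the report down to rows whose image name ends in .ko.  The following is only a minimal sketch of one way to do that: the module_rows helper is hypothetical (it is not part of oprofile), and it assumes the column layout shown above, where the image name is the third whitespace-separated field.

```shell
#!/bin/sh
# module_rows: hypothetical helper, not part of oprofile.
# Reads `opreport -l` output on stdin and keeps the column header plus
# every row whose third field (the image name) ends in ".ko".
# Assumption: the report uses the layout shown above
# (samples, %, image name, app name, symbol name).
module_rows() {
    awk '$1 == "samples" || $3 ~ /\.ko$/'
}
```

With the function defined in your shell, pipe the report through it:

    $ sudo opreport --image-path=/usr/lib/modules/3.19.0-rc5-ARCH+ -l | module_rows

Run against the report above, this would keep only the header and the nfs.ko and virtio_ring.ko rows.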