Code profiling is a useful technique for finding sections of code that might be written inefficiently. Profiling a userspace application is well documented, but I had to piece together the process for profiling a Linux kernel module. Here is how you do it:
- Make sure the oprofile package is installed on your system. For Arch Linux:
$ pacman -Ss oprofile
community/oprofile 1.0.0-1 [installed]
    System-wide profiler for Linux systems
- Set the necessary kernel .config options to enable profiling support:
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
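Before rebuilding, it can be worth confirming that both options really ended up in the configuration. This is only a minimal sketch, assuming the config is a plain-text file such as the .config in your kernel source directory; the function name check_profiling_config is made up for illustration:

```shell
# Hypothetical helper: report whether a kernel config file enables the
# two options listed above. Takes the path to a plain-text config file.
check_profiling_config() {
    config_file="$1"
    rc=0
    for opt in CONFIG_PROFILING CONFIG_OPROFILE; do
        # Accept =y or =m; in practice only CONFIG_OPROFILE can be a module.
        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling it as `check_profiling_config linux/.config` prints one line per option and returns non-zero if either one is missing, so it can also gate a build script.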
- Compile, install, and boot into the new kernel. OProfile uses the uncompressed vmlinux image from the root of your kernel source directory to attribute kernel samples to the right symbols, so make sure the machine you are testing on has access to this file:
$ ls linux/
arch/  block/  ...  System.map  vmlinux*
- Tell oprofile to start collecting data on the entire system. The --vmlinux option points operf to the vmlinux file from the previous step:
$ sudo operf --system-wide --vmlinux=/boot/vmlinux-3.19.0-rc5-ARCH+
The operf daemon will then collect data for both kernel-level and user-level code. You can narrow down what you are interested in with the --events option. For example, if you only want kernel statistics:
$ sudo operf --system-wide --vmlinux=/boot/vmlinux-3.19.0-rc5-ARCH+ --events CPU_CLK_UNHALTED:900000:0:1:0
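The colon-separated fields in that event specification follow operf's name:count:unitmask:kernel:user layout, so the trailing 0:1:0 means unit mask 0, kernel samples on, user samples off. Here is a small sketch that builds such a string; kernel_only_event is a hypothetical helper name, and the default count and unit mask are simply the values from the command above (valid events and masks depend on your CPU, and ophelp lists them):

```shell
# Hypothetical helper: build an operf event specification that samples
# kernel code only. Fields are name:count:unitmask:kernel:user.
kernel_only_event() {
    name="$1"
    count="${2:-900000}"   # sampling interval; default copied from the command above
    # unitmask=0, kernel=1 (profile kernel code), user=0 (skip user space)
    printf '%s:%s:0:1:0\n' "$name" "$count"
}

kernel_only_event CPU_CLK_UNHALTED
```

The output can be passed straight to the --events option, for example `--events "$(kernel_only_event CPU_CLK_UNHALTED)"`.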
- Run your test and then stop the operf daemon (Ctrl-C in the terminal where it is running in the foreground).
- Use the opreport tool to view profiling results. The --image-path option is needed to tell opreport where to look for modules:
$ sudo opreport --image-path=/usr/lib/modules/3.19.0-rc5-ARCH+ -l
samples  %        image name                app name  symbol name
3184     2.5018   vmlinux-3.19.0-rc5-ARCH+  0:1H      memcpy
2025     1.5912   vmlinux-3.19.0-rc5-ARCH+  1:1H      memcpy
1426     1.1205   vmlinux-3.19.0-rc5-ARCH+  fsx       copy_user_enhanced_fast_string
1264     0.9932   nfs.ko                    0:2       nfs_page_group_sync_on_bit
1140     0.8958   vmlinux-3.19.0-rc5-ARCH+  fsx       _raw_spin_lock
871      0.6844   vmlinux-3.19.0-rc5-ARCH+  0:1H      _raw_spin_lock
800      0.6286   virtio_ring.ko            0:1H      virtqueue_get_buf
714      0.5610   nfs.ko                    0:2       nfs_page_group_lock
686      0.5390   vmlinux-3.19.0-rc5-ARCH+  fsx       get_page_from_freelist
677      0.5320   vmlinux-3.19.0-rc5-ARCH+  0:1H      pvclock_clocksource_read
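With many symbols in the listing, a per-image total can make it easier to see whether the samples land in the core kernel or in a particular module. This is a rough sketch, assuming the opreport output was saved to a file and has the five-column layout shown above (samples, %, image name, app name, symbol name); samples_per_image is a hypothetical helper name:

```shell
# Hypothetical helper: total the samples column per image name from a
# saved "opreport -l" listing, largest total first.
samples_per_image() {
    # Only lines whose first field is a plain number are sample rows;
    # this skips the column header and any other non-data lines.
    awk '$1 ~ /^[0-9]+$/ { total[$3] += $1 }
         END { for (img in total) print total[img], img }' "$1" | sort -rn
}
```

For example, redirect the report to a file with `sudo opreport --image-path=/usr/lib/modules/3.19.0-rc5-ARCH+ -l > report.txt` and then run `samples_per_image report.txt`.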