Unlocking the secrets of the /proc filesystem


In a previous article, 4 open source tools for Linux system monitoring, I looked at tools like top, htop, and others that allow SysAdmins to look inside the running Linux kernel. But there is also the observer effect, a principle from the scientific discipline of physics which states that “simply observing a situation or phenomenon necessarily changes that phenomenon.” This is also true when measuring computer system performance. The act of using those monitoring tools alters the system’s use of resources, including memory and CPU time.

The performance of Linux hosts is not significantly impacted by the data collection itself. The Linux kernel is designed to always collect and store its performance data, which is then merely accessed and displayed by any and all performance monitoring tools. It is the tools’ work of reading that data, manipulating it, and displaying it in a meaningful format that affects system performance.

Linux has some special filesystems that it creates in RAM at each boot, one of which is especially interesting to SysAdmins. The /proc filesystem is where the kernel stores the performance data of the running kernel and much more. The data is always there and it is easy to access. This is a virtual filesystem that exists only in RAM while the Linux host is running; it doesn’t exist on any physical disk. The /proc filesystem is not persistent like filesystems that are located on storage devices such as hard drives and solid state drives. It disappears, along with other virtual filesystems such as /sys, when the computer is turned off, and is recreated anew each time Linux starts up.
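You can confirm this, assuming a typical Linux system, by checking how /proc is mounted: it appears as a filesystem of type proc with no backing block device, and its files report a size of zero because their contents are generated by the kernel on each read.

```shell
# /proc appears in the mount table as type "proc" with no backing device.
grep ' /proc ' /proc/mounts
# Files in /proc report a size of zero; content is generated on each read.
ls -l /proc/uptime
```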

The /proc filesystem is one with which you will become well acquainted as a SysAdmin so we are going to explore it in some detail.

The /proc filesystem

The /proc filesystem is defined by the Filesystem Hierarchy Standard (FHS) as the location for Linux to store information about the system, the kernel, and all processes running on the host. It is intended to be a place for the kernel to expose information about itself in order to facilitate access to data about the system. It is also designed to provide access to kernel configuration parameters, and to allow many of them to be modified when necessary, so that the SysAdmin can tune the running system without needing to reboot after making changes.

When used as a window into the state of the operating system and its view of the system and hardware, it provides easy access to virtually every bit of information you might want as a SysAdmin.

Exploring /proc

Let’s explore the /proc filesystem. For best results you should do this as root.

Let’s first look at the top-level contents of the /proc filesystem of a running Linux host. You may see color coding to differentiate files from directories.

First, look at the numeric entries. The name of each of these directories is a PID, or process ID number. Each of those PID directories contains information about the running process it represents.

# cd /proc ; ls 
1     11    1353  20    46    61   929            kmsg
10    110   1355  21    47    611  930            kpagecgroup
1003  1101  136   22    4734  618  932            kpagecount
1004  1104  1360  23    4786  63   946            kpageflags
1024  1109  1361  24    48    641  96             latency_stats
1026  111   1362  2470  49    642  991            loadavg
103   114   1367  25    4931  65   992            locks
104   1147  1368  257   4941  66   997            mdstat
1047  115   137   258   4969  67   998            meminfo
1048  1150  1374  27    5     69   acpi           misc
105   1154  1375  28    5027  697  asound         modules
1051  1158  1376  29    5044  698  bootconfig     mounts
1053  116   138   299   5079  7    buddyinfo      mtrr
1059  1167  1382  3     5080  70   bus            net
106   1169  139   30    5095  71   cgroups        pagetypeinfo
1060  117   14    302   51    72   cmdline        partitions
1062  12    140   31    5111  73   consoles       pressure
1063  1206  141   312   5117  737  cpuinfo        schedstat
1065  121   142   3193  5118  74   crypto         scsi
1067  122   1422  33    5132  76   devices        self
1068  1226  1423  34    5138  77   diskstats      slabinfo
107   123   1477  35    5143  78   dma            softirqs
1070  124   1478  36    5148  788  driver         stat
1072  1241  1479  3671  5149  79   dynamic_debug  swaps
1079  1244  15    37    5165  80   execdomains    sys
108   1248  1509  39    5166  81   filesystems    sysrq-trigger
1081  1276  1558  4     5175  818  fs             sysvipc
1082  13    16    40    5176  82   interrupts     thread-self
1083  130   1631  41    5177  822  iomem          timer_list
1084  1324  1641  42    5178  829  ioports        tty
1086  1333  1652  4242  519   832  irq            uptime
1087  1335  17    43    520   86   kallsyms       version
109   1337  18    4446  58    87   kcore          vmallocinfo
1090  1342  19    45    59    926  keys           vmstat
1091  1348  2     4570  6     927  key-users      zoneinfo

Each of the files in the /proc directory contains information about some part of the kernel. Let’s take a look at a couple of these files, cpuinfo and meminfo.

The cpuinfo file is mostly static. It contains the specifications for all CPUs in the processor package. I pipe the output of the cat command through the less pager to make it easier to see the entire data stream.

# cat cpuinfo | less
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz
stepping        : 4
cpu MHz         : 2819.454
cache size      : 22528 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch pti fsgsbase bmi1 avx2 bmi2 invpcid rdseed clflushopt arat md_clear flush_l1d arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi
bogomips        : 5638.90
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz
stepping        : 4
cpu MHz         : 2819.454
cache size      : 22528 KB
physical id     : 0
siblings        : 6
<SNIP>
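One stanza like the ones above appears for each logical CPU, so the file itself can answer quick questions. These commands are a sketch, assuming the field names shown in the listing:

```shell
# Count the logical CPUs; one "processor" stanza appears for each.
grep -c '^processor' /proc/cpuinfo
# List the distinct CPU models installed in this host.
grep '^model name' /proc/cpuinfo | sort -u
```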

The data from the cpuinfo file includes the processor ID and model, its current speed in MHz, and the flags that can be used to determine the CPU features. Now let’s look at memory. First use cat to view the meminfo file, and then use the free command for comparison.

# cat meminfo
MemTotal:        8126608 kB
MemFree:         6964236 kB
MemAvailable:    7543448 kB
Buffers:          111796 kB
Cached:           669104 kB
SwapCached:            0 kB
Active:           418000 kB
Inactive:         516204 kB
Active(anon):     161952 kB
Inactive(anon):        0 kB
Active(file):     256048 kB
Inactive(file):   516204 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8126460 kB
SwapFree:        8126460 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        153304 kB
Mapped:           142788 kB
Shmem:              8648 kB
KReclaimable:      51888 kB
Slab:             113028 kB
SReclaimable:      51888 kB
SUnreclaim:        61140 kB
KernelStack:        4672 kB
PageTables:         7160 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12189764 kB
Committed_AS:     912480 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       53040 kB
VmallocChunk:          0 kB
Percpu:             5024 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      137152 kB
DirectMap2M:     8251392 kB

# free
               total        used        free      shared  buff/cache   available
Mem:         8126608      584160     6963228        8648      832796     7542448
Swap:        8126460           0     8126460

There is a lot of information in the /proc/meminfo file. Some of that data is used by programs like the free command. If you want the complete picture of memory usage, look in /proc/meminfo. The free command, like top, htop, and many other core utilities, gets its data from the /proc filesystem.
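For example, a single field can be extracted from /proc/meminfo with awk; the total that free reports comes from the MemTotal line, so the two numbers should agree. This is a sketch using the field names shown in the listing above:

```shell
# free's "total" column is derived from MemTotal in /proc/meminfo.
awk '/^MemTotal:/ {print $2}' /proc/meminfo
free | awk '/^Mem:/ {print $2}'
```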

Run the cat meminfo command several times in quick succession and you will see that the contents of the /proc/meminfo file change continuously, because the file is updated in real time. You can do this easily with the watch command.

# watch cat meminfo

Because the data in /proc is a real-time picture of the state of the Linux kernel and the computer hardware, the data may change rapidly. Look at the interrupts file several times in a row.
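One way to see the changes, rather than eyeballing successive listings, is to diff two snapshots taken a second apart; watch with its -d option does much the same thing interactively, highlighting the counters that changed between updates.

```shell
# Diff two snapshots of the interrupts file taken one second apart; the
# lines that appear are the interrupt counters that incremented.
diff <(cat /proc/interrupts) <(sleep 1; cat /proc/interrupts) | head
```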

Spend a little time comparing the data in the /proc/meminfo file against the information you get from commands like free and top. Where do you think these utilities and many others get their information? Right here in the /proc filesystem, that’s where.

Exploring systemd

Let’s look a little deeper into PID 1. Like all of the process directories, it contains information about the process with that ID. So let’s look at some of that information. As root, enter the /proc/1 directory and look at its contents. Then view the contents of the cmdline file.

# cd /proc/1 ; cat cmdline
/usr/lib/systemd/systemd--switched-root--system--deserialize=42rhgb

We can see from the contents of cmdline that this is systemd, the mother of all programs. On older versions of Linux, and on some current ones, PID 1 will be the init program.

Create an executable file named cpuHog in /usr/local/bin with permissions of 755. Add the content shown here and save the file.

#!/bin/bash
# This little program is a cpu hog
X=0
while true ; do
    echo $X
    X=$((X+1))
done

Start one instance in a terminal session as a non-root user.

$ cpuHog
0
1
2
3
4
5
6
7
8
9
10
11
12
<SNIP>

As root, use one of the monitoring tools like top to determine the PID of this cpuHog process. I use htop, which shows the cpuHog as in Figure 1. On my VM the PID is 5539, but it will be different for you. Be sure to use the correct PID for the cpuHog on your host.

Figure 1: The htop tool shows the CPU hog. This instance of cpuHog is 5539. Click to enlarge.

Make the directory corresponding to the PID of your cpuHog instance the PWD. Then list the contents.

# cd /proc/5539 ; ll | less
total 0
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 arch_status
dr-xr-xr-x.  2 dboth dboth 0 Sep 27 10:01 attr
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 autogroup
-r--------.  1 dboth dboth 0 Sep 27 10:01 auxv
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 cgroup
--w-------.  1 dboth dboth 0 Sep 27 10:01 clear_refs
-r--r--r--.  1 dboth dboth 0 Sep 27 09:48 cmdline
-rw-r--r--.  1 dboth dboth 0 Sep 27 09:48 comm
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 coredump_filter
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 cpu_resctrl_groups
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 cpuset
lrwxrwxrwx.  1 dboth dboth 0 Sep 27 10:01 cwd -> /home/dboth
-r--------.  1 dboth dboth 0 Sep 27 10:01 environ
lrwxrwxrwx.  1 dboth dboth 0 Sep 27 09:48 exe -> /usr/bin/bash
dr-x------.  2 dboth dboth 4 Sep 27 10:01 fd
dr-xr-xr-x.  2 dboth dboth 0 Sep 27 10:01 fdinfo
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 gid_map
-r--------.  1 dboth dboth 0 Sep 27 10:01 io
-r--------.  1 dboth dboth 0 Sep 27 10:01 ksm_merging_pages
-r--------.  1 dboth dboth 0 Sep 27 10:01 ksm_stat
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 latency
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 limits
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 loginuid
dr-x------.  2 dboth dboth 0 Sep 27 10:01 map_files
-r--r--r--.  1 dboth dboth 0 Sep 27 09:48 maps
-rw-------.  1 dboth dboth 0 Sep 27 10:01 mem
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 mountinfo
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 mounts
-r--------.  1 dboth dboth 0 Sep 27 10:01 mountstats
dr-xr-xr-x. 54 dboth dboth 0 Sep 27 10:01 net
dr-x--x--x.  2 dboth dboth 0 Sep 27 10:01 ns
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 numa_maps
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 oom_adj
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 oom_score
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 oom_score_adj
-r--------.  1 dboth dboth 0 Sep 27 10:01 pagemap
-r--------.  1 dboth dboth 0 Sep 27 10:01 patch_state
-r--------.  1 dboth dboth 0 Sep 27 10:01 personality
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 projid_map
lrwxrwxrwx.  1 dboth dboth 0 Sep 27 10:01 root -> /
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 sched
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 schedstat
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 sessionid
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 setgroups
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 smaps
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 smaps_rollup
-r--------.  1 dboth dboth 0 Sep 27 10:01 stack
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 stat
-r--r--r--.  1 dboth dboth 0 Sep 27 09:48 statm
-r--r--r--.  1 dboth dboth 0 Sep 27 09:48 status
-r--------.  1 dboth dboth 0 Sep 27 10:01 syscall
dr-xr-xr-x.  3 dboth dboth 0 Sep 27 09:48 task
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 timens_offsets
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 timers
-rw-rw-rw-.  1 dboth dboth 0 Sep 27 10:01 timerslack_ns
-rw-r--r--.  1 dboth dboth 0 Sep 27 10:01 uid_map
-r--r--r--.  1 dboth dboth 0 Sep 27 10:01 wchan

Note the entries for cwd and exe. The cwd entry points to the current working directory, aka the PWD, for the process. The exe entry points to the executable file for the process, which is the Bash shell. But look at the content of the cmdline file.

# cat cmdline 
/bin/bash/usr/local/bin/cpuHog

This tells us that the program being run is the cpuHog. It also gives us some insight into the manner in which programs, at least shell scripts, are run in Linux. When a shell script is started, a new shell is launched first, Bash by default unless the script’s shebang line specifies otherwise, and the script itself, cpuHog, is passed to that shell as an argument.
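The reason the elements of cmdline run together is that they are separated by NUL characters rather than spaces. Translating the NULs to spaces makes the individual arguments visible. This sketch uses the shell's own PID, $$, so it works anywhere; substitute the PID of your cpuHog instance to see its argument list.

```shell
# Arguments in cmdline are NUL-separated, which is why cat runs them
# together. Translate the NULs to spaces to see them individually.
# $$ is the current shell; substitute your cpuHog's PID to inspect it.
tr '\0' ' ' < /proc/$$/cmdline ; echo
```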

Open two more terminal sessions and start an instance of cpuHog in each.

If you are not already using top or htop to monitor the ongoing activities on your VM, start an instance of one now. Look at the COMMAND column which, as you can see in Figure 2, shows the three running instances of cpuHog.

Figure 2: The effects of three cpuHogs are easy to visualize. Click to enlarge.

We can see in Figure 2 that the cpuHogs have PIDs of 5539, 5620, and 5621.

Now press F5 to show the process tree which allows us to see the process hierarchy as in Figure 3.

Figure 3: The htop process tree view clarifies the process hierarchy. Click to enlarge.

Once again, this helps us to understand a bit more about how Linux launches command line programs. We can see that in all three cases the interactive shell starts a sub-shell, and the program runs within that sub-shell.

Another tool that allows us to view the process tree is the pstree utility. I’ve removed some portions of the data stream from the pstree command. Scroll through the output and find the cpuHogs. You should check the man page for pstree to discover the meanings of the options used in this command.

# pstree -Acp | less
systemd(1)-+-ModemManager(1158)-+-{ModemManager}(1160)
           |                    |-{ModemManager}(1161)
           |                    `-{ModemManager}(1163)
           |-NetworkManager(1169)-+-{NetworkManager}(1175)
           |                      |-{NetworkManager}(1177)
           |                      `-{NetworkManager}(1180)
           |-VBoxDRMClient(1087)-+-{VBoxDRMClient}(1098)
           |                     |-{VBoxDRMClient}(1099)
           |                     `-{VBoxDRMClient}(1100)
           |-VBoxService(1091)-+-{VBoxService}(1112)
           |                   |-{VBoxService}(1113)
           |                   |-{VBoxService}(1114)
<SNIP>
           |-accounts-daemon(1081)-+-{accounts-daemon}(1123)
           |                       |-{accounts-daemon}(1124)
           |                       `-{accounts-daemon}(1128)
<SNIP>
           |-rtkit-daemon(1072)-+-{rtkit-daemon}(1075)
           |                    `-{rtkit-daemon}(1076)
           |-sshd(1206)-+-sshd(1342)---sshd(1368)---bash(1376)---screen(1477)---screen(1478)-+-bash(1479)---htop(5584)
           |            |                                                                    |-bash(1558)-+-less(5939)
           |            |                                                                    |            `-pstree(5938)
           |            |                                                                    `-bash(5302)
           |            `-sshd(5377)---sshd(5393)---bash(5397)---screen(5514)---screen(5515)-+-bash(5516)---cpuHog(5539)
           |                                                                                 |-bash(5540)---cpuHog(5621)
           |                                                                                 `-bash(5558)---cpuHog(5620)
           |-systemd(1353)---(sd-pam)(1355)
           |-systemd(1335)-+-(sd-pam)(1337)
<SNIP>

Our real purpose here was to discover the PIDs of the cpuHogs in order to explore them in the /proc filesystem, and now we know multiple ways to do that. Pick one of the cpuHogs and, as root, make /proc/<PID> the PWD. I chose PID 5539, but you should use the PID of a cpuHog instance on your host. Then list the contents of the directory.

# cd /proc/5539 ; ls
arch_status         environ            maps           patch_state   statm
attr                exe                mem            personality   status
autogroup           fd                 mountinfo      projid_map    syscall
auxv                fdinfo             mounts         root          task
cgroup              gid_map            mountstats     sched         timens_offsets
clear_refs          io                 net            schedstat     timers
cmdline             ksm_merging_pages  ns             sessionid     timerslack_ns
comm                ksm_stat           numa_maps      setgroups     uid_map
coredump_filter     latency            oom_adj        smaps         wchan
cpu_resctrl_groups  limits             oom_score      smaps_rollup
cpuset              loginuid           oom_score_adj  stack
cwd                 map_files          pagemap        stat

Take some time to explore the content of some of these files and subdirectories. Be sure to view the content of the status, limits, loginuid, and maps files. The maps file is a memory map that lists executable and library locations in virtual memory. The status file contains a great deal of information including some interesting data about virtual memory usage. Also take some time to explore a few of the other files and subdirectories in this and the PID directories for the other cpuHogs and some other processes as well.
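As one sketch of what is in there, the Vm* lines of a status file summarize the process's virtual memory usage, and the limits file shows the resource limits in effect. These commands use $$, the current shell's PID; substitute the PID of one of your cpuHogs to inspect it instead.

```shell
# Virtual memory summary lines from the status file of the current shell.
grep '^Vm' /proc/$$/status
# The resource limits in effect for the process.
head -n 8 /proc/$$/limits
```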

There is a huge amount of information available in the /proc filesystem and it can be used to good advantage to solve problems. But that’s not all.

Kernel performance parameters

The capability to make changes to the running kernel on the fly and without a reboot is a powerful tool. It allows you to make instant changes to the Linux kernel to resolve a problem, enable a function, or tune performance. Let’s look at one example.

Linux is very flexible and can do many interesting things. One of those cool things is that any Linux host with multiple network interface cards (NICs) can act as a router. All it takes is a little knowledge, a simple command, and some changes to the firewall.

Routing is a task managed by the kernel. So turning it on (or off) requires that we change a kernel configuration parameter. Fortunately we do not need to recompile the kernel, and that is one of the benefits of exposing the kernel configuration in the /proc filesystem. We are going to turn on IP forwarding which provides the kernel’s basic routing functionality.

This little command line program makes /proc/sys/net/ipv4 the PWD, prints the current state of the ip_forward file, which should be zero (0), sets it to 1, and then prints its new state, which should be 1. Routing is now turned on. Be sure to enter the command on a single line.

# cd /proc/sys/net/ipv4 ; cat ip_forward ; echo 1 > ip_forward ; cat ip_forward
0
1

Warning: I intentionally chose to modify a kernel parameter that I am familiar with and that won’t cause any harm to your Linux VM. As you explore the /proc filesystem, you should not make any further changes.

Congratulations! You have just altered the configuration of the running kernel.
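Keep in mind that a change made by writing into /proc/sys lasts only until the next boot. The sysctl command, assuming the usual procps version is installed, reads and writes the same entries, with the dots in a parameter name mapping to the slashes in its /proc/sys path, and a file in /etc/sysctl.d/ makes a setting persistent. The file name shown here is only an example.

```shell
# sysctl manipulates the same files that live under /proc/sys;
# net.ipv4.ip_forward maps to /proc/sys/net/ipv4/ip_forward.
sysctl net.ipv4.ip_forward
# As root, "sysctl -w net.ipv4.ip_forward=1" has the same effect as the
# echo command above. To survive a reboot, put "net.ipv4.ip_forward = 1"
# in a file such as /etc/sysctl.d/90-router.conf.
```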

In order to complete the configuration of a Linux host to function fully as a router, additional changes would need to be made to the iptables firewall, or to whatever firewall software you are using, and to the routing table. Those changes define the specifics of the routing, such as which packets get routed where.

While you are here in the /proc filesystem, look around some more; follow your own curiosity to explore different areas of this important filesystem.
