The Linux Philosophy for SysAdmins, Tenet 11 — Store data in open formats

Image by Opensource.com: CC-by-SA 4.0

Author’s note: This article is excerpted in part from chapter 13 of my book, The Linux Philosophy for SysAdmins, with some changes to update the information in it and to better fit this format.

The reason we use computers is to manipulate data. It used to be called “Data Processing” for a reason and that was an accurate description. We still process data although it may be in the form of video and audio streams, network and wireless streams, word processing data, spreadsheets, images, and more. It is all still just data.

We work with and manipulate text data streams with the tools we have available to us in Linux. That data usually needs to be stored and when there is a need to store data, it is always better to store it in open file formats than closed ones.

Closed is impenetrable

Way back before the Registry [1] was introduced with Windows 3.1, most utilities and applications stored their configuration data in .ini files. These .ini files were stored as ASCII text and were easy to access, read, and modify. All it took was a simple text editor to make changes to these .ini configuration files.

The Registry changed all that by storing configuration data in a single, large, and impenetrable binary data file. Although individual programs could still store configuration data in .ini files, the Registry was touted as a way to centralize control over program configuration, and its binary format was allegedly faster to parse than ASCII text files.

The Registry also became the single point of failure for Windows systems. A single mangled entry could make the entire Registry unreadable.

As System Administrators we need to work with many different types of data. Binary formats are by their very nature obscure and require special tools and knowledge to manipulate. There is a plethora of tools available that provide registry viewing and editing capability. These tools range from so-called freeware to expensive commercial programs. The necessity to use special tools that are themselves closed in order to manage a computer is a further step into impenetrability.

Part of the problem with all this is that the writers of these tools need to have information about the contents of registry entries that are being viewed or edited. Without that inside knowledge from the vendors of the proprietary software these tools are also useless. And one reason that proprietary software stores configuration data in a binary and proprietary format is to hide things from users.

This all stems from the closed and proprietary philosophy adhered to by these vendors. It appears on the surface to be about protecting the users from doing “stupid things,” but it is also a good way to obscure information.

Unix was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things.

Doug Gwyn

I did try to locate a binary format Linux system configuration file in /etc but was unable to. Not one of the hundreds of configuration files in that directory was in a binary format. That is a really good thing, but it leaves me without a sample of a binary configuration file that I can use to show you what one is like.

One of the issues with binary formats is that, had Linux relied on them, there would have been no reason to create the many powerful text tools we have. None of the data streams generated from binary format files would be usable by tools like grep, awk, sed, cat, vim, emacs, or any of the hundreds of other text-based tools we take for granted every day while we administer the systems for which we are responsible.

Open is knowable

“Open source” is about the code and making the source code available to any and all who want to view or modify it. “Open data” [2] is about the openness of the data itself.

The term open data does not mean just having access to the data itself; it also means that the data can be viewed, used in some manner, and shared with others. The exact manner in which those goals are achieved may be subject to some sort of attribution and open licensing. As with open source software, such licensing is intended to ensure the continued open availability of the data and not to restrict it in any manner.

Open data is knowable. That means that access to it is unfettered. Truly open data can be read freely and understood without the need for further interpretation or decryption. In the SysAdmin world, open means that the data we use to configure, monitor, and manage our Linux hosts is easy to find, read, and modify when necessary. It is stored in formats that permit that ease of access, such as ASCII text. When a system is open the data and software can all be managed by open tools – tools that work with ASCII text.

Flat ASCII text

Flat ASCII plain text files are open and knowable. They are easy to read by both programs and SysAdmins, so it is easy to see when things are working, or not. Most Linux configuration files are simple flat ASCII text files, which makes them easy to view and modify with the simple Linux text manipulation tools that are already at our disposal.

So we can use cat and less to view the Linux configuration files, and grep to extract and view lines containing specified strings. We can use vi, vim, emacs, or any other text editor to modify configuration files that are ASCII plain text format.
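As a small illustration of those tools, the sketch below builds a throwaway configuration file and then uses grep to pull out only the lines of interest. The file and its settings are invented for the example and are not taken from any real service.

```shell
# Create a throwaway, hypothetical configuration file to experiment on.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Sample service configuration (hypothetical)
Port 2222
PermitRootLogin no
# Port 22
LogLevel INFO
EOF

# Show only the active lines by dropping comments and blank lines.
active=$(grep -Ev '^[[:space:]]*(#|$)' "$conf")
echo "$active"

# Extract the value of a single setting.
port=$(grep -E '^Port[[:space:]]' "$conf" | awk '{print $2}')
echo "Port is $port"

rm -f "$conf"
```

The same two grep commands work on any plain text configuration file; a binary file would need a dedicated viewer before any of this became possible.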

In one of my jobs – the one where we used Perl CGI scripts to manage the email system – we used flat text files to store all of our data. This data included departmental information such as who was authorized to access the data for that department. It also contained the ID and login information for the email users for each department.

We wrote some Perl programs to manage access to this data, both for us as the overall email SysAdmins, as well as for the departmental administrators. The data was still flat ASCII text files so we could use basic Linux command line tools to access and modify the data, especially when making mass changes to the files. At the same time we were also able to use our web-based Perl CGI scripts to work with individual personnel and departmental records.

We did think about using MySQL for record management but we decided that ASCII files made more sense for their ease of access. One of our SysAdmins wrote a series of Perl scripts in about a week that allowed us to use SQL-like function calls from within the Perl scripts so we had the best of both worlds.
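To give a feel for how command line tools handle queries and mass changes on flat text data, here is a minimal sketch. The colon-delimited record layout, the user names, and the department names are all assumptions made up for the illustration; they are not the format we actually used.

```shell
# Hypothetical record layout: login:department:role (one user per line).
data=$(mktemp)
cat > "$data" <<'EOF'
alice:accounting:admin
bob:accounting:user
carol:engineering:user
dave:engineering:admin
EOF

# Query: list the logins of everyone in the accounting department.
accounting_users=$(awk -F: '$2 == "accounting" {print $1}' "$data")
echo "$accounting_users"

# Mass change: rename the engineering department in every record.
# Writing to a new file and renaming it avoids relying on sed -i,
# which behaves differently between sed implementations.
sed 's/^\([^:]*\):engineering:/\1:r-and-d:/' "$data" > "$data.new" &&
    mv "$data.new" "$data"

renamed_count=$(grep -c ':r-and-d:' "$data")
echo "Records now in r-and-d: $renamed_count"

rm -f "$data"
```

One awk line answers the query and one sed line changes every matching record, which is the kind of bulk edit that is awkward to do one record at a time through a web form.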

System Configuration files

Most of the system-wide configuration files are located in the /etc directory and its subdirectories. The files in /etc provide configuration data for many of the system services and servers such as email (SMTP, POP, IMAP), web (HTTP), time (NTP or chrony), SSH, network adapters and routing, the GRUB boot loader, display screen and printer configuration, and much more.

You can also find configuration files that provide system-wide configuration that affects all users, such as /etc/bashrc. The /etc/bashrc file provides initial setup and configuration for all users when they open a bash shell. Figure 1 shows a typical version of this file from my primary workstation.

# /etc/bashrc

# System wide functions and aliases
# Environment stuff goes in /etc/profile

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

# Prevent doublesourcing
if [ -z "$BASHRCSOURCED" ]; then
  BASHRCSOURCED="Y"

  # are we an interactive shell?
  if [ "$PS1" ]; then
    if [ -z "$PROMPT_COMMAND" ]; then
      declare -a PROMPT_COMMAND
      case $TERM in
      xterm*)
        if [ -e /etc/sysconfig/bash-prompt-xterm ]; then
            PROMPT_COMMAND=/etc/sysconfig/bash-prompt-xterm
        else
            PROMPT_COMMAND='printf "\033]0;%s@%s:%s\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/\~}"'
        fi
        ;;
      screen*)
        if [ -e /etc/sysconfig/bash-prompt-screen ]; then
            PROMPT_COMMAND=/etc/sysconfig/bash-prompt-screen
        else
            PROMPT_COMMAND='printf "\033k%s@%s:%s\033\\" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/\~}"'
        fi
        ;;
      *)
        [ -e /etc/sysconfig/bash-prompt-default ] && PROMPT_COMMAND=/etc/sysconfig/bash-prompt-default
        ;;
      esac
    fi
    # Turn on parallel history
    shopt -s histappend
    # Turn on checkwinsize
    shopt -s checkwinsize
    # Change the default prompt string
    [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "
    # You might want to have e.g. tty in prompt (e.g. more virtual machines)
    # and console windows
    # If you want to do so, just add e.g.
    # if [ "$PS1" ]; then
    #   PS1="[\u@\h:\l \W]\\$ "
    # fi
    # to your custom modification shell script in /etc/profile.d/ directory
  fi

  if ! shopt -q login_shell ; then # We're not a login shell
    # Need to redefine pathmunge, it gets undefined at the end of /etc/profile
    pathmunge () {
        case ":${PATH}:" in
            *:"$1":*)
                ;;
            *)
                if [ "$2" = "after" ] ; then
                    PATH=$PATH:$1
                else
                    PATH=$1:$PATH
                fi
        esac
    }

    # Set default umask for non-login shell only if it is set to 0
    [ `umask` -eq 0 ] && umask 022

    SHELL=/bin/bash
    # Only display echos from profile.d scripts if we are no login shell
    # and interactive - otherwise just process them to set envvars
    for i in /etc/profile.d/*.sh; do
        if [ -r "$i" ]; then
            if [ "$PS1" ]; then
                . "$i"
            else
                . "$i" >/dev/null
            fi
        fi
    done

    unset i
    unset -f pathmunge
  fi

fi
# vim:ts=4:sw=4

Figure 1: A typical Linux Bash configuration file.

Relax – we are not going to examine every line of the /etc/bashrc file in Figure 1. However, there are a few things that we should observe in this file.

First, just look at all of the comments. This file is meant to be read by users. We SysAdmins are, after all, advanced users. One thing I like about Red Hat based distributions is that most of the configuration files and scripts are well commented.

One of the functions of this script is to set the shell command prompt. The script determines whether the shell is a standard xterm or vte terminal session, or if it is in a screen session. It sets the prompt string differently depending upon that condition. It also uses external files such as /etc/sysconfig/bash-prompt-xterm which contains the prompt configuration in a file and location easily managed by the SysAdmin.

Up near the top of the file is a series of comments that briefly describe the function of the script along with an admonishment not to change this particular file. The comments also tell you where your own modifications should go. We will look at that a little further on.
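The drop-in approach those comments recommend can be tried safely in a scratch directory. This is only a sketch: the temporary directory stands in for /etc/profile.d/, and the SITE_BANNER variable and the contents of custom.sh are hypothetical.

```shell
# Scratch directory standing in for /etc/profile.d/.
profile_d=$(mktemp -d)

# A hypothetical custom.sh holding site-specific changes.
cat > "$profile_d/custom.sh" <<'EOF'
# Local additions live here instead of in /etc/bashrc.
SITE_BANNER="Managed host: changes go in profile.d"
EOF

# The same pattern /etc/bashrc uses: source every readable *.sh file.
for i in "$profile_d"/*.sh; do
    if [ -r "$i" ]; then
        . "$i"
    fi
done
unset i

echo "$SITE_BANNER"
rm -rf "$profile_d"
```

Because the distribution's own file is never edited, a package update can replace /etc/bashrc without anything needing to be merged by hand.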

Notice how the indents make the structure of this script fragment easier to read than if everything were jammed up against the left margin.

Did you catch that as we went by? This configuration file is an executable program. It is a bash script which contains program logic that can determine which execution path to take depending upon outside conditions. This script is not complete in itself; it is actually a fragment that can be sourced – imported – into other scripts as necessary.

Sourcing is a bash shell method for including the content of other bash scripts or fragments into a script. This allows the contents of the fragment being sourced to be used by multiple scripts. You can think of it like function libraries used by compiled programs. The sourced file is loaded into the calling script at the location of the source command. It is then immediately executed.

Sourcing can be accomplished by using the source command. The period (.) is an alias for the source command.
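A short sketch may make sourcing more concrete. The fragment file, the greet function, and the SITE_NAME variable are all made up for this example.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# A fragment: it only defines things and is never run on its own.
cat > "$workdir/common.sh" <<'EOF'
SITE_NAME="example-site"
greet() {
    echo "Hello from $SITE_NAME, $1"
}
EOF

# Source the fragment. In bash, "source file" and ". file" do the same
# thing; the period form is used here because it also works in other
# POSIX shells.
. "$workdir/common.sh"

# The variable and the function now exist in the current shell.
message=$(greet "sysadmin")
echo "$message"

rm -rf "$workdir"
```

Had the fragment been run as a separate program instead of being sourced, its variable and function would have vanished when that child process exited.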

The /etc/profile file is also a script fragment. We could spend some time here tracing how /etc/profile is launched, but that would take us in the wrong direction for what we are trying to accomplish. Suffice it to say that when bash is invoked as a login shell, it reads /etc/profile first (if it exists) and then looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, reading the first one that exists [3].
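According to the bash man page, a login shell reads only the first of those three personal files that exists and is readable. The function below is a rough sketch of that search order, not bash's own code; the name first_login_file is invented, and it runs against a scratch directory instead of a real home directory.

```shell
# Print the first readable personal startup file, following the order
# a bash login shell uses. Returns non-zero if none is found.
first_login_file() {
    home_dir=$1
    for candidate in .bash_profile .bash_login .profile; do
        if [ -r "$home_dir/$candidate" ]; then
            echo "$home_dir/$candidate"
            return 0
        fi
    done
    return 1
}

# Try it against a scratch "home" that has two of the three files.
fake_home=$(mktemp -d)
touch "$fake_home/.bash_login" "$fake_home/.profile"
chosen=$(first_login_file "$fake_home")
echo "A login shell would read: ${chosen##*/}"
rm -rf "$fake_home"
```

In this scratch directory .profile is never reached, because .bash_login comes earlier in the search order.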

User configuration files

Look at the so-called hidden files in your own home directory – those whose names begin with a period (.). These are user-specific configuration files that you can change to meet your own needs and preferences. The .bashrc file is the configuration file in which individual users can set their own bash configuration such as aliases, functions, and environment variables that are unique to them.

The .bashrc file is short so we can view it with cat. Let’s be sure you are in your home directory and then display the file. Mine is seen in Figure 2.

$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
export PATH

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
if [ -d ~/.bashrc.d ]; then
        for rc in ~/.bashrc.d/*; do
                if [ -f "$rc" ]; then
                        . "$rc"
                fi
        done
fi

unset rc

Figure 2: The ~/.bashrc for my user account.

This file is well commented also and even tells us where to add our own configuration. So let’s add something innocuous that will allow us to test this local configuration. Use your favorite editor to add the following line to the end of the file.

MyVariable="This is a local variable."

View the variable.

$ echo $MyVariable

$

The variable is not yet defined in this shell because bash reads the .bashrc file only when a session starts. It will be defined in bash terminal sessions opened from now on. It can be added to existing bash terminal sessions by sourcing the .bashrc file like this.

$ source .bashrc
$ echo $MyVariable
This is a local variable.
$

These are trivial examples but they should give you some idea of how flexible having open format configuration files can be. It is easy to follow the logic of the files and easy to modify them when needed. Although each distribution varies in how it adds comments to these files, all of the ones I have used have enough information in the comments to enable me to figure out the appropriate location for me to alter the configuration. They also contain enough information to allow me to follow the logic. That doesn’t mean I don’t have to work a bit to understand it all, but I can do it if I need to or am just curious.

Be aware that the local user bash configuration overrides the global configuration. So if a user has the knowledge and wants to alter a global configuration parameter for themselves, they can do that by setting it in the ~/.bashrc file.
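The override works simply because the user file sources the global file first and then makes its own assignments, so the later value wins. Here is a minimal sketch of that ordering, using two hypothetical files in a scratch directory as stand-ins for /etc/bashrc and ~/.bashrc.

```shell
workdir=$(mktemp -d)

# Stand-in for the global file: system-wide defaults.
cat > "$workdir/global_rc" <<'EOF'
EDITOR=vi
PAGER=less
EOF

# Stand-in for the user file. The here-document is unquoted so that
# $workdir is expanded now and the real path is written into the file.
cat > "$workdir/user_rc" <<EOF
# Source global definitions first, as the real ~/.bashrc does.
. "$workdir/global_rc"
# Then override one setting for this user only.
EDITOR=nano
EOF

. "$workdir/user_rc"

echo "EDITOR=$EDITOR PAGER=$PAGER"
rm -rf "$workdir"
```

Only EDITOR changes; PAGER keeps the global default because the user file never touches it.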

Final thoughts

Open data in Linux enables us as SysAdmins to explore everything in order to satisfy our curiosity about how Linux works. The use of ASCII text files for scripting and configuration files allows us access to the inner workings of the environment in which we work every day.

We were able to use that openness to trace our way through a couple of Bash configuration programs and files. We even made an innocuous change to the ~/.bashrc file to see how easy that is. Reboots are seldom required to make changes take effect; ours took effect immediately for new bash terminal sessions.

And, if we want or need to, we can download the source code used to compile the executable code for the kernel and all of the open source programs and utilities available with our Linux distribution. I have done that on a couple occasions because I wanted to know more. You can, too, if your curiosity takes you there.

All of this is only possible in an open operating system — Linux.


  1. Wikipedia, Windows Registry, https://en.wikipedia.org/wiki/Windows_Registry
  2. Wikipedia, Open Data, https://en.wikipedia.org/wiki/Open_data
  3. See the bash man page for this and much more detail.
