Results of the Great Both.org Challenge


Since no one entered our little challenge, we have no new results to share. I had hoped to see whether the results this time would be similar to those from the first time this challenge was offered at Opensource.com.

Let’s take a look at those previous results as they are quite interesting.

The solutions

We received entries from readers residing in many countries around the world. Some people submitted multiple solutions but the contest rules stated that only the entrant’s first solution would be considered. So some good entries had to be disqualified because they were a second or third entry by the same person.

I have my own very simple solution shown in Figure 1. It would not have been a winner, however, even if I had been eligible. In fact, many of the contest entries provided much better solutions than my own.

grep -i banned admin.index | grep SSH | awk '{print $4}' | sort -n | uniq -c | sort -n

Figure 1: My own solution to the problem.

My solution takes its source data from the admin.index file and lists each IP address along with the number of entries for it, sorted in ascending order by that count so that the addresses with the most entries appear at the end. That last sort in my solution was not a requirement to win the contest but it is something I like to do to see from where the most attacks are emanating.
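The counting idiom in that pipeline can be tried on a handful of lines. This is only a sketch: the sample lines are made up, the line layout is an assumption inferred from the commands in this article rather than the real admin.index format, and the function name count_ssh_bans is hypothetical.

```shell
# Hypothetical sample data; the real admin.index layout is assumed, not confirmed.
sample='"[Fail2Ban] SSH: banned 192.0.2.7 from example"
"[Fail2Ban] SSH: banned 198.51.100.3 from example"
"[Fail2Ban] SSH: banned 192.0.2.7 from example"
"[Fail2Ban] Apache: banned 203.0.113.9 from example"
"[Fail2Ban] SSH: banned 192.0.2.7 from example"'

count_ssh_bans() {
  # Keep only the SSH ban notices, then print the fourth
  # whitespace-separated field, which holds the IP address here.
  grep -i banned | grep SSH | awk '{print $4}' |
    # Sorting puts identical addresses next to each other so uniq -c can count them.
    sort -n | uniq -c |
    # The final sort orders by count, smallest first; awk trims the padding uniq -c adds.
    sort -n | awk '{print $1, $2}'
}

counts=$(printf '%s\n' "$sample" | count_ssh_bans)
printf '%s\n' "$counts"
```

With this sample the address that was banned three times ends up on the last line, which is the effect of the final sort.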

My solution produced 5377 lines of output, so there are about that number of unique IP addresses. However, my solution does not take into account some anomalous entries that have no IP addresses in them. As I was thinking about the objectives for the command line program in this challenge, I decided not to specify the number of lines that should be produced as I felt that might be too restrictive and would place an unnecessary constraint on the entries. I think that was a good idea because many of the entries we received produce somewhat different numbers. So a winning solution need not produce the same number of lines of data as my solution.

First entry with solution

Michael DiDomenico of Hamilton, NJ, U.S.A., submitted the very first entry of the contest and it was also a working one. I particularly like Michael’s use of the sort command to ensure that the output is sorted in order by IP Address.

Michael’s entry, shown in Figure 2, produces 5295 lines of output which is not very different from my own result. This is also the number of lines of output that many of the other entries produced.

grep "SSH: banned" admin.index | sed 's/","/ /g'| cut -f4 -d" " | grep "^[0-9]" | sort -k1,1n -k2,2 -k3,3n -k4,4n -t. | uniq -c 

Figure 2: Michael DiDomenico submitted the first entry with a correct solution.
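To see why sorting IP addresses needs per-field keys, here is a small sketch using made-up addresses. It marks all four keys as numeric, which is a choice made for this illustration and not a copy of the entry above; the variable names are hypothetical.

```shell
# Made-up addresses for illustration only.
ips='192.0.2.10
10.0.0.5
192.0.2.9
9.255.0.1'

# A plain byte-wise sort compares character by character, so 9.x lands last
# and 192.0.2.10 comes before 192.0.2.9.
plain=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# -t. splits each line on dots; each -kN,Nn key compares one octet as a number.
by_octet=$(printf '%s\n' "$ips" | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n)

printf 'plain sort:\n%s\n\nsorted by octet:\n%s\n' "$plain" "$by_octet"
```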

Shortest solutions

The shortest solution that was eligible to win a prize was submitted by Víctor Ochoa Rodríguez of Madrid, España. His 65-character solution in Figure 3 is very elegant and uses egrep to select only the lines that contain SSH along with an IP address while only printing that portion of each line that matches the expression. I learned about the -o option from this entry, so thanks to Víctor for that bit of new knowledge.

egrep -o '".F.*H.*\.[0-9]+' admin.index|cut -d\  -f4|sort|uniq -c

Figure 3: Víctor Ochoa Rodríguez submitted this solution which is the shortest one that was eligible for a prize.
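The effect of the -o option is easiest to see on a single line. The sketch below assumes a made-up line in the layout the commands in this article imply, and it writes the commands as grep -E and cut -d' ', which are equivalent spellings chosen here for readability.

```shell
# One hypothetical line; the real admin.index layout is assumed, not confirmed.
line='"[Fail2Ban] SSH: banned 192.0.2.7 from example"'

# Without -o, grep prints the whole matching line.
# With -o, it prints only the text that the expression matched, which here
# runs from the opening quote through the last dot followed by digits.
matched=$(printf '%s\n' "$line" | grep -E -o '".F.*H.*\.[0-9]+')

# The matched text still has four space-separated fields; the fourth is the address.
ip=$(printf '%s\n' "$matched" | cut -d' ' -f4)

printf 'matched text: %s\naddress:      %s\n' "$matched" "$ip"
```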

Figure 4 shows another entry that was actually shorter than Víctor’s. Teresa e Junior submitted an entry that is 58 characters in length. She was not eligible to win a prize in the contest, but her solution deserved to be recognized at least informally in this category.

grep SSH admin.index|grep -Po '(\d+\.){3}\d+'|sort|uniq -c 

Figure 4: This submission by Teresa e Junior was the shortest of all.

Both of these solutions also produce 5295 lines of output.

Most creative solution

The first two categories can be judged on purely objective criteria so I wanted to have this category to provide an additional opportunity to recognize folks who came up with more creative answers. The results in this category were based on my purely subjective opinion, and in my opinion there was a tie in this category.

Przemo Firszt of Co. Cork, Ireland, submitted the entry in Figure 5 which is very interesting and creative for its use of the tee and xargs commands. It is also unique because, in addition to using pipes, it stores intermediate data in a file using the tee command, which passes the same data on to STDOUT as well, and it redirects the final output to another file rather than letting it go to STDOUT. It even cleans up at the end by deleting the temporary file.

grep SSH admin.index | awk '{print $4}' | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sed 's/\".*//' | tee ips | xargs -I % sh -c "echo -ne '%\t' ; grep -o % ips | wc -w" | sort | uniq > results ; rm ips

Figure 5: Przemo Firszt submitted this creative entry that uses tee and xargs.

This solution produces 7403 lines of output. That appears to be because there are multiple lines for many of the IP addresses. So although this is not a perfect solution, it would take very little modification to produce only a single line of output for each IP Address.
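The tee step in that entry is worth a closer look on its own. The following sketch uses made-up addresses and a scratch directory from mktemp (both assumptions for illustration) to show that tee saves a copy of the stream to a file while the same data continues down the pipeline.

```shell
# Scratch directory so the sketch leaves nothing behind.
workdir=$(mktemp -d)

printf '%s\n' 192.0.2.7 198.51.100.3 192.0.2.7 |
  # tee writes every line to the file and also passes it on to STDOUT.
  tee "$workdir/ips" |
  # The downstream commands see the same data and count each address once.
  sort | uniq -c | awk '{print $1, $2}' > "$workdir/results"

saved=$(wc -l < "$workdir/ips" | tr -d ' ')   # lines captured by tee
results=$(cat "$workdir/results")             # one line per unique address

printf 'lines saved by tee: %s\n%s\n' "$saved" "$results"
rm -r "$workdir"
```

Counting with sort and uniq -c, as done here, is one way to end up with a single line per address. It is not necessarily the modification intended for the entry above.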

Tim Chase of Frisco, TX, U.S.A., was the other winner in this category. Tim’s entry, seen in Figure 6, is unique in its use of the curl command to download the file from the server, and then it uses the awk command to both select the desired lines in the file and select only the IP Address from each line. Tim’s solution is the only one that included code to perform the file download. It results in 5295 lines of output.

curl -s http://www.millennium-technology.com/downloads/admin.index|awk -F, '$1~/SSH: banned/{print $1}'|grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+'|sort|uniq -c 

Figure 6: Tim Chase’s solution is creative in its use of curl to download the file.
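The awk portion of this approach can be tried without downloading anything. This sketch feeds made-up, comma-separated lines (an assumed layout, not the confirmed file format) through the same kind of pattern-plus-action awk program, and uses grep -E -o with an equivalent expression to pull out the address. The function name extract_ssh_ips is hypothetical.

```shell
# Hypothetical comma-separated lines; the real admin.index layout is assumed.
sample='"[Fail2Ban] SSH: banned 192.0.2.7","sender@example.com","date"
"[Fail2Ban] Apache: banned 203.0.113.9","sender@example.com","date"
"[Fail2Ban] SSH: banned 198.51.100.3","sender@example.com","date"
"[Fail2Ban] SSH: banned 192.0.2.7","sender@example.com","date"'

extract_ssh_ips() {
  # awk splits on commas, keeps lines whose first field mentions an SSH ban,
  # and prints only that first field.
  awk -F, '$1 ~ /SSH: banned/ {print $1}' |
    # grep -E -o then prints just the dotted-quad address from that field.
    grep -E -o '([0-9]+\.){3}[0-9]+'
}

tim_style=$(printf '%s\n' "$sample" | extract_ssh_ips |
  sort | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$tim_style"
```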

Extra credit solution

A number of entries were aimed at the extra credit requirement to provide the country name for each IP address. Two of the entries especially piqued my interest. Both use the GeoIP package to provide a local database for obtaining the country information. A couple of other entries used the whois command but, among other issues, whois queries a remote database and is subject to blocking when accessed too rapidly from a single IP address. The GeoIP package is available in the standard Fedora repository and the EPEL repository for CentOS.

Gustavo Yzaguirre, from Argentina, submitted the entry in Figure 7 which I like because it gives first a bare-bones listing of IP addresses with a count and then lists the countries. It produces 16,419 lines of output, many of which are duplicates. Gustavo says it is not optimized, but that was not one of the requirements.

awk '/SSH: banned/ && $4 ~ /^[0-9]/ {print $4}' admin.index | sed 's/[^0-9.]*//g' | sort | uniq -c | awk '{printf $1 " " $2 " "; system("geoiplookup "$2)};' | sort -gr | sed 's/ GeoIP Country Edition: / /g'

Figure 7: Gustavo Yzaguirre submitted this entry that lists the country name for each IP address.

Dejan Bogdanovic, of Belgrade, Serbia, also submitted a very interesting entry for the extra credit solution. His entry in Figure 8 lists the IP addresses in descending order of frequency along with the country information. Dejan’s entry produces 5764 lines of output.

cat admin.index | egrep -o '([0-9]*\.){3}[0-9]*' | sort -n | uniq -c | sort -nr | awk '{ORS=" "} {print $1} {print $2} {system("geoiplookup " $2 "| cut -d: -f 2 | xargs")}' 

Figure 8: This extra credit entry was submitted by Dejan Bogdanovic.
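Both extra credit entries call geoiplookup once per address from inside awk. The same idea can be sketched with a shell loop. The helper name lookup_country is hypothetical, the addresses are made up, and the sketch assumes geoiplookup prints a label, a colon, and then the country, as the sed and cut commands in the entries above suggest. It falls back to "unknown" when geoiplookup is not installed.

```shell
lookup_country() {
  # Ask the local GeoIP database if the tool is available; otherwise report unknown.
  if command -v geoiplookup >/dev/null 2>&1; then
    # Keep the first output line and drop everything up to the first colon.
    country=$(geoiplookup "$1" 2>/dev/null | head -n 1 | cut -d: -f2- | sed 's/^ *//')
  else
    country=
  fi
  printf '%s\n' "${country:-unknown}"
}

# Count each made-up address, then append the lookup result to every line.
annotated=$(printf '%s\n' 192.0.2.7 198.51.100.3 192.0.2.7 |
  sort | uniq -c |
  while read -r count ip; do
    printf '%s %s %s\n' "$count" "$ip" "$(lookup_country "$ip")"
  done)
printf '%s\n' "$annotated"
```

Because the counting happens before the lookup, geoiplookup runs only once per unique address.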


Thoughts on the solutions

I was amazed at the many different solutions to this problem that readers were able to come up with. In part, I think that this is because many of the entrants interpreted the desired results with a bit of freedom, in many cases adding more information than was asked for in the original specifications.

There was also a good bit of creativity in all of the solutions. No two solutions were alike, which underscores the fact that everyone approaches problem solving differently. And even when some solutions appeared to start out from the same perspective, each had its own personality and bit of flair that can only be the product of the unique perspectives brought to the table by SysAdmins who are diverse, smart, knowledgeable, and very creative.

Let’s take this contest as a metaphor for the real world. The contest rules are the specifications for this project. Each SysAdmin, even the ones that were not winners in the contest, took those specifications and crafted solutions that met the requirements and which were also insanely creative. Each solution illustrates the use of filter programs and the use of STDIO to transform a data stream in a manner that ultimately provides meaningful information to the SysAdmin.

This contest also beautifully illustrates that, “There is no should.” There is no one way in which you “should” do anything. It is the results that count. You know, this sounded so good that I made it one of the tenets of the Linux Philosophy for SysAdmins.
