Hadoop Etudes : #1 : TableT

DataCenter for $2K

To run Hadoop we need a DataCenter, so let's build it

A table from IKEA for $9, bunch of laptop hard drives, bunch of MiniITX mother boards (fanless), 2 UPS-es, some cables, some drilling
8 units
No noise. No heat
OS is Ubuntu
It's not a toy

November 03. 2012 - December 03. 2012

Background

The Mini-Cluster
Early supercomputers used parallel processing and distributed computing and to link processors together in a single machine. Using freely available tools, it is possible to do the same today using inexpensive PCs - a cluster. Glen Gardner liked the idea, so he built himself a massively parallel Mini-ITX cluster using 12 x 800Mhz nodes.
Homemade "Helmer" Linux Cluster
I then found a number of people on the Web (in particular, Janne and Tim Molter) who have each built a Linux cluster into an Ikea "Helmer" file cabinet at a fairly low cost (around 2,500 USD).
Data Center In A Box
Few weeks ago, I was thinking to make an HPCC in a box for myself, based on the idea given by Douglas Eadline, on his website [[1]]. Later, while dealing with some problems of one of my clients, I decided to design an inexpensive chassis, which could hold in-expensive "servers" . My client's setup required multiple Xen/KVM hosts running x number of virtual machines. The chassis is to hold everything, from the cluster mother-boards, to their power supplies, network switches, power cables, KVM, etc.
Mini-ITX cluster
Here is a question: why? - Because we can!

Hadoop Etudes : #1 : TableT (2019)

Year 2019 update

So, it's been running for two years now

The (only) problem was the cabling
Cables' wear and tear caused some of the nodes to go down once in a while
It presented good tests for some of the things I've been doing (like recovery for distributed filesystem)
Also it presented a nice challenge - how to catch the H/W downtime of partially misbehaving node in situation when software HA mechanisms keep the cluster running
Eventually it became too frequent and annoying
So the decision was to replace the table with a metal rack (from IKEA + Home Depot. Total is < $100) and also cables are now fixed on a (sliding) blades
Downside: a bit more complex to make (more parts and metal)
Upside: Supposedly 0 downtime

That was update from Nov 19, 2014

So, it's been running for 6+ years now

All good. My ISP went out of business, US president is toying with idea of indefinite government shutdown, I see people doing funny things in the streets that used to be civilized. The machine keeps humming, lights are blinking in the night. Once in a while harddrive (that exports the NFS, so everybody is trashing that drive) will die. We replace it and show goes on. Only two died.

That was update from Jan 12, 2019

So, it's been running for 7+ years now

All good. Replaced all mechanical hard drives with SSD. The thing is now completely silent. Runs faster too. Problem is - some local police or something is now sniffing around, checking electricity usage. And the bank(s) stole my money. And there are riots all over USA. Other than that - everything works.

That was update from Aug 11, 2020

Physically destroyed. Sep 01. 2022.

By the act of terror. Could be running longer, but US is covered in shit. What you gonna do.

Sep 01. 2022

Background

Hadoop Etudes : #2 : Munin

Basic Cluster Monitoring

We have 8 nodes to run. Monitoring is mandatory

Install munin-node on every node
Install munin-server on first node
Don't run Hadoop on first node
Hadoop is network and CPU hungry, Munin is CPU hungry
When both Hadoop and munin-server run on the same node, they do bump into each other every 5 minutes on long running Hadoop jobs

December 04. 2012

Background

Setting up Munin on Ubuntu
Munin is a system monitoring tool. It produces graphs for Apache, MySQL, Nginx, CPU, Memory, Disk, etc.

Here are my notes from setting it up, they are brief, but should help you get going.

All the monitored machines run a small daemon called munin-node. One machine is the central server. Every few minutes it gathers data from all the nodes (including itself), generates the graphs, and writes out some HTML files.

Here is a little Graphite Dashboard -
based on Rsyslog

Hadoop Etudes : #3 : Rsyslog & Graphite

Centralized Logging

Monitoring is good, but logging provides details

On every node there is a system log that goes into /var/log/syslog
On every Hadoop node there is a bunch of hadoop logs that go into /opt/hadoop/logs
Let's route all the logs from all the nodes to single log file on first node - /var/log/syslog

Add a line to /etc/rsyslog.conf on every node to send system logs to first node
Use, say, log4j appender to send Hadoop logs into syslog

End result - just one log file to look at - on first node
Use tail -f /var/log/syslog for realtime monitoring
Use grep to check for errors and stuff

pault@n1:~$ grep INFO /var/log/syslog | tail -5
Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,043 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 8ms
Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,036 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 0 ms
Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 1 blocks took 1 msec to generate and 2 msecs for RPC and NN processing
Dec  4 17:21:14 n1 HDP-namenode-n2: 2012-12-04 17:19:36,037 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 192.168.1.102:50010, blocks: 1, processing time: 0 msecs

December 05. 2012

Background

TCP syslog / RFC5424 log4j appender
As part of a training project, we created a small log4j appender supporting TCP syslog and RFC5424. Most importantly, it is capable of formatting stack traces as a single syslog messages (NOT the usual bunch of multiple malformed messages). The work is based on the syslog4j implementation, which did not work for us (our fault? ;)) and so we extended this framework.
Graphite - Scalable Realtime Graphing
Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through graphite's web interfaces.

Hadoop Etudes : #4 : Hadoop, Ubuntu, JDK

Hadoop, Ubuntu, JDK

Modern Ubuntu / JDK situation is not particularly friendly to Hadoop

Hadoop wants Sun's JDK (but can live with OpenJDK)
Default Ubuntu JDK is OpenJDK
Sun's JDK is no longer available via apt-get (used to be)
If you only plan to run Hadoop - take OpenJDK - via apt-get
JAVA_HOME for OpenJDK is /usr
Still - should read on how to switch between OpenJDK and Sun JDK via 'alternatives' - might need it

December 06. 2012

Background

Hadoop Java Versions
Hadoop does build and run on OpenJDK (OpenJDK is based on the Sun JDK).
OpenJDK is handy to have on a development system as it has more source for you to step into when debugging something. OpenJDK and Sun JDK mainly differ in (native?) rendering/AWT/Swing code, which is not relevant for any MapReduce Jobs that aren't creating images as part of their work.
Google search - returns several pointers, some of info is true, some is obsolete.

Hadoop Etudes : #5 : Mountable HDFS

Mountable HDFS

Let's mount HDFS as a UNIX filesystem

There are 6 options to mount HDFS as UNIX filesystem
There are things that plain HDFS does much better comparing to mounted HDFS
hadoop fs -put large-file dst could be 10-100 faster than cp large-file /mnt/dir
When dealing with large number of small files in one dir, HDFS itself might choke
When 32bit nodes are mixed with 64bit nodes

64bit nodes will show the correct numbers for 'Size', 'Used', 'Available'
32bit nodes might show messed up numbers (but will work OK)

Backup is tricky. Need to plan it depending on task at hand

December 08. 2012

Background

Mounting HDFS
These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of hdfs using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.

All, except HDFS NFS Proxy, are based on the Filesystem in Userspace project FUSE (http://fuse.sourceforge.net/). Although the Webdav-based one can be used with other webdav tools, but requires FUSE to actually mount.

Hadoop Etudes : #6 : HadooX

Hadoop Cluster Foundation

HadooX is a minimalistic set of (Perl) scripts to run Hadoop Cluster

All the work is done on AUX node from command line
hx help - lists all available commands (see screenshot)
hx start - brings up the cluster, mounts the common filesystems
hx stop - brings the cluster down, unmounts the common filesystems
hx status - validates every node and common filesystems
The Cluster Architecture is:

8 nodes

n1 - Auxiliary Node (AUX)
n2 - Master Node (MAS)
n2-n8 Data Nodes (NOD)

Two common Filesystems, accessible R/W from every node

/CODE - is NFS, to push code from AUX
/H - is Mountable HDFS

Auxiliary node (AUX) contains:

HadooX
Monitoring
Logging
Anyhing else that is not part of Hadoop Stack (NFS server, MySQL etc)

December 15. 2012

Background

HadooX finalizes minimalistic operational foundation

DataCenter : TableT
Monitoring : Munin
Logging : Rsysylog & Graphite
HDFS : Mountable HDFS