Hadoop Etudes : #1 : TableT

DataCenter for $2K



To run Hadoop we need a DataCenter, so let's build it

  • A table from IKEA for $9, bunch of laptop hard drives, bunch of MiniITX mother boards (fanless), 2 UPS-es, some cables, some drilling
  • 8 units
  • No noise. No heat
  • OS is Ubuntu
  • It's not a toy

November 03. 2012 - December 03. 2012

Background

  • The Mini-Cluster
    Early supercomputers used parallel processing and distributed computing and to link processors together in a single machine. Using freely available tools, it is possible to do the same today using inexpensive PCs - a cluster. Glen Gardner liked the idea, so he built himself a massively parallel Mini-ITX cluster using 12 x 800Mhz nodes.
  • Homemade "Helmer" Linux Cluster
    I then found a number of people on the Web (in particular, Janne and Tim Molter) who have each built a Linux cluster into an Ikea "Helmer" file cabinet at a fairly low cost (around 2,500 USD).
  • Data Center In A Box
    Few weeks ago, I was thinking to make an HPCC in a box for myself, based on the idea given by Douglas Eadline, on his website [[1]]. Later, while dealing with some problems of one of my clients, I decided to design an inexpensive chassis, which could hold in-expensive "servers" . My client's setup required multiple Xen/KVM hosts running x number of virtual machines. The chassis is to hold everything, from the cluster mother-boards, to their power supplies, network switches, power cables, KVM, etc.
  • Mini-ITX cluster
    Here is a question: why? - Because we can!





Hadoop Etudes : #1 : TableT (2019)

Year 2019 update



So, it's been running for two years now

  • The (only) problem was the cabling
  • Cables' wear and tear caused some of the nodes to go down once in a while
  • It presented good tests for some of the things I've been doing (like recovery for distributed filesystem)
  • Also it presented a nice challenge - how to catch the H/W downtime of partially misbehaving node in situation when software HA mechanisms keep the cluster running
  • Eventually it became too frequent and annoying
  • So the decision was to replace the table with a metal rack (from IKEA + Home Depot. Total is < $100) and also cables are now fixed on a (sliding) blades
  • Downside: a bit more complex to make (more parts and metal)
  • Upside: Supposedly 0 downtime

That was update from Nov 19, 2014

So, it's been running for 6+ years now

  • All good. My ISP went out of business, US president is toying with idea of indefinite government shutdown, I see people doing funny things in the streets that used to be civilized. The machine keeps humming, lights are blinking in the night. Once in a while harddrive (that exports the NFS, so everybody is trashing that drive) will die. We replace it and show goes on. Only two died.

  • That was update from Jan 12, 2019

    So, it's been running for 7+ years now

  • All good. Replaced all mechanical hard drives with SSD. The thing is now completely silent. Runs faster too. Problem is - some local police or something is now sniffing around, checking electricity usage. And the bank(s) stole my money. And there are riots all over USA. Other than that - everything works.

  • August 11. 2020

    Background






    Hadoop Etudes : #2 : Munin

    Basic Cluster Monitoring



    We have 8 nodes to run. Monitoring is mandatory

    • Install munin-node on every node
    • Install munin-server on first node
    • Don't run Hadoop on first node
    • Hadoop is network and CPU hungry, Munin is CPU hungry
    • When both Hadoop and munin-server run on the same node, they do bump into each other every 5 minutes on long running Hadoop jobs

    December 04. 2012

    Background

    • Setting up Munin on Ubuntu
      Munin is a system monitoring tool. It produces graphs for Apache, MySQL, Nginx, CPU, Memory, Disk, etc.

      Here are my notes from setting it up, they are brief, but should help you get going.

      All the monitored machines run a small daemon called munin-node. One machine is the central server. Every few minutes it gathers data from all the nodes (including itself), generates the graphs, and writes out some HTML files.





    Here is a little Graphite Dashboard -
    based on Rsyslog

    Hadoop Etudes : #3 : Rsyslog & Graphite

    Centralized Logging



    Monitoring is good, but logging provides details

    • On every node there is a system log that goes into /var/log/syslog
    • On every Hadoop node there is a bunch of hadoop logs that go into /opt/hadoop/logs
    • Let's route all the logs from all the nodes to single log file on first node - /var/log/syslog
      • Add a line to /etc/rsyslog.conf on every node to send system logs to first node
      • Use, say, log4j appender to send Hadoop logs into syslog
    • End result - just one log file to look at - on first node
    • Use tail -f /var/log/syslog for realtime monitoring
    • Use grep to check for errors and stuff
    pault@n1:~$ grep INFO /var/log/syslog | tail -5
    Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
    Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,043 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 8ms
    Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,036 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 0 ms
    Dec  4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 1 blocks took 1 msec to generate and 2 msecs for RPC and NN processing
    Dec  4 17:21:14 n1 HDP-namenode-n2: 2012-12-04 17:19:36,037 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 192.168.1.102:50010, blocks: 1, processing time: 0 msecs
        

    December 05. 2012

    Background

    • TCP syslog / RFC5424 log4j appender
      As part of a training project, we created a small log4j appender supporting TCP syslog and RFC5424. Most importantly, it is capable of formatting stack traces as a single syslog messages (NOT the usual bunch of multiple malformed messages). The work is based on the syslog4j implementation, which did not work for us (our fault? ;)) and so we extended this framework.
    • Graphite - Scalable Realtime Graphing
      Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through graphite's web interfaces.





    Hadoop Etudes : #4 : Hadoop, Ubuntu, JDK

    Hadoop, Ubuntu, JDK



    Modern Ubuntu / JDK situation is not particularly friendly to Hadoop

    • Hadoop wants Sun's JDK (but can live with OpenJDK)
    • Default Ubuntu JDK is OpenJDK
    • Sun's JDK is no longer available via apt-get (used to be)
    • If you only plan to run Hadoop - take OpenJDK - via apt-get
    • JAVA_HOME for OpenJDK is /usr
    • Still - should read on how to switch between OpenJDK and Sun JDK via 'alternatives' - might need it

    December 06. 2012

    Background

    • Hadoop Java Versions
      Hadoop does build and run on OpenJDK (OpenJDK is based on the Sun JDK).
      OpenJDK is handy to have on a development system as it has more source for you to step into when debugging something. OpenJDK and Sun JDK mainly differ in (native?) rendering/AWT/Swing code, which is not relevant for any MapReduce Jobs that aren't creating images as part of their work.
    • Google search - returns several pointers, some of info is true, some is obsolete.







    Hadoop Etudes : #5 : Mountable HDFS

    Mountable HDFS



    Let's mount HDFS as a UNIX filesystem

    • There are 6 options to mount HDFS as UNIX filesystem
    • There are things that plain HDFS does much better comparing to mounted HDFS
    • hadoop fs -put large-file dst could be 10-100 faster than cp large-file /mnt/dir
    • When dealing with large number of small files in one dir, HDFS itself might choke
    • When 32bit nodes are mixed with 64bit nodes
      • 64bit nodes will show the correct numbers for 'Size', 'Used', 'Available'
      • 32bit nodes might show messed up numbers (but will work OK)
    • Backup is tricky. Need to plan it depending on task at hand

    December 08. 2012

    Background

    • Mounting HDFS
      These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of hdfs using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.

      All, except HDFS NFS Proxy, are based on the Filesystem in Userspace project FUSE (http://fuse.sourceforge.net/). Although the Webdav-based one can be used with other webdav tools, but requires FUSE to actually mount.







    Hadoop Etudes : #6 : HadooX

    Hadoop Cluster Foundation



    HadooX is a minimalistic set of (Perl) scripts to run Hadoop Cluster

    • All the work is done on AUX node from command line
    • hx help - lists all available commands (see screenshot)
    • hx start - brings up the cluster, mounts the common filesystems
    • hx stop - brings the cluster down, unmounts the common filesystems
    • hx status - validates every node and common filesystems
    • The Cluster Architecture is:
      • 8 nodes
        • n1 - Auxiliary Node (AUX)
        • n2 - Master Node (MAS)
        • n2-n8 Data Nodes (NOD)
      • Two common Filesystems, accessible R/W from every node
      • Auxiliary node (AUX) contains:
        • HadooX
        • Monitoring
        • Logging
        • Anyhing else that is not part of Hadoop Stack (NFS server, MySQL etc)

    December 15. 2012

    Background

    HadooX finalizes minimalistic operational foundation