Hadoop Etudes : #1 : TableT
November 03. 2012 - December 03. 2012
Early supercomputers used parallel processing and distributed computing to link processors together in a single machine. Using freely available tools, it is possible to do the same today using inexpensive PCs - a cluster. Glen Gardner liked the idea, so he built himself a massively parallel Mini-ITX cluster using 12 x 800 MHz nodes.
I then found a number of people on the Web (in particular, Janne and Tim Molter) who have each built a Linux cluster into an Ikea "Helmer" file cabinet at a fairly low cost (around 2,500 USD).
A few weeks ago, I was thinking of building an HPCC-in-a-box for myself, based on the idea given by Douglas Eadline on his website [[1]]. Later, while dealing with some problems for one of my clients, I decided to design an inexpensive chassis that could hold inexpensive "servers". My client's setup required multiple Xen/KVM hosts running x number of virtual machines. The chassis is to hold everything, from the cluster motherboards to their power supplies, network switches, power cables, KVM, etc.
Here is a question: why? - Because we can!
Hadoop Etudes : #1 : TableT (2019)
That was an update from Nov 19, 2014
That was an update from Jan 12, 2019
That was an update from Aug 11, 2020
By an act of terror. It could have been running longer, but the US is covered in shit. What are you gonna do.
Sep 01. 2022
Hadoop Etudes : #2 : Munin
December 04. 2012
Munin is a system monitoring tool. It produces graphs for Apache, MySQL, Nginx, CPU, Memory, Disk, etc.
Here are my notes from setting it up; they are brief, but they should help you get going.
All the monitored machines run a small daemon called munin-node. One machine is the central server. Every few minutes it gathers data from all the nodes (including itself), generates the graphs, and writes out some HTML files.
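A minimal sketch of that wiring; n2's address is the one from my logs below, the rest of the addresses are assumptions:

# /etc/munin/munin.conf on the central server - one section per monitored node
[n1]
    address 127.0.0.1
    use_node_name yes

[n2]
    address 192.168.1.102
    use_node_name yes

# /etc/munin/munin-node.conf on every node - let the central server (default port 4949) poll it
allow ^192\.168\.1\.101$

Restart munin-node after editing, and the graphs show up in the generated HTML a couple of polling cycles later.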
Hadoop Etudes : #3 : Rsyslog & Graphite
December 05. 2012
Here is a little Graphite dashboard, based on Rsyslog.
pault@n1:~$ grep INFO /var/log/syslog | tail -5
Dec 4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
Dec 4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:33,043 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 8ms
Dec 4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,036 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 0 ms
Dec 4 17:21:13 n1 HDP-datanode-n2: 2012-12-04 17:19:36,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 1 blocks took 1 msec to generate and 2 msecs for RPC and NN processing
Dec 4 17:21:14 n1 HDP-namenode-n2: 2012-12-04 17:19:36,037 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 192.168.1.102:50010, blocks: 1, processing time: 0 msecs
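For log lines like these to arrive over the network, the central node has to accept remote syslog traffic. A minimal sketch for rsyslog on n1, assuming the conventional TCP port 514 (legacy directive syntax):

# /etc/rsyslog.conf on n1 - accept remote messages over TCP
$ModLoad imtcp
$InputTCPServerRun 514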
As part of a training project, we created a small log4j appender supporting TCP syslog and RFC 5424. Most importantly, it is capable of formatting stack traces as a single syslog message (NOT the usual bunch of multiple malformed messages). The work is based on the syslog4j implementation, which did not work for us (our fault? ;)), so we extended that framework.
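Wiring such an appender into a Hadoop daemon would look roughly like the log4j.properties sketch below; the appender class and its property names are placeholders here, not the real ones from our implementation:

# log4j.properties sketch - the appender class and its properties below are placeholders
log4j.rootLogger=INFO, SYSLOG
log4j.appender.SYSLOG=com.example.log4j.Rfc5424TcpSyslogAppender
log4j.appender.SYSLOG.syslogHost=n1
log4j.appender.SYSLOG.port=514
log4j.appender.SYSLOG.appName=HDP-datanode-n2
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n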
Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interface.
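Feeding carbon is as simple as its plaintext protocol: one "metric value timestamp" line per data point, by default on TCP port 2003. A quick test from the shell (the host name and metric path here are made up):

$ echo "hadoop.n2.datanode.block_reports 1 $(date +%s)" | nc graphite-host 2003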
Hadoop Etudes : #4 : Hadoop, Ubuntu, JDK
December 06. 2012
Hadoop does build and run on OpenJDK (OpenJDK is based on the Sun JDK).
OpenJDK is handy to have on a development system as it has more source for you to step into when debugging something. OpenJDK and Sun JDK mainly differ in (native?) rendering/AWT/Swing code, which is not relevant for any MapReduce Jobs that aren't creating images as part of their work.
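For what it's worth, pointing Hadoop at OpenJDK is just a JAVA_HOME setting in hadoop-env.sh; the package name and path below are the usual ones on 64-bit Ubuntu and may differ on your box:

$ sudo apt-get install openjdk-6-jdk
# then, in hadoop-env.sh (e.g. /etc/hadoop/conf/hadoop-env.sh, depending on your layout):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64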
Hadoop Etudes : #5 : Mountable HDFS
December 08. 2012
These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard POSIX libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.
All of them, except HDFS NFS Proxy, are based on the Filesystem in Userspace project, FUSE (http://fuse.sourceforge.net/). The WebDAV-based one can also be used with other WebDAV tools, but it requires FUSE to actually mount.
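A typical fuse-dfs mount, assuming the packaged hadoop-fuse-dfs wrapper (with a plain Apache build the binary is called fuse_dfs) and a NameNode at n1:8020 - both the host and the port are assumptions here:

$ sudo mkdir -p /mnt/hdfs
$ sudo hadoop-fuse-dfs dfs://n1:8020 /mnt/hdfs
$ ls /mnt/hdfs
# and to unmount:
$ sudo fusermount -u /mnt/hdfs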
Hadoop Etudes : #6 : HadooX
December 15. 2012
HadooX finalizes a minimalistic operational foundation