
Better HDFS : #1 : PFS

PFS - Pragmatic POSIX compliant distributed filesystem

It has been battle tested (only on Ubuntu so far, but it should be *NIX agnostic).

  • PFS is similar to HDFS + POSIX layer
  • Logical /PFS directory is physically distributed between various hard drives
  • PFS allows (many) things that HDFS does not allow
  • PFS allows appends
  • PFS is not Java
  • PFS is implemented on a system level, so it can run in a public cloud
  • PFS could (in principle) run even on spot instances, but that would be silly
  • PFS backups/replicas are optional and can be done with simple rsync
  • If HW dies and leaves PFS in a bad state (a rare event), it can be recovered with 'pfs fsck'
  • If 'pfs fsck' fails, PFS can easily be recovered manually
  • All of this is possible because PFS files are in fact just plain unix files
  • Currently I use /PFS to store several months of fingerprints from my Cloud Servers, so I can do forensics if somebody succeeds in compromising them
  • There is one controlling node 'aux' and any number of bricks
  • Not 100% of POSIX calls are currently implemented. Permissions are very relaxed, for example
  • It's all FUSE, of course. I only had a couple of weeks between jobs, about a year ago
  • It is not "open source", but your super awesome company is welcome to give me a grant to take it open source, for example
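
The layout described above (one controlling node that knows where every file lives, plus bricks that hold plain unix files) can be illustrated with a small sketch. This is not the PFS code, which is not published. Everything here is an assumption made for illustration: the `Catalog` class, the "fewest files first" placement policy, and the use of local directories as stand-ins for bricks on separate machines. It only shows why plain files make appends and manual recovery simple: the catalog can always be rebuilt by walking the bricks.

```python
import os
import tempfile
from collections import Counter


class Catalog:
    """Hypothetical stand-in for the DB kept on the controlling node."""

    def __init__(self, bricks):
        # bricks: brick name -> directory holding that brick's plain files.
        # Assumption: real bricks are separate machines, not local directories.
        self.bricks = dict(bricks)
        self.location = {}  # logical path (relative to /PFS) -> brick name

    def status(self):
        """Return the number of catalogued files per brick."""
        counts = Counter({name: 0 for name in self.bricks})
        counts.update(self.location.values())
        return dict(counts)

    def place(self, logical_path):
        """Assign a new file to the brick with the fewest files (assumed policy)."""
        counts = self.status()
        brick = min(sorted(self.bricks), key=lambda name: counts[name])
        self.location[logical_path] = brick
        return brick

    def physical_path(self, logical_path):
        """Translate a logical path into the plain file on its brick."""
        brick = self.location[logical_path]
        return os.path.join(self.bricks[brick], logical_path)

    def append(self, logical_path, data):
        """Append bytes to a file, placing it on a brick if it is new."""
        if logical_path not in self.location:
            self.place(logical_path)
        target = self.physical_path(logical_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "ab") as handle:
            handle.write(data)

    def rebuild(self):
        """Recreate the catalog from scratch by walking the brick directories."""
        self.location = {}
        for brick, root in self.bricks.items():
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    full = os.path.join(dirpath, name)
                    self.location[os.path.relpath(full, root)] = brick


def demo():
    """Write four files to two bricks, lose the catalog, then rebuild it."""
    with tempfile.TemporaryDirectory() as tmp:
        bricks = {name: os.path.join(tmp, name) for name in ("n2", "n3")}
        catalog = Catalog(bricks)
        for day in ("2014-03-25", "2014-03-26"):
            for node in ("node3", "node4"):
                catalog.append(f"net/{day}.{node}", b"sample\n")
        before = catalog.status()
        catalog.location.clear()  # simulate a lost or corrupted DB
        catalog.rebuild()
        return before, catalog.status()


if __name__ == "__main__":
    print(demo())
```

In this toy version the per-brick counts are the same before and after the rebuild, because nothing about a file's location is stored anywhere except in the file system of the brick itself.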

Sample commands

root@n1:/PFS# ls /PFS
net  syslog  top
root@n1:/PFS# ls -l /PFS
total 12
drwxrwxrwx 2 root root 4096 Nov 20 00:00 net
drwxrwxrwx 2 root root 4096 Nov 20 00:00 syslog
drwxrwxrwx 2 root root 4096 Nov 20 00:00 top
root@n1:/PFS# ls -l `find /PFS` | head
-rw-rw-rw- 1 root root  402941 Mar 25  2014 /PFS/net/2014-03-25.node3
-rw-rw-rw- 1 root root  213784 Mar 25  2014 /PFS/net/2014-03-25.node4
-rw-rw-rw- 1 root root   82395 Mar 25  2014 /PFS/net/2014-03-25.node5
-rw-rw-rw- 1 root root 4827975 Mar 26  2014 /PFS/net/2014-03-26.node3
-rw-rw-rw- 1 root root 2559988 Mar 26  2014 /PFS/net/2014-03-26.node4
-rw-rw-rw- 1 root root 2566879 Mar 26  2014 /PFS/net/2014-03-26.node5
-rw-rw-rw- 1 root root 4817161 Mar 27  2014 /PFS/net/2014-03-27.node3
-rw-rw-rw- 1 root root 2586308 Mar 27  2014 /PFS/net/2014-03-27.node4
-rw-rw-rw- 1 root root 2713682 Mar 27  2014 /PFS/net/2014-03-27.node5
-rw-rw-rw- 1 root root 4806869 Mar 28  2014 /PFS/net/2014-03-28.node3
root@n1:/PFS# find /PFS | wc -l
root@n1:/PFS# pfs fstatus
n2 -> 327
n3 -> 327
n4 -> 326
n5 -> 326
n6 -> 326
/PFS files: 1632
  DB total: 1632
root@n1:/PFS# pfs help
fs.mkdir $1    
fs.rmdir $1    
fsck $1    
node $1 $2   

November 19. 2014



foreach *.csv -< F(3) | Z Rvi(60) | Lin(3) >- Avg 
Only part of it can be expressed via current UNIX pipes. Note the two new operators: -< is map and >- is reduce. The question Hadoop asked was correct. The right answer: throw away HDFS! It's *not* part of the pipeline. M/R is.

The way I arrived here was this. Today I started to write down some of the formulas that are working in production. I noticed the similarity with UNIX pipes, but there was no syntax to express some of the moves. First I thought of <= for map and => for reduce, but my wife said that -< and >- are more accurate. I agreed, because => for example is already used in mathematics.
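
To make the shape of the notation concrete, here is a minimal Python sketch of a pipeline with a fan-out (-<) and a fan-in (>-). It is only an illustration under stated assumptions: `pipe`, `fan_out`, `fan_in`, `take_column` and `total` are hypothetical names, and the two stages are simple stand-ins, not the production F(3), Z, Rvi(60) or Lin(3), which are not defined here. In-memory CSV text replaces the *.csv files so the example is self-contained.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from statistics import mean


def pipe(*stages):
    """Compose stages left to right, like `a | b | c` in a shell."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


def fan_out(items, branch, workers=4):
    """The `-<` move: run the same branch on every item independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(branch, items))  # results keep the input order


def fan_in(results, reducer):
    """The `>-` move: collapse the per-item results into a single value."""
    return reducer(results)


def take_column(index):
    """Stand-in stage: parse CSV text and keep one column as floats."""
    def stage(text):
        return [float(row[index]) for row in csv.reader(io.StringIO(text))]
    return stage


def total(values):
    """Stand-in stage: sum the values of one file."""
    return sum(values)


def demo():
    # Stand-ins for `foreach *.csv`: file name -> file content.
    files = {"a.csv": "1,2,3\n4,5,6\n", "b.csv": "7,8,15\n"}
    branch = pipe(take_column(2), total)              # stage | stage
    per_file = fan_out(list(files.values()), branch)  # -<
    return fan_in(per_file, mean)                     # >- Avg


if __name__ == "__main__":
    print(demo())
```

The part inside `pipe` is what an ordinary UNIX pipe already expresses. The `fan_out` and `fan_in` calls are the two moves that have no pipe syntax, and neither of them needs to know where the input files are stored.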

November 14. 2020

Figured out how to do version 2. Blockchain is actually not bad for this kind of stuff.

March 6. 2021

Figured out how to replicate the same trick across the entire Hadoop vertical. Not surprisingly, scaling out is easy.

April 9. 2021