Glen Pitt-Pladdy :: Blog

Home Lab Project: GlusterFS

This is a write-up of experiments with GlusterFS I've been doing on my Home Lab (see related articles).

GlusterFS is one of the most popular distributed network filesystems, widely used for "Cloudy" web applications as it allows always-on scale-out of storage, both in terms of capacity and performance. Additionally it offers fault tolerance in that data can be replicated among multiple nodes (RAID-like), which is useful for modern Cloud-based applications where providers may be rebooting hosts for maintenance or have failures of parts of their infrastructure.

A typical example may be a news site with considerable media content and several web front-end servers. In this case the storage of the front-end servers could be pooled together to provide sufficient capacity for the whole media store, while also giving resilience against single-server failures.

For this experiment I'm using CentOS 7 and am always going to have redundancy against one node failing... for anything significant enough to need this technology (ie. multiple nodes) it would be unusual not to require that.

Detailed information can be found in the Gluster Docs.


CentOS does not ship the server component, so to get this working you will need to use the main GlusterFS repo:

# curl <GlusterFS repo .repo URL> >/etc/yum.repos.d/glusterfs-epel.repo

You may want to do your own validation on security here (GPG keys), but for experimenting I'm taking the repo at its word. Remember this is a non-SSL site and hence there is risk of spoofing, interception/modification and other MitM-type attacks.
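If you do want some assurance, enabling GPG checking in the repo definition is the usual approach. A sketch of what that might look like - the repo id and URLs below are placeholders, and you should verify the project's signing key fingerprint out-of-band before trusting it:

```ini
# Sketch only - baseurl and gpgkey are placeholders, not verified here
[glusterfs-epel]
name=GlusterFS
baseurl=<repo base URL>
enabled=1
gpgcheck=1
gpgkey=<URL or local path of the GlusterFS signing key>
```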

Not much to it - the server package pulls in everything we will need:

# yum install glusterfs-server
# systemctl enable glusterd
# systemctl start glusterd

Next you will need some storage. In my case I've got quite a tight root filesystem, but I deliberately left some extra space on the VG in the kickstart, so:

# lvcreate --name gluster --size 512M vg00
# mkfs.xfs -L gluster /dev/mapper/vg00-gluster

Then put an entry in /etc/fstab

/dev/mapper/vg00-gluster   /gluster-data                       xfs     defaults        0 0

And mount it:

# mkdir /gluster-data && mount -a

For this exercise, that provides us with our "Brick", but multiple "Bricks" may be used in a practical implementation.

Next we need to get the servers talking to each other (however many there are):

# gluster peer probe <peer address>

The pool can be viewed with:

# gluster pool list

Additional members can be probed from any host already in the pool.

Another useful command for checking how everything is working is:

# gluster peer status

One vital thing to check is that your configuration allows communication between the nodes - you may need to open firewall ports. GlusterFS uses TCP port 24007 for management (24008 for RDMA) and one port per Brick from 49152 upwards; on CentOS 7, recent firewalld versions ship a predefined "glusterfs" service that covers these.

Basic 2-Node File Replica (RAID1-like)

Very simple - point it at a directory in the Brick on each node:

# gluster volume create gv0 replica 2 <node1 address>:/gluster-data/gv0/ <node2 address>:/gluster-data/gv0/
# gluster volume start gv0

The gv0/ directories will be created automatically if they don't exist.

You can then mount the volume with:

# mount -t glusterfs <node address>:/gv0 /somewhere/to/mount
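If the mount should come back after a reboot, an /etc/fstab entry along the following lines can be used. This is a sketch using the placeholders from above: _netdev delays mounting until the network is up, and backupvolfile-server (a mount.glusterfs option) names a second node to fetch the volume layout from if the first is unreachable:

```
<node address>:/gv0   /somewhere/to/mount   glusterfs   defaults,_netdev,backupvolfile-server=<other node address>   0 0
```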

An interesting experiment here is to mount from a different node than the current one, then shut that node down. After a delay while the mount blocks, it continues working from the remaining node.

Bring the node back and any new files created in the filesystem are quickly replicated to the recovering node.

Add another replica

This will add a 3rd node / Brick into the replica set:

# gluster volume add-brick gv0 replica 3 <node3 address>:/gluster-data/gv0/
# gluster volume heal gv0 full

All the existing files will remain with two replicas until the heal command is run.

The same can be achieved by running "stat" on files in the filesystem without running a full heal. This has the advantage that you can control which files are replicated and the impact of the heal operation. Any means of triggering the stat can be used, including plain old "ls", which effectively stats each file it lists.
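As a sketch of this selective approach, the loop below stats every file under a subtree; on a real GlusterFS client mount that lookup is what triggers the heal check on each file. The mount point here is a hypothetical temporary directory with two files, so the sketch is self-contained:

```shell
# MOUNT would normally be your GlusterFS client mount point;
# a temp dir with two files stands in for it here.
MOUNT=$(mktemp -d)
touch "$MOUNT/a" "$MOUNT/b"

# stat every regular file under the subtree - on a GlusterFS mount
# each lookup causes that file to be checked (and healed if needed)
count=$(find "$MOUNT" -type f -exec stat -c '%n' {} \; | wc -l)
echo "stat'd $count files"
```

Limiting the find to a subdirectory (or a pattern) is what gives you control over which files get healed and when.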

Removal of a Brick is pretty much the same except using "remove-brick" and adding "force" to the end. The Brick can't then be added straight back as it still has contents, including the .glusterfs/ directory; the lot will need deleting before it can be added back.

Basic 4-Node File Distributed Replica (RAID10-like)

This can be done with the above 2-node scenario just by adding another 2 Bricks, and then rebalancing if needed:

# gluster volume add-brick gv0 <node3 address>:/gluster-data/gv0/ <node4 address>:/gluster-data/gv0/
# gluster volume rebalance gv0 start

Or, alternatively, create a new volume from the start:

# gluster volume create gv1 replica 2 <node1 address>:/gluster-data/gv1/ <node2 address>:/gluster-data/gv1/ <node3 address>:/gluster-data/gv1/ <node4 address>:/gluster-data/gv1/
# gluster volume start gv1

What about 3 nodes, 2 replicas and similar?

Combinations like this can be done, but you still need an even number of Bricks - here 6 in total (two per node). Each consecutive pair of Bricks becomes a 2-way replica, with 3-way distribution across the pairs. That means you need to stagger your Bricks so that both replicas of a pair never land on the same node:

# gluster volume create gv2 replica 2 <node1 address>:/gluster-data/gv2a/ <node2 address>:/gluster-data/gv2a/ <node3 address>:/gluster-data/gv2a/ <node1 address>:/gluster-data/gv2b/ <node2 address>:/gluster-data/gv2b/ <node3 address>:/gluster-data/gv2b/
# gluster volume start gv2
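To see why that ordering works (and to generate it for an arbitrary node list), here is a small sketch in plain shell - the node names are hypothetical placeholders:

```shell
# Hypothetical node addresses - replace with your own.
nodes="node1 node2 node3"

# Build the brick list: one pass per Brick directory (gv2a, gv2b).
# With "replica 2" gluster pairs consecutive bricks, so cycling
# through all nodes before repeating ensures each pair spans two
# different nodes.
bricks=""
for suffix in a b; do
    for n in $nodes; do
        bricks="$bricks $n:/gluster-data/gv2$suffix/"
    done
done
echo "gluster volume create gv2 replica 2$bricks"
```

With this ordering the replica pairs come out as (node1,node2), (node3,node1) and (node2,node3) - no pair shares a node, so any single node can fail without losing data.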

Practical applications

In the real world there are a lot of applications with a shared data store (eg. product images/media on an e-commerce solution) that needs to scale with the size of the solution, and usage will grow over time. For auto-scaling (eg. scaling up and down for a flash sale) this is likely to be more difficult, since you would want to replace or add nodes on a more ad-hoc basis, and that will either require very careful management or a filesystem that operates on a more fluid basis (eg. any combination of Bricks on different nodes can be used rather than fixed pairs).

In many cases this is still a good solution. If you can have a separate storage tier (or perhaps this could be on a fixed "baseline" set of web front ends) then these can be replaced and healed as they fail.

For larger files (eg. video) the striping options are likely to be more useful, whereas with small files there could be more overhead fetching all the fragments than picking up a single whole file from one Brick.

Always Remember...

Distributed storage (or RAID for that matter) is neither a substitute for DR (which requires completely separate "shared-nothing" infrastructure) nor for Backups.

While GlusterFS certainly has capability for being used across geographically dispersed clusters in multiple datacenters as part of a DR design, true DR goes much further and should be driven by the organisation's BCP (Business Continuity Plan).