Glen Pitt-Pladdy :: Blog
Home Lab Project: GlusterFS
This is a write-up of experiments with GlusterFS I've been doing on my Home Lab (see related articles). GlusterFS is one of the popular distributed network filesystems that is widely used for "Cloudy" web applications as it allows always-on scale-out for storage, both in terms of capacity and performance. Additionally it offers fault tolerance in that data can be replicated among multiple nodes (RAID-like), which is useful for modern Cloud based applications where providers may be rebooting hosts for maintenance or have failures of parts of their infrastructure. A typical example may be a news site with considerable media content and several web front end servers. In this case the storage of the front end servers could be pooled together to provide sufficient capacity for the whole media store, while also having resilience against single server failures. For this experiment I'm using CentOS 7 and am always going to have redundancy against one node failing... for anything significant enough to need this technology (ie. multiple nodes) it would be unusual not to require that. Detailed information can be found in the Gluster Docs.

Install

CentOS does not ship with the server component so in order to get this working you will need to use the main GlusterFS repo:

# curl http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/CentOS/glusterfs-epel.repo >/etc/yum.repos.d/glusterfs-epel.repo

You may want to do your own validation on security here (GPG Keys), but for experimenting I'm taking the repo at its word. Remember this is a non-SSL site and hence there is risk of spoofing, interception/modification and other MitM type attacks.

Not much here - the server depends on everything we will need:

# yum install glusterfs-server

Next you will need some storage. In my case I've got quite a tight root but deliberately left some extra space on the VG in the kickstart, so:

# lvcreate --name gluster --size 512M vg00

Format the new LV (eg. mkfs.xfs /dev/mapper/vg00-gluster), then put an entry in /etc/fstab:

/dev/mapper/vg00-gluster /gluster-data xfs defaults 0 0

And mount it:

# mkdir /gluster-data && mount -a

For this exercise that provides us with our "Brick", but multiple "Bricks" may be used in a practical implementation. Next we need to get the servers talking to each other (however many there are):

# gluster peer probe <peer address>

The pool can be viewed with:

# gluster pool list

Additional members can be added from any host already in the pool. Another useful command for checking how everything is working is:

# gluster peer status

One vital thing to check is that your configuration allows communication between nodes - you may need to open firewall ports etc.

Basic 2-Node File Replica (RAID1-like)

Very simple - specify a directory within each Brick:

# gluster volume create gv0 replica 2 <node1 address>:/gluster-data/gv0/ <node2 address>:/gluster-data/gv0/

The gv0/ directories will be created automatically if they don't exist. Start the volume (# gluster volume start gv0) and you can then mount it with:

# mount -t glusterfs <node address>:/gv0 /somewhere/to/mount

An interesting experiment here is to mount from a node other than the current one, then shut that node down. After a delay while the mount blocks, it continues working from the remaining node. Bring the node back and any new files created in the filesystem are quickly replicated to the recovering node.

Add another replica

This will add a 3rd node / Brick in to the replica set:

# gluster volume add-brick gv0 replica 3 10.146.37.208:/gluster-data/gv0

All the existing files will remain with two replicas until the heal command is run.
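As a minimal sketch of what that looks like, triggering the heal and then monitoring it with the stock gluster CLI (volume name gv0 as above - check the gluster CLI help on your version for the exact sub-commands):

# gluster volume heal gv0 full

That walks the whole volume and syncs anything missing onto the new Brick. Progress, and any entries still waiting to be healed, can then be checked per Brick with:

# gluster volume heal gv0 info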
The same can be done by "stat" on files in the filesystem without running a full heal. This has the advantage that you can control which files are replicated and the impact of the heal operation. Any means can be used to do this, including plain old "ls" which effectively does a stat on an individual file. Removal of a brick is pretty much the same except using "remove-brick" and adding a "force" to the end. The brick can't then be added straight back as it's got contents, including the .glusterfs/ directory. The lot will need deleting before it can be added back.

Basic 4-Node File Distributed Replica (RAID10-like)

This can be done with the above 2-node scenario just by adding another 2 Bricks, and then rebalancing if needed:

# gluster volume add-brick gv0 <node3 address>:/gluster-data/gv0 <node4 address>:/gluster-data/gv0

Or, alternatively, create a new volume from the start:

# gluster volume create gv1 replica 2 <node1 address>:/gluster-data/gv1/ <node2 address>:/gluster-data/gv1/ <node3 address>:/gluster-data/gv1/ <node4 address>:/gluster-data/gv1/

What about 3 nodes, 2 replicas and similar?

Combinations like this can be done, but you still need 6 Bricks in total. Each pair becomes a 2-way replica with 3-way distribution. That means you will need to stagger your replicas to avoid both replicas of a pair landing on the same node:

# gluster volume create gv2 replica 2 <node1 address>:/gluster-data/gv2a/ <node2 address>:/gluster-data/gv2a/ <node3 address>:/gluster-data/gv2a/ <node1 address>:/gluster-data/gv2b/ <node2 address>:/gluster-data/gv2b/ <node3 address>:/gluster-data/gv2b/

Practical applications

In the real world there are a lot of applications where there is a shared data store (eg. product images/media on an e-commerce solution) and we need to be able to scale this with the size of the solution, and over time usage will grow. For auto-scaling (eg. scaling up and down for a flash sale) this is likely to be more difficult, since you would want to be able to replace or add nodes on a more ad-hoc basis, and this will either require very careful management or a filesystem that operates on a more fluid basis (eg. any combination of bricks on different nodes can be used rather than fixed pairs). In many cases this is still a good solution. If you can have a separate storage tier (or perhaps this could be on a fixed "baseline" set of web front ends) then these can be replaced and healed as they fail (see the mount sketch at the end of this article). For larger files (eg. video), the striping options are likely to be more useful, whereas with small files there could be more overhead fetching all the fragments than picking up a single file.

Always Remember...

Distributed storage (or RAID for that matter) is neither a substitute for DR (which requires completely separate "shared-nothing" infrastructure) nor for Backups. While GlusterFS certainly has the capability to be used across geographically dispersed clusters in multiple datacenters as part of a DR design, true DR goes much further and should be driven by the organisation's BCP (Business Continuity Plan).
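As a closing sketch for the web front end scenario: rather than mounting interactively as earlier, clients would normally mount the volume at boot from /etc/fstab. As I understand the native (FUSE) client, the node named in the mount is only used to fetch the volume layout - after that the client talks to all the Bricks directly - and the backupvolfile-server mount option gives it a second node to fetch that layout from if the first is down (check mount.glusterfs on your version for the exact option name). Something along these lines, reusing the placeholder names and mount point from above:

<node1 address>:/gv0 /somewhere/to/mount glusterfs defaults,_netdev,backupvolfile-server=<node2 address> 0 0

The _netdev option simply tells the init system to wait for networking before attempting the mount.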
Copyright Glen Pitt-Pladdy 2008-2023