mhddfs: join several real filesystems together to form a single larger one
May 25th, 2008, edited by Tincho. Article submitted by Roman Mamedov. Guess what? We still need you to submit good articles about software you like!
Suppose you have three hard drives sized 80, 40 and 60 GB, and 150 GB of music files which you need to store on these drives. How would you do it?
The two solutions I knew of were:
- either simply have three separate «Music» folders, one per drive;
- or create some sort of RAID, joining all the drives into an array.
However, the first method is quite tiresome, as one needs to decide how to split the data between the drives and keep track of what is stored where. For example, I might decide to store all «Classical» music on the first disk, and «Rock» music on the second. Then, suddenly, the first drive fills up and the second one still has plenty of space. Now I need to move the files between the disks, or jump around with symlinks.
The RAID method, while solving this problem, always incurs significant loss of either storage reliability or usable disk space.
But recently, I found a better solution to this problem and similar ones: mhddfs. It is a FUSE filesystem module which allows you to combine several smaller filesystems into one big «virtual» one, which will contain all the files from all its members, and all their free space. Even better, unlike other similar modules (unionfs?), this one does not limit the ability to add new files on the combined filesystem and intelligently manages where those files will be placed.
The package is called «mhddfs» and is currently present in Debian Testing and Unstable. It does not seem to be available in Ubuntu at the moment.
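On Debian Testing/Unstable, installing it should be as simple as the usual apt call; a quick sketch:
# apt-get install mhddfs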
Let's say the three hard drives you have are mounted at /mnt/hdd1, /mnt/hdd2 and /mnt/hdd3. Then, you might have something akin to the following:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3
After you have installed the mhddfs package using your favourite package manager, you can create a new mount point, let's call it /mnt/virtual, which will join all these drives together for you. The beauty of FUSE means you don't really have to be root for this (you can be just a member of the fuse group), but for the sake of the examples' simplicity, let's suppose we are logged in as root here.
# mkdir /mnt/virtual
# mhddfs /mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual -o allow_other
option: allow_other (1)
mhddfs: directory '/mnt/hdd1' added to list
mhddfs: directory '/mnt/hdd2' added to list
mhddfs: directory '/mnt/hdd3' added to list
mhddfs: move size limit 4294967296 bytes
mhddfs: mount point '/mnt/virtual'
The «-o allow_other» option here means that the resulting filesystem should be visible to all users, not just to the one who created it.
The result will look like this:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3
mhddfs                180G   95G   85G  53% /mnt/virtual
As you can see, the new filesystem has been created. It joined the total size of all drives together (180G), added together the space used by all files there (95G) and summed up the free space (85G). If you look at the files in /mnt/virtual, you'll notice that it has files from all three drives, with all three directory structures «overlayed» onto each other.
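By the way, you really don't have to be root for any of this; a minimal sketch of a per-user mount, assuming your user is a member of the fuse group (the allow_other option is dropped here, because non-root users can only pass it if /etc/fuse.conf contains user_allow_other):
$ mkdir ~/virtual
$ mhddfs /mnt/hdd1,/mnt/hdd2,/mnt/hdd3 ~/virtual
When you are done with it, unmount with «fusermount -u ~/virtual».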
But what if you try to add new files somewhere inside that /mnt/virtual? Well, that is quite a tricky issue, and I must say the author of mhddfs solved it very well. When you create a new file in the virtual filesystem, mhddfs will look at the free space remaining on each of the drives. If the first drive has enough free space, the file will be created on that first drive. Otherwise, if that drive is low on space (has less than specified by the «mlimit» option of mhddfs, which defaults to 4 GB), the second drive will be used instead. If that drive is low on space too, the third drive will be used. If each drive individually has less than mlimit of free space, the drive with the most free space will be chosen for new files.
It's even more than that; if a certain drive runs out of free space in the middle of a write (suppose you tried to create a very large file on it), the write process will not fail; mhddfs will simply transfer the already written data to another drive (which has more space available) and continue the write there. All this happens completely transparently to the application which writes the file (it will not even know that anything happened).
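If the 4 GB default does not suit your drives, that threshold can be adjusted at mount time via the mlimit option; a minimal sketch (the 10G value here is just an arbitrary example, see man mhddfs for the exact syntax):
# mhddfs /mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual -o mlimit=10G,allow_other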
Now you can simply work with files in /mnt/virtual, not caring about what is being read from which disk, etc. Also, the convenience of having large «contiguous» free space means you can simply drop any new files into that folder and (as long as there's space on at least one member of the virtual FS) not care about which file gets stored where.
If you want that mount point to be created automatically for you on each boot, you can add the following line to /etc/fstab:
mhddfs#/mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual fuse defaults,allow_other 0 0
For more details, see man mhddfs.
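After adding that line, you don't have to reboot to try it out; something like this should do the trick (a quick sketch):
# mount /mnt/virtual
# df -h /mnt/virtual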
The last, but not the least important thing to mention is the fact that it's very simple to stop using mhddfs if you later decide to do so, and not lose any file data or directory structure. Let's say, at some point in time, you purchase a new 500 GB hard disk, and want to sell the smaller disks on eBay. You can just plug in the new drive, copy everything from /mnt/virtual onto it, and then remove the mhddfs mount point and disconnect the old drives. All your folders, which were previously merged in a «virtual» way by mhddfs, will now be merged in reality, on the new disk. And thanks to the fact that files themselves are not split into bits which are stored on different drives, even in the unlikely event that mhddfs suddenly no longer works for you (or disappears from existence), you can still copy all your data from all three drives into one single folder, and have the same structure you previously had in that /mnt/virtual mount point.
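For example, the migration described above could look roughly like this; a sketch, assuming the new disk is already formatted and mounted at /mnt/newdisk (a hypothetical path):
# cp -a /mnt/virtual/. /mnt/newdisk/
# umount /mnt/virtual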
May 25th, 2008 at 6:18 am
Many thanks, didn’t know such tool existed. Very handy.
May 25th, 2008 at 6:46 am
well - LVM is what comes to mind first for me (not RAID per se) when you are trying to manage multiple partitions (or disks) as one entity — you just need to add them into the same volume group... but that is a story of its own, and the module presented here is valuable on its own, thanks
May 25th, 2008 at 8:16 am
Thanks, very interesting tool! But I must agree, anyone considering this should consider LVM first. I’ve been using LVM to grow multiple drives, and move “extents” between new drives for a few years, and it works beautifully well.
May 25th, 2008 at 10:26 am
Cool, although it does not seem to support NFS.
when I try to use it with nfs, I get:
Warning: /storage does not support NFS export.
when I restart nfs kernel daemon.
May 25th, 2008 at 11:17 am
What happens if several mountpoints are merged which contain one or more equally named directories? Does the later specified mountpoint cover the former?
May 25th, 2008 at 11:56 am
Mathias,
> What happens if several mountpoints are merged which contain one or more equally named directories? Does the later specified mountpoint cover the former?
The contents will be transparently merged together, e.g. if you have
/mnt/hdd1/Music/file_one.ogg
and
/mnt/hdd2/Music/file_two.ogg
then the merged mountpoint will have /mnt/virtual/Music/, inside which you will find both “file_one.ogg” and “file_two.ogg”.
This also works when sub-folders match, and even when sub-sub-folders (etc, etc) match too.
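A quick way to see this for yourself, assuming the two drives are mounted as in the article (a sketch, the listing is just illustrative):
$ ls /mnt/hdd1/Music /mnt/hdd2/Music
/mnt/hdd1/Music:
file_one.ogg
/mnt/hdd2/Music:
file_two.ogg
$ ls /mnt/virtual/Music
file_one.ogg  file_two.ogg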
May 25th, 2008 at 1:23 pm
Thanks. I should have asked this earlier too: and what about files?
May 25th, 2008 at 1:30 pm
>Thanks. I should have asked this earlier too: and what about files?
see README
http://svn.uvw.ru/mhddfs/trunk/README
quote:
Working
~~~~~~~
Consider we have two hard drives with the content below:
/hdd1             /hdd2
 |                 |
 +-- /dir1         +-- /dir1
 |    |            |    |
 |    +- file2     |    +- file4
 |                 |    +- file2
 +-- file1         |
 |                 +-- file5
 +-- /dir2         |
 |                 +-- /dir3
 +- file3               |
                        +- file6
mounting this tree with the command:
mhddfs /hdd1,/hdd2 /hdd_common
into the specified file system point we will see a combined tree.
In the united tree we can see all the directories and files. Note
file2 of 2nd hdd is not visible (because 1st hdd has the file2
already).
May 25th, 2008 at 1:58 pm
Ah, I see. Thanks.
May 25th, 2008 at 4:14 pm
There is a Ubuntu deb package that can be found at http://security.ubuntu.com/ubuntu/pool/universe/m/mhddfs/
I am successfully using the “mhddfs_0.1.10-1_i386.deb” package and mhddfs on Ubuntu 8.04 Hardy as we speak.
Thanks for the write up.
May 25th, 2008 at 4:15 pm
Even better, well for me, you could use glusterfs; it allows you to do the same, it's FUSE based and it's a client-server model, so it works to aggregate storage from multiple servers too.
As i see this, looks like userspace lvm, am i right?
May 25th, 2008 at 5:48 pm
>As i see this, looks like userspace lvm, am i right?
No.
This is a high-level driver.
LVM is low-level.
See the README.
May 26th, 2008 at 1:37 pm
great!
June 4th, 2008 at 8:09 am
@Andrés Suárez
glusterfs is a bit too complicated to install and use compared to mhddfs. Since I only have one machine, I think mhddfs is sufficient.
June 13th, 2008 at 6:14 pm
nice tool. normally i would suggest LVM for this kind of thing, but FUSE doesn’t require root and in fact you could still ‘fuse’ several filesystems even if they are on LVM using this tool.
i think zfs has similar functionality.
June 24th, 2008 at 7:44 am
LVM is _nothing_ like this. It joins block devices into volume groups and allows creating arbitrary block devices (and cow) on those[0]. Mhddfs joins filesystem trees.
As the fellow said, entirely different levels!
[0] Unlike some useless glorified RAID fake volume managers
July 16th, 2008 at 2:43 pm
I just tested it in Ubuntu Hardy Heron 8.04, using the package Doug suggested (thanks for that).
/dev/sda3 129G 198M 123G 1% /home
/dev/sdb1 316G 13G 287G 5% /media/media1
/dev/sdc1 314G 136G 163G 46% /media/media2
/dev/sdd1 197G 173G 15G 93% /media/media3
/dev/sdf1 316G 201M 299G 1% /media/media4
/dev/sdg1 197G 178G 9.3G 96% /media/media5
/dev/sdh1 247G 224G 11G 96% /media/media6
/dev/sdi1 247G 224G 11G 96% /media/media7
/dev/sdh1 197G 178G 9.3G 96% /media/media5
/dev/sdi1 247G 224G 11G 96% /media/media6
/dev/sdg1 316G 51G 249G 17% /media/backup
/media/media1;/media/media3;/media/media4;/media/media5;/media/media6;/media/media7
1.6T 810G 630G 57% /media/storage
August 16th, 2008 at 12:43 pm
just to inform, it works perfectly under Ubuntu Server 8.04:
Filesystem Size Used Avail Use% Mounted on
/dev/sdg1 4,5G 1,0G 3,3G 24% /
varrun 131M 185k 131M 1% /var/run
varlock 131M 0 131M 0% /var/lock
udev 131M 99k 131M 1% /dev
devshm 131M 0 131M 0% /dev/shm
/dev/sdg3 14G 183M 14G 2% /home
/dev/sdc1 497G 422G 51G 90% /home/hdd01
/dev/sdd1 497G 4,1G 468G 1% /home/hdd02
/dev/sde1 497G 16G 456G 4% /home/hdd03
/dev/sdf1 497G 4,4G 467G 1% /home/hdd04
/dev/sda1 497G 208M 471G 1% /home/hdd05
/dev/sdb1 497G 208M 471G 1% /home/hdd06
/home/hdd01;/home/hdd02;/home/hdd03;/home/hdd04;/home/hdd05;/home/hdd06
3,0T 446G 2,4T 16% /home/virtual
/dev/sdh1 321G 128G 193G 40% /media/usb
September 13th, 2008 at 8:01 pm
> Cool, although it does not seem to support NFS.
Have you tried /etc/exports like
/mnt/warsaw_logs 10.0.0.0/24(rw,sync,no_root_squash,fsid=1)
#note the fsid option.
Another Q: if used with root privileges, do ownership/permissions work as expected? What about directory permissions/ownership across physical volumes?
October 13th, 2008 at 6:11 pm
I’ll second (or maybe 4th, by this point) the LVM solution in this case. It does need to be done as root, but there aren’t too many occasions when users have to deal with raw filesystems on a machine that they don’t have root on.
I wrote an introduction to LVM not too long ago if you’re curious how it works.
This is an interesting user-land program though!
October 14th, 2008 at 3:18 am
I swear I just asked a similar question last week on the ubuntu forums;
I ended up using dnotify and symlinks to accomplish a similar process with two folders on different filesystems; but I like what I hear about this…
October 14th, 2008 at 3:44 pm
Great intro, Thanks
I can see this being used instead of LVM in a lot of places.
Couple of questions.
Anybody run benchmarks yet to check the performance penalty?
Is the order of disks for writing determined by their physical order or by the order specified in the mount command?
October 21st, 2008 at 11:46 pm
any tips or ideas on how to make GNOME/Nautilus show the new virtual filesystem as a new disk?
October 22nd, 2008 at 12:07 pm
never mind, if you mount it in /media/ the new mount point will appear as a new disk in GNOME
December 31st, 2008 at 8:31 am
This is what I’m looking for.
Thank you.
–budiw
January 7th, 2009 at 4:40 pm
thanks, this package is very useful. I use this tool for managing a mirror and it works perfectly!
February 1st, 2009 at 8:16 am
If I have three disks and one fails, will I lose all data? I had an issue with LVM, where I simulated a disk failure and could not access any of the data that was stored on the disks after the simulation. I am now running RAID. I have not tried to do RAID with LVM or MHDDFS (should I want to venture this way). Please post or notify me if anyone has had the experience of disk failure using LVM or MHDDFS. I am curious as to the outcome. Thanks!
February 23rd, 2009 at 4:35 pm
to Thomas:
> I had an issue with LVM, where I simulated a disk failure and could not access any of the data that was stored on the disks after the simulation.
This is true, and this is the mentioned “loss of reliability” which you have with RAID or LVM, if you choose one of the configurations which doesn’t have redundancy.
With LVM joining your disks into one (and with one partition for all three) it is very likely that no data will be accessible after one single disk failure. That’s because the filesystem structures (like the files themselves) are split in bits and pieces between all disks.
Not so with mhddfs. If one disk fails, you only lose the files which happen to be stored on that disk. And all other files, which were residing on other disks, are left completely unharmed. This is why I consider mhddfs to be a better solution than LVM or RAID (JBOD).
However, if you care about the data, you should always have a backup copy of it. And even if you use RAID1 (mirror) you still should do backups, because RAID does not protect you from accidental (or malicious) deletions, filesystem bugs, etc.
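For instance, if /mnt/hdd2 from the article’s example were the disk that failed, you could simply remount the virtual filesystem without it and keep working with whatever lives on the surviving disks; a sketch:
# umount /mnt/virtual
# mhddfs /mnt/hdd1,/mnt/hdd3 /mnt/virtual -o allow_other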
March 25th, 2009 at 7:43 pm
It would be really cool if you could specify somehow (regexes, mount options) that some files should be replicated onto more than one underlying filesystem.
Any writes would need to hit all underlying copies as well, but not necessarily synchronously.
That would get you cheap, somewhat reliable file-by-file replication.
April 6th, 2009 at 12:40 pm
/etc/cron.daily/locate will make my system unresponsive when traversing the mhddfs mount. In other words: mhddfs has quite a severe performance problem.