mhddfs: join several real filesystems together to form a single larger one

May 25th, 2008 edited by Tincho

Article submitted by Roman Mamedov. Guess what? We still need you to submit good articles about software you like!

Suppose you have three hard drives, sized 80, 40 and 60 GB, and 150 GB of music files which you need to store on these drives. How would you do it?

The two solutions I knew of were:

  • either to simply have three separate «Music» folders - one per drive;
  • or create some sort of RAID, joining all the drives into an array.

However, the first method is quite tiresome, as one needs to decide how to split the data between the drives and keep track of what is stored where. For example, I might decide to store all «Classical» music on the first disk, and «Rock» music on the second. Then, suddenly, the first drive fills up and the second one still has plenty of space. Now I need to move the files between the disks, or jump around with symlinks.

The RAID method, while solving this problem, always incurs significant loss of either storage reliability or usable disk space.

But recently I found a better solution to this problem and similar ones: mhddfs. It is a FUSE filesystem module which lets you combine several smaller filesystems into one big «virtual» one, containing all the files from all its members and all of their free space. Even better, unlike other similar modules (unionfs?), this one does not limit the ability to add new files to the combined filesystem, and it intelligently manages where those files will be placed.

The package is called «mhddfs» and is currently present in Debian Testing and Unstable. It does not seem to be available in Ubuntu at the moment.

Let's say the three hard drives you have are mounted at /mnt/hdd1, /mnt/hdd2 and /mnt/hdd3. Then you might have something akin to the following:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3

After you have installed the mhddfs package using your favourite package manager, you can create a new mount point, let's call it /mnt/virtual, which will join all these drives together for you. The beauty of FUSE means you don't really have to be root for this (being a member of the fuse group is enough), but for the sake of simplicity, let's suppose we are logged in as root here.

# mkdir /mnt/virtual
# mhddfs /mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual -o allow_other
option: allow_other (1)
mhddfs: directory '/mnt/hdd1' added to list
mhddfs: directory '/mnt/hdd2' added to list
mhddfs: directory '/mnt/hdd3' added to list
mhddfs: move size limit 4294967296 bytes
mhddfs: mount point '/mnt/virtual'

The «-o allow_other» option here means that the resulting filesystem should be visible to all users, not just to the one who created it.

The result will look like this:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/sda1              80G   50G   30G  63% /mnt/hdd1
/dev/sdb1              40G   35G    5G  88% /mnt/hdd2
/dev/sdc1              60G   10G   50G  17% /mnt/hdd3
mhddfs                180G   95G   85G  53% /mnt/virtual

As you can see, the new filesystem has been created. It adds up the total size of all drives (180G), the space used by all files on them (95G) and the free space (85G). If you look at the files in /mnt/virtual, you'll notice that it has files from all three drives, with all three directory structures «overlaid» onto each other.

But what if you try to add new files somewhere inside /mnt/virtual? That is quite a tricky issue, and I must say the author of mhddfs solved it very well. When you create a new file in the virtual filesystem, mhddfs looks at the free space remaining on each of the drives. If the first drive has enough free space, the file is created on that first drive. Otherwise, if that drive is low on space (has less than the amount given by the «mlimit» option of mhddfs, which defaults to 4 GB), the second drive is used instead. If that drive is low on space too, the third drive is used. If every drive individually has less than mlimit of free space, the drive with the most free space is chosen for new files.
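The placement policy above can be sketched as a small shell function. This is only an illustration of the described behaviour, not mhddfs's actual code; the free-space figures are passed in as plain byte counts instead of being read from real drives.

```shell
#!/bin/sh
# Sketch of the write-placement policy described above. This is an
# illustration, not mhddfs's actual code; free space is passed in as
# plain byte counts instead of being read from the real drives.
mlimit=$((4 * 1024 * 1024 * 1024))   # default limit: 4 GB

# pick_drive FREE1 FREE2 ... -> prints the 1-based index of the drive
# a new file would land on
pick_drive() {
    i=0
    best=0
    best_free=-1
    for free in "$@"; do
        i=$((i + 1))
        # the first drive with at least mlimit free space wins
        if [ "$free" -ge "$mlimit" ]; then
            echo "$i"
            return 0
        fi
        # otherwise remember the drive with the most free space
        if [ "$free" -gt "$best_free" ]; then
            best=$i
            best_free=$free
        fi
    done
    echo "$best"
}

# hdd1 has 1 GB free, hdd2 has 5 GB: a new file goes to drive 2
pick_drive 1073741824 5368709120 999999999   # → 2
```

With three nearly-full drives (none above the limit), the function falls back to the drive with the most free space, mirroring the last rule in the paragraph above.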

It goes even further than that: if a drive runs out of free space in the middle of a write (suppose you tried to create a very large file on it), the write will not fail; mhddfs will simply transfer the already written data to another drive (one with more space available) and continue the write there. All this happens completely transparently to the application writing the file (it will not even notice that anything happened).

Now you can simply work with files in /mnt/virtual, not caring about what is being read from which disk, etc. Also, the convenience of having large «contiguous» free space means you can simply drop any new files into that folder and (as long as there's space on at least one member of the virtual FS) not care about which file gets stored where.

If you want that mount point to be created automatically on each boot, you can add the following line to /etc/fstab:

mhddfs#/mnt/hdd1,/mnt/hdd2,/mnt/hdd3 /mnt/virtual fuse defaults,allow_other 0 0

For more details, see man mhddfs.

Last but not least, it's very simple to stop using mhddfs if you later decide to do so - without losing any file data or directory structure. Let's say, at some point, you purchase a new 500 GB hard disk and want to sell the smaller disks on eBay. You can just plug in the new drive, copy everything from /mnt/virtual onto it, then unmount the mhddfs mount point (fusermount -u /mnt/virtual) and disconnect the old drives. All your folders, which were previously merged in a «virtual» way by mhddfs, will now be merged in reality, on the new disk. And since the files themselves are never split into pieces stored on different drives, even in the unlikely event that mhddfs stops working for you (or disappears from existence), you can still copy all your data from all three drives into one single folder and have the same structure you previously had under /mnt/virtual.
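The escape path can even be tried without mhddfs at all. The mock below uses throwaway temp directories in place of real drives (all paths and file names are illustrative) to show that a plain copy of each member tree reproduces the merged layout:

```shell
#!/bin/sh
# Mock of the "stop using mhddfs" migration: hdd1 and hdd2 stand in
# for the old drives, "new" for the replacement disk. All paths and
# file names here are illustrative.
set -e
base=$(mktemp -d)
mkdir -p "$base/hdd1/Music/Classical" "$base/hdd2/Music/Rock"
touch "$base/hdd1/Music/Classical/bach.ogg" "$base/hdd2/Music/Rock/acdc.ogg"

mkdir "$base/new"
# Copy each member filesystem onto the new disk in turn; identically
# named directories merge, because every file is whole on one drive.
for d in "$base/hdd1" "$base/hdd2"; do
    cp -a "$d/." "$base/new/"
done

ls "$base/new/Music"    # Classical and Rock, side by side
```

The same loop, run over the real /mnt/hdd* mount points with the new disk as the target, is the whole migration.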

Posted in Debian, Ubuntu

30 Responses

  1. damanp Says:

    Many thanks, didn’t know such tool existed. Very handy.

  2. Yaroslav Halchenko Says:

well - LVM is what comes to mind first for me (not RAID per se) when you are trying to manage multiple partitions (or disks) as one entity: you just need to add them into the same volume group… but that is a story of its own, and the module presented here is valuable in its own right, thanks

  3. Shanness Says:

    Thanks, very interesting tool! But I must agree, anyone considering this should consider LVM first. I’ve been using LVM to grow multiple drives, and move “extents” between new drives for a few years, and it works beautifully well.

  4. omry Says:

    Cool, although it does not seem to support NFS.
    when I try to use it with nfs, I get:
    Warning: /storage does not support NFS export.
    when I restart nfs kernel daemon.

  5. Mathias Brodala Says:

    What happens if several mountpoints are merged which contain one or more equally named directories? Does the later specified mountpoint cover the former?

  6. rm Says:

    Mathias,
    > What happens if several mountpoints are merged which contain one or more equally named directories? Does the later specified mountpoint cover the former?

    The contents will be transparently merged together, e.g. if you have
    /mnt/hdd1/Music/file_one.ogg
    and
    /mnt/hdd2/Music/file_two.ogg
    then the merged mountpoint will have /mnt/virtual/Music/, inside which you will find both “file_one.ogg” and “file_two.ogg”.

    This also works when sub-folders match, and even when sub-sub-folders (and so on) match too.

  7. Mathias Brodala Says:

    Thanks. I should have asked this earlier too: and what about files?

  8. UNera Says:

    >Thanks. I should have asked this earlier too: and what about files?

    see README

    http://svn.uvw.ru/mhddfs/trunk/README

    quote:

    Working
    ~~~~~~~

    Consider we have two hard drives with the content below:

    /hdd1               /hdd2
    |                   |
    +-- /dir1           +-- /dir1
    |   |               |   |
    |   +- file2        |   +- file4
    |                   |   +- file2
    +-- file1           |
    |                   +-- file5
    +-- /dir2           |
        |               +-- /dir3
        +- file3            |
                            +- file6

    mounting this tree with the command:

    mhddfs /hdd1,/hdd2 /hdd_common

    into the specified file system point we will see a combined tree.

    In the united tree we can see all the directories and files. Note
    file2 of 2nd hdd is not visible (because 1st hdd has the file2
    already).

  9. Mathias Brodala Says:

    Ah, I see. Thanks.

  10. Doug Says:

    There is a Ubuntu deb package that can be found at http://security.ubuntu.com/ubuntu/pool/universe/m/mhddfs/

    I am successfully using the “mhddfs_0.1.10-1_i386.deb” package and mhddfs on Ubuntu 8.04 Hardy as we speak.

    Thanks for the write up.

  11. Andrés Suárez Says:

    Even better, well, for me: you could use glusterfs, which lets you do the same; it's FUSE-based and uses a client-server model, so it works to aggregate storage from multiple servers too.

    As I see this, it looks like userspace LVM, am I right?

  12. UNera Says:

    >As i see this, looks like userspace lvm, am i right?

    No.
    This is high-level driver

    LVM - low-level

    see README

  13. rdskaroff Says:

    great!

  14. weakish Says:

    @Andrés Suárez

    glusterfs is a bit too complicated to install and use compared to mhddfs. Since I only have one machine, I think mhddfs is sufficient.

  15. dirk Says:

    nice tool. normally i would suggest LVM for this kind of thing, but FUSE doesn’t require root and in fact you could still ‘fuse’ several filesystem even if they are on LVM using this tool.

    i think zfs has similar functionality.

  16. XX Says:

    LVM is _nothing_ like this. It joins block devices into volume groups and allows creating arbitrary block devices (and cow) on those[0]. Mhddfs joins filesystem trees.
    As the fellow said, entirely different levels!

    [0] Unlike some useless glorified RAID fake volume managers

  17. Gijs Nelissen Says:

    I just tested in in ubuntu hardy heron 8.04, using the package Doug suggested (thanks for that).

    /dev/sda3 129G 198M 123G 1% /home
    /dev/sdb1 316G 13G 287G 5% /media/media1
    /dev/sdc1 314G 136G 163G 46% /media/media2
    /dev/sdd1 197G 173G 15G 93% /media/media3
    /dev/sdf1 316G 201M 299G 1% /media/media4
    /dev/sdg1 197G 178G 9.3G 96% /media/media5
    /dev/sdh1 247G 224G 11G 96% /media/media6
    /dev/sdi1 247G 224G 11G 96% /media/media7
    /dev/sdh1 197G 178G 9.3G 96% /media/media5
    /dev/sdi1 247G 224G 11G 96% /media/media6
    /dev/sdg1 316G 51G 249G 17% /media/backup
    /media/media1;/media/media3;/media/media4;/media/media5;/media/media6;/media/media7
    1.6T 810G 630G 57% /media/storage

  18. certosin0 Says:

    just to inform, it works perfectly under ubu server 8.04:

    Filesystem Size Used Avail Use% Mounted on
    /dev/sdg1 4,5G 1,0G 3,3G 24% /
    varrun 131M 185k 131M 1% /var/run
    varlock 131M 0 131M 0% /var/lock
    udev 131M 99k 131M 1% /dev
    devshm 131M 0 131M 0% /dev/shm
    /dev/sdg3 14G 183M 14G 2% /home
    /dev/sdc1 497G 422G 51G 90% /home/hdd01
    /dev/sdd1 497G 4,1G 468G 1% /home/hdd02
    /dev/sde1 497G 16G 456G 4% /home/hdd03
    /dev/sdf1 497G 4,4G 467G 1% /home/hdd04
    /dev/sda1 497G 208M 471G 1% /home/hdd05
    /dev/sdb1 497G 208M 471G 1% /home/hdd06
    /home/hdd01;/home/hdd02;/home/hdd03;/home/hdd04;/home/hdd05;/home/hdd06
    3,0T 446G 2,4T 16% /home/virtual
    /dev/sdh1 321G 128G 193G 40% /media/usb

  19. ravenpl Says:

    > Cool, although it does not seem to support NFS.
    Have You tried /etc/exports like
    /mnt/warsaw_logs 10.0.0.0/24(rw,sync,no_root_squash,fsid=1)
    #note the fsid option.

    Another Q: if used with root privileges, do ownership/permissions work as expected? What about directory permissions/ownership across physical volumes?

  20. Matt Simmons Says:

    I’ll second (or maybe 4th, by this point) the LVM solution in this case. It does need to be done as root, but there aren’t too many occasions when users have to deal with raw filesystems on a machine that they don’t have root on.

    I wrote an introduction to LVM not too long ago if you’re curious how it works.

    This is an interesting user-land program though!

  21. Colin Says:

    I swear I just asked a similar question last week on the ubuntu forums;

    I ended up using dnotify and symlinks to accomplish a similar process with two folders on different filesystems; but I like what I hear for this…

  22. Ajay Says:

    Great intro, Thanks

    I can see this being used instead of LVM in a lot of places.

    Couple of questions.

    Anybody run benchmarks yet to check the performance penalty?

    Is the order of disks for writing determined by their physical order or by the order specified in the “mounting” command?

  23. E0x Says:

    any tips or ideas on how to make gnome/nautilus show the new virtual filesystem as a new Disk?

  24. E0x Says:

    never mind, if you mount it in /media/ the new mount point will appear as a new Disk in gnome

  25. budiw Says:

    This is what i’m looking for..
    Thankyou.

    –budiw

  26. udienz Says:

    thanks, this package is very useful. I use this tool for managing a mirror and it works perfectly!

  27. Thomas Says:

    If I have three disks and one fails, will I lose all data? I had an issue with LVM, where I simulated a disk failure and could not access any of the data that was stored on the disks after the simulation. I am now running RAID. I have not tried to do RAID with LVM or MHDDFS (should I want to venture this way). Please post or notify me if anyone has had the experience of disk failure using LVM or MHDDFS. I am curious as to the outcome. Thanks!

  28. rm Says:

    to Thomas:
    > I had an issue with LVM, where I simulated a disk failure and could not access any of the data that was stored on the disks after the simulation.

    This is true, and this is the mentioned “loss of reliability” which you get with RAID or LVM if you choose one of the configurations without redundancy.

    With LVM joining your disks into one (with one partition spanning all three), it is very likely that no data will be accessible after a single disk failure. That's because the filesystem structures (like the files themselves) are split into bits and pieces across all the disks.

    Not so with mhddfs. If one disk fails, you only lose the files which happen to be stored on that disk. And all other files, which were residing on other disks, are left completely unharmed. This is why I consider mhddfs to be a better solution than LVM or RAID (JBOD).

    However, if you care about the data, you should always have a backup copy of it. And even if you use RAID1 (mirror) you still should do backups, because RAID does not protect you from accidental (or malicious) deletions, filesystem bugs, etc.

  29. Johan Says:

    It would be really cool if you could specify somehow (regexes, mount options) that some files should be replicated onto more than one underlying filesystem.

    Any writes would need to hit all underlying copies as well, but not necessarily synchronously.

    That would get you cheap, somewhat reliable file-by-file replication.

  30. Tomas Pospisek Says:

    /etc/cron.daily/locate will make my system unresponsive when traversing the mhddfs mount. In other words: mhddfs has a quite severe performance problem.