Types of Linux File Systems

Ext2 and Ext3

The oldest file system in the Linux kernel, Ext2, was released in early 1993 as an improved replacement for the Minix file system. Ext2 is a simple table-based file system, unburdened even by journaling, which gives it exceptionally high performance, close to the raw speed of the underlying storage device.

Introduced in 2001, Ext3 is an evolution of Ext2, incorporating the ability to use a B-tree for the file map and a journaling system.

At present, both Ext2 and Ext3 are hopelessly outdated, technically and conceptually. For example, with the standard 4 KB block size, Ext2 and Ext3 limit a single file to 2 TB, and the file system itself to 16 TB. File timestamps have a precision of only 1 second, which is very coarse by modern standards.
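
These limits follow from the on-disk format: the per-file block count is a 32-bit number of 512-byte sectors, and block numbers themselves are 32-bit values addressing 4 KB blocks. A quick sanity check of the arithmetic (a sketch, assuming bash with 64-bit integer arithmetic):

```shell
# Max file size: 2^32 sectors of 512 bytes each
echo $(( (2 ** 32) * 512 ))    # 2199023255552 bytes = 2 TiB
# Max file system size: 2^32 blocks of 4 KiB each
echo $(( (2 ** 32) * 4096 ))   # 17592186044416 bytes = 16 TiB
```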

Characteristics of Use:

  • Old systems that are difficult, pointless, or impossible to migrate to modern file systems.
  • File system for the boot partition in cases where the root partition uses an exotic file system not supported by the bootloader or kernel.
  • Ext2 is sometimes used for flash drives due to its lack of journaling, which reduces write intensity and consequently minimizes flash memory cell wear. However, there are more suitable modern file systems for flash drives and SSDs.
  • Ext2 is also applied in situations where maximum speed is required without the need for data integrity: caches, temporary files, etc. (although it is more reasonable to use Ext4 with journaling disabled, as Google does on its servers).

Creating an Ext2 file system with 10 million inodes and the label "new_ext2fs" on the partition /dev/sda1:

mkfs.ext2 -L new_ext2fs -N 10000000 /dev/sda1

Creating Ext3 on partition /dev/sda3, attached to an external journal device labeled "ext3journal" (the journal device itself must first be prepared with mkfs.ext3 -O journal_dev):

mkfs.ext3 -J device=LABEL=ext3journal /dev/sda3

Mounting Ext3 on partition /dev/sdc2 to the directory /var in full journaling mode:

mount -t ext3 /dev/sdc2 /var -o data=journal
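
To make such a mount persistent across reboots, the same options go into /etc/fstab (a fragment mirroring the command above):

```
# /etc/fstab — mount /dev/sdc2 on /var as Ext3 with full data journaling
/dev/sdc2  /var  ext3  data=journal  0  2
```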

Ext4

The next, fourth evolutionary step in the Ext family took place between 2006 and 2008. Today Ext4 is one of the most popular file systems in the Linux ecosystem. Its differences from Ext3 are significant:

  • Object size limits have increased significantly, and the limit on the number of subdirectories has been completely lifted.
  • The inode size has doubled, enabling 1-nanosecond timestamp precision and storage of extended file attributes directly inside the inode.
  • Introduction of extents—allocating contiguous disk areas for large files, significantly improving their speed and reducing file system fragmentation.
  • Delayed block allocation, which improves file layout on disk and lets the file system optimize its write request queue.
  • Online defragmentation.
  • Journal changes are logged with checksums.
  • Write barriers—ordering guarantees for writes that keep the journal consistent with the data it describes, preserving file system integrity at any moment of a crash.
  • Transparent encryption has been available since kernel version 4.1.
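
The nanosecond timestamps are easy to observe with GNU stat on an Ext4 mount (on file systems without nanosecond support the fractional part is simply zero):

```shell
# Create a file and print its modification time with full precision
f=$(mktemp)
stat -c '%y' "$f"    # e.g. 2024-05-01 10:20:30.123456789 +0000
rm -f "$f"
```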

Ext4 has already proven itself as an excellent example of a general-purpose traditional file system. It is known for high performance and reasonably good reliability. Therefore, the kernel has long provided the option to emulate Ext2 and Ext3 through the Ext4 driver—meaning it behaves like Ext4 but with the features not supported in Ext2 and Ext3 disabled. This mode is activated by enabling the kernel option

CONFIG_EXT4_USE_FOR_EXT23

and disabling the options

CONFIG_EXT2_FS
CONFIG_EXT3_FS
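
Which file system drivers the running kernel actually registers can be checked at runtime; exactly what appears here depends on how the kernel was configured:

```shell
# File systems the kernel can mount; with CONFIG_EXT4_USE_FOR_EXT23 enabled,
# ext2 and ext3 may still be listed but are handled by the ext4 driver.
cat /proc/filesystems
```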

However, Ext4, as a typical representative of the Ext family, is not without its drawbacks:

  • Static inode allocation — the limit on the number of files is rigidly set during the file system creation, and changing it afterward without reformatting is impossible.
  • As a consequence, Ext4 is characterized by high overhead space consumption on the storage device, which can reach tens of gigabytes with a partition size of 1 terabyte or more.
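
The scale of this overhead is easy to estimate. Assuming the usual mke2fs defaults of one inode per 16 KiB of space and a 256-byte inode (the actual values come from /etc/mke2fs.conf), a 1 TiB partition spends about 16 GiB on inode tables alone:

```shell
partition=$(( 2 ** 40 ))                 # 1 TiB
inodes=$(( partition / (16 * 1024) ))    # one inode per 16 KiB
echo $(( inodes * 256 ))                 # 17179869184 bytes = 16 GiB
```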

Ext4 is widely used both as the root file system and as a file system for storage, on both HDDs and SSDs (in the latter case, it is advisable to enable the discard mount option). Generally, if there are no specific requirements for the file system, Ext4 can be used without hesitation.
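
For an SSD, the discard option can be set persistently in /etc/fstab; the device and mount point below are hypothetical:

```
# /etc/fstab — Ext4 on an SSD with online TRIM enabled
/dev/sdb5  /data  ext4  defaults,discard  0  2
```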

Creating Ext4 without journaling and with the label "squid_cache" on partition /dev/sdb8:

mkfs.ext4 -O ^has_journal -L squid_cache /dev/sdb8

Converting an unmounted partition /dev/sdb1 from Ext3 to Ext4:

tune2fs -O dir_index,extents,uninit_bg /dev/sdb1
e2fsck -pDf /dev/sdb1
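
One step the sequence above leaves implicit: after the conversion, the file system type in /etc/fstab must be changed to ext4, or the partition will still be mounted as ext3 on the next boot (the mount point below is hypothetical):

```
# /etc/fstab — update the type from ext3 to ext4 after conversion
/dev/sdb1  /srv  ext4  defaults  0  2
```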

Checking the fragmentation level of partition /dev/sda2:

e4defrag -c /dev/sda2

Reducing the reserved space for system needs on partition /dev/sdb7 to 1% (the default is 5%, which is too much; for non-system partitions, you can even set it to 0%):

tune2fs -m 1 /dev/sdb7

Btrfs

In 2007, Oracle introduced a new file system that competes in functionality with ZFS but overcomes some of its drawbacks. Alongside features like B-trees, extents, online defragmentation, and other capabilities of modern file systems, Btrfs possesses unique properties:

  • Copy-on-Write (CoW) concept—new or modified data is not overwritten on old data but is written to a new location. This ensures the preservation of at least the previous version of the data. This mechanism also facilitates the easy and rapid creation of file system snapshots. Theoretically, it also contributes to a more even wear of flash memory cells.
  • Management of physical devices and logical volumes, similar to LVM, including support for RAID 0, 1, 5, 6, 10.
  • Checksums for data and metadata.
  • Online data scrubbing—checking for and correcting errors using redundant copies of the data where they exist (e.g., on mirrored RAID profiles).
  • Transparent compression using zlib and lzo, including for individual subvolumes and files.
  • Hierarchical quotas for subvolumes.
  • Deduplication and automatic defragmentation.
  • Sending and receiving subvolume contents as a binary data stream to the console, file, or remote node.
  • Conversion from Ext3 or Ext4 (in this case, the content will be transformed into a Btrfs snapshot) with the option to roll back.

Btrfs is actively developed and continuously enriched with capabilities—planned features include support for swap subvolumes, transparent encryption, live file system checking, and more.

Among the strengths of Btrfs, the following can be highlighted:

  • High speed, comparable to Ext4 or XFS, especially when using compression.
  • The ability to disable CoW with the mount option "nodatacow," in which case Btrfs operates as a traditional file system.
  • Elimination of journaling in favor of CoW has increased the performance and reliability of the file system.
  • Automatic detection of SSDs and full support for their operation (mount options ssd and discard).
  • Built-in RAID handling outperforms mdadm in terms of performance.
  • Automatic transparent defragmentation.
  • Packing files of a specified size (default up to 4 kilobytes) into the metadata tree to reduce fragmentation and improve performance.
  • Transparent resizing of file system sizes, as well as subvolumes (subvolumes are no different from regular directories in this regard).

Weaknesses:

  • The implementation of the B-tree is far from ideal, leading to an increase in its size and subsequent performance degradation. This negative effect can be mitigated by reducing the size of files that will be packed into the tree (mount option max_inline).
  • A large number of snapshots (on the order of several thousand) leads to a catastrophic drop in Btrfs performance.
  • The CoW mechanism increases fragmentation, resulting in a performance drop on HDDs (insignificant for SSDs). This can be addressed through auto defragmentation or by disabling CoW (mount options autodefrag and nodatacow, respectively).
  • Some features of Btrfs are still in experimental mode, mainly related to the RAID subsystem.
  • Nevertheless, vendors such as SUSE and Oracle consider Btrfs ready for production use.
  • The Btrfs code is complex, making its development and support challenging.

The application scope of Btrfs is extensive, as it is a general-purpose file system, albeit complex in design. It is known to be used on heavily loaded servers (Facebook) and smartphones (Jolla). Almost all major distributions offer Btrfs as an option for the root file system. Using Btrfs makes sense for its unique capabilities—snapshots, compression, RAID support, and flash memory.

Creating a file system with the label "new_raid" in RAID 0 mode on four drives:

mkfs.btrfs -d raid0 -L new_raid /dev/sdb /dev/sdc /dev/sdd /dev/sde

Creating a read-only snapshot of the subvolume /usr:

btrfs subvolume snapshot -r /usr /usr@snapshot

Mounting the Btrfs partition /dev/sdb2 with a maximum inline file size of 256 bytes:

mount -t btrfs -o max_inline=256 /dev/sdb2 /var/cache

Adding the device /dev/sdc to the created file system and rebalancing its content across all devices:

btrfs device add /dev/sdc /home
btrfs filesystem balance /home

XFS

Created by Silicon Graphics in 1993, XFS was merged into the Linux kernel in the early 2000s. It remains relevant thanks to its 64-bit architecture, use of extents and B+ trees, support for a real-time API (only under IRIX) and quotas, as well as nanosecond-precision timestamps.

Strengths:

  • Huge limits on file size and file system size—on the order of 8 exbibytes each.
  • Very high performance with large files and partitions.
  • Multi-block, delayed space allocation, improving performance and reducing data fragmentation.
  • Excellent implementation of multithreading.
  • Ability to defragment and resize the file system on-the-fly.
  • Well-optimized file system code.
  • Support for freezing the file system to create a consistent snapshot (snapshot creation is done by external tools).
  • Support for barriers.
  • Dynamic inode allocation and low overhead on storage.

Weaknesses:

  • Inability to shrink the file system.
  • Relatively slow performance with a large number of small files, especially during deletion. In recent kernels, this drawback has been mitigated through caching.
  • Extensive use of caching and the ability to journal only metadata changes can impact data storage reliability. However, in terms of fault tolerance, XFS is not significantly different from Ext4.
  • Long-standing stagnation in development—new features are not added, only bug fixes and optimizations.

Recommendations for use:

  • Primarily suited for large arrays of large files (100 MB and above). For instance, Red Hat engineers claim that for storage above 100 TB, XFS outperforms traditional file systems. NASA has a long positive experience with two XFS file systems each sized at 300 TB.
  • Even on relatively small arrays of large files (e.g., video libraries), XFS is preferable to other file systems.
  • XFS also performs well as a root file system—it is used as the default file system in RHEL 7.

Creating XFS on partition /dev/sdb3 with the specification of 32 allocation groups for improved parallel performance:

mkfs.xfs -d agcount=32 /dev/sdb3

Mounting /dev/sdb8 to the directory /mnt/video with journaling on /dev/sde1 and enabling user quotas:

mount -t xfs -o logdev=/dev/sde1,uquota /dev/sdb8 /mnt/video

Checking the fragmentation level of partition /dev/sda2 and then defragmenting it:

xfs_db -r -c frag /dev/sda2
xfs_fsr /dev/sda2

Creating a full (level 0) dump of the XFS file system on /dev/sdb2 into the file /mnt/snapshots/xfs_last:

xfsdump -l 0 -f /mnt/snapshots/xfs_last /dev/sdb2

JFS

JFS, the creation of IBM, predates XFS with its first release in 1990. While it partly meets modern requirements with nanosecond precision timestamping, a 64-bit architecture, and B+ trees, the size limits for files (4 petabytes) and the file system (32 petabytes) might be considered insufficient. Journaling is performed only for metadata, which is a downside for data storage reliability.

Overall, JFS cannot boast any outstanding or unique capabilities. In terms of performance and reliability, it noticeably lags behind Ext4, so its use is justified mainly for backward compatibility purposes.

ReiserFS

This filesystem, developed by Hans Reiser, is a product of the 21st century. ReiserFS utilizes B+ trees and has quite decent size limits: 1 exbibyte for files and 16 tebibytes for volumes.

Strengths:

  • Extremely high speed with a vast number of small files.
  • The ability to pack small files (so-called "tails") into the metadata tree to reduce fragmentation (though this negatively impacts filesystem performance and stability, so it is usually disabled with the "notail" parameter).
  • High reliability.

Weaknesses:

  • Significant fragmentation growth with prolonged use.
  • Packing "tails" in the latest kernel versions can lead to spontaneous hangs and kernel panics.
  • In case of filesystem failure, attempting to rebuild the tree may result in complete data loss.
  • Complete halt in development and support—Hans Reiser is serving a lengthy prison sentence for murder, and the ReiserFS code is too complex for outside developers to maintain.

Currently, ReiserFS makes sense to use only for partitions with a large number of small temporary files. In this domain, with regular reformatting (to eliminate fragmentation), it indeed has no equals.

Creating ReiserFS with the label "tmp_cache" on partition /dev/sdb2:

mkreiserfs -l tmp_cache /dev/sdb2

Mounting with optimal parameters to the directory /var/cache:

mount -t reiserfs -o noatime,notail /dev/sdb2 /var/cache

F2FS

F2FS, developed by Samsung in 2013, is specifically designed for flash storage. It uses multi-level hash tables and supports nanosecond precision timestamps. F2FS is actively evolving, with plans for transparent compression, deduplication, and filesystem resizing.

Strengths:

  • Most precise adaptation of filesystem characteristics to flash memory, resulting in exceptionally high performance on flash drives and SSDs.
  • Atomicity of operations and the use of checkpoints with rollback capabilities.
  • Native support for TRIM and FITRIM.
  • Lack of journaling; instead, it utilizes action logging in conjunction with the CoW mechanism. This helps balance the wear of flash memory cells and improves reliability.
  • F2FS has a set of algorithms tailored to the model and type of the specific flash storage, achieving the highest performance and reliability.

The main weakness of F2FS is that it is generally unsuitable for use on HDDs.

Creating F2FS on /dev/sdd1:

mkfs.f2fs /dev/sdd1

Mounting to the directory /media/flash:

mount -t f2fs /dev/sdd1 /media/flash

Conclusion

The choice of a file system is a crucial decision for any operating system. Despite the emergence of new technologies, Ext4 remains the preferred choice for the majority of Linux distributions, offering a good balance of performance, reliability, and ease of use. In a world where data grows exponentially, Ext4 continues to demonstrate its ability to adapt and scale to the needs of users and systems.