Definitions
Filesystem — the data structures and logic an OS uses to translate logical files into physical blocks on a storage device.
It manages three primary things:
- Storage mapping — tracking which disk sectors belong to which file
- Metadata — storing attributes like timestamps, permissions, file names.
- Space management — recording which areas are free or occupied.
File — logical abstraction of a byte stream that is contiguous from the application's point of view (its physical blocks may be scattered on disk), treated as a single unit by the OS.
It consists of:
- Data — raw sequence of bytes.
- Metadata — stored in a control structure such as an inode (ext) or a directory entry (FAT).
- Handle — stateful reference used by an application to track its current position (offset) within the stream.
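A minimal Python sketch of the handle idea: two handles opened on the same file keep independent positions, and each read advances only its own offset. The temporary file and its contents are illustrative only.

```python
import os
import tempfile

# Create a small illustrative file (contents are arbitrary).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, filesystem")

# Two independent handles on the same file: each tracks its own position.
with open(path, "rb") as a, open(path, "rb") as b:
    first = a.read(5)   # advances handle a to offset 5
    pos_a = a.tell()
    pos_b = b.tell()    # handle b has not moved yet
    b.seek(7)           # reposition handle b explicitly
    tail = b.read()     # reads from offset 7 to the end

os.remove(path)
print(first, pos_a, pos_b, tail)  # b'hello' 5 0 b'filesystem'
```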
Popular filesystems
FAT family (Windows):
| Year | Variant | Max File Size | Max Volume Size | Comment |
|---|---|---|---|---|
| 1977 | 8-bit FAT | | | 8-bit table entries |
| 1980 | FAT12 | ~32 MB | 32 MB | used by MS-DOS 1.0; still the standard floppy format (e.g. 1.44 MB disks) |
| 1984 | FAT16 | 2 GB | 2 GB | for DOS 3.0 |
| 1996 | FAT32 | 4 GB | 2 TB | 32-bit table entries; universal compatibility standard |
| 2006 | exFAT | 16 EB | 128 PB | 64-bit length field |
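The FAT32 limits in the table follow from field widths: the directory entry stores the file size in a 32-bit field, and the BPB stores the total sector count in a 32-bit field (`BPB_TotSec32`). A sketch of the arithmetic, assuming 512-byte sectors:

```python
# FAT32 limits derived from on-disk field widths (assumes 512-byte sectors).
FILE_SIZE_BITS = 32       # directory entry file size: 32-bit unsigned integer
TOTAL_SECTORS_BITS = 32   # BPB_TotSec32: 32-bit unsigned integer
BYTES_PER_SECTOR = 512

max_file_size = 2**FILE_SIZE_BITS - 1                        # largest representable size
max_volume_size = 2**TOTAL_SECTORS_BITS * BYTES_PER_SECTOR   # sectors x sector size

GiB = 2**30
TiB = 2**40
print(f"max file:   {max_file_size} bytes (~{max_file_size / GiB:.0f} GiB)")
print(f"max volume: {max_volume_size // TiB} TiB")
```

So the "4 GB" file limit is really 4 GiB minus one byte. With 4096-byte sectors the same 32-bit sector count would give 16 TiB.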
NTFS:
- v1.0 - Windows NT 3.1 (1993)
- v1.1, v1.2 - Windows NT 3.5, Windows NT 4.0
- v3.0 — Windows 2000
- v3.1 — Windows XP, Windows Server 2003
Linux:
- ext (1992)
- ext2 (1993)
- ext3 (2001)
- ext4 (2008)
FAT32
FAT — File Allocation Table.
Terms used below:
- Sector — smallest physical unit of storage on a drive, defined by the hardware. Usually 512 bytes; newer drives may use 4096.
- Cluster — logical group of sectors and the smallest unit of space that the OS/FAT32 can allocate. A cluster can belong to only one file at a time. Usually 4..32 KB (= 8..64 sectors of 512 bytes).
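Because a cluster belongs to at most one file, every file occupies a whole number of clusters and the unused tail of its last cluster is wasted (slack space). A small sketch; the function name `cluster_usage` and the sample sizes are illustrative:

```python
def cluster_usage(file_size: int, bytes_per_sector: int = 512,
                  sectors_per_cluster: int = 8) -> tuple[int, int]:
    """Return (clusters allocated, slack bytes) for a file of file_size bytes."""
    cluster_size = bytes_per_sector * sectors_per_cluster
    clusters = -(-file_size // cluster_size)  # ceiling division
    slack = clusters * cluster_size - file_size
    return clusters, slack

# 4 KB clusters (8 x 512-byte sectors)
print(cluster_usage(1))       # (1, 4095): a 1-byte file still takes a full cluster
print(cluster_usage(4096))    # (1, 0): exactly one cluster, no slack
print(cluster_usage(10_000))  # (3, 2288)
```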
On-Disk Layout
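The volume is divided into 4 distinct regions, laid out sequentially:
- Reserved region. Starts with the Boot Sector (Sector 0), which contains the BIOS Parameter Block (BPB). The BPB defines the sector size, sectors per cluster, and the location of the other regions.
- FAT Region. Contains FATs, typically 2 identical copies for safety.
- Root directory region. A fixed-size root directory used by FAT12/FAT16. On FAT32 this region is empty: the root directory is an ordinary cluster chain in the data region, and its first cluster number is stored in the BPB.
- Data region. Part of the disk where file and directory contents are stored, in units called clusters.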
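A sketch of reading the BPB fields that locate the regions and computing where each region starts, using the field offsets from Microsoft's FAT specification. It reads only the FAT32 FAT size field (`BPB_FATSz32`), so it does not handle FAT12/16. The boot sector is synthetic, built in memory with typical values; `parse_bpb` and `region_layout` are hypothetical helper names.

```python
import struct

def parse_bpb(boot: bytes) -> dict:
    """Extract the FAT32 BPB fields needed to locate the regions."""
    if len(boot) < 512 or boot[510:512] != b"\x55\xAA":
        raise ValueError("not a valid boot sector (missing 0x55AA signature)")
    # Offsets 11..18: BytsPerSec, SecPerClus, RsvdSecCnt, NumFATs, RootEntCnt
    bytes_per_sec, sec_per_clus, rsvd_sec_cnt, num_fats, root_ent_cnt = \
        struct.unpack_from("<HBHBH", boot, 11)
    (fat_sz32,) = struct.unpack_from("<I", boot, 36)   # sectors per FAT (FAT32)
    (root_clus,) = struct.unpack_from("<I", boot, 44)  # first cluster of root dir
    return dict(bytes_per_sec=bytes_per_sec, sec_per_clus=sec_per_clus,
                rsvd_sec_cnt=rsvd_sec_cnt, num_fats=num_fats,
                root_ent_cnt=root_ent_cnt, fat_sz32=fat_sz32,
                root_clus=root_clus)

def region_layout(bpb: dict) -> dict:
    """Return the first sector of each region (volume-relative sector numbers)."""
    bps = bpb["bytes_per_sec"]
    # Root directory region size (32-byte entries): 0 on FAT32 since RootEntCnt == 0.
    root_dir_sectors = (bpb["root_ent_cnt"] * 32 + bps - 1) // bps
    fat_start = bpb["rsvd_sec_cnt"]  # FATs follow the reserved region
    data_start = fat_start + bpb["num_fats"] * bpb["fat_sz32"] + root_dir_sectors
    return dict(fat_start=fat_start, root_dir_sectors=root_dir_sectors,
                data_start=data_start)

# Synthetic FAT32 boot sector: 4 KB clusters, 32 reserved sectors, 2 FATs.
boot = bytearray(512)
struct.pack_into("<HBHBH", boot, 11, 512, 8, 32, 2, 0)
struct.pack_into("<I", boot, 36, 1000)  # 1000 sectors per FAT
struct.pack_into("<I", boot, 44, 2)     # root directory starts at cluster 2
boot[510:512] = b"\x55\xAA"

bpb = parse_bpb(bytes(boot))
layout = region_layout(bpb)
print(layout)  # {'fat_start': 32, 'root_dir_sectors': 0, 'data_start': 2032}
```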
Core Architecture
Cluster-based Addressing — files are addressed by cluster number, not by byte or sector.
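Since data is addressed by cluster number, locating a cluster on disk is a linear computation. Data clusters are numbered from 2, so per the FAT specification the first sector of cluster N is `(N - 2) * SecPerClus + FirstDataSector`. A sketch, assuming the data region starts at sector 2032 in the example; the function name and default values are illustrative:

```python
def cluster_to_byte_offset(cluster: int, first_data_sector: int,
                           sectors_per_cluster: int = 8,
                           bytes_per_sector: int = 512) -> int:
    """Byte offset (from the start of the volume) of the first byte of a cluster."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    first_sector = (cluster - 2) * sectors_per_cluster + first_data_sector
    return first_sector * bytes_per_sector

print(cluster_to_byte_offset(2, 2032))  # 1040384: first data cluster
print(cluster_to_byte_offset(5, 2032))  # 1052672
```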
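FAT — a set of linked lists stored as an array of 32-bit entries. Each entry corresponds to one cluster in the data region and holds either the number of the next cluster in the file, an EOC (end-of-chain) marker, or 0 if the cluster is free. Entries 0 and 1 are reserved, so data clusters are numbered from 2.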
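Reading a file means walking its cluster chain: start from the first cluster stored in the file's directory entry and repeatedly look up the FAT entry for the current cluster. A sketch over an in-memory FAT modeled as a Python list of entry values; the chain values are made up. Following the FAT specification, any masked value >= 0x0FFFFFF8 is treated as EOC and 0x0FFFFFF7 as a bad cluster.

```python
EOC_MIN = 0x0FFFFFF8      # FAT32 entries >= this value mark end-of-chain
BAD_CLUSTER = 0x0FFFFFF7  # marks a cluster as unusable
ENTRY_MASK = 0x0FFFFFFF   # only the low 28 bits are meaningful

def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Return the clusters belonging to a file, in order."""
    chain: list[int] = []
    seen: set[int] = set()
    cluster = first_cluster
    while True:
        if cluster < 2 or cluster >= len(fat):
            raise ValueError(f"invalid cluster number {cluster}")
        if cluster in seen:
            raise ValueError("loop detected in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster] & ENTRY_MASK
        if nxt >= EOC_MIN:
            return chain  # end of file reached
        if nxt == 0 or nxt == BAD_CLUSTER:
            raise ValueError(f"chain broken at cluster {cluster}")
        cluster = nxt

# Entries 0 and 1 are reserved; one file occupies clusters 3 -> 5 -> 8.
fat = [0x0FFFFFF8, 0x0FFFFFFF, 0x0FFFFFFF, 5, 0, 8, 0, 0, 0x0FFFFFFF]
print(cluster_chain(fat, 3))  # [3, 5, 8]
```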
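28-bit limitation — only the lower 28 bits of each entry are used for cluster addressing, so FAT32 can address fewer than 2^28 clusters. The top 4 bits are reserved and must be preserved when an entry is written.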
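In practice the 28-bit rule means a reader masks each entry with 0x0FFFFFFF, and a writer changes only the low 28 bits while keeping whatever the top 4 bits already contain. A sketch of both operations on raw FAT bytes (little-endian 32-bit entries); `read_entry` and `write_entry` are hypothetical names:

```python
import struct

MASK_28 = 0x0FFFFFFF

def read_entry(fat: bytes, cluster: int) -> int:
    """Read a FAT32 entry, keeping only the low 28 bits."""
    (raw,) = struct.unpack_from("<I", fat, cluster * 4)
    return raw & MASK_28

def write_entry(fat: bytearray, cluster: int, value: int) -> None:
    """Set the low 28 bits of an entry, preserving the reserved top 4 bits."""
    if not 0 <= value <= MASK_28:
        raise ValueError("FAT32 entry values are limited to 28 bits")
    (raw,) = struct.unpack_from("<I", fat, cluster * 4)
    new = (raw & 0xF0000000) | value  # keep top 4 bits, replace the rest
    struct.pack_into("<I", fat, cluster * 4, new)

fat = bytearray(16)                              # room for 4 entries
struct.pack_into("<I", fat, 12, 0xA0000000)      # entry 3 has top bits set
write_entry(fat, 3, 0x0FFFFFFF)                  # mark cluster 3 as EOC
print(hex(read_entry(fat, 3)))                   # 0xfffffff
print(hex(struct.unpack_from("<I", fat, 12)[0])) # 0xafffffff
```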