When most people hear “data storage,” they think about conventional file-level storage. The storage solutions used by typical end users are file systems that are mapped to individual hard drives.
However, file systems are only one way to organize data. Another popular method, and one that is particularly useful when setting up virtual machine storage and SAN storage (as well as the disks that sit behind network attached storage), is block storage.
This article defines block storage, discusses common block storage use cases and explains what makes block storage different from file-level storage.
Block Level Storage Definition
Block storage is so called because it involves storing data in fixed-size “blocks.” Typically, the blocks all exist on a single hard drive, which the operating system treats as a block device. However, a block device does not have to map to one physical drive. A Distributed Replicated Block Device (DRBD), for example, mirrors a block device between hosts over the network, and software-defined storage pools can present many individual hard disks as a single block device.
Each block on a block device or DRBD has a unique identifier. The operating system uses the identifier to write and read each block of data.
The blocks are not stored in any particular order. The data stored in the blocks on either “side” of a particular block may have nothing to do with the data inside that block, and there are no hierarchies or “folders” that govern how block data is organized.
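The addressing model described above can be sketched in a few lines of Python. This is a toy, in-memory illustration rather than how any real driver works: the `BlockDevice` class, its method names and the 512-byte block size are all assumptions made for the example.

```python
class BlockDevice:
    """Toy in-memory block device: a flat run of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        # block_size=512 is an assumed value, chosen only for illustration.
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)

    def _offset(self, block_id: int) -> int:
        # The block identifier is all that is needed to locate the data.
        if not 0 <= block_id < self.num_blocks:
            raise IndexError(f"block {block_id} is out of range")
        return block_id * self.block_size

    def write_block(self, block_id: int, payload: bytes) -> None:
        if len(payload) > self.block_size:
            raise ValueError("payload is larger than one block")
        start = self._offset(block_id)
        # Pad short payloads with zero bytes so every block stays the same size.
        padded = payload.ljust(self.block_size, b"\x00")
        self._data[start:start + self.block_size] = padded

    def read_block(self, block_id: int) -> bytes:
        start = self._offset(block_id)
        return bytes(self._data[start:start + self.block_size])


dev = BlockDevice(num_blocks=8)
# Neighbouring blocks can hold completely unrelated data.
dev.write_block(5, b"end of a log entry")
dev.write_block(4, b"start of an image")
```

Note that the device knows nothing about names, folders or which blocks belong together. Any such bookkeeping has to be done by whatever sits on top of it.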
Blocks also do not contain any metadata. They store just the data itself. The absence of metadata helps to make block storage very efficient because virtually all of the space on a block storage device can be used for storing data. None is wasted on “overhead” associated with metadata and data storage hierarchies.
Because of these characteristics, block storage is relatively simple from an organizational standpoint. It is also very efficient. These features make block storage ideal for workloads where data storage needs to be able to scale quickly, and where fast read/write speeds are more important than having data organized in a way that is easy for a human to interpret.
Storage area network (SAN) storage is usually built using block storage (although end-user SAN interfaces are often configured to function in the same way as file systems, in order to be user-friendly). Virtual machine platforms, such as VMware, sometimes also use block storage for storing data because block storage makes it easy to increase or decrease the size of virtual disks, as well as to migrate virtual machine data between one host and another.
Block Storage vs. File Level Storage
File-level storage involves storing data within file systems. To use file-level storage, you have to create partitions on hard disks, then create file systems on the partitions. The size of a file system is usually limited by the size of an individual hard drive because, without an additional layer such as a volume manager or RAID, most file systems cannot span multiple hard drives.
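To make the contrast concrete, here is a minimal sketch of what a file system adds on top of raw blocks: a table that maps each file name to the blocks holding its contents. The `TinyFileSystem` class, the 16-byte block size and the file path are hypothetical and chosen only to keep the example readable; real file systems are far more involved.

```python
BLOCK_SIZE = 16  # assumed, deliberately tiny so the example is easy to follow


class TinyFileSystem:
    """Toy file system: raw blocks plus a metadata table of named files."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks  # the raw block storage
        self.free = list(range(num_blocks))             # unallocated block IDs
        self.table = {}                                 # path -> (block IDs, length)

    def write_file(self, path: str, data: bytes) -> None:
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        old_ids = self.table.get(path, ([], 0))[0]
        if needed > len(self.free) + len(old_ids):
            raise OSError("no space left on device")
        # Release blocks held by an earlier version of this file, if any.
        self.free.extend(old_ids)
        ids = [self.free.pop(0) for _ in range(needed)]
        for i, block_id in enumerate(ids):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.blocks[block_id] = chunk.ljust(BLOCK_SIZE, b"\x00")
        # This entry is the metadata that raw block storage does not keep.
        self.table[path] = (ids, len(data))

    def read_file(self, path: str) -> bytes:
        ids, length = self.table[path]
        return b"".join(self.blocks[b] for b in ids)[:length]


fs = TinyFileSystem(num_blocks=8)
fs.write_file("/notes/todo.txt", b"buy a bigger disk, then back it up")
contents = fs.read_file("/notes/todo.txt")
```

The lookup table is what makes data easy for a person to find by name, and it is also the extra bookkeeping that block storage avoids.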
Although block storage is more scalable and higher-performing than file system storage, it can also be less convenient to work with in certain respects. The following table compares the main features of block storage and file system storage.
| Feature | Block Storage | File System |
|---|---|---|
| Overhead | Very low | Relatively high (usable disk space is usually about 7% lower than raw disk space) |
| Data contiguity (are interrelated pieces of data stored next to each other?) | No | Yes, in most cases (contiguity sometimes breaks down, leading to file system fragmentation, but defragmentation tools can fix this) |
| Data access speeds | Very fast | Slower |
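As a rough worked example of the overhead figure in the table, the short Python sketch below estimates usable capacity from raw capacity. The function name `usable_bytes` is hypothetical, and the 7% default simply reuses the approximate figure quoted above. Actual overhead varies by file system and configuration.

```python
def usable_bytes(raw_bytes: int, overhead_fraction: float = 0.07) -> int:
    """Estimate usable capacity after file system overhead is subtracted.

    overhead_fraction=0.07 is the approximate figure from the table above,
    not a measured value for any particular file system.
    """
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in the range [0, 1)")
    # round() avoids losing a byte to floating-point truncation.
    return round(raw_bytes * (1 - overhead_fraction))


raw = 1_000_000_000_000   # a nominal 1 TB drive
usable = usable_bytes(raw)
lost = raw - usable       # roughly 70 GB under the 7% assumption
```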
To sum up, file-level storage is ideal for situations where:
- Your data storage needs are finite.
- You can tolerate moderate delays in data read/write times.
- User-friendliness and the ease of locating data are more important than performance and scalability.
Block storage is ideal when:
- The scale of your data storage needs is unknown or is subject to fluctuation.
- Performance and availability are more important than convenience.
Although block storage may not be the first storage solution you think of, it can be an ideal storage mechanism for certain types of workloads. This is increasingly true because the trend today is toward massively scalable, distributed storage solutions where performance is more important than the rigid organization of conventional file systems and databases.