Certified Linux Administrator Partition and volume management

Partitioning is a means to divide a single hard drive into many logical drives. A partition is a contiguous set of blocks on a drive that is treated as an independent disk. A partition table (the creation of which is the topic of this HOWTO) is an index that relates sections of the hard drive to partitions.

Why have multiple partitions?

  • Encapsulate your data. Since file system corruption is local to a partition, you stand to lose only some of your data if an accident occurs.

  • Increase disk space efficiency. You can format partitions with varying block sizes, depending on your usage. If your data is in a large number of small files (less than 1k) and your partition uses 4k sized blocks, you are wasting 3k for every file. In general, you waste on average one half of a block for every file, so matching block size to the average size of your files is important if you have many files.

  • Limit data growth. Runaway processes or maniacal users can consume so much disk space that the operating system no longer has room on the hard drive for its bookkeeping operations. This will lead to disaster. By segregating space, you ensure that things other than the operating system die when allocated disk space is exhausted.
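The block-size arithmetic in the second point is easy to check. The following POSIX shell sketch uses a made-up helper, slack_bytes, to compute how many bytes of a file's last block go unused for a given block size. It ignores file system metadata such as inodes, so treat the result as a lower bound on the overhead.

```shell
#!/bin/sh
# Sketch: bytes wasted in the last, partially filled block of a file.
# slack_bytes FILE_SIZE BLOCK_SIZE   (both in bytes; hypothetical helper)
slack_bytes() {
    size=$1
    block=$2
    # A file occupies ceil(size / block) blocks; the unused tail is the slack.
    blocks=$(( (size + block - 1) / block ))
    echo $(( blocks * block - size ))
}

slack_bytes 1000 4096   # prints 3096: most of a 4k block is wasted
slack_bytes 1000 1024   # prints 24: a 1k block fits the file closely
```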

Devices

Linux uses a special naming scheme for hard drives and their partitions, and you need to understand it to follow the discussion on the following pages.

In Linux, partitions are represented by device files. These are not ordinary files but special files located in /dev. Here are a few entries:

brw-rw----    1 root     disk       3,   0 May  5  1998 hda
brw-rw----    1 root     disk       8,   0 May  5  1998 sda
crw-------    1 root     tty        4,  64 May  5  1998 ttyS0

A device file is a file with type c (for "character" devices, devices that do not use the buffer cache) or b (for "block" devices, which go through the buffer cache). In Linux, all disks are represented as block devices only.

Device names

Naming Convention

By convention, IDE drives will be given device names /dev/hda to /dev/hdd. Hard Drive A (/dev/hda) is the first drive and Hard Drive C (/dev/hdc) is the third.

Table 2. IDE controller naming convention

drive name   drive controller   drive number
/dev/hda     1                  1
/dev/hdb     1                  2
/dev/hdc     2                  1
/dev/hdd     2                  2

A typical PC has two IDE controllers, each of which can have two drives connected to it. For example, /dev/hda is the first drive (master) on the first IDE controller and /dev/hdd is the second (slave) drive on the second controller (the fourth IDE drive in the computer).

You can write to these devices directly (using cat or dd). However, since these devices represent the entire disk, starting at the first block, you can mistakenly overwrite the master boot record and the partition table, which will render the drive unusable.
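Because a stray write to the whole-disk device can destroy the partition table, it is worth saving the first sector before experimenting. This sketch assumes the classic PC layout, in which the master boot record and the primary partition table live in the first 512-byte sector. backup_mbr is a hypothetical helper name, and dd only reads from the device here.

```shell
#!/bin/sh
# Sketch: copy the first 512-byte sector of a disk (MBR + partition table)
# to a file. Usage: backup_mbr /dev/hda /root/hda-mbr.bin  (hypothetical helper)
backup_mbr() {
    dev=$1
    out=$2
    # bs=512 count=1 copies exactly one sector; $dev is only read from.
    dd if="$dev" of="$out" bs=512 count=1 2>/dev/null || return 1
    # Sanity check: the backup must be exactly 512 bytes long.
    [ "$(wc -c < "$out" | tr -d ' ')" -eq 512 ]
}
```

Restoring means running dd with if= and of= swapped, which is exactly the kind of direct write described above, so double-check the device name before doing it.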

Table 3. partition names

drive name   drive controller   drive number   partition type   partition number
/dev/hda1    1                  1              primary          1
/dev/hda2    1                  1              primary          2
/dev/hda3    1                  1              primary          3
/dev/hda4    1                  1              primary (swap)   4
/dev/hdb1    1                  2              primary          1
/dev/hdb2    1                  2              primary          2
/dev/hdb3    1                  2              primary          3
/dev/hdb4    1                  2              primary          4

Once a drive has been partitioned, the partitions will be represented as numbers on the end of the names. For example, the second partition on the second drive will be /dev/hdb2. The partition type (primary) is listed in the table above for clarity.

Table 4. SCSI Drives

drive name   drive controller   drive number   partition type   partition number
/dev/sda1    1                  6              primary          1
/dev/sda2    1                  6              primary          2
/dev/sda3    1                  6              primary          3

SCSI drives follow a similar pattern; they are represented by 'sd' instead of 'hd'. The first partition of the second SCSI drive would therefore be /dev/sdb1. In the table above, the drive number is arbitrarily chosen to be 6 to introduce the idea that SCSI ID numbers do not map onto device names under Linux.

Name Assignment

Under (Sun) Solaris and (SGI) IRIX, the device name given to a SCSI drive has some relationship to where you plug it in. Under Linux, there is only wailing and gnashing of teeth.

Before

SCSI ID #2        SCSI ID #5       SCSI ID #7        SCSI ID #8
 /dev/sda          /dev/sdb         /dev/sdc          /dev/sdd

After removing the drive with SCSI ID #5

SCSI ID #2                         SCSI ID #7        SCSI ID #8
 /dev/sda                           /dev/sdb          /dev/sdc

SCSI drives have ID numbers which go from 0 through 15. Lower SCSI ID numbers are assigned lower-order letters. For example, if you have two drives numbered 2 and 5, then #2 will be /dev/sda and #5 will be /dev/sdb. If you remove either, all the higher-numbered drives will be renamed the next time you boot up.
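The renaming rule can be modeled in a few lines. This is a simplified sketch, assuming a single controller and no more than 26 drives; scsi_names is a made-up helper, and the real kernel also depends on controller probe order, as the next paragraph explains. Running it with the IDs from the Before and After diagrams reproduces the renaming.

```shell
#!/bin/sh
# Sketch (simplified model, one controller, at most 26 drives):
# names are handed out in ascending SCSI ID order, so a drive's name
# depends on which lower-numbered drives are present.
# scsi_names ID...   prints "ID -> /dev/sdX" for each drive (hypothetical helper)
scsi_names() {
    i=1
    # Sort the IDs numerically, then assign a, b, c, ... in that order.
    for id in $(printf '%s\n' "$@" | sort -n); do
        letter=$(echo abcdefghijklmnopqrstuvwxyz | cut -c "$i")
        echo "$id -> /dev/sd$letter"
        i=$((i + 1))
    done
}

echo "Before:"; scsi_names 2 5 7 8
echo "After:";  scsi_names 2 7 8
```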

If you have two SCSI controllers in your Linux box, you will need to examine the output of /bin/dmesg in order to see what name each drive was assigned. If you remove one of two controllers, the remaining controller might have all its drives renamed. Grrr...

There are two work-arounds; both involve using a program to put a label on each partition. The label is persistent even when the device is physically moved. You then refer to the partition directly or indirectly by label.

Logical Partitions

Table 5. Logical Partitions

drive name   drive controller   drive number   partition type   partition number
/dev/hdb1    1                  2              primary          1
/dev/hdb2    1                  2              extended         NA
/dev/hdb5    1                  2              logical          2
/dev/hdb6    1                  2              logical          3

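The table above illustrates a mysterious jump in the name assignments. This is due to the use of logical partitions: numbers 1 through 4 are reserved for primary (and extended) partitions, so the first logical partition is always number 5.

This is all you have to know to deal with Linux disk devices.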

Device numbers

The only important attributes of a device file are its major and minor device numbers, which are shown instead of the file size:

$ ls -l /dev/hda

Table 6. Device file attributes

brw-rw----  1  root   disk   3,            0             Jul 18 1994  /dev/hda
permissions    owner  group  major number  minor number  date         device name

When a device file is accessed, the major number selects which device driver performs the input/output operation. The driver is called with the minor number as a parameter, and it is entirely up to the driver how that number is interpreted. The driver documentation usually describes how the driver uses minor numbers. For IDE disks, this documentation is in /usr/src/linux/Documentation/ide.txt. For SCSI disks, one would expect such documentation in /usr/src/linux/Documentation/scsi.txt, but it isn't there. One has to look at the driver source to be sure (/usr/src/linux/drivers/scsi/sd.c:184-196). Fortunately, there is Peter Anvin's list of device numbers and names in /usr/src/linux/Documentation/devices.txt; see the entries for block devices, major 3, 22, 33, 34 for IDE and major 8 for SCSI disks. The major and minor numbers are a byte each, and that is why the number of partitions per disk is limited.
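To make the minor-number scheme concrete, here is a sketch that decodes a (major, minor) pair into a device name. It assumes the traditional layout from devices.txt (64 minors per IDE drive, 16 per SCSI disk, with the first minor of each range meaning the whole disk) and covers only majors 3, 22 and 8; dev_name is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: map a (major, minor) pair to a traditional disk device name.
# Only majors 3 (hda, hdb), 22 (hdc, hdd) and 8 (sda..sdp) are handled.
dev_name() {
    major=$1
    minor=$2
    case $major in
        3)  prefix=hd; first=0; per=64; max=128 ;;  # hda, hdb
        22) prefix=hd; first=2; per=64; max=128 ;;  # hdc, hdd
        8)  prefix=sd; first=0; per=16; max=256 ;;  # sda .. sdp
        *)  echo "unsupported major: $major" >&2; return 1 ;;
    esac
    if [ "$minor" -ge "$max" ]; then
        echo "minor $minor out of range for major $major" >&2
        return 1
    fi
    unit=$(( first + minor / per ))   # which drive: 0 = a, 1 = b, ...
    part=$(( minor % per ))           # 0 = whole disk, otherwise partition
    letter=$(echo abcdefghijklmnopqrstuvwxyz | cut -c $(( unit + 1 )))
    if [ "$part" -eq 0 ]; then
        echo "/dev/$prefix$letter"
    else
        echo "/dev/$prefix$letter$part"
    fi
}

dev_name 3 66   # /dev/hdb2
dev_name 8 17   # /dev/sdb1
```

The per-drive ranges also explain the partition limits given further on: 64 minors per IDE drive leave 63 for partitions, and 16 per SCSI disk leave 15.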

A partition is labeled to host a certain kind of file system (not to be confused with a volume label). Such a file system could be the Linux standard ext2 file system or Linux swap space, or even foreign file systems like (Microsoft) NTFS or (Sun) UFS. There is a numerical code associated with each partition type. For example, the code for a native Linux partition (such as ext2) is 0x83 and the code for Linux swap is 0x82. To see a list of partition types and their codes, execute /sbin/sfdisk -T.

Foreign Partition Types

The partition type codes have been arbitrarily chosen (you can't figure out what they should be) and they are particular to a given operating system. Therefore, if you use two operating systems with the same hard drive, the same code might be used to designate two different partition types. OS/2 marks its partitions with a 0x07 type and so does Windows NT's NTFS. MS-DOS allocates several type codes for its various flavors of FAT file systems: 0x01, 0x04 and 0x06 are known. DR-DOS used 0x81 to indicate protected FAT partitions, creating a type clash with Linux/Minix at that time, but neither the old Linux/Minix partition type nor DR-DOS is widely used any more.
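A small lookup table makes these clashes easy to see. This sketch covers only the codes named in this section plus 0x05 (extended) and 0x8e (Linux LVM), which appear elsewhere in this document; part_type is a made-up helper, and sfdisk -T remains the authoritative list on a given system.

```shell
#!/bin/sh
# Sketch: describe a few partition type codes; ambiguous codes say so.
# Accepts "83", "0x83" or "0X83" (hypothetical helper).
part_type() {
    # Strip an optional 0x/0X prefix and lowercase the hex digits.
    code=$(echo "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    case $code in
        1|01|4|04|6|06) echo "MS-DOS FAT" ;;
        5|05)           echo "Extended" ;;
        7|07)           echo "OS/2 or Windows NT NTFS (ambiguous)" ;;
        81)             echo "old Linux/Minix or DR-DOS protected FAT (ambiguous)" ;;
        82)             echo "Linux swap" ;;
        83)             echo "Linux" ;;
        8e)             echo "Linux LVM" ;;
        *)              echo "unknown" ;;
    esac
}

part_type 0x07
```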

Primary Partitions

The number of partitions on an Intel-based system was limited from the very beginning: The original partition table was installed as part of the boot sector and held space for only four partition entries. These partitions are now called primary partitions.

Logical Partitions

One primary partition of a hard drive may be subpartitioned. These are logical partitions. This effectively allows us to skirt the historical four partition limitation.

The primary partition used to house the logical partitions is called an extended partition, and it has its own partition type code (0x05). Unlike primary partitions, logical partitions must be contiguous. Each logical partition contains a pointer to the next logical partition, which implies that the number of logical partitions is unlimited. However, Linux imposes limits on the total number of partitions of any type on a drive, so this effectively limits the number of logical partitions: at most 15 partitions total on a SCSI disk and 63 total on an IDE disk.
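The naming rules for primary, extended and logical partitions, together with the per-disk limits above, can be sketched as follows. partition_names is a hypothetical helper, and it assumes, as in Table 5, that the extended partition takes the primary slot right after the last primary partition.

```shell
#!/bin/sh
# Sketch: list the names a disk gets with PRIMARIES primary partitions
# plus an extended partition holding LOGICALS logical partitions.
# partition_names DISK PRIMARIES LOGICALS   e.g. partition_names hdb 1 2
partition_names() {
    disk=$1; primaries=$2; logicals=$3
    case $disk in
        sd*) max=15 ;;   # highest partition number on a SCSI disk
        hd*) max=63 ;;   # highest partition number on an IDE disk
        *)   echo "unknown disk type: $disk" >&2; return 1 ;;
    esac
    # Logical partitions need a free primary slot for the extended partition.
    if [ "$logicals" -gt 0 ] && [ $((primaries + 1)) -gt 4 ]; then
        echo "no primary slot left for an extended partition" >&2; return 1
    fi
    # Logical partitions are numbered from 5 upward.
    if [ $((4 + logicals)) -gt "$max" ]; then
        echo "too many logical partitions for /dev/$disk" >&2; return 1
    fi
    n=1
    while [ "$n" -le "$primaries" ]; do
        echo "/dev/$disk$n primary"; n=$((n + 1))
    done
    if [ "$logicals" -gt 0 ]; then
        echo "/dev/$disk$n extended"
        n=5
        while [ "$n" -le $((4 + logicals)) ]; do
            echo "/dev/$disk$n logical"; n=$((n + 1))
        done
    fi
}

partition_names hdb 1 2   # hdb1 primary, hdb2 extended, hdb5 and hdb6 logical
```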

Swap Partitions

Every process running on your computer is allocated a number of blocks of RAM. These blocks are called pages. The set of in-memory pages which will be referenced by the processor in the very near future is called a "working set." Linux tries to predict these memory accesses (assuming that recently used pages will be used again in the near future) and keeps these pages in RAM if possible.

If you have too many processes running on a machine, the kernel will try to free up RAM by writing pages to disk. This is what swap space is for. It effectively increases the amount of memory you have available. However, disk I/O is orders of magnitude slower than reading from and writing to RAM. Consider this emergency memory and not extra memory.

Everything in your Linux file system can go in the same (single) partition. However, there are circumstances when you may want to restrict the growth of certain file systems. For example, if your mail spool was in the same partition as your root file system and it filled the remaining space in the partition, your computer would basically hang.

/var

This file system contains spool directories such as those for mail and printing. In addition, it contains the error log directory. If your machine is a server and develops a chronic error, those messages can fill the partition. Server computers ought to have /var in a different partition than /.

/usr

This is where most executable binaries go. It also holds the kernel source tree and much of the documentation.

/tmp

Some programs write temporary data files here. Usually, they are quite small. However, if you run computationally intensive jobs, like science or engineering applications, hundreds of megabytes could be required for brief periods of time. In this case, keep /tmp in a different partition than /.

/home

This is where users' home directories go. If you do not impose quotas on your users, this ought to be in its own partition.

/boot

This is where your kernel images go.

 

fdisk is started by typing (as root) fdisk device at the command prompt, where device is something like /dev/hda or /dev/sda. The basic fdisk commands you need are:

p print the partition table

n create a new partition

d delete a partition

q quit without saving changes

w write the new partition table and exit

Changes you make to the partition table do not take effect until you issue the write (w) command. Here is a sample partition table:

Disk /dev/hdb: 64 heads, 63 sectors, 621 cylinders
Units = cylinders of 4032 * 512 bytes
 
   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1   *         1       184    370912+  83  Linux
/dev/hdb2           185       368    370944   83  Linux
/dev/hdb3           369       552    370944   83  Linux
/dev/hdb4           553       621    139104   82  Linux swap

The first line shows the geometry of your hard drive. It may not be physically accurate, but you can accept it as though it were. The hard drive in this example is made of 32 double-sided platters with one head on each side (probably not true). Each platter has 621 concentric tracks. A 3-dimensional track (the same track on all platters) is called a cylinder. Each track is divided into 63 sectors, and each sector contains 512 bytes of data. A cylinder therefore holds 64 heads * 63 sectors = 4032 sectors of 512 bytes (the "Units" line), or 2,064,384 bytes. The Blocks column counts 1024-byte blocks, so each cylinder contributes 2016 blocks to a partition. The start and end values are cylinders.
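The Blocks column can be reproduced from the geometry. In this sketch (part_blocks is a made-up helper), a cylinder is heads * sectors 512-byte sectors, and two sectors make one 1024-byte block. The optional SKIP argument reflects an assumption about the first partition: it starts after the first track, which holds the master boot record, and fdisk appears to append "+" when the resulting sector count is odd.

```shell
#!/bin/sh
# Sketch: compute fdisk's Blocks value (1024-byte blocks) for a partition.
# part_blocks HEADS SECTORS START END [SKIP]
# SKIP = 512-byte sectors at the start not belonging to the partition
# (assumed to be one track for the first partition).
part_blocks() {
    heads=$1; sectors=$2; start=$3; end=$4; skip=${5:-0}
    total=$(( (end - start + 1) * heads * sectors - skip ))
    # Two sectors per block; an odd sector count leaves half a block over.
    if [ $(( total % 2 )) -eq 1 ]; then
        echo "$(( total / 2 ))+"
    else
        echo "$(( total / 2 ))"
    fi
}

part_blocks 64 63 185 368      # /dev/hdb2 in the sample table: 370944
part_blocks 64 63 1 184 63     # /dev/hdb1, after the first track: 370912+
```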

Four primary partitions

The overview:

Decide on the size of your swap space and where it ought to go. Divide up the remaining space for the three other partitions.

Example:

I start fdisk from the shell prompt:

# fdisk /dev/hdb 

which indicates that I am using the second drive (the slave) on my first IDE controller. When I print the (empty) partition table, I just get configuration information.

Command (m for help): p

Disk /dev/hdb: 64 heads, 63 sectors, 621 cylinders
Units = cylinders of 4032 * 512 bytes

I knew that I had a 1.2 GB drive, but now I really know: 64 * 63 * 512 * 621 = 1281982464 bytes. I decide to reserve 128 MB of that space for swap, leaving 1153982464 bytes. If I use one of my primary partitions for swap, that means I have three left for ext2 partitions. Divided equally, that makes for 384 MB per partition. Now I get to work.
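The same sizing arithmetic as a shell sketch, using decimal megabytes (1 MB = 1,000,000 bytes) as the text does; plan is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: capacity from geometry, minus swap, split into equal partitions.
# plan HEADS SECTORS CYLINDERS SWAP_MB DATA_PARTITIONS
plan() {
    heads=$1; sectors=$2; cyls=$3; swap_mb=$4; nparts=$5
    total=$(( heads * sectors * 512 * cyls ))     # bytes on the drive
    rest=$(( total - swap_mb * 1000000 ))         # bytes left after swap
    echo "total:     $total"
    echo "remaining: $rest"
    echo "each:      $(( rest / nparts ))"        # bytes per data partition
}

plan 64 63 621 128 3
```

For the drive above this prints 1281982464, 1153982464 and 384660821 bytes, the last being the "384 MB per partition" figure.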

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-621, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-621, default 621): +384M

Next, I set up the partition I want to use for swap:

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (197-621, default 197):
Using default value 197
Last cylinder or +size or +sizeM or +sizeK (197-621, default 621): +128M

Now the partition table looks like this:

   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1             1       196    395104   83  Linux
/dev/hdb2           197       262    133056   83  Linux

I set up the remaining two partitions the same way I did the first. Finally, I make the first partition bootable:

Command (m for help): a
Partition number (1-4): 1

And I make the second partition of type swap:

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap)      
Command (m for help): p

The end result:

Disk /dev/hdb: 64 heads, 63 sectors, 621 cylinders
Units = cylinders of 4032 * 512 bytes
 
   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1   *         1       196    395104+  83  Linux
/dev/hdb2           197       262    133056   82  Linux swap
/dev/hdb3           263       458    395136   83  Linux
/dev/hdb4           459       621    328608   83  Linux          

Finally, I issue the write command (w) to write the table on the disk.

Volume labels make it possible for partitions to retain a consistent name regardless of where they are connected, and regardless of whatever else is connected. Labels are not mandatory for a Linux volume. Each label can be at most 16 characters long.

There are three tools to make volume labels: mke2fs, tune2fs and e2label.

Simple Invocation

e2label /dev/hdb1 pubsw

tune2fs -L pubsw /dev/hdb1

Either of these two commands will label the first partition of the second drive "pubsw". That label stays with that particular partition, even if the drive is moved to another controller or even another computer.

mkfs.ext2 -L pubsw /dev/hdb1

mke2fs -L pubsw /dev/hdb1

will do the same thing as the first two commands, but only after creating a new file system. This means that either of these last two commands will destroy any existing data in the partition.
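Since ext2/ext3 labels are limited to 16 characters, a wrapper can check the length before calling e2label. This is a sketch that assumes plain ASCII labels; label_ok and set_label are made-up names.

```shell
#!/bin/sh
# Sketch: validate an ext2/ext3 label before applying it with e2label.
label_ok() {
    # ${#1} is the length of the first argument; empty labels are rejected.
    [ -n "$1" ] && [ "${#1}" -le 16 ]
}

# set_label DEVICE LABEL   e.g. set_label /dev/hdb1 pubsw  (run as root)
set_label() {
    dev=$1; label=$2
    if ! label_ok "$label"; then
        echo "label '$label' must be 1 to 16 characters long" >&2
        return 1
    fi
    e2label "$dev" "$label"
}
```

Running e2label with only the device argument, for example e2label /dev/hdb1, prints the current label, which is an easy way to confirm the change.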

How to Use

Here is a sample fstab. This is a text file located in /etc, which is usually set up during the installation of the operating system. It describes where each partition will be mounted, and how it will be mounted. You can modify it, either through a utility or manually, when you add or remove devices.

 

LABEL=/        /                    ext3    defaults        1 1
LABEL=/boot    /boot                ext2    defaults        1 2
none           /dev/pts             devpts  gid=5,mode=620  0 0
none           /dev/shm             tmpfs   defaults        0 0
LABEL=HOME     /home                ext3    defaults        1 2
none           /proc                proc    defaults        0 0
none           /sys                 sysfs   defaults        0 0
LABEL=/usr     /usr                 ext3    defaults        1 2
/dev/hdc1      /k-space             ext3    defaults        1 2
/dev/hda6      swap                 swap    defaults        0 0
/dev/hdd       /media/cdrecorder    auto    pamconsole,ro,exec,noauto,managed 0 0
/dev/fd0       /media/floppy        auto    pamconsole,exec,noauto,managed 0 0

The leftmost column lists devices and the second column lists mount points. This example contains a mixture of devices and labels. Whatever partition appears as /dev/hdc1 (the first partition on the master drive of the second controller) is always mounted on /k-space. The partition labeled "HOME" is always mounted on /home, regardless of which drive it is on or which partition number it has. Notice that it is permissible to use mount point names as labels, such as "/usr".
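To see which entries in an fstab rely on labels, a short awk filter is enough. fstab_labels is a hypothetical helper; it assumes the whitespace-separated layout shown above and skips comment lines.

```shell
#!/bin/sh
# Sketch: print "LABEL -> mount point" for every LABEL= entry in an
# fstab-format file (default /etc/fstab).
fstab_labels() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        $1 ~ /^LABEL=/ {
            sub(/^LABEL=/, "", $1)           # keep only the label text
            print $1 " -> " $2
        }
    ' "${1:-/etc/fstab}"
}
```

For the sample above this lists /, /boot, HOME and /usr. To find which device currently carries a given label, util-linux provides findfs, for example findfs LABEL=HOME.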

Formatting an ext2/3 partition

When a hard drive is partitioned, it is mapped into sections, but the sections are empty. It is like a newly constructed library; shelves, signs, and a card catalogue system must be put in place before the books are put away.

The organizational structure inside a partition is called a file system. With Linux, the standard file systems are ext2 and ext3. The ext3 file system is ext2 plus a log of disk writes called a journal. The journal allows the system to recover quickly from accidental power outages, among other things.

The principal tool for making an ext2/3 file system in a partition is mke2fs. It is usually found in /sbin. mkfs.ext2 and mkfs.ext3 are frontends which pass specific options to mke2fs.

Simple Invocation

mke2fs /dev/hdb1

mkfs.ext2 /dev/hdb1

both of which make an ext2 file system on the first partition of the second drive, and

mke2fs -j /dev/hdb1

mkfs.ext3 /dev/hdb1

make an ext3 file system.

Reserved blocks

The -m option is probably the one of most use to non-experts. If the file system becomes full and there is no more space to write, it is basically unusable, because the operating system is constantly writing to disk. By default, five percent of the partition is reserved for use by the root user. This allows root to conduct administrative activities on the partition and perhaps move some data off. This reserve is most critical when the partition contains / or home directories. For pure data partitions, it is just lost space: five percent of a 250 GB partition is 12.5 GB. Especially in the case of large data partitions, it is usually safe to shrink the reserve to a small value such as one percent.

mkfs.ext3 -m 1 /dev/hdb1

creates a file system with only 1% of its space reserved for the root user. tune2fs -m can be used to adjust the reserved blocks after data is loaded on the partition.
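The size of the reserve is simple percentage arithmetic; this sketch (reserved_mb is a made-up name) works in whole megabytes.

```shell
#!/bin/sh
# Sketch: space held back by the -m reserve.
# reserved_mb SIZE_MB PERCENT
reserved_mb() {
    echo $(( $1 * $2 / 100 ))
}

reserved_mb 250000 5    # 250 GB at the default 5%: 12500 MB (12.5 GB)
reserved_mb 250000 1    # the same partition at 1%: 2500 MB
```

On an existing file system, tune2fs -l /dev/hdb1 lists the superblock fields, including the reserved block count, and tune2fs -m 1 /dev/hdb1 changes the percentage.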


The Linux Logical Volume Manager (LVM) is a mechanism for virtualizing disks. It can create "virtual" disk partitions out of one or more physical hard drives, allowing you to grow, shrink, or move those partitions from drive to drive as your needs change. It also allows you to create larger partitions than you could achieve with a single drive.

 

Traditional uses of LVM have included databases and company file servers, but even home users may want large partitions for music or video collections, or for storing online backups. LVM and RAID 1 can also be convenient ways to gain redundancy without sacrificing flexibility.

This article looks first at a basic file server, then explains some variations on that theme, including adding redundancy with RAID 1 and some things to consider when using LVM for desktop machines.

LVM Installation

An operational LVM system includes both a kernel component (the device mapper) and userspace utilities. To turn on the kernel component, set up the kernel options as follows:

Device Drivers --> Multi-device support (RAID and LVM)

    [*] Multiple devices driver support (RAID and LVM)
    < >   RAID support
    <*>   Device mapper support
    < >     Crypt target support (NEW)

You can usually install the LVM user tools through your Linux distro's packaging system. In Gentoo, the LVM user tools are part of the lvm2 package. Note that you may see tools for LVM-1 as well (perhaps named lvm-user). It doesn't hurt to have both installed, but make sure you have the LVM-2 tools.

LVM Basics

To use LVM, you must understand several elements. Without LVM, the disk space on the regular physical hard drives attached to the computer is chopped up into partitions, and a filesystem is written directly to each partition. By comparison, in LVM, Volume Groups (VGs) are split up into logical volumes (LVs), where the filesystems ultimately reside (Figure 1).

Each VG is made up of a pool of Physical Volumes (PVs). You can extend (or reduce) the size of a Volume Group by adding or removing as many PVs as you wish, provided there are enough PVs remaining to store the contents of all the allocated LVs. As long as there is available space in the VG, you can also grow and shrink the size of your LVs at will (although most filesystems don't like to shrink).
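The constraint on removing PVs is simple bookkeeping: the PVs that remain in a VG must be able to hold every allocated LV. This sketch checks it while ignoring LVM metadata and extent rounding; vg_free is a made-up helper, sizes are whole gigabytes, and the sample figures (PVs of roughly 233 GiB, about what a 251 GB drive provides, and volumes of 400, 50 and 10 GB) are borrowed loosely from the file server example in this section.

```shell
#!/bin/sh
# Sketch: unallocated space in a VG; a negative result means the LVs
# would not fit on the listed PVs.
# vg_free PV_SIZE... -- LV_SIZE...
vg_free() {
    pv_total=0
    while [ $# -gt 0 ] && [ "$1" != "--" ]; do
        pv_total=$(( pv_total + $1 )); shift
    done
    [ $# -gt 0 ] && shift            # drop the "--" separator
    lv_total=0
    for lv in "$@"; do
        lv_total=$(( lv_total + lv ))
    done
    echo $(( pv_total - lv_total ))
}

vg_free 233 233 233 233 -- 400 50 10   # 472: room to spare
vg_free 233 -- 400 50 10               # -227: one PV cannot hold these LVs
```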

A Basic File Server

A simple, practical example of LVM use is a traditional file server, which provides centralized backup, storage space for media files, and shared file space for several family members' computers. Flexibility is a key requirement; who knows what storage challenges next year's technology will bring?

For example, suppose your requirements are:

400G  - Large media file storage
 50G  - Online backups of two laptops and three desktops (10G each)
 10G  - Shared files

Ultimately, these requirements may increase a great deal over the next year or two, but exactly how much and which partition will grow the most are still unknown.

Disk Hardware

Traditionally, a file server uses SCSI disks, but today SATA disks offer an attractive combination of speed and low cost. At the time of this writing, 250 GB SATA drives are commonly available for around $100; for a terabyte, the cost is around $400.

SATA drives are not named like ATA drives (hda, hdb), but like SCSI (sda, sdb). Once the system has booted with SATA support, it has four physical devices to work with:

/dev/sda  251.0 GB
/dev/sdb  251.0 GB
/dev/sdc  251.0 GB
/dev/sdd  251.0 GB

Next, partition these for use with LVM. You can do this with fdisk by specifying the "Linux LVM" partition type 8e. The finished product looks like this:

# fdisk -l /dev/sdd

Disk /dev/sdd: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device            Start   End      Blocks      Id  System
/dev/sdd1         1       30515    245111706   8e  Linux LVM

Notice the partition type is 8e, or "Linux LVM."

Creating a Virtual Volume

Initialize each of the LVM partitions using the pvcreate command:

# pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

This sets up all the partitions on these drives for use under LVM, allowing creation of volume groups. To examine available PVs, use the pvdisplay command. This system will use a single volume group named datavg:

# vgcreate datavg /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Use vgdisplay to see the newly created datavg VG with the four drives stitched together. Now create the logical volumes within it:

# lvcreate --name medialv  --size 400G datavg
# lvcreate --name backuplv --size  50G datavg
# lvcreate --name sharelv  --size  10G datavg

Without LVM, you might allocate all available disk space to the partitions you're creating, but with LVM, it is worthwhile to be conservative, allocating only half the available space to the current requirements. As a general rule, it's easier to grow a filesystem than to shrink it, so it's a good strategy to allocate exactly what you need today, and leave the remaining space unallocated until your needs become clearer. This method also gives you the option of creating new volumes when new needs arise (such as a separate encrypted file share for sensitive data). To examine these volumes, use the lvdisplay command.

Now you have several nicely named logical volumes at your disposal:

/dev/datavg/backuplv     (also /dev/mapper/datavg-backuplv)
/dev/datavg/medialv      (also /dev/mapper/datavg-medialv)
/dev/datavg/sharelv      (also /dev/mapper/datavg-sharelv)
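Because it is easier to grow a filesystem than to shrink it, a typical later step is to extend one of these volumes: first the LV, then the filesystem on it. This sketch assumes an ext2/ext3 filesystem on the LV; grow_lv and run are hypothetical helpers, and by default it only prints the lvextend and resize2fs commands (set DRY_RUN=0 to execute them as root). Whether the filesystem may stay mounted while resize2fs runs depends on your kernel and e2fsprogs versions, so check resize2fs(8) first.

```shell
#!/bin/sh
# Sketch: grow a logical volume and the ext2/ext3 filesystem on it.
# grow_lv VG LV AMOUNT   e.g. grow_lv datavg medialv 50G
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

grow_lv() {
    vg=$1; lv=$2; amount=$3
    # 1. Take AMOUNT more space from the VG's unallocated extents.
    run lvextend -L "+$amount" "/dev/$vg/$lv" || return 1
    # 2. Grow the filesystem to fill the enlarged volume.
    run resize2fs "/dev/$vg/$lv"
}

grow_lv datavg medialv 50G
```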

 
