Basic concepts
LVM and JFS are not exactly
new in the OS/2 world. They were first introduced in OS/2 Warp Server for
e-business, aka WSeB, about two years ago. But only a few lucky OS/2 users
(including me) use WSeB on their home/office machines. Thus the upcoming
Serenity Systems' eComStation (eCS) will be the first exposure to LVM/JFS
for many OS/2 users. Many of the prospective eCS owners have understandable
concerns about these new concepts; some perhaps even fear them.
In this article I will try to explain the basic concepts introduced by
LVM and JFS and some of the logic behind them.
First off, I'll finally explain
what those acronyms are: LVM stands for Logical Volume Manager and JFS
is a Journaled File System. Not much clearer, is it? It will be later -
I hope.
LVM and JFS didn't originate
on OS/2. They were created for AIX, IBM's high-end Unix clone running on
IBM's RS/6000 hardware. For the users that means that all the really nasty
bugs were ironed out long ago.
The role of LVM is to present
a simple logical view of the underlying physical storage space, i.e. hard drive(s).
LVM manages individual physical disks or, to be more precise, the individual
partitions present on them (for a short glossary of terms, look at the end
of the article). LVM hides the number, size and location of physical
partitions from users. Instead it presents the concept of a logical volume.
A logical volume may correspond to a physical partition (but that obviously
almost defeats the purpose of LVM) but it doesn't have to. One volume may
be composed of several partitions located on multiple physical disks. Not
only that, the volumes can even be extended (not shrunk - people usually
want more space, not less). They can even be extended while the OS is running
and the filesystem is being accessed! Of course, most home and SOHO users
don't have the hardware required for this.
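To make the spanning idea concrete, here is a minimal Python sketch of how a volume built from several partitions could translate a volume-relative sector number into a partition and an offset within it. The names `Partition` and `LogicalVolume`, the disk labels and the simple end-to-end concatenation are all assumptions made for illustration; this is not a description of LVM's actual on-disk layout.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    disk: str      # hypothetical disk label
    name: str      # hypothetical partition label
    sectors: int   # size of the partition in sectors


@dataclass
class LogicalVolume:
    partitions: list = field(default_factory=list)

    @property
    def size(self):
        """Total size of the volume in sectors."""
        return sum(p.sectors for p in self.partitions)

    def extend(self, partition):
        """Grow the volume by appending a partition (there is no shrink)."""
        self.partitions.append(partition)

    def locate(self, sector):
        """Translate a volume-relative sector into (partition, offset)."""
        if not 0 <= sector < self.size:
            raise ValueError("sector outside volume")
        for p in self.partitions:
            if sector < p.sectors:
                return p, sector
            sector -= p.sectors
        raise AssertionError("unreachable")


vol = LogicalVolume([Partition("disk1", "p1", 1000)])
vol.extend(Partition("disk2", "p1", 500))   # the volume now spans two disks
part, offset = vol.locate(1200)
print(part.disk, offset)
```

The user of such a volume only ever sees one address space of 1500 sectors; which disk a given sector lives on is the volume manager's business.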
The more experienced readers
are now probably wondering how 'traditional' file systems like FAT or HPFS
could be extended at runtime. The answer is, they can't. To take full advantage
of LVM, it is necessary to use a file system designed for it. This file
system is of course JFS. JFS is not really tied to LVM; the two
can exist separately, but only when working in concert can both reach their
full potential.
JFS volume structure
JFS is organized like a traditional
Unix-ish file system: it presents a logical view of files and directories
linked together to form a tree-like structure. This is the concept that
spread from the Unix world pretty much everywhere else and that we all
know. I can only speculate about IBM's motives for incorporating JFS into
WSeB, but it has some obvious advantages when compared to HPFS and HPFS386
(some shortcomings too). I see two significant advantages:
- capacity - JFS allows much larger file and volume sizes than HPFS.
  Basically JFS is a 64-bit file system while HPFS structures are at most
  32 bits large.
- recovery - thanks to the journaling techniques employed by JFS (described
  in more detail later), CHKDSK times for JFS are significantly shorter
  than for equivalent HPFS volumes. Roughly speaking, where an HPFS
  checkdisk after a crash takes minutes, JFS takes seconds.
JFS is created on top of a logical
volume. To maintain information about files and directories, it uses the
following important internal structures:
- the superblock
- the i-nodes
- the data blocks
- the allocation groups
The superblock lies at
the heart of JFS (and many other file systems). It contains essential information
such as size of file system, number of blocks it contains or state of the
file system (clean, dirty etc.).
The entire file system space
is divided into logical blocks that contain file or directory data.
For JFS, the logical blocks are always 4096 bytes (4K) in size, but can
be optionally subdivided into smaller fragments (512, 1024 or 2048 bytes).
An i-node is a logical
entity that contains information about a file or directory. There is a
1:1 relationship between i-nodes and files/directories. An i-node contains
file type, access permissions, user/group ID (UID/GID - unused on OS/2),
access times and points to actual logical blocks where file contents are
stored. The maximum file size allowed in JFS is 2TB (HPFS and FAT allow
2GB max). It should be noted that the number of i-nodes is fixed. It is
determined at file system creation (FORMAT) time and depends on fragment
size (which is user selectable). In theory users could run out of i-nodes,
meaning that they would be unable to create more files even if there was
enough free space. In practice this is extremely rare.
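The "out of i-nodes" situation can be illustrated with a toy model in Python. Everything here is hypothetical: the `ToyVolume` class, the block counts and the deliberately tiny i-node count are picked only to show the effect, and say nothing about how FORMAT actually sizes the i-node table.

```python
class OutOfInodes(Exception):
    """Raised when no i-node is left for a new file."""


class ToyVolume:
    def __init__(self, data_blocks, inode_count):
        self.free_blocks = data_blocks
        self.free_inodes = inode_count   # fixed once, at "format" time

    def create_file(self, blocks_needed):
        # Every file needs exactly one i-node, regardless of its size.
        if self.free_inodes == 0:
            raise OutOfInodes("no free i-nodes left")
        if blocks_needed > self.free_blocks:
            raise OSError("volume full")
        self.free_inodes -= 1
        self.free_blocks -= blocks_needed


vol = ToyVolume(data_blocks=1000, inode_count=3)
for _ in range(3):
    vol.create_file(1)

try:
    vol.create_file(1)
except OutOfInodes as exc:
    print(f"{exc}; free blocks remaining: {vol.free_blocks}")
```

The fourth file is refused although almost all data blocks are still free, which is exactly the (rare) failure mode described above.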
Fragments were already
briefly mentioned in the discussion of logical blocks. The JFS logical
block size is fixed at 4K. This is a reasonable default but it means that
the file system cannot allocate less than 4K for file storage. If a file
system stores large amounts of small files (< 2K), the disk space waste
becomes significant. We've all got to know and hate this problem from FAT
(cluster size of 32K leads to massive waste of space, in some cases over
50%). JFS attacks this by allowing fragmentation of logical blocks into
smaller units, as small as 512 bytes (this is sector size on harddrives
and it is not possible to read or write less than 512 bytes from/to disk).
However, users should be careful because fragmentation incurs additional
overhead and hence slows down disk access. I would recommend using fragments
smaller than 4K only when you know for sure that the file system will hold
very large numbers of small files.
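The space argument is simple arithmetic, sketched below in Python. The workload of a thousand 1500-byte files is a made-up example, the function names are mine, and the calculation ignores i-nodes and other metadata; it only rounds each file up to a whole number of allocation units.

```python
def allocated_bytes(file_size, unit):
    """Space consumed when storage is handed out in whole units."""
    if file_size == 0:
        return 0
    units = -(-file_size // unit)   # ceiling division
    return units * unit


def waste_percent(file_sizes, unit):
    """Share of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, unit) for s in file_sizes)
    return 100.0 * (allocated - used) / allocated


small_files = [1500] * 1000   # hypothetical workload of small files

for label, unit in [
    ("FAT, 32K cluster", 32 * 1024),
    ("JFS, 4K block", 4096),
    ("JFS, 512-byte fragment", 512),
]:
    print(f"{label}: {waste_percent(small_files, unit):.1f}% wasted")
```

For this workload a 1500-byte file occupies 1536 bytes with 512-byte fragments but a full 4096 bytes with unfragmented blocks, so the smaller unit pays off only when small files dominate.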
The entire JFS volume space
is subdivided into allocation groups. Each allocation group contains
i-nodes and data blocks. This enables the file system to store i-nodes
and their associated data in physical proximity (HPFS uses a very similar
technique). The allocation group size varies from 8MB to 64MB and depends
on fragment size and number of fragments it contains.
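As a rough illustration of what an allocation group buys, the following Python sketch maps block numbers to groups. It assumes unfragmented 4K blocks, the smallest group size mentioned above (8MB) and groups laid out back to back from block 0; the real layout may differ, and the function names are hypothetical.

```python
BLOCK_SIZE = 4096               # JFS logical block size, per the text
AG_SIZE = 8 * 1024 * 1024       # smallest allocation group size mentioned

BLOCKS_PER_AG = AG_SIZE // BLOCK_SIZE


def allocation_group(block_number):
    """Index of the allocation group that holds a given block."""
    return block_number // BLOCKS_PER_AG


def same_group(inode_block, data_block):
    """True if an i-node and its data sit in the same allocation group."""
    return allocation_group(inode_block) == allocation_group(data_block)


print(BLOCKS_PER_AG)
print(allocation_group(5000))
print(same_group(4100, 5000))
```

A file system that tries to keep `same_group` true for an i-node and its data limits how far the disk head has to travel between reading the two.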
Journaling
As the name of JFS implies,
journaling is a very important feature of this file system. It should be
noted that journaling is actually independent of JFS's structure described
above. The journaling technique has its roots in database systems and it
is employed to ensure maximum consistency of the file system, hence minimizing
the risk of data loss - a very important feature for servers, but even
home/SOHO users hate to lose data.
JFS uses a special log device
to implement a circular journal. On AIX, several JFS volumes can share a single
log device. I'm not sure this is possible on OS/2; I believe each JFS volume
(corresponding to a drive letter) has its own 'inline' log located inside
the JFS volume, with its size selectable at FORMAT time.
It is important to note that
JFS does not log (or journal) everything. It logs only changes
to file system meta-data. Simply speaking, the log contains a record
of changes to everything in the file system except actual file data, i.e.
changes to the superblock, i-nodes, directories and allocation structures.
It is clear that there must be some overhead here and indeed, performance
may suffer when applications are doing lots of synchronous (uncached) I/O
or creating and/or deleting many files in a short amount of time. The performance
loss is however not noticeable in most cases and is well worth the increased
security.
The log (or journal) occupies
a dedicated area on disk and is written to immediately when any meta-data
change occurs. When the disk becomes idle, the actual file system structure
is updated according to the log. After a crash, all it usually takes to
restore the file system to full consistency is replaying the log,
i.e. performing the recorded transactions. Of course, if a process was in
the middle of writing a file when the system crashed or power died, the
file could be inconsistent (the app might not be able to read it again),
but you will not lose this file nor other files, as is often the case with
other file systems.
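The log-then-replay idea can be shown with a toy metadata journal in Python. This is a sketch under heavy assumptions: the "metadata" is just a dictionary of file names and sizes, the journal is an unbounded list rather than a circular on-disk log, and `ToyJournal` and `apply_record` are invented names with no relation to the real JFS code.

```python
def apply_record(metadata, record):
    """Apply one logged metadata change; applying it twice is harmless."""
    name, size = record
    if size is None:
        metadata.pop(name, None)   # a logged delete
    else:
        metadata[name] = size      # a logged create or resize


class ToyJournal:
    def __init__(self):
        self.records = []          # stands in for the on-disk log

    def log(self, name, size):
        """Record a metadata change before the structures are touched."""
        self.records.append((name, size))

    def replay(self, metadata):
        """Bring metadata up to date by re-applying every logged change."""
        for record in self.records:
            apply_record(metadata, record)
        return metadata


journal = ToyJournal()
on_disk = {}

journal.log("a.txt", 100)
apply_record(on_disk, journal.records[0])   # this change reached the disk

journal.log("b.txt", 200)    # logged, but the "crash" comes before
journal.log("a.txt", None)   # the structures are updated

recovered = journal.replay(dict(on_disk))
print(recovered)
```

Because each record is safe to apply more than once, recovery does not need to know which changes had already reached the file system structures; it simply replays the log, which is far cheaper than scanning the whole volume.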
OS/2 considerations
The above was mostly a generic
description of LVM and JFS and applies to both AIX and OS/2 and perhaps
even to Linux (at least the JFS part). Now I will discuss how exactly LVM/JFS
differ from the solutions previously available on OS/2.
LVM
From the user's point of view LVM
replaces FDISK. On WSeB, FDISK is no longer available. In fact, if you
try to run fdisk, you get the following message:
FDISK.COM has been replaced by LVM.EXE and FDISKPM.EXE has been
replaced by LVMGUI.CMD. Please use one of these utilities.
It should be noted here that
LVMGUI is a GUI app (as the name implies) and requires Java, while LVM
is a VIO app and can be run from a command line boot. It looks and feels
similar to FDISK, but it presents two views: logical and physical.
FDISK didn't differentiate between the two. These views correspond to
the concepts described at the beginning of this article. Basically the
physical view shows physical disks and lets users manage partitions while
logical view presents volumes. One important concept must be introduced
here, and that is a compatibility volume. A compatibility
volume corresponds to old FDISK partitions. During WSeB installation, the
installer automatically converts all existing partitions to compatibility
volumes. This conversion technically means that the installer writes a
special block of LVM data to the sector following the partition table.
OSes other than WSeB won't see any difference at all. It is however necessary
to manage all partitions/volumes exclusively with LVM after this
conversion.
I've captured several screenshots
of LVM and LVMGUI to give users unacquainted with LVM some idea of what
they can expect. First, there's the logical view of LVM (see screenshot).
Now there's the physical
view of the same system (see screenshot).
And finally a glance at LVMGUI (see screenshot).
It looks pretty cool but takes ages to start. Personally I prefer the VIO
version. Disk 3 is a ZIP-100 by the way and G: is a FAT32 partition.
FAT, HPFS, FAT32 etc.
file systems can reside on either compatibility or LVM volumes; however, other
OSes will only be able to access them on compatibility volumes. JFS
on the other hand must be created on LVM volumes. Those were already
described above and enjoy all the flexibility of LVM, such as spanning
multiple physical disks or online expansion.
Each volume, compatibility
or LVM, represents a single drive letter on an OS/2 system. LVM however
is significantly more flexible than FDISK because the drive letters are
not assigned by a fixed algorithm. Instead, users can assign arbitrary
drive letters to volumes. The drive letters can even be changed at runtime,
but users have to understand the dangers before doing that. If you reassign
the drive letter of the boot volume, it doesn't require a genius to understand
that a system crash will be the most likely result.
JFS
OS/2 users often ask what exactly
the difference is between the various file systems available on OS/2. The
following table, taken almost verbatim from WSeB's Quick Beginnings book,
summarizes the most important differences between the file systems available
for WSeB from IBM.
Characteristic | Journaled File System (JFS) | 386 High Performance File System (386HPFS) | High Performance File System (HPFS) | FAT File System
Max volume size | 2TB (terabytes) | 64GB (gigabytes) | 64GB (gigabytes) | 2GB (gigabytes)
Max file size | 2TB (terabytes) | 2GB (gigabytes) | 2GB (gigabytes) | 2GB (gigabytes)
Allows spaces and periods in file names | Yes | Yes | Yes | No (8.3 format)
Standard directory and file attributes | Within file system | Within file system | Within file system | Within file system
Extended Attributes (64KB text or binary data with keywords) | Within file system | Within file system | Within file system | In separate file
Max path length | 260 characters 1) | 260 characters | 260 characters | 64 characters
Bootable | No 2) | Yes | Yes | Yes
Allows dynamic volume expansion | Yes | No | No | No
Scales with SMP | Yes | No | No | No
Local security support | No | Yes | No | No
Average wasted space per file | 256 to 2048 bytes | 256 bytes | 256 bytes | 1/2 cluster (1KB to 16KB)
Allocation information for files | Near each file in its i-node | Near each file in its FNODE | Near each file in its FNODE | Centralized near volume beginning
Directory structure | Sorted B+tree | Sorted B-tree | Sorted B-tree, must be searched exhaustively | Unsorted linear
Directory location | Close to files it contains | Near seek center of volume | Near seek center of volume | Root directory at beginning of volume; others scattered
Write-behind (lazy write) | Optional | Optional | Optional | Optional
Maximum cache size | Physical memory available | Physical memory available | 2MB | 14MB
Caching program | None (parameters set in CONFIG.SYS) | CACHE386.EXE | CACHE.EXE | None (parameters set in CONFIG.SYS)
LAN Server access control lists | Within file system | Within file system | In separate file (NET.ACC) | In separate file
1) JFS stores file and directory
names in Unicode. This allows JFS to always maintain proper sort order,
regardless of active codepage.
2) This is not a permanent
limitation; it is only that no one has written a JFS micro- and mini-IFS yet.
It might perhaps interest
some users that JFS also seems to have built-in support for DASD
limits. I have however never tried to use this feature. DASD limits, aka the
Directory Limits feature of LAN Server, allow administrators to control
how much space a directory can take, effectively enabling them to limit
the disk space usage of users. Previously this feature only worked on HPFS386
volumes. Obviously this is of no use to home users who have all their disk
space for themselves but it can be very useful for system administrators.
JFS Utilities
WSeB comes with several new
JFS-specific utilities, in addition to the usual ones like CHKDSK and FORMAT.
I'll only give a quick overview of them here; the important ones are documented
in the Command Reference.
- DEFRAGFS - can be used to defragment and reorganize a JFS volume. It is
  similar in spirit to equivalent FAT or HPFS utilities. It should be noted
  that just like HPFS, JFS tries not to fragment files. However, especially
  on nearly full volumes, this is not always possible. In addition to
  defragmenting files, DEFRAGFS will try to rearrange internal JFS
  structures by placing certain pieces of data physically close to each
  other to speed up disk access. DEFRAGFS is designed to be run in the
  background.
- EXTENDFS - after enlarging an LVM volume, this utility must be used to
  tell the JFS file system that it should take up all the extra space now
  available.
- CACHEJFS - not documented in the Command Reference, this utility can be
  used to query the settings of the JFS cache and set its lazy writer
  parameters.
- CHKLGJFS - again undocumented. This is a diagnostic tool and will show a
  formatted log of the last (or one before last) checkdisk process. Not
  very useful to normal users.
In addition to the above utilities
that are supplied with WSeB, I also managed to build several extra utilities
from the OpenJFS sources thanks to invaluable help from several friends.
Those are not available publicly in binary form to my knowledge, though
I could probably e-mail them to interested readers - but beware, these
are for experts only and not guaranteed to work!
- LOGDUMP - as the name suggests, this tool dumps formatted contents of the
  current JFS log (journal) to a file.
- CSTATS - lists current statistics of the JFS cache.
- XPEEK - perhaps the most useful of the bunch, this one is the closest
  thing to a JFS disk editor I've seen. This utility lets users dump and
  optionally modify various internal JFS structures. It has a very crude
  interface but it worked for me. Needless to say, this utility is
  extremely dangerous and you can easily destroy your data if you don't
  know exactly what you're doing.
Conclusion
I have deliberately skipped
some of the more advanced and less widely used LVM/JFS concepts. Interested
readers will find more in the books and files I listed in the reference
section. I hope I managed to present the features and benefits of LVM
and JFS in a clear and concise manner. I believe these two pieces of software
brought/will bring new levels of flexibility, manageability and reliability
to WSeB users and, shortly, to all eCS users. Don't be afraid of them!
Parting note: Everything
said here about WSeB will equally apply to eCS.
Glossary of Terms:
- Partition - a portion of physical hard disk space. A hard disk may
  contain one or more partitions. Partitions are defined by PC BIOS and
  described by partition tables stored on a hard drive. Every PC OS
  understands partitions.
- Volume - a logical concept which hides the physical organization of
  storage space. A compatibility volume directly corresponds to a partition
  while an LVM volume may span more than one partition on one or more
  physical disks. A volume is seen by users as a single drive letter. Only
  WSeB and eCS understand LVM volumes.
- DASD - Direct Access Storage Device. A term often used by IBM instead of
  'hard disk' to confuse mere mortals.
Unless otherwise noted, all content on this site is Copyright © 2004, VOICE