Article: About the built-in Flash memory support in kernel 2.4.x
Feb 13, 2001, by Rick Lehrbaum, from the LinuxDevices Archive
Although I had heard that Linux kernel 2.4.0 now contains Memory Technology Device (MTD) support directly within the kernel, I wasn't clear about how extensive the built-in support is. I therefore contacted MTD project leader David Woodhouse and asked him to clarify the “bigger picture” of 2.4.0's built-in support for devices such as the M-Systems DiskOnChip.
Before presenting Woodhouse's reply, a brief introduction to the MTD is in order . . .
What's a Memory Technology Device?
According to the MTD project homepage . . .
- The Memory Technology Device (MTD) subsystem for Linux provides generic support for various types of memory devices, especially Flash devices such as the M-Systems DiskOnChip. The aim of the system is to make it simple to provide a driver for new hardware, by providing a generic interface between the hardware drivers and the upper layers of the system. Hardware drivers need to know nothing about the storage formats used, such as FTL (Flash Translation Layer) or FFS2; they only need to provide simple routines for read, write, erase, and query. Presentation of the device's contents to the user in an appropriate form is handled by the upper layers of the system.
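As an illustration of that division of labor, the sketch below models a hardware driver as a small table of read, write, erase, and query operations that upper layers call without knowing anything about the chip. Everything here is hypothetical and simplified: the names `flash_ops` and `flash_dev` are invented for this example and are not the kernel's actual MTD structures, and the "hardware" is a RAM buffer that mimics NOR flash (an erase sets every bit in a block to 1, and programming can only clear bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flash_dev;

/* Hypothetical driver interface: the hardware driver supplies only these
 * primitives; upper layers never touch the chip directly. */
struct flash_ops {
    int (*read)(struct flash_dev *d, size_t off, void *buf, size_t len);
    int (*write)(struct flash_dev *d, size_t off, const void *buf, size_t len);
    int (*erase)(struct flash_dev *d, size_t block);
    size_t (*erase_size)(const struct flash_dev *d); /* the "query" part */
};

#define RAM_BLOCK_SIZE 64
#define RAM_BLOCKS 4

struct flash_dev {
    const struct flash_ops *ops;
    uint8_t mem[RAM_BLOCK_SIZE * RAM_BLOCKS]; /* simulated flash array */
};

static int ram_read(struct flash_dev *d, size_t off, void *buf, size_t len) {
    if (off + len > sizeof d->mem) return -1;
    memcpy(buf, d->mem + off, len);
    return 0;
}

/* Like NOR flash, programming can only clear bits (1 -> 0). */
static int ram_write(struct flash_dev *d, size_t off, const void *buf, size_t len) {
    const uint8_t *src = buf;
    if (off + len > sizeof d->mem) return -1;
    for (size_t i = 0; i < len; i++)
        d->mem[off + i] &= src[i];
    return 0;
}

/* An erase sets every bit in one erase block back to 1. */
static int ram_erase(struct flash_dev *d, size_t block) {
    if (block >= RAM_BLOCKS) return -1;
    memset(d->mem + block * RAM_BLOCK_SIZE, 0xFF, RAM_BLOCK_SIZE);
    return 0;
}

static size_t ram_erase_size(const struct flash_dev *d) {
    (void)d;
    return RAM_BLOCK_SIZE;
}

static const struct flash_ops ram_ops = {
    ram_read, ram_write, ram_erase, ram_erase_size
};

/* Initialise the toy device in the fully erased state. */
void ram_flash_init(struct flash_dev *d) {
    assert(d != NULL);
    d->ops = &ram_ops;
    memset(d->mem, 0xFF, sizeof d->mem);
}
```

An upper layer would simply call, say, `d.ops->erase(&d, 0)` and never need to know that the storage happens to be RAM.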
Here, then, is Woodhouse's clarification regarding the extent to which MTDs are supported by 2.4.x . . .
The MTD Big Picture
RL: I'd like to get the “bigger picture” of 2.4.0's MTD support. I was aware that there is “some” sort of MTD support in the new kernel, but I don't know how extensive it is.
Woodhouse: DiskOnChip support is now essentially fully implemented. That includes read/write, wear leveling (primitive but serviceable), and all. We support the DiskOnChip 2000 and Millennium, as well as the new DIMM version, because it's electrically the same as the 2000.
Note, however, that the DiskOnChip support contained in 2.4.0 hasn't been widely tested yet, so it should still be considered “beta” and used with caution, especially for read/write purposes. Read-only use, on the other hand, is quite usable for loading the system; after that, for serious requirements, the DiskOnChip should probably be accessed via M-Systems' loadable module for the time being. It's working fine for quite a few people, but there are still some bugs.
For the really large DiskOnChip devices, the allocation table deals with erase blocks in groups rather than individually, so that the metadata doesn't have to be so huge. It's much like the progression from FAT12 to FAT16 to FAT32 as the media get bigger. We don't yet support that, but it's trivial to do so. It's just that, while I didn't have any such devices to test, I was happier for the code to just say no. M-Systems sent me a batch of big DiskOnChips yesterday, so I have no excuse anymore. I expect it'll happen soon.
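The grouping trade-off is simple arithmetic. Assuming, purely for illustration, that the allocation table has a fixed maximum number of entries and that group sizes are powers of two (the interview does not give NFTL's actual limits), the number of erase blocks per table entry can be computed like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power-of-two number of erase blocks per
 * allocation-table entry such that the whole device fits in max_entries. */
uint64_t blocks_per_unit(uint64_t total_blocks, uint64_t max_entries) {
    assert(max_entries > 0);
    uint64_t group = 1;
    /* Double the group size until ceil(total_blocks / group) fits. */
    while ((total_blocks + group - 1) / group > max_entries)
        group *= 2;
    return group;
}

/* Number of table entries actually needed with a given group size. */
uint64_t table_entries(uint64_t total_blocks, uint64_t group) {
    assert(group > 0);
    return (total_blocks + group - 1) / group;
}
```

For example, with an arbitrary limit of 4096 entries, a device with 65536 erase blocks needs 16 erase blocks per entry, while one with 4096 blocks can still be tracked block by block.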
We also support Common Flash Interface (CFI) chips (Intel StrataFlash and various AMD chips support this interface) mapped directly to the CPU, in various combinations of bus width and interleave. There's a separation between the chip driver, which knows only how the chip responds to particular values placed on its data and address lines, and the 'map' driver, which knows the hoops the CPU has to jump through to actually address the chip.
Often, the 'map' driver just has to add a fixed offset to the address the chip driver requested, and can then access the chip directly in the CPU's physical address space. Sometimes, though, there are strange paging mechanisms and the like, so it's all kept separate.
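A minimal sketch of why the chip and map drivers are kept apart: the chip-driver side below works only in chip-relative addresses and calls through a map object. One hypothetical map adds a fixed base offset into a simulated physical address space; the other models a paged window, where only part of the chip is visible at a time and a page must be selected first (simulated here with a copy). All names are invented for illustration and do not correspond to the kernel's map-driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated hardware (hypothetical): the CPU's physical address space,
 * plus the storage of a chip that sits behind a paging register. */
static uint8_t phys[4096];
static uint8_t paged_chip[1024];

struct map {
    uint8_t (*read8)(struct map *m, size_t chip_addr);
    size_t base;        /* where the chip (or its window) appears in phys[] */
    size_t window_size; /* paged maps only: bytes visible at once */
    long cur_page;      /* paged maps only: selected page, -1 = none */
};

/* Linear map driver: just add a fixed offset. */
static uint8_t linear_read8(struct map *m, size_t chip_addr) {
    return phys[m->base + chip_addr];
}

/* Paged map driver: select the right page first (simulated by copying
 * that page of the chip into the window), then read through the window. */
static uint8_t paged_read8(struct map *m, size_t chip_addr) {
    long page = (long)(chip_addr / m->window_size);
    if (page != m->cur_page) {
        memcpy(phys + m->base, paged_chip + (size_t)page * m->window_size,
               m->window_size);
        m->cur_page = page;
    }
    return phys[m->base + chip_addr % m->window_size];
}

/* "Chip driver" side: knows only chip-relative addresses. */
void chip_read(struct map *m, size_t addr, uint8_t *buf, size_t len) {
    assert(m != NULL && m->read8 != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = m->read8(m, addr + i);
}

struct map make_linear_map(size_t base) {
    struct map m = { linear_read8, base, 0, -1 };
    return m;
}

struct map make_paged_map(size_t base, size_t window_size) {
    struct map m = { paged_read8, base, window_size, -1 };
    return m;
}
```

Only `chip_read` is shared by both setups; supporting a board with a different addressing scheme means writing a new map, not a new chip driver. Bounds checking is omitted for brevity.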
On top of the normal CFI flash devices, we support FTL, which has been in use for a long time on PCMCIA devices. Unless you live in the Free World, you're only allowed to use it on PCMCIA devices, though, so it's not that interesting. It emulates a block device and does wear leveling, just like NFTL (NAND Flash Translation Layer) does on the DiskOnChip.
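The core idea of a flash translation layer can be sketched briefly: a table maps each logical sector to a physical sector, a rewrite always goes to a fresh physical sector, and the old copy is merely marked obsolete so it can be reclaimed by a later erase. This model is hypothetical and heavily simplified (no reclamation of obsolete sectors, no power-loss recovery, no on-flash metadata); it is not the FTL or NFTL on-media format.

```c
#include <assert.h>
#include <string.h>

#define LOGICAL_SECTORS 4
#define PHYSICAL_SECTORS 16
#define SECTOR_SIZE 8

enum sec_state { SEC_FREE, SEC_VALID, SEC_OBSOLETE };

/* Hypothetical translation-layer state, kept in RAM for the sketch. */
struct xlat {
    int map[LOGICAL_SECTORS];                 /* logical -> physical, -1 = unmapped */
    enum sec_state state[PHYSICAL_SECTORS];   /* state of each physical sector */
    char data[PHYSICAL_SECTORS][SECTOR_SIZE]; /* simulated flash contents */
    int next_free;                            /* naive sequential allocator */
};

void xlat_init(struct xlat *t) {
    memset(t, 0, sizeof *t); /* every physical sector starts as SEC_FREE (0) */
    for (int i = 0; i < LOGICAL_SECTORS; i++)
        t->map[i] = -1;
}

/* Never overwrite in place: write to a free physical sector,
 * mark the previous copy obsolete, then update the map. */
int xlat_write(struct xlat *t, int lsec, const char *buf) {
    assert(lsec >= 0 && lsec < LOGICAL_SECTORS);
    if (t->next_free >= PHYSICAL_SECTORS)
        return -1; /* out of free sectors: a real layer would reclaim space */
    int p = t->next_free++;
    memcpy(t->data[p], buf, SECTOR_SIZE);
    t->state[p] = SEC_VALID;
    if (t->map[lsec] >= 0)
        t->state[t->map[lsec]] = SEC_OBSOLETE;
    t->map[lsec] = p;
    return 0;
}

/* Read through the map; buf must hold SECTOR_SIZE bytes. */
int xlat_read(const struct xlat *t, int lsec, char *buf) {
    assert(lsec >= 0 && lsec < LOGICAL_SECTORS);
    if (t->map[lsec] < 0)
        return -1; /* never written */
    memcpy(buf, t->data[t->map[lsec]], SECTOR_SIZE);
    return 0;
}
```

Because writes move to new locations instead of erasing in place, repeated writes to one logical sector are spread over the device, which is the basic source of wear leveling in this model.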
We also have a direct block device. This doesn't do any wear leveling and is unsafe to use in write mode: each time you write to a sector, it reads the whole erase block, erases it, and writes back the modified data (although it's a little sensible about grouping writes to the same sector). It's useful for development, though, if you want to ship a filesystem read-only.
But the really interesting thing is JFFS — the Journaling Flash File System from Axis. It does away with the silly block-device emulation and puts a real journaling filesystem directly on the flash chips. It's far more efficient than using a kind of journaling pseudo-filesystem to emulate a block device, then using a 'real' journaling filesystem on top of that emulated block device.
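The essential idea of putting a log-structured filesystem directly on flash can be sketched as follows: nothing is overwritten in place, every change is appended as a node with a version number, and a file's current contents come from its highest-versioned node. This toy model is hypothetical and omits nearly everything that makes JFFS practical (real node headers and CRCs, nodes that describe byte ranges rather than whole files, garbage collection, the mount-time scan); it illustrates the concept, not JFFS's on-flash format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 64
#define DATA_LEN 32

struct node {
    unsigned inode;      /* which file this node belongs to */
    unsigned version;    /* a higher version supersedes lower ones */
    char data[DATA_LEN]; /* whole-file contents (a simplification) */
};

struct flash_log {
    struct node nodes[LOG_CAPACITY];
    unsigned used;         /* nodes appended so far */
    unsigned next_version; /* global version counter */
};

/* Append-only write: the log is never rewritten in place. */
int log_write(struct flash_log *l, unsigned inode, const char *text) {
    assert(l != NULL && text != NULL);
    if (l->used == LOG_CAPACITY)
        return -1; /* full: a real filesystem would garbage-collect */
    struct node *n = &l->nodes[l->used++];
    n->inode = inode;
    n->version = ++l->next_version;
    strncpy(n->data, text, DATA_LEN - 1);
    n->data[DATA_LEN - 1] = '\0';
    return 0;
}

/* Read: scan the whole log and return the newest node for this inode. */
const char *log_read(const struct flash_log *l, unsigned inode) {
    const struct node *best = NULL;
    for (unsigned i = 0; i < l->used; i++)
        if (l->nodes[i].inode == inode &&
            (best == NULL || l->nodes[i].version > best->version))
            best = &l->nodes[i];
    return best ? best->data : NULL;
}
```

Older nodes for the same file stay in the log as obsolete data until space is reclaimed, which is why garbage collection is central to a real implementation.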
The JFFS in 2.4.0 works on NOR chips. In my CVS, there are drivers for some NAND chips and modifications to JFFS to make it work on those too. The intention is for JFFS2 to work on DiskOnChip directly.
Execute-in-place (XIP) support for read-only filesystems is planned; I know how to do it, and Linus reckons it ought to work. It's a 2.5 thing. At this time, the only thing I want to do for people who want read-write XIP is to take away their crackpipe :)
Did I miss anything? Oh yeah: the 2.4 CFI code handles only devices with a uniform erase block size, not the devices with 'boot blocks'. Current CVS fixes that for AMD chips, and now that it's been done once, it's trivial to do the same for Intel chips. Actually, we're planning to merge a lot of the code that is common to the two types.
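Boot-block chips have more than one erase block size, so finding the erase block that contains a given address is no longer a single division. A common way to describe such a layout is a list of regions, each with its own offset, block size, and block count, and the sketch below walks that list. The struct and function names are hypothetical; the kernel keeps similar offset/size/count information in `struct mtd_erase_region_info`, but this is not that code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct erase_region {
    uint32_t offset;     /* start of the region within the chip */
    uint32_t erase_size; /* size of each erase block in this region */
    uint32_t num_blocks; /* number of erase blocks in the region */
};

/* Find the erase block containing addr. On success, store the block's
 * start address and size and return 0; return -1 if addr is out of range. */
int find_erase_block(const struct erase_region *r, size_t nregions,
                     uint32_t addr, uint32_t *start, uint32_t *size) {
    assert(r != NULL && start != NULL && size != NULL);
    for (size_t i = 0; i < nregions; i++) {
        uint32_t end = r[i].offset + r[i].erase_size * r[i].num_blocks;
        if (addr >= r[i].offset && addr < end) {
            uint32_t idx = (addr - r[i].offset) / r[i].erase_size;
            *start = r[i].offset + idx * r[i].erase_size;
            *size = r[i].erase_size;
            return 0;
        }
    }
    return -1;
}
```

For a hypothetical bottom-boot layout of eight 8 KiB blocks followed by thirty-one 64 KiB blocks, address 0x3000 falls in the 8 KiB block starting at 0x2000, while 0x12345 falls in the 64 KiB block starting at 0x10000.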
Wear leveling support in MTD
RL: What about wear leveling — do you currently have much wear leveling built into the MTD? M-Systems makes a big point about the sophistication of their wear leveling — for example, that you couldn't “use up” the device within a system's expected real lifetime, even if you did nothing but try to wear out the DiskOnChip. How does the wear leveling in MTD compare?
Woodhouse: There are two layers here. First, there are MTD devices, which just provide a physical interface to read, write, and erase the chips. They simply do what they're told, so there's no wear leveling internally.
On top of those are MTD 'users', which present the devices to userspace in some form. The simplest is the mtdchar device, which just presents the underlying device directly to userspace as a character device. You read and write it, and use an ioctl to erase it. Then there are FTL and NFTL, which use a pseudo-filesystem on the actual MTD device to emulate a block device. NFTL is the one used on the DiskOnChip. It does some basic wear leveling; you don't need to be too clever. Just choose randomly, and it'll work out OK statistically. It could be a little better than it is at the moment, though: we're concentrating on getting the thing to work without corrupting the NFTL format before we worry about details like optimizing the wear leveling.
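The claim that random selection “works out OK statistically” is easy to illustrate: if each write takes a free erase block chosen uniformly at random, erase counts even out across the device without any history being kept. The function below is a hypothetical sketch of that policy using the C library's `rand()` (its modulo reduction is slightly biased, which does not matter here); it is not NFTL's actual allocation code.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a free erase block uniformly at random. is_free[i] != 0 marks
 * block i as available. Returns the chosen index, or -1 if none is free. */
int pick_free_block(const int *is_free, int n) {
    int nfree = 0;
    assert(is_free != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        if (is_free[i])
            nfree++;
    if (nfree == 0)
        return -1;

    int k = rand() % nfree; /* take the k-th free block (0-based) */
    for (int i = 0; i < n; i++)
        if (is_free[i] && k-- == 0)
            return i;
    return -1; /* not reached */
}
```

Calling it a few thousand times over eight free blocks and counting the picks gives roughly equal counts per block. A purely random choice among free blocks does nothing, however, to relocate static data sitting on blocks that are rarely freed; refinements of that sort are the kind of optimization the interview defers.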
There's also a simple 'mtdblock' device which does no wear-leveling. When you write a sector to the block device, it reads the whole erase block, changes the bits you asked it to, erases it and writes it back. Obviously it's not particularly safe to write to, but if you want to use it read/write in development, then ship it in read-only mode, it's OK.
The most useful MTD user code, though, is JFFS — and soon to be JFFSv2. These put a filesystem directly on top of the underlying MTD device. Although it currently pretends to use the mtdblock device, that's just so that it can get a handle on the underlying MTD device easily. Soon, we'll fix the mount code so it doesn't use the mtdblock device, and then we can actually remove all the crufty code in the kernel which is used to support block devices.
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc.