Linux powers low-cost petabyte-level storage
Jun 22, 2005, by Henry Kingman, from the LinuxDevices Archive
Capricorn Technologies says it has completed delivery of more than a petabyte of storage to the Internet Archive, a non-profit organization based in San Francisco that creates periodic snapshots of the Internet. Capricorn's PetaBox products are based on Via mini-ITX boards running Debian or Fedora Linux, and deliver the lowest cost per gigabyte and cost of ownership available, the company claims.
Capricorn started as a project within the Internet Archive (IA) to develop inexpensive storage devices based on Linux and commodity PC components. The project was spun out in June of 2004, resulting in the formation of Capricorn Technologies. The company has since supplied its PetaBox products to a number of universities, research centers, libraries, and national archives, both within the US and overseas, according to CEO C.R. Saikley. The IA remains Capricorn's largest customer, Saikley says.
The IA's PetaBox installation
The IA is an online digital library with very large collections of audio, video, texts, web sites, and software. For example, it claims to host footage of more than 20,000 live concerts, and snapshots of the Internet dating back to 1996, accessible through the well-known Wayback Machine, which currently hosts over 40 billion web pages.
The IA's PetaBox installation comprises about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes. Despite its size, the installation draws only about 50 kW of power, Saikley says, and is maintained by one full-time and one half-time person, who spend a disproportionate amount of their time working on older systems. “We've improved reliability considerably,” Saikley claims.
The IA systems boot Debian or Fedora Linux from a central PXE boot server, and are remotely monitored using nagios. “The beauty of nagios is that it is so readily extensible,” says Saikley. “If the register exists on the board, nagios can figure out how to read it. We typically provide hard disk temperatures, cpu temperatures, ping response, capacity utilization, that sort of thing.”
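The kind of check Saikley describes can be sketched as a small Nagios-style plugin. The sketch below assumes only the standard plugin convention (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, with optional performance data after a `|`). The function name, label, and temperature thresholds are hypothetical, and the article does not describe the IA's actual checks.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(label, temp_c, warn_c, crit_c):
    """Return (exit_code, status_line) for one temperature reading.

    temp_c is the reading in degrees Celsius, or None if the sensor
    could not be read. warn_c and crit_c are hypothetical thresholds.
    """
    if temp_c is None:
        return UNKNOWN, f"{label} UNKNOWN - no reading available"
    if temp_c >= crit_c:
        code, state = CRITICAL, "CRITICAL"
    elif temp_c >= warn_c:
        code, state = WARNING, "WARNING"
    else:
        code, state = OK, "OK"
    # Text after '|' is performance data in the form label=value;warn;crit
    line = f"{label} {state} - {temp_c}C | temp={temp_c};{warn_c};{crit_c}"
    return code, line


# Hypothetical reading; a real plugin would query the drive or board
# sensor, print the line, and pass the code to sys.exit().
code, line = check_temperature("DISK_TEMP", 41, warn_c=45, crit_c=55)
print(code, line)
```

A real deployment would register one such check per sensor, which is the extensibility Saikley refers to.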
The PetaBox can also be managed by Linux cluster management software, according to Saikley.
The PetaBox
Capricorn claims that its PetaBox storage devices provide the lowest ownership cost and cost-per-GB available. The company offers 40- and 64-terabyte models, each consisting of a rack of 40 1U systems. The 1U systems are available in 1- and 1.6-terabyte models that are essentially the same but for hard-drive capacity. Both systems run Debian or Fedora Linux on Via mini-ITX motherboards.
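The two rack capacities follow directly from the per-system figures. As a quick check, assuming decimal units (1 TB = 1000 GB) and raw capacity before any filesystem overhead:

```python
SYSTEMS_PER_RACK = 40  # 1U systems per rack, from the article


def rack_capacity_tb(system_capacity_gb, systems=SYSTEMS_PER_RACK):
    """Raw rack capacity in decimal terabytes (1 TB = 1000 GB)."""
    return systems * system_capacity_gb / 1000


# The two 1U models: 1 TB and 1.6 TB, expressed in GB to keep the math exact.
for gb in (1000, 1600):
    print(f"{gb} GB systems: {rack_capacity_tb(gb):.0f} TB per rack")
```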
The PetaBox is based on Via mini-ITX motherboards
Each 1U system includes a Via M-10000 mini-ITX board with a 1GHz Via C3 processor and 512MB of RAM, expandable to 1GB. Each includes four Hitachi ATA hard drives with 8MB caches and a claimed 8.5ms of typical latency.
Saikley says Capricorn did extensive testing to qualify hard drives for capacity, reliability, and cost, finally choosing Hitachi. “Although Hitachi does not offer an 'enterprise' or '24×7' SATA drive, our testing found their drives to be as reliable as anything out there, enterprise distinction or not,” Saikley said.
The 1U PetaBox units (shown stacked in a rack, on the right) include all I/O on the front panel, reducing the need to access the back panel while maximizing its cooling capacity. Drives are housed in EZ-Latch bays, and can be changed easily once the 1U unit has been removed from the rack and its cover taken off. “We experimented with hot-swap, but found it caused as many problems as it solved. It actually induced failures, so we backed away. But you still have to make it easy to replace disks,” Saikley said.
Similarly, Saikley says Capricorn tried RAID (redundant array of inexpensive disks) and then backed away from it, opting instead to recommend JBOD (just a bunch of disks) configurations to most of its clients. “We had a painful experience with RAID 5, which does not scale well to petabyte-level storage,” Saikley notes.
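One simple difference between the two layouts is raw usable capacity: RAID 5 gives up one drive's worth of space per array to parity, while JBOD exposes every drive in full. The sketch below illustrates only that trade-off. It does not model the scaling problems Saikley alludes to, which the article does not detail. The 400 GB drive size is an assumption, derived from four drives in a 1.6-terabyte system.

```python
def usable_gb(drive_gb, drives, layout):
    """Usable capacity of one system under a given disk layout."""
    if layout == "jbod":
        # Every drive is exposed independently, so no space is lost.
        return drive_gb * drives
    if layout == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of capacity holds distributed parity.
        return drive_gb * (drives - 1)
    raise ValueError(f"unknown layout: {layout}")


# Assumed: four 400 GB drives per 1.6 TB system.
for layout in ("jbod", "raid5"):
    print(layout, usable_gb(400, 4, layout), "GB")
```

In exchange for that capacity, RAID 5 lets an array survive a single drive failure, whereas a JBOD configuration has to get its redundancy elsewhere, for example by keeping copies on other systems.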
PetaBox options include a 16 x 2 LCD display and gigabit Ethernet (10/100 is standard). The PetaBox is configured by default to boot from a USB key, then from a PXE boot server, and finally from the local hard drive. However, boot order can easily be changed in the BIOS.
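The default boot order amounts to a first-match fallback list. A minimal sketch of that logic follows. The device names and the function are hypothetical illustrations, not the BIOS's actual interface.

```python
# Default order from the article: USB key, then PXE server, then local disk.
DEFAULT_BOOT_ORDER = ["usb", "pxe", "disk"]


def pick_boot_device(available, order=DEFAULT_BOOT_ORDER):
    """Return the first device in `order` that is currently bootable.

    `available` is the set of devices that responded; returns None
    if nothing in the order can be booted.
    """
    for device in order:
        if device in available:
            return device
    return None


# No USB key inserted, so the system falls through to the PXE server.
print(pick_boot_device({"pxe", "disk"}))
```

Putting the USB key first lets a technician override the network boot on a single machine without touching the PXE server.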
Each 1.6-terabyte 1U system draws 80 watts of power (typical), or about 50 watts per terabyte, according to Capricorn. Each measures 17.25 x 18 x 1.72 inches (43.8 x 45.7 x 4.4 cm), and weighs 18 lbs, 12 oz (8.5 kg).
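The power figure can be checked with simple arithmetic. The per-rack total below is a derived estimate that assumes 40 systems at the typical draw and ignores switches and other rack equipment. It is not a number Capricorn quotes.

```python
def watts_per_tb(system_watts, system_capacity_gb):
    """Power draw per decimal terabyte of raw capacity."""
    return system_watts * 1000 / system_capacity_gb


def rack_watts(system_watts, systems=40):
    """Estimated draw of a full rack, counting only the 1U systems."""
    return system_watts * systems


print(watts_per_tb(80, 1600), "W per TB")
print(rack_watts(80), "W per 64 TB rack (systems only)")
```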
According to Saikley, Capricorn is currently positioning itself for increased production levels, following recent improvements to its manufacturing process. “We have been constantly improving the efficiency and effectiveness of our manufacturing processes. By positioning ourselves for increased production levels, we are better able to pursue our relentless commitment to driving the cost of storage down.”
Availability
The PetaBox is available now, priced at approximately $2/GB, in 40- and 64-terabyte capacities. Further details are on the company's website.
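At the quoted rate, approximate rack prices work out as follows. This is a rough estimate that assumes decimal units (1 TB = 1000 GB) and applies the approximate $2/GB figure uniformly, which Capricorn's actual price list may not do.

```python
PRICE_PER_GB_USD = 2  # approximate figure from the article


def rack_price_usd(capacity_tb, price_per_gb=PRICE_PER_GB_USD):
    """Approximate rack price, assuming 1 TB = 1000 GB."""
    return capacity_tb * 1000 * price_per_gb


for tb in (40, 64):
    print(f"{tb} TB rack: about ${rack_price_usd(tb):,}")
```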
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc.