Article: Two approaches to shared library support for uClinux/uClibc
Jul 15, 2002, by LinuxDevices Staff, from the LinuxDevices Archive
Foreword: Two companies (SnapGear and RidgeRun) recently announced dynamic linking support for uClinux using two different approaches. In this whitepaper, SnapGear CEO Rick Stevenson discusses various practical issues from both technical and legal perspectives, and compares and contrasts the two companies' differing approaches. (Note: material in this paper is based on email discussions between Rick Stevenson of SnapGear and Dan George of RidgeRun, moderated by Rick Lehrbaum of LinuxDevices.com.)
by Rick Stevenson
What is uClinux?
uClinux is a set of patches to conventional Linux for MMUless microprocessors (e.g. ARM7, Motorola ColdFire). MMUless processors are quite common in deeply embedded systems because of their lower unit cost, a major factor in embedded product development, where component price is crucial. Prior to uClinux, a variety of proprietary or homegrown executives were used, but all lacked the advantages of the Linux API, which provides a uniform, POSIX-style application interface.
What is the big deal about MMUless microprocessors and how does uClinux help?
Because processors without an MMU (Memory Management Unit) lack hardware memory management and protection, conventional Linux cannot run on them. uClinux allows a wealth of open source software to be immediately ported to these microprocessors and provides a high level of application stability.
Unfortunately, one of the key features lacking in uClinux was dynamic linking: all applications had to be statically linked with their libraries, resulting in large firmware images and wasted space. The addition of dynamic linking removes a big hurdle in the race to keep code size down and increase functionality in deeply embedded devices.
What are the advantages of shared libraries and why are they controversial?
Program and data storage in an embedded device are often limited to a highly constrained combination of Flash ROM and conventional RAM — space is at a constant premium. The more memory that can be saved, the more room there is for additional application functionality and capacity. Shared libraries mean only one copy of the library code need be present rather than an individual copy for each executable.
There are other advantages, such as version control, in that a single update of a shared library will update all executables simultaneously. There is often a 'double dipping' advantage to the space savings — that is, implementations that unpack a compressed firmware Flash image into RAM for execution save both ways.
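The space argument above can be made concrete with a little arithmetic. The following is a minimal sketch in C, not taken from either company's code: the function names and the sizes used in the usage note are hypothetical. It also assumes each statically linked program carries a full copy of the library, which overstates the saving, since a static link normally pulls in only the objects a program needs.

```c
#include <stddef.h>

/* Total bytes when every executable carries its own statically linked
 * copy of the library. app_sizes[] holds each program's own code size
 * (hypothetical figures supplied by the caller). */
size_t static_footprint(const size_t *app_sizes, size_t n_apps, size_t lib_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n_apps; i++)
        total += app_sizes[i] + lib_size;   /* one library copy per app */
    return total;
}

/* Total bytes when the library is stored once and shared by all apps. */
size_t shared_footprint(const size_t *app_sizes, size_t n_apps, size_t lib_size)
{
    size_t total = lib_size;                /* single shared copy */
    for (size_t i = 0; i < n_apps; i++)
        total += app_sizes[i];
    return total;
}
```

For three hypothetical programs of 40000, 25000 and 10000 bytes and a 200000-byte library, the static total is 675000 bytes and the shared total is 275000 bytes. Under this simplified model the saving is always (number of programs - 1) times the library size.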
SnapGear engineers have previously contributed XIP (eXecute In Place) technology (see SnapGear Technical Bulletin #8), which went some way to reducing memory usage by allowing the text segment to be shared by simultaneous executing images, along with advanced memory management (see SnapGear Technical Bulletin #2 and Technical Bulletin #11).
GPL 'Tainting'
The GNU Open Source license agreement, the General Public License (GPL), is a double-edged gift. All open source code released under the GPL is free to use for both noncommercial and commercial purposes, as long as the license is preserved and source to the application is provided to those who receive the application. All extensions to the code and additions that are linked in are also subject to the GPL. This latter condition is often referred to as “GPL Tainting”; that is, by using GPL'ed code your own source also becomes GPL code.
There has been considerable discussion around the issue of whether code that is linked to GPL software dynamically, rather than statically, needs to be GPL, because in the former case the linking only takes place at run time rather than at compile time. Although there is a special exception in the case of the Linux kernel (made by Linus himself), in general dynamic linking doesn't get around the linking issue — and the GNU lawyers take a dim view of anyone attempting to navigate around the spirit and letter of the GPL license agreement. In this article Jerry Epplin says: “If, at execution time, your work is linked with a GPL work, it is a derived work. Note that it does not matter whether the linking is static or dynamic, so making use of a GPL-licensed shared library creates a work derived from the library.”
On the other hand, linking to code released under the GNU Lesser General Public License (LGPL) is fine, regardless of whether dynamic or static linking is used, as the LGPL is a less restrictive license agreement. Note, however, that it only takes one introduced fragment of GPL code to render everything else GPL even if it started as LGPL.
Two Philosophies
The RidgeRun announcement indicates that their work was focused on dynamic linking of libraries for the sake of being able to combine proprietary code/drivers with LGPL/GPL software in an embedded system. RidgeRun also wanted to be able to satisfy customers migrating from a conventional Linux background. RidgeRun was driven to implement shared libraries by a customer who could not use Linux if it meant releasing, or even offering to release, various parts of their source code. Addressing this issue as the primary goal, they also believed that a strong draw to Linux would be its support for various applications. Linux is a general purpose OS, and their customers liked the idea that they wouldn't have to be hard core embedded programmers to use it. Thus, they decided that they would try to solve the problem in a standard 'non-embedded' Linux way.
The SnapGear announcement, on the other hand, indicated that the focus was on using shared libraries as a means to eliminate redundant code and thereby reduce the software memory footprint requirements (both ROM and RAM) required for typical embedded applications. This would appeal to typical embedded system developers.
History
Both companies saw the need for shared library support around the same time, but were not aware of each other until late in their projects.
RidgeRun attempted to solicit community support in October of 2001, but to no avail. After much 'Googling', talking with such 'pundits' as Erik Anderson, Phil Blundell, Ralph S., and others, they struck out on their own. No one seemed to be working on it at that time. In hindsight, they noted that they should have expanded the search outside the ARM community and touched base with SnapGear. They hadn't really considered shared libraries as a uClinux problem but rather as a uClibc/binutils problem.
Similarly, at SnapGear the development team was very focused on the Motorola ColdFire community. A SnapGear engineer, David McCullough, became aware that RidgeRun was also working on shared libraries and touched base with them to see if there were any cooperative opportunities. As both teams had nearly finished, and had taken quite different approaches, it was agreed that there was little benefit in trying to reconcile the two projects. By the time RidgeRun heard from David, they were done with everything but the announcement, which they made shortly after. Both companies agreed that it was too bad they hadn't hooked up sooner, as they would have happily worked together.
RidgeRun's Approach
The RidgeRun approach uses ELF format files and something very close to the vanilla Linux approach. This provides a degree of comfort for non-embedded developers, but does incur additional space requirements (ELF headers, symbol table information).
RidgeRun chose ELF so that they could leverage the existing ld.so already supplied by uClibc. The size difference was within their budget, and had they run into a size problem they'd have put more effort into minimizing ELF overhead. It is actually pretty small already.
SnapGear's Approach
The SnapGear approach uses the uClinux flat file format. It provides a very space-efficient solution, but does introduce some limitations (which SnapGear doesn't believe are serious in an embedded environment).
SnapGear didn't have to write a new ld.so to support flat files. The applications are statically linked against the shared libraries, so there is no need for ld.so at all.
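The limits quoted in the comparison below (16MB per library, at most 255 shared libraries per application) are consistent with packing a library identifier and an offset into a single 32-bit word, with 8 bits for the identifier and 24 bits for the offset. The article does not spell out the encoding, so the following C sketch is an assumption for illustration only; the macro and function names are hypothetical and are not taken from the uClinux sources.

```c
#include <stdint.h>

/* Assumed layout: top 8 bits = library id, low 24 bits = offset. */
#define LIB_ID_SHIFT    24u
#define LIB_OFFSET_MASK 0x00FFFFFFu   /* 2^24 bytes = 16MB per library */
#define MAX_LIB_ID      255u          /* largest id that fits in 8 bits */

/* Pack a library id and an offset into one 32-bit reference.
 * Returns 0 on success, -1 if either value does not fit. */
int lib_ref_pack(uint32_t lib_id, uint32_t offset, uint32_t *ref)
{
    if (lib_id > MAX_LIB_ID || offset > LIB_OFFSET_MASK)
        return -1;
    *ref = (lib_id << LIB_ID_SHIFT) | offset;
    return 0;
}

/* Recover the library id from a packed reference. */
uint32_t lib_ref_id(uint32_t ref)
{
    return ref >> LIB_ID_SHIFT;
}

/* Recover the offset within the library from a packed reference. */
uint32_t lib_ref_offset(uint32_t ref)
{
    return ref & LIB_OFFSET_MASK;
}
```

Under this assumed layout, library 3 at offset 0x1234 packs to 0x03001234. Because the identifier is fixed when the application is linked, a scheme like this needs no symbol lookup at load time, which fits the statement that no ld.so is required.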
Comparisons
The RidgeRun approach . . .
- based on ELF; the shared libs are in fact standard ELF format. Programs and statically linked libs are still flat (see below).
- currently for ARM7, believed portable to other processors
- less space-efficient in some situations, although they do strip out unused symbol table entries, etc. and this could easily turn into a long debate with lots of byte counting. (The best way to resolve this question would be to pick one or more example apps and look at the approximate percentage overhead to get a rough feel for relative space efficiency.)
- file format
- thunks
- no GOT (Global Offset Table) required. SnapGear was already carrying a GOT (to avoid data segment size limits on the ColdFire architecture), so the additional space is insignificant in SnapGear's case. It may not be on ARM7.
- single pic base eliminates PIC load on every global function
- No library indices or corresponding tables
- requires source code changes for callouts. More specifically, libraries that invoke callbacks must take care to save client PIC info when the function pointer is passed into the library. RidgeRun thinks this problem is solvable but wasn't able to put the time into it. They accepted this limitation with much trepidation.
- requires compiler patching because of a problem with PIC code trying to address function pointers r9. Binutils also needed work. Changes were in ld and ld.so to support the thunk code. Programs that make use of shared libs must be in ELF object file format; other programs may be in flat. RidgeRun fixed a number of bugs as well because they were running through existing blocks of code under new circumstances, such as combining command line options in a new way.
- more flexible platform for future dynamic linking changes, with no need to keep track of global identifiers for shared libraries. Not much of an issue for hard core embedded use, but could get messy over time. Might not be a problem in the lifetime of MMU-less processors.
- RidgeRun shared libraries run XIP.
- Function call overhead to execute thunk for exported functions. However, not all global functions in the library need a 'thunk'. No overhead when these 'thunked' functions are called from within the library.
The SnapGear approach . . .
- based on flat file format
- currently for Motorola ColdFire and 68k, highly portable to other processors
- very space-efficient
- some limitations, e.g. max size of 16MB per library, and a max of 255 shared libs linked into one app. Apps must be re-linked if a library changes (the build system does this automatically), which could get messy when apps come from various suppliers. Again, not an issue for the hard core embedded case. We generate a runtime error when running against a newer library. You don't need source code to relink, so third-party components could be supplied as object files.
- no source code changes needed
- requires compiler changes but not to ld and ld.so. The only changes required were a small mod to the compiler and changes in elf2flt.
- programs run XIP and the libraries do so as well
- on the ColdFire processor each and every global library function includes at most three extra instructions — two in the prolog and one in the epilog. However, most functions don't need separate save/restore instructions (movem deals with this) so the overhead is generally one instruction. Better still, not all functions need the data segment stuff setup and then there is no overhead.
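The callout limitation listed for the RidgeRun approach (a library that invokes a callback must save the client's PIC info when the function pointer is passed in) can be sketched in portable C. This is an illustration of the idea only, not RidgeRun's code: the PIC base register is simulated with a global variable, and every name below is hypothetical.

```c
#include <stddef.h>

/* Stand-in for the PIC base register. Real code keeps this in a CPU
 * register; a global is used here so the sketch is portable C. */
static void *pic_base;

typedef int (*callback_fn)(int);

/* A function pointer bundled with the data base of the module that owns it. */
struct bound_callback {
    callback_fn fn;
    void *owner_base;
};

/* Client side: capture the current base alongside the function pointer
 * at the moment the pointer is handed to the library. */
struct bound_callback bind_callback(callback_fn fn)
{
    struct bound_callback cb = { fn, pic_base };
    return cb;
}

/* Library side: switch to the owner's base for the duration of the
 * call, then restore the library's own base. */
int invoke_callback(const struct bound_callback *cb, int arg)
{
    void *saved = pic_base;
    pic_base = cb->owner_base;
    int result = cb->fn(arg);
    pic_base = saved;
    return result;
}

/* Per-module data that is reached through the base. */
struct module_data { int scale; };

/* Example client callback: reads client data through the current base. */
int client_scale(int x)
{
    const struct module_data *d = pic_base;
    return x * d->scale;
}

/* Demo: the client (scale 3) registers a callback and the library
 * (scale 100) invokes it. Returns the callback result, or -1 if the
 * library's base was not restored afterwards. */
int demo(void)
{
    struct module_data client = { 3 }, library = { 100 };

    pic_base = &client;
    struct bound_callback cb = bind_callback(client_scale);

    pic_base = &library;                 /* control is now inside the library */
    int result = invoke_callback(&cb, 7);

    int restored = (pic_base == &library);
    pic_base = NULL;                     /* do not leave a dangling pointer */
    return restored ? result : -1;
}
```

In this sketch `demo()` returns 21, because the callback sees the client's data. Calling `cb.fn(7)` directly from library context would read the library's data instead, which is the kind of failure the source code changes are meant to prevent.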
The objectives of the two teams were different. The RidgeRun mechanism is flexible and familiar for non-embedded developers. On the other hand, SnapGear's primary goal was to pack more features into a small footprint — it's a 'hard core' embedded solution in the same vein as our previous contributions to memory management, PIC enhancements, and eXecute In Place which have allowed uClinux developers to cram applications into smaller memory footprints, or just make more efficient use of existing hardware.
RidgeRun viewed uClinux as first and foremost a Linux for MMU-less processors and secondly, a small footprint version of Linux. If the hardware is MMU-less then it is probably memory constrained as well. Their approach took both of these considerations into account but the priority was on Linux. The outcome was that applications do not have to be re-deployed whenever libraries are re-deployed. The cost in terms of size was small or non-existent.
SnapGear has built a distribution of kernel and tools that is available for public download from uclinux.org.
It would be worthwhile for the RidgeRun changes to be integrated into uClibc and binutils if this hasn't already been done — although SnapGear's solution meets the needs of the hard-core community.
From a legal perspective, RidgeRun's approach is possibly best for systems that must incorporate proprietary software components. Specifically, the RidgeRun shared library approach supports item 6.b. of the LGPL. Item 6.b. is the most practical option for many commercial products because there is no requirement to publish client source code or object files. The object files, of course, are included in the shipped product. SnapGear's solution requires re-linking and therefore doesn't support 6.b. The other options under Item 6 require more effort by the manufacturer to maintain compliance with the license.
Technically, RidgeRun's approach is good for devices that might have programs from various sources (ISVs) and running various libraries. The flexibility of their approach makes it easier to update pieces of the system without breaking others. SnapGear's approach requires complete re-deployment of all applications using an updated shared library. Under many circumstances this is not a problem. For example, SnapGear's own VPN Firewall Appliances are field upgradable and the firmware is built by SnapGear themselves. Similarly, SnapGear's OEM customers have full access to SnapGear's toolchains.
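The comparison above notes that SnapGear's scheme reports a runtime error when an application runs against a newer library than the one it was linked with. One way such a check could work is to record a build stamp for each library at link time and compare it when the program is loaded. The C sketch below assumes that mechanism; the structure, field names and function are hypothetical, and the article does not say how the real check is implemented.

```c
#include <stdint.h>

/* Hypothetical record of a library as seen at link time or on the device. */
struct lib_stamp {
    uint32_t lib_id;       /* which shared library this is */
    uint32_t build_time;   /* when the library image was built */
};

/* Returns 0 if the application may run against the installed library,
 * -1 if the ids differ or the installed library is newer than the one
 * the application was linked against. Accepting an older library is an
 * assumption of this sketch. */
int check_library(const struct lib_stamp *linked,
                  const struct lib_stamp *installed)
{
    if (linked->lib_id != installed->lib_id)
        return -1;
    if (installed->build_time > linked->build_time)
        return -1;
    return 0;
}
```

A failed check is the point at which the firmware build would need to re-link the application, which is why this approach suits products where one party builds the whole image.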
The SnapGear approach may, in the final analysis, result in a smaller memory footprint. RidgeRun expects this difference to be small, but it may make a difference in systems with 2MB or less of system RAM/ROM. RidgeRun's advantages may not be as apparent in such systems.
SnapGear's approach provides a good mechanism for saving on memory resource requirements. RidgeRun's approach also conserves memory resources and also addresses legal and flexibility issues faced by manufacturers considering uClinux for commercial products, as long as a clear distinction is made between GPL and LGPL source; otherwise there is no advantage.
Developers are encouraged to review both approaches and choose what is best suited to their specific circumstances.
About the author: Rick Stevenson is CEO of SnapGear Inc. He has been involved in the UNIX and Open Source communities for over 20 years and is an Adjunct Professor in the School of Information Technology and Electrical Engineering at the University of Queensland, a leading Australian university. Stevenson is one of the original founders of SnapGear where he served initially as VP of Engineering and CTO, and has previously held senior roles within companies such as IBM, DASCOM, and Pyramid Technology.
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.