
ELJonline: Building a Minimal Glibc with Componentization

Nov 1, 2001, by LinuxDevices Staff, from the LinuxDevices Archive

Use a stripped-down C library to save space or budget for the size of glibc for compatibility? Now there's a third option: build a custom library from the original sources for the best of both.

Glibc componentization is a process for building a custom minimal set of the glibc C libraries, using only the objects required by a specific executable or group of executables. By minimizing the footprint of the libraries, resource-limited embedded targets can maximize the resources available for applications and storage. This article discusses the feasibility of componentizing glibc as well as the development of some custom analysis tools. With the help of these tools it was possible to build test executables successfully, each with a custom minimal version of libc.

Embedded systems typically have tighter resource constraints than desktop computers or servers, although they often are expected to perform similar functions such as serving web pages and storing important information. Therefore, the applications they run use much of the same functionality from the system libraries as their desktop and server counterparts. With a reduced expectation of expandability, it is logical to provide a minimal subset of the same libraries.

Independent embedded versions of the system libraries do exist. While these libraries greatly reduce the footprint, they sacrifice functionality (such as pthreads), do not guarantee complete API compatibility with a complete glibc and must be maintained separately.

There are several advantages to building a minimal library from the source of the complete library. The primary advantage is a guaranteed equivalent API. Because there is only one source tree to maintain, whenever glibc is updated so are all the minimal libraries built from glibc. For example, developers don't have to concern themselves with whether or not the embedded library's printf function supports the %f conversion. This enables developers to design applications on a desktop system, with all the amenities it has to offer, and deploy them to an embedded target without concerning themselves with API compatibility. The difficulty of this approach lies in creating a minimal library from such a large source tree without over-complicating the source code. This study investigates the possibility of building a custom libc.so from only the necessary prebuilt object files of a complete glibc build.

When glibc is linked as the final step of the build process, the various objects (1,756 total) satisfy undefined symbols among themselves. Glibc contains nearly 250,000 implicit dependencies among its various objects. With this many dependencies, manually selecting which objects to include would be tedious at best and impossible at worst. To make this task manageable, a MySQL database containing all the object dependencies for all of glibc was implemented. A detailed description of the library analysis tool can be found in the Sidebar “Library Analysis Tool”. With this tool, a list of all the objects needed to build a custom library can be generated based on the required symbols of a given application set. From the output of this tool, three test executables were successfully built, each with a custom minimal version of glibc. These custom libraries are considerably smaller than the complete versions, as small as 19% of the original size for the simplest case.
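The selection the analysis tool performs can be pictured as a transitive closure over the object dependency graph. The following is a minimal sketch of that idea, not the tool itself: the real tool stores its tables in MySQL, whereas this sketch assumes two in-memory dictionaries, and the object and symbol names in the sample data are hypothetical.

```python
from collections import deque

def required_objects(app_undefined, defines, needs):
    """Return (selected objects, symbols no object provides).

    defines: dict mapping object name -> set of symbols it defines
    needs:   dict mapping object name -> set of symbols it leaves undefined
    """
    # Invert 'defines' so each symbol points at the object that provides it.
    provider = {}
    for obj, symbols in defines.items():
        for sym in symbols:
            provider.setdefault(sym, obj)

    selected = set()
    unresolved = set()
    queue = deque(sorted(app_undefined))
    seen = set(queue)
    while queue:
        sym = queue.popleft()
        obj = provider.get(sym)
        if obj is None:
            unresolved.add(sym)      # nothing in the table defines this symbol
            continue
        if obj in selected:
            continue                 # object already pulled in by another symbol
        selected.add(obj)
        # The new object's own undefined symbols become further requirements.
        for dep in sorted(needs.get(obj, ())):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return selected, unresolved

# Hypothetical miniature of the dependency tables (not real glibc data).
DEFINES = {
    "printf.os": {"printf"},
    "vfprintf.os": {"vfprintf"},
    "strlen.os": {"strlen"},
    "qsort.os": {"qsort"},
}
NEEDS = {
    "printf.os": {"vfprintf"},
    "vfprintf.os": {"strlen"},
    "qsort.os": {"malloc"},
}

objects, missing = required_objects({"printf"}, DEFINES, NEEDS)
print(sorted(objects), sorted(missing))
```

In this toy table, an application that needs only printf pulls in three objects and leaves qsort.os out, which is the effect the componentization relies on at full scale.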

Library Analysis Tool

Building Glibc

The first step was to build glibc, understand its build process and note the size of each of its libraries. This analysis was performed on a clean build of a recent version (2.1.3) with the crypt and linuxthreads add-ons. The glibc library set consists of 21 libraries and the dynamic linker (ld.so); Table 1 lists all of them and their respective sizes. Of the 21 libraries, libc is the largest, accounting for nearly 50% of the total size. For this reason, this research focuses on componentizing libc.so, on the reasoning that the other 20 libraries are already sufficiently modular.

Table 1. Original Libraries and Sizes

By default, glibc builds three versions of its libraries: static, shared and profiled. Only the process of building the shared libraries is relevant to this study. This process consists of five steps:

  1. All the object files (.os) are built with the -fPIC flag to gcc, creating position-independent code.
  2. For each directory, a listing of every object from that directory to be linked into libc is created in a stamp.os file.
  3. An archive, libc_pic.a, is created from these lists using ar.
  4. This archive is linked into a single relocatable object with the -r flag to gcc.
  5. The relocatable object is linked into a shared library, libc.so.

Preparing an Application

Prior to building a custom shared library, it is necessary to determine which objects from libc.so will be needed for the target application(s). This is done by compiling and linking the application(s) to the newly built glibc, not the system glibc, and then adding that application to the database managed by the analysis tool. In order to avoid the need to install the newly built glibc, the correct options must be passed at compile time to link against the new library set.
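The article does not spell out how the required symbols are read from a linked application. One plausible way, assumed here rather than taken from the analysis tool, is to scan nm-style output for entries marked U (undefined). The sketch below parses such text; the sample lines and addresses are made up for illustration.

```python
def undefined_symbols(nm_output):
    """Collect symbols marked 'U' (undefined) in nm-style output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries carry no address column: just "U name".
        if len(fields) == 2 and fields[0] == "U":
            # Drop a symbol-version suffix such as "@@GLIBC_2.0" if present.
            symbols.add(fields[1].split("@", 1)[0])
    return symbols

# Hypothetical sample, shaped like nm output for a small dynamic executable.
SAMPLE = """\
08048400 T main
         U printf@@GLIBC_2.0
         U __libc_start_main@@GLIBC_2.0
08049500 D __data_start
         w __gmon_start__
"""

print(sorted(undefined_symbols(SAMPLE)))
```

The resulting set is the kind of input a dependency lookup needs: the symbols the application expects the libraries to provide.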

The sample application, test_printf.c, follows:


#include <stdio.h>

int main(void)
{
    int i;

    for (i = 0; i < 10; i++) {
        printf("iteration: %02d\n", i);
    }
    return 0;
}

It is compiled with the commands shown in Listing 1. Note that the system startup files and default libraries are omitted with the -nostdlib and -nostartfiles options. They are replaced with the startup files from the new glibc build (crt1.o, crti.o, crtn.o, etc.), and the newly built libraries are explicitly specified.

Listing 1. Compiling test_printf.c

This application must be executed with the new loader as well (or it will not find the right libraries). The command in Listing 2 specifies the new loader and library path and executes the application. It can be verified that the appropriate libraries are loaded by prepending strace to the previous command and examining the output (the lines starting with open are of interest).
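As a rough illustration of what such an invocation involves, the sketch below assembles an argument list that runs a program through a freshly built loader. It is not a reconstruction of Listing 2. It assumes the loader sits at elf/ld.so under the build directory and relies on the loader's --library-path option; the build path and subdirectory names are hypothetical.

```python
import posixpath

def loader_command(build_dir, lib_subdirs, program, args=()):
    """Build an argv list that runs `program` under the new loader."""
    # Assumption: the build tree keeps its dynamic loader at elf/ld.so.
    loader = posixpath.join(build_dir, "elf", "ld.so")
    # Search the build directory itself, then each listed subdirectory.
    search = [build_dir] + [posixpath.join(build_dir, d) for d in lib_subdirs]
    return [loader, "--library-path", ":".join(search), program, *args]

# Hypothetical build location and subdirectories.
cmd = loader_command("/tmp/glibc-build", ["math", "linuxthreads"], "./test_printf")
print(" ".join(cmd))
```

Putting strace in front of the resulting command, as described above, shows which library files the loader actually opens.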

Listing 2. Specifying the New Loader and Library Path and Executing the Application

The program is then added to the database with the addApplication.pl script:


./addApplication ../projects/testcases/test_printf

Building a Minimal libc.so

A minimal libc.so can be built based on any set of applications in the database. The following example uses a single application (test_printf from above) as the source for required objects. The process consists of five steps:

  1. Generate a list of required object files, libc_objects.master.
  2. Generate a customized set of libc_objects files.
  3. Create an archive, libc_pic.a, from these lists using ar.
  4. Link the archive into a single relocatable object with the -r flag to gcc.
  5. Link the relocatable object into a shared library, libc.so.

This process should be executed in the minilib directory, containing only the Makefile and associated scripts. The Makefile variable GLIBCPATH has to be updated to the path where glibc was built; the rest of the process is automated with the make command. The library analysis tool provides a list of the object files that provide the symbols explicitly required by an application, as well as the implicitly required objects. This list, libc_objects.master, is generated by the getAppDeps.pl script and should be copied to the minilib directory.

Running make first executes the script getstamps, which descends into the glibc source directory and recursively copies every stamp.os file to an equivalent tree within the current directory. These stamp.os files are reformatted to list one object per line and are then sorted alphabetically. The newly formatted stamp.os files are then joined with libc_objects.master to create an intersection of the two files, effectively removing any unnecessary objects from the list. The full path is added to each object in the list, and the result is stored in libc_objects (one per directory). With all the libc_objects files in place, the custom library is ready to be linked.
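The filtering step amounts to a set intersection followed by a path prefix. The sketch below shows that logic for a single directory. It is not the getstamps script: the file format is simplified to bare object names separated by whitespace, and the directory path and object names are hypothetical.

```python
import posixpath

def filter_stamp(stamp_text, master, directory):
    """Keep only the objects in stamp_text that also appear in master."""
    # One name per entry, sorted alphabetically, duplicates removed.
    listed = sorted(set(stamp_text.split()))
    # Intersection with the master list drops unneeded objects.
    keep = [name for name in listed if name in master]
    # Prefix the directory so the linker can find each object.
    return [posixpath.join(directory, name) for name in keep]

# Hypothetical master list and stamp.os contents for one directory.
MASTER = {"printf.os", "vfprintf.os", "strlen.os"}
STAMP = "vfprintf.os printf.os\nscanf.os tmpfile.os\n"

libc_objects = filter_stamp(STAMP, MASTER, "/build/glibc/stdio-common")
print("\n".join(libc_objects))
```

Objects that appear in the master list but belong to another directory (strlen.os here) are simply picked up when that directory's stamp.os file is processed.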

The various commands needed to link the final shared library were taken from the glibc make process and modified to account for the new build location and object-list filenames (libc_objects). Linking is done in three steps. First, ar is used to link all the objects listed in the libc_objects files into one archive with the command in Listing 3.

Listing 3. Linking the Objects in libc_objects into One Archive

Second, the archive is made relocatable:


gcc -nostdlib -nostartfiles -r -o libc_pic.os -Wl,-d -Wl,--whole-archive libc_pic.a

The -r option here generates relocatable code in the output file, libc_pic.os; -nostdlib and -nostartfiles prevent gcc from linking in the standard system libraries and startup files. The -Wl prefix passes an option through to the linker: --whole-archive instructs the linker to include everything from the archives listed after --whole-archive and before --no-whole-archive, and not just the objects that satisfy symbols explicitly required by the other objects scheduled for link.

Finally, the shared library is created, as shown in Listing 4.

Listing 4. The Shared Library

The linker option --version-script acts as a filter for exported symbols, providing complete control over which symbols are exported. Even if a symbol exists in the objects and archives linked into the library, it will not be exported by the final shared library unless it is listed in the version script, libc.map. The -e option forces __libc_main as the library's entry point. The -u option forces the symbol __register_frame to be undefined, forcing a link with libgcc.a, which provides this symbol. Finally, -rpath-link specifies the first set of directories to search for shared libraries specified on the command line, such as ld.so. Because these commands were taken from the partially automatically generated commands of the glibc build process, they likely include some unnecessary paths and even unnecessary options.

The resulting library is placed in the top-level directory as libc.so, a nonstripped shared library.

When linking the application, it is possible that the libc_objects.master list is not complete, in which case undefined symbol errors result. These symbols must be tracked down (using the findsymbol script), and their providing objects should be appended to the libc_objects.master list. Running make clean and make will attempt to rebuild the shared library with the updated object list. In its current state, the library analysis tool provides information assuming that a custom version of every library will be built. Since only libc.so is being rebuilt in this example, if the application requires pthreads, the complete libpthread.so library will be used. If libpthread.so requires something from libc.so that the application itself does not, the providing object must be added manually. There are generally one or two objects that must be added to the list. This manual step should be eliminated with future versions of the analysis tool.
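This manual repair loop can be sketched as follows. The code is not the findsymbol script. It assumes the linker reports each problem with GNU ld's "undefined reference to" wording and that a symbol-to-object lookup table is available; the sample output, symbol names and object names are hypothetical.

```python
import re

# Matches GNU ld messages of the form: undefined reference to `symbol'
UNDEF_RE = re.compile(r"undefined reference to [`'](?P<sym>[^']+)'")

def objects_to_append(linker_output, provider, master):
    """Return (objects to add to the master list, symbols with no known provider)."""
    additions = []
    unknown = []
    for match in UNDEF_RE.finditer(linker_output):
        sym = match.group("sym")
        obj = provider.get(sym)
        if obj is None:
            if sym not in unknown:
                unknown.append(sym)      # no object is known to define it
        elif obj not in master and obj not in additions:
            additions.append(obj)        # new object for libc_objects.master
    return additions, unknown

# Hypothetical linker output and lookup table.
LD_OUTPUT = """\
libpthread.so: undefined reference to `__libc_fatal'
libpthread.so: undefined reference to `__libc_fatal'
test_printf.o: undefined reference to `abort'
test_printf.o: undefined reference to `mystery'
"""
PROVIDER = {"__libc_fatal": "libc_fatal.os", "abort": "abort.os"}
MASTER_LIST = {"abort.os"}

to_add, not_found = objects_to_append(LD_OUTPUT, PROVIDER, MASTER_LIST)
print(to_add, not_found)
```

Objects already in the master list are skipped, so the function reports only what still has to be appended before running make clean and make again.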

Testing the Minimal Library

To test the custom library, the application for which it was built must be relinked, using the new library. The new libc.so must be copied into the glibc source tree, replacing the old one. Running make again recompiles the test application, linking to the new minimal library. This analysis tested three test applications, each with unique requirements of libc.so (see Table 2).

Table 2. Test Cases and Minimal Library Statistics

Conclusion

Glibc componentization offers the most customizable libraries, while requiring very little from the developer. Its advantages include rapid development, API consistency and, because the stock glibc source tree is used, none of the maintenance burden of a forked tree. Target devices that are resource limited, but that will be used for varying tasks (such as PDAs), should consider other options such as glibc profiling. A profiled version of glibc could be built so that frequently accessed functions are grouped together in pages. Devices whose resources are less restricted may find that the best solution is simply to use the complete library. This approach allows for future development of new and more functional applications, without the need to redeploy the system libraries as well. Componentization finds its application in very specialized devices where resources are at a premium, and the applications they must run are fixed and known prior to deployment.

This process defines dependencies at the object level; it does not offer as fine a level of granularity as a system based on symbols could, but it is relatively simple and in no way modifies the glibc source tree. The library could be reduced further by implementing simplified versions of some of the larger components, but this would require modifying the source code. The test cases show that glibc can be componentized with reasonable granularity at the object level, and although this is not as fine as the symbol level, the process is far easier and requires less effort from all parties involved. The process discussed can be used to implement any standards-compliant library proposed by third parties as well as to create completely customized minimal libraries for a specific application set when no standard is appropriate.



About the author: Darren Hart is a 24-year-old senior in Brigham Young University's undergraduate Computer Engineering program. His fields of interest and study include embedded systems and embedded application development as well as operating systems, Linux in particular. He has done three consecutive co-ops with IBM, most recently with the Linux Technology Center where he researched glibc componentization.



Copyright © 2001 Specialized Systems Consultants, Inc. All rights reserved. Embedded Linux Journal Online is a cooperative project of Embedded Linux Journal and LinuxDevices.com.

 
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.


