Building a Minimal Glibc with Componentization

Library Analysis Tool

After researching the build process of glibc, it became apparent (rather quickly) that the libraries composing glibc were heavily dependent on themselves and on each other. This is a good thing for a system library: by reusing existing functions, the library avoids duplicating code and growing unnecessarily large. It does, however, make such a library difficult to componentize, as each component (object, function, symbol, etc.) depends on others, which in turn depend on others, and so on, through a massive web of explicit and implicit dependencies. A library analysis tool was designed to untangle this web.

The tool needed to collect dependency information from the nonstripped libraries and store it for later analysis. The data had to be stored so that it could be queried in highly customizable ways, because each analysis invariably reveals something unexpected that leads to a new analysis procedure. An SQL database was the obvious choice, and a web front end written in PHP presented the gathered information in a readable form.
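The article does not describe its database layout. As a minimal sketch, assuming a hypothetical two-table schema (`objects` and `symbols`, names invented for illustration) and Python's built-in `sqlite3` module standing in for whatever SQL server was actually used, object-level dependencies can be derived with one self-join: object A requires object B when A references a symbol that B defines.

```python
import sqlite3

# Hypothetical schema; the article does not describe its actual tables.
SCHEMA = """
CREATE TABLE objects (
    name TEXT PRIMARY KEY,    -- object file name, e.g. 'malloc.o'
    size INTEGER NOT NULL     -- size in bytes
);
CREATE TABLE symbols (
    object  TEXT NOT NULL REFERENCES objects(name),
    symbol  TEXT NOT NULL,
    defined INTEGER NOT NULL  -- 1: object provides symbol, 0: object requires it
);
"""

# A requires B when A references (defined = 0) a symbol that B defines (defined = 1).
DIRECT_DEPS = """
SELECT DISTINCT need.object AS requirer, have.object AS provider
FROM symbols AS need
JOIN symbols AS have
  ON need.symbol = have.symbol AND need.defined = 0 AND have.defined = 1
WHERE need.object <> have.object
ORDER BY requirer, provider
"""


def open_db(path=":memory:"):
    """Create the hypothetical schema in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def direct_deps(conn):
    """Return sorted (requirer, provider) pairs of object files."""
    return conn.execute(DIRECT_DEPS).fetchall()


if __name__ == "__main__":
    db = open_db()
    # Toy data for illustration only.
    db.executemany("INSERT INTO objects VALUES (?, ?)",
                   [("strdup.o", 120), ("malloc.o", 4000), ("strlen.o", 80)])
    db.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
        ("strdup.o", "strdup", 1), ("strdup.o", "malloc", 0),
        ("strdup.o", "strlen", 0), ("malloc.o", "malloc", 1),
        ("strlen.o", "strlen", 1),
    ])
    print(direct_deps(db))  # [('strdup.o', 'malloc.o'), ('strdup.o', 'strlen.o')]
```

Storing raw symbol references rather than only precomputed edges keeps the data open to new questions: each new analysis becomes another SELECT statement.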

The tool relies on Perl and PHP to create the initial database. First, a Perl script parses the output of the objdump utility and populates the database with the relevant object dependency information. With this information in the database, it was possible to calculate dependency trees and generate lists of interdependent objects. Creating a complete list of dependencies required recursing through every required object to find all of its requirements, and so on at every level. For example, if object A requires object B, and B requires C, then A requires C implicitly. If a function provided by object A is needed, B and C must also be included in the library, because A explicitly requires B and implicitly depends on C. With 1,756 object files, and even more symbols, this recursive search can take several minutes per object, which is far too slow for an interactive web interface.
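The steps above can be sketched in Python, standing in for the Perl script. This sketch assumes the usual ELF layout of `objdump -t` output for a single object file (value, seven flag characters, section, size, name): `*UND*` entries are treated as required symbols, global or weak entries as provided symbols, each required symbol is mapped to the object that defines it, and those edges are then followed recursively. The function names and the "first definer wins" rule for duplicate symbols are hypothetical simplifications.

```python
def parse_objdump_symbols(text):
    """Split `objdump -t` output for ONE object file into (defined, undefined) sets."""
    defined, undefined = set(), set()
    in_table = False
    for line in text.splitlines():
        if line.startswith("SYMBOL TABLE:"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        fields = line.split()
        try:
            int(fields[0], 16)           # symbol lines start with a hex value
        except ValueError:
            continue
        name = fields[-1]                # name is last, even after ".hidden"
        if "*UND*" in fields:
            undefined.add(name)          # referenced here, defined elsewhere
            continue
        start = len(fields[0]) + 1       # seven flag columns follow value + space
        flags = line[start:start + 7]
        if flags[:1] in ("g", "u") or flags[1:2] == "w":
            defined.add(name)            # global, unique global, or weak definition
    return defined, undefined


def direct_requirements(objects):
    """objects: {name: (defined, undefined)} -> {name: set of objects it needs}."""
    provider = {}
    for obj, (defined, _) in objects.items():
        for sym in defined:
            provider.setdefault(sym, obj)   # hypothetical rule: first definer wins
    return {
        obj: {provider[s] for s in undefined if s in provider} - {obj}
        for obj, (_, undefined) in objects.items()
    }


def all_requirements(obj, deps):
    """Explicit plus implicit requirements, found by recursing level by level."""
    seen = set()

    def visit(node):
        for req in deps.get(node, ()):
            if req not in seen:          # also stops circular dependencies looping
                seen.add(req)
                visit(req)

    visit(obj)
    seen.discard(obj)                    # in a cycle, obj would reach itself
    return seen
```

For the A, B, C example, `all_requirements("A", {"A": {"B"}, "B": {"C"}})` returns `{"B", "C"}`. The recursion is kept for clarity; an explicit stack avoids Python's recursion limit on very deep chains.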

To speed up access to dependency information, a PHP script was written to populate a pre-indexed dependency table. Recursively searching for dependencies turned out to be far more complex than expected. Circular dependencies, in which an object ultimately requires itself, are common, so the search must remember which objects it has already visited. With this table built, a list of both the explicitly and implicitly required objects of any one object could be generated quickly. Once all of a single object's dependencies could be found, finding the requirements of an application was trivial, since objdump treats an application just like an object file. The object requirement listing for an application shows exactly which object files must be linked into the custom library.
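The benefit of a pre-indexed table can be sketched in Python (rather than the PHP actually used): the transitive closure of every object is computed once and stored as hypothetical (object, required object) rows, after which an application's requirements are a lookup rather than a fresh recursive search. The visited set is what keeps circular dependencies from looping forever. The row layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def closure(start, deps):
    """All objects reachable from start, using an explicit stack."""
    seen, stack = set(), [start]
    while stack:
        for req in deps.get(stack.pop(), ()):
            if req not in seen:          # visited check breaks dependency cycles
                seen.add(req)
                stack.append(req)
    seen.discard(start)                  # a cycle would otherwise list start itself
    return seen


def build_requirement_table(deps):
    """Hypothetical pre-indexed table: one (object, required_object) row per pair."""
    objects = set(deps) | {r for reqs in deps.values() for r in reqs}
    return sorted((obj, req) for obj in objects for req in closure(obj, deps))


def application_requirements(app_undefined, provider, table):
    """Objects to link for an application, given its undefined symbols.

    provider maps symbol -> defining object; table comes from
    build_requirement_table, so no recursion happens here.
    """
    index = defaultdict(set)
    for obj, req in table:
        index[obj].add(req)
    direct = {provider[s] for s in app_undefined if s in provider}
    needed = set(direct)
    for obj in direct:
        needed |= index[obj]             # implicit requirements, already indexed
    return needed
```

In a database the `index` would simply be an index on the table's object column; it is rebuilt in memory here only to keep the sketch self-contained.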

In addition to listing dependencies, it would be useful to see at a glance how much of a library depends on any one object, both in object count and in physical size. Generating the complete table from the object and dependency tables takes several minutes, so another PHP script was written to create a pre-indexed dependency summary table that lists the object count and size of every object's requirements and dependents. The table below lists a subset of the complete dependency summary. Note how heavily the libraries depend on malloc and strcpy: if either were removed, 52% of the object files (905 of 1,756) would break.
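A summary of this kind can be derived from the same closures by inverting them: an object's dependents are all objects whose requirements include it. This hypothetical Python sketch computes, for each object, the count and total size of its requirements and of its dependents; the share of the library that breaks when an object is removed is its dependent count divided by the total object count (905 / 1,756 is about 0.515, or 52%). The tuple layout and function names are illustrative assumptions, not the article's table format.

```python
def closure(start, deps):
    """All objects reachable from start (cycle-safe)."""
    seen, stack = set(), [start]
    while stack:
        for req in deps.get(stack.pop(), ()):
            if req not in seen:
                seen.add(req)
                stack.append(req)
    seen.discard(start)
    return seen


def summarize(deps, sizes):
    """Per object: (requirement count, requirement bytes, dependent count, dependent bytes).

    Dependents are the objects that would break if this object were removed.
    """
    reqs = {obj: closure(obj, deps) for obj in sizes}
    dependents = {obj: set() for obj in sizes}
    for obj, required in reqs.items():
        for r in required:
            dependents.setdefault(r, set()).add(obj)
    return {
        obj: (len(reqs[obj]), sum(sizes.get(r, 0) for r in reqs[obj]),
              len(dependents[obj]), sum(sizes.get(d, 0) for d in dependents[obj]))
        for obj in sizes
    }


def breakage_share(obj, summary):
    """Fraction of all objects that depend, directly or not, on obj."""
    return summary[obj][2] / len(summary)


if __name__ == "__main__":
    print(f"{905 / 1756:.0%}")  # 52%
```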

Subset of the Dependency Summary Table