Article: The case for out-sourcing embedded database software

May 27, 2003 — by LinuxDevices Staff — from the LinuxDevices Archive

The embedded software build-or-buy tradeoff

For embedded systems, the build-or-buy decision is getting pushed down the software component food chain. Embedded systems hardware developers have long embraced commercial off-the-shelf (COTS) components to shorten development time and reduce development costs. In contrast, developers of embedded systems software have shown a propensity to build from scratch.

Embedded and real-time operating system (RTOS) vendors have probably enjoyed the most success in swaying software developers to adopt COTS software rather than roll their own components. Despite the millions spent by RTOS vendors to woo developers, this change may be due as much to technical imperatives as to successful marketing. As embedded systems have grown in complexity, building a sufficiently capable custom RTOS has become a more difficult undertaking. For all but the most basic embedded uses, it simply isn't practical to write a custom RTOS.

The same is becoming true of middleware and other complex software components, such as protocol stacks, messaging, and data management. Consider protocol stacks, for example. High-performance embedded networking requires fast, reliable and re-entrant TCP/IP stacks, extensively tested to interoperate with Windows, UNIX and other TCP/IP stacks. Because of memory and power constraints, products with embedded processors typically demand lightweight operating systems and highly efficient stack implementations:

Stacks must be compact and modular. They must also be configurable, enabling an application to choose between allocating buffers in advance or at runtime, as required.
Embedded stack design must minimize latency. To achieve this, an embedded stack, unlike many non-embedded implementations, will avoid copying data within its own processing, and will rely on interrupt processing of packets rather than on operating system scheduling.
These days, when hardware's life-span is shorter than firmware's, manufacturers must often port embedded software to new hardware platforms. Therefore, embedded stacks must be easily portable from one target architecture to another. They must support a range of CPU architectures, both big-endian and little-endian, including RISC processors, without requiring extra data copies within the driver to align packets.
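The portability requirement can be illustrated with a short C sketch. It reads and writes network-order fields one byte at a time, so the same code gives the same result on big-endian and little-endian targets and never performs a misaligned load. The function names are hypothetical and are not taken from any particular stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit network-order (big-endian) field from any byte offset.
 * Byte-wise access means the packet never has to be copied just to
 * satisfy the CPU's alignment rules. Hypothetical helper name. */
uint16_t net_read_u16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Read a 32-bit network-order field from any byte offset. */
uint32_t net_read_u32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 16-bit value in network order at any byte offset. */
void net_write_u16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFFu);
}
```

A driver could call net_read_u16(frame + offset) directly on the receive buffer, whatever the offset and whatever the host byte order.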

Logical distribution of real-time and embedded systems is also increasingly a fact of life, and this is driving adoption of COTS messaging middleware. In contrast to distributed PC architectures, where peripherals communicate data with a central processor via shared memory, embedded systems often carry transient data, which moves from one computing element (device) to another for further processing. Distributed embedded systems, ranging from process control to telecommunications, have at least one thing in common: the right answer, delivered too late, becomes the wrong answer. COTS messaging middleware eliminates the laborious network programming previously required to link point A with point B, and provides low-latency message delivery, often with multicast support for fast, efficient communications over IP.

Some real-time COTS messaging implementations use proprietary APIs while others are standards-based. Among the standards-based products, the use of CORBA has increased significantly. This stems at least in part from QoS policies defined in the OMG CORBA Messaging specification, which provide a new degree of flexibility and control to CORBA applications, and from the Real-Time CORBA specification, which defines standard features to support end-to-end predictability in applications.

Embedded database software

Data management for embedded systems is similarly outgrowing its self-developed roots. With the advent of the Internet, high speed and wireless interconnects, inexpensive memory, and relatively faster processors, embedded systems are now required to manage greater volumes of more complex data. Managing more data demands that the data management software be able to scale. In other words, performance cannot noticeably degrade as data volume scales up from hundreds or thousands of objects to hundreds of thousands, millions, and more. More complex data means that data management software must also manage complex structures and relationships.

As with RTOSes before, increasingly demanding applications and operating environments are leading embedded system developers to seek COTS data management solutions that can meet the requirements of cost-effectiveness, scalability, and the ability to tame complexity.

Some of the emerging requirements for embedded systems data management include the following:

Many short, fast transactions — Embedded systems with challenging data management requirements characteristically have very high transaction rates, with each transaction of very short duration. (A transaction, in this context, is an atomic operation with data that can be read-only, read-write, or both.) Examples include set-top boxes that receive electronic programming guide data from a satellite feed at tens of megabits per second (digital television requirements specify > 100 megabits per second); IP routers processing a hundred thousand routing table lookups/updates per second; and industrial controllers with networks of sensors, each transmitting data many thousands of times per second.
Data management for embedded systems, therefore, must have the ability to keep up with the applications' real-time data needs. This requires extremely lightweight and nimble transaction management. Application processes must be able to interact directly with the data management software, as the inter-process communication latency of remote procedure call (RPC)-based client/server architecture or an independent lock arbiter would introduce unacceptable delays.

Furthermore, because the various tasks of an embedded system have inherently different priorities, which can change dynamically, an embedded data management transaction manager should be able to prioritize transactions as the operating environment changes. For example, in response to increased input rates, a data acquisition system must adjust its data processing in order to free up the “input buffer” and avoid losing data. Ideally the application can boost the priority of transactions responsible for writing data into the database — thus emptying the input buffer faster — at the expense of other tasks such as searching/reading data out for display. When the peak time is over, the transaction priority is changed back to normal, so all the database-related activities are equal in priority again.
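The data acquisition example above can be sketched in C as follows. This is a minimal illustration, not the interface of any real database: the type and function names, the priority values, and the idea of driving the boost from a buffer fill percentage are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Two kinds of database work in a hypothetical data acquisition system.
 * The enum order only matters for tie-breaking in txn_next(). */
enum txn_kind { TXN_READ_DISPLAY, TXN_WRITE_INPUT, TXN_KIND_COUNT };

enum { PRIO_NORMAL = 1, PRIO_BOOSTED = 5 };   /* assumed values */

struct txn_policy {
    int priority[TXN_KIND_COUNT];   /* larger value is served first */
};

/* Start with every kind of transaction at the same priority. */
void txn_policy_init(struct txn_policy *p)
{
    int k;
    assert(p != NULL);
    for (k = 0; k < TXN_KIND_COUNT; k++)
        p->priority[k] = PRIO_NORMAL;
}

/* Boost writers while the input buffer is at or above the high-water
 * mark; drop back to normal once the peak has passed. */
void txn_policy_update(struct txn_policy *p, int fill_pct, int high_water_pct)
{
    assert(p != NULL);
    p->priority[TXN_WRITE_INPUT] =
        (fill_pct >= high_water_pct) ? PRIO_BOOSTED : PRIO_NORMAL;
}

/* Return the kind to run next, or -1 if nothing is pending.
 * On equal priority the kind with the lower enum value wins. */
int txn_next(const struct txn_policy *p, const int pending[TXN_KIND_COUNT])
{
    int k, best = -1;
    assert(p != NULL && pending != NULL);
    for (k = 0; k < TXN_KIND_COUNT; k++) {
        if (pending[k] <= 0)
            continue;
        if (best < 0 || p->priority[k] > p->priority[best])
            best = k;
    }
    return best;
}
```

The application would call txn_policy_update() whenever it samples the input buffer, so the boost lasts only as long as the peak does.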
Shared data and event processing — Embedded real-time operations are typically event-driven, responding to interrupts coming from external sources. New or changed data (together, an “event”) triggers processing, such as an IP router broadcasting a routing table change to its peers, or an industrial controller causing a visible alert on a display. To avoid the unnecessary processing entailed in periodically polling the database for changes, COTS data management for embedded systems must also support event processing by propagating events out to other “interested” software components. For example, an interrupt from a sensor that causes a value to be changed in the database should in turn cause a database event that notifies other interested software modules of the change.
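As a rough sketch of event propagation, the C fragment below lets software modules register interest in a field and be called back only when its value actually changes, so nobody has to poll. The tiny_db structure, its fixed sizes, and every function name here are hypothetical; a real product would expose its own notification interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_FIELDS    8   /* assumed sizes for the sketch */
#define MAX_LISTENERS 4

typedef void (*change_cb)(int field_id, int new_value, void *ctx);

struct listener { int field_id; change_cb cb; void *ctx; };

struct tiny_db {
    int values[NUM_FIELDS];
    struct listener listeners[MAX_LISTENERS];
    size_t n_listeners;
};

void db_init(struct tiny_db *db)
{
    assert(db != NULL);
    memset(db, 0, sizeof *db);
}

/* Register interest in one field. Returns 0, or -1 if the table is full. */
int db_subscribe(struct tiny_db *db, int field_id, change_cb cb, void *ctx)
{
    assert(db != NULL && cb != NULL);
    if (db->n_listeners == MAX_LISTENERS)
        return -1;
    db->listeners[db->n_listeners].field_id = field_id;
    db->listeners[db->n_listeners].cb = cb;
    db->listeners[db->n_listeners].ctx = ctx;
    db->n_listeners++;
    return 0;
}

/* Store a value and notify interested listeners only on a real change. */
void db_set(struct tiny_db *db, int field_id, int value)
{
    size_t i;
    assert(db != NULL && field_id >= 0 && field_id < NUM_FIELDS);
    if (db->values[field_id] == value)
        return;                          /* no change, no event */
    db->values[field_id] = value;
    for (i = 0; i < db->n_listeners; i++)
        if (db->listeners[i].field_id == field_id)
            db->listeners[i].cb(field_id, value, db->listeners[i].ctx);
}

/* Example listener: counts events through the ctx pointer. */
void count_change(int field_id, int new_value, void *ctx)
{
    (void)field_id;
    (void)new_value;
    ++*(int *)ctx;
}
```

In the sensor example, the interrupt path would call db_set(), and a display module registered through db_subscribe() would be told about the new value without ever querying for it.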
Complex data and design flexibility — Embedded systems are often required to manage highly complex data, and at rates of speed that are not amenable to additional processing for decomposing/recomposing data to/from a normalized form (i.e. third normal form, 3NF, or higher). An embedded application should ideally be able to write/read data to/from the database in the (un-normalized) structure in which it processes the data internally. To the C language programmer, this can mean embedded structures, nested structures, and fixed and variable length arrays of atomic types as well as structures, opaque or un-typed data, and even optional data elements.
A database's value lies in enabling retrieval by a variety of criteria, or in different sorted orders. An embedded systems database must support simple and compound indexes for this purpose. Ideally, the database will offer indexes that can be programmatically turned on and off to provide access methods that are only occasionally needed.

While not an absolute requirement, a solution that decouples the data definition from the application source code provides superior flexibility and an easier means to modify the data structure. This is accomplished via a database management system's data definition language (DDL), which is used to express the data grouping, attributes (size and type), access methods, and other characteristics of the managed data. A compiler processes the DDL, validating it for correctness and rendering it in a form that is usable by the database management system software. In the absence of a DDL, the data design becomes inextricably hard-wired to the application code and more difficult to modify.
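To make the data-shape and indexing points concrete, here is a hedged C sketch of a program guide record with a nested structure, fixed-length and variable-count arrays, and an optional element, together with a comparison function that plays the role of a compound index on channel and start time. The record layout and all names are invented for illustration; a sorted array searched with the standard bsearch() stands in for whatever index structure a real database would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_TITLE  32
#define MAX_GENRES 4

struct time_slot {                  /* nested structure */
    uint32_t start;                 /* seconds since some epoch */
    uint32_t duration_s;
};

struct program {
    uint16_t channel;
    struct time_slot slot;
    char     title[MAX_TITLE];      /* fixed-length array */
    uint8_t  genre_count;           /* number of genres[] entries in use */
    uint8_t  genres[MAX_GENRES];
    int      has_rating;            /* optional element: valid flag */
    uint8_t  rating;
};

/* Compound key order: channel first, then start time. */
int program_cmp(const void *a, const void *b)
{
    const struct program *x = a;
    const struct program *y = b;
    if (x->channel != y->channel)
        return x->channel < y->channel ? -1 : 1;
    if (x->slot.start != y->slot.start)
        return x->slot.start < y->slot.start ? -1 : 1;
    return 0;
}

/* Look up one record by compound key in an array sorted by program_cmp. */
const struct program *program_find(const struct program *sorted, size_t n,
                                   uint16_t channel, uint32_t start)
{
    struct program key = {0};
    assert(sorted != NULL || n == 0);
    key.channel = channel;
    key.slot.start = start;
    return bsearch(&key, sorted, n, sizeof *sorted, program_cmp);
}
```

Note that the application stores and retrieves struct program exactly as it uses it internally; nothing is decomposed into normalized tables along the way.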
High availability — Some embedded systems, such as telecom and network infrastructure, must be resilient, remaining operational even in the face of failed hardware or software components. For an embedded database, this means that a mechanism must be available to survive the failure of the hardware device on which the database resides. Obviously, one or more standby copies of the database must be maintained on one or more other hardware components. Simplistic mirroring or replication schemes are inadequate; the embedded database must guarantee that all copies of the primary database are exactly synchronized at all times.
This is best accomplished with an architecture such as two line cards within a single chassis, sharing a common bus for maximum performance, and a two-phase commit protocol between the master and slave database run-time instances. This guarantees the consistency of committed transactions across all copies. The notification scheme elaborated above must be present to immediately signal the failure of (or loss of communication with) any standby or master component, so that the application can take corrective action and continue its operation.
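The two-phase commit idea can be sketched in a few lines of C. This is a deliberately simplified, single-process model under stated assumptions: each replica is a plain structure, a cleared healthy flag stands in for a failed or unreachable card, and timeouts, logging, and failures during the second phase are left out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum vote { VOTE_ABORT = 0, VOTE_COMMIT = 1 };

struct replica {
    int committed_value;   /* state visible to readers */
    int staged_value;      /* held between prepare and commit */
    int has_staged;
    int healthy;           /* 0 simulates a failed or unreachable copy */
};

/* Phase 1: stage the change and vote. */
enum vote replica_prepare(struct replica *r, int value)
{
    if (!r->healthy)
        return VOTE_ABORT;
    r->staged_value = value;
    r->has_staged = 1;
    return VOTE_COMMIT;
}

/* Phase 2, success path: make the staged change visible. */
void replica_commit(struct replica *r)
{
    if (r->has_staged) {
        r->committed_value = r->staged_value;
        r->has_staged = 0;
    }
}

/* Phase 2, failure path: discard the staged change. */
void replica_abort(struct replica *r)
{
    r->has_staged = 0;
}

/* Returns 1 if every copy committed, 0 if the change was aborted
 * everywhere. Either way, all copies end up identical. */
int two_phase_commit(struct replica *replicas, size_t n, int value)
{
    size_t i;
    int all_yes = 1;
    assert(replicas != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (replica_prepare(&replicas[i], value) != VOTE_COMMIT) {
            all_yes = 0;
            break;
        }
    }
    for (i = 0; i < n; i++) {
        if (all_yes)
            replica_commit(&replicas[i]);
        else
            replica_abort(&replicas[i]);
    }
    return all_yes;
}
```

A return value of 0 is the point at which the application, alerted that a copy is unavailable, would take corrective action such as dropping the failed standby and retrying.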
Sharing data with other systems — Determining the health of an enterprise is always a process of gathering, distilling, and analyzing the enterprise's data. Virtually all embedded systems and devices operate within an enterprise or in a similar organizational context, and as such systems grow in capability, intelligence, and data gathering ability, the need to deliver and assess this information upstream grows. Therefore, embedded systems' data management solutions must have an easy way to share data with other systems within the enterprise. XML is emerging as the preferred method. Thus, a COTS embedded database solution should be able to speak and understand XML.
XML is, of course, an open standard, with a vast amount of technical information and open source software available to assist, should a developer wish to bolt XML compatibility onto self-developed data management solutions. Some developers relish such a challenge, and this, as much as anything, ensures that homegrown data management, protocol stacks and messaging software for embedded systems won't vanish from our midst. But assuming that COTS vendors price their technology reasonably, deploying the time, expertise and expense to develop these components in-house is becoming less and less justifiable economically. Even — or perhaps especially — for highly capable technology manufacturers, such an endeavor is an ineffective use of the organization's talent.
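For a sense of what the simplest end of "speaking XML" looks like, the C sketch below formats one sensor reading as an XML element using only the standard library. The element and attribute names are invented for illustration, and the sketch sidesteps everything that makes real XML support costly: parsing, character escaping, schemas, and nested data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format one reading as an empty XML element with two attributes.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small. Both values are numeric, so no
 * character escaping is needed in this sketch. */
int reading_to_xml(char *buf, size_t cap, unsigned sensor_id, int value)
{
    int n;
    assert(buf != NULL && cap > 0);
    n = snprintf(buf, cap, "<reading sensor=\"%u\" value=\"%d\"/>",
                 sensor_id, value);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

Calling reading_to_xml(buf, sizeof buf, 3u, 42) with a large enough buffer leaves the text `<reading sensor="3" value="42"/>` in buf.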

Conclusion

In the past, organizations bought hardware for embedded systems, and built the software required for such projects. This is changing. At a high level, the new COTS embedded components may resemble those of long-established enterprise systems, in that they tackle data management, messaging, and other critical tasks. But viewed closely, database and other new COTS embedded software can be quite different, reflecting the unique development, performance, reliability, and other needs of embedded systems. As embedded software vendors progress in understanding these needs, the use of COTS embedded software will push into ever-more aspects of embedded development, streamlining the development process and leading to more reliable and economical end-products.

About the author: Steve Graves is president and cofounder of McObject, developer of the eXtremeDB in-memory database system. As president of Raima Corporation, he helped pioneer the use of DBMS technology in embedded systems, working closely with companies in building database-enabled intelligent devices. A database industry veteran, Graves has held executive-level engineering, consulting and sales/marketing positions at several public and private technology companies.

This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.
