Networking improvements in the 2.6 kernel
Apr 5, 2004, by LinuxDevices Staff, from the LinuxDevices Archive
The new Linux 2.6 kernel offers many improvements over the 2.4 version. One area of technical advancement is in the kernel networking options. Although there are enhancements in most of the files associated with the networking options, this article focuses on major feature improvements and additions that affect entire sections rather than on specific files.
Specifically, this article addresses improvements to the Network File System (NFS) and Internet Protocol Security (IPSec). It also introduces two new members of the TCP/IP protocol family, Stream Control Transmission Protocol (SCTP) and Internet Protocol version 6 (IPv6).
Network File System and security
The 2.6 kernel improves the Network File System (NFS) by including version 4. This new version of NFS has better security, interoperates better across different operating systems, and reduces server daemon overhead.
The inclusion of version 4 of the Network File System (NFSv4) in the 2.6 kernel allows for improvements in security and functionality not seen in previous versions of NFS. Users of NFS may now conduct secure transactions using a remote procedure call (RPC) implementation of the Generic Security Service (GSS) API. Designers also introduced the idea of a compound procedure, which combines multiple RPCs into one call. Because file system operations then need fewer RPCs, NFS responds faster.
Reducing NFS overhead even more, NFS now handles file handle-to-path name mapping (mountd), as well as byte range file locking (lockd), which lessens the number of server-side support daemons required. To ease server-side implementations, NFSv4 includes an additional file handle type and provides classifications of file and file system attributes. This new NFS version also includes support for server migration and replication to enable clients to seamlessly change servers when needed. Finally, NFSv4 now has the ability to allow the server to delegate certain responsibilities to the client in caching situations where this option is desired.
The ability to use cryptographic authentication for NFS RPC requests provides support for end-to-end NFS security. NFSv4 uses the RPCSEC_GSS framework to extend the basic security of RPC. This security framework allows NFSv4 to provide mechanisms for authentication, integrity, and privacy between clients and servers. Clients also have the ability to query servers about their security policies with respect to which mechanisms must be used for access. This in-band security negotiation allows the client to securely match the server's security policy to the mechanism that meets both client and server requirements.
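The in-band negotiation described above can be pictured as a simple matching step. The following is a minimal sketch, not the NFSv4 wire protocol: the function name `negotiate` and the client preference list are hypothetical, and the mechanism labels (`krb5` for authentication only, `krb5i` for integrity, `krb5p` for privacy) are used purely as illustrative names.

```python
# Hypothetical client preference, strongest protection first.
# krb5p = privacy (encryption), krb5i = integrity, krb5 = authentication only.
CLIENT_PREFERENCE = ["krb5p", "krb5i", "krb5"]


def negotiate(server_policy, client_preference=CLIENT_PREFERENCE):
    """Return the first client-preferred mechanism the server allows, else None."""
    allowed = set(server_policy)
    for mechanism in client_preference:
        if mechanism in allowed:
            return mechanism
    return None


# The server advertises which mechanisms may be used for an export;
# the client picks the strongest one that both sides support.
print(negotiate(["krb5i", "krb5"]))
print(negotiate(["sys"]))
```

If no mechanism satisfies both sides, the sketch returns `None`, which stands in for the client being refused access.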
Compound procedures are another improvement to NFS included in the version 4 design. Previous versions of NFS did not have a method of allowing clients to build complex logical file system RPCs. By using compound procedures, clients can read data from a file in one request by combining LOOKUP, OPEN, and READ operations in a single RPC request. Older versions of NFS require clients to perform an RPC for each of these three operations. The implementation of handling these compound requests on the server side is very simple. The compound request is broken into a list of separate requests by the server. The server iterates through and performs each operation in the list until it reaches the end or fails, and then returns the results of all operations to the client.
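The server-side loop for a compound request can be sketched in a few lines. This is an illustration only, assuming an in-memory dictionary as the "file system"; the helper names (`lookup`, `open_file`, `read`, `run_compound`) are hypothetical and do not correspond to kernel functions.

```python
def lookup(state, name):
    if name not in state["fs"]:
        raise FileNotFoundError(name)
    state["current"] = name
    return "found " + name


def open_file(state):
    state["open"] = state["current"]
    return "opened " + state["open"]


def read(state, count):
    return state["fs"][state["open"]][:count]


def run_compound(fs, operations):
    """Run operations in order, stopping at the first failure.

    Returns the results of every operation attempted, including the failure.
    """
    state = {"fs": fs, "current": None, "open": None}
    results = []
    for func, args in operations:
        try:
            results.append(("OK", func(state, *args)))
        except Exception as exc:
            results.append(("ERR", type(exc).__name__))
            break  # remaining operations are not attempted
    return results


fs = {"notes.txt": b"hello world"}
print(run_compound(fs, [(lookup, ("notes.txt",)), (open_file, ()), (read, (5,))]))
print(run_compound(fs, [(lookup, ("missing",)), (open_file, ()), (read, (5,))]))
```

The second call shows the early exit: once LOOKUP fails, OPEN and READ are never run, and the client receives only the results gathered so far.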
NFSv4 introduces further streamlining by reducing the number of non-NFS server protocols required on the server. With version 4, the NFS code is able to map filehandles to path names, which the mountd protocol does in older versions. The server provides a root file handle that represents the top of the file system tree exported by the server. The server allows for multiple filesystems by attaching them together with pseudo filesystems that cover potential gaps in the path names between real filesystems. This translates to support for a global hierarchical namespace.
In addition, this new version of the NFS protocol supports byte range file locking, whereas previous versions used the lockd protocol provided by the Network Lock Manager. The restructuring of file locking support allows the server to maintain the lock state of files using a lease-based model. Basically, clients are required to issue lock requests to the server. If granted, a client must renew its lease within a server-specified lease time. The server may release the client's lock after the lease expires. The elimination of these two protocols, mountd and lockd, reduces the processing overhead for operating an NFS server.
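A rough model of lease-based byte range locking follows. It is a sketch under simplifying assumptions: the class `LeaseLockServer` is hypothetical, time is passed in explicitly as `now` so the behavior is deterministic, byte ranges are half-open `[start, end)`, and every lock is treated as exclusive.

```python
class LeaseLockServer:
    def __init__(self, lease_time):
        self.lease_time = lease_time
        self.locks = []  # each entry: [client, path, start, end, expires_at]

    def _expire(self, now):
        # The server may release locks whose lease has run out.
        self.locks = [entry for entry in self.locks if entry[4] > now]

    def lock(self, client, path, start, end, now):
        """Grant the lock unless another client holds an overlapping range."""
        self._expire(now)
        for owner, held_path, held_start, held_end, _ in self.locks:
            overlaps = start < held_end and held_start < end
            if owner != client and held_path == path and overlaps:
                return False
        self.locks.append([client, path, start, end, now + self.lease_time])
        return True

    def renew(self, client, now):
        """Extend the lease on every unexpired lock the client holds."""
        self._expire(now)
        renewed = False
        for entry in self.locks:
            if entry[0] == client:
                entry[4] = now + self.lease_time
                renewed = True
        return renewed


server = LeaseLockServer(lease_time=30)
print(server.lock("A", "/export/f", 0, 100, now=0))    # granted
print(server.lock("B", "/export/f", 50, 150, now=10))  # refused, overlaps A
print(server.lock("B", "/export/f", 50, 150, now=31))  # granted, A never renewed
```

The last call succeeds only because client A failed to renew within the lease time, which is the point of the lease model: a crashed client cannot hold a lock forever.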
The new version of NFS also contains improvements that provide for easier NFS server implementations. Some older NFS server implementations found it difficult to keep a file handle persistent for the lifetime of the file system object it refers to. NFSv4 therefore adds a volatile file handle type alongside the persistent file handle type. With these two file handle types, the server implementation can match the abilities of its underlying file system and operating system. Clients can learn which type of file handle the server provides and prepare their operations to handle each.
File and file system attribute classification is another addition to NFS that allows for easier server implementations. Older NFS versions use a fixed set of attributes primarily focused on UNIX files and filesystems. If a server or client cannot support a particular attribute, it must attempt to simulate the attribute as best as it can. Version 4 classifies attributes into three categories: mandatory, recommended, and named.
Mandatory attributes are the minimal set of file or file system attributes that must be correctly provided and represented by the server. Recommended attributes represent different file system types and operating systems, and allow for better inclusion of, and interoperability between, operating systems. A named attribute is a byte stream that is associated with a directory or file and is referred to by a string name. Named attributes are intended for client applications that need to associate specific data with files or file systems. The attribute classification system establishes an easier way to add new attributes without having to perform major code revisions.
For better redundancy, NFSv4 supports file system replication and migration on the server side. Using a special file system location attribute, clients can query the server regarding the location of a file system. If the server file system is replicated for load balancing or other such reasons, the client can receive all the locations of the requested file system. Using its own policies, the client is then able to mount and access the appropriate location for the file system it requested. Similarly, if a file system is migrated, upon receiving an error while accessing the old location, the client queries the new location of the file system and makes the necessary change to accommodate the relocation.
A final highlight of NFSv4 is the ability to have the server delegate certain responsibilities to the client in caching situations, which is necessary for providing true data integrity. Previous versions of NFS did not honor UNIX write semantics safely. With NFSv4, a server may provide a client with a read or write delegation for a certain file. While a client holds a read delegation for a file, no other client may write to that file. While a client holds a write delegation for a file, no other client may read from or write to that file. The server may recall a delegation when another client requests conflicting access to the delegated file. In this case, the server notifies the delegated client using a callback path existing between client and server, and recalls the delegation. Delegations allow the client to service operations locally from the NFS cache without immediate interaction with the server, thus reducing server load and network traffic.
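The delegation conflict rules can be summarized in a small model. This is a sketch only: the class `DelegationServer` is hypothetical, and the callback to the delegated client is reduced to appending an entry to a list.

```python
class DelegationServer:
    def __init__(self):
        self.delegations = {}  # path -> list of (client, kind)
        self.recalled = []     # (client, path) pairs, in recall order

    def delegate(self, path, client, kind):
        """Record a 'read' or 'write' delegation for a client."""
        self.delegations.setdefault(path, []).append((client, kind))

    def request(self, path, client, access):
        """Handle a 'read' or 'write' request, recalling conflicting delegations."""
        held = self.delegations.get(path, [])
        for holder, kind in list(held):
            # A write delegation conflicts with any access by another client;
            # a read delegation conflicts only with writes.
            conflict = kind == "write" or access == "write"
            if holder != client and conflict:
                self.recalled.append((holder, path))  # stands in for the callback
                held.remove((holder, kind))


server = DelegationServer()
server.delegate("/export/data", "A", "read")
server.request("/export/data", "B", "read")   # no conflict, nothing recalled
server.request("/export/data", "B", "write")  # conflicts with A's read delegation
print(server.recalled)
```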
Stream Control Transmission Protocol (SCTP) is a new transport layer protocol added to the 2.6 kernel. Besides having many of the same characteristics that the Transmission Control Protocol (TCP) has, SCTP provides additional features beneficial to telephony, data communication, and high availability applications.
SCTP provides functionality similar to TCP by ensuring error-free and sequenced data transportation, and by establishing a session-oriented, end-to-end relationship between both endpoints throughout the data transmission. However, SCTP provides new features beyond TCP, such as multi-streaming and multi-homing, that are critical to certain workloads, such as telephony signaling over IP networks.
Multi-streaming allows data to be partitioned into multiple independently sequenced streams. As a result, message loss in any one stream initially affects delivery only within that stream, and not delivery in other streams. SCTP is message-oriented, whereas TCP is byte-oriented, so SCTP preserves individual message boundaries, which is what makes multiple streams possible. With TCP's single stream of data, there is an added delay when messages are lost or arrive out of sequence, because TCP must hold back delivery to the application until the correct sequence is restored. This delay is a performance liability for applications that do not need all messages in sequence, such as telephony signaling or delivery of web pages that have multimedia content. Although telephony signaling requires sequencing of messages that affect the same resource, such as the same call, unrelated messages can be delivered without the requirement of sequence integrity.
Web pages containing multimedia objects of different types and sizes can use multi-streaming to transport these components in a partially ordered method, rather than in a strict ordered way. This flexibility in data transmission results in possible improved user perception of transport. Additionally, the idea that data transport occurs within a single SCTP association means that all streams adhere to a common flow and congestion control mechanism, which reduces the work required at the transport level.
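The difference between one strictly ordered stream and several independently sequenced streams can be shown with a toy delivery model. The function names below are hypothetical, and the model ignores retransmission entirely; it only shows what the application can receive while one message is still missing.

```python
def deliver_in_order(arrived):
    """arrived: list of (seq, payload). Deliver consecutively from seq 0."""
    buffered = dict(arrived)
    delivered, next_seq = [], 0
    while next_seq in buffered:
        delivered.append(buffered[next_seq])
        next_seq += 1
    return delivered  # everything after a gap stays buffered


def deliver_multistream(arrived):
    """arrived: list of (stream, seq, payload). Each stream is sequenced alone."""
    per_stream = {}
    for stream, seq, payload in arrived:
        per_stream.setdefault(stream, []).append((seq, payload))
    return {s: deliver_in_order(msgs) for s, msgs in per_stream.items()}


# Six messages, a0 b0 a1 b1 a2 b2, were sent; a1 was lost in transit.
single_stream = [(0, "a0"), (1, "b0"), (3, "b1"), (4, "a2"), (5, "b2")]
two_streams = [(0, 0, "a0"), (1, 0, "b0"), (1, 1, "b1"), (0, 2, "a2"), (1, 2, "b2")]

print(deliver_in_order(single_stream))   # ['a0', 'b0']
print(deliver_multistream(two_streams))  # {0: ['a0'], 1: ['b0', 'b1', 'b2']}
```

With a single sequence, the loss of a1 holds back b1 and b2 even though they are unrelated to it. With two streams, only stream 0 waits.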
Multi-homing is another feature of SCTP that separates it from traditional transport layer protocols. Multi-homing allows a single SCTP endpoint to support multiple IP addresses and provides redundancy in situations where there are multiple routes to a single destination. TCP and UDP use single-homed sessions, where the failure of a local LAN access can isolate the end-system, and failures within the network can cause temporary outages until IP routing protocols re-route traffic.
Multi-homed SCTP, combined with redundant LANs, allows for the reinforcement of local endpoint access. Multiple addresses with different prefixes and/or routes, used in association with SCTP multi-homing, allow for improved redundancy across the entire network. The multi-homing feature of SCTP does not provide network load balancing or sharing. This mechanism's key purpose is to provide redundant connections for applications communicating over SCTP. SCTP designates a single address as the “primary” address and uses this address for all normal data communication. When data must be retransmitted, it can be sent to an alternate address to improve the probability of reaching the other endpoint. In the event that the primary connection fails completely, all data is re-routed to an alternate address. Using methods similar to standard high availability, a “heartbeat” signal is sent to the failed primary address to determine whether the original connection can be re-established.
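The primary-plus-alternate behavior can be modeled as a small state holder. This is a simplified sketch: the class `Association` is hypothetical, a single missed heartbeat marks an address as down (real implementations use error counters and thresholds), and the addresses are documentation-only examples.

```python
class Association:
    def __init__(self, addresses):
        self.primary = addresses[0]
        self.reachable = {addr: True for addr in addresses}

    def heartbeat_result(self, addr, acked):
        """Record whether the peer acknowledged a heartbeat sent to addr."""
        self.reachable[addr] = acked

    def destination(self):
        """Use the primary while it is reachable, otherwise the first alternate."""
        if self.reachable[self.primary]:
            return self.primary
        for addr, up in self.reachable.items():
            if up:
                return addr
        return None  # no path to the peer is currently usable


assoc = Association(["192.0.2.1", "198.51.100.1"])
print(assoc.destination())                    # primary in use
assoc.heartbeat_result("192.0.2.1", acked=False)
print(assoc.destination())                    # traffic moves to the alternate
assoc.heartbeat_result("192.0.2.1", acked=True)
print(assoc.destination())                    # primary restored
```

Note that only one address carries data at any time, which matches the point above that multi-homing provides redundancy rather than load sharing.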
Internet Protocol Security (IPSec) is another enhancement to the 2.6 kernel. IPSec provides methods to authenticate and encrypt network communication across local networks and the Internet. In addition to providing packet encryption, the 2.6 kernel offers improved transmission via IP Payload Compression (IPComp). IPComp is a protocol that uses compression and decompression algorithms to improve the quality of transmission over slow and/or congested networks.
The introduction of Internet Protocol Security (IPSec) into the 2.6 kernel code provides users with security services for traffic at the Internet Protocol (IP) layer. IPSec provides a general solution to the complex combination of media and application mixtures that make up the Internet. The 2.6 kernel supports two IPSec mechanisms: Authentication Header (AH) and Encapsulating Security Payload (ESP). Both rely on authentication algorithms supplied by the Cryptographic API, which is also included in the 2.6 kernel.
The Authentication Header (AH) is an additional header added directly after the IP header to provide packet authentication. Authentication at the packet level allows users to be sure that the packet received came from a particular machine, and that its contents were not altered somewhere along the way. This mechanism does not attempt to conceal or protect the contents of the packet. The main feature provided by AH is packet integrity assurance. For the added benefit of encryption, users should additionally use ESP.
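The idea behind AH, integrity without confidentiality, can be illustrated with a keyed hash. This is a sketch and not the AH wire format: the function names and the shared key are hypothetical, and the real protocol computes its check value over selected IP header fields as well as the payload. The 96-bit truncated HMAC-SHA-1 used here is one of the algorithms defined for AH.

```python
import hashlib
import hmac


def compute_icv(key, packet):
    """Return a 96-bit integrity check value (truncated HMAC-SHA-1)."""
    return hmac.new(key, packet, hashlib.sha1).digest()[:12]


def verify(key, packet, icv):
    """Check the packet against its integrity check value in constant time."""
    return hmac.compare_digest(compute_icv(key, packet), icv)


key = b"hypothetical-shared-key"
packet = b"payload that stays readable on the wire"
icv = compute_icv(key, packet)

print(verify(key, packet, icv))                 # True, packet is unmodified
print(verify(key, packet + b" tampered", icv))  # False, contents were altered
```

The packet itself is never transformed, which is why anyone on the path can still read it. Only a holder of the key can produce a matching check value.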
The Encapsulating Security Payload (ESP) header has the ability to provide encryption as well as packet authentication. The ESP header provides encryption, authentication, “anti-replay service (a form of partial sequence integrity),” and “limited traffic flow confidentiality.” Users may select encryption without specifying authentication, but this action leaves the packets vulnerable to active attacks, which can lead to an outside entity breaking the encryption. In transport mode, the ESP header is located after the IP header and before the transport protocol header (UDP or TCP). In tunnel mode, it is located before an encapsulated IP header.
In tunnel mode, ESP protects the entire inner IP packet, including its header. The inner IP header carries the original source and final destination addresses, while an outer IP header contains IP addresses for hop points such as security gateways.
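The two placements of the ESP header can be laid out side by side. The sketch below is descriptive only: the field names are informal labels, and the ESP trailer and authentication data fields come from the ESP packet format rather than from the discussion above.

```python
LAYOUTS = {
    "transport": ["IP header", "ESP header", "TCP/UDP header", "payload",
                  "ESP trailer", "ESP authentication data"],
    "tunnel": ["outer IP header", "ESP header", "inner IP header",
               "TCP/UDP header", "payload", "ESP trailer",
               "ESP authentication data"],
}


def encrypted_fields(mode):
    """Return the fields covered by encryption: after the ESP header, through the trailer."""
    fields = LAYOUTS[mode]
    start = fields.index("ESP header") + 1
    end = fields.index("ESP trailer") + 1
    return fields[start:end]


for mode in ("transport", "tunnel"):
    print(mode, "->", encrypted_fields(mode))
```

In tunnel mode the inner IP header falls inside the encrypted region, so the original endpoints are hidden from observers between the gateways. In transport mode the IP header stays in the clear.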
IP Payload Compression (IPComp) reduces the size of IP datagrams. This 2.6 networking feature improves the performance of communication between two endpoints, provided both machines have sufficient computational power and the communication takes place over congested and/or slow links.
The IPComp protocol is extremely useful in combination with IPSec, due to the increase in packet size when using the additional headers IPSec provides and requires. There are two phases of IPComp: outbound packet compression and inbound packet decompression. The integrity of data within the original IP packet is maintained during compression and decompression. Each packet is compressed and decompressed independently to allow for unordered packet arrival caused by the inherent volatility of the Internet.
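Per-packet, stateless compression can be sketched with DEFLATE from the standard library. The function names and the boolean "compressed" flag are hypothetical (the real protocol signals compression with its own header), and the fallback to sending the original payload when compression does not shrink it follows the IPComp specification rather than the text above.

```python
import zlib


def compress_packet(payload):
    """Compress one payload with fresh state; fall back if it does not shrink."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw DEFLATE, no history
    compressed = compressor.compress(payload) + compressor.flush()
    if len(compressed) < len(payload):
        return True, compressed
    return False, payload  # send the original, uncompressed


def decompress_packet(is_compressed, data):
    """Restore the original payload; needs no state from earlier packets."""
    return zlib.decompress(data, -15) if is_compressed else data


big = b"GET /index.html HTTP/1.1\r\n" * 20
small = b"hi"

for payload in (big, small):
    flag, wire = compress_packet(payload)
    assert decompress_packet(flag, wire) == payload
    print(flag, len(payload), len(wire))
```

Because every packet starts from a fresh compressor, packets can be lost or reordered without affecting the receiver's ability to decompress the ones that do arrive.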
The 2.6 kernel features improved security options with IPv6. Besides extending IPSec, IPComp, and tunneling support to work over IPv6, the 2.6 kernel provides IPv6 Privacy Extensions.
IPSec for IPv6 provides the same level of authentication and security as it does for IPv4. The inclusion of support for IPv6-to-IPv6 tunneling allows secure seamless communication to occur between two endpoints, such as transmissions over Virtual Private Networks, or VPNs.
IPv6 Privacy Extensions is a feature specifically focused on improving Internet anonymity, giving users the option to protect their identity when using IPv6 addresses. The current model for stateless address autoconfiguration uses the MAC address of the device (for example, an Ethernet card or mobile phone) to formulate the interface identifier, the low-order 64 bits of the 128-bit IPv6 address. Using a non-changing identifier to formulate an address allows for tracking of data, which may be used for unintended reasons. For example, someone running a network sniffer who knows a machine's MAC address could track which machines it communicated with, and at what times.
This data from a network sniffer is easily gathered, since the MAC address stays the same regardless of network topology, even if the machine is a mobile phone or a laptop. The individual recording the data can use this information to track work patterns, location, and so on.
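The tracking concern is easier to see once the address derivation is written out. The sketch below derives an interface identifier from a MAC address using the modified EUI-64 procedure (insert `ff:fe` in the middle and flip the universal/local bit); the function names, the MAC address, and the documentation prefixes are illustrative assumptions.

```python
import ipaddress


def eui64_interface_id(mac):
    """Build the 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def autoconfigured_address(prefix, mac):
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | iid)


mac = "00:11:22:33:44:55"
home = autoconfigured_address("2001:db8:1:2::/64", mac)
cafe = autoconfigured_address("2001:db8:a:b::/64", mac)
print(home)  # 2001:db8:1:2:211:22ff:fe33:4455
print(cafe)  # 2001:db8:a:b:211:22ff:fe33:4455
```

The prefix changes as the machine moves between networks, but the low-order 64 bits stay the same, which is what lets an observer correlate the two addresses.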
IPv6 Privacy Extensions allow users to create additional IPv6 global addresses using a random interface identifier. A machine uses these temporary addresses for a specified amount of time, before being reset to another random address. After the reset, current connections are allowed to maintain communication; however, all new connections must establish communication with the new temporary address.
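A temporary address can be sketched as a /64 prefix plus a random 64-bit interface identifier. This is a simplification: the function name is hypothetical, the identifier comes straight from the operating system's random source rather than from the hash-based procedure in the specification, and address lifetimes are not modeled. Clearing the universal/local bit marks the identifier as not globally unique.

```python
import ipaddress
import secrets


def temporary_address(prefix):
    """Return an address in the /64 prefix with a random interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    iid = secrets.randbits(64)
    iid &= ~(1 << 57)  # clear the universal/local bit of the identifier
    return ipaddress.IPv6Address(int(network.network_address) | iid)


first = temporary_address("2001:db8:1:2::/64")
second = temporary_address("2001:db8:1:2::/64")
print(first)
print(second)  # a different identifier, so the two cannot be linked by address
```

Each regeneration yields an unrelated identifier, so an observer can no longer tie a machine's traffic together across time using the low-order bits of its address.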
Most users will find that one or more of these new and enhanced features can improve the way they use Linux in their system environment.
Current users of NFS looking for improved performance or security will gain in both areas by migrating to version 4. Developers of carrier-grade and telephony applications can use the features provided by SCTP to help ensure better, more reliable service for consumers and customers. IPSec provides solutions for people and businesses who need a method of transmitting secure data across insecure networks, and IPComp allows these same groups to improve data communication across the Internet by using smaller packet sizes during transmission. And the enhancements to IPv6 can provide better security and privacy to those using this next-generation Internet protocol, while allowing more IPv4 application developers to make the transition to using this improved version of IP.
Overall, the networking advancements in the 2.6 Linux kernel are a positive step toward large-scale Linux adoption in enterprise environments.
- Generic Packet Tunneling in IPv6 Specification, or RFC 2473, outlines the model and generic mechanisms for IPv6 encapsulation of Internet packets.
- A US government task force is currently studying the implications of IPv6 deployment. Read more about it in this Request for Comments from the agencies involved.
- IP Authentication Header, or RFC 2402, deals with connectionless integrity and data origin authentication for IP datagrams.
- IP Encapsulating Security Payload (ESP), known by its fans as RFC 2406, describes how ESP provides its signature mix of security services in IPv4 and IPv6.
- Privacy Extensions for Stateless Address Autoconfiguration in IPv6, also known as RFC 3041, outlines the methods by which IPv6 Privacy Extensions use stateless autoconfiguration to generate addresses without a DHCP server.
- An Introduction to the Stream Control Transmission Protocol (SCTP), known affectionately as RFC 3286, provides a high-level introduction to SCTP.
- IP Payload Compression Protocol (IPComp), or RFC 3173, describes how and why IP Payload Compression takes place before encryption.
- Network File System (NFS) version 4 Protocol, or RFC 3530, introduces file locking and security to NFS.
- Learn the fundamentals of TCP/IP and Linux networking in the IBM Global Services Linux TCP/IP Administration classroom or virtual courses.
- Basics are also covered in the AIX Security Guide, which includes chapters on TCP/IP Security, Internet Protocol (IP) Security, Network File System (NFS) Security, and more.
- The TCP/IP Tutorial and Technical Overview Redbook is 980 pages all about the underlying concepts essential to the TCP/IP family of protocols.
- Internet Security is the subject of this iSeries and AS/400 White Paper, which includes coverage of such topics as Authentication Headers (AH) and Encapsulating Security Payload (ESP) as well as cryptography, auditing, log analysis, and more.
- Find more resources for Linux developers in the developerWorks Linux zone.
- You'll find a wide selection of books on Linux at the Linux section of the Developer Bookstore.
About the author: Robbie Williamson is Staff Software Engineer in the IBM Linux Technology Center. He graduated from the University of Texas with a B.A. in Computer Science in 2000. During his career, he has worked as a support technician, verification engineer, and developer for various implementations of UNIX. Robbie is currently one of the maintainers for the Linux Test Project.
First published by IBM developerWorks. Reproduced by LinuxDevices.com with permission.
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.