STREAMS-based vs. Legacy Pipe Performance Comparison
Distribution | Kernel |
RedHat 7.2 | 2.4.20-28.7 |
CentOS 4 | 2.6.9-5.0.3.EL |
CentOS 5 | 2.6.18-8-el5 |
SuSE 10.0 OSS | 2.6.13-15-default |
Ubuntu 6.10 | 2.6.17-11-generic |
Ubuntu 7.04 | 2.6.20-15-generic |
Fedora Core 6 | 2.6.20-1.2933.fc6 |
To remove the dependence of test results on a particular machine, various machines were used for testing as follows:
Hostname | Processor | Memory | Architecture |
porky | 2.57GHz PIV | 1Gb (333MHz) | i686 UP |
pumbah | 2.57GHz PIV | 1Gb (333MHz) | i686 UP |
daisy | 3.0GHz i630 HT | 1Gb (400MHz) | x86_64 SMP |
mspiggy | 1.7GHz PIV | 1Gb (333MHz) | i686 UP |
The results for the various distributions and machines are tabulated in Appendix B. The data is presented as follows:
Performance is charted by graphing the number of writes per second against the logarithm of the write size.
Delay is charted by graphing the number of seconds per write against the write size. The delay can be modelled as a fixed overhead per write operation plus a fixed overhead per byte written. This model results in a linear graph with the intercept at 1 byte per write representing the fixed per-write overhead, and the slope of the line representing the per-byte cost. As all implementations use the same primary mechanisms for copying bytes to and from user space, it is expected that the slope of each graph will be similar and that the intercept will reflect most implementation differences.
Throughput is charted by graphing the logarithm of the product of the number of writes per second and the message size against the logarithm of the message size. It is expected that these graphs will exhibit strong log-log-linear (power function) characteristics. Any curvature in these graphs represents throughput saturation.
Improvement is charted by graphing the quotient of the writes per second of the implementation and the writes per second of the Linux legacy pipe implementation, expressed as a percentage gain, against the write size. Values over 0% represent an improvement over Linux legacy pipes, whereas values under 0% represent a degradation.
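The four charted quantities can all be derived from the measured writes per second. The following is a minimal sketch, not the tooling used to produce the charts: the function names are hypothetical, the improvement is assumed to be the quotient minus one expressed as a percentage (so that 0% means parity with legacy pipes), and the delay model is fitted by ordinary least squares with the intercept taken at zero bytes rather than at one byte.

```python
import math


def derive_metrics(write_size, writes_per_sec, legacy_writes_per_sec):
    """Derive the charted quantities for one (write size, rate) sample."""
    delay = 1.0 / writes_per_sec                 # seconds per write
    throughput = writes_per_sec * write_size     # bytes per second
    # Assumption: "improvement" is the percentage gain over legacy pipes.
    improvement = (writes_per_sec / legacy_writes_per_sec - 1.0) * 100.0
    return {
        "log_size": math.log10(write_size),
        "delay": delay,
        "log_throughput": math.log10(throughput),
        "improvement_pct": improvement,
    }


def fit_delay_model(samples):
    """Least-squares fit of delay = per_write + per_byte * write_size.

    samples is a list of (write_size, seconds_per_write) pairs.
    Returns the tuple (per_write, per_byte).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_byte = sxy / sxx                         # slope: cost per byte
    per_write = mean_y - per_byte * mean_x       # intercept: cost per write
    return per_write, per_byte
```

Under this model, two implementations with similar slopes but different intercepts differ mainly in per-write overhead, which is the comparison drawn from the delay charts.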
The results are organized in the sections that follow in order of the machine tested.
Porky is a 2.57GHz Pentium IV (i686) uniprocessor machine with 1Gb of memory. Linux distributions tested on this machine are as follows:
Distribution | Kernel |
Fedora Core 6 | 2.6.20-1.2933.fc6 |
CentOS 4 | 2.6.9-5.0.3.EL |
SuSE 10.0 OSS | 2.6.13-15-default |
Ubuntu 6.10 | 2.6.17-11-generic |
Ubuntu 7.04 | 2.6.20-15-generic |
Fedora Core 6 is the most recent full release Fedora distribution. This distribution sports a 2.6.20-1.2933.fc6 kernel with the latest patches. This is the x86 distribution with recent updates.
Figure 4 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 4, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Performance of Linux Fast-STREAMS is a full order of magnitude better than that of LiS.
Figure 5 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. The slopes of all three curves are comparable. This indicates that each implementation is only slightly dependent upon the size of the message and that each has a low per-byte processing overhead. This is as expected, as pipes primarily copy data from user space to the kernel only to copy it back to user space on the other end. Note that the intercepts, on the other hand, differ to a significant extent. Linux Fast-STREAMS STREAMS-based pipes have by far the lowest per-write overhead (about half that of the Linux legacy pipes, and a sixth that of LiS pipes).
Figure 6 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 6, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation (regardless of performance).
Figure 7 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is marked: improvements range from a significant 75% increase in performance at large write sizes, to a staggering 200% increase in performance at lower write sizes.
CentOS 4.0 is a clone of the RedHat Enterprise 4 distribution. This is the x86 version of the distribution. The distribution sports a 2.6.9-5.0.3.EL kernel.
Figure 8 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 8, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Performance of Linux Fast-STREAMS is a full order of magnitude better than that of LiS.
Figure 9 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. The slopes of all three curves are comparable. This indicates that each implementation is only slightly dependent upon the size of the message and that each has a low per-byte processing overhead. This is as expected, as pipes primarily copy data from user space to the kernel only to copy it back to user space on the other end. Note that the intercepts, on the other hand, differ to a significant extent. Linux Fast-STREAMS STREAMS-based pipes have by far the lowest per-write overhead (about half that of the Linux legacy pipes, and a sixth that of LiS pipes).
Figure 10 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 10, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation (regardless of performance).
Figure 11 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is marked: improvements range from a significant 100% increase in performance at large write sizes, to a staggering 275% increase in performance at lower write sizes.
SuSE 10.0 OSS is the public release version of the SuSE/Novell distribution. There have been two releases subsequent to this one: the 10.1 and recent 10.2 releases. The SuSE 10 release sports a 2.6.13 kernel and the 2.6.13-15-default kernel was the tested kernel.
Figure 12 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 12, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Performance of Linux Fast-STREAMS is a full order of magnitude better than that of LiS.
Figure 13 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. The slopes of the delay curves are similar for all implementations, as expected. The zero intercept of Linux Fast-STREAMS is, however, far superior to that of legacy Linux pipes and a full order of magnitude better than that of the under-performing LiS.
Figure 14 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 14, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation. The Linux Fast-STREAMS curve exhibits a downward concave characteristic at large message sizes indicating that the memory bus saturates at about 10Gbps.
Figure 15 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 100% increase in performance at large write sizes, to a 475% increase in performance at lower write sizes.
Ubuntu 6.10 is the current release of the Ubuntu distribution. The Ubuntu 6.10 release sports a 2.6.17 kernel, and the 2.6.17-11-generic kernel was the tested kernel. The tested distribution had current updates applied.
Figure 16 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 16, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Performance of Linux Fast-STREAMS is a full order of magnitude better than that of LiS.
Figure 17 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. Again, the slope of the delay curves is similar, but Linux Fast-STREAMS exhibits a greatly reduced intercept indicating superior per-message overheads.
Figure 18 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 18, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation. Again Linux Fast-STREAMS appears to saturate the memory bus approaching 10Gbps.
Figure 19 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 75% increase in performance at large write sizes, to a 200% increase in performance at lower write sizes.
Ubuntu 7.04 is the current release of the Ubuntu distribution. The Ubuntu 7.04 release sports a 2.6.20 kernel. The tested distribution had current updates applied.
Figure 20 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 20, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Performance of Linux Fast-STREAMS is a full order of magnitude better than that of LiS.
Figure 21 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. Again, the slope of the delay curves is similar, but Linux Fast-STREAMS exhibits a greatly reduced intercept indicating superior per-message overheads.
Figure 22 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 22, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation. Again Linux Fast-STREAMS appears to saturate the memory bus approaching 10Gbps.
Figure 23 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 75% increase in performance at large write sizes, to a 200% increase in performance at lower write sizes.
Pumbah is a 2.57GHz Pentium IV (i686) uniprocessor machine with 1Gb of memory. This machine differs from Porky in memory type only (Pumbah has somewhat faster memory than Porky.) Linux distributions tested on this machine are as follows:
Distribution | Kernel |
RedHat 7.2 | 2.4.20-28.7 |
Pumbah is a control machine and is used to rule out differences between recent 2.6 kernels and one of the oldest and most stable 2.4 kernels.
RedHat 7.2 is one of the oldest (and arguably the most stable) glibc2 based releases of the RedHat distribution. This distribution sports a 2.4.20-28.7 kernel. The distribution has all available updates applied.
Figure 24 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 24, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. At a write size of one byte, the performance of Linux Fast-STREAMS is an order of magnitude greater than that of LiS.
Figure 25 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. The slope of all three graphs is similar, indicating that memory caching and the per-byte cost of copying to and from user space are similar. The intercepts, on the other hand, are drastically different. LiS per-message overheads are massive. Linux Fast-STREAMS and Linux legacy pipes are far better. STREAMS-based pipes have about one third of the per-message overhead of legacy pipes.
Figure 26 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 26, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation (despite performance differences). On Pumbah, as was experienced on Porky, Linux Fast-STREAMS is beginning to saturate the memory bus at 10Gbps.
Figure 27 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 75% increase in performance at large write sizes, to a 175% increase in performance at lower write sizes. LiS pipes waddle in at a 75% decrease in performance.
Daisy is a 3.0GHz i630 (x86_64) hyper-threaded machine with 1Gb of memory. Linux distributions tested on this machine are as follows:
Distribution | Kernel |
Fedora Core 6 | 2.6.20-1.2933.fc6 |
CentOS 5 | 2.6.18-8-el5 |
CentOS 5.2 | 2.6.18-92.1.6.el5.centos.plus |
This machine is used as an SMP control machine. Most of the tests were performed on uniprocessor non-hyper-threaded machines. This machine is hyper-threaded and runs full SMP kernels. This machine also supports EM64T and runs x86_64 kernels. It is used to rule out both SMP differences and 64-bit architecture differences.
Fedora Core 6 is the most recent full release Fedora distribution. This distribution sports a 2.6.20-1.2933.fc6 kernel with the latest patches. This is the x86_64 distribution with recent updates.
Figure 28 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 28, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. The performance of Linux Fast-STREAMS is almost an order of magnitude greater than that of LiS.
Figure 29 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. Again the slope appears to be the same for all implementations, except Linux legacy pipes which exhibit some anomalies below 1024 byte write sizes. The intercept for Linux Fast-STREAMS is again much superior to the other two implementations.
Figure 30 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 30, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation.
Figure 31 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 100% increase in performance at large write sizes, to a 175% increase in performance at lower write sizes. LiS again drags in at -75%.
CentOS 5 is the most recent full release CentOS distribution. This distribution sports a 2.6.18-8-el5 kernel with the latest patches. This is the x86_64 distribution with recent updates.
Figure 32 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 32, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. The performance of Linux Fast-STREAMS is almost an order of magnitude greater than that of LiS.
Figure 33 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. Again the slope appears to be the same for all implementations, except Linux legacy pipes which exhibit some anomalies below 1024 byte write sizes. The intercept for Linux Fast-STREAMS is again much superior to the other two implementations.
Figure 34 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 34, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation.
Figure 35 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 100% increase in performance at large write sizes, to a 175% increase in performance at lower write sizes. LiS again drags in at -75%.
CentOS 5.2 is the most recent full release CentOS distribution. This distribution sports a 2.6.18-92.1.6.el5.centos.plus kernel with the latest patches. This is the x86_64 distribution with recent updates.
This is a test result set that was updated July 26, 2008. The additional options, -H, -M, -F and -w were added to the perftest_script command line. Also, streams-0.9.2.4 was tested.
Figure 36 illustrates the performance of Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 36, the performance of Linux Fast-STREAMS STREAMS-based pipes is superior across the entire range of write sizes. The performance of Linux Fast-STREAMS is significantly greater (by a factor of 4 through 7) than that of Linux legacy pipes at smaller write sizes.
The performance boost experienced by Linux Fast-STREAMS at write sizes beneath 128 bytes is primarily due to the write coalescing feature (hold feature) of the Stream head combined with the fact that the fast-buffer size for x86_64 is 128 bytes. The performance boost experienced across the entire range is primarily due to the read-fill option combined with full-sized reads.
Note that it was not possible to get LiS running on this kernel.
Figure 37 illustrates the average write delay for Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. Again the slope appears to be similar for both implementations, if a little erratic. The intercept for Linux Fast-STREAMS is again far superior to that of Linux legacy pipes.
Again, the delay drop experienced by Linux Fast-STREAMS at write sizes beneath 128 bytes is primarily due to the write coalescing feature (hold feature) of the Stream head combined with the fact that the fast-buffer size for x86_64 is 128 bytes. The delay drop experienced across the entire range is primarily due to the read-fill option combined with full-sized reads.
Figure 38 illustrates the throughput experienced by Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 38, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation.
Again, the throughput increase experienced by Linux Fast-STREAMS at write sizes beneath 128 bytes is primarily due to the write coalescing feature (hold feature) of the Stream head combined with the fact that the fast-buffer size for x86_64 is 128 bytes. The throughput increase experienced across the entire range is primarily due to the read-fill option combined with full-sized reads.
Figure 39 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 100% increase in performance at large write sizes, to a staggering 500% increase in performance at lower write sizes.
Again, the improvements experienced by Linux Fast-STREAMS at write sizes beneath 128 bytes are primarily due to the write coalescing feature (hold feature) of the Stream head combined with the fact that the fast-buffer size for x86_64 is 128 bytes. The improvements experienced across the entire range are primarily due to the read-fill option combined with full-sized reads.
Mspiggy is a 1.7GHz Pentium IV (M-processor) uniprocessor notebook (Toshiba Satellite 5100) with 1Gb of memory. Linux distributions tested on this machine are as follows:
Distribution | Kernel |
SuSE 10.0 OSS | 2.6.13-15-default |
Note that this is the same distribution that was also tested on Porky. The purpose of testing on this notebook is to rule out the differences between machine architectures on the test results. Tests performed on this machine are control tests.
SuSE 10.0 OSS is the public release version of the SuSE/Novell distribution. There have been two releases subsequent to this one: the 10.1 and recent 10.2 releases. The SuSE 10 release sports a 2.6.13 kernel and the 2.6.13-15-default kernel was the tested kernel.
Figure 40 illustrates the performance of LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 40, the performance of LiS is dismal across the entire range of write sizes. The performance of Linux Fast-STREAMS STREAMS-based pipes, on the other hand, is superior across the entire range of write sizes. Linux Fast-STREAMS again performs a full order of magnitude better than LiS.
Figure 41 illustrates the average write delay for LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. The slope of the delay curves is, again, similar, but the intercept for Linux Fast-STREAMS is far superior.
Figure 42 illustrates the throughput experienced by LiS, Linux Fast-STREAMS and Linux legacy pipes across a range of write sizes. As can be seen from Figure 42, all implementations exhibit strong power function characteristics, indicating structure and robustness for each implementation. Linux Fast-STREAMS again begins to saturate the memory bus at 10Gbps.
Figure 43 illustrates the improvement over Linux legacy pipes of Linux Fast-STREAMS STREAMS-based pipes. The improvement of Linux Fast-STREAMS over Linux legacy pipes is significant: improvements range from a 100% increase in performance at large write sizes, to a staggering 400% increase in performance at lower write sizes.
The results across the various distributions and machines tested are consistent enough to draw some conclusions from the test results.
The test results reveal that the maximum throughput performance, as tested by the perftest program, of STREAMS-based pipes (as implemented by Linux Fast-STREAMS) is remarkably superior to that of legacy Linux pipes, regardless of write or read sizes. In fact, STREAMS-based pipe performance at smaller write/read sizes is significantly greater (as much as 200-400%) than that of legacy pipes. The performance of LiS is dismal (approx. 75% decrease) compared to legacy Linux pipes.
Looking at only the legacy Linux and Linux Fast-STREAMS implementations, the difference can be described by analyzing the implementations.
Linux legacy pipes use a simple method on the write side of the pipe. The pipe copies bytes from the user into a preallocated page, by pushing a tail pointer. If there is a sleeping reader process, the process is awoken. If there is no more room in the buffer, the write process sleeps or fails.
STREAMS, on the other hand, uses full flow control. On the write side of the STREAMS-based pipe, the Stream head allocates a message block and copies the bytes from the user to the message block and places the message block onto the Stream. This results in placing the message on the opposite Stream head. If a reader is sleeping on the opposite Stream head, the Stream head's read queue service procedure is scheduled. If the Stream is flow controlled, the writing process sleeps or fails.
STREAMS has the feature that when a reader finds insufficient bytes available to satisfy the read, it issues an M_READ message downstream requesting a specific number of bytes. When the writing Stream head receives this message, it attempts to satisfy the full read request before sending data downstream.
Linux Fast-STREAMS also has the feature that when flow control is exerted, it saves the message buffer and a subsequent write of the same size is added to the same buffer.
On the read side of the legacy pipe, bytes are copied from the preallocated page buffer to the user, pulling a head pointer. If there are no bytes available to be read in the buffer, the reading process sleeps or fails. When bytes have been read from the buffer and a process is sleeping waiting to write, the sleeping process is awoken.
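The legacy buffering scheme described above can be pictured with a small model. This is only an illustrative sketch, not the kernel implementation: the class name, the 4096-byte buffer size and the `wakeups` counter are assumptions introduced here, and sleeping is represented by a flag passed in by the caller rather than by real process scheduling.

```python
PAGE_SIZE = 4096  # assumed buffer size, for illustration only


class LegacyPipeModel:
    """Toy model of one preallocated buffer with head and tail positions."""

    def __init__(self, size=PAGE_SIZE):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0      # index of the next byte to read
        self.count = 0     # bytes currently buffered
        self.wakeups = 0   # wakeups issued to the peer process

    def write(self, data, reader_sleeping=False):
        """Copy as much of data as fits; a return of 0 means 'sleep or fail'."""
        n = min(self.size - self.count, len(data))
        tail = (self.head + self.count) % self.size
        for i in range(n):
            self.buf[(tail + i) % self.size] = data[i]   # push the tail
        self.count += n
        if n and reader_sleeping:
            self.wakeups += 1      # data arrived: wake the sleeping reader
        return n

    def read(self, nbytes, writer_sleeping=False):
        """Copy out up to nbytes; an empty result means 'sleep or fail'."""
        n = min(nbytes, self.count)
        out = bytes(self.buf[(self.head + i) % self.size] for i in range(n))
        self.head = (self.head + n) % self.size          # pull the head
        self.count -= n
        if n and writer_sleeping:
            self.wakeups += 1      # room freed: wake the sleeping writer
        return out
```

Note that in this model every successful transfer with a sleeping peer produces a wakeup; there is no threshold that defers it.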
STREAMS again uses full flow control. On the read side of the STREAMS-based pipe, messages are removed from the Stream head read queue, copied to the user, and then the message is either freed (when all the bytes contained are consumed) or placed back on the Stream head read queue. If the read queue was previously full and falls beneath the low water mark for the read queue, the Stream is back-enabled. Back-enabling results in the service procedure of the write side queue of the other Stream head being scheduled for service. If there are no bytes available to be read, the reading process sleeps or fails.
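The read-side behaviour just described, with partially consumed messages placed back on the queue and back-enabling deferred until the count falls beneath the low water mark, can be sketched as follows. This is a simplified model rather than Stream head code: the class name and the default water marks are hypothetical, and back-enabling is represented by a counter instead of a scheduled service procedure.

```python
from collections import deque


class StreamHeadQueueModel:
    """Toy model of a read queue with high and low water marks."""

    def __init__(self, hiwat=5120, lowat=1024):   # assumed defaults
        self.hiwat = hiwat
        self.lowat = lowat
        self.msgs = deque()
        self.count = 0           # bytes queued
        self.full = False        # set once count reaches the high water mark
        self.back_enables = 0    # times the writer side was back-enabled

    def can_put(self):
        """A writer must sleep or fail while the queue is flow controlled."""
        return not self.full

    def put(self, msg):
        self.msgs.append(msg)
        self.count += len(msg)
        if self.count >= self.hiwat:
            self.full = True

    def read(self, nbytes):
        out = bytearray()
        while self.msgs and len(out) < nbytes:
            msg = self.msgs.popleft()
            take = nbytes - len(out)
            out += msg[:take]
            if take < len(msg):
                self.msgs.appendleft(msg[take:])   # put the remainder back
        self.count -= len(out)
        # Back-enable only if the queue was full and is now beneath lowat.
        if self.full and self.count < self.lowat:
            self.full = False
            self.back_enables += 1
        return bytes(out)
```

The gap between the two water marks is what provides hysteresis: a reader may drain many messages before the writer is scheduled again.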
STREAMS has the additional feature that if there are no bytes to be read, it can issue an M_READ message downstream requesting the number of bytes that were issued to the read(2) system call.
There are two primary differences in the buffering approaches used by legacy and STREAMS-based pipes:
One would expect that the STREAMS-based approach would present significant overheads in comparison to the legacy approach; however, the lack of flow control in the Linux approach is problematic.
Legacy pipes schedule by waking a reading process whenever data is available in the buffer to be read, and waking a writing process whenever there is room available in the buffer to write. While accomplishing buffering, this does not provide flow control or scheduling. By not providing even the hysteresis afforded by Sockets, the write and read sides thrash the scheduler as bytes are written to and removed from the pipe.
STREAMS-based pipes, on the other hand, use the scheduling mechanisms of STREAMS. When messages are written to the reading Stream head and a reader is sleeping, the service procedure for the reading Stream head's read queue is scheduled for later execution. When the STREAMS scheduler later runs, the reading process is awoken. When messages are read from the reading Stream head read queue and the queue was previously flow controlled, and the byte count falls below the low water mark defined for the queue, the writing Stream head write queue service procedure is scheduled. Once the STREAMS scheduler later runs, the writing process is awoken.
Linux Fast-STREAMS is designed to run tasks queued to the STREAMS scheduler on the same processor as the queueing process or task. This avoids unnecessary context switches.
The STREAMS-based pipe approach results in fewer wakeup events being generated. Because there are fewer wakeup events, there are fewer context switches. The reading process is permitted to consume more messages before the writing process is awoken; and the writing process is permitted to write more messages before the reading process is awoken.
The net result of the differences between the legacy and the STREAMS-based approaches is fewer context switches: a writing process is allowed to write more messages before a blocked reader is awoken, and a reading process is allowed to read more messages before a blocked writer is awoken. This results in greater code path and data cache efficiency and significantly less scheduler thrashing between the reading and writing processes.
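The effect of hysteresis on the number of hand-offs between writer and reader can be illustrated with a deliberately crude model. The sketch below is not a measurement and does not reproduce either implementation: it counts messages rather than bytes, assumes a uniprocessor on which only one side runs at a time, and the function name and thresholds are hypothetical.

```python
def count_switches(total_msgs, hiwat, lowat):
    """Count writer/reader hand-offs needed to move total_msgs messages.

    The writer runs until hiwat messages are queued (or it has nothing left
    to send). The reader then runs until the queue drops below lowat, or,
    once the writer has finished, until the queue is empty.
    """
    if not 1 <= lowat <= hiwat:
        raise ValueError("require 1 <= lowat <= hiwat")
    queued = written = read = switches = 0
    while read < total_msgs:
        # Writer's turn: fill the queue up to the high water mark.
        while written < total_msgs and queued < hiwat:
            queued += 1
            written += 1
        switches += 1                      # hand off to the reader
        # Reader's turn: drain until beneath the low water mark.
        floor = lowat if written < total_msgs else 1
        while queued >= floor and queued > 0:
            queued -= 1
            read += 1
        if read < total_msgs:
            switches += 1                  # hand back to the writer
    return switches


# No hysteresis: the peer is woken after every single message.
eager = count_switches(1000, hiwat=1, lowat=1)
# With water marks: each side handles a batch between hand-offs.
batched = count_switches(1000, hiwat=16, lowat=4)
```

In this model the batched case needs far fewer hand-offs than the eager case for the same number of messages, which is the qualitative effect attributed above to STREAMS flow control and scheduling.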
The increased performance of the STREAMS-based pipes can be explained as follows:
The STREAMS message coalescing feature allows the complexity of the write side process to approach that of the legacy approach. This feature provides a boost to performance at message sizes smaller than a FASTBUF. The size of a FASTBUF on 32-bit systems is 64 bytes; on 64-bit systems, 128 bytes. (However, this STREAMS feature is not sufficient to explain the dramatic performance gains, as close to the same performance is exhibited with the feature disabled.)
The STREAMS read notification feature allows the write side to exploit efficiencies from the knowledge of the amount of data that was requested by the read side. (However, this STREAMS feature is also not sufficient to explain the performance gains, as close to the same performance is exhibited with the feature disabled.)
The STREAMS read fill mode feature permits the read side to block until the full read request is satisfied, regardless of the O_NONBLOCK flags setting associated with the read side of the pipe. (Again, this STREAMS feature is not sufficient to explain the performance gains, as close to the same performance is exhibited with the feature disabled.)
The STREAMS flow control and scheduling mechanisms permit the read side to read more messages between wakeup events, and also permit the write side to write more messages between wakeup events. This results in superior code and data caching efficiencies and a greatly reduced number of context switches. This is the only difference that explains the full performance increase in STREAMS-based pipes over legacy pipes.
These experiments have shown that the Linux Fast-STREAMS implementation of STREAMS-based pipes outperforms the legacy Linux pipe implementation by a significant amount (up to a factor of 5) and outperforms the LiS implementation by a staggering amount (up to a factor of 25).
The Linux Fast-STREAMS implementation of STREAMS-based pipes is superior by a significant factor across all systems and kernels tested.
While it can be said that all of the preconceptions regarding STREAMS and STREAMS-based pipes are applicable to the under-performing LiS, and may very well be applicable to historical implementations of STREAMS, these preconceptions with regard to STREAMS and STREAMS-based pipes are dispelled for the high-performance Linux Fast-STREAMS by these test results.
Contrary to the preconception that STREAMS must be slower because it is more complex, in fact the reverse has been shown to be true for Linux Fast-STREAMS in these experiments. The STREAMS flow control and scheduling mechanisms serve to adapt well and increase both code and data cache as well as scheduler efficiency.
Contrary to the preconception that STREAMS trades flexibility for efficiency (that is, that STREAMS is somehow less efficient because it is more flexible), Linux Fast-STREAMS has been shown to be both more flexible and more efficient. Indeed, the performance gains achieved by STREAMS appear to derive from its more sophisticated queueing, scheduling and flow control model. (Note that this is in keeping with the statements made about 4.2BSD pipes being implemented with UNIX domain sockets for "performance reasons" (MBKQ97).)
Contrary to the preconception that STREAMS must be slower due to complex locking and synchronization mechanisms, Linux Fast-STREAMS performed as well on SMP (hyperthreaded) machines as on UP machines and strongly outperformed legacy Linux pipes with 100% improvements at all write sizes and a staggering 500% at smaller write sizes.
Contrary to the preconception that STREAMS-based pipes must be slower because they provide such a rich set of features, as well as full duplex operation where legacy pipes provide only unidirectional operation, the reverse has been shown in these experiments for Linux Fast-STREAMS. By utilizing STREAMS flow control and scheduling, STREAMS-based pipes indeed perform better than legacy pipes.
Contrary to the preconception that STREAMS-based pipes must be poorer due to their increased implementation complexity, the reverse has been shown to be true in these experiments for Linux Fast-STREAMS. Also, the fact that legacy, STREAMS and 4.2BSD pipes conform to the same standard (POSIX) means that they are no more cumbersome from a programming perspective. Indeed, a POSIX-conforming application will not know the difference between the implementations (with the exception that superior performance will be experienced on STREAMS-based pipes).
Despite claiming to be an adequate implementation of SVR4 STREAMS, LiS performance is dismal enough to make it unusable. Due to conformance and implementation errors, LiS was already deprecated by Linux Fast-STREAMS, and these tests exemplify why a replacement for LiS was necessary and why support for LiS was abandoned by the OpenSS7 Project (SS7). LiS pipe performance tested at about half that of legacy Linux pipes and a full order of magnitude below that of Linux Fast-STREAMS.
There are two future work items that immediately come to mind:
It is fairly straightforward to replace the pipe implementation of an application that uses shared libraries from underneath it using preloaded libraries. The Linux Fast-STREAMS libstreams.so library can be preloaded, replacing the pipe(2) library call with the STREAMS-based pipe equivalent. A suitable application that uses pipes extensively could be benchmarked both on legacy Linux pipes and STREAMS-based pipes to determine the efficiencies achieved over a less narrowly defined workload.
Because STREAMS-based pipes exhibit superior performance in these respects, it can be expected that STREAMS pseudo-terminals will also exhibit superior performance over the legacy Linux pseudo-terminal implementation. STREAMS pseudo-terminals utilize the STREAMS mechanisms for flow control and scheduling, whereas the Linux pseudo-terminal implementation uses the over-simplified approach taken by legacy pipes.
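For the first work item, the preloading approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name run_with_preload, the library path /usr/lib/libstreams.so and the dd workload are hypothetical stand-ins, not part of the tested configuration. LD_PRELOAD is the standard dynamic linker mechanism for interposing a shared library.

```shell
#!/bin/bash
# Hypothetical helper: run a command with a shared library preloaded when the
# library is readable, otherwise run the command unchanged.
run_with_preload() {
    local lib=$1
    shift
    if [ -r "$lib" ] ; then
        LD_PRELOAD="$lib" "$@"
    else
        echo "warning: $lib not readable, running without preload" >&2
        "$@"
    fi
}

# Assumed install path; adjust to wherever libstreams.so actually lives.
LIBSTREAMS=${LIBSTREAMS:-/usr/lib/libstreams.so}

# Illustrative pipe-heavy workload: many small writes through one pipe.
# The pipe is created by the shell that runs the pipeline, so the preload
# must apply to that shell, not only to dd or wc.
WORKLOAD='dd if=/dev/zero bs=64 count=100000 2>/dev/null | wc -c'

# Usage (not run here):
#   time bash -c "$WORKLOAD"                                  # legacy pipes
#   time run_with_preload "$LIBSTREAMS" bash -c "$WORKLOAD"   # preloaded
```

Comparing the two timings on a realistic application, rather than on this synthetic workload, is what the work item actually calls for.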
A separate paper comparing a TPI STREAMS implementation of UDP with the Linux BSD Sockets implementation has also been prepared. That paper also shows significant performance improvements for STREAMS attributable to similar causes.
A performance testing script (perftest_script) was used to obtain repeatable results. The script was executed as:
$#> ./perftest_script -a -S10 --hiwat=$((1<<16)) --lowat=$((1<<13))
The script is as follows:
#!/bin/bash
set -x
interval=5
testtime=2
command=`echo $0 | sed -e 's,.*/,,'`
perftestn=
perftest=
if [ -x `pwd`/perftest ] ; then
    perftest=`pwd`/perftest
elif [ -x /usr/lib/streams/perftest ] ; then
    perftest=/usr/lib/streams/perftest
elif [ -x /usr/libexec/streams/perftest ] ; then
    perftest=/usr/libexec/streams/perftest
elif [ -x /usr/lib/LiS/perftest ] ; then
    perftest=/usr/lib/LiS/perftest
elif [ -x /usr/libexec/LiS/perftest ] ; then
    perftest=/usr/libexec/LiS/perftest
fi
if [ -x `pwd`/perftestn ] ; then
    perftestn=`pwd`/perftestn
elif [ -x /usr/lib/streams/perftestn ] ; then
    perftestn=/usr/lib/streams/perftestn
elif [ -x /usr/libexec/streams/perftestn ] ; then
    perftestn=/usr/libexec/streams/perftestn
elif [ -x /usr/lib/LiS/perftestn ] ; then
    perftestn=/usr/lib/LiS/perftestn
elif [ -x /usr/libexec/LiS/perftestn ] ; then
    perftestn=/usr/libexec/LiS/perftestn
fi
[ -n "$perftestn" ] || [ -n "$perftest" ] || exit 1
scls=
if [ -x `pwd`/scls ] ; then
    scls=`pwd`/scls
elif [ -x /usr/sbin/scls ] ; then
    scls=/usr/sbin/scls
fi
(
    set -x
    [ -n "$scls" ] && $scls -a -c -r pipe pipemod
    for size in 4096 2048 1024 512 256 128 64 32 16 8 4 2 1
    do
        [ -n "$perftest" ] && $perftest -q \
            -r -t $testtime -i $interval -m nullmod -p 0 -s $size ${1+$@}
        [ -n "$perftestn" ] && $perftestn -q \
            -r -t $testtime -i $interval -m nullmod -p 0 -s $size ${1+$@}
        [ -n "$scls" ] && $scls -a -c -r pipe pipemod srvmod nullmod
    done
) 2>&1 | tee `hostname`.$command.`date -uIseconds`.log
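The script forwards its own command-line options to every perftest run through the ${1+$@} expansion, so the options on the invocation line reach each test unchanged. A minimal sketch of that forwarding follows; fake_perftest and forward are hypothetical stand-ins introduced here for illustration only, and the fixed "-s 64" merely imitates the per-size arguments the script supplies.

```shell
#!/bin/bash
# Stand-in for perftest (hypothetical): report how many arguments arrived
# and what they were.
fake_perftest() { echo "$# args: $*" ; }

# Same forwarding idiom as the script: expands to nothing when no arguments
# were given, otherwise to the (word-split) argument list.
forward() { fake_perftest -s 64 ${1+$@} ; }

forward
# prints: 2 args: -s 64

forward -a -S10 --hiwat=$((1<<16)) --lowat=$((1<<13))
# prints: 6 args: -s 64 -a -S10 --hiwat=65536 --lowat=8192
```

The shell evaluates the arithmetic before the script starts, so the documented invocation passes --hiwat=65536 and --lowat=8192. Because the expansion is unquoted, an argument containing whitespace would be split into several words.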
Following are the raw data points captured using the perftest_script benchmarking script:
Table 1 lists the raw data from the perftest program that was used in preparing graphs for FC6 (i386) on Porky.
Table 2 lists the raw data from the perftest program that was used in preparing graphs for CentOS 4 on Porky.
Table 3 lists the raw data from the perftest program that was used in preparing graphs for SuSE OSS 10 on Porky.
Table 4 lists the raw data from the perftest program that was used in preparing graphs for Ubuntu 6.10 on Porky.
Table 5 lists the raw data from the perftest program that was used in preparing graphs for RedHat 7.2 on Pumbah.
Table 6 lists the raw data from perftest, used in preparing graphs for Fedora Core 6 (x86_64) HT on Daisy.
Table 7 lists the raw data from perftest, used in preparing graphs for CentOS 5 (x86_64) HT on Daisy.
Table 8 lists the raw data from perftest, used in preparing graphs for CentOS 5.2 (x86_64) HT on Daisy.
Table 9 lists the raw data from perftest, used in preparing graphs for SuSE 10.0 OSS on Mspiggy.