Re: [openss7] SCTP Proto
Chuck,
First: Thank you for performing this testing!
Yes, the results are very interesting. I have a number of questions and
comments:
Q: Would it be possible for me to post up your results on the
OpenSS7 site? (Our list server rejected the size of your
attachment as too large and I am sure that others would like to
look at the results.)
Q: Can you share the test code? At least the portion which
interfaces directly with the socket, sends and receives data and
makes the time measurements?
Q: How does the test application operate? Does it send one
forward message (timestamping it) and then poll for an
acknowledgment (timestamping it)? Or does it send a stream of
forward messages (timestamping each) and then correlate the
(timestamped) responses? (A sketch of the first pattern
follows these questions.)
Q: What were the settings of the various SCTP configuration options
and socket options for the test? What were the settings of the TCP
configuration options and socket options? (e.g. was TCP_NODELAY
set on the TCP socket, was CONFIG_SCTP_SLOW_VERIFICATION set,
etc.?)
Q: Is it possible to get Ethereal dumps generated by a third
box snooping between the two?
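To make the timing questions concrete, the sort of probe I have in mind
looks roughly like the following. This is a sketch only: the buffer size
is a placeholder, and I am not suggesting that this is how your test code
actually works.

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Sketch of a round-trip probe: timestamp, send one forward message,
     * block for the reply, timestamp again.  TCP_NODELAY is shown being
     * set because that is one of the option settings asked about above;
     * for the SCTP run the equivalent SCTP options would matter instead. */
    static long rtt_probe(int sd, char *buf, size_t len)
    {
            struct timeval t0, t1;
            int one = 1;

            /* disable Nagle so a small reply is not held back (TCP case) */
            setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

            gettimeofday(&t0, NULL);                /* timestamp the send */
            if (send(sd, buf, len, 0) != (ssize_t) len)
                    return -1;
            if (recv(sd, buf, len, 0) <= 0)         /* wait for the reply */
                    return -1;
            gettimeofday(&t1, NULL);                /* timestamp the reply */

            return (t1.tv_sec - t0.tv_sec) * 1000000L
                 + (t1.tv_usec - t0.tv_usec);       /* RTT in microseconds */
    }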
C: Although using a single byte reply is quite applicable to TCP,
it is a rather unfair comparison for SCTP. TCP can place a 1-byte
acknowledgement into a 21-byte IP payload. SCTP, when
bundling a SACK with DATA chunks, requires a 12-byte message header,
a 12-byte SACK chunk, a 12-byte DATA chunk header and 4 bytes of
(padded) data, for a grand total of a 40-byte IP payload.
That is, it is far more complicated for SCTP to generate a one-byte
reply than it is for TCP. A fairer comparison might be an
echo test where the receiving side merely echoes the data back
to the originator.
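For illustration only, the echoing side would be little more than the
following loop (the buffer size is a placeholder and error handling is
elided; this is not meant as a description of your harness):

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Echo sketch: whatever arrives is sent straight back, so both
     * transports carry the same payload in both directions and neither
     * is penalized for how it packages a 1-byte acknowledgement. */
    static void echo_loop(int sd)
    {
            char buf[8192];                 /* placeholder size */
            ssize_t n;

            while ((n = recv(sd, buf, sizeof(buf), 0)) > 0)
                    if (send(sd, buf, n, 0) != n)
                            break;          /* short write: give up */
    }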
C: In the test results for SCTP, it appears that 50% of the RTTs
were exactly 1000 usecs. This Dirac delta function in the
results makes me suspect a bug in the protocol code, the test
code or the method of generating the graphs. It is even more
suspicious that this spike at 1000 occurs at all frame sizes.
I seriously doubt that one could write a software clock that was
this accurate at 1 MHz.
C: The extremely large variances in the RTT make me wonder
whether SCTP is getting itself into a retransmission scenario.
Ethereal traces would be very helpful.
C: It is interesting that the SCTP minimums are consistently about
double those of TCP. SCTP SACKs only every second DATA chunk
received unless it is also sending data. SACKs are bundled ahead
of DATA in SCTP messages. The receiving stack may be introducing
a delay by processing the SACK before the DATA. Again, Ethereal
dumps and the testing code would help here. If you are sending
one DATA chunk and waiting for the one-byte reply, this might be
exactly what is happening.
C: The kernel crashes above 512 bytes are a good debugging lead. I
will chase that one down and release a patch. It would be very
interesting to see comparisons with TCP above 1024 bytes (TCP's
default MSS), when TCP is forced to fragment, or comparisons at
packet sizes greater than the MTU.
Overall, your testing indicates that there might be some problems in poll
handling, sleeping or waking processes, acknowledgement handling, etc.,
but some strong numbers down at the 300 usec side of the histograms
indicate that it is quite possible to get this SCTP stack performing as
well as TCP, and even outperforming it at larger message sizes. A little
more information (test code, Ethereal dumps) would make things quite
easy to chase down.
There are about 5 places in the code that I know of where significant
speed improvements can be made once these quirks are found. They are:
1) Rework the copy_and_checksum_from_user. I turned it off in
France because of problems with it generating incorrect
checksums. As it stands, data is copied from the user and
the checksum is recalculated on the data each time that it
is retransmitted. (A sketch of the intended change follows
this list.)
2) Rework cloning of sk_buffs when bundling DATA chunks. As
it stands, data is copied too many times.
3) Place stream data structures into kmem caches. Currently
the stream data is kmalloc'ed and kfree'd rather than being
placed in a hardware-aligned kmem cache. This data
structure is accessed on every DATA chunk transmission and
should really be cached. (Again, see the sketch after this
list.)
4) There is really no need to perform slow verification. The
option should be removed.
5) The module is compiled with -O2 and I'm not sure that the
compiler is inlining everything that needs to be inlined. I
can check this and rewrite as macros those things which are
not being inlined. I particularly suspect the
established fast path for receive data.
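To make (1) and (3) a little more concrete, here are rough sketches of
the sort of changes I have in mind. They are illustrations only: the
structure, cache and helper names below are made up rather than the
identifiers in the current code, and the slab calls assume the 2.4
interface.

For (1), the idea is to compute the checksum once, while the data is
copied in from user space, and cache it with the chunk so that a
retransmission does not have to walk the data again:

    #include <linux/skbuff.h>
    #include <linux/errno.h>
    #include <asm/uaccess.h>

    /* sctp_csum() and struct my_chunk are hypothetical names. */
    extern unsigned long sctp_csum(const unsigned char *data, int len);

    struct my_chunk {
            struct sk_buff *skb;
            unsigned long csum;             /* cached at copy time */
    };

    static int my_copy_data(struct my_chunk *c, const char *ubuf, int len)
    {
            unsigned char *p = skb_put(c->skb, len);

            if (copy_from_user(p, ubuf, len))
                    return -EFAULT;
            c->csum = sctp_csum(p, len);    /* one pass; reused on resends */
            return 0;
    }

For (3), the per-stream structure would come out of a hardware-aligned
slab cache rather than kmalloc()/kfree() on the data path:

    #include <linux/init.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    struct my_strm {                        /* placeholder for the stream struct */
            unsigned short sid;
            unsigned short ssn;
    };

    static kmem_cache_t *my_strm_cachep;

    static int __init my_strm_cache_init(void)
    {
            my_strm_cachep = kmem_cache_create("sctp_strm",
                                               sizeof(struct my_strm), 0,
                                               SLAB_HWCACHE_ALIGN, NULL, NULL);
            return my_strm_cachep ? 0 : -ENOMEM;
    }

    /* the DATA chunk transmission path then allocates and frees with */
    static inline struct my_strm *my_strm_alloc(int gfp)
    {
            return kmem_cache_alloc(my_strm_cachep, gfp);
    }

    static inline void my_strm_free(struct my_strm *s)
    {
            kmem_cache_free(my_strm_cachep, s);
    }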
In France we were hoping to get the conformance correct before
addressing performance. I'm sure that it will not take too much to get
this stack running as fast as you would like.
--Brian
On Tue, 29 May 2001 18:10:45, Chuck Winters wrote:
>
> Hey,
> I recompiled my kernel to use only one processor. I have been
> doing some preliminary testing of the protocol, and have found it to be
> quite slow. I am getting an average RTT of about 1900 microseconds. I am
> including two preliminary tests, one on TCP and one on SCTP.
> You will notice that the SCTP one only went to 512, but that is only
> because the kernel crashes every time at that point. These are only
> preliminary, but I thought they may be interesting.
>
> Thanks,
> Chuck
>
> --
> Chuck Winters | Email: cwinters@atl.lmco.com
> Distributed Processing Laboratory | Phone: 856-338-3987
> Lockheed Martin Advanced Technology Labs |
> 1 Federal St - A&E-3W |
> Camden, NJ 08102 |
--
Brian F. G. Bidulock ¦ The reasonable man adapts himself to the ¦
bidulock@openss7.org ¦ world; the unreasonable one persists in ¦
http://www.openss7.org/ ¦ trying to adapt the world to himself. ¦
¦ Therefore all progress depends on the ¦
¦ unreasonable man. -- George Bernard Shaw ¦