



Chapter 14 Part 1:

Network Performance and Scalability



What do we have in this chapter 14 Part 1?

  1. Overview

  2. Underlying Protocols

  3. Transmission Control Protocol (TCP)

  4. User Datagram Protocol (UDP)

  5. Asynchronous I/O Pattern




The previous 13 chapters introduced a number of network classes, including streams, sockets, and Hypertext Transfer Protocol (HTTP). The complexities of these three fundamental classes can be overwhelming at first, and the prospect of writing scalable, high-performance applications might seem daunting. The good news is that it’s less difficult than it seems. The secret to developing applications that offer high performance and scalability rests on understanding three things: the underlying protocol, the asynchronous I/O pattern, and resource management. In addition to these three basic considerations, the Microsoft .NET Framework Web classes require additional knowledge, since these classes hide many of the underlying operations.

First, let’s define what we mean by high performance and scalability. High performance is the ability of an application to send and receive data in the most efficient way possible. Scalability is an application’s ability to handle anywhere from one to thousands of connections or requests without significantly impacting client performance (such as starving a connection). In a scalable design, the system resources required grow roughly linearly as clients are added. A scalable application should be able to handle an increasing number of connections or requests until system resources are exhausted, rather than failing at the point where the application design creates a bottleneck.

This chapter will cover several general principles for designing applications for performance and scalability, including how the underlying protocol affects performance, tips for using the asynchronous I/O pattern, and resource management. These principles apply to the .NET Framework Socket class and to the Web-related classes covered in Chapters 8, 9, and 10. We’ll also cover issues specific to the Web classes, such as managing threads and connections, issues specific to the HTTP verbs GET and POST, and authentication, among other topics.


Underlying Protocols


A thorough understanding of the underlying transport protocol is necessary for writing a good network application. Many protocols were designed for the lowest common denominator (high-latency, low-bandwidth links such as a 2400-baud modem or a satellite connection). Unfortunately, this tends to cause problems on the fast Local Area Networks (LANs) of today, which is why it’s important to have an understanding of the protocol’s design. The most prevalent protocols in use are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).


Transmission Control Protocol (TCP)


TCP is the underlying transport for HTTP and is commonly used by socket-based applications. TCP offers several advantages, such as being connection-based, reliable, and able to support flow control; however, it also has several possible disadvantages. One of the most common mistakes when using TCP is serializing calls between sending and receiving. That is, an application sends data, waits for the send to complete, and then receives data. After data is received, another send is made. This practice is undesirable because TCP is a bidirectional protocol in which the sending and receiving paths are independent of one another. Alternating between calls to send and calls to receive means an application cannot use the bandwidth available for sending, because it is frequently blocked by a receive operation.

A good application should be receiving data at all times. This is because each side of a TCP connection advertises what is called a TCP window to the peer. The window is the number of bytes that the peer can send without overrunning the local side’s receive buffers. If a peer is sending so much data that the receiving side can’t keep up, the window size will go to 0, which tells the peer to stop sending data. If this occurs, your application will alternate between bursts of data and periods of no data being transmitted, which is an inefficient use of the network. An application should have separate send and receive operations so that it can receive data as fast as it arrives.
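The independence of the two paths can be sketched with a loopback connection in which both sides keep a receive posted while they send. This is a minimal sketch, not a definitive implementation: the names DuplexSketch, ReceiveState, and Run are hypothetical, and error handling is omitted.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

public static class DuplexSketch
{
    // Hypothetical per-connection receive state (not part of the Framework).
    private class ReceiveState
    {
        public Socket Connection;
        public byte[] Buffer = new byte[4096];
        public int TotalBytes;
        public ManualResetEvent PeerClosed = new ManualResetEvent(false);
    }

    private static ReceiveState StartReceiving(Socket connection)
    {
        ReceiveState state = new ReceiveState();
        state.Connection = connection;
        PostReceive(state);
        return state;
    }

    private static void PostReceive(ReceiveState state)
    {
        state.Connection.BeginReceive(state.Buffer, 0, state.Buffer.Length,
            SocketFlags.None, new AsyncCallback(OnReceive), state);
    }

    private static void OnReceive(IAsyncResult ar)
    {
        ReceiveState state = (ReceiveState)ar.AsyncState;
        int bytesRead = state.Connection.EndReceive(ar);
        if (bytesRead == 0)
        {
            state.PeerClosed.Set();   // the peer has finished sending
            return;
        }
        state.TotalBytes += bytesRead;
        PostReceive(state);           // repost so a receive is always outstanding
    }

    // Both sides send messagesPerSide 4-byte messages without waiting for a
    // reply in between; returns the total bytes received by the two sides.
    public static int Run(int messagesPerSide)
    {
        Socket listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(1);

        Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndPoint);
        Socket server = listener.Accept();

        // Receives are posted first; sending never waits on them.
        ReceiveState clientState = StartReceiving(client);
        ReceiveState serverState = StartReceiving(server);

        byte[] message = Encoding.ASCII.GetBytes("ping");
        for (int i = 0; i < messagesPerSide; i++)
        {
            client.Send(message);
            server.Send(message);
        }
        client.Shutdown(SocketShutdown.Send);
        server.Shutdown(SocketShutdown.Send);

        // Wait until each side has seen the other's close.
        clientState.PeerClosed.WaitOne();
        serverState.PeerClosed.WaitOne();
        int total = clientState.TotalBytes + serverState.TotalBytes;

        client.Close();
        server.Close();
        listener.Close();
        return total;
    }

    public static void Main()
    {
        Console.WriteLine("Bytes received by both sides: {0}", Run(10));
    }
}
```

Because the receive is reposted inside the completion callback, neither side's send loop ever has to stop and wait for incoming data.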

Another common problem with TCP is having many connections in the TIME_WAIT state. The TCP protocol defines several states for a connection, and when a connection is closed the peer receives indication of the closure. At this point, the side initiating the close waits for an acknowledgement of its close request by the peer. It then waits for the peer to send its own request to close the connection (since TCP is bidirectional, each side must close the connection), which must be acknowledged when it arrives. Whichever side initiates the close request goes into the TIME_WAIT state. The connection can remain in this state for minutes while it ensures that all outstanding data is properly acknowledged.

The TIME_WAIT state is important because the combination of a local IP address and port along with a remote IP address and port must be unique to successfully establish a connection. For example, if a client makes a connection from a given local address and port to a server’s address and port, another connection from the same client address and port to that server cannot be established, because the identifier for the TCP connection would no longer be unique. This isn’t a problem while the client connection is active; however, if the server actively initiates the close, the connection described by that pair of endpoints goes into the TIME_WAIT state on the server. The client receives notification that the connection was closed and can close its socket, but if the client attempts another connection to the server from the same address and port as the previous connection, which is still in TIME_WAIT on the server, it will be refused.

The solution to this problem is to have the client actively initiate the close so that the connection on the client side goes into the TIME_WAIT state. This is less problematic because most client sockets do not bind to an explicit local port. By contrast, servers must bind to a well-known port for clients to know how to reach them. In the example, the connection would succeed if the client actively initiated the close and then connected to the server from a different local port.

Implicit socket binding is another important issue many developers encounter with TCP. A server socket is always bound to a well-known port so that clients know how to reach it, but most clients either do not explicitly bind the socket before calling connect or they bind to the wildcard address and port zero. A client that does not call bind explicitly will have the socket bound implicitly to a local port in the range of 1024 to 5000 when a connection request is made.

A final issue that can be disadvantageous in TCP (and HTTP) applications is the Nagle algorithm. When data is sent on a connection, the Nagle algorithm causes the network stack to delay for a brief moment (up to 200 milliseconds) to see if the application will make another send call. It does this so that the data may be consolidated into a single TCP packet. If the stack were to send the data in its own TCP packet for every send by the application, the majority of the packets would contain a relatively small amount of data. This makes for a congested network, because each TCP packet sent adds 20 bytes for the IP header and 20 bytes for the TCP header.

Most applications never need to disable the Nagle algorithm, and doing so can degrade network performance, but there are certain classes of applications that do require data to be sent immediately. Applications that return user feedback sometimes need to disable Nagling to be responsive, because extra delays in sending data that results in feedback to the user might be perceived as a hang. Applications that send small amounts of data infrequently also benefit from disabling Nagling, because the network delay is unnecessary in these cases. Using the Socket class, the Nagle algorithm is disabled by calling the SetSocketOption method with SocketOptionLevel.Tcp and SocketOptionName.NoDelay. For Web requests that use the WebRequest or HttpWebRequest class, the ServicePoint object associated with a request exposes a Boolean property, UseNagleAlgorithm, which can be set to false.
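The two ways of disabling Nagling described above might look like the following. This is a hedged sketch: the class and method names (NagleSketch, CreateNoDelaySocket, CreateNoDelayRequest) are hypothetical, and it assumes the ServicePoint setting is applied before the request is issued.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class NagleSketch
{
    // Creates a TCP socket with the Nagle algorithm disabled.
    public static Socket CreateNoDelaySocket()
    {
        Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        // A nonzero option value turns NoDelay on, which turns Nagling off.
        s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, 1);
        return s;
    }

    // Creates a Web request whose ServicePoint has Nagling turned off.
    // The request is only created here, not sent.
    public static HttpWebRequest CreateNoDelayRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.ServicePoint.UseNagleAlgorithm = false;
        return request;
    }

    public static void Main()
    {
        Socket s = CreateNoDelaySocket();
        // Read the option back to confirm it was applied.
        object value = s.GetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay);
        Console.WriteLine("NoDelay option value: {0}", value);
        s.Close();
    }
}
```

Reading the option back with GetSocketOption is a simple way to check that the setting took effect on a given socket.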


User Datagram Protocol (UDP)


Unlike TCP, the UDP protocol is very basic, with few or no restrictions on how to use it. UDP is connectionless, so there is no overhead associated with establishing a connection before data can be transmitted. UDP also makes no guarantee that data will be delivered, which greatly simplifies its design: just send and forget. However, these freedoms can lead to problems. Because UDP is connectionless, a single UDP socket can be used to send datagrams to several endpoints, but each socket object within the networking stack possesses a lock that must be acquired to send data. No other send operations can be performed on the socket while the lock is held. Therefore, if data is being sent on a single socket to multiple destinations, the sends are serialized, which can degrade performance. A better solution is to use multiple UDP sockets to send datagrams to different endpoints.
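One way to apply the multiple-socket advice is to keep a socket per destination and reuse it for every send to that endpoint. The sketch below is illustrative only: UdpSenderPool and UdpSenderPoolDemo are hypothetical names, and the generic Dictionary assumes .NET Framework 2.0 or later (a Hashtable would serve the same purpose on earlier versions).

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper: one UDP socket per destination, so sends to different
// endpoints do not contend for a single socket's send lock.
public class UdpSenderPool : IDisposable
{
    private readonly Dictionary<IPEndPoint, Socket> sockets =
        new Dictionary<IPEndPoint, Socket>();

    // Number of sockets created so far.
    public int Count
    {
        get { lock (sockets) { return sockets.Count; } }
    }

    public int Send(IPEndPoint destination, byte[] payload)
    {
        Socket s;
        lock (sockets)   // guards only the lookup table, not the send itself
        {
            if (!sockets.TryGetValue(destination, out s))
            {
                s = new Socket(destination.AddressFamily, SocketType.Dgram, ProtocolType.Udp);
                sockets.Add(destination, s);
            }
        }
        return s.SendTo(payload, destination);
    }

    public void Dispose()
    {
        lock (sockets)
        {
            foreach (Socket s in sockets.Values)
            {
                s.Close();
            }
            sockets.Clear();
        }
    }
}

public static class UdpSenderPoolDemo
{
    private static Socket NewLoopbackReceiver()
    {
        Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        s.Bind(new IPEndPoint(IPAddress.Loopback, 0));   // port 0: let the stack choose
        return s;
    }

    // Sends to two loopback receivers and returns how many sockets the pool created.
    public static int Run()
    {
        Socket first = NewLoopbackReceiver();
        Socket second = NewLoopbackReceiver();
        UdpSenderPool pool = new UdpSenderPool();
        byte[] payload = Encoding.ASCII.GetBytes("hello");

        pool.Send((IPEndPoint)first.LocalEndPoint, payload);
        pool.Send((IPEndPoint)second.LocalEndPoint, payload);
        pool.Send((IPEndPoint)first.LocalEndPoint, payload);   // reuses the first socket

        int created = pool.Count;
        pool.Dispose();
        first.Close();
        second.Close();
        return created;
    }

    public static void Main()
    {
        Console.WriteLine("Sockets created: {0}", Run());
    }
}
```

The pool's own lock is held only for the table lookup, so concurrent sends to different destinations proceed on different sockets. Whether this helps in practice depends on how many destinations and sending threads the application has.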

UDP is also unreliable. If an application sends too many datagrams simultaneously, some datagrams can be dropped in the network stack before they are sent. The stack maintains a limited number of buffers to use for sending datagrams. Therefore, data can be dropped when the network stack runs out of buffer space. No error message is indicated to the sender when data loss occurs. This scenario typically happens when a Socket posts too many asynchronous BeginSendTo operations. If you use blocking SendTo, you’ll rarely encounter this problem.

The Address Resolution Protocol (ARP) plays a part in UDP unreliability. When a UDP datagram is sent to a destination that has not been sent to before, the network stack must determine whether the destination resides on the local network or whether the packet must be sent to the default gateway to be routed to its destination. ARP resolves the IP destination address into a physical Ethernet address. When the network stack determines that there is no ARP entry for a given IP destination, it makes an ARP request. As a result, the first UDP datagram is silently discarded while the ARP request is made. Developers should be aware of this behavior if their applications assume that the local network is quiet and that the destination will receive all sent packets. Such applications should send the first packet twice or implement reliability on top of UDP.


Asynchronous I/O Pattern


Using the asynchronous I/O pattern whenever possible provides what is probably the single greatest performance boost. Asynchronous I/O is the only method that can efficiently manage hundreds or thousands of simultaneous operations on multiple resources. The only alternative is creating multiple threads and issuing blocking calls, but this solution doesn’t scale well. Threads are an expensive resource and there is a cost associated with every context switch from one thread to another.

For example, consider a TCP server implemented using the Socket class. If the server is to handle 1000 client connections simultaneously, how will it do so? Many developers would create a thread to handle communication on each connection, but a process that spawns 1000 threads will quickly exhaust available resources and spend needless CPU cycles switching execution between the threads.

The more efficient method is to use the asynchronous I/O pattern, which typically utilizes the .NET Framework thread pool to handle requests. An application posts one or more asynchronous operations and specifies the delegate methods, which are invoked upon completion. These delegates are called from the thread pool maintained by the Framework. This thread pool is a constrained resource, so it is important that its threads not spend too much time in the asynchronous completion routine. That is, the delegate should simply call the end routine for the initiated call (for example, calling EndReceive in response to the BeginReceive) and then return. If the application needs to perform any kind of processing upon completion, the completion routine should not do that work itself; it should queue the operation to be handled by an application-spawned thread.

The following code sample illustrates this process. The IoPacket class is the context information associated with each asynchronous operation. In the sample, IoPacket contains the byte buffer that received the data, the TCP Socket object, and the number of bytes actually received into the buffer. The HandleReceive function is the asynchronous callback invoked when the receive completes. If the receive succeeded by receiving data, the IoPacket object is added to a list and an event is signaled. The ReceiveThread method is a spawned thread that waits for the event to be signaled and then walks the list of pending IoPacket objects and processes them. Note that access to the list is synchronized by using the Monitor class.



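C#

// Requires: using System; using System.Collections;
//           using System.Net.Sockets; using System.Threading;

// Shared between the completion callback and the worker thread
ArrayList           receiveList = new ArrayList();
ManualResetEvent    receiveEvent = new ManualResetEvent(false);

public class IoPacket
{
    public byte [ ]  receiveBuffer = new byte [ 4096 ];
    public Socket   tcpSocket;
    public int      bytesRead;
    // Other state information
}

// Asynchronous callback: runs on a thread pool thread, so it only queues work
void HandleReceive( IAsyncResult ar )
{
    IoPacket    ioData = (IoPacket) ar.AsyncState;

    ioData.bytesRead = ioData.tcpSocket.EndReceive( ar );
    if ( ioData.bytesRead == 0 )
    {
        // Connection has closed
    }
    else
    {
        // Queue the completed packet and wake the worker thread
        Monitor.Enter( receiveList );
        receiveList.Add( ioData );
        receiveEvent.Set();
        Monitor.Exit( receiveList );

        ioData = new IoPacket();
        // Post another BeginReceive with the new ioData object
    }
}

// Application-spawned thread that processes the queued packets
void ReceiveThread()
{
    IoPacket    ioData;
    bool        rc;

    while (true)
    {
        rc = receiveEvent.WaitOne();
        if ( rc == true )
        {
            // Reset before draining so a later Set is not lost
            receiveEvent.Reset();

            Monitor.Enter( receiveList );
            while ( receiveList.Count > 0 )
            {
                ioData = (IoPacket) receiveList[0];
                receiveList.RemoveAt( 0 );

                // Do something with data
            }
            Monitor.Exit( receiveList );
        }
    }
}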

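Visual Basic .NET

' Requires: Imports System, System.Collections,
'           System.Net.Sockets, System.Threading

' Shared between the completion callback and the worker thread
Dim receiveList As ArrayList = New ArrayList
Dim receiveEvent As ManualResetEvent = New ManualResetEvent(False)

Public Class IoPacket
    ' The bound is the highest index, so (4095) allocates 4096 bytes
    Public receiveBuffer(4095) As Byte
    Public tcpSocket As Socket
    Public bytesRead As Integer
    ' Other state information
End Class

' Asynchronous callback: runs on a thread pool thread, so it only queues work
Public Sub HandleReceive(ByVal ar As IAsyncResult)
    Dim ioData As IoPacket = CType(ar.AsyncState, IoPacket)

    ioData.bytesRead = ioData.tcpSocket.EndReceive(ar)
    If (ioData.bytesRead = 0) Then
        ' Connection has closed
    Else
        ' Queue the completed packet and wake the worker thread
        Monitor.Enter(receiveList)
        receiveList.Add(ioData)
        receiveEvent.Set()
        Monitor.Exit(receiveList)

        ioData = New IoPacket
        ' Post another BeginReceive with the new ioData object
    End If
End Sub

' Application-spawned thread that processes the queued packets
Public Sub ReceiveThread()
    Dim ioData As IoPacket
    Dim rc As Boolean

    While (True)
        rc = receiveEvent.WaitOne()
        If (rc = True) Then
            ' Reset before draining so a later Set is not lost
            receiveEvent.Reset()

            Monitor.Enter(receiveList)
            While (receiveList.Count > 0)
                ioData = CType(receiveList(0), IoPacket)
                receiveList.RemoveAt(0)

                ' Do something with data
            End While
            Monitor.Exit(receiveList)
        End If
    End While
End Sub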

Consider an application that creates multiple connections to a server and receives data that must be written to local files. The application posts one or more asynchronous receive operations on each connection. When the completion routines fire, the application takes the receive buffer, which now contains data, and enqueues the data in some kind of array (such as an ArrayList). An event is then signaled to wake up an application thread to dequeue the buffer and then write the data to disk. The application avoids performing blocking operations in the asynchronous delegate. This prevents the Framework’s thread pool from being blocked and it also disassociates the network operations from the disk operations, which allows data to be received at the fastest rate possible.

Note: In versions 1.0 and 1.1 of the .NET Framework, the necessary code access security checks create considerable overhead in executing a callback function when an asynchronous method completes. This overhead results in significantly decreased performance when compared to native Winsock applications. Significant performance improvements for asynchronous callbacks are anticipated in the newer release of the .NET Framework.



