Debugging TCP/IP
by Warren Young
(This article was written with the Winsock programmer in mind,
but the information in it can be used by Unix programmers, as well as
administrators and technicians.)
TCP is a simple protocol in a certain sense: you send data, it delivers
it. Because it was engineered for reliability in networks of uncertain
quality, it works around a lot of problems without bothering the end
user. But partially because of this reliability, TCP exhibits behaviors
that surprise those who don't truly understand the protocol. This
tutorial will introduce you to the most important of these issues,
but it's really the tip of the iceberg. For the submerged part, see TCP/IP Illustrated. Incidentally,
the state/transition diagram below comes from volume 2 of that series. It
happens to be printed in Volume 1 of the series as well, and in Stevens'
Unix Network Programming, volume
1. You can also get that diagram in Postscript format off the web;
see the Miscellaneous Resources
section of the FAQ for a pointer.
In this tutorial, we use the term "packet" to mean "frame" rather
than "datagram." That is, a packet for our purposes is a collection of
data wrapped in a TCP frame. The nebulous thing called "the network" is
allowed to split the data in a TCP frame over multiple hardware frames,
or coalesce data from multiple TCP frames into a single TCP frame, etc.,
but the frame itself will remain functionally intact. This is as opposed
to the "datagram" meaning of "packet," for an inviolable block of data
that is unchanged from sender to receiver.
TCP Control Bits
When a TCP implementation decides to send a packet of data to the
remote peer, it first wraps the data with 20-plus bytes of data called the
"header". Headers are an essential part of network protocols, because they
enable the participants in the network to make decisions regarding the data
flowing over it. Every protocol adds headers (and sometimes trailers)
to your data. We won't discuss the TCP and IP headers in detail here,
as that's better left to books like W. Richard Stevens'.
Within the header is a field that I will call the "control bits,"
for lack of a better term. The bits that interest us here are called SYN,
ACK, FIN and RST, for "synchronize," "acknowledge," "finish," and "reset,"
respectively. These bits are set in TCP packets for the sole benefit of
the remote peer's network stack; that is, they are the machinery
under the hood that most people never have occasion to examine.
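To make the control bits a little more concrete, here is a small Python sketch that turns the flags byte of a TCP header into the bit names used in this article. The bit values are the ones the TCP specification assigns; the function name decode_control_bits() is made up for this illustration.

```python
# Positions of the TCP control bits within the header's flags byte,
# as assigned by the TCP specification.
TCP_FLAGS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}


def decode_control_bits(flags_byte):
    """Return the names of the control bits set in flags_byte, lowest bit first."""
    return [name for bit, name in sorted(TCP_FLAGS.items()) if flags_byte & bit]


print(decode_control_bits(0x02))  # the first packet of a connection attempt
print(decode_control_bits(0x12))  # the peer's reply to that packet
```

The two sample calls print ['SYN'] and ['SYN', 'ACK'], which are the first two packets of the connection setup discussed later in this article.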
The State/Transition Diagram
Below is the state/transition diagram for the TCP protocol. The
states are in round-ended boxes, and the transitions are the labelled
arrows. The transitions show how your program can make TCP move
from one state to another. The diagram also shows how the remote peer can make
your stack change TCP states, and how you can recognize these changes at
the application level. Note that transition labels come from the names
of BSD sockets functions; although there are differences in the Winsock
API, the effects are the same at this level. (I apologize for the so-so
readability of the text in this image, but it's already too big at 20K,
so I'm unwilling to make it any bigger. If you want a pretty,
readable diagram, get the Postscript file and print your own copy.)
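If the picture isn't handy, much of the same information fits in a lookup table. The following Python sketch covers only the common open and close paths, not the whole diagram. The event labels (such as "recv SYN+ACK") and the walk() helper are names invented for this illustration; they are not part of any sockets API.

```python
# A subset of the TCP state machine: (current state, event) -> next state.
# Event labels are informal names chosen for this sketch.
TRANSITIONS = {
    ("CLOSED", "connect"): "SYN_SENT",            # sends SYN
    ("CLOSED", "listen"): "LISTEN",
    ("LISTEN", "recv SYN"): "SYN_RCVD",           # sends SYN+ACK
    ("SYN_SENT", "recv SYN+ACK"): "ESTABLISHED",  # sends ACK
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # sends FIN (active close)
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",    # sends ACK (passive close)
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv FIN"): "CLOSING",        # both ends closed at once
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",      # sends ACK
    ("CLOSING", "recv ACK"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # sends FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}


def walk(state, events):
    """Follow a sequence of events from state; raises KeyError on an
    (state, event) pair that this subset of the diagram does not cover."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


print(walk("CLOSED", ["connect", "recv SYN+ACK"]))
print(walk("ESTABLISHED", ["close", "recv ACK", "recv FIN"]))
print(walk("ESTABLISHED", ["recv FIN", "close", "recv ACK"]))
```

The three calls trace a client open, an active close and a passive close, ending in ESTABLISHED, TIME_WAIT and CLOSED respectively.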
Understanding this diagram is really one of the keys to understanding
TCP, so let's go through a few exercises. But first, you need to know
about the netstat tool. This tool comes with all Microsoft
TCP/IP stacks, and probably others as well. It is modelled after a Unix
tool of the same name, with virtually the same output. (The differences
between each version of this tool are slight enough that once you learn
to use one, the rest are trivial to pick up.)
The netstat tool is usually run from the command line,
often with the -n flag to make it faster. (-n
suppresses the DNS name lookups, displaying the raw address and port
numbers instead.) Another useful flag is the -a flag, which shows
"all" entries, including listeners. (The -a feature is somewhat
broken under Windows 95/98, but works better under Windows NT/2000.) It
is also very helpful to use this tool in combination with a "grep" tool;
I recommend the Cygwin port of
GNU grep. The package also includes GNU's "less" pager, which blows the
doors off of "more", especially Microsoft's emasculated versions.
Microsoft's netstat outputs four columns: the protocol (e.g. TCP
or UDP), the local address/port combination, the remote address/port
combination, and the current state of that connection. The first three
columns are self-explanatory. Together they hold five values (protocol,
local address, local port, remote address and remote port) that are
often collectively called the "connection 5-tuple," which uniquely
describes a given TCP or UDP connection. The last column corresponds
directly to the states in the diagram above.
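As a rough illustration of that layout, the Python sketch below splits one line of four-column netstat output into the 5-tuple plus the state. The sample line is made up, and the Connection type and parse_netstat_line() function are hypothetical names for this example. Other netstat versions add columns, so treat the field positions as an assumption.

```python
from collections import namedtuple

# The 5-tuple (protocol, local address/port, remote address/port) plus state.
Connection = namedtuple(
    "Connection",
    "proto local_addr local_port remote_addr remote_port state",
)


def parse_netstat_line(line):
    """Parse one line in the four-column layout described above.

    Returns None for headers and blank lines. Ports are kept as strings
    because UDP entries may show "*" instead of a number.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("TCP", "UDP"):
        return None
    proto, local, remote = fields[:3]
    state = fields[3] if len(fields) > 3 else ""  # UDP lines carry no state
    local_addr, local_port = local.rsplit(":", 1)
    remote_addr, remote_port = remote.rsplit(":", 1)
    return Connection(proto, local_addr, local_port,
                      remote_addr, remote_port, state)


# A made-up sample line, not captured from a real machine.
print(parse_netstat_line("  TCP    192.168.1.5:1041    10.0.0.7:80    TIME_WAIT"))
```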
A Micro-FAQ
Now for those exercises I promised:
- Problem: From the default CLOSED state, how does a
client program normally get to the ESTABLISHED state?
Solution: The client calls the connect() function
(or similar), which causes TCP to send an empty packet with the
SYN control bit set (SYN_SENT). The remote peer's stack sees
this "synchronize" request, and sends back an empty packet with
the SYN and ACK bits set (i.e. "I acknowledge your synchronize
request"). When the client receives the SYN/ACK packet, it sends
back an ACK packet, and reports a successful connection to the
client program.
- Problem: What is the normal TCP shutdown sequence?
Solution: The important thing to understand is that TCP is
a truly bidirectional protocol. So, the connection is shut down
in two identical stages, one for each "direction". One peer sends
a packet with the FIN bit set, which the other end ACKnowledges;
when the other end is also finished sending data, it sends out a
FIN packet, which the first peer ACKs, closing the connection.
- Problem: What is the significance of the RST bit?
Solution: This is an abnormal close, also called "slamming
the connection shut." It happens under several circumstances, but
none of the common ones are documented in the Stevens diagram. Two
of these cases you can cause from Winsock: the first method
is to enable SO_LINGER with a timeout of 0 using setsockopt()
and then call closesocket(). The second method is to call
shutdown() with how equal to 2, optionally followed
by a closesocket() call.
From the Winsock client level, the two other common RST
occurrences are "connection refused" and "remote peer terminated
connection." The first happens when you try to connect to a
port that isn't open on a remote machine. The second happens
as a result of the remote peer using one of the two RST-forcing
methods above; alternately, the application could have crashed,
and the peer's stack sent out a RST for its connection. Another
way this can happen is the remote peer catastrophically crashed,
and then after the remote machine came back up, your program sent
it a packet which the stack rightfully had no way of delivering,
so it replied with a RST packet, because the connection's 5-tuple
is now invalid.
Generally speaking, RST signals a problem of some kind: either
something bad happened to the connection, or there's a bug
somewhere. For example, some firewalls improperly use the RST
bit to signal a closed connection. The solution, of course,
is to replace the firewall product. B-)
- Problem: Netstat shows lots of sockets in the TIME_WAIT
state. What's wrong?
Solution: Nothing's wrong. TIME_WAIT is absolutely
normal. If you go through the state-transition diagram above,
you see that whichever end closes a connection first goes through
this state once the close completes. The TIME_WAIT state is a safety
mechanism, to catch stray packets for that connection after the
connection is "officially" closed. The "maximum segment lifetime"
(MSL) is the longest a packet is assumed to survive in the network,
so a stray packet and any reply it provokes should be gone after
2 times the MSL. That is why the TIME_WAIT
state lasts for 2 * MSL. The only tricky bit is, there is no
easy way to estimate MSL on the fly, so stacks traditionally
hard-code a value for it, from 15 to 60 seconds. Thus, TIME_WAIT
usually lasts 30-120 seconds.
- Problem: My sockets keep getting into a FIN_WAIT_x
state. What's wrong?
Solution: Either your program or the remote peer is
not closing the socket properly. If you walk through the
state-transition diagram above, you can see that FIN_WAIT_1
happens when the local program closes its sending half of the
connection (for instance by calling shutdown() with the "how"
parameter set to 1 or SD_SEND) but the remote stack has not yet
ACKed the resulting FIN. FIN_WAIT_2 follows once that FIN is
ACKed, and it persists until the remote peer sends its own FIN,
so a socket stuck there usually means the remote program never
closed its side. (The mirror-image symptom, where the remote peer
shuts down its sending half of the connection and your program
doesn't respond, shows up on your machine as CLOSE_WAIT.) Since
FIN_WAIT states often last
up to 10 minutes, it's well worth the effort to fix the problem
that's causing these FIN_WAIT states. (The exact length of the state
depends on the stack and the circumstances that got it into that
state.)
- Problem: Often my calls to bind() fail when I
try to re-bind to a port that I was just using. What's wrong?
Solution: The socket is probably in one of the FIN_WAIT
states or in the TIME_WAIT state. If it's a FIN_WAIT problem
and you can't fix it, or if it's the normal TIME_WAIT state, the
best thing to do is to redesign your program so that it doesn't
need to keep re-binding. For example, a server program generally
keeps its listener socket alive so that it doesn't have to keep
re-binding it to the port; if you closesocket() the listener
for some reason after each successful connection, your listener
socket will go into the TIME_WAIT state for somewhere between 30
and 120 seconds, during which you won't be able to re-bind to
that port. However, if you find that re-binding is absolutely
necessary, setting the SO_REUSEADDR option with
setsockopt() will usually get you around the problem.
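The re-binding exercise above can be tried out on the loopback interface. This Python sketch uses the standard socket module, whose calls mirror the BSD sockets functions: it opens a listener, lets the server side close a connection first so that its end heads for TIME_WAIT, and then re-binds to the same port right away with SO_REUSEADDR set. The helper names make_listener() and rebind_demo() are invented for this example, and exactly what SO_REUSEADDR permits differs between stacks, so treat this as a sketch of BSD-style behavior rather than a guarantee for yours.

```python
import socket


def make_listener(port):
    """Create a loopback listener with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s


def rebind_demo():
    """Close a connection from the server side, then re-bind at once."""
    listener = make_listener(0)       # port 0: let the stack pick a free port
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    server_side, _ = listener.accept()

    server_side.close()               # server closes first (active close)
    client.recv(1)                    # returns b"" once the server's FIN arrives
    client.close()                    # client's FIN completes the shutdown
    listener.close()

    # The old connection may still be lingering, but the option set in
    # make_listener() lets this bind() go ahead on BSD-style stacks.
    again = make_listener(port)
    again.close()
    return port


print("re-bound to port", rebind_demo())
```

Note that the option is set on every listener, including the first one. Setting it only on the second socket is not enough on some stacks.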
Tools for TCPers
Below is a small batch file I find helpful in dealing with TCP state
issues, which I call "showwait." Basically, it shows you the current
WAIT states every second until you hit Ctrl-C. I have a similar script
on my Unix machines as well.
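@echo off
rem showwait: list sockets in any *WAIT* state once a second until Ctrl-C.
:loop
netstat -na | grep WAIT
rem "delay" is a 4DOS built-in; see the note below for other shells.
delay 1
goto loop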
This script depends on a 4DOS feature called "delay". If you don't
use this shell, get an implementation of the "sleep" command, which does
the same thing. The Cygwin toolset, mentioned above, includes one. (Are
you maybe getting the impression that I'm a closet Unixhead? Oh,
noooo.... B-> )
There's one problem with this tool: it only catches problems with
"WAIT" in their name. Less common states like LAST_ACK and SYN_RCVD won't
be seen by this script. SYN_RCVD in particular signals serious problems if
it stays around for a prolonged amount of time, because it indicates that
a remote machine sent your machine a SYN packet, your machine replied
with a SYN/ACK, and the remote machine has failed to ACK your SYN/ACK. Since this exchange
typically only takes from a few tens to a few hundreds of milliseconds,
a persistent SYN_RCVD indicates a badly-written network stack, or a very
"crashy" computer. If you see many of these states at once, it may mean
you're under a "SYN attack", one of several "Denial of Service" attacks
that are going around these days. At that point, it's time to break out
the network sniffer and start some detective work.
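One way around the script's blind spot is to count every state rather than grep for one word. The Python sketch below does that for text in the four-column Microsoft layout described earlier. The sample text is made up for illustration, count_tcp_states() is a hypothetical name, and the exact spelling of each state depends on whose netstat you run.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in four-column netstat output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: protocol, local address:port, remote address:port, state.
        if len(fields) >= 4 and fields[0] == "TCP":
            counts[fields[3]] += 1
    return counts


# Made-up sample text, not captured from a real machine.
SAMPLE = """\
  TCP    10.0.0.5:80      10.0.0.9:1044    ESTABLISHED
  TCP    10.0.0.5:80      10.0.0.12:1188   SYN_RCVD
  TCP    10.0.0.5:80      10.0.0.13:2301   SYN_RCVD
  TCP    10.0.0.5:1029    10.0.0.7:25      LAST_ACK
  UDP    10.0.0.5:137     *:*
"""

for state, count in sorted(count_tcp_states(SAMPLE).items()):
    print(state, count)
```

Running this against real output once a second, the way showwait does, would show a growing SYN_RCVD count as clearly as a growing TIME_WAIT count.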
Conclusion
The techniques and information in this article reflect the basic
mental tools that your organization needs to develop, even if it's just
appointing a single "networking guru" who will master this material,
and become a resource for the other developers in the company. This
knowledge is very widely useful; for example, it can make reading sniffer dumps less painful and
more productive. Also, these techniques can reasonably be applied by
technicians working with knowledgeable users over the phone to gather
information about failures in your program that otherwise would get
logged as random failures.
I hope you have learned something about TCP/IP debugging from this
article. If you can think of anything else that would fit within the
scope of this article, propose an extension and I'll seriously consider
adding it.
Happy hacking!
Copyright © 1998-2000 by Warren Young. All rights
reserved.