Application Troubleshooting with Packet Captures

The organization I work for uses a third-party, industry-specific application that controls data acquisition from external hardware. Recently it began to malfunction at random on multiple servers: it would hang at the beginning of a new batch of work and become unresponsive. The only fix was to restart the application service.

Previous attempts at troubleshooting confirmed the hardware was fine. Various other software configurations were tried with no success. Logging from the application was basically non-existent, there were no clues in the standard Windows event logs, and the software vendor was unhelpful at best. After several days, “the network” was increasingly being blamed as the problem. I doubted that, based on the circumstances. When I was asked to look at the issue, my first inclination was to run a packet capture on the machine, primarily to rule out the network, but also to see if I could find any other potential sources of the problem.

I’ve worked with Wireshark before, but having just read about the release of the newest Microsoft Network Monitor (version 3.2), I wanted to give that a try. One of the biggest features of this release is the ability to sort network traffic by process. After a quick installation, I started the packet capture and ran it overnight, and fortunately the application hung while the capture was running. Analyzing the packet capture not only helped me determine that “the network” wasn’t the problem, but also helped me pinpoint the exact cause.

Looking through the capture file at the time of the hang, I was able to see that a process on another server had a data file open via the SMB protocol for about 3 minutes. This data file was needed by the problem application, and coincidentally enough, the application attempted to open the file during that 3-minute window (which was later verified on another run with the Sysinternals FileMon utility). It seems the application wasn’t equipped with the logic to either retry the file operation or display any sort of error message, and it simply became unresponsive when it wasn’t able to open the file. The process on the second server was one that periodically took point-in-time snapshots of data files for archival purposes (basically a backup program), and it required having the file open. Being able to sort traffic by process helped me find this much faster than wading through all of the traffic.
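To illustrate the kind of logic that seemed to be missing, here is a minimal Python sketch of retrying a file open and failing loudly if it never succeeds. This is not the vendor's code, and I don't know what language the application is written in. The function name `open_with_retry` and its parameters (`attempts`, `delay`, `opener`, `sleep`) are hypothetical names chosen for the example, and it assumes a locked file surfaces as an `OSError` (on Windows a sharing violation typically shows up as `PermissionError`, which is a subclass).

```python
import time


def open_with_retry(path, mode="rb", attempts=5, delay=1.0,
                    opener=open, sleep=time.sleep):
    """Open `path`, retrying when the open fails with an OSError.

    Hypothetical helper for illustration only. `opener` and `sleep` are
    injectable so the behaviour can be exercised without a real locked file.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, mode)
        except OSError as exc:
            # The file may be held open by another process; remember the
            # error and wait before trying again.
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    # Give up with a clear error instead of hanging silently.
    raise TimeoutError(
        f"could not open {path!r} after {attempts} attempts"
    ) from last_error
```

With something like this, a file that is locked for a few minutes would result in either a delayed start or an explicit error that can be logged, rather than an unresponsive service.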

Using the information gathered from the packet capture, we were able to change our workflow temporarily to avoid the problem until we could work with the vendor on a permanent solution. Packet capturing won’t always be the answer when troubleshooting those seemingly unsolvable problems, but it is invaluable when needed. Every problem has a logical answer, though it may not always seem like it. With well-rounded knowledge and a good tool set, you’ll be well equipped to research and troubleshoot a wide variety of problems.

Links:

Microsoft Network Monitor 3.2
