| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Nothing ever works exactly as planned and various techniques and tools as well as an understanding of commonly encountered problems are valuable aids to troubleshooting your web applications.
Resin provides a number of diagnostic messages using the JDK logging facility. Full debug logging is enabled with an empty name (meaning match all names) and a level of 'all'. Since this will produce many messages, it is useful to put the messages in a a seperate file. Full debug logging of the server and all applicationsThe following configuration creates a debug log for each day, and when a problem occurs the debug log is used to help find out what the problem is. Since the <log> entry is in , the log will capture messages from the server itself and all applications. The resulting log messages are found in the file .
More specific areas can be targetted with appropriate values for name. The documentation for <log> provides a table of useful values for name. Full debug logging of one web applicationOften you can narrow down the logging requirements to a single web application. The log entry can be placed in and only log messages from that web application will go to the log file.The following configuration creates a debug log for each day in .
If an application seems stuck, or is running out of resources, a thread dump will reveal the state of the server. Java's thread dumps are a vital tool for server debugging. Because servlets are intrinsically multithreaded, it is very possible to create deadlocks without realizing it, or to have runaway threads that consume resources and cause OutOfMemory exceptions. That's especially true when you start adding third-party software like databases, EJB, and Corba ORBs. Thread dump using JDK 5 toolsand are useful tools included in JDK 5, providing a quick command line method for obtaining stack traces of all current threads.
Thread dump by sending a signalOn Windows, ctrl-break may produce a thread dump. On Unix, "kill -QUIT" will produce a thread dump. On the Linux IBM JDKs, you'll need to signal one of the child processes. On the Linux Sun JDK, you'll need to kill the grandparent process of most of the threads:
The Solaris 1.3 JDK has a single process, so it's easy. The Solaris 1.2 JDK has an extra quirk. You need to run Resin in the foreground (i.e. not using "start".) Then you'll need to answer some questions on the console. Thread dump if signalling doesn't workYou get a thread dump without signalling the process by starting the JVM with some extra arguments to allow a debugger to attach. You can then attach with the debugger at any time to get a thread dump. This technique works on all operating systems. Here are some step by step instructions:
Unix users (and Cygwin users on Windows) will recognize the opportunity to make a script:
Although no rigorous benchmarking has been done, there appears to be little overhead or performance penalties involved in having the JVM start with the options that allow a debugger to attach. Understanding the thread dumpIn any case, you'll eventually get a trace that looks something like the following (each JDK is slightly different):
Each thread is named. Here are some of the common names:
There should be one tcp-accept-xxx thread for each <http> and <srun> that Resin's listening for. The tcp-accept-xxx thread should almost always be in socketAccept. There should be several tcpConnection-xxx-n threads. Each of these is the servlet thread. On a busy server, these can appear anywhere in your code. If several appear in one location, you've likely got some sort of deadlock or at least a slow lock. Idle threads are either in tcpAccept or httpRequest or runnerRequest (for keepalive threads.) For deadlocks, you'll want to look at the "waiting on monitor" threads and any case where lots of threads are stuck at the same location.
Most memory problems are due to memory leaks in the application program. For example, a cache or a vector that fills up with stale data, or a singleton or static variable which doesn't properly detect a web-app reload. Some more exotic memory issues relate to running out of heap memory or virtual memory when using a large number of threads (> 256). The steps to track down a memory problem are:
-verbosegc-verbosegc is a debugging output from the JVM. It's very convenient for checking basic memory usage and garbage collection time. It's a good idea to use -verbosegc for any production system. When starting Resin, you'll use -J-verbosegc. The specific output depends on the JVM, and looks something like the following:
The "(9768K)" is the most important number, showing that the maximum allocated heap is about 10 meg. The other numbers show the actual heap used before and after garbage collection. Check memory usage with a heap dumpIf an application gobbles up memory until it throws an OutOfMemory exception, or seems to be spending an excessive amount of time doing garbage collection, a heap dump can help track down the source of the problem. What you really need to do is get a good CPU and heap profile. The JDK has a simple one (but not very user-friendly), so you don't absolutely need to buy a profiler. An alternative for simple heap monitoring (garbage collection issues), are the jvmstat tools . It's a standard java argument, so "java -Xrunhprof:help" will tell you how to start it. For example, you could start Resin (unix or windows) as
(On Unix, Resin's startup script has a -cpuprof-ascii argument that sets it for you automatically.) Run a load against it for a decent amount of time (you can even use something like Apache's "ab" for 10 minutes), and then stop the server nicely. i.e. you can't just kill it with ctrl-C, but you need to make it do a clean exit. That will dump a java.hprof.txt. Skip to the bottom to get the trace. Understanding java.hprof.txt heapAssuming you're being cheap, and using the JDK's heap profiling instead of buying a profiler, you'll need a bit of help interpreting it. The following is an example of a heap dump running Resin. Here's what you might see skipping down to the "SITES BEGIN" section. (I did mention this is the cheap way of doing things.) As with most profiling, you only want to pay attention to the top 20. Everything else is in the noise. Ignore it.
There are two things to look for. First, if any one class starts chewing up a large number in the "live objs" column, you'll need to take a look. That might be a memory leak. Second, if some class has a huge number in the "alloc'ed objs" column, you may be wasting lots of garbage collection time that could be easily solved with some caching. The [C in the class name means a character array. To see what that really means, you'll need to look at the stack trace (3271):
That's part of Resin's VFS code. Maybe in the future it'll make sense to try to reduce that. You can get a longer trace using the "-prof-depth 10" argument to Resin. (Or the corresponding depth in -Xrunhprof.) That will often give more information. Understanding java.hprof.txt CPUThe CPU profiling is a little easier to read. On some JDK's you'll need to run it with the JIT disabled.
You should only pay attention to the top 20 or so. You'll probably need to ignore a few of the top 10, because they're just waiting for a user response. The SocketInputStream.socketRead is an example. You can use the trace number to get a call trace:
Monitor garbage collection(thanks to Rob Lockstone for his comments) Run Resin with the additional argument where "gc.log" is the name of the log file which will be created in the root Resin directory, e.g. /resin/gc.log.Once the server has run under load for a while, or when problems start appearing, look in the gc.log file and search for "Full". They will happen fairly infrequently at first, and then towards the end of the file, they will become progressively more frequent until they are happening almost continuously (with little or no change in the size of the heap). Note that the "timestamp" in the first column is the number of seconds since the process started. Garbage collection logging will have a small imapct on performance but is invaluable in diagnosing garbe collection related problems. Possible reasons for excessive garbage collection are memory leaks and inadequate heap memory Increase heap memorySee Performance : JVM Tuning : Memory .
Conflicts often arise with older or incompatible versions of classes. Starting Resin with an empty CLASSPATH environmental variable is a first step to eliminating these classes.
If you have manually placed any jars in $RESIN_HOME/lib or you jdk directories, those are possible conflicts as well. If a new version of Resin s installed overtop of an older version of Resin (i.e. in the same directory), some old jars might be left. It is best to give each version it's own directory. If the RESIN_HOME environment variable is not set, Resin may be picking up an older version. You can start httpd with the -verbose option to see the CLASSPATH that is being used when Resin is started.
To monitor just the HTTP headers, the following debug logging can be enabled in $RESIN_HOME/resin.conf:
Intercepting and monitoring the raw data passed between a web browser and Resin can provide valuable insight. This raw data reveals the headers and content that are submitted by the browser, and the headers and content that are returned by Resin. The Apache Axis jar includes a utility "tcpmon" that can be used to intercept and monitor the communication between a browser and Resin. Using tcpmon, you specify a "listen port" and a "target host" and a "target port". For example, if you usually have Resin running on port 8080, you can start tcpmon with a "listen port" os 9090, and a target host of localhost and a target port of 8080. Now you will use a url http://localhost:9090 in your browser. This causes your browser to use tcpmon. tcpmon will log the request, pass it through to Resin (on port 8080), and log the response before returning it to the browser.
Resin's default is to use the `internal' compiler because it is the most likely to be available. Sometimes the internal compiler will start to cause errors, causing exceptions or simply hanging and taking up a thread. The solution is to change the compiler to `javac' in resin.conf:
Jikes is another option for the compiler, as discussed in the documentation for <javac> .
Each thread is allocated a stack, if the size of the stack is too large, then you may run out of memory when a higher number of threads is reached. Techniques to adress this are discussed in JVM Tuning .
The -a indicates that both listening (waiting for a connection) and non-listening (active) sockets should be displayed. The -n causes the display of port numbers (i.e. 80) instead of port names (i.e. http). The -p causes the display of the process that is using the socket. Since the windows netstat is different than the Unix one, -p is not used on Windows. Connection stateThe connection state is probably the most important information. The man page for netstat has a full description of each state that is displayed. "LISTEN" or "LISTENING" is an indication that there is a process waiting for connections on the socket.
"TIME_WAIT" indicates that the socket is waiting after close to handle packets still in the network. After a connection is closed, the socket is kept open by the operating system for a short period of time. Even after the close is done, there may be some stray packets on the network that are destined for the connection. The idea of TIME_WAIT is that it keeps the socket open long enough to catch those stray packets, so that they don't get erronously delivered to a new connection opened on the same socket. Port usageIf Resin is indicating that it cannot bind to a port, it may be an indication that another process is using the port. netstat may help determine the process that is using the port. Since netstat produces a lot of information, it is helpful to filter the output to limit it to the port of interest. The following example is for port 80:
|