Wednesday, February 10, 2010

Say goodbye to awful concurrency bugs -- Showcase of MulticoreSDK on Derby

In my last post, I illustrated one of the most notorious concurrency bugs, deadlock, and showed how MulticoreSDK can find deadlocks without reproducing them. The sample I gave was the classic dining philosophers problem. To verify how effective the tool is, I wanted to apply MulticoreSDK to a real-world application and find a real deadlock.

Eventually I found a real deadlock reported in Derby, an open source relational database implemented in Java. This case became one of our benchmarks for verifying the effectiveness of the MulticoreSDK deadlock detector.

I downloaded the driver program BlobDeadlock.java and the buggy Derby version, then applied MulticoreSDK to the deadlock case with the following steps:

  1. Download MulticoreSDK from its website and install it following the user manual. Suppose MulticoreSDK is extracted under {msdk-cmd}.
  2. Open the KingProperties file in the props folder and set the preference targetClasses = org/apache/derby to instrument and monitor all classes in Derby.
  3. Compile the driver program:
$ mkdir bin && javac -d bin BlobDeadlock.java
  4. Run the driver program with MulticoreSDK (no real deadlock occurs in this execution):
$ java -Dcontest.preferences={msdk-cmd}/prop/KingProperties -javaagent:{msdk-cmd}/lib/ConTest.jar -cp .:bin:derby.jar BlobDeadlock
  5. Run the post analysis against the trace file:
$ java -ea -cp ConTest.jar com.ibm.contest.lock_dis_checker.Main .


Surprisingly, the post analysis found no deadlock cycle. I first checked the trace file generated in step 4 and threaddump.txt, which indicates where the deadlock happens. According to threaddump.txt, one of the two threads involved in the deadlock is waiting to acquire a lock at java.util.Observable.deleteObserver(Observable.java:78). I realized that in step 2 we did not ask for Java core classes such as java.util.Observable to be instrumented, so the locks taken in the Observable class were not recorded in the trace file. That is probably why MulticoreSDK did not report the deadlock in Derby.

I took the following additional steps to instrument the Observable class:

  1. Open the KingProperties file in the props folder and set the preference targetClasses = java/util/Observable.
  2. Instrument the Observable class offline, since the JVM does not give you a chance to instrument preloaded Java core classes at runtime:
$ java -cp {msdk-cmd}/lib/ConTest.jar:{$JAVA_HOME/jre/lib} com.ibm.contest.instrumentation.Instrument java.util.jar
After that, apply MulticoreSDK to the deadlock case again, starting from step 2 of the first procedure above. The deadlock analysis result is shown below.
Listing 1. Potential Deadlocks Results from Derby
Deadlock Cycle 1: [666, 315]
#315->#666 #666->#315
edge #315->#666 consists of:
Thread [java.lang.Thread@1909682643]: lock taken at [java/util/Observable.java:78 deleteObserver(java.util.Observer) org.apache.derby.impl.store.raw.data.BaseContainerHandle@840] inside a different lock taken at [org/apache/derby/impl/store/raw/data/BasePage.java:1720 releaseExclusive() org.apache.derby.impl.store.raw.data.StoredPage@487]
edge #666->#315 consists of:
Thread [java.lang.Thread@1915449899]: lock taken at [org/apache/derby/impl/store/raw/data/BasePage.java:1334 isLatched() org.apache.derby.impl.store.raw.data.StoredPage@487] inside a different lock taken at [org/apache/derby/impl/store/raw/data/BaseContainerHandle.java:408 close() org.apache.derby.impl.store.raw.data.BaseContainerHandle@840]
===================================================

MulticoreSDK now reports the same deadlock as the real deadlock case, even though the deadlock never surfaced in my execution :)
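
To make the two edges in Listing 1 more concrete, here is a minimal sketch of the same lock-order inversion. It is not Derby code: the two plain objects are hypothetical stand-ins for StoredPage@487 and BaseContainerHandle@840, and the method names only hint at the Derby methods named in the listing. The two threads run one after the other, so no deadlock can occur, yet both nesting orders are still observed, which is the situation the tool detects. The lambdas and method references assume Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LockInversionSketch {
    // Hypothetical stand-ins for StoredPage@487 and BaseContainerHandle@840.
    static final Object storedPage = new Object();
    static final Object containerHandle = new Object();

    // Records each nested acquisition as "outer->inner".
    static final List<String> edges = Collections.synchronizedList(new ArrayList<>());

    // Mirrors edge #315->#666: page lock first, then the handle lock
    // (in Derby the inner lock is taken inside Observable.deleteObserver).
    static void releaseExclusiveLike() {
        synchronized (storedPage) {
            synchronized (containerHandle) {
                edges.add("page->handle");
            }
        }
    }

    // Mirrors edge #666->#315: handle lock first, then the page lock.
    static void closeLike() {
        synchronized (containerHandle) {
            synchronized (storedPage) {
                edges.add("handle->page");
            }
        }
    }

    // Runs the two threads strictly one after the other, so they can never
    // block each other, and returns the nesting orders that were observed.
    static List<String> runSequentially() {
        edges.clear();
        Thread a = new Thread(LockInversionSketch::releaseExclusiveLike);
        Thread b = new Thread(LockInversionSketch::closeLike);
        try {
            a.start();
            a.join();
            b.start();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(edges);
    }

    public static void main(String[] args) {
        System.out.println(runSequentially());
    }
}
```

If the two threads were started together instead, each could grab its outer lock and then wait forever for the other's, which is the deadlock reported in the Derby bug.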

MulticoreSDK Tool Link
http://www.alphaworks.ibm.com/tech/msdk

Easily find potential deadlocks in concurrent software without reproduction

Deadlock among Java threads is a common concurrency bug that stems from an inappropriate synchronization order. A deadlock is disastrous because it leaves the program permanently unable to make forward progress. Worse, deadlocks do not manifest themselves predictably, and when they do surface it is often at the worst possible time: in production, under heavy load. Reproducing a deadlock depends on lucky timing and can be extremely difficult, so the traditional bug-fixing methodology (reproduce the error, find the root cause, fix the bug) is not feasible for concurrency bugs without an appropriate tool.

What is needed is a tool that finds potential deadlocks in a program without reproducing them. Multicore Software Development Kit (MulticoreSDK) is a toolkit of this kind.

The Java sample below simulates the dining philosophers problem and contains a potential deadlock.

Listing 1. Dining Philosophers Problem
package sample.deadlock;

public class PhiloSample extends Thread {
    static final int NUM_PHIL = 3;
    int id;
    Table t;

    PhiloSample(int id, Table t) {
        this.id = id;
        this.t = t;
    }
    public void run() {
        Fork left = t.getFork(id);
        synchronized (left) {
            Fork right = t.getFork(id + 1);
            synchronized (right) {
                t.eatFood();
            }
        }
    }

    public static void main(String args[]) throws Exception {
        Table tab = new Table();
        PhiloSample[] p = new PhiloSample[NUM_PHIL];
        for (int i = 0; i < NUM_PHIL; ++i) {
            p[i] = new PhiloSample(i, tab);
        }
        for (int i = 0; i < NUM_PHIL; ++i)
        {
            p[i].start();
        }
        for (int i = 0; i < NUM_PHIL; ++i)
            p[i].join();
    }
}


class Fork {
}

class Table {
    Fork forks[];
    static final int MAX_EAT = 12;
    int eatctr;

    Table() {
        forks = new Fork[PhiloSample.NUM_PHIL];
        for (int i = 0; i < PhiloSample.NUM_PHIL; ++i)
            forks[i] = new Fork();
        eatctr = MAX_EAT;
    }
    public Fork getFork(int id){
        return forks[id % PhiloSample.NUM_PHIL];
    }
    public synchronized void eatFood() {
        eatctr--;
    }
    public synchronized boolean isDone() {
        return eatctr == 0;
    }
}



Although the code is buggy, the deadlock does not necessarily manifest in every execution, which makes the problem harder to reveal and fix. MulticoreSDK can find potential deadlocks in a Java program regardless of whether the deadlock actually occurs, with the following steps:
  1. Download MulticoreSDK from its website and install it following the user manual. Suppose MulticoreSDK is extracted under {msdk-cmd}.
  2. Open the KingProperties file in the props folder and set the preference targetClasses = sample/deadlock (with slashes instead of dots), which means the tool will work on classes with the prefix sample.deadlock.
  3. Compile the sample:
$ mkdir bin && javac -d bin PhiloSample.java
  4. Run the sample with MulticoreSDK:
$ java -Dcontest.preferences={msdk-cmd}/prop/KingProperties -javaagent:{msdk-cmd}/lib/ConTest.jar -cp .:bin sample.deadlock.PhiloSample
  5. If the run succeeds, the com_ibm_contest folder and the trace file com_ibm_contest/lockDisciplineTraces/lockChains_xxx.trace are created.
  6. Run the post analysis against the trace file:
$ java -Dcontest.deadlockinfo=true -ea -cp ConTest.jar com.ibm.contest.lock_dis_checker.Main .
The analysis result is shown below. MulticoreSDK found 1 potential deadlock, which did not manifest itself during the execution in step 4. From the result, you can see that the deadlock involves 3 locks, #1, #2 and #3, whose details are listed in the lock group section below.

An edge #1->#2 means that some thread acquired lock #1 and then acquired lock #2 while still holding it. Here that thread is Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1599561559] ({thread instance name}@{thread instance hash code}). It first acquired lock #1, com.ibm.amino.multicoresdk.contest.Fork@1 ({lock instance name}@{lock instance hash code}), at program location com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() ({filename}:{line number} {method name}), then acquired lock #2, com.ibm.amino.multicoresdk.contest.Fork@2, at PhiloSample.java:18.

The deadlock consists of lock acquisitions from 3 threads, which obtain the locks in an order that forms a cycle (#1->#2, #2->#3, #3->#1). Such a cycle is a cause for concern.
Listing 2. Potential Deadlocks Results
Deadlock Cycle 1: [1, 2, 3]
#1->#2 #2->#3 #3->#1
edge #1->#2 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1599561559]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@2] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@1]
edge #2->#3 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1602903946]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@4] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@2]
edge #3->#1 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1606246333]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@1] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@4]
===================================================

Locations of lock operations were grouped as follows:
===================================================
group #1:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@1
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@1

group #2:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@2
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@2

group #3:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@4
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@4
===================================================
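
One way to picture what the post analysis looks for is a directed graph of lock groups, with an edge from the outer lock to the inner lock for every nested acquisition; a cycle in that graph is a potential deadlock. The sketch below is only my illustration of that idea, not the MulticoreSDK implementation: the class and method names are made up, and it ignores refinements a real checker needs (for example, requiring that the edges of a cycle come from different threads).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LockOrderGraph {
    // outer lock group -> lock groups acquired while the outer one was held
    private final Map<Integer, List<Integer>> edges = new HashMap<>();

    // Record that some thread took lock `inner` while holding lock `outer`.
    void addEdge(int outer, int inner) {
        edges.computeIfAbsent(outer, k -> new ArrayList<>()).add(inner);
        edges.computeIfAbsent(inner, k -> new ArrayList<>());
    }

    // Depth-first search: reaching a node that is still on the current
    // path means the edges form a cycle.
    boolean hasCycle() {
        Set<Integer> done = new HashSet<>();
        Set<Integer> onPath = new HashSet<>();
        for (Integer start : edges.keySet()) {
            if (visit(start, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(Integer node, Set<Integer> done, Set<Integer> onPath) {
        if (onPath.contains(node)) {
            return true;
        }
        if (done.contains(node)) {
            return false;
        }
        onPath.add(node);
        for (Integer next : edges.get(node)) {
            if (visit(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        LockOrderGraph g = new LockOrderGraph();
        g.addEdge(1, 2); // edge #1->#2 from Listing 2
        g.addEdge(2, 3); // edge #2->#3
        g.addEdge(3, 1); // edge #3->#1
        System.out.println("potential deadlock: " + g.hasCycle());
    }
}
```

Because the graph is built from acquisitions that actually happened, the cycle shows up even when the threads never blocked each other during the run.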

With the analysis result above, you can quickly locate potential deadlocks in your program without reproducing them at runtime. This makes finding deadlocks in concurrent software easier and more effective, since reproducing a deadlock manually can be extremely laborious or sometimes mission impossible :)
MulticoreSDK Tool Link

http://www.alphaworks.ibm.com/tech/msdk

Monday, January 25, 2010

jucprofiler : java.util.concurrent locks profiling


1.   Introduction

Performance analysis is an important aspect of the application development process.  It is typically done by a specialist whose main goal is to improve the code performance on a given platform.  This problem becomes even more difficult when dealing with concurrent/multi-threaded applications running on multicore platforms.   In such cases, one not only has to worry about code performance but also about the scalability of the code.  Different types of code profiling tools are used to help with the overall performance analysis process.


With the introduction of the java.util.concurrent (JUC) package in Java 5, a new type of lock was introduced into the Java language.  No tool, either inside IBM or externally, is available to profile JUC locks and provide detailed contention information like that provided by JLM for regular Java locks.  Also, use of the JUC package is becoming more and more popular as more applications are either developed or fine-tuned to run better on multicore systems.  The absence of a JUC lock profiling tool is the motivation behind the development of our lock tool.

2.   Short Overview of jucprofiler

With a juc lock, a thread “stops” execution in the following two cases:
1. When thread A tries to acquire a juc lock that has already been acquired by another thread, thread A has to “stop” its execution and wait until the lock is released or the attempt times out.
2. When thread A invokes one of the “wait” APIs of java.util.concurrent.locks.Condition, thread A “stops” its execution until another thread notifies it or the wait times out.
Let us call the time spent in the first case Contention Time and the time spent in the second case Wait Time.

jucprofiler is designed and implemented to capture both kinds of time usage.
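
To show the two cases from the application's side, here is a small sketch that measures them by hand with System.nanoTime(). It is not how jucprofiler collects its data (that is done by instrumenting juc classes, as described next); the class and method names are hypothetical, and the code assumes Java 8 or later for the lambdas.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class JucTimesSketch {
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Case 1: nanoseconds the caller spends blocked in lock() while
    // another thread holds the lock (Contention Time).
    static long measureContention(long holdMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                sleep(holdMillis);
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            held.await(); // make sure the holder owns the lock first
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        long t0 = System.nanoTime();
        lock.lock(); // blocks until the holder releases the lock
        long blocked = System.nanoTime() - t0;
        lock.unlock();
        return blocked;
    }

    // Case 2: nanoseconds the caller spends inside a Condition wait
    // until another thread signals it (Wait Time).
    static long measureWait(long delayMillis) {
        ReentrantLock lock = new ReentrantLock();
        Condition ready = lock.newCondition();
        boolean[] flag = {false}; // guarded by lock
        Thread signaller = new Thread(() -> {
            sleep(delayMillis);
            lock.lock();
            try {
                flag[0] = true;
                ready.signal();
            } finally {
                lock.unlock();
            }
        });
        lock.lock();
        try {
            signaller.start();
            long t0 = System.nanoTime();
            while (!flag[0]) {
                ready.awaitUninterruptibly(); // releases the lock while waiting
            }
            return System.nanoTime() - t0;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("contention ns: " + measureContention(100));
        System.out.println("wait ns:       " + measureWait(100));
    }
}
```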

To capture juc lock runtime data, several juc classes are instrumented offline and replace the original classes in the JRE.  Before jucprofiler is used for the first time, the user has to run a command to generate PreInstrument.jar.  This step only needs to be done once as long as the JRE does not change.  If users switch to another JRE, they have to remove PreInstrument.jar and re-run the command to generate it again.

2.1.1.    Contention Time

We record the allocation of java.util.concurrent.locks.AbstractQueuedSynchronizer and java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject, and assign a unique id to each.  To measure the time spent on a lock, we capture the time spent in park(blocker) and parkNanos(blocker, nanos) of class java.util.concurrent.locks.LockSupport.  The time counts as Contention Time when these two methods are invoked from the following call sites:

Class: java.util.concurrent.locks.LockSupport

Method                                  Call site (in class AbstractQueuedSynchronizer)
park(Object)                            parkAndCheckInterrupt()
parkNanos(Object blocker, long nanos)   doAcquireNanos(int arg, long nanosTimeout)
                                        doAcquireSharedNanos(int arg, long nanosTimeout)

2.1.2.    Wait Time



Class: java.util.concurrent.locks.LockSupport

Method                                  Call site
park(Object)                            Methods other than parkAndCheckInterrupt() in class AbstractQueuedSynchronizer
parkNanos(Object blocker, long nanos)   Methods other than doAcquireNanos(int arg, long nanosTimeout) and
                                        doAcquireSharedNanos(int arg, long nanosTimeout) in class AbstractQueuedSynchronizer
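
Read together, the two tables amount to a simple rule keyed on which AbstractQueuedSynchronizer method called park or parkNanos. The sketch below writes that rule down as a function. The class, enum, and parameter names are my own, and how jucprofiler actually identifies the caller is not shown here, so treat this as an illustration of the tables only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ParkClassifierSketch {
    enum Kind { CONTENTION, WAIT }

    // Callers of parkNanos that count as lock acquisition (Contention Time table).
    private static final Set<String> NANOS_ACQUIRE_CALLERS =
            new HashSet<>(Arrays.asList("doAcquireNanos", "doAcquireSharedNanos"));

    // parkMethod:   "park" or "parkNanos" (methods of LockSupport)
    // callerMethod: the AbstractQueuedSynchronizer method that invoked it
    static Kind classify(String parkMethod, String callerMethod) {
        if (parkMethod.equals("park") && callerMethod.equals("parkAndCheckInterrupt")) {
            return Kind.CONTENTION;
        }
        if (parkMethod.equals("parkNanos") && NANOS_ACQUIRE_CALLERS.contains(callerMethod)) {
            return Kind.CONTENTION;
        }
        // Any other call site falls under the Wait Time table.
        return Kind.WAIT;
    }

    public static void main(String[] args) {
        System.out.println(classify("park", "parkAndCheckInterrupt"));
        System.out.println(classify("parkNanos", "doAcquireSharedNanos"));
        System.out.println(classify("park", "await"));
    }
}
```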



3.   How to use jucprofiler

3.1. Run juc profiler

jucprofiler can be used with any Java program on JDK 6.  Supposing jucprofiler is installed in directory $JUCP, run your program with the following parameters:

$ java -Xbootclasspath/p:$JUCP/BCIRuntime.jar:$JUCP/PreInstrument.jar -javaagent:$JUCP/BCIAgent.jar=logger=trace:callStackDepth=10:allocationStackDepth=0:libPath=$JUCP:traceJUC=on -cp .:derby.jar JavaDBDemo

After your program finishes running under jucprofiler, a trace file named "BCIAgent.***.bci.trace" is generated, where "***" is a unique time stamp for this execution.



3.2. Trace post process


Run the command shown below to get the juc profiling result.

$ java -Xmx1000m -jar $JUCP/BCITraceReader.jar {tracefile} {resultOutputFile}

Here, {tracefile} is the full path of a trace file, such as BCIAgent.***.bci.trace, or of a directory that contains trace files. {resultOutputFile} is an optional argument naming the file in which to store the analysis results; if it is omitted, the results are printed to the console.

Note: The post analysis of a trace file can need a lot of memory, so it is better to increase the process heap size via the -Xmx Java option. In our experiment, analyzing a 160M trace could consume 800M of memory.

3.3. Understand profiling result

As shown in the report below, the plain text output includes several kinds of information: lock name, lock contention count and time, lock hold time and count, lock allocation stack trace, and the duration and stack trace of each lock contention. The result can help the user find juc lock contention bottlenecks in a program.

Before the “LEGEND” section, the report first summarizes all juc lock contentions in the program, sorted in descending order by contention count and then by contention time. Each summary row is one of two types: “AQS” for an individual juc lock and “CHM” for a ConcurrentHashMap. Since a ConcurrentHashMap is divided into several Segments for element storage and each Segment is protected by a different juc lock, a ConcurrentHashMap can be viewed, from the lock perspective, as a composition of juc locks. For example, “CHM@8” below has contention count 276 and contention time 39457000, which means that the total contention count of all segment locks in “CHM@8” is 276 and their total contention time is 39457000 nanoseconds. This grouping helps programmers identify the ConcurrentHashMap object in which the most serious juc lock contention occurs. By contrast, the individual juc lock “AQS@1790” does not belong to any ConcurrentHashMap object; it is used explicitly in the program. Note that because lock hold events are not enabled in the example trace, 0 appears in the HOLD-COUNT and HOLD-TIME columns.

After the “LEGEND” section, the report gives the details of each juc lock contention. In the report below, for the ConcurrentHashMap “CHM@8”, lock contention happens in two segment locks, “Lock [AQS@135]” and “Lock [AQS@146]”. “Lock [AQS@135]” contends in one place, and its entry lists the contention count, the contention time, and the full stack trace of the contention. The same holds for “Lock [AQS@146]”. These details help programmers locate the lock contention in the program and see clearly which segments of a ConcurrentHashMap contend most.


Multicore Software Development Tookit Version_2.1

j.u.c Lock Profiler Report

                NAME    CONTD-COUNT          CONTD-TIME     HOLD-COUNT           HOLD-TIME
               CHM@8            276            39457000              0                   0
            AQS@1790             36             4029000              0                   0
             AQS@131             17              630000              0                   0
=================================================================================================
 LEGEND:
                NAME : Name of juc lock(AQS) or ConcurrentHashMap(CHM), format: @
         CONTD-COUNT : Total count of lock contention
          CONTD-TIME : Total time of lock contention in nanosecond
          HOLD-COUNT : Total count of lock hold
           HOLD-TIME : Total time of lock hold in nanosecond
==================================================================================================

ConcurrentHashMap [CHM@8]:

-----------------------------------------------------------------------------------------------------------
Lock [AQS@135]:

-----------------------------------------------------------------------------------------------------------
Lock Contention 1
CONTD-COUNT: 25
CONTD-TIME: 10827000
Call Stack:
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:758)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:789)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1125)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:197)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:273)
java.util.concurrent.ConcurrentHashMap$Segment.remove(ConcurrentHashMap.java:530)
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:934)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlock(ConcurrentLockSet.java:740)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlockReference(ConcurrentLockSet.java:784)
org.apache.derby.impl.services.locks.LockSpace.unlockReference(LockSpace.java:275)

End of Lock [AQS@135]:
**************************************************************************************************************




Lock [AQS@146]:

-----------------------------------------------------------------------------------------------------------
Lock Contention 1
CONTD-COUNT: 22
CONTD-TIME: 2009000
Call Stack:
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:758)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:789)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1125)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:197)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:273)
java.util.concurrent.ConcurrentHashMap$Segment.remove(ConcurrentHashMap.java:530)
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:934)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlock(ConcurrentLockSet.java:740)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlockReference(ConcurrentLockSet.java:784)
org.apache.derby.impl.services.locks.LockSpace.unlockReference(LockSpace.java:275)
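
The grouping and ordering of the summary rows can be sketched as follows. This is only an illustration of the idea, not the BCITraceReader code; the class names are hypothetical. The input uses just the numbers visible in the excerpt above, so the CHM@8 row adds up only the two segment locks shown (AQS@135 and AQS@146) rather than the full 276 from the real summary. The code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContentionSummarySketch {
    // One summary row: an individual lock (AQS@n) or a whole map (CHM@n).
    static final class Row {
        final String name;
        long count;
        long nanos;
        Row(String name) { this.name = name; }
    }

    // One per-lock measurement, tagged with the summary row it belongs to.
    static final class Sample {
        final String owner;
        final long count;
        final long nanos;
        Sample(String owner, long count, long nanos) {
            this.owner = owner;
            this.count = count;
            this.nanos = nanos;
        }
    }

    // Sum the samples per owner, then sort by count and time, both descending.
    static List<Row> summarize(List<Sample> samples) {
        Map<String, Row> rows = new LinkedHashMap<>();
        for (Sample s : samples) {
            Row row = rows.computeIfAbsent(s.owner, Row::new);
            row.count += s.count;
            row.nanos += s.nanos;
        }
        List<Row> sorted = new ArrayList<>(rows.values());
        sorted.sort(Comparator.comparingLong((Row r) -> r.count)
                .thenComparingLong(r -> r.nanos)
                .reversed());
        return sorted;
    }

    // Values copied from the report excerpt above.
    static List<Sample> excerpt() {
        return Arrays.asList(
                new Sample("AQS@131", 17, 630000L),
                new Sample("CHM@8", 25, 10827000L),  // segment lock AQS@135
                new Sample("CHM@8", 22, 2009000L),   // segment lock AQS@146
                new Sample("AQS@1790", 36, 4029000L));
    }

    public static void main(String[] args) {
        for (Row r : summarize(excerpt())) {
            System.out.println(r.name + " " + r.count + " " + r.nanos);
        }
    }
}
```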




3.4. Open trace file in Visual Analyzer

Several Eclipse views, together called Visual Analyzer, show a jucprofiler trace file as tables and figures.  Currently there are two views for jucprofiler: the “J.U.C statistics” view and the “J.U.C synchronization” view.

The “J.U.C statistics” view is shown below.  The two right-most columns are “Contention Times” and “Contention Counts”.  The “Allocation Stack” column shows the allocation call site of each JUC lock.






The “J.U.C synchronization” view is shown below.  The first lane is Time, indicating when the lock contention occurs.  The second lane is Thread, indicating which thread encounters the lock contention.  The third lane is Monitor, indicating which JUC lock is contended.  The last lane is Method, indicating where the lock contention happens.



3.5. Online control

While running, jucprofiler creates a ControlServer listening on port 2009. Users can use ControlClient to connect to that port and control the behavior of jucprofiler; for example, tracing can be turned on and off on the fly:

$ java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient HOST -m [b|i] -b START -e END

HOST: the host name that ControlClient connects to; the default is localhost.

-m [b|i]: the mode of ControlClient. b means batch mode and i means interactive mode. The default is interactive mode.

-b START: in batch mode, START is the time in seconds to wait before profiling starts.

-e END: END is the duration in seconds of the profiling.

3.5.1.    Interactive mode

A simple shell is provided, in which the user can type the commands juc.on and juc.off to turn jucprofiler on and off. For example, when run without arguments as below, ControlClient connects to localhost and opens a shell to control jucprofiler.


$ java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient
jucprofiler control> juc.on
juc.on
jucprofiler control> start
start
jucprofiler control> stop
stop
jucprofiler control> juc.off
juc.off

 


3.5.2.    Batch mode


$ java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient localhost -m b -b 2 -e 10
Start tracing in 2 seconds
Start tracing
Stop tracing in 10 seconds
Stop tracing
quit
 

Commands can also be executed in batch mode, as in the session above. As another example, java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient mtrat-test.dyndns.org -m b -b 10 -e 10 means that ControlClient connects to machine mtrat-test.dyndns.org, starts the profiler after 10 seconds, and profiles for 10 seconds.
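
The timing behaviour of batch mode (wait START, start tracing, wait END, stop tracing) can be sketched with a scheduled executor. This is not the ControlClient source, and it does not talk to the ControlServer: the `send` callback is a hypothetical placeholder for whatever ControlClient really sends over the connection, and the command strings simply reuse the start and stop commands seen in the interactive shell. The code assumes Java 8 or later.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BatchControlSketch {
    // Sends "start" after startDelay, then "stop" after a further duration,
    // and returns once both commands have been sent.
    static void runBatch(long startDelay, long duration, TimeUnit unit,
                         Consumer<String> send) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // get() blocks until the scheduled command has run.
            timer.schedule(() -> send.accept("start"), startDelay, unit).get();
            timer.schedule(() -> send.accept("stop"), duration, unit).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        // Mirrors "-b 2 -e 10" from the session above; here the commands
        // are only printed instead of being sent to a server.
        runBatch(2, 10, TimeUnit.SECONDS, System.out::println);
    }
}
```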

4.   Conclusions

With the popularity of multicore systems, more and more concurrent/multi-threaded Java applications will be developed, and we need better tools for profiling them.  The jucprofiler described in this article fills one of the key gaps in Java profiling tools.
Both jucprofiler and Visual Analyzer can be downloaded from http://www.alphaworks.ibm.com/tech/msdk