Wednesday, February 10, 2010

Easily find potential deadlocks in concurrent software without reproduction

Deadlock among Java threads is a common concurrency bug stemming from inappropriate synchronization order. It's a disaster that deadlock can make program permanently unable to make forward progress. What's worse, deadlocks do not manifest themselves predictably and when they do surface, it is often at the worst possible time in production, under heavy load. It means that reproducing deadlock which depends on some lucky timing, can be extremely difficult. So traditional bug fix methodology, reproduce error->find root cause->fix bug, is not feasible to fix concurrency bugs without appropriate tool.

A sophisticated method/tool to find potential deadlocks in program without deadlock reproduction is urged. Multicore Software Development Kit (MulticoreSDK) is the toolkit of this kind.

The Java sample below simulates dining philosophers problem which contains potential deadlocks.

Listing 1. Dining Philosophers Problem
package sample.deadlock;

public class PhiloSample extends Thread {
static final int NUM_PHIL = 3;
int id;
Table t;

PhiloSample(int id, Table t) {
this.id = id;
this.t = t;
}
public void run() {
Fork left = t.getFork(id);
synchronized (left) {
Fork right = t.getFork(id + 1);
synchronized (right) {
t.eatFood();
}
}
}

public static void main(String args[]) throws Exception {
Table tab = new Table();
PhiloSample[] p = new PhiloSample[NUM_PHIL];
for (int i = 0; i < NUM_PHIL; ++i) {
p[i] = new PhiloSample(i, tab);
}
for (int i = 0; i < NUM_PHIL; ++i)
{
p[i].start();
}
for (int i = 0; i < NUM_PHIL; ++i)
p[i].join();
}
}


class Fork {
}

class Table {
Fork forks[];
static final int MAX_EAT = 12;
int eatctr;

Table() {
forks = new Fork[PhiloSample.NUM_PHIL];
for (int i = 0; i < PhiloSample.NUM_PHIL; ++i)
forks[i] = new Fork();
eatctr = MAX_EAT;
}
public Fork getFork(int id){
return forks[id % PhiloSample.NUM_PHIL];
}
public synchronized void eatFood() {
eatctr--;
}
public synchronized boolean isDone() {
return eatctr == 0;
}
}



Although the code is buggy, the deadlock is not necessarily to manifest all the time in execution, which increase the difficulty to reveal and fix deadlock problem. MulticoreSDK can be easily used to find potential deadlock in Java program regardless whether the deadlock occurs or not, with following steps:
  1. Download MulticoreSDK from its website, and install it following the user manual. Suppose MulticoreSDK is extracted under {msdk-cmd}.
  2. Open KingProperties file in props folder, set preference targetClasses = sample/deadlock (with slashes instead of dots), which means the tool will work on classes with prefix sample.deadlock.



  3. Compile the sample,
$ mkdir bin && javac -d bin PhiloSample.java



  1. Run the sample with MulticoreSDK,
$ java -Dcontest.preferences={msdk-cmd}/prop/KingProperties -javaagent:{msdk-cmd}/lib/ConTest.jar -cp .:bin sample.deadlock.PhiloSample



  1. If succeed, com_ibm_contest folder and trace file com_ibm_contest/lockDisciplineTraces/lockChains_xxx.trace are created.



  2. Run post analysis against trace file,
$ java -Dcontest.deadlockinfo=true -ea -cp ConTest.jar com.ibm.contest.lock_dis_checker.Main .
The analysis result is shown below, MulticoreSDK found 1 potential deadlock which doesn't manifest itself in execution in step 4. From the result, you can see that the deadlock is contributed by 3 locks #1, #2, #3 whose details are listed in lock group section below. edge #1->#2 represents certain threads acquired lock #1 followed by acquiring lock #2, and the thread is Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1599561559]({thread instance name}@{thread instance hash code}). It first acquired lock #1 com.ibm.amino.multicoresdk.contest.Fork@1 ({lock instance name}@{ lock instance hash code}) at program location com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run()({filename}:{line number} {method name}), then acquired lock #2 com.ibm.amino.multicoresdk.contest.Fork@2. The deadlock consists of lock acquisitions from 3 threads and they obtain locks in such tricky order that it forms a deadlock cycle, which should cause concerns.
Listing 2. Potential Deadlocks Results
Deadlock Cycle 1: [1, 2, 3]
#1->#2 #2->#3 #3->#1
edge #1->#2 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1599561559]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@2] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@1]
edge #2->#3 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1602903946]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@4] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@2]
edge #3->#1 consists of:
Thread [com.ibm.amino.multicoresdk.contest.PhiloSample@1606246333]: lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@1] inside a different lock taken at [com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@4]
===================================================

Locations of lock operations were grouped as follows:
===================================================
group #1:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@1
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@1

group #2:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@2
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@2

group #3:lock [com.ibm.amino.multicoresdk.contest.Fork]
com/ibm/amino/multicoresdk/contest/PhiloSample.java:16 run() com.ibm.amino.multicoresdk.contest.Fork@4
com/ibm/amino/multicoresdk/contest/PhiloSample.java:18 run() com.ibm.amino.multicoresdk.contest.Fork@4
===================================================

According to analysis result above, you can quickly locate potential deadlocks in your program without reproducing the deadlock in runtime. It makes finding deadlock in concurrent software more easily and effectively since reproducing the deadlock manually can be extremely physical-sensitive work or sometimes mission impossible :)
MulticoreSDK Tool Link

http://www.alphaworks.ibm.com/tech/msdk

No comments: