Monday, May 30, 2016

Benchmarking Java Locks with Counters

These days I am analyzing some Java Flight Recordings from taken from WSO2 API Manager performance tests and I found out that main processing threads were in "BLOCKED" state in some situations.

The threads were mainly blocked due to "synchronized" methods in Java. Synchronizing the methods in a critical section of request processing causes bottlenecks and it has an impact on the throughput and overall latency.

Then I was thinking whether we could avoid synchronizing the whole method. The main problem with synchronized is that only one thread can run that critical section. When it comes to consumer/producer scenarios, we may need to give read access to data in some threads and write access to a thread to edit data exclusively. Java provides ReadWriteLock for these kinds of scenarios.

Java 8 provides another kind of lock named StampedLock. The StampedLock provides an alternative way to the standard ReadWriteLock and it also supports optimistic reads. I'm not going to compare the features and functionalities of the each lock type in this blog post. You may read the StampedLock Idioms by Dr. Heinz M. Kabutz.

I'm more interested in finding out which lock is faster when it is accessed by multiple threads. Let's write a benchmark!


The code for benchmarks


There is an article on "Java 8 StampedLocks vs. ReadWriteLocks and Synchronized" by Tal Weiss, who is the CEO of Takipi. In that article, there is a benchmark for Java locks with different counter implementations. I'm using that counters benchmark as the basis for my benchmark. 

I also found another fork of the same benchmark and it has added the Optimistic Stamped version and Fair mode of ReentrantReadWriteLock. I found out about that from the slides on "Java 8 - Stamped Lock" by Haim Yadid after I got my benchmark results.

I also looked at the article "Java Synchronization (Mutual Exclusion) Benchmark" by Baptiste Wicht.

I'm using the popular JMH library for my benchmark. The JMH has now become the standard way to write Java microbenchmarks. The benchmarks done by Tal Weiss do not use JMH.

See JMH Resources by Nitsan Wakart for an introduction to JMH and related links to get more information about JMH.

I used Thread Grouping feature in JMH and the Group states for benchmarking different counter implementations.

This is my first attempt in writing a proper microbenchmark. If there are any problems with the code, please let me know. When we talk about benchmarks, it's important to know that you should not expect the same results in a real life application and the code may behave differently in runtime.

There are 11 counter implementations. I also benchmarked the fair and non-fair modes of ReentrantLockReentrantReadWriteLock and Semaphore.

Class Diagram for Counter implementations

There are altogether 14 benchmark methods!

  1. Adder - Using LongAdder. This is introduced in Java 8.
  2. Atomic - Using AtomicLong
  3. Dirty - Not using any mechanism to control concurrent access
  4. Lock Fair Mode - Using ReentrantLock
  5. Lock Non-Fair Mode - Using ReentrantLock
  6. Read Write Lock Fair Mode - Using ReentrantReadWriteLock
  7. Read Write Lock Non-Fair Mode - Using ReentrantReadWriteLock
  8. Semaphore Fair Mode - Using Semaphore
  9. Semaphore Non-Fair Mode - Using Semaphore
  10. Stamped - Using  StampedLock
  11. Optimistic Stamped - Using  StampedLock with tryOptimisticRead(); If it fails, I used the read lock. There are no more attempts to tryOptimisticRead().
  12. Synchronized - Using synchronized block with an object
  13. Synchronized Method - Using synchronized keyword in methods
  14. Volatile  - Using volatile keyword for counter variable

The code is available at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl

Benchmark Results


As I mentioned, I used Thread Grouping feature in JMH and I ran the benchmarks for different thread group distributions. There were 10 iterations after 5 warm-up iterations. I measured only the "throughput". Measuring latency would be very difficult (as the minimum throughtput values were having around 6 digits)

The thread group distribution was passed by the "-tg" argument to the JMH and the first number was used for "get" (read) operations and the second number was used for "increment" (write) operations.

There are many combinations we can use to run the benchmark tests. I used 12 combinations for thread group distribution and those are specified in the benchmark script.

These 12 combinations include the scenarios tested by Tal Weiss and Baptiste Wicht.

The benchmark was run on my Thinkpad T530 laptop.

$ hwinfo --cpu --short
cpu:                                                            
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3394 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3305 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15866        4424        7761         129        3680       11204
Swap:         18185           0       18185

Note: I added the "Dirty" counter only to compare the results, but I omitted it from the benchmark as no one wants to keep a dirty counter in their code. :)

I have committed all results to the Github repository and I used gnuplot for the graphs.

It's very important to note that the graphs show the throughput for both reader and writer threads. If you need to look at individual reader and writer throughput, you can refer the results at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl/counters/results

Let's see the results!

1 Reader, 1 Writer

2 Readers, 2 Writers

4 Readers, 4 Writers

5 Readers, 5 Writers

10 Readers, 10 Writers

16 Readers, 16 Writers

64 Readers, 64 Writers

128 Readers, 128 Writers

1 Reader, 19 Writers

19 Readers, 1 Writer

4 Readers, 16 Writers

16 Readers, 4 Writers



Conclusion


Following are some conclusions we can make when looking at above results

  1. Optimistic Stamped counter has much better throughput when there is high contention.
  2. Fair modes of the locks are very slow.
  3. Adder counter has better throughput than Atomic counter when there are more writers.
  4. When there are less threads, the Synchronized and Synchronized Method counters has better throughput than using a Read Write Lock (in non-fair mode, which is the default)
  5. The Lock counter also has better throughput than Read Write Lock when there are less threads.

Adder, Atomic and Volatile counter examples do not show a way to provide mutual exclusion, but those are thread-safe ways to keep a count. You may refer benchmark results for other counters with Java locks if you want to have a mutual exclusion to some logic in your code.

In this benchmark, the read write lock has performed poorly. The reason could be that there are writers continuously trying to access the write lock. There may be situations that a write lock may be required less frequently and therefore this benchmark is probably not a good way to evaluate performance for read write locks.

Please make sure that you run the benchmarks for your scenarios before making a decision based on these results. Even my benchmarks give slightly different results for each run. So, it's not a good idea to rely entirely on benchmarks and you must test the performance of the overall application.


If there are any questions or comments on the results or regarding benchmark code, please let me know.

Thursday, March 10, 2016

Specifying a custom Event Settings file for Java Flight Recorder

When you are using Java Flight Recorder (JFR), the JFR will use an event settings file to check which event types to record.

By default in JFR, there are two settings, "default" and "profile". The default setting is recommended for Continuous Recordings as it has very low overhead (typically less than 1% overhead). The profile setting has more events and useful when profiling the application.

As mentioned in my previous blog post regarding Java Flight Recorder Continuous Recordings, we use following arguments to do a Continuous Recording.

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=./tmp,dumponexit=true,dumponexitpath=./


Note: According to the Oracle documentation on "java" command, we should be able to specify "settings" parameter to the -XX:FlightRecorderOptions. However, the settings parameter has no effect when used with the -XX:FlightRecorderOptions and the default settings will be used. This is a known bug in JFR.

In JFR, the "settings" parameter specifies the path and name of the event settings file, which is of type JFC and it has the ".jfc" extension. By default, the "default.jfc" file is used and it's located in JAVA_HOME/jre/lib/jfr directory.

Most of the time, the "default" and "profile" settings are enough. However you might have seen that when analyzing a JFR dump, some tabs in Java Mission Control will tell that the particular event type is not enabled in that recording.

For example, the "Object Count" event is not enabled by default in "default" or "profile" settings. Therefore we cannot see "Object Statistics" in the Memory Group Tab.

We need to enable such events from a settings file.

Creating a custom Event Settings file


The recommended way to create a custom settings event file is to use Java Mission Control. Once you start the Java Mission Control (JMC), open "Flight Recorder Template Manager" from the Window menu.

Screenshot 01: Flight Recorder Template Manager



Let's import the existing profile.jfc to "Flight Recorder Template Manager". Click on "Import Files..." and select JAVA_HOME/jre/lib/jfr/profile.jfc. (Eg: /usr/lib/jvm/jdk1.8.0_74/jre/lib/jfr/profile.jfc)

Now let's duplicate that by selecting the "Profiling" template and clicking on "Duplicate".

Screenshot 02: The Duplicate Profiling Template

The new template is now named as "Profiling (1)". Let's edit it by clicking on "Edit".

Now let's change the "Name", "Description" and other settings that we need to record.

Screenshot 03: Template Options


I selected "Heap Statistics" and "Class Loading". I recommend you to open the default *.jfc files and go through the file to understand the events available in JFR. For example, from the file I can see that we need to enable "Heap Statistics" to enable the "Object Count" event.

Note: When we select the options shown in Screenshot 03, it will only update the "Control Elements". You can see these control elements when you open a JFC file in JAVA_HOME/jre/lib/jfr/. These control elements will change the real state of the settings. When you click on "Advanced", it will display the "Template Event Details". It will also show the warning: " If you click OK, this template will always be opened in advanced mode and the simple controls will be lost." Click on "OK" only if you want get rid of control elements from the settings file. 


Screenshot 04: Template Event Details

Click on "Cancel" and click on "OK" to save the "Template Options".

Let's export this file to a directory using "Export File..." button.

Specifying the Event Settings file for JFR


After creating the settings file, we can directly specify the settings file name in "settings" parameter. Since we cannot specify the "settings" parameter with -XX:FlightRecorderOptions, we will start a new recording from the "jcmd" command.

For example, add following parameters to the Java program.


-XX:+UnlockCommercialFeatures -XX:+FlightRecorder


Then we can start a recording as follows.


$ jcmd `cat wso2carbon.pid` JFR.start settings=/home/isuru/performance/jfr-settings/Heap.jfc  
30849:
Started recording 1. No limit (duration/maxsize/maxage) in use.

Use JFR.dump recording=1 filename=FILEPATH to copy recording data to file.

Note:

  1. You can also use settings parameter with -XX:StartFlightRecording option.
  2. I used a WSO2 server to test JFR recording and the Process ID is available in CARBON_HOME/wso2carbon.pid file.
  3. If you save the JFC file in JAVA_HOME/jre/lib/jfr directory, you can just use the filename in settings parameter. For eg: "settings=Heap"


Now we can get a JFR dump from jcmd command.

$ jcmd `cat wso2carbon.pid` JFR.dump recording=1 filename=heap.jfr
30849:
Dumped recording 1, 58.4 MB written to:

/home/isuru/temp/metrics-test/wso2am-1.10.0/heap.jfr

When you open the JFR, you will be able to see "Object Statistics" in the Memory Group Tab.