Monday, May 30, 2016

Benchmarking Java Locks with Counters

These days I am analyzing some Java Flight Recordings from taken from WSO2 API Manager performance tests and I found out that main processing threads were in "BLOCKED" state in some situations.

The threads were mainly blocked due to "synchronized" methods in Java. Synchronizing the methods in a critical section of request processing causes bottlenecks and it has an impact on the throughput and overall latency.

Then I was thinking whether we could avoid synchronizing the whole method. The main problem with synchronized is that only one thread can run that critical section. When it comes to consumer/producer scenarios, we may need to give read access to data in some threads and write access to a thread to edit data exclusively. Java provides ReadWriteLock for these kinds of scenarios.

Java 8 provides another kind of lock named StampedLock. The StampedLock provides an alternative way to the standard ReadWriteLock and it also supports optimistic reads. I'm not going to compare the features and functionalities of the each lock type in this blog post. You may read the StampedLock Idioms by Dr. Heinz M. Kabutz.

I'm more interested in finding out which lock is faster when it is accessed by multiple threads. Let's write a benchmark!


The code for benchmarks


There is an article on "Java 8 StampedLocks vs. ReadWriteLocks and Synchronized" by Tal Weiss, who is the CEO of Takipi. In that article, there is a benchmark for Java locks with different counter implementations. I'm using that counters benchmark as the basis for my benchmark. 

I also found another fork of the same benchmark and it has added the Optimistic Stamped version and Fair mode of ReentrantReadWriteLock. I found out about that from the slides on "Java 8 - Stamped Lock" by Haim Yadid after I got my benchmark results.

I also looked at the article "Java Synchronization (Mutual Exclusion) Benchmark" by Baptiste Wicht.

I'm using the popular JMH library for my benchmark. The JMH has now become the standard way to write Java microbenchmarks. The benchmarks done by Tal Weiss do not use JMH.

See JMH Resources by Nitsan Wakart for an introduction to JMH and related links to get more information about JMH.

I used Thread Grouping feature in JMH and the Group states for benchmarking different counter implementations.

This is my first attempt in writing a proper microbenchmark. If there are any problems with the code, please let me know. When we talk about benchmarks, it's important to know that you should not expect the same results in a real life application and the code may behave differently in runtime.

There are 11 counter implementations. I also benchmarked the fair and non-fair modes of ReentrantLockReentrantReadWriteLock and Semaphore.

Class Diagram for Counter implementations

There are altogether 14 benchmark methods!

  1. Adder - Using LongAdder. This is introduced in Java 8.
  2. Atomic - Using AtomicLong
  3. Dirty - Not using any mechanism to control concurrent access
  4. Lock Fair Mode - Using ReentrantLock
  5. Lock Non-Fair Mode - Using ReentrantLock
  6. Read Write Lock Fair Mode - Using ReentrantReadWriteLock
  7. Read Write Lock Non-Fair Mode - Using ReentrantReadWriteLock
  8. Semaphore Fair Mode - Using Semaphore
  9. Semaphore Non-Fair Mode - Using Semaphore
  10. Stamped - Using  StampedLock
  11. Optimistic Stamped - Using  StampedLock with tryOptimisticRead(); If it fails, I used the read lock. There are no more attempts to tryOptimisticRead().
  12. Synchronized - Using synchronized block with an object
  13. Synchronized Method - Using synchronized keyword in methods
  14. Volatile  - Using volatile keyword for counter variable

The code is available at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl

Benchmark Results


As I mentioned, I used Thread Grouping feature in JMH and I ran the benchmarks for different thread group distributions. There were 10 iterations after 5 warm-up iterations. I measured only the "throughput". Measuring latency would be very difficult (as the minimum throughtput values were having around 6 digits)

The thread group distribution was passed by the "-tg" argument to the JMH and the first number was used for "get" (read) operations and the second number was used for "increment" (write) operations.

There are many combinations we can use to run the benchmark tests. I used 12 combinations for thread group distribution and those are specified in the benchmark script.

These 12 combinations include the scenarios tested by Tal Weiss and Baptiste Wicht.

The benchmark was run on my Thinkpad T530 laptop.

$ hwinfo --cpu --short
cpu:                                                            
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3394 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3305 MHz
                       Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15866        4424        7761         129        3680       11204
Swap:         18185           0       18185

Note: I added the "Dirty" counter only to compare the results, but I omitted it from the benchmark as no one wants to keep a dirty counter in their code. :)

I have committed all results to the Github repository and I used gnuplot for the graphs.

It's very important to note that the graphs show the throughput for both reader and writer threads. If you need to look at individual reader and writer throughput, you can refer the results at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl/counters/results

Let's see the results!

1 Reader, 1 Writer

2 Readers, 2 Writers

4 Readers, 4 Writers

5 Readers, 5 Writers

10 Readers, 10 Writers

16 Readers, 16 Writers

64 Readers, 64 Writers

128 Readers, 128 Writers

1 Reader, 19 Writers

19 Readers, 1 Writer

4 Readers, 16 Writers

16 Readers, 4 Writers



Conclusion


Following are some conclusions we can make when looking at above results

  1. Optimistic Stamped counter has much better throughput when there is high contention.
  2. Fair modes of the locks are very slow.
  3. Adder counter has better throughput than Atomic counter when there are more writers.
  4. When there are less threads, the Synchronized and Synchronized Method counters has better throughput than using a Read Write Lock (in non-fair mode, which is the default)
  5. The Lock counter also has better throughput than Read Write Lock when there are less threads.

Adder, Atomic and Volatile counter examples do not show a way to provide mutual exclusion, but those are thread-safe ways to keep a count. You may refer benchmark results for other counters with Java locks if you want to have a mutual exclusion to some logic in your code.

In this benchmark, the read write lock has performed poorly. The reason could be that there are writers continuously trying to access the write lock. There may be situations that a write lock may be required less frequently and therefore this benchmark is probably not a good way to evaluate performance for read write locks.

Please make sure that you run the benchmarks for your scenarios before making a decision based on these results. Even my benchmarks give slightly different results for each run. So, it's not a good idea to rely entirely on benchmarks and you must test the performance of the overall application.


If there are any questions or comments on the results or regarding benchmark code, please let me know.

Thursday, March 10, 2016

Specifying a custom Event Settings file for Java Flight Recorder

When you are using Java Flight Recorder (JFR), the JFR will use an event settings file to check which event types to record.

By default in JFR, there are two settings, "default" and "profile". The default setting is recommended for Continuous Recordings as it has very low overhead (typically less than 1% overhead). The profile setting has more events and useful when profiling the application.

As mentioned in my previous blog post regarding Java Flight Recorder Continuous Recordings, we use following arguments to do a Continuous Recording.

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=./tmp,dumponexit=true,dumponexitpath=./


Note: According to the Oracle documentation on "java" command, we should be able to specify "settings" parameter to the -XX:FlightRecorderOptions. However, the settings parameter has no effect when used with the -XX:FlightRecorderOptions and the default settings will be used. This is a known bug in JFR.

In JFR, the "settings" parameter specifies the path and name of the event settings file, which is of type JFC and it has the ".jfc" extension. By default, the "default.jfc" file is used and it's located in JAVA_HOME/jre/lib/jfr directory.

Most of the time, the "default" and "profile" settings are enough. However you might have seen that when analyzing a JFR dump, some tabs in Java Mission Control will tell that the particular event type is not enabled in that recording.

For example, the "Object Count" event is not enabled by default in "default" or "profile" settings. Therefore we cannot see "Object Statistics" in the Memory Group Tab.

We need to enable such events from a settings file.

Creating a custom Event Settings file


The recommended way to create a custom settings event file is to use Java Mission Control. Once you start the Java Mission Control (JMC), open "Flight Recorder Template Manager" from the Window menu.

Screenshot 01: Flight Recorder Template Manager



Let's import the existing profile.jfc to "Flight Recorder Template Manager". Click on "Import Files..." and select JAVA_HOME/jre/lib/jfr/profile.jfc. (Eg: /usr/lib/jvm/jdk1.8.0_74/jre/lib/jfr/profile.jfc)

Now let's duplicate that by selecting the "Profiling" template and clicking on "Duplicate".

Screenshot 02: The Duplicate Profiling Template

The new template is now named as "Profiling (1)". Let's edit it by clicking on "Edit".

Now let's change the "Name", "Description" and other settings that we need to record.

Screenshot 03: Template Options


I selected "Heap Statistics" and "Class Loading". I recommend you to open the default *.jfc files and go through the file to understand the events available in JFR. For example, from the file I can see that we need to enable "Heap Statistics" to enable the "Object Count" event.

Note: When we select the options shown in Screenshot 03, it will only update the "Control Elements". You can see these control elements when you open a JFC file in JAVA_HOME/jre/lib/jfr/. These control elements will change the real state of the settings. When you click on "Advanced", it will display the "Template Event Details". It will also show the warning: " If you click OK, this template will always be opened in advanced mode and the simple controls will be lost." Click on "OK" only if you want get rid of control elements from the settings file. 


Screenshot 04: Template Event Details

Click on "Cancel" and click on "OK" to save the "Template Options".

Let's export this file to a directory using "Export File..." button.

Specifying the Event Settings file for JFR


After creating the settings file, we can directly specify the settings file name in "settings" parameter. Since we cannot specify the "settings" parameter with -XX:FlightRecorderOptions, we will start a new recording from the "jcmd" command.

For example, add following parameters to the Java program.


-XX:+UnlockCommercialFeatures -XX:+FlightRecorder


Then we can start a recording as follows.


$ jcmd `cat wso2carbon.pid` JFR.start settings=/home/isuru/performance/jfr-settings/Heap.jfc  
30849:
Started recording 1. No limit (duration/maxsize/maxage) in use.

Use JFR.dump recording=1 filename=FILEPATH to copy recording data to file.

Note:

  1. You can also use settings parameter with -XX:StartFlightRecording option.
  2. I used a WSO2 server to test JFR recording and the Process ID is available in CARBON_HOME/wso2carbon.pid file.
  3. If you save the JFC file in JAVA_HOME/jre/lib/jfr directory, you can just use the filename in settings parameter. For eg: "settings=Heap"


Now we can get a JFR dump from jcmd command.

$ jcmd `cat wso2carbon.pid` JFR.dump recording=1 filename=heap.jfr
30849:
Dumped recording 1, 58.4 MB written to:

/home/isuru/temp/metrics-test/wso2am-1.10.0/heap.jfr

When you open the JFR, you will be able to see "Object Statistics" in the Memory Group Tab.

Wednesday, December 2, 2015

Running Java in a Script

Recently I wanted to invoke a JMX operation in a WSO2 server from a script. I was first thinking of writing a simple Java class with a main method and run it as an executable jar file.

Then I found that we can run a script with the command "jrunscript".

I also found an example of interacting with a JMX MBean from Javascript. However this example didn't work with Java 8.

The reason is that from Java 8, the JavaScript Engine for the JVM is Oracle Nashorn. In Java 7, the JavaScript Engine is based on Mozilla Rhino.

Let's see the difference:

Following script works with Java 7, but not with Java 8


packages = new JavaImporter(java.lang,
                            java.lang.reflect);

with (packages)
{
// create Java String array of 5 elements
var a = Array.newInstance(String, 5);

// Accessing elements and length access is by usual Java syntax
a[0] = "scripting is great!";
print(a.length);
print('\n');
print(a[0]);
print('\n');
}


Following is the output:

$ /usr/lib/jvm/jdk1.7.0_80/bin/jrunscript java7-array-test.js 
5
scripting is great! 
$ /usr/lib/jvm/jdk1.8.0_66/bin/jrunscript java7-array-test.js 
script error in file java7-array-test.js : TypeError: Cannot call undefined in java7-array-test.js at line number 7


Let's write the same script to work in Java 8. I referred the Rhino Migration Guide

var Array = Java.type("java.lang.reflect.Array")
var JString = Java.type("java.lang.String")

  // create Java String array of 5 elements
var a = Array.newInstance(JString.class, 5);

// Accessing elements and length access is by usual Java syntax
a[0] = "scripting is great!";
print(a.length);
print(a[0]);


Following is the output:

$ /usr/lib/jvm/jdk1.7.0_80/bin/jrunscript java7-array-test.js 
5
scripting is great! 
$ /usr/lib/jvm/jdk1.8.0_66/bin/jrunscript java7-array-test.js 
script error in file java7-array-test.js : TypeError: Cannot call undefined in java7-array-test.js at line number 7

$ /usr/lib/jvm/jdk1.8.0_66/bin/jrunscript java8-array-test.js 5 scripting is great! $ /usr/lib/jvm/jdk1.7.0_80/bin/jrunscript java8-array-test.js script error in file java8-array-test.js : sun.org.mozilla.javascript.internal.EvaluatorException: missing name after . operator (java8-array-test.js#5) in java8-array-test.js at line number 5

Note: Java 8 also has another tool named "jjs" to invoke the Nashorn engine. See  Java Platform, Standard Edition Nashorn User's Guide

Following is the complete script I wrote to invoke JMX operations in the Metrics Manager MBean in a WSO2 Server.

Monday, September 14, 2015

Java Mixed-Mode Flame Graphs

Recently, I wrote a blog post on "Java CPU Flame Graphs". That blog post shows how to use Linux perf_events to generate Java Mixed-Mode Flame Graphs.

With Java Profilers, we can get information about Java process only. However with Java Mixed-Mode Flame Graphs, we can see how much CPU time is spent in Java methods, system libraries and the kernel. Mixed-mode means that the Flame Graph shows profile information from both system code paths and Java code paths.

This is sometimes very important to identify performance issues. For more information, see Java in Flames and New JVM Option Enables Generation of Mixed-Mode Flame Graphs

In this blog post, I'm using a sample high cpu consuming Java Program with Java 8 Update 60.

$ mvn -version
Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T17:27:37+05:30)
Maven home: /usr/local/apache-maven/apache-maven-3.3.3
Java version: 1.8.0_60, vendor: Oracle Corporation
Java home: /usr/lib/jvm/jdk1.8.0_60/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.19.0-28-generic", arch: "amd64", family: "unix"

$ git clone https://github.com/chrishantha/sample-java-programs.git
$ cd sample-java-programs
$ mvn clean install

Java Flame Graph using Java Flight Recorder


 I ran the highcpu program for a minute.

$ java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,settings=profile,disk=true,repository=./tmp,dumponexit=true,dumponexitpath=./ -jar highcpu/target/highcpu-0.0.1-SNAPSHOT.jar --exit-timeout 60

I used my "jfr-flame-graph" software to generate the flame graph.

jfr-flame-graph$ ./run.sh -i -f ../sample-java-programs/hotspot-pid-30716-id-0-2015_09_14_19_01_38.jfr -o /tmp/output.txt
$ cat /tmp/output.txt | ~/performance/brendangregg-git/FlameGraph/flamegraph.pl --width 1680 > /tmp/highcpu-jfr.svg


Flame Graph Reset Zoom Search oracle.jrockit.jfr.Settings$Aggregator.getDefault(com.oracle.jrockit.jfr.EventInfo) (1 samples, 0.03%) java.math.BigDecimal.(init)(java.lang.String) (527 samples, 14.53%) java.mat.. java.math.BigInteger.add(int[], long) (2 samples, 0.06%) com.github.chrishantha.sample.highcpu.MathWorker.run() (2,462 samples, 67.86%) com.github.chrishantha.sample.highcpu.MathWorker.. java.math.BigDecimal.add(long, int, java.math.BigInteger, int) (132 samples, 3.64%) java.math.BigDecimal.add(long, int, long, int) (1 samples, 0.03%) java.util.Arrays.copyOfRange(char[], int, int) (5 samples, 0.14%) com.github.chrishantha.sample.highcpu.HashingWorker.run() (1,165 samples, 32.11%) com.github.chrishantha.. java.util.UUID.toString() (49 samples, 1.35%) java.lang.Long.toUnsignedString0(long, int) (5 samples, 0.14%) oracle.jrockit.jfr.JFRImpl.cloneRecording(oracle.jrockit.jfr.Recording, java.lang.String, boolean) (1 samples, 0.03%) oracle.jrockit.jfr.Settings$Aggregator.updateDefaults(java.util.Collection) (1 samples, 0.03%) java.security.SecureRandom.nextBytes(byte[]) (530 samples, 14.61%) java.sec.. sun.security.provider.SHA.implDigest(byte[], int) (46 samples, 1.27%) sun.security.provider.ByteArrayAccess.b2iBig64(byte[], int, int[]) (4 samples, 0.11%) java.util.Random.nextDouble() (1 samples, 0.03%) java.lang.StringBuilder.toString() (6 samples, 0.17%) java.lang.String.equals(java.lang.Object) (1 samples, 0.03%) oracle.jrockit.jfr.JFRImpl.dumpOnExit() (1 samples, 0.03%) oracle.jrockit.jfr.Settings$Aggregator.mergeDefaults(java.util.Collection, java.util.Map) (1 samples, 0.03%) sun.security.provider.NativePRNG$RandomIO.ensureBufferValid() (3 samples, 0.08%) java.util.UUID.randomUUID() (531 samples, 14.64%) java.uti.. sun.security.jca.GetInstance.getInstance(java.security.Provider$Service, java.lang.Class) (1 samples, 0.03%) sun.misc.FloatingDecimal.toJavaFormatString(double) (1,312 samples, 36.16%) sun.misc.FloatingDecimal... java.lang.AbstractStringBuilder.(init)(int) (43 samples, 1.19%) sun.misc.FloatingDecimal$BinaryToASCIIBuffer.getChars(char[]) (45 samples, 1.24%) java.security.Security.getImpl(java.lang.String, java.lang.String, java.lang.String) (1 samples, 0.03%) java.lang.String.(init)(char[], int, int) (5 samples, 0.14%) java.lang.AbstractStringBuilder.ensureCapacityInternal(int) (211 samples, 5.82%) ja.. java.lang.Long.toHexString(long) (5 samples, 0.14%) sun.security.provider.SecureRandom.updateState(byte[], byte[]) (1 samples, 0.03%) sun.misc.FloatingDecimal$BinaryToASCIIBuffer.toJavaFormatString() (57 samples, 1.57%) all (3,628 samples, 100%) sun.misc.FloatingDecimal.getBinaryToASCIIConverter(double) (1,254 samples, 34.56%) sun.misc.FloatingDecima.. java.security.MessageDigest.getInstance(java.lang.String) (1 samples, 0.03%) sun.misc.FloatingDecimal.getBinaryToASCIIConverter(double, boolean) (1,254 samples, 34.56%) sun.misc.FloatingDecima.. java.util.Random.nextInt(int) (2 samples, 0.06%) java.util.Arrays.copyOfRange(char[], int, int) (2 samples, 0.06%) sun.security.provider.SHA.implCompress(byte[], int) (16 samples, 0.44%) sun.security.provider.ByteArrayAccess.i2bBig4(int, byte[], int) (13 samples, 0.36%) java.util.Arrays.copyOf(char[], int) (211 samples, 5.82%) ja.. java.math.BigDecimal.subtract(java.math.BigDecimal) (134 samples, 3.69%) sun.nio.cs.UTF_8$Encoder.encode(char[], int, int, byte[]) (4 samples, 0.11%) oracle.jrockit.jfr.Settings$Aggregator.copy(oracle.jrockit.jfr.Settings$Aggregator) (1 samples, 0.03%) com.oracle.jrockit.jfr.DurationEvent.begin() (1 samples, 0.03%) java.util.UUID.digits(long, int) (5 samples, 0.14%) sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean) (1,111 samples, 30.62%) sun.misc.FloatingDec.. sun.security.provider.DigestBase.engineDigest(byte[], int, int) (46 samples, 1.27%) oracle.jrockit.jfr.JFR$2.run() (1 samples, 0.03%) java.lang.String.getBytes() (4 samples, 0.11%) java.security.MessageDigest.update(byte[]) (16 samples, 0.44%) java.math.BigDecimal.add(long, int, java.math.BigInteger, int) (136 samples, 3.75%) java.lang.AbstractStringBuilder.expandCapacity(int) (211 samples, 5.82%) ja.. java.math.BigDecimal.(init)(char[], int, int) (527 samples, 14.53%) java.mat.. java.math.BigInteger.compareMagnitude(long) (41 samples, 1.13%) java.lang.Long.formatUnsignedLong(long, int, char[], int, int) (5 samples, 0.14%) java.io.FileInputStream.read(byte[], int, int) (2 samples, 0.06%) sun.security.provider.ByteArrayAccess.b2iBig64(byte[], int, int[]) (1 samples, 0.03%) java.security.MessageDigest.digest() (46 samples, 1.27%) sun.security.provider.NativePRNG$RandomIO.access$400(sun.security.provider.NativePRNG$RandomIO, byte[]) (499 samples, 13.75%) sun.secu.. java.security.MessageDigest$Delegate.engineDigest() (46 samples, 1.27%) java.math.BigDecimal.add(long, long, int) (1 samples, 0.03%) java.math.BigInteger.compareMagnitude(long) (86 samples, 2.37%) sun.security.provider.DigestBase.engineDigest() (46 samples, 1.27%) java.math.BigDecimal.(init)(char[], int, int, java.math.MathContext) (442 samples, 12.18%) java.ma.. java.lang.ThreadLocal$ThreadLocalMap.getEntry(java.lang.ThreadLocal) (42 samples, 1.16%) sun.security.provider.DigestBase.engineUpdate(byte[], int, int) (16 samples, 0.44%) java.lang.StringBuilder.append(java.lang.String) (224 samples, 6.17%) ja.. java.lang.Math.atan(double) (336 samples, 9.26%) java.. sun.misc.FloatingDecimal$BinaryToASCIIBuffer.access$100(sun.misc.FloatingDecimal$BinaryToASCIIBuffer, int, long, int, boolean) (1,211 samples, 33.38%) sun.misc.FloatingDecim.. sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int) (16 samples, 0.44%) java.lang.StringBuilder.append(java.lang.String) (1 samples, 0.03%) sun.security.provider.NativePRNG$RandomIO.readFully(java.io.InputStream, byte[]) (2 samples, 0.06%) java.lang.ThreadLocal.get() (43 samples, 1.19%) java.util.Arrays.copyOf(char[], int) (1 samples, 0.03%) java.lang.AbstractStringBuilder.append(java.lang.String) (1 samples, 0.03%) java.lang.Thread.run() (3,628 samples, 100.00%) java.lang.Thread.run() java.lang.String.getChars(int, int, char[], int) (8 samples, 0.22%) java.math.BigDecimal.valueOf(double) (1,839 samples, 50.69%) java.math.BigDecimal.valueOf(double) java.math.BigInteger.add(long) (41 samples, 1.13%) sun.security.provider.SHA.implCompress(byte[], int) (31 samples, 0.85%) java.math.BigInteger.add(long) (89 samples, 2.45%) java.math.BigDecimal.valueOf(java.math.BigInteger, int, int) (94 samples, 2.59%) oracle.jrockit.jfr.Settings$Aggregator.updateDefaults() (1 samples, 0.03%) java.security.MessageDigest$Delegate.engineUpdate(byte[], int, int) (16 samples, 0.44%) oracle.jrockit.jfr.JFRImpl.destroy() (1 samples, 0.03%) java.lang.StringBuilder.(init)() (43 samples, 1.19%) sun.misc.FloatingDecimal.getBinaryToASCIIBuffer() (43 samples, 1.19%) java.lang.String.(init)(char[], int, int) (2 samples, 0.06%) java.math.BigDecimal.valueOf(java.math.BigInteger, int, int) (17 samples, 0.47%) java.lang.StringCoding.encode(char[], int, int) (4 samples, 0.11%) sun.security.provider.SecureRandom.engineNextBytes(byte[]) (59 samples, 1.63%) oracle.jrockit.jfr.Settings$Aggregator.replaceEventDefaultSets(java.util.Collection) (1 samples, 0.03%) java.math.BigDecimal.add(java.math.BigDecimal) (137 samples, 3.78%) java.lang.StringCoding$StringEncoder.encode(char[], int, int) (4 samples, 0.11%) sun.security.provider.NativePRNG$RandomIO.implNextBytes(byte[]) (76 samples, 2.09%) java.lang.ThreadLocal$ThreadLocalMap.access$000(java.lang.ThreadLocal$ThreadLocalMap, java.lang.ThreadLocal) (42 samples, 1.16%) java.lang.StringCoding.encode(java.lang.String, char[], int, int) (4 samples, 0.11%) java.lang.Double.toString(double) (1,312 samples, 36.16%) java.lang.Double.toString.. sun.security.jca.GetInstance.getInstance(java.lang.String, java.lang.Class, java.lang.String) (1 samples, 0.03%) oracle.jrockit.jfr.JFRImpl.dumpOnExit(oracle.jrockit.jfr.Recording) (1 samples, 0.03%) com.oracle.jrockit.jfr.EventToken.isEnabled() (1 samples, 0.03%) java.math.BigDecimal.valueOf(long, int) (1 samples, 0.03%) java.lang.AbstractStringBuilder.append(java.lang.String) (223 samples, 6.15%) ja.. java.lang.AbstractStringBuilder.ensureCapacityInternal(int) (1 samples, 0.03%) sun.security.provider.NativePRNG.engineNextBytes(byte[]) (499 samples, 13.75%) sun.secu.. java.lang.AbstractStringBuilder.expandCapacity(int) (1 samples, 0.03%) java.math.BigInteger.negate() (1 samples, 0.03%) java.math.BigDecimal.add(long, long, int) (1 samples, 0.03%) java.math.BigDecimal.add(long, int, long, int) (1 samples, 0.03%)

Java Mixed Mode Flame Graph


I'm using "perf-map-agent". Please build it first as mentioned in README.

I ran the highcpu program with "-XX:+PreserveFramePointer" JVM argument.

$ java -XX:+PreserveFramePointer -jar highcpu/target/highcpu-0.0.1-SNAPSHOT.jar

In another terminal, I started a perf recording for a minute.

$ sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60

By default, the highcpu Java program runs for 2 minutes and therefore I waited for a minute to create the Java Symbol Map (I did this in another terminal).


$ cd ~/performance/git-projects/perf-map-agent
$ sudo sleep 60;bin/create-java-perf-map.sh `pgrep -f highcpu`

After the perf recording completed and the symbol file created in /tmp, we can generate the Java mixed-mode flame graph


Let's generate the flame graph.


sample-java-programs$ sudo perf script | ~/performance/brendangregg-git/FlameGraph/stackcollapse-perf.pl > /tmp/out.perf-folded
sample-java-programs$ cat /tmp/out.perf-folded | ~/performance/brendangregg-git/FlameGraph/flamegraph.pl --color=java --width 1680  > /tmp/highcpu-perf.svg


Flame Graph Reset Zoom Search pthread_cond_timedwait@@GLIBC_2.3.2 (46 samples, 0.28%) _ZN13SharedRuntime26complete_monitor_locking_CEP7oopDescP9BasicLockP10JavaThread (147 samples, 0.90%) __perf_event_task_sched_in (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) SpinPause (5 samples, 0.03%) native_write_msr_safe (4 samples, 0.02%) do_futex (31 samples, 0.19%) handle_mm_fault (15 samples, 0.09%) local_apic_timer_interrupt (11 samples, 0.07%) perf_pmu_enable (4 samples, 0.02%) __schedule (59 samples, 0.36%) _ZN7Monitor5ILockEP6Thread (5 samples, 0.03%) native_write_msr_safe (52 samples, 0.32%) __perf_event_task_sched_in (4 samples, 0.02%) java/lang/Thread:.sleep (95 samples, 0.58%) perf_pmu_enable (40 samples, 0.24%) _ZN7Monitor28lock_without_safepoint_checkEv (5 samples, 0.03%) hrtimer_interrupt (10 samples, 0.06%) pick_next_entity (4 samples, 0.02%) enqueue_task_fair (9 samples, 0.06%) perf_event_context_sched_in (4 samples, 0.02%) do_futex (46 samples, 0.28%) perf_event_context_sched_in (4 samples, 0.02%) [unknown] (190 samples, 1.16%) __perf_event_task_sched_in (4 samples, 0.02%) __schedule (4 samples, 0.02%) sun/misc/FloatingDecimal$BinaryToASCIIBuffer:.dtoa (9 samples, 0.06%) intel_pmu_enable_all (4 samples, 0.02%) local_apic_timer_interrupt (20 samples, 0.12%) __schedule (5 samples, 0.03%) native_write_msr_safe (4 samples, 0.02%) smp_apic_timer_interrupt (7 samples, 0.04%) wake_up_process (4 samples, 0.02%) _ZN20SafepointSynchronize5blockEP10JavaThread (4 samples, 0.02%) smp_apic_timer_interrupt (11 samples, 0.07%) x86_pmu_enable (4 samples, 0.02%) java/lang/StringBuilder:.toString (11 samples, 0.07%) remote_function (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) native_write_msr_safe (4 samples, 0.02%) perf_event_context_sched_in (4 samples, 0.02%) page_fault (5 samples, 0.03%) _ZL4highd (4 samples, 0.02%) intel_pmu_enable_all (4 samples, 0.02%) perf_pmu_enable (8 samples, 0.05%) _ZN12MutableSpace12cas_allocateEm (4 samples, 0.02%) hrtimer_wakeup (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) _ZN7Monitor28lock_without_safepoint_checkEv (4 samples, 0.02%) tick_sched_timer (4 samples, 0.02%) _ZN13WatcherThread3runEv (23 samples, 0.14%) _ZN20SafepointSynchronize16do_cleanup_tasksEv (4 samples, 0.02%) sys_futex (4 samples, 0.02%) sun/misc/FloatingDecimal$BinaryToASCIIBuffer:.getChars (21 samples, 0.13%) system_call_fastpath (4 samples, 0.02%) __run_hrtimer (7 samples, 0.04%) jlong_disjoint_arraycopy (1,696 samples, 10.37%) jlong.. x86_pmu_enable (8 samples, 0.05%) sys_futex (4 samples, 0.02%) _ZN7Monitor5IWaitEP6Threadl (8 samples, 0.05%) _Z5fabsdd (9 samples, 0.06%) finish_task_switch (41 samples, 0.25%) call_stub (15,579 samples, 95.30%) call_stub system_call_fastpath (32 samples, 0.20%) perf_event_context_sched_in (8 samples, 0.05%) finish_task_switch (8 samples, 0.05%) __schedule (12 samples, 0.07%) scheduler_tick (6 samples, 0.04%) smp_apic_timer_interrupt (4 samples, 0.02%) _ZN7Monitor5IWaitEP6Threadl (4 samples, 0.02%) java/lang/AbstractStringBuilder:.append (8 samples, 0.05%) perf_pmu_enable (4 samples, 0.02%) finish_task_switch (4 samples, 0.02%) wake_up_process (7 samples, 0.04%) _new_array_nozero_Java (285 samples, 1.74%) _ZN20ParallelScavengeHeap12mem_allocateEmPb (24 samples, 0.15%) ttwu_do_activate.constprop.93 (11 samples, 0.07%) x86_pmu_enable (52 samples, 0.32%) __alloc_pages_nodemask (15 samples, 0.09%) tick_sched_handle.isra.16 (12 samples, 0.07%) handle_mm_fault (5 samples, 0.03%) extract_entropy_user (9 samples, 0.06%) hrtimer_interrupt (6 samples, 0.04%) try_to_wake_up (9 samples, 0.06%) java/math/BigDecimal:.subtract (382 samples, 2.34%) _ZN12CompileQueue3getEv (59 samples, 0.36%) scheduler_tick (4 samples, 0.02%) _ZN10JavaThread44check_safepoint_and_suspend_for_native_transEPS_ (10 samples, 0.06%) native_write_msr_safe (4 samples, 0.02%) native_write_msr_safe (4 samples, 0.02%) _ZN13ObjectMonitor6EnterIEP6Thread (110 samples, 0.67%) __schedule (62 samples, 0.38%) pthread_cond_wait@@GLIBC_2.3.2 (63 samples, 0.39%) activate_task (5 samples, 0.03%) page_fault (4 samples, 0.02%) native_write_msr_safe (12 samples, 0.07%) pick_next_task_fair (4 samples, 0.02%) futex_wait_queue_me (4 samples, 0.02%) flush_smp_call_function_queue (4 samples, 0.02%) group_sched_in (4 samples, 0.02%) update_process_times (7 samples, 0.04%) Interpreter (149 samples, 0.91%) futex_wait_queue_me (4 samples, 0.02%) _ZN9OopMapSet6all_doEPK5framePK11RegisterMapP10OopClosurePFvPP7oopDescSA_ES7_ (5 samples, 0.03%) do_futex (4 samples, 0.02%) retint_careful (13 samples, 0.08%) system_call_fastpath (4 samples, 0.02%) native_write_msr_safe (41 samples, 0.25%) enqueue_task_fair (4 samples, 0.02%) local_apic_timer_interrupt (4 samples, 0.02%) try_to_wake_up (7 samples, 0.04%) apic_timer_interrupt (32 samples, 0.20%) java/util/UUID:.randomUUID (31 samples, 0.19%) retint_careful (5 samples, 0.03%) sys_sched_yield (5 samples, 0.03%) x86_pmu_enable (12 samples, 0.07%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) x86_pmu_commit_txn (4 samples, 0.02%) dequeue_task_fair (9 samples, 0.06%) futex_wait_queue_me (59 samples, 0.36%) sys_futex (17 samples, 0.10%) intel_pmu_enable_all (4 samples, 0.02%) schedule_user (6 samples, 0.04%) call_function_single_interrupt (4 samples, 0.02%) finish_task_switch (4 samples, 0.02%) sun/security/provider/NativePRNG$RandomIO:.implNextBytes (30 samples, 0.18%) _ZN13GCTaskManager8get_taskEj (61 samples, 0.37%) try_to_wake_up (6 samples, 0.04%) try_to_wake_up (4 samples, 0.02%) _ZN7Monitor5IWaitEP6Threadl (12 samples, 0.07%) schedule (4 samples, 0.02%) futex_wait (5 samples, 0.03%) vfs_read (357 samples, 2.18%) schedule (8 samples, 0.05%) native_write_msr_safe (4 samples, 0.02%) __sched_yield (5 samples, 0.03%) perf_event_context_sched_in (12 samples, 0.07%) jshort_arraycopy (10 samples, 0.06%) do_futex (4 samples, 0.02%) __schedule (4 samples, 0.02%) system_call_fastpath (17 samples, 0.10%) _ZN8VMThread4loopEv (54 samples, 0.33%) native_write_msr_safe (8 samples, 0.05%) __memmove_ssse3_back (7 samples, 0.04%) perf_pmu_enable (8 samples, 0.05%) _ZN20SafepointSynchronize5blockEP10JavaThread (4 samples, 0.02%) x86_pmu_enable (60 samples, 0.37%) ttwu_do_activate.constprop.93 (5 samples, 0.03%) __perf_event_task_sched_in (8 samples, 0.05%) perf_event_context_sched_in (52 samples, 0.32%) _ZN2os5sleepEP6Threadlb (88 samples, 0.54%) tick_sched_handle.isra.16 (4 samples, 0.02%) call_rwsem_down_write_failed (4 samples, 0.02%) _ZN13InstanceKlass17allocate_instanceEP6Thread (10 samples, 0.06%) system_call_fastpath (5 samples, 0.03%) perf_event_context_sched_in (60 samples, 0.37%) security_file_permission (13 samples, 0.08%) [unknown] (378 samples, 2.31%) perf_pmu_enable (41 samples, 0.25%) jshort_disjoint_arraycopy (91 samples, 0.56%) __perf_event_task_sched_in (4 samples, 0.02%) _ZN7Monitor4waitEblb (12 samples, 0.07%) do_futex (4 samples, 0.02%) intel_pmu_enable_all (4 samples, 0.02%) perf_pmu_enable (16 samples, 0.10%) __schedule (8 samples, 0.05%) jshort_disjoint_arraycopy (11 samples, 0.07%) enqueue_task_fair (4 samples, 0.02%) sys_futex (72 samples, 0.44%) do_futex (71 samples, 0.43%) _ZN10JavaThread40check_special_condition_for_native_transEPS_ (10 samples, 0.06%) schedule (4 samples, 0.02%) _ZN7Monitor4waitEblb (8 samples, 0.05%) do_futex (4 samples, 0.02%) hrtimer_interrupt (9 samples, 0.06%) __alloc_pages_nodemask (5 samples, 0.03%) _ZN12VM_Operation8evaluateEv (8 samples, 0.05%) x86_pmu_enable (4 samples, 0.02%) __do_softirq (7 samples, 0.04%) do_futex (17 samples, 0.10%) common_file_perm (10 samples, 0.06%) java/util/UUID:.toString (5 samples, 0.03%) compact_zone_order (5 samples, 0.03%) __schedule (4 samples, 0.02%) sys_futex (4 samples, 0.02%) hrtimer_interrupt (23 samples, 0.14%) schedule (4 samples, 0.02%) schedule (42 samples, 0.26%) remote_function (4 samples, 0.02%) x86_pmu_commit_txn (4 samples, 0.02%) intel_pmu_enable_all (8 samples, 0.05%) _ZNK8CodeBlob9is_zombieEv (6 samples, 0.04%) wake_up_process (9 samples, 0.06%) all (16,348 samples, 100%) perf_pmu_enable (4 samples, 0.02%) tick_sched_handle.isra.16 (5 samples, 0.03%) vfs_read (12 samples, 0.07%) system_call_fastpath (372 samples, 2.28%) x86_pmu_enable (8 samples, 0.05%) native_write_msr_safe (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) tick_sched_timer (7 samples, 0.04%) _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread (15,579 samples, 95.30%) _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgum.. sys_futex (8 samples, 0.05%) java/lang/StrictMath:.atan (13 samples, 0.08%) finish_task_switch (40 samples, 0.24%) do_futex (4 samples, 0.02%) perf_event_context_sched_in (4 samples, 0.02%) futex_wait (4 samples, 0.02%) do_futex (4 samples, 0.02%) __run_hrtimer (5 samples, 0.03%) perf_event_context_sched_in (8 samples, 0.05%) do_page_fault (4 samples, 0.02%) _ZN7Monitor5ILockEP6Thread (4 samples, 0.02%) enqueue_task (4 samples, 0.02%) do_futex (4 samples, 0.02%) __perf_event_task_sched_in (12 samples, 0.07%) sys_futex (4 samples, 0.02%) _ZN7Monitor5IWaitEP6Threadl (58 samples, 0.35%) __alloc_pages_direct_compact (5 samples, 0.03%) handle_irq (4 samples, 0.02%) irq_exit (7 samples, 0.04%) java/math/BigDecimal:.init (27 samples, 0.17%) jfabs (75 samples, 0.46%) try_to_wake_up (4 samples, 0.02%) _ZNK20ParallelScavengeHeap29can_elide_tlab_store_barriersEv (4 samples, 0.02%) futex_wait (8 samples, 0.05%) jshort_arraycopy (4 samples, 0.02%) jfabs (4 samples, 0.02%) _ZN11OptoRuntime18new_array_nozero_CEP5KlassiP10JavaThread (278 samples, 1.70%) intel_pmu_enable_all (40 samples, 0.24%) java/io/FileInputStream:.read (5 samples, 0.03%) system_call_fastpath (9 samples, 0.06%) native_write_msr_safe (4 samples, 0.02%) update_process_times (4 samples, 0.02%) __fget_light (8 samples, 0.05%) compact_zone (5 samples, 0.03%) _ZN14TypeArrayKlass15allocate_commonEibP6Thread (4 samples, 0.02%) _ZN29VM_ParallelGCFailedAllocation4doitEv (8 samples, 0.05%) compact_zone (15 samples, 0.09%) intel_pmu_enable_all (4 samples, 0.02%) finish_task_switch (16 samples, 0.10%) __schedule (15 samples, 0.09%) compact_zone_order (15 samples, 0.09%) java/lang/StringBuilder:.append (27 samples, 0.17%) x86_pmu_enable (8 samples, 0.05%) __run_hrtimer (4 samples, 0.02%) apic_timer_interrupt (15 samples, 0.09%) futex_wait (54 samples, 0.33%) schedule (4 samples, 0.02%) sys_mprotect (4 samples, 0.02%) futex_wait_queue_me (8 samples, 0.05%) pthread_cond_signal@@GLIBC_2.3.2 (32 samples, 0.20%) pthread_cond_timedwait@@GLIBC_2.3.2 (74 samples, 0.45%) _ZN9CodeCache9find_blobEPv (14 samples, 0.09%) _ZN7Monitor4waitEblb (56 samples, 0.34%) urandom_read (10 samples, 0.06%) schedule (17 samples, 0.10%) sys_futex (46 samples, 0.28%) apparmor_file_permission (10 samples, 0.06%) intel_pmu_enable_all (8 samples, 0.05%) apic_timer_interrupt (20 samples, 0.12%) _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask (5 samples, 0.03%) [unknown] (461 samples, 2.82%) enqueue_task_fair (4 samples, 0.02%) java/lang/String:.init (33 samples, 0.20%) hrtimer_wakeup (12 samples, 0.07%) perf_event_context_sched_in (4 samples, 0.02%) schedule (12 samples, 0.07%) futex_wait (4 samples, 0.02%) perf_event_context_sched_in (41 samples, 0.25%) flush_smp_call_function_queue (4 samples, 0.02%) smp_call_function_single_interrupt (4 samples, 0.02%) __run_hrtimer (9 samples, 0.06%) java (16,348 samples, 100.00%) java retint_careful (11 samples, 0.07%) perf_event_context_sched_in (4 samples, 0.02%) perf_pmu_enable (60 samples, 0.37%) jshort_disjoint_arraycopy (33 samples, 0.20%) sun/security/provider/SHA:.implCompress (71 samples, 0.43%) _ZN21ThreadStateTransition15trans_and_fenceE15JavaThreadStateS0_ (4 samples, 0.02%) enqueue_task (5 samples, 0.03%) _ZN14TypeArrayKlass15allocate_commonEibP6Thread (101 samples, 0.62%) _ZN11OptoRuntime27is_deoptimized_caller_frameEP10JavaThread (114 samples, 0.70%) x86_pmu_enable (8 samples, 0.05%) __schedule (42 samples, 0.26%) _ZNK13WatcherThread5sleepEv (21 samples, 0.13%) deactivate_task (11 samples, 0.07%) jshort_disjoint_arraycopy (8 samples, 0.05%) intel_pmu_enable_all (12 samples, 0.07%) intel_pmu_enable_all (4 samples, 0.02%) finish_task_switch (12 samples, 0.07%) apic_timer_interrupt (9 samples, 0.06%) _Z13__ieee754_powdd (50 samples, 0.31%) intel_pmu_enable_all (4 samples, 0.02%) _ZN7Monitor28lock_without_safepoint_checkEv (4 samples, 0.02%) futex_wait (17 samples, 0.10%) hrtimer_interrupt (17 samples, 0.10%) ret_from_intr (5 samples, 0.03%) java/math/BigDecimal:.add (286 samples, 1.75%) __do_page_fault (15 samples, 0.09%) futex_wait_queue_me (4 samples, 0.02%) smp_apic_timer_interrupt (21 samples, 0.13%) jni_SetByteArrayRegion (5 samples, 0.03%) native_write_msr_safe (4 samples, 0.02%) java/io/FileInputStream:.readBytes (24 samples, 0.15%) java/io/FileInputStream:.read (24 samples, 0.15%) jshort_arraycopy (13 samples, 0.08%) x86_pmu_enable (4 samples, 0.02%) sys_futex (61 samples, 0.37%) futex_wait (4 samples, 0.02%) futex_wake_op (31 samples, 0.19%) task_tick_fair (4 samples, 0.02%) futex_wait (4 samples, 0.02%) perf_event_context_sched_in (4 samples, 0.02%) system_call_fastpath (8 samples, 0.05%) jni_GetObjectField (16 samples, 0.10%) perf_pmu_enable (4 samples, 0.02%) futex_wait (46 samples, 0.28%) x86_pmu_enable (4 samples, 0.02%) com/github/chrishantha/sample/highcpu/MathWorker:.run (5 samples, 0.03%) system_call_fastpath (4 samples, 0.02%) hrtimer_wakeup (6 samples, 0.04%) sun/security/provider/NativePRNG$RandomIO:.implNextBytes (4 samples, 0.02%) futex_wait_queue_me (4 samples, 0.02%) java/math/BigDecimal:.add (11 samples, 0.07%) enqueue_task (5 samples, 0.03%) enqueue_task (6 samples, 0.04%) perf_pmu_enable (8 samples, 0.05%) hrtimer_wakeup (7 samples, 0.04%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) local_apic_timer_interrupt (17 samples, 0.10%) jshort_disjoint_arraycopy (74 samples, 0.45%) x86_pmu_enable (40 samples, 0.24%) _ZN13CollectedHeap24common_mem_allocate_initE11KlassHandlemP6Thread (10 samples, 0.06%) _ZN14JNIHandleBlock15allocate_handleEP7oopDesc (6 samples, 0.04%) try_to_wake_up (25 samples, 0.15%) ttwu_do_activate.constprop.93 (6 samples, 0.04%) _ZNK5frame6senderEP11RegisterMap (77 samples, 0.47%) _ZN7Monitor5IWaitEP6Threadl (52 samples, 0.32%) _ZN7Monitor4lockEP6Thread (4 samples, 0.02%) _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread (18 samples, 0.11%) perf_event_context_sched_in (8 samples, 0.05%) try_to_compact_pages (5 samples, 0.03%) _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5_P6Thread (15,579 samples, 95.30%) _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5.. do_futex (12 samples, 0.07%) system_call_fastpath (4 samples, 0.02%) _ZN10JavaThread17thread_main_innerEv (15,643 samples, 95.69%) _ZN10JavaThread17thread_main_innerEv Interpreter (15,579 samples, 95.30%) Interpreter update_process_times (4 samples, 0.02%) _complete_monitor_locking_Java (148 samples, 0.91%) wake_up_process (4 samples, 0.02%) perf_event_context_sched_in (8 samples, 0.05%) readBytes (438 samples, 2.68%) _new_instance_Java (12 samples, 0.07%) _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17JavaCallArgumentsP6Thread (15,579 samples, 95.30%) _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17Jav.. extract_buf (8 samples, 0.05%) readBytes (24 samples, 0.15%) do_futex (8 samples, 0.05%) _ZN7Monitor5IWaitEP6Threadl (20 samples, 0.12%) futex_wait_queue_me (4 samples, 0.02%) local_apic_timer_interrupt (6 samples, 0.04%) pthread_cond_wait@@GLIBC_2.3.2 (9 samples, 0.06%) _ZL12thread_entryP10JavaThreadP6Thread (15,579 samples, 95.30%) _ZL12thread_entryP10JavaThreadP6Thread sun/misc/FloatingDecimal$BinaryToASCIIBuffer:.dtoa (9 samples, 0.06%) _ZNK8CodeHeap10find_startEPv (6 samples, 0.04%) x86_pmu_enable (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) _ZN17HandleMarkCleanerD1Ev (4 samples, 0.02%) _ZNK5frame25sender_for_compiled_frameEP11RegisterMap (63 samples, 0.39%) __perf_event_task_sched_in (16 samples, 0.10%) do_futex (17 samples, 0.10%) native_write_msr_safe (4 samples, 0.02%) _ZN7Monitor5ILockEP6Thread (9 samples, 0.06%) _ZN7Monitor28lock_without_safepoint_checkEv (9 samples, 0.06%) finish_task_switch (8 samples, 0.05%) intel_pmu_enable_all (4 samples, 0.02%) __perf_event_task_sched_in (8 samples, 0.05%) local_apic_timer_interrupt (9 samples, 0.06%) smp_apic_timer_interrupt (30 samples, 0.18%) schedule_user (13 samples, 0.08%) _ZN10JfrBackend16is_event_enabledE12TraceEventId (4 samples, 0.02%) sys_futex (32 samples, 0.20%) system_call_fastpath (11 samples, 0.07%) perf_pmu_enable (4 samples, 0.02%) _ZN13CollectedHeap27post_allocation_setup_arrayE11KlassHandleP8HeapWordi (19 samples, 0.12%) java/util/UUID:.toString (137 samples, 0.84%) perf_pmu_enable (4 samples, 0.02%) _ZN7Monitor28lock_without_safepoint_checkEv (4 samples, 0.02%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) perf_pmu_enable (12 samples, 0.07%) select_task_rq_fair (7 samples, 0.04%) intel_pmu_enable_all (4 samples, 0.02%) _ZN7Monitor4waitEblb (5 samples, 0.03%) system_call_fastpath (61 samples, 0.37%) system_call_fastpath (4 samples, 0.02%) __schedule (10 samples, 0.06%) _ZN19OldToYoungRootsTask5do_itEP13GCTaskManagerj (4 samples, 0.02%) perf_event_context_sched_in (8 samples, 0.05%) alloc_pages_vma (5 samples, 0.03%) finish_task_switch (12 samples, 0.07%) futex_wait_queue_me (54 samples, 0.33%) _ZN7Monitor5ILockEP6Thread (4 samples, 0.02%) com/github/chrishantha/sample/highcpu/HashingWorker:.run (4 samples, 0.02%) retint_careful (8 samples, 0.05%) generic_smp_call_function_single_interrupt (4 samples, 0.02%) schedule (15 samples, 0.09%) hrtimer_wakeup (4 samples, 0.02%) retint_careful (7 samples, 0.04%) __schedule (4 samples, 0.02%) _ZN11StringTable17unlink_or_oops_doEP17BoolObjectClosureP10OopClosurePiS4_ (4 samples, 0.02%) __perf_event_task_sched_in (12 samples, 0.07%) apic_timer_interrupt (4 samples, 0.02%) native_write_msr_safe (4 samples, 0.02%) finish_task_switch (8 samples, 0.05%) x86_pmu_enable (16 samples, 0.10%) perf_pmu_enable (52 samples, 0.32%) do_IRQ (5 samples, 0.03%) _ZN10PSScavenge16invoke_no_policyEv (6 samples, 0.04%) activate_task (6 samples, 0.04%) _ZN13CollectedHeap23allocate_from_tlab_slowE11KlassHandleP6Threadm (14 samples, 0.09%) handle_mm_fault (4 samples, 0.02%) futex_wait (17 samples, 0.10%) _ZN12GCTaskThread3runEv (99 samples, 0.61%) perf_event_context_sched_in (4 samples, 0.02%) _ZN7Monitor28lock_without_safepoint_checkEv (4 samples, 0.02%) _ZN7Monitor5ILockEP6Thread (4 samples, 0.02%) perf_event_context_sched_in (4 samples, 0.02%) __schedule (13 samples, 0.08%) finish_task_switch (4 samples, 0.02%) do_huge_pmd_anonymous_page (5 samples, 0.03%) sun/misc/FloatingDecimal$BinaryToASCIIBuffer:.dtoa (2,921 samples, 17.87%) sun/misc/Fl.. __sched_yield (12 samples, 0.07%) pthread_getspecific (14 samples, 0.09%) schedule (4 samples, 0.02%) __schedule (9 samples, 0.06%) sys_futex (9 samples, 0.06%) schedule (4 samples, 0.02%) futex_wait (4 samples, 0.02%) jshort_disjoint_arraycopy (3,008 samples, 18.40%) jshort_disj.. alloc_pages_vma (15 samples, 0.09%) wake_futex (28 samples, 0.17%) sys_futex (4 samples, 0.02%) scheduler_tick (8 samples, 0.05%) local_apic_timer_interrupt (7 samples, 0.04%) native_write_msr_safe (4 samples, 0.02%) __schedule (4 samples, 0.02%) apic_timer_interrupt (22 samples, 0.13%) _ZN20WaitForBarrierGCTask5do_itEP13GCTaskManagerj (12 samples, 0.07%) __vdso_gettimeofday (4 samples, 0.02%) native_write_msr_safe (4 samples, 0.02%) _ZN13CollectedHeap23allocate_from_tlab_slowE11KlassHandleP6Threadm (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) perf_pmu_enable (8 samples, 0.05%) sun/security/provider/SHA:.implCompress (47 samples, 0.29%) java/util/UUID:.randomUUID (4 samples, 0.02%) tick_sched_timer (12 samples, 0.07%) _raw_spin_lock_irqsave (4 samples, 0.02%) sys_read (370 samples, 2.26%) futex_wait (70 samples, 0.43%) perf_pmu_enable (4 samples, 0.02%) jlong_disjoint_arraycopy (28 samples, 0.17%) pageblock_pfn_to_page (11 samples, 0.07%) _ZN8VMThread18evaluate_operationEP12VM_Operation (8 samples, 0.05%) native_write_msr_safe (40 samples, 0.24%) sys_futex (17 samples, 0.10%) do_futex (5 samples, 0.03%) intel_pmu_enable_all (52 samples, 0.32%) _raw_spin_lock (4 samples, 0.02%) __vfs_read (331 samples, 2.02%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) __run_hrtimer (20 samples, 0.12%) update_process_times (12 samples, 0.07%) _ZN20SafepointSynchronize5blockEP10JavaThread (4 samples, 0.02%) futex_wait_queue_me (4 samples, 0.02%) _ZN13ObjectMonitor10ExitEpilogEP6ThreadP12ObjectWaiter (35 samples, 0.21%) __alloc_pages_direct_compact (15 samples, 0.09%) ttwu_do_activate.constprop.93 (8 samples, 0.05%) _ZN11OptoRuntime11new_array_CEP5KlassiP10JavaThread (4 samples, 0.02%) __do_page_fault (5 samples, 0.03%) dequeue_task (9 samples, 0.06%) finish_task_switch (12 samples, 0.07%) generic_smp_call_function_single_interrupt (4 samples, 0.02%) _new_array_nozero_Java (4 samples, 0.02%) perf_event_context_sched_in (12 samples, 0.07%) sun/misc/FloatingDecimal$BinaryToASCIIBuffer:.getChars (235 samples, 1.44%) system_call_fastpath (4 samples, 0.02%) _ZN20ParallelScavengeHeap12mem_allocateEmPb (7 samples, 0.04%) tick_sched_timer (7 samples, 0.04%) intel_pmu_enable_all (4 samples, 0.02%) _ZNK7nmethod9is_zombieEv (6 samples, 0.04%) schedule (59 samples, 0.36%) sys_sched_yield (11 samples, 0.07%) try_to_compact_pages (15 samples, 0.09%) perf_pmu_enable (4 samples, 0.02%) __schedule (8 samples, 0.05%) __schedule (4 samples, 0.02%) __schedule (4 samples, 0.02%) do_futex (61 samples, 0.37%) _ZN13SharedRuntime4dpowEdd (64 samples, 0.39%) [unknown] (458 samples, 2.80%) __GI___mprotect (4 samples, 0.02%) futex_wait_queue_me (16 samples, 0.10%) pthread_cond_wait@@GLIBC_2.3.2 (8 samples, 0.05%) schedule (5 samples, 0.03%) pick_next_task_fair (7 samples, 0.04%) schedule_user (10 samples, 0.06%) intel_pmu_enable_all (4 samples, 0.02%) sys_futex (12 samples, 0.07%) task_tick_fair (4 samples, 0.02%) _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_ (6 samples, 0.04%) system_call_fastpath (12 samples, 0.07%) sys_futex (4 samples, 0.02%) do_futex (4 samples, 0.02%) ttwu_do_activate.constprop.93 (4 samples, 0.02%) finish_task_switch (4 samples, 0.02%) schedule (62 samples, 0.38%) native_write_msr_safe (12 samples, 0.07%) _ZN12MutableSpace12cas_allocateEm (5 samples, 0.03%) __perf_event_enable (4 samples, 0.02%) __lll_lock_wait (4 samples, 0.02%) com/github/chrishantha/sample/highcpu/HashingWorker:.run (7,530 samples, 46.06%) com/github/chrishantha/sample/hi.. schedule (4 samples, 0.02%) intel_pmu_enable_all (8 samples, 0.05%) _ZN20ThreadInVMfromNativeD1Ev (4 samples, 0.02%) enqueue_entity (5 samples, 0.03%) perf_pmu_enable (4 samples, 0.02%) _ZN20SafepointSynchronize5blockEP10JavaThread (5 samples, 0.03%) _ZN7Monitor4waitEblb (20 samples, 0.12%) x86_pmu_enable (8 samples, 0.05%) _ZN20ParallelScavengeHeap19failed_mem_allocateEm (8 samples, 0.05%) native_write_msr_safe (8 samples, 0.05%) _ZN10JavaThread44check_safepoint_and_suspend_for_native_transEPS_ (4 samples, 0.02%) _ZN8VMThread3runEv (54 samples, 0.33%) enqueue_task_fair (5 samples, 0.03%) sun/security/provider/NativePRNG$RandomIO:.implNextBytes (625 samples, 3.82%) hrtimer_interrupt (7 samples, 0.04%) futex_wait_queue_me (9 samples, 0.06%) intel_pmu_enable_all (8 samples, 0.05%) __schedule (54 samples, 0.33%) enqueue_task (5 samples, 0.03%) native_write_msr_safe (16 samples, 0.10%) jbyte_arraycopy (4 samples, 0.02%) perf_event_context_sched_in (12 samples, 0.07%) perf_event_context_sched_in (4 samples, 0.02%) futex_wait_queue_me (12 samples, 0.07%) _ZN18CardTableExtension26scavenge_contents_parallelEP16ObjectStartArrayP12MutableSpaceP8HeapWordP18PSPromotionManagerjj (4 samples, 0.02%) futex_wait_queue_me (5 samples, 0.03%) intel_pmu_enable_all (60 samples, 0.37%) __perf_event_enable (4 samples, 0.02%) __run_hrtimer (10 samples, 0.06%) _ZN7Monitor5ILockEP6Thread (4 samples, 0.02%) dequeue_entity (4 samples, 0.02%) schedule (4 samples, 0.02%) wake_up_process (6 samples, 0.04%) _ZN20SafepointSynchronize5blockEP10JavaThread (10 samples, 0.06%) __run_hrtimer (22 samples, 0.13%) hrtimer_wakeup (9 samples, 0.06%) native_write_msr_safe (8 samples, 0.05%) sys_futex (4 samples, 0.02%) Java_java_lang_StrictMath_atan (6 samples, 0.04%) perf_event_context_sched_in (40 samples, 0.24%) _ZN7Monitor5ILockEP6Thread (4 samples, 0.02%) _mix_pool_bytes (56 samples, 0.34%) smp_apic_timer_interrupt (14 samples, 0.09%) _ZN20SafepointSynchronize5beginEv (28 samples, 0.17%) perf_pmu_enable (4 samples, 0.02%) system_call_fastpath (4 samples, 0.02%) futex_wait_queue_me (17 samples, 0.10%) java/math/BigDecimal:.init (1,047 samples, 6.40%) ja.. perf_pmu_enable (4 samples, 0.02%) schedule_user (4 samples, 0.02%) x86_pmu_enable (12 samples, 0.07%) _ZN10JavaThread7oops_doEP10OopClosureP10CLDClosureP15CodeBlobClosure (8 samples, 0.05%) schedule (8 samples, 0.05%) enqueue_entity (5 samples, 0.03%) java/io/FileInputStream:.read (449 samples, 2.75%) do_futex (9 samples, 0.06%) java/lang/StrictMath:.atan (662 samples, 4.05%) j.. finish_task_switch (4 samples, 0.02%) java/math/BigDecimal:.subtract (14 samples, 0.09%) hrtimer_interrupt (20 samples, 0.12%) __perf_event_task_sched_in (4 samples, 0.02%) futex_wait (59 samples, 0.36%) _ZN10JavaThread3runEv (15,643 samples, 95.69%) _ZN10JavaThread3runEv JVM_Sleep (94 samples, 0.57%) try_to_wake_up (11 samples, 0.07%) do_page_fault (5 samples, 0.03%) intel_pmu_enable_all (16 samples, 0.10%) _ZL4highd (6 samples, 0.04%) jshort_disjoint_arraycopy (11 samples, 0.07%) __schedule (4 samples, 0.02%) com/github/chrishantha/sample/highcpu/MathWorker:.run (7,570 samples, 46.31%) com/github/chrishantha/sample/hi.. sha_transform (4 samples, 0.02%) ttwu_do_activate.constprop.93 (5 samples, 0.03%) activate_task (5 samples, 0.03%) __perf_event_task_sched_in (4 samples, 0.02%) jlong_disjoint_arraycopy (101 samples, 0.62%) tick_sched_handle.isra.16 (7 samples, 0.04%) _ZL3lowd (6 samples, 0.04%) system_call_fastpath (46 samples, 0.28%) __schedule (6 samples, 0.04%) intel_pmu_enable_all (12 samples, 0.07%) intel_pmu_enable_all (41 samples, 0.25%) futex_wait (12 samples, 0.07%) jni_GetArrayLength (4 samples, 0.02%) pthread_cond_timedwait@@GLIBC_2.3.2 (17 samples, 0.10%) schedule_user (11 samples, 0.07%) __run_hrtimer (16 samples, 0.10%) [unknown] (12 samples, 0.07%) _ZN13CompileBroker20compiler_thread_loopEv (64 samples, 0.39%) futex_wait_queue_me (4 samples, 0.02%) pthread_cond_timedwait@@GLIBC_2.3.2 (18 samples, 0.11%) system_call_fastpath (55 samples, 0.34%) do_futex (55 samples, 0.34%) do_page_fault (15 samples, 0.09%) futex_wait (4 samples, 0.02%) __vfs_read (11 samples, 0.07%) jatan (592 samples, 3.62%) __perf_event_task_sched_in (4 samples, 0.02%) _ZN13CollectedHeap23allocate_from_tlab_slowE11KlassHandleP6Threadm (7 samples, 0.04%) __perf_event_task_sched_in (8 samples, 0.05%) __perf_event_task_sched_in (40 samples, 0.24%) java/lang/AbstractStringBuilder:.append (120 samples, 0.73%) smp_apic_timer_interrupt (9 samples, 0.06%) __fdget_pos (8 samples, 0.05%) activate_task (4 samples, 0.02%) system_call_fastpath (12 samples, 0.07%) system_call_fastpath (17 samples, 0.10%) futex_wait_queue_me (43 samples, 0.26%) copy_user_enhanced_fast_string (9 samples, 0.06%) do_huge_pmd_anonymous_page (15 samples, 0.09%) __fget_light (4 samples, 0.02%) _ZN9StealTask5do_itEP13GCTaskManagerj (11 samples, 0.07%) system_call_fastpath (4 samples, 0.02%) perf_pmu_enable (4 samples, 0.02%) apic_timer_interrupt (7 samples, 0.04%) perf_pmu_enable (12 samples, 0.07%) finish_task_switch (4 samples, 0.02%) _new_array_Java (5 samples, 0.03%) __perf_event_task_sched_in (12 samples, 0.07%) __do_page_fault (4 samples, 0.02%) rw_verify_area (19 samples, 0.12%) __perf_event_task_sched_in (4 samples, 0.02%) enqueue_task (10 samples, 0.06%) _ZN13ObjectMonitor5enterEP6Thread (140 samples, 0.86%) pthread_cond_wait@@GLIBC_2.3.2 (12 samples, 0.07%) _ZN10JavaThread44check_safepoint_and_suspend_for_native_transEPS_ (4 samples, 0.02%) sha_transform (164 samples, 1.00%) futex_wait (4 samples, 0.02%) _ZN9CodeCache9find_blobEPv (8 samples, 0.05%) futex_wait (4 samples, 0.02%) sys_futex (55 samples, 0.34%) __schedule (4 samples, 0.02%) _ZNK8CodeHeap10find_startEPv (7 samples, 0.04%) futex_wait_queue_me (4 samples, 0.02%) __perf_event_task_sched_in (60 samples, 0.37%) __schedule (8 samples, 0.05%) pthread_cond_timedwait@@GLIBC_2.3.2 (5 samples, 0.03%) sys_futex (4 samples, 0.02%) _ZN8VMThread7executeEP12VM_Operation (5 samples, 0.03%) java/math/BigDecimal:.subtract (14 samples, 0.09%) native_write_msr_safe (60 samples, 0.37%) wake_up_process (12 samples, 0.07%) __perf_event_task_sched_in (4 samples, 0.02%) start_thread (15,819 samples, 96.76%) start_thread smp_call_function_single_interrupt (4 samples, 0.02%) pthread_cond_wait@@GLIBC_2.3.2 (55 samples, 0.34%) java/io/FileInputStream:.readBytes (446 samples, 2.73%) perf_pmu_enable (12 samples, 0.07%) jlong_disjoint_arraycopy (11 samples, 0.07%) local_apic_timer_interrupt (23 samples, 0.14%) __fget (8 samples, 0.05%) jni_GetArrayLength (5 samples, 0.03%) _ZN13SharedRuntime4dpowEdd (4 samples, 0.02%) __perf_event_task_sched_in (52 samples, 0.32%) __perf_event_task_sched_in (8 samples, 0.05%) enqueue_task_fair (6 samples, 0.04%) native_write_msr_safe (8 samples, 0.05%) x86_pmu_enable (4 samples, 0.02%) perf_event_context_sched_in (16 samples, 0.10%) sys_futex (5 samples, 0.03%) system_call_fastpath (72 samples, 0.44%) schedule (54 samples, 0.33%) _ZNK8CodeHeap10find_startEPv (6 samples, 0.04%) finish_task_switch (4 samples, 0.02%) __perf_event_task_sched_in (4 samples, 0.02%) x86_pmu_enable (12 samples, 0.07%) activate_task (10 samples, 0.06%) _ZN22ParallelTaskTerminator17offer_terminationEP20TerminatorTerminator (5 samples, 0.03%) jbyte_disjoint_arraycopy (10 samples, 0.06%) perf_event_context_sched_in (4 samples, 0.02%) account (6 samples, 0.04%) __perf_event_task_sched_in (41 samples, 0.25%) finish_task_switch (52 samples, 0.32%) _ZN11OptoRuntime14new_instance_CEP5KlassP10JavaThread (12 samples, 0.07%) jni_SetByteArrayRegion (4 samples, 0.02%) _ZN7nmethod21get_deopt_original_pcEPK5frame (11 samples, 0.07%) intel_pmu_enable_all (4 samples, 0.02%) _ZN20SafepointSynchronize5blockEP10JavaThread (4 samples, 0.02%) sun/security/provider/ByteArrayAccess:.b2iBig64 (6 samples, 0.04%) memset (8 samples, 0.05%) __perf_event_task_sched_in (8 samples, 0.05%) _ZN5frame16oops_do_internalEP10OopClosureP10CLDClosureP15CodeBlobClosureP11RegisterMapb (5 samples, 0.03%) _ZN10PSScavenge6invokeEv (8 samples, 0.05%) java/math/BigDecimal:.init (10 samples, 0.06%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) hrtimer_interrupt (4 samples, 0.02%) run_timer_softirq (5 samples, 0.03%) apic_timer_interrupt (11 samples, 0.07%) activate_task (5 samples, 0.03%) schedule (4 samples, 0.02%) _ZL10java_startP6Thread (15,819 samples, 96.76%) _ZL10java_startP6Thread extract_buf (297 samples, 1.82%) page_fault (15 samples, 0.09%) perf_pmu_enable (4 samples, 0.02%) extract_entropy_user (316 samples, 1.93%) sys_read (12 samples, 0.07%) _ZN7Monitor4waitEblb (58 samples, 0.35%) finish_task_switch (4 samples, 0.02%) smp_apic_timer_interrupt (19 samples, 0.12%) _ZN9OopMapSet19update_register_mapEPK5frameP11RegisterMap (21 samples, 0.13%) futex_wait (9 samples, 0.06%) finish_task_switch (4 samples, 0.02%) pthread_cond_wait@@GLIBC_2.3.2 (5 samples, 0.03%) pthread_cond_wait@@GLIBC_2.3.2 (4 samples, 0.02%) _ZN7Monitor28lock_without_safepoint_checkEv (4 samples, 0.02%) group_sched_in (4 samples, 0.02%) finish_task_switch (8 samples, 0.05%) _ZN15ThreadRootsTask5do_itEP13GCTaskManagerj (8 samples, 0.05%) _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_ (5 samples, 0.03%) system_call_fastpath (5 samples, 0.03%) urandom_read (329 samples, 2.01%) __perf_event_task_sched_in (4 samples, 0.02%) _ZN20SafepointSynchronize5blockEP10JavaThread (4 samples, 0.02%) __mix_pool_bytes (59 samples, 0.36%) intel_pmu_enable_all (4 samples, 0.02%) finish_task_switch (4 samples, 0.02%) _ZN7Monitor5IWaitEP6Threadl (5 samples, 0.03%) __schedule (10 samples, 0.06%) x86_pmu_enable (41 samples, 0.25%) jshort_disjoint_arraycopy (15 samples, 0.09%) native_write_msr_safe (12 samples, 0.07%) _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread (15 samples, 0.09%) retint_careful (10 samples, 0.06%) finish_task_switch (60 samples, 0.37%) rwsem_down_write_failed (4 samples, 0.02%) finish_task_switch (8 samples, 0.05%) futex_wait_queue_me (64 samples, 0.39%) _ZN7Monitor4waitEblb (4 samples, 0.02%) x86_pmu_enable (4 samples, 0.02%) finish_task_switch (4 samples, 0.02%) system_call_fastpath (4 samples, 0.02%) schedule_user (8 samples, 0.05%) call_function_single_interrupt (4 samples, 0.02%) native_write_msr_safe (8 samples, 0.05%) perf_pmu_enable (8 samples, 0.05%) intel_pmu_enable_all (8 samples, 0.05%) wake_up_state (27 samples, 0.17%) system_call_after_swapgs (5 samples, 0.03%) _ZN13SharedRuntime28complete_monitor_unlocking_CEP7oopDescP9BasicLock (38 samples, 0.23%) __schedule (17 samples, 0.10%) __schedule (4 samples, 0.02%) intel_pmu_enable_all (12 samples, 0.07%)

Summary


When we compare the Flame Graphs, we can see the importance of Java Mixed Mode Flame Graphs. Since there are system code paths in addition to Java methods, we can clearly see all CPU consuming functions.  For example, in above Mixed Mode Flame Graph, we can see HashingWorker threads also consume more CPU. With Flame Graph generated with JFR, we see that MathWorker threads consume more CPU. Please see the flame-graphs in my GitHub repo.

With perf_events, we can do system profiling. The Java "-XX:+PreserveFramePointer" JVM argument and perf-map-agent help to create the Java Symbol Table for perf. The CPU Mixed-Mode Flame graphs can nicely show all CPU consumers in one visualization.