Visualising JVM Metrics Using Prometheus and Grafana

Visualizing Java Virtual Machine (JVM) metrics is essential for keeping Java applications healthy and performant. In production, issues such as rising memory usage, long garbage collection pauses, growing thread counts or increasing response times can escalate very quickly. Catching these problems within seconds instead of hours can make the difference between a minor issue and a full outage.

This article is a step-by-step guide to monitoring JVM metrics with Prometheus and Grafana in an enterprise setting. With Prometheus storing the data and Grafana rendering the graphs, we get one of the most popular stacks for monitoring Java applications.

We will cover these topics:

  • How to expose JVM metrics from a Spring Boot or plain Java application
  • How to configure Prometheus to scrape and store metrics over time
  • How to build Grafana dashboards to see what is happening in real time
  • How to write PromQL queries to monitor the JVM
  • How to use these metrics to diagnose and fix problems

By the end, we will have a system that monitors the JVM and reports on memory usage, garbage collection, thread activity, CPU usage and HTTP request performance. This will help us find and fix problems faster, and act on JVM metrics with more confidence.

Why Use Prometheus and Grafana for Visualizing JVM Metrics?

Before diving into the implementation, it is worth understanding why this particular pairing has become so important in the enterprise Java ecosystem.

The Pull-Based Model for JVM Monitoring

Prometheus is a time series database: metrics are stored along with their timestamps, and queries select data over time ranges. Prometheus uses a pull model. Instead of our application pushing metrics to a central server, Prometheus scrapes an HTTP endpoint that our application exposes. This architecture has two important advantages. First, our Java application does not need to do the extra work of pushing or writing metrics to an external system. Second, our application does not need to know where Prometheus lives, since scrape targets and intervals are centrally controlled: we tell the Prometheus server what host and port our applications run on.

Dimensional Data and PromQL for JVM Metrics

Every metric in Prometheus is identified by a name and a set of key-value pairs called labels. These labels let us filter, slice and aggregate data with PromQL, Prometheus' powerful query language.
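
For example, a JVM heap metric carries labels such as area and id, which we can filter on or aggregate over (the pool name used here is illustrative and depends on your garbage collector):

```
# used bytes for one specific memory pool
jvm_memory_used_bytes{area="heap", id="G1 Eden Space"}
# total heap used, summed across all pools
sum(jvm_memory_used_bytes{area="heap"})
```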

Grafana UI as the Visualization Layer 

Grafana connects to Prometheus as a data source and provides rich, interactive, highly customizable dashboards. It supports variable templating, alerting, and a plugin ecosystem that extends its capabilities far beyond raw charting. Grafana traditionally runs as a separate component on its own server, but in this article we will use Docker to set up the entire stack locally.

Exposing JVM Metrics from a Spring Boot Application

Spring Boot Actuator works using a library called Micrometer under the hood. It exposes JVM and application metrics such as CPU usage, heap usage, thread information, server disk space, application metadata, log levels, Spring bean information, Tomcat session information, open process files and more.
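
Under the hood, these values come largely from the JDK's standard java.lang.management MXBeans, which Micrometer reads for us. As a minimal, dependency-free sketch (the class name JvmMetricsPeek is our own, not part of any framework), the same raw data can be read directly:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class JvmMetricsPeek {
    public static void main(String[] args) {
        // Heap usage, the same source Micrometer's memory metrics draw from
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap used (bytes): " + memory.getHeapMemoryUsage().getUsed());

        // Live thread count, as exposed by the thread metrics
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + threads.getThreadCount());

        // Per-collector GC counts and accumulated pause time
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Micrometer adds naming conventions, tags and non-negative deltas on top of these beans, but this is essentially the raw data it publishes.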

Micrometer is a metrics facade for JVM-based applications. Spring Boot's actuator module auto-configures Micrometer, and when we add the Prometheus registry dependency to our pom.xml, Spring Boot automatically exposes an /actuator/prometheus endpoint. Prometheus then scrapes this endpoint at a configurable interval.

We simply need to add the following dependencies to our pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

And these properties to our properties file:

management.endpoints.web.exposure.include=*
management.endpoint.prometheus.enabled=true

With these two steps, Spring Boot automatically exposes a wide range of JVM metrics, including heap usage, thread states, memory pool information, garbage collection pause duration and frequency, class loading statistics and more.
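
Hitting /actuator/prometheus on the running application returns these metrics in Prometheus' text exposition format. A trimmed, illustrative excerpt (the values and pool names below are examples, not real output from any particular application):

```
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 5.4632448E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 4.1943040E7
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads 23.0
```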

Setting Up Prometheus and Grafana for Visualizing JVM Metrics

We will run Prometheus and Grafana locally with Docker Compose, using the following docker-compose.yml:

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
volumes:
  prometheus_data:
  grafana_data:

Running docker compose up -d in the same directory starts both containers: Prometheus becomes available on localhost:9090 and Grafana on localhost:3000.

Configuring Prometheus for JVM Metrics Scraping

The Prometheus configuration file is typically called prometheus.yml, although any name can be used. It declaratively tells the Prometheus server what to monitor, how often to scrape, which endpoints to scrape metrics from and where to direct alerts if certain rules are violated. A sample Prometheus configuration looks like this:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'account-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['x.x.x.x:8080']
    # More services can be added here

Let’s break down each section:

scrape_interval

These settings in the global section apply across the entire configuration unless overridden per job. A scrape_interval of 15s means Prometheus will poll each target every 15 seconds to collect metrics.

evaluation_interval

This controls how often the Prometheus server evaluates its alerting rules to decide whether any alerts should fire. Both intervals are set to 15 seconds here, which is a sensible default for most Java applications.

alertmanagers

This is an optional section of the config file. It tells Prometheus where to send alerts when a preconfigured alert rule fires. In the example above, it points to an Alertmanager service running on port 9093 of the same machine. Alertmanager is a separate, optional component of this stack and must be installed separately; it routes alerts to configured destinations such as Slack or email. If we are not setting up alerts yet, we can leave this section out entirely.
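
To make this concrete, here is a hedged sketch of what an alerting rule could look like. It assumes a rule file (we call it alert_rules.yml here, a name of our choosing, referenced from prometheus.yml via a rule_files: section) and uses the heap metrics covered later in this article:

```yaml
groups:
  - name: jvm-alerts
    rules:
      - alert: HighHeapUsage
        # fires when used heap stays above 90% of max heap for 5 minutes
        expr: sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"}) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage above 90%"
```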

Building a Grafana Dashboard for Visualizing JVM Metrics

Once our metrics are flowing into Prometheus, Grafana transforms raw time-series data into actionable dashboards. Next we connect Grafana to Prometheus. Grafana starts up on port 3000 by default. After signing in, go to Configuration, then Data Sources, where you will see the option to add a data source. Select Prometheus, enter the Prometheus server details (the URL, or simply the host and port, of your installation) and click Save & Test to verify the connection.

JVM Overview Dashboard Layout

A well-structured Java monitoring dashboard typically follows a layered approach, and the dashboard shown here is highly customisable.

Fig: Open Source dashboard for visualising JVM metrics (Dashboard ID : 4701)

Use the pre-built Grafana dashboard library for Java metrics. Grafana Labs hosts free-to-use, community-contributed dashboards at grafana.com/grafana/dashboards. For visualizing JVM metrics in Java/Spring Boot projects, several pre-built community dashboards can be imported via the dashboard ID shown for each dashboard, or from its JSON export. We can also create our own dashboard for custom metrics and needs; that, however, is outside the scope of this article.

PromQL Queries for Visualizing JVM Metrics

JVM Memory Queries

# gives the total heap used in MB (summed across memory pools)
sum(jvm_memory_used_bytes{area="heap"}) / 1024 / 1024
# gives the heap usage percentage
( sum(jvm_memory_used_bytes{area="heap"})
/ sum(jvm_memory_max_bytes{area="heap"}) ) * 100

Thread and CPU Metrics

# gives the thread count by state 
jvm_threads_states_threads
# gives live thread count 
jvm_threads_live_threads
# returns the JVM process CPU usage
process_cpu_usage
# returns the system CPU usage
system_cpu_usage

HTTP Request Metrics (Spring Boot)

# requests per second (RPS)
rate(http_server_requests_seconds_count[5m])
# client and server API errors (4xx and 5xx)
rate(http_server_requests_seconds_count{status=~"4..|5.."}[5m])
# 95th percentile latency by request endpoint
histogram_quantile(0.95,
  sum(rate(http_server_requests_seconds_bucket[5m])) by (le, uri)
)
# gives throughput by HTTP status code
sum(rate(http_server_requests_seconds_count[5m])) by (status)

Garbage Collection Queries

# returns GC pause time rate
rate(jvm_gc_pause_seconds_sum[5m])
# GC pause time rate broken down by cause
sum by (cause) (rate(jvm_gc_pause_seconds_sum[5m]))
# GC collections per second
rate(jvm_gc_pause_seconds_count[5m])
# mean GC pause duration in millis
rate(jvm_gc_pause_seconds_sum[5m]) / rate(jvm_gc_pause_seconds_count[5m]) * 1000

Extending Beyond Visualizing JVM Metrics: Deep Diagnostics

While open source tools like Prometheus and Grafana are excellent for visualising JVM metrics and detecting anomalies such as increased GC pauses, rising heap usage or excessive thread consumption, they primarily help us observe symptoms.

To move from observation to root cause analysis, engineers often rely on specialized diagnostic tools. For example, platforms like yCrash can analyze thread dumps using fastThread and heap dumps using HeapHero, surfacing issues such as memory leaks, blocked threads or excessive garbage collection pauses. Combining monitoring with deep diagnostics enables faster and more reliable resolution of production issues.

Conclusion: Building a Production-Grade JVM Observability Stack

The Prometheus and Grafana stack provides one of the most complete and accessible observability solutions available for Java applications. With Micrometer handling the instrumentation complexity on the Java side, Prometheus managing metric collection and storage, and Grafana turning raw numbers into meaningful visual narratives, we gain genuine visibility into what our JVM and application are doing at all times. The auto-configured JVM metrics from Spring Boot Actuator are immediately valuable out of the box. From there, we can add the application-specific business metrics we need most, build dashboards that our operations team will actually look at, and configure alerts that fire at the right thresholds with the right context. Observability is not a feature we add before release; it is a foundation we invest in continuously. The more instrumentation we add, the faster we move from “something is wrong” to “here is exactly what is wrong and why.”
