String deduplication is a JVM memory optimization in which String objects that hold the same characters are made to share a single underlying character array. It is closely related to, but not the same as, string interning (also called string pooling), where duplicate strings are replaced with references to one shared String instance. Both techniques conserve memory by reducing the number of identical copies of string data kept on the heap.
Instead of allocating separate memory for each identical string, only one copy of the character data is stored. Subsequent occurrences of the same string refer to this shared copy, which reduces memory usage.
How to apply String Deduplication programmatically?
A similar effect can be achieved programmatically using string references. The Java API provides the String.intern() method, which returns the canonical instance of a string from the string pool and adds the string to the pool if it is not already there. Once a string is interned, string literals and later intern() calls with the same contents return the pooled instance instead of a new one. This helps conserve memory when many equal strings are kept alive.
String fruit = new String("apple");
String anotherFruit = "apple";
Here anotherFruit refers to a string literal, which is interned automatically: "apple" is taken from the string pool. In contrast, fruit refers to a separate object created with new.
We can also intern a String explicitly by calling the intern() method.
String oneMoreFruit = fruit.intern();
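Now fruit, anotherFruit and oneMoreFruit all have equal contents, so equals() returns true for any pair. Reference comparison is different: anotherFruit == oneMoreFruit prints true because both point to the pooled instance, while fruit == anotherFruit prints false because fruit still refers to the separate object created with new.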
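The three variables can be put into a small runnable sketch to see how reference equality (==) and value equality (equals) behave around intern(). The class name InternDemo and the helper sameObject are hypothetical names used only for this illustration.

```java
// Minimal sketch: reference equality versus value equality around intern().
final class InternDemo {

    // Hypothetical helper: true only when both variables point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String fruit = new String("apple");   // explicit new object on the heap
        String anotherFruit = "apple";        // literal, taken from the string pool
        String oneMoreFruit = fruit.intern(); // pooled instance with the same contents

        System.out.println(sameObject(fruit, anotherFruit));        // false
        System.out.println(sameObject(anotherFruit, oneMoreFruit)); // true
        System.out.println(fruit.equals(anotherFruit));             // true
    }
}
```

Only the literal and the interned reference share one object; the string created with new stays a separate instance until its reference is replaced with the result of intern().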
Garbage Collection and String Deduplication
Is there any relation between GC and String deduplication? Yes. Deduplication is carried out by the garbage collector: candidate strings are identified while a collection runs, and the details of the deduplication work are reported alongside the GC events. Other diagnostic artifacts, such as heap dumps and thread dumps, do not carry this information. String deduplication is therefore tied to GC activity and is observed through the GC log.
String Deduplication in Java 8
In Java 8 (update 20 and later), string deduplication is a feature of the G1 garbage collector: String objects with identical contents are made to share one backing character array. It is separate from the string pool used for literals and intern(). In Java 8, string deduplication is not enabled by default and needs to be explicitly activated, together with G1, using the JVM flags -XX:+UseG1GC -XX:+UseStringDeduplication
String Deduplication Beyond Java 8
From Java 9 onwards, G1 is the default garbage collector, so there is no need to select it manually. String deduplication itself is still disabled by default, however, so enabling it still requires the -XX:+UseStringDeduplication argument.
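It is important to consider that deduplication is not free: inspecting and hashing candidate strings costs CPU time, so on workloads with few duplicate strings it can add overhead to the garbage collection process without saving much memory.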
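To confirm at runtime whether the flag is actually on, the value can be read through the HotSpot diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JVM; the class name DedupFlagCheck and the helper flagValue are hypothetical, and the helper falls back to "unknown" when the JVM does not expose the option.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: read the current value of a HotSpot flag from inside the JVM.
final class DedupFlagCheck {

    // Returns the flag value as text, or "unknown" if this JVM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown option, or a JVM without the HotSpot diagnostic bean.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = " + flagValue("UseStringDeduplication"));
    }
}
```

Running the class once with and once without -XX:+UseStringDeduplication should show the value change, which is a quick way to check what a given JVM version does by default.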
String Deduplication Log
So far, we have talked about the intricacies of String deduplication. But what does this mean for a developer? Where can the results of String deduplication be observed? Below is the sample image of a String deduplication process when a garbage collection event is run.
In the image above, the entry labeled “String Deduplication” provides information about the deduplication process. The log captures the total size and number of deduplicated strings processed at any given time. This process occurs during the garbage collection (GC) pause.
However, the image above shows no information about deduplication, as the values are zero. To provide a clearer example, let’s examine a real scenario.
The image below illustrates a real-time string deduplication scenario, where 3,243 strings have been deduplicated, resulting in a total size of 137.8 KB.
At a specific point in time, the JVM inspected 18,299 strings. Of these, 9,502 strings had survived previous garbage collections, based on the StringDeduplicationAgeThreshold value, and 8,797 were new strings. The deduplication process deduplicated 3,243 strings, which is 17.7% of the 18,299 strings inspected.
How can we enable String Deduplication data?
We need to pass a couple of arguments to the JVM. First, pass -XX:+UseStringDeduplication. This turns string deduplication on; without it, there are no deduplication events to capture in the logs.
Next, to write the deduplication log statements to a file, pass the following argument to the JVM.
-Xlog:stringdedup*=debug:file=string-dedup-gc.log
Let us look at this argument in a little more detail. It uses the unified logging syntax, so it applies only to Java versions greater than 8. The switch sends all log statements that carry the stringdedup tag, at DEBUG level and above, to a file named string-dedup-gc.log.
Take a look at the above image to know more about it.
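It is helpful to capture GC events in the same log file as the string deduplication logs, so that the two can be read together. Use the following argument, with the same log file name, to capture the GC events. Note that on Java 9 and later -Xloggc is deprecated and is mapped to the equivalent -Xlog:gc option: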
-Xloggc:string-dedup-gc.log
Putting it together, on Java versions greater than 8, use these JVM arguments to get detailed string deduplication and GC logs:
-XX:+UseStringDeduplication -Xlog:stringdedup*=debug:file=string-dedup-gc.log -Xloggc:string-dedup-gc.log
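In Java 8, the unified -Xlog option is not available. Deduplication runs only under G1 there, and its statistics are printed to the GC log only when -XX:+PrintStringDeduplicationStatistics is set, so use the following switch combination to capture the deduplication logs:
-XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics -Xloggc:string-dedup-gc.log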
These settings ensure that you capture comprehensive information about both the string deduplication process and garbage collection events, providing valuable insights for analysis.
Additional String Deduplication Argument
The -XX:StringDeduplicationAgeThreshold parameter specifies how many garbage collection cycles a String object must survive before it becomes a candidate for deduplication. For example, if this value is set to 1, a string is considered for deduplication once it has survived one garbage collection. The default value for this parameter is 3, meaning that only strings that have survived three garbage collection cycles are considered.
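To see the threshold in action, it helps to have a workload that keeps many equal but distinct strings alive across several collections. The following is a minimal sketch under stated assumptions: the class name DedupWorkload and the helper duplicates are hypothetical, System.gc() is only a request that the JVM may ignore, and whether anything is deduplicated depends on the flags the JVM was started with.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: keep many equal-but-distinct strings alive across several GC requests.
final class DedupWorkload {

    // Builds `count` separate String objects that all hold the same characters.
    static List<String> duplicates(String value, int count) {
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // new String(char[]) copies the array, so each string has its own backing data.
            result.add(new String(value.toCharArray()));
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> live = duplicates("apple", 100_000);
        // Request several collections so the strings can age past the threshold.
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(200);
        }
        System.out.println("Still holding " + live.size() + " strings");
    }
}
```

Running this with the deduplication and logging arguments described earlier, and varying -XX:StringDeduplicationAgeThreshold, gives a simple way to compare the deduplication entries in the resulting log.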
Conclusion
We have explored the string deduplication process within the JVM, a useful memory optimization technique. By making identical strings share a single copy of their data, deduplication reduces memory consumption and eases memory pressure, which in turn can lead to more efficient garbage collection and potentially better performance. Because the process itself has a cost, it is worth enabling the deduplication logs described above and checking how many strings are actually deduplicated in your application.
