Hello Java Devs,
In this article, we will discuss how to optimize and tune 3rd party code, including open- and closed-source libraries, 3rd party frameworks, and APIs.
Using 3rd Party Software
Let’s first answer the question: why do we need others’ code?
The reasons include:
- Don’t reinvent the wheel: use well-tested, mature code wherever possible;
- Save time and cost;
- Your own code may not cover every need, so you rely on other libraries;
- To make your software stack reliable, build on time-tested libraries and tools;
- Keep your own codebase smaller;
- Enhance the user experience.
So rarely do we come across software that is totally written from scratch and performs all functions it needs on its own.
3rd party software can come in many flavors and licenses, for example:
- Open source – software that is developed and maintained through open collaboration, and is available for anyone to use, study, change, and redistribute
- Closed source/proprietary – the source code is not shared, and the license may not allow us to modify the code or create derivative works.
- APIs – you can call their functions to accomplish tasks, but the source code might not be available.
Strategies to optimize such software differ based on whether we can modify its code, and how tightly it is locked down.
Overcoming performance challenges in software libraries and frameworks
Factors Affecting Performance
There can be many, but common factors are inefficient algorithms, excessive memory usage, slow I/O operations, poor threading, and API inefficiencies.
Performance Metrics
We measure performance via key metrics like response time, throughput, CPU usage, memory footprint, and network latency.
Types of Performance Solutions
Some mature practices to improve software performance are:
- Optimization – tuning code to suit your needs;
- Caching – reducing access to slower storage layers to speed up data retrieval;
- Refactoring – improving the quality of code by bug fixing, removing redundancy, and making it more readable, structured, and maintainable;
- Scaling – adding more computing resources when demand increases: distributing the load across multiple servers, delegating tasks to other software, continuous monitoring, etc.
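To make the caching idea above concrete, here is a minimal sketch of an in-memory LRU cache built on the JDK's LinkedHashMap; the class name and capacity are illustrative, not from any particular library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: once capacity is exceeded, the
// least-recently-accessed entry is evicted automatically.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = access-order iteration
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Placing such a cache in front of a slow storage layer (disk, database, network) turns repeated reads into in-memory lookups.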
The broader impact of performance improvement is a better user experience, increased efficiency, and higher profits.
We encourage developers to proactively monitor, test, and optimize for long-term performance.
Case Study – Optimizing an Open Source Library
Let’s take Apache PDFBox as an example.
We chose PDFBox because PDF processing and digital signatures in PDFs are now widespread. Typical uses include online payments, data analytics, and academic research: many data science and analytics reports live in PDF files, and most contracts, agreements, and legal procedures are now handled online.
Hence the need to create, read, process, sign and reproduce PDF files.
When we first run our PDFBox-based interactive digital signature program, it loads a large PDF (legal agreements generally run to many pages), collects signatures and some additional information from the user, and then creates a signature dictionary within the PDF to store the signatures.
Finally it creates a whole new document with the original content + signature(s) + purpose of signature + signatories’ details.
PDFs are complex documents that contain images, text, tables, and dictionaries, as well as formatting and ordering data. When PDFBox loads PDFs, the initial memory footprint is high because of on-demand parsing. If we load many (or heavy) PDFs, the memory footprint of our Java process can grow rapidly, sometimes consuming all available memory.
In the report below, you can see that a lot of memory is used by old-generation objects, since they must be held in memory for a longer time. The report was produced by GCeasy, which lets us quickly analyze garbage collection logs.
Fig: Analysis of Memory Usage
Note the average and max GC pause times in the image below, and the fact that GC runs twice to free memory for the Java process.
This further decreases throughput, as CPU cycles are spent on garbage collection rather than on running the application.
Fig: Key Performance Indicators
Your results may be slightly different, but they are likely to follow the same pattern if your computer has limited RAM.
Now let’s try and improve the situation.
Below are some steps you can implement to optimize an open source library, and in fact software in general.
1. Use software-specific settings: Developers of open source software are mindful that their code may be used in many different scenarios with varying computing resources. Hence they themselves provide switches/settings/properties that can be used to fine-tune the API's behavior.
PDFBox provides the org.apache.pdfbox.io.MemoryUsageSetting class to let you specify whether, and to what extent, main memory or temporary files should be used for buffering streams.
If you do have a lot of memory and want to speed things up, you can use setupMainMemoryOnly() or setupMainMemoryOnly(long maxMainMemoryBytes) to buffer only in main memory (no temporary files), with unrestricted or restricted size.
But if your digital signature routines are just a small part of a larger Java application whose other components also need a lot of memory, you can direct PDFBox to buffer through temporary files instead:
public static MemoryUsageSetting setupTempFileOnly(long maxStorageBytes)
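Putting these settings together, a sketch of loading a document with temp-file buffering might look like this (PDFBox 2.x API; the file name and the 256 MB cap are illustrative):

```java
import java.io.File;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

// Buffer through temporary files only, capped at 256 MB (illustrative limit),
// leaving heap memory free for the rest of the application.
MemoryUsageSetting mus = MemoryUsageSetting.setupTempFileOnly(256L * 1024 * 1024);
try (PDDocument doc = PDDocument.load(new File("agreement.pdf"), mus)) {
    // ... sign, process, and save incrementally ...
}
```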
2. Use smarter ways to read/write data, process information and format the output.
For example, instead of reading a file as a byte array, i.e.
byte[] pdfByte = inFile.getBytes();
PDDocument pdfDoc = PDDocument.load(new ByteArrayInputStream(pdfByte));
create the input as a java.io.File and load it directly with PDDocument.load(inFile).
This avoids an unnecessary byte[] copy, removes a step from your algorithm, and is a little lighter on memory. Each such change is small on its own, but as we keep shaving off milliseconds with the steps elaborated below, the gains in speed and memory footprint add up substantially.
Next, dig deeper and get to know how your open-source software is structured and how control flows inside it.
3. Replace java.io with the newer and faster java.nio
(A full implementation is not given here, as that is not the topic of this discussion.)
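As a minimal, self-contained illustration of the java.nio style (the class and method names here are our own, not PDFBox's), reading a whole file through a FileChannel into a ByteBuffer looks like this:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class NioFileReader {
    // Reads the entire file into a ByteBuffer via a FileChannel,
    // then flips the buffer so it is ready for reading.
    static ByteBuffer readAll(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or EOF
            }
            buffer.flip(); // switch from writing into the buffer to reading from it
            return buffer;
        }
    }
}
```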
Moreover, when dealing with PDF signatures, the processing becomes multi-step, as only one signature at a time may be added to a document. To sign several times, we must each time load the document, add a signature, save the incremental file, and close it again.
Hence the need to make the PDF reads faster.
The explanation below shows how we can do that.
PDFs are represented by objects of the PDDocument class.
Each PDDocument is constructed from an object that implements the RandomAccessRead interface.
RandomAccessBuffer is a concrete class implementing that interface (newer PDFBox versions name it RandomAccessReadBuffer); from a given InputStream you can create such a buffer, and rewriting it to use NIO gives faster reads.
RandomAccessBuffer uses a simple byte array to read the PDF contents, i.e.
private byte[] currentBuffer;
You can rewrite a bit of code here to use java.nio instead. An NIO-backed buffer is faster and more efficient, especially when the PDF files are huge in size or large in number.
e.g.
import java.nio.ByteBuffer;
protected ByteBuffer currentBuffer; // define your buffer
Most of the code will run as it is, but use nio.ByteBuffer instead of the plain old byte[].
@Override
public int read() throws IOException
{
checkClosed();
if (pointer >= this.size)
{
return -1;
}
if (currentBufferPointer >= chunkSize)
{
if (bufferListIndex >= bufferListMaxIndex)
{
return -1;
}
else
{
currentBuffer = bufferList.get(++bufferListIndex);
currentBufferPointer = 0;
}
}
pointer++;
return currentBuffer.get(currentBufferPointer++) & 0xff;
}
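The "& 0xff" mask in the last line matters: ByteBuffer.get() returns a signed byte, while a read() method must return an int in the range 0–255 (with -1 reserved for end of stream). A tiny self-contained check (the class name is ours):

```java
import java.nio.ByteBuffer;

class UnsignedReadDemo {
    // Absolute get() returns a signed byte; masking with 0xff
    // converts it to the 0..255 range that read() must return.
    static int readUnsigned(ByteBuffer buffer, int index) {
        return buffer.get(index) & 0xff;
    }
}
```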
Also, since we’re now using NIO, you should replace the calls to System.arraycopy.
REPLACE:
private int readRemainingBytes(byte[] b, int offset, int length) throws IOException
{
if (pointer >= size)
{
return 0;
}
int maxLength = (int) Math.min (length, size-pointer);
int remainingBytes = chunkSize - currentBufferPointer;
// no more bytes left
if (remainingBytes == 0)
{
return 0;
}
if (maxLength >= remainingBytes)
{
// copy the remaining bytes from the current buffer
System.arraycopy(currentBuffer, currentBufferPointer, b, offset, remainingBytes);
// end of file reached
currentBufferPointer += remainingBytes;
pointer += remainingBytes;
return remainingBytes;
}
else
{ // copy the remaining bytes from the whole buffer
System.arraycopy(currentBuffer, currentBufferPointer, b, offset, maxLength);
// end of file reached
currentBufferPointer += maxLength;
pointer += maxLength;
return maxLength;
}
}
WITH:
private int readRemainingBytes(byte[] b, int offset, int length)
{
if (pointer >= size)
{
// keep the original contract: 0 bytes read past the end
return 0;
}
int maxLength = (int) Math.min(length, size - pointer);
int remainingBytes = chunkSize - currentBufferPointer;
// no more bytes left
if (remainingBytes == 0)
{
return 0;
}
if (maxLength >= remainingBytes)
{
// copy the remaining bytes from the current buffer
currentBuffer.position(currentBufferPointer);
currentBuffer.get(b, offset, remainingBytes);
// end of file reached
currentBufferPointer += remainingBytes;
pointer += remainingBytes;
return remainingBytes;
}
else
{
// copy the remaining bytes from the whole buffer
currentBuffer.position(currentBufferPointer);
currentBuffer.get(b, offset, maxLength);
currentBufferPointer += maxLength;
pointer += maxLength;
return maxLength;
}
}
Let’s look at the reasoning behind this. java.nio.ByteBuffer and the other buffer types (CharBuffer, IntBuffer, etc.) provide native support for bulk data operations through their put() and get() methods, which are optimized for non-blocking I/O and work directly with channels.
System.arraycopy cannot work directly with buffers or channels, so using it requires converting between arrays and buffers, which adds overhead.
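A small self-contained sketch of the buffer-to-array copy pattern (the class and method names are illustrative): position() sets where the next relative read starts, and the bulk get() transfers the bytes in one call, mirroring what System.arraycopy does for plain arrays.

```java
import java.nio.ByteBuffer;

class BufferCopyDemo {
    // Copies `length` bytes from `src`, starting at `srcPos`,
    // into `dst` at `dstPos` -- the ByteBuffer equivalent of
    // System.arraycopy(srcArray, srcPos, dst, dstPos, length).
    static void copy(ByteBuffer src, int srcPos, byte[] dst, int dstPos, int length) {
        src.position(srcPos);          // where the bulk read begins
        src.get(dst, dstPos, length);  // single bulk transfer
    }
}
```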
Note: if implementing the BufferedFileReader, we would also need to
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
Similarly, we can change the classes used to “write” the PDFs.
4. If using many PDF files together, or a few heavy PDF files, then instead of bulk copies through heap arrays, use direct (off-heap) buffers.
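A direct ByteBuffer is allocated outside the Java heap, so the operating system can move data into and out of it without an extra on-heap copy; for large or numerous PDFs this can reduce both GC pressure and copy overhead. A minimal sketch (the names are ours):

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Wraps the given bytes in a direct (off-heap) buffer.
    // The API is identical to a heap buffer; only the
    // allocation call differs.
    static ByteBuffer toDirect(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
        buffer.put(data);
        buffer.flip(); // prepare for reading
        return buffer;
    }
}
```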
Now, when we run the same program again, we get the results shown in the report below: GC runs just once and for a smaller fraction of time, leaving more CPU cycles for the actual application.
Fig: Memory Allocation After Suggested Changes
You can see above that old-generation objects take up less memory, while young-generation objects take more, as the task completes.
This indicates that most short-lived objects are being cleaned up efficiently in the young generation, minimizing the need for slower full garbage collections in the old generation.
As shown below, the throughput increases to 99.257% (from 99.165%).
Fig: Garbage Collection Statistics
GC runs only once and for a shorter interval. The average and max GC pause time decrease significantly.
These might seem like small improvements, but when you process multiple PDFs simultaneously, each at a different stage of signing, while other programs also consume memory, they have a significant effect on application response times and throughput.
GCeasy makes tracking these changes easy, providing fine-grained details on the ratio of memory used to total memory allocated, the milliseconds spent in GC, and the throughput. It also lets us drill down into finer details to evaluate our solutions.
Conclusion
We have discussed the importance of performance tuning, its various facets, and how to manage them. We have also discussed how to improve the performance of different software types in your application stack.
Always look first for the settings/switches/properties provided by the original developers to fine-tune software to your needs. If those do not suffice, turn to refactoring.
It is also a good idea to use powerful, versatile automated tools such as GCeasy for performance monitoring, especially when they employ modern machine learning algorithms and provide actionable suggestions.
Happy Coding!

Share your Thoughts!