Chaos Engineering – Storage Saturation

Production environments frequently run short of disk space as logs, database files, and output files grow. Left unmonitored, these files can fill the disk and crash the application. The article shows how to simulate disk saturation with a sample program and covers manual and automated approaches to diagnosing it with monitoring tools.
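
As a rough illustration of the automated side of that diagnosis, the sketch below computes how full the volume behind a directory is and compares it with a threshold. The class name `DiskUsageCheck`, the method names, and the 90% threshold are assumptions made for this example; they are not taken from the article or from any monitoring tool.

```java
import java.io.File;

public class DiskUsageCheck {

    // Percentage of the volume holding `dir` that is already used (0.0 to 100.0).
    // Returns 0.0 when the total size cannot be determined.
    static double usedPercent(File dir) {
        long total = dir.getTotalSpace();
        if (total <= 0) {
            return 0.0;
        }
        long usable = dir.getUsableSpace();
        return 100.0 * (total - usable) / total;
    }

    // True when usage is at or above the given threshold (hypothetical alert rule).
    static boolean isSaturated(File dir, double thresholdPercent) {
        return usedPercent(dir) >= thresholdPercent;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.printf("%s is %.1f%% full%n", dir.getAbsolutePath(), usedPercent(dir));
        if (isSaturated(dir, 90.0)) {
            System.out.println("Warning: disk usage is above 90%");
        }
    }
}
```

A check like this could run on a schedule, although a dedicated monitoring tool would normally do the alerting.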

Chaos Engineering – Network Lag

The article discusses the impact of network delays on application performance and presents a method to simulate network lag using BuggyApp as a proxy server. It provides a sample program to introduce delays in server responses and explains manual and automated approaches to diagnose network issues. The article highlights the importance of addressing network performance problems in chaos engineering.

Java NIO – OutOfMemoryError

Java NIO allows high-performance non-blocking I/O, enhancing concurrency and efficiency. However, running a Spring Boot application with Java 11 led to frequent 'OutOfMemoryError' issues. Upgrading to Java 17 and increasing Direct Buffer Memory allocation improved performance, allowing for significantly more connections before errors occurred. Solutions include optimizing memory size or upgrading Java versions.
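
To see how much direct buffer memory a JVM is holding, one option is the standard `BufferPoolMXBean`. The sketch below is a minimal example under that assumption; the class name `DirectBufferUsage` and the 16 MB allocation are illustrative and are not the article's test setup. On HotSpot JVMs, the ceiling for direct buffers can be set with `-XX:MaxDirectMemorySize`.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {

    // Bytes currently held by the JVM's "direct" buffer pool,
    // or -1 if the JVM does not report such a pool.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Direct buffers live outside the Java heap, so they are not limited by -Xmx.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directMemoryUsed();
        System.out.println("Direct memory before: " + before + " bytes");
        System.out.println("Direct memory after:  " + after + " bytes");
        System.out.println("Buffer capacity:      " + buffer.capacity() + " bytes");
    }
}
```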

Optimizing the Capacity of a HashMap

This article discusses how to optimize memory allocation in HashMaps by understanding the differences between allocation and mapping capacity. It emphasizes the importance of choosing the correct initial capacity, ideally a power of two, to mitigate space overhead and prevent performance degradation. Several formulas for calculating proper capacity are reviewed.
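
One commonly used formula divides the expected number of entries by the load factor and rounds up. The sketch below shows it alongside the power-of-two table size that `java.util.HashMap` rounds a requested capacity up to. It assumes the default load factor of 0.75 and reads "allocation capacity" as the table size and "mapping capacity" as the number of entries that fit before a resize; the class and method names are illustrative, and the article may review different formulas.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapCapacity {

    static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Smallest initial capacity that lets `expectedEntries` mappings
    // fit without triggering a resize at the default load factor.
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / DEFAULT_LOAD_FACTOR);
    }

    // Table size HashMap allocates for a requested capacity: the next
    // power of two that is >= the request (valid for requests up to 2^30).
    static int tableSizeFor(int requestedCapacity) {
        if (requestedCapacity <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requestedCapacity - 1) << 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = initialCapacityFor(expected);
        Map<String, Integer> map = new HashMap<>(capacity);
        map.put("example", 1);
        System.out.println("Requested capacity: " + capacity);
        System.out.println("Allocated table size: " + tableSizeFor(capacity));
    }
}
```

For 100 expected entries this requests a capacity of 134, which HashMap rounds up to a table of 256 slots. That gap between what is needed and what is allocated is the space overhead the summary refers to.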

Actions on OutOfMemoryErrors

OutOfMemoryErrors in applications occur when memory allocation fails. This article discusses effective practices for analyzing such errors, including automatic heap dumps and using JVM parameters. It highlights script usage for automatic actions upon encountering errors and emphasizes the importance of proactive monitoring and automated incident reporting to reduce application downtime.
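
On HotSpot JVMs, automatic heap dumps are usually enabled at startup with `-XX:+HeapDumpOnOutOfMemoryError`, and a script can be attached with `-XX:OnOutOfMemoryError`. The sketch below shows one way to check or enable the heap dump option from inside a running process through `HotSpotDiagnosticMXBean`. It assumes a HotSpot-based JVM, and the class name `OomDiagnostics` is hypothetical; the article itself may rely on startup parameters only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class OomDiagnostics {

    // HotSpot-specific diagnostic bean; not available on every JVM implementation.
    static HotSpotDiagnosticMXBean diagnosticBean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // True if the JVM will write a heap dump when an OutOfMemoryError is thrown.
    static boolean heapDumpOnOomEnabled() {
        String value = diagnosticBean().getVMOption("HeapDumpOnOutOfMemoryError").getValue();
        return Boolean.parseBoolean(value);
    }

    // Turns the option on at runtime; equivalent in effect to
    // passing -XX:+HeapDumpOnOutOfMemoryError at startup.
    static void enableHeapDumpOnOom() {
        diagnosticBean().setVMOption("HeapDumpOnOutOfMemoryError", "true");
    }

    public static void main(String[] args) {
        System.out.println("Heap dump on OOM enabled: " + heapDumpOnOomEnabled());
        enableHeapDumpOnOom();
        System.out.println("Heap dump on OOM enabled: " + heapDumpOnOomEnabled());
    }
}
```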

Chaos Engineering – DB Connection Leak

All modern applications connect to storage, such as a database or cache. Database connection leaks are commonly observed in these applications and can lead to production outages. In our series of chaos engineering articles, we have been learning how to simulate various performance problems. In this post, let's discuss... Continue Reading →

Chaos Engineering – File Connection Leak

Many Java applications still use files for importing and exporting data. If the connections to these files are not properly managed, it can lead to a significant number of connections leaking, causing the application to slow down or even crash. In our series of chaos engineering articles, we have been learning how to simulate various... Continue Reading →
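
For contrast with a leaking program, the sketch below shows the usual way to avoid file handle leaks in Java: opening the file in a try-with-resources block so the handle is closed even when reading fails. The class name `FileHandleSafety` and the line-counting task are assumptions chosen for illustration, not the article's sample program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileHandleSafety {

    // Counts the lines in a file. The reader is declared in the try header,
    // so it is closed automatically on both normal and exceptional exit.
    static long countLines(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java FileHandleSafety <file>");
            return;
        }
        System.out.println(countLines(Paths.get(args[0])) + " lines");
    }
}
```

A leaking variant would open the reader without the try header and never call `close()`, so each call would hold a file descriptor until the garbage collector happened to reclaim it.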

Troubleshooting App unresponsiveness due to Oracle DB

The application slowed down when connecting to Oracle RAC because of resource constraints on the database side, which increased response times. Troubleshooting involved using the yCrash script for artifact collection and analysis, which revealed that 46% of threads were blocked on database calls. The Oracle DBA confirmed the resource issues, and resolving them restored normal application performance.
