Modern infrastructure depends on more than just powerful hardware. It requires the intelligent deployment of load-balancing and caching tools to keep sites fast and reliable and to minimize operational headaches.
At tipi.work, we specialize in designing and implementing high-performance distributed systems. Today, we want to share a compelling case where our expertise turned a system-crippling upgrade into a smooth non-event.
We were called in to address a critical performance issue during an application upgrade for a customer.
The Scenario: the customer runs a Tomcat-based Java Web client-server application. A new server version required hundreds of client agents to download a mandatory update to maintain compatibility. This mass download event—hundreds of simultaneous requests—caused the public-facing Web interface to become inaccessible for hours, even though the server application itself was technically operating.
Our analysis of the traffic quickly revealed the bottleneck. Crucially, the data the agents were downloading was static content (a set of files on the file system), not dynamic content that Tomcat needed to generate.
The legacy file-serving flow was the problem: every agent download went through Tomcat itself, with a worker thread reading the update files from the file system and streaming them back over HTTP. Hundreds of these long-running transfers tied up Tomcat's request-processing resources, leaving it unable to serve the dynamic web interface and causing the system-wide outage.
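To make the failure mode concrete: Tomcat handles every request, static or dynamic, from a bounded worker-thread pool. The connector below is purely illustrative (not the customer's actual configuration), but with a typical cap of 200 worker threads, a few hundred long-running downloads are enough to occupy every thread and starve the Web UI.

```xml
<!-- Illustrative Tomcat server.xml connector, not the customer's actual config.
     maxThreads caps concurrent request processing; long-running update
     downloads hold these threads for the entire transfer, leaving nothing
     for the dynamic Web UI. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000" />
```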
The core question was clear: how do we offload the static content delivery to free up Tomcat?
Our initial thought was to serve the static files outside of Tomcat with an nginx filesystem-based cache. However, after measuring disk I/O on the Linux VM, we decided to pursue a faster, less disk-reliant approach.
The fastest storage available to us was RAM. We confirmed that the VM had enough free memory to hold the update's file set and decided to use a memory-based caching solution.
While nginx integrates well with memcached out of the box, we chose redis for its built-in replication and its ability to dump the dataset to disk for persistence.
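Getting the files into RAM is a one-off step per release: walk the unpacked update directory and store each file in redis under the path nginx will later look up. The sketch below uses the Jedis client and a /updates/ key prefix purely as an illustration; any Redis client and key scheme works the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import redis.clients.jedis.Jedis;

// Sketch only: loads an unpacked update bundle into redis so nginx can serve
// it from RAM. The Jedis client, host/port, and "/updates/" key prefix are
// illustrative assumptions, not the production values.
public class UpdateCacheLoader {
    public static void main(String[] args) throws IOException {
        Path updateDir = Paths.get(args[0]); // directory containing the update files

        try (Jedis redis = new Jedis("127.0.0.1", 6379);
             Stream<Path> files = Files.walk(updateDir)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                // Key matches the URI nginx will receive, e.g. /updates/agent/patch.bin
                String key = "/updates/" + updateDir.relativize(file).toString().replace('\\', '/');
                try {
                    redis.set(key.getBytes(), Files.readAllBytes(file)); // binary-safe value
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```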
The tipi.work stack for the fix:
- nginx as the public-facing entry point, terminating all HTTP traffic;
- redis as an in-memory store holding the static update files, with replication and on-disk dumps for persistence;
- Tomcat behind nginx, now handling only the dynamic Web UI.
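Below is a minimal sketch of how the routing can be wired up on the nginx side. It assumes the third-party ngx_http_redis module and a cache keyed by request URI; the module choice, paths, and ports are illustrative, not a drop-in copy of the production configuration.

```nginx
# Sketch only: assumes the third-party ngx_http_redis module and a redis
# cache keyed by request URI. Ports, paths, and names are illustrative.
upstream tomcat_backend {
    server 127.0.0.1:8080;
}

server {
    listen 80;

    # Agent update downloads: answered from RAM via redis, never touching Tomcat.
    location /updates/ {
        set $redis_key $uri;                 # cache key = request path
        redis_pass 127.0.0.1:6379;
        default_type application/octet-stream;
        error_page 404 502 504 = @tomcat;    # fall back if the key is missing
    }

    # Dynamic Web UI: proxied to Tomcat as before.
    location / {
        proxy_pass http://tomcat_backend;
        proxy_set_header Host $host;
    }

    location @tomcat {
        proxy_pass http://tomcat_backend;
        proxy_set_header Host $host;
    }
}
```

The key property is that agent downloads no longer reach Tomcat at all: nginx answers them straight from the redis-backed RAM cache, and only dynamic Web UI traffic is proxied through.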
The solution was thoroughly tested and, once deployed, demonstrated a dramatic shift: previously, the Web UI was essentially unavailable for the hours it took the agents to complete their upgrade; with the tipi.work solution in place, the agents' update process no longer affects Web UI availability.
By intelligently offloading the 1.2 TB of static content from the Java application server and serving it from high-speed RAM via nginx, we restored full web service availability and ensured a fast, reliable, and non-disruptive upgrade experience for the client.