At Banzai Cloud we run and deploy containerized applications to Pipeline, our PaaS. Those of you who (like us) run Java applications inside Docker, have probably already come across the problem of JVMs inaccurately detecting available memory when running inside a container. Instead of accurately detecting the memory available in a Docker container, JVMs see the available memory of the machine. This can lead to cases wherein applications that run inside containers are killed whenever they try to use an amount of memory that exceeds the limits of the Docker container.
A follow-up and complementary post on this topic is also available.
The JVM's incorrect detection of available memory stems from the fact that the Linux tools and libraries that report system resource information (e.g. /proc/meminfo, /proc/vmstat) were created before cgroups even existed. They return the resource information of the host (whether that host is a physical or virtual machine), not that of the container.
Let’s explore this process in action by observing how a simple Java application allocates a percentage of memory while running inside a Docker container. We’re going to deploy the application as a Kubernetes pod (using Minikube) to illustrate how the issue is also present on Kubernetes, which is unsurprising, since Kubernetes uses Docker as a container engine.
package com.banzaicloud;

import java.util.Vector;

public class MemoryConsumer {
    private static final float CAP = 0.8f; // reserve 80% of the free memory
    private static final int ONE_MB = 1024 * 1024;
    private static final Vector<byte[]> cache = new Vector<>();

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMemBytes = rt.maxMemory();
        long usedMemBytes = rt.totalMemory() - rt.freeMemory();
        long freeMemBytes = rt.maxMemory() - usedMemBytes;
        long allocBytes = Math.round(freeMemBytes * (double) CAP);
        System.out.println("Initial free memory: " + freeMemBytes / ONE_MB + "MB");
        System.out.println("Max memory: " + maxMemBytes / ONE_MB + "MB");
        System.out.println("Reserve: " + allocBytes / ONE_MB + "MB");

        // Allocate 1MB blocks, kept reachable in the cache, until the reserve target is hit
        for (int i = 0; i < allocBytes / ONE_MB; i++) {
            cache.add(new byte[ONE_MB]);
        }

        usedMemBytes = rt.totalMemory() - rt.freeMemory();
        freeMemBytes = rt.maxMemory() - usedMemBytes;
        System.out.println("Free memory: " + freeMemBytes / ONE_MB + "MB");
    }
}
We use a Docker build file to create the image that contains the jar built from the Java code above. We need this Docker image in order to deploy the application as a Kubernetes Pod.
Dockerfile
FROM openjdk:8-alpine
ADD memory_consumer.jar /opt/local/jars/memory_consumer.jar
CMD java $JVM_OPTS -cp /opt/local/jars/memory_consumer.jar com.banzaicloud.MemoryConsumer
docker build -t memory_consumer .
Now that we have the Docker image, we need to create a pod definition to deploy the application to Kubernetes:
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
  restartPolicy: Never
This pod definition ensures that the container is scheduled to a node that has at least 64MB of free memory and that it will not be allowed to use more than 256MB of memory.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
Output of the pod:
$ kubectl logs memory-consumer
Initial free memory: 877MB
Max memory: 878MB
Reserve: 702MB
Killed
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 OOMKilled 0 1m
The Java application running inside the container detected 877MB of free memory and consequently attempted to reserve 702MB of it. That 877MB reflects the memory of the underlying node, not the container's limit. Since we previously limited the container's maximum memory usage to 256MB, the container was killed.
To avoid this outcome, we need to instruct the JVM as to the correct maximum amount of memory it can reserve. We do that via the -Xmx option: we modify our pod definition to pass an -Xmx setting to the Java application in the container through the JVM_OPTS environment variable.
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
    env:
    - name: JVM_OPTS
      value: "-Xms64M -Xmx256M"
  restartPolicy: Never
$ kubectl delete pod memory-consumer
pod "memory-consumer" deleted
$ kubectl get po --show-all
No resources found.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
$ kubectl logs memory-consumer
Initial free memory: 227MB
Max memory: 228MB
Reserve: 181MB
Free memory: 50MB
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 Completed 0 1m
This time the application ran successfully; it detected the correct amount of available memory, which we passed in via -Xmx256M, and so it did not hit the memory: "256Mi" limit specified in the pod definition.
While this solution works, it requires that the memory limit be specified in two places: once as the container's limit (memory: "256Mi"), and once in the option passed to the JVM (-Xmx256M). It would be much more convenient if the JVM accurately detected the maximum amount of available memory based on the memory: "256Mi" setting alone, wouldn't it?
Well, there's a change in Java 9 that makes the JVM Docker aware, and it has been backported to Java 8 as an experimental feature.
In order to make use of this feature, our pod definition has to look like this:
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
    env:
    - name: JVM_OPTS
      value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -Xms64M"
  restartPolicy: Never
$ kubectl delete pod memory-consumer
pod "memory-consumer" deleted
$ kubectl get pod --show-all
No resources found.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
$ kubectl logs memory-consumer
Initial free memory: 227MB
Max memory: 228MB
Reserve: 181MB
Free memory: 54MB
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 Completed 0 50s
Please note the -XX:MaxRAMFraction=1 option, which tells the JVM what fraction of the available (cgroup-limited) memory it may use as its max heap size: the heap is roughly the memory limit divided by MaxRAMFraction, so a value of 1 lets the JVM use nearly the whole 256Mi, whereas the default value of 4 would cap the heap at around 64Mi.
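As a rough illustration of that arithmetic, here is a small sketch of ours (HeapFractionDemo is a hypothetical helper, not JDK code, and the real heap ergonomics involve a few more details than this simple division):
public class HeapFractionDemo {
    private static final int ONE_MB = 1024 * 1024;

    // Approximation: with UseCGroupMemoryLimitForHeap the max heap is
    // roughly <cgroup memory limit> / MaxRAMFraction.
    static long approxMaxHeap(long memoryLimitBytes, int maxRamFraction) {
        return memoryLimitBytes / maxRamFraction;
    }

    public static void main(String[] args) {
        long limit = 256L * ONE_MB; // the 256Mi limit from the pod definition
        System.out.println("MaxRAMFraction=1 -> ~" + approxMaxHeap(limit, 1) / ONE_MB + "MB heap");
        System.out.println("MaxRAMFraction=4 (default) -> ~" + approxMaxHeap(limit, 4) / ONE_MB + "MB heap");
    }
}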
Setting a max heap size that takes the memory limit into account, whether explicitly through -Xmx or dynamically with UseCGroupMemoryLimitForHeap, is important: it lets the JVM know when it is approaching the limit, so it can free up memory through garbage collection. If the max heap size is incorrect (i.e. it exceeds the container's memory limit), the JVM may blindly allocate past the limit without ever trying to free up memory, and the process will be OOMKilled.
A java.lang.OutOfMemoryError is different: it indicates that the max heap size is not large enough to hold all live objects in memory. In that case the max heap size needs to be increased, either via -Xmx or, if UseCGroupMemoryLimitForHeap is being used, by raising the memory limit of the container.
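To see that behavior in isolation, here is a minimal sketch (OomDemo is our own illustrative class, not part of the application in this post) that keeps allocating reachable memory until the heap is exhausted:
import java.util.ArrayList;
import java.util.List;

public class OomDemo {
    private static final int ONE_MB = 1024 * 1024;

    public static void main(String[] args) {
        // Run with a deliberately small heap, e.g. java -Xmx16M OomDemo
        List<byte[]> cache = new ArrayList<>();
        while (true) {
            cache.add(new byte[ONE_MB]); // keep 1MB blocks reachable until the heap is exhausted
        }
    }
}
With a small -Xmx, this ends with java.lang.OutOfMemoryError inside the JVM rather than the process being OOMKilled by the kernel, which is exactly the distinction described above.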
The use of cgroups is extremely practical when running JVM-based workloads on Kubernetes. We touched on this subject in our Apache Zeppelin notebook post, which further highlights the benefits of a properly tuned JVM configuration.