How can I mount the same persistent volume on multiple pods?

asdfaewefgav picture asdfaewefgav · Jun 10, 2020 · Viewed 7.3k times · Source

I have a three node GCE cluster and a single-pod GKE deployment with three replicas. I created the PV and PVC like so:

# Create a persistent volume for web content
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-content
  labels:
    type: local
spec:
  capacity:
    storage: 5Gi
  accessModes:
   - ReadOnlyMany
  hostPath:
    path: "/usr/share/nginx/html"
--
# Request a persistent volume for web content
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-content-claim
  annotations:
    volume.alpha.kubernetes.io/storage-class: default
spec:
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 5Gi

They are referenced in the container spec like so:

    spec:
      containers:
      - image: launcher.gcr.io/google/nginx1
        name: nginx-container
        volumeMounts:
          - name: nginx-content
            mountPath: /usr/share/nginx/html
        ports:
          - containerPort: 80
      volumes:
      - name: nginx-content
        persistentVolumeClaim:
          claimName: nginx-content-claim

Even though I created the volumes as ReadOnlyMany, only one pod can mount the volume at any given time. The rest give "Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE". How can I make it so all three replicas read the same web content from the same volume?

Answer

mario picture mario · Jun 24, 2020

First I'd like to point out one fundamental discrapency in your configuration. Note that when you use your PersistentVolumeClaim defined as in your example, you don't use your nginx-content PersistentVolume at all. You can easily verify it by running:

kubectl get pv

on your GKE cluster. You'll notice that apart from your manually created nginx-content PV, there is another one, which was automatically provisioned based on the PVC that you applied.

Note that in your PersistentVolumeClaim definition you're explicitely referring the default storage class which has nothing to do with your manually created PV. Actually even if you completely omit the annotation:

annotations:
        volume.alpha.kubernetes.io/storage-class: default

it will work exactly the same way, namely the default storage class will be used anyway. Using the default storage class on GKE means that GCE Persistent Disk will be used as your volume provisioner. You can read more about it here:

Volume implementations such as gcePersistentDisk are configured through StorageClass resources. GKE creates a default StorageClass for you which uses the standard persistent disk type (ext4). The default StorageClass is used when a PersistentVolumeClaim doesn't specify a StorageClassName. You can replace the provided default StorageClass with your own.

But let's move on to the solution of the problem you're facing.

Solution:

First, I'd like to emphasize you don't have to use any NFS-like filesystems to achive your goal.

If you need your PersistentVolume to be available in ReadOnlyMany mode, GCE Persistent Disk is a perfect solution that entirely meets your requirements.

It can be mounted in ro mode by many Pods at the same time and what is even more important by many Pods, scheduled on different GKE nodes. Furthermore it's really simple to configure and it works on GKE out of the box.

In case you want to use your storage in ReadWriteMany mode, I agree that something like NFS may be the only solution as GCE Persistent Disk doesn't provide such capability.

Let's take a closer look how we can configure it.

We need to start from defining our PVC. This step was actually already done by yourself but you got lost a bit in further steps. Let me explain how it works.

The following configuration is correct (as I mentioned annotations section can be omitted):

# Request a persistent volume for web content
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-content-claim
spec:
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 5Gi

However I'd like to add one important comment to this. You said:

Even though I created the volumes as ReadOnlyMany, only one pod can mount the volume at any given time.

Well, actually you didn't. I know it may seem a bit tricky and somewhat surprising but this is not the way how defining accessModes really works. In fact it's a widely misunderstood concept. First of all you cannot define access modes in PVC in a sense of putting there the constraints you want. Supported access modes are inherent feature of a particular storage type. They are already defined by the storage provider.

What you actually do in PVC definition is requesting a PV that supports the particular access mode or access modes. Note that it's in a form of a list which means you may provide many different access modes that you want your PV to support.

Basically it's like saying: "Hey! Storage provider! Give me a volume that supports ReadOnlyMany mode." You're asking this way for a storage that will satisfy your requirements. Keep in mind however that you can be given more than you ask. And this is also our scenario when asking for a PV that supports ReadOnlyMany mode in GCP. It creates for us a PersistentVolume which meets our requirements we listed in accessModes section but it also supports ReadWriteOnce mode. Although we didn't ask for something that also supports ReadWriteOnce you will probably agree with me that storage which has a built-in support for those two modes fully satisfies our request for something that supports ReadOnlyMany. So basically this is the way it works.

Your PV that was automatically provisioned by GCP in response for your PVC supports those two accessModes and if you don't specify explicitely in Pod or Deployment definition that you want to mount it in read-only mode, by default it is mounted in read-write mode.

You can easily verify it by attaching to the Pod that was able to successfully mount the PersistentVolume:

kubectl exec -ti pod-name -- /bin/bash

and trying to write something on the mounted filesystem.

The error message you get:

"Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE"

concerns specifically GCE Persistent Disk that is already mounted by one GKE node in ReadWriteOnce mode and it cannot be mounted by another node on which the rest of your Pods were scheduled.

If you want it to be mounted in ReadOnlyMany mode, you need to specify it explicitely in your Deployment definition by adding readOnly: true statement in the volumes section under Pod's template specification like below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: nginx-content
      volumes:
      - name: nginx-content
        persistentVolumeClaim:
          claimName: nginx-content-claim
          readOnly: true

Keep in mind however that to be able to mount it in readOnly mode, first we need to pre-populate such volume with data. Otherwise you'll see another error message, saying that unformatted volume cannot be mounted in read only mode.

The easiest way to do it is by creating a single Pod which will serve only for copying data which was already uploaded to one of our GKE nodes to our destination PV.

Note that pre-populating PersistentVolume with data can be done in many different ways. You can mount in such Pod only your PersistentVolume that you will be using in your Deployment and get your data using curl or wget from some external location saving it directly on your destination PV. It's up to you.

In my example I'm showing how to do it using additional local volume that allows us to mount into our Pod a directory, partition or disk (in my example I use a directory /var/tmp/test located on one of my GKE nodes) available on one of our kubernetes nodes. It's much more flexible solution than hostPath as we don't have to care about scheduling such Pod to particular node, that contains the data. Specific node affinity rule is already defined in PersistentVolume and Pod is automatically scheduled on specific node.

To create it we need 3 things:

StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

PersistentVolume definition:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/tmp/test
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <gke-node-name>

and finally PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage

Then we can create our temporary Pod which will serve only for copying data from our GKE node to our GCE Persistent Disk.

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/mnt/source"
        name: mypd
      - mountPath: "/mnt/destination"
        name: nginx-content
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
    - name: nginx-content
      persistentVolumeClaim:
        claimName: nginx-content-claim

Paths you can see above are not really important. The task of this Pod is only to allow us to copy our data to the destination PV. Eventually our PV will be mounted in completely different path.

Once the Pod is created and both volumes are successfully mounted, we can attach to it by running:

kubectl exec -ti my-pod -- /bin/bash

Withing the Pod simply run:

cp /mnt/source/* /mnt/destination/

That's all. Now we can exit and delete our temporary Pod:

kubectl delete pod mypod

Once it is gone, we can apply our Deployment and our PersistentVolume finally can be mounted in readOnly mode by all the Pods located on various GKE nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: nginx-content
      volumes:
      - name: nginx-content
        persistentVolumeClaim:
          claimName: nginx-content-claim
          readOnly: true

Btw. if you are ok with the fact that your Pods will be scheduled only on one particular node, you can give up on using GCE Persistent Disk at all and switch to the above mentioned local volume. This way all your Pods will be able not only to read from it but also to write to it at the same time. The only caveat is that all those Pods will be running on a single node.