CloudNativePG Bug: Recovery From Backup Object Fails With Missing Azure Credentials Error
Introduction
This article addresses a critical bug encountered while attempting to recover a PostgreSQL cluster from a backup using CloudNativePG. The issue arises when restoring from a backup object: the restore fails with an error about missing Azure credentials, even though Azure storage might not be involved at all. The sections below detail the problem, the steps to reproduce it, and potential solutions or workarounds, so that users and developers can understand the bug, troubleshoot similar issues, and ensure smooth PostgreSQL cluster recovery.
Background
CloudNativePG is a powerful operator that simplifies the management of PostgreSQL clusters in Kubernetes environments. One of its key features is the ability to perform backups and restore clusters from these backups. However, users may encounter issues during the recovery process, as highlighted in this bug report. Specifically, the recovery process fails with an error message indicating missing Azure credentials, which can be misleading if the backup storage doesn't actually rely on Azure.
Problem Description
The core problem is that the recovery process fails with a "missing Azure credentials" error when attempting to restore a PostgreSQL cluster from a backup. This issue occurs even when the backup is stored in a non-Azure environment, suggesting a flaw in the backup restoration logic within CloudNativePG. The error message misdirects users, making it difficult to diagnose the root cause, and can lead to significant downtime and data loss if not addressed promptly.
Steps to Reproduce
To reproduce this bug, follow these steps:
- Create a PostgreSQL cluster using CloudNativePG.
- Take several backups of the cluster and list them with the kubectl command.
$ kubectl get backups -n postgresql
...
daily-20250705030200 2d8h quizservicedb plugin completed
daily-20250705040200 2d7h quizservicedb plugin completed
daily-20250705050200 2d6h quizservicedb plugin completed
daily-20250705060200 2d5h quizservicedb plugin completed
daily-20250705070200 2d4h quizservicedb plugin completed
daily-20250705080200 2d3h quizservicedb plugin completed
daily-20250705090200 2d2h quizservicedb plugin completed
daily-20250705100200 2d1h quizservicedb plugin completed
daily-20250705110200 2d quizservicedb plugin completed
daily-20250705120200 47h quizservicedb plugin completed
...
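Before attempting a restore, it can help to confirm which backup method and status a given Backup object reports. The commands below inspect one of the backups from the listing above (`.spec.method` and `.status.phase` are standard fields on the CloudNativePG Backup resource):

```shell
# Full Backup object, including spec and status
kubectl get backup daily-20250705030200 -n postgresql -o yaml

# Condensed view of the fields most relevant to recovery
kubectl get backup daily-20250705030200 -n postgresql \
  -o jsonpath='{.spec.method}{"\n"}{.status.phase}{"\n"}'
```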
- Attempt to restore the cluster from one of the backups by applying a YAML manifest similar to the one below:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: quizservicedb-recover-from-backup
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  storage:
    size: 10Gi
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: nvme.network-drives.csi.timeweb.cloud
      volumeMode: Filesystem
  monitoring:
    enablePodMonitor: true
  enableSuperuserAccess: true
  bootstrap:
    recovery:
      backup:
        name: daily-20250707030200
- Observe the pods. They will likely enter an Error state.
$ kubectl get pods -n postgresql
NAME READY STATUS RESTARTS AGE
...
quizservicedb-recover-from-backup-1-full-recovery-4mqzg 0/1 Error 0 7m52s
quizservicedb-recover-from-backup-1-full-recovery-68jqf 0/1 Error 0 13m
quizservicedb-recover-from-backup-1-full-recovery-cfd6l 0/1 Error 0 19m
quizservicedb-recover-from-backup-1-full-recovery-dxkkm 0/1 Error 0 17m
quizservicedb-recover-from-backup-1-full-recovery-hkb4c 0/1 Error 0 18m
quizservicedb-recover-from-backup-1-full-recovery-sttff 0/1 Error 0 18m
quizservicedb-recover-from-backup-1-full-recovery-trgrs 0/1 Error 0 16m
- Check the logs of the failing pod. You should see an error message similar to the one below:
$ kubectl logs -n postgresql quizservicedb-recover-from-backup-1-full-recovery-4mqzg
Defaulted container "full-recovery" out of: full-recovery, bootstrap-controller (init)
{"level":"info","ts":"2025-07-07T11:27:08.64114153Z","msg":"Starting webserver","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","address":"localhost:8010","hasTLS":false}
{"level":"error","ts":"2025-07-07T11:27:08.747756944Z","msg":"Error while restoring a backup","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","error":"missing Azure credentials","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.2.0/pkg/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/restore.go:79\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.(*restoreRunnable).Start\n\tinternal/cmd/manager/instance/restore/restore.go:62\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:226"}
{"level":"info","ts":"2025-07-07T11:27:08.747879249Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-07-07T11:27:08.747893152Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-07-07T11:27:08.748051396Z","msg":"Webserver exited","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","address":"localhost:8010"}
{"level":"info","ts":"2025-07-07T11:27:08.748076961Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-07-07T11:27:08.748201189Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-07-07T11:27:08.74824993Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2025-07-07T11:27:08.748267785Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"error","ts":"2025-07-07T11:27:08.7482979Z","msg":"restore error","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","error":"while restoring cluster: missing Azure credentials","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.2.0/pkg/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func1\n\tinternal/cmd/manager/instance/restore/cmd.go:101\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1015\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1071\nmain.main\n\tcmd/manager/main.go:71\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.3/x64/src/runtime/proc.go:283"}
Analysis of the Error
The error message "missing Azure credentials" suggests that the CloudNativePG operator is attempting to access Azure storage, even if the backup is stored elsewhere. This could be due to a misconfiguration or a bug in the operator's backup restoration logic. It's essential to verify the backup configuration and ensure that the correct storage credentials are provided if Azure storage is indeed being used. If Azure is not in use, this error points to an internal issue within the CloudNativePG operator that needs to be addressed.
Impact
This bug has a significant impact on the reliability of the backup and recovery process in CloudNativePG. The inability to restore a cluster from a backup can lead to data loss and prolonged downtime, severely affecting applications relying on the PostgreSQL database. The misleading error message further complicates the troubleshooting process, potentially delaying recovery efforts.
Affected Users
This issue affects users who rely on CloudNativePG for managing PostgreSQL clusters and performing backup and recovery operations. Specifically, it impacts those who attempt to restore a cluster from a backup object, regardless of the actual storage location of the backup. Users who do not use Azure storage might find the error message particularly confusing and frustrating.
Proposed Solutions and Workarounds
While a definitive solution requires a fix in the CloudNativePG operator, there are several potential workarounds and troubleshooting steps that users can take:
- Verify Backup Configuration: Double-check the backup configuration to ensure that the correct storage settings are specified. If Azure storage is not intended, ensure that the configuration does not include any Azure-related settings.
- Check Logs for More Details: Examine the logs of the CloudNativePG operator and the failing pods for more detailed error messages. These logs might provide additional context and help pinpoint the root cause of the issue.
- Try Restoring to a Different Cluster: Attempt to restore the backup to a different cluster or namespace. This can help determine whether the issue is specific to the target cluster or a more general problem.
- Use a Different Backup: If multiple backups are available, try restoring from a different one. This can help identify whether the issue is specific to a particular backup.
- Contact Community Support: Reach out to the CloudNativePG community or support channels for assistance. Other users might have encountered similar issues and can provide valuable insights.
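As a concrete check for the first workaround, the source cluster's spec can be queried for any Azure-related credential stanza. The field path below follows the CloudNativePG `barmanObjectStore` API; if the cluster uses a backup plugin instead (as the backup listing above suggests), the plugin's own configuration object should be inspected the same way:

```shell
# Print any Azure credentials configured in the backup section;
# empty output means no azureCredentials stanza is present.
kubectl get cluster quizservicedb -n postgresql \
  -o jsonpath='{.spec.backup.barmanObjectStore.azureCredentials}'

# Dump the whole backup configuration for manual review
kubectl get cluster quizservicedb -n postgresql -o jsonpath='{.spec.backup}'
```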
Root Cause Analysis (Potential)
Based on the error message and the context, a potential root cause could be an incorrect or incomplete implementation of the backup restoration logic within the CloudNativePG operator. The operator might be attempting to access Azure storage by default or due to a misconfiguration, even when it's not required. Another possibility is a bug in the error handling, where an unrelated issue is incorrectly reported as a missing Azure credentials problem.
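If the backup data is reachable in an object store, one way to sidestep the Backup-object code path entirely is to bootstrap recovery from an external cluster definition rather than a Backup object. The snippet below is a sketch using the CloudNativePG `barmanObjectStore` recovery source with S3-style credentials; the destination path, endpoint URL, and Secret names are placeholders that must be adapted to the actual object store in use:

```yaml
  bootstrap:
    recovery:
      source: quizservicedb-source
  externalClusters:
    - name: quizservicedb-source
      barmanObjectStore:
        destinationPath: s3://backups/quizservicedb   # placeholder path
        endpointURL: https://s3.example.com           # placeholder endpoint
        s3Credentials:
          accessKeyId:
            name: backup-store-creds                  # hypothetical Secret
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-store-creds
            key: ACCESS_SECRET_KEY
```

Because this path names its credentials explicitly, it also serves as a diagnostic: if recovery succeeds this way but fails via the Backup object, the problem lies in how the operator resolves credentials from the Backup resource.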
Investigation Steps
To further investigate this issue, the following steps can be taken:
- Code Review: Review the CloudNativePG operator's source code, particularly the backup restoration logic, to identify any potential issues.
- Debugging: Debug the operator during the restoration process to trace the execution flow and pinpoint the exact location where the error occurs.
- Testing: Implement additional test cases covering various backup and restore scenarios, including those involving different storage providers.
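For the code-review step, locating where the error string is produced narrows the search considerably. Assuming a local checkout of the operator repository; if the string turns out to be defined in a dependency (for example the barman-cloud packages), the same search can be run there:

```shell
# Clone the operator and find where the error message is constructed
git clone https://github.com/cloudnative-pg/cloudnative-pg
cd cloudnative-pg
grep -rn "missing Azure credentials" --include='*.go' .
```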
Conclusion
The "missing Azure credentials" error during backup recovery in CloudNativePG is a critical issue that can prevent successful cluster restoration. This article has detailed the steps to reproduce the bug, its impact, and potential workarounds. While a permanent fix requires changes to the CloudNativePG operator, users can take several steps in the meantime to mitigate the issue and keep their data available. Understanding the problem and its likely causes allows users and developers to work together to resolve the bug and improve the reliability of backup and recovery in cloud-native environments.