CloudNativePG Bug: Recovery From Backup Object Fails With Missing Azure Credentials Error
Introduction
This article addresses a critical bug encountered while attempting to recover a PostgreSQL cluster from a backup using CloudNativePG. The issue arises when restoring from a backup object: the restore fails with an error about missing Azure credentials, even though Azure storage might not be involved at all. The sections below detail the problem, the steps to reproduce it, and potential solutions or workarounds, so that users and developers can understand the bug, troubleshoot similar issues, and ensure smooth PostgreSQL cluster recovery.
Background
CloudNativePG is a powerful operator that simplifies the management of PostgreSQL clusters in Kubernetes environments. One of its key features is the ability to perform backups and restore clusters from these backups. However, users may encounter issues during the recovery process, as highlighted in this bug report. Specifically, the recovery process fails with an error message indicating missing Azure credentials, which can be misleading if the backup storage doesn't actually rely on Azure.
Problem Description
The core problem is that the recovery process fails with a "missing Azure credentials" error when attempting to restore a PostgreSQL cluster from a backup. This issue occurs even when the backup is stored in a non-Azure environment, suggesting a flaw in the backup restoration logic within CloudNativePG. The error message misdirects users, making it difficult to diagnose the root cause, and can lead to significant downtime and data loss if not addressed promptly.
Steps to Reproduce
To reproduce this bug, follow these steps:
- Create a PostgreSQL cluster using CloudNativePG.
- Take several backups of the cluster and list them with the kubectl command.
$ kubectl get backups -n postgresql
...
daily-20250705030200 2d8h quizservicedb plugin completed
daily-20250705040200 2d7h quizservicedb plugin completed
daily-20250705050200 2d6h quizservicedb plugin completed
daily-20250705060200 2d5h quizservicedb plugin completed
daily-20250705070200 2d4h quizservicedb plugin completed
daily-20250705080200 2d3h quizservicedb plugin completed
daily-20250705090200 2d2h quizservicedb plugin completed
daily-20250705100200 2d1h quizservicedb plugin completed
daily-20250705110200 2d quizservicedb plugin completed
daily-20250705120200 47h quizservicedb plugin completed
...
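Before attempting a restore, it can help to confirm which backup method and status a given Backup object reports. The commands below inspect one of the backups from the listing above (`.spec.method` and `.status.phase` are standard fields on the CloudNativePG Backup resource):

```shell
# Full Backup object, including spec and status
kubectl get backup daily-20250705030200 -n postgresql -o yaml

# Condensed view of the fields most relevant to recovery
kubectl get backup daily-20250705030200 -n postgresql \
  -o jsonpath='{.spec.method}{"\n"}{.status.phase}{"\n"}'
```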
- Attempt to restore the cluster from one of the backups by applying a YAML manifest similar to the one below:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: quizservicedb-recover-from-backup
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  storage:
    size: 10Gi
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: nvme.network-drives.csi.timeweb.cloud
      volumeMode: Filesystem
  monitoring:
    enablePodMonitor: true
  enableSuperuserAccess: true
  bootstrap:
    recovery:
      backup:
        name: daily-20250707030200
- Observe the pods. They will likely enter an Error state.
$ kubectl get pods -n postgresql
NAME READY STATUS RESTARTS AGE
...
quizservicedb-recover-from-backup-1-full-recovery-4mqzg 0/1 Error 0 7m52s
quizservicedb-recover-from-backup-1-full-recovery-68jqf 0/1 Error 0 13m
quizservicedb-recover-from-backup-1-full-recovery-cfd6l 0/1 Error 0 19m
quizservicedb-recover-from-backup-1-full-recovery-dxkkm 0/1 Error 0 17m
quizservicedb-recover-from-backup-1-full-recovery-hkb4c 0/1 Error 0 18m
quizservicedb-recover-from-backup-1-full-recovery-sttff 0/1 Error 0 18m
quizservicedb-recover-from-backup-1-full-recovery-trgrs 0/1 Error 0 16m
- Check the logs of the failing pod. You should see an error message similar to the one below:
$ kubectl logs -n postgresql quizservicedb-recover-from-backup-1-full-recovery-4mqzg
Defaulted container "full-recovery" out of: full-recovery, bootstrap-controller (init)
{"level":"info","ts":"2025-07-07T11:27:08.64114153Z","msg":"Starting webserver","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","address":"localhost:8010","hasTLS":false}
{"level":"error","ts":"2025-07-07T11:27:08.747756944Z","msg":"Error while restoring a backup","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","error":"missing Azure credentials","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.2.0/pkg/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/restore.go:79\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.(*restoreRunnable).Start\n\tinternal/cmd/manager/instance/restore/restore.go:62\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/manager/runnable_group.go:226"}
{"level":"info","ts":"2025-07-07T11:27:08.747879249Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2025-07-07T11:27:08.747893152Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2025-07-07T11:27:08.748051396Z","msg":"Webserver exited","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","address":"localhost:8010"}
{"level":"info","ts":"2025-07-07T11:27:08.748076961Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2025-07-07T11:27:08.748201189Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2025-07-07T11:27:08.74824993Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2025-07-07T11:27:08.748267785Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"error","ts":"2025-07-07T11:27:08.7482979Z","msg":"restore error","logging_pod":"quizservicedb-recover-from-backup-1-full-recovery","error":"while restoring cluster: missing Azure credentials","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.2.0/pkg/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func1\n\tinternal/cmd/manager/instance/restore/cmd.go:101\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1015\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1071\nmain.main\n\tcmd/manager/main.go:71\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.3/x64/src/runtime/proc.go:283"}
Analysis of the Error
The error message "missing Azure credentials" suggests that the CloudNativePG operator is attempting to access Azure storage, even if the backup is stored elsewhere. This could be due to a misconfiguration or a bug in the operator's backup restoration logic. It's essential to verify the backup configuration and ensure that the correct storage credentials are provided if Azure storage is indeed being used. If Azure is not in use, this error points to an internal issue within the CloudNativePG operator that needs to be addressed.
Impact
This bug has a significant impact on the reliability of the backup and recovery process in CloudNativePG. The inability to restore a cluster from a backup can lead to data loss and prolonged downtime, severely affecting applications relying on the PostgreSQL database. The misleading error message further complicates the troubleshooting process, potentially delaying recovery efforts.
Affected Users
This issue affects users who rely on CloudNativePG for managing PostgreSQL clusters and performing backup and recovery operations. Specifically, it impacts those who attempt to restore a cluster from a backup object, regardless of the actual storage location of the backup. Users who do not use Azure storage might find the error message particularly confusing and frustrating.
Proposed Solutions and Workarounds
While a definitive solution requires a fix in the CloudNativePG operator, there are several potential workarounds and troubleshooting steps that users can take:
- Verify Backup Configuration: Double-check the backup configuration to ensure that the correct storage settings are specified. If Azure storage is not intended, ensure that the configuration does not include any Azure-related settings.
- Check Logs for More Details: Examine the logs of the CloudNativePG operator and the failing pods for more detailed error messages. These logs might provide additional context and help pinpoint the root cause of the issue.
- Try Restoring to a Different Cluster: Attempt to restore the backup to a different cluster or namespace. This can help determine whether the issue is specific to the target cluster or a more general problem.
- Use a Different Backup: If multiple backups are available, try restoring from a different one. This can help identify whether the issue is specific to a particular backup.
- Contact Community Support: Reach out to the CloudNativePG community or support channels for assistance. Other users might have encountered similar issues and can provide valuable insights.
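As a concrete check for the first workaround, the source cluster's spec can be queried for any Azure-related credential stanza. The field path below follows the CloudNativePG `barmanObjectStore` API; if the cluster uses a backup plugin instead (as the backup listing above suggests), the plugin's own configuration object should be inspected the same way:

```shell
# Print any Azure credentials configured in the backup section;
# empty output means no azureCredentials stanza is present.
kubectl get cluster quizservicedb -n postgresql \
  -o jsonpath='{.spec.backup.barmanObjectStore.azureCredentials}'

# Dump the whole backup configuration for manual review
kubectl get cluster quizservicedb -n postgresql -o jsonpath='{.spec.backup}'
```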
Root Cause Analysis (Potential)
Based on the error message and the context, a potential root cause could be an incorrect or incomplete implementation of the backup restoration logic within the CloudNativePG operator. The operator might be attempting to access Azure storage by default or due to a misconfiguration, even when it's not required. Another possibility is a bug in the error handling, where an unrelated issue is incorrectly reported as a missing Azure credentials problem.
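If the backup data is reachable in an object store, one way to sidestep the Backup-object code path entirely is to bootstrap recovery from an external cluster definition rather than a Backup object. The snippet below is a sketch using the CloudNativePG `barmanObjectStore` recovery source with S3-style credentials; the destination path, endpoint URL, and Secret names are placeholders that must be adapted to the actual object store in use:

```yaml
  bootstrap:
    recovery:
      source: quizservicedb-source
  externalClusters:
    - name: quizservicedb-source
      barmanObjectStore:
        destinationPath: s3://backups/quizservicedb   # placeholder path
        endpointURL: https://s3.example.com           # placeholder endpoint
        s3Credentials:
          accessKeyId:
            name: backup-store-creds                  # hypothetical Secret
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-store-creds
            key: ACCESS_SECRET_KEY
```

Because this path names its credentials explicitly, it also serves as a diagnostic: if recovery succeeds this way but fails via the Backup object, the problem lies in how the operator resolves credentials from the Backup resource.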
Investigation Steps
To further investigate this issue, the following steps can be taken:
- Code Review: Review the CloudNativePG operator's source code, particularly the backup restoration logic, to identify any potential issues.
- Debugging: Debug the operator during the restoration process to trace the execution flow and pinpoint the exact location where the error occurs.
- Testing: Implement additional test cases covering various backup and restore scenarios, including those involving different storage providers.
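For the code-review step, locating where the error string is produced narrows the search considerably. Assuming a local checkout of the operator repository; if the string turns out to be defined in a dependency (for example the barman-cloud packages), the same search can be run there:

```shell
# Clone the operator and find where the error message is constructed
git clone https://github.com/cloudnative-pg/cloudnative-pg
cd cloudnative-pg
grep -rn "missing Azure credentials" --include='*.go' .
```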
Conclusion
The "missing Azure credentials" error during backup recovery in CloudNativePG is a critical issue that can prevent successful cluster restoration. This article has detailed the steps to reproduce the bug, its impact, and potential workarounds. While a permanent fix requires changes to the CloudNativePG operator, users can take several steps in the meantime to mitigate the issue and keep their data available. Understanding the problem and its likely causes allows users and developers to work together to resolve the bug and improve the reliability of backup and recovery in cloud-native environments.