This troubleshooting guide helps you resolve common issues you may encounter while using Qovery. Use the sections below to find solutions quickly.

Service Deployment Issues

Find solutions for the most common deployment errors and issues you may encounter when deploying services on Qovery.
Symptom: Your deployment fails with a "connection refused" error during health checks.
Common Causes:
  1. Port mismatch - The port in your Qovery configuration doesn’t match the port your application listens on
  2. Localhost binding - Your application listens on localhost (127.0.0.1) instead of all interfaces (0.0.0.0)
Solutions:
Step 1: Verify Port Configuration

Check that your Qovery service port matches your application’s listening port.
  • If your app listens on port 8080, configure port 8080 in Qovery
  • Check your application logs to see which port it’s actually using
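If you're not sure which port the application actually binds to, you can check from inside the running container (a sketch; ss ships with iproute2 and netstat with net-tools, so availability depends on your base image):
qovery shell

# List listening TCP sockets and the processes that own them
ss -lntp
# Alternative if ss is unavailable
netstat -tlnp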
Step 2: Update Application Binding

Ensure your application binds to 0.0.0.0 (all interfaces) instead of localhost.
Node.js Example:
// ✅ Correct - Binds to all interfaces
app.listen(8080, '0.0.0.0');

// ❌ Wrong - Only binds to localhost
app.listen(8080, 'localhost');
Python Flask Example:
# ✅ Correct
app.run(host='0.0.0.0', port=8080)

# ❌ Wrong
app.run(host='127.0.0.1', port=8080)
Most web frameworks default to localhost in development. Always explicitly set 0.0.0.0 for containerized environments.
Symptom: Deployment fails with "insufficient resources" or pods remain in "Pending" state.
Cause: Your cluster doesn't have enough CPU or memory capacity to run your service.
Solutions:
  • Reduce Service Resources
  • Upgrade Instance Type
  • Increase Node Count
Lower the resource requests for your service if they’re set too high:
  1. Go to your service Settings → Resources
  2. Reduce CPU or Memory requests
  3. Redeploy the service
Start with minimum resources and scale up as needed. Most applications don’t need as much as you think!
Symptom: Your service deploys but immediately crashes or restarts repeatedly.
Solution: Debug using the Qovery Shell
Step 1: Access the Container

Use the Qovery CLI to access your container:
qovery shell
This opens an interactive shell inside your running container.
Step 2: Investigate the Issue

Once inside:
  • Check environment variables: env
  • Test your startup command manually
  • Review application configuration files
  • Check for missing dependencies
Step 3: For Rapidly Crashing Apps

If your app crashes too fast to shell into:
  1. Remove the port temporarily from service settings (this prevents Kubernetes from restarting it)
  2. Modify your Dockerfile to use a sleep command:
    # Comment out your entrypoint
    # ENTRYPOINT ["npm", "start"]
    
    # Add sleep to keep container running
    ENTRYPOINT ["sleep", "infinity"]
    
  3. Deploy with this change
  4. Use qovery shell to debug
  5. Fix the issue and restore the original entrypoint
Remember to restore your port configuration and entrypoint after debugging!
Symptom: SSL certificates aren't being generated for your custom domain.
Cause: DNS records are not properly configured for your custom domain.
Solution:
Step 1: Identify the Problem

Check the Qovery Console for which domain is failing certificate generation. You’ll see an error indicator next to the domain.
Domain check error in console
Step 2: Verify DNS Configuration

Your domain should have a CNAME record pointing to your Qovery cluster URL.
Custom domain configuration
Verify DNS resolution:
dig your-domain.com CNAME
You should see a CNAME pointing to your Qovery cluster domain.
Step 3: Fix and Redeploy

  1. Update your DNS CNAME record with your domain provider
  2. Wait for DNS propagation (usually minutes, but it can take up to 48 hours)
  3. Redeploy your application in Qovery
  4. Certificate generation should succeed
DNS changes can take time to propagate. Use DNS Checker to verify propagation globally.
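Once DNS is in place, you can double-check resolution and the served certificate from your terminal (assuming dig and openssl are installed locally; replace the domain with your own):
# Confirm the CNAME points to your Qovery cluster domain
dig +short your-domain.com CNAME

# Inspect the certificate currently served for the domain
openssl s_client -connect your-domain.com:443 -servername your-domain.com </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates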
Symptom: Your build fails with a timeout error after 30 minutes.
Cause: The default Docker build timeout is 1800 seconds (30 minutes). Complex builds (like compiling large codebases) may exceed this limit.
Solution:
Step 1: Increase Build Timeout

  1. Go to your service Settings → Advanced Settings
  2. Find the build.timeout_max_sec parameter
  3. Increase the value (e.g., 3600 for 1 hour)
  4. Save and redeploy
Step 2: Optimize Your Build (Recommended)

Consider optimizing your Dockerfile:
  • Use multi-stage builds
  • Leverage build caching effectively
  • Only copy necessary files
  • Install dependencies before copying source code
Example Multi-stage Dockerfile:
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --production
CMD ["node", "dist/index.js"]
Symptom: Build fails when trying to clone private Git submodules.
Cause: Private submodules require authentication, which isn't available during the build.
Solutions:
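The exact fix depends on your Git provider and how the repository is connected to Qovery. As a general (non-Qovery-specific) illustration, one common workaround is to reference submodules over HTTPS and inject an access token at clone time; GIT_TOKEN below is a hypothetical variable and the credential format varies by provider:
# Rewrite submodule URLs so HTTPS clones authenticate with a token
git config --global url."https://${GIT_TOKEN}@github.com/".insteadOf "https://github.com/"

# Fetch the submodules
git submodule update --init --recursive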
Symptom: Your Lifecycle Job or Cronjob fails to complete successfully.
Common Causes:
  1. Code exceptions - Errors in your application code
  2. Out of memory - Job exceeds memory limits
  3. Execution timeout - Job takes longer than configured maximum duration
Solutions:
Step 1: Check Job Logs

  1. Go to your Job service
  2. Click Logs tab
  3. Look for error messages or stack traces
  4. Identify the root cause (exception, OOM, timeout)
Step 2: Fix Based on Cause

For Code Exceptions:
  • Fix the bug in your code
  • Redeploy the job
For Out of Memory:
  • Increase memory allocation in Settings → Resources
  • Optimize your code to use less memory
For Timeouts:
  • Go to Settings → Max Duration
  • Increase the timeout value
  • Or optimize your job to run faster
For long-running jobs, consider breaking them into smaller tasks or using a queue system.
Symptom: Database deletion fails with a SnapshotQuotaExceeded error.
Cause: Qovery automatically creates a snapshot before deleting a database. If you've reached your cloud provider's snapshot quota, this fails.
Database snapshot quota exceeded
Solutions:
  • Delete Old Snapshots
  • Request Quota Increase
Remove obsolete database snapshots from your cloud provider:
AWS RDS:
  1. Go to AWS RDS Console
  2. Navigate to Snapshots
  3. Delete old snapshots you no longer need
  4. Retry database deletion in Qovery
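If you prefer the CLI, the same cleanup can be done with the AWS CLI (replace the snapshot identifier with your own; deletion is irreversible):
# List manual snapshots, oldest first
aws rds describe-db-snapshots \
  --snapshot-type manual \
  --query "sort_by(DBSnapshots, &SnapshotCreateTime)[].[DBSnapshotIdentifier,SnapshotCreateTime]" \
  --output table

# Delete a snapshot you no longer need
aws rds delete-db-snapshot --db-snapshot-identifier <snapshot-id>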
Other Providers: Remove obsolete snapshots through your provider's snapshot management console, then retry the deletion in Qovery.

Service Runtime Issues

Find solutions for common runtime errors and issues you may encounter when operating services on Qovery after successful deployment.
Symptom: Your container terminates unexpectedly with exit code 137 or SIGKILL signal.
Cause: Your application has exceeded its memory limit. When system resources become constrained, Kubernetes forcibly terminates the container to reclaim memory (Out of Memory Kill - OOMKill).
How to Identify: Check your logs for messages like:
Container killed with exit code 137
or
OOMKilled: true
Solutions:
Step 1: Increase Memory Allocation

  1. Go to your service Settings → Resources
  2. Increase the Memory limit
  3. Start with a 50% increase (e.g., 512MB → 768MB)
  4. Redeploy and monitor
Watch your memory usage metrics to find the right allocation. Don’t over-allocate unnecessarily!
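To see current usage against the limit from inside the container, you can read the cgroup memory files (a sketch; the path depends on whether the node uses cgroup v1 or v2):
qovery shell

# cgroup v2
cat /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.max

# cgroup v1
cat /sys/fs/cgroup/memory/memory.usage_in_bytes /sys/fs/cgroup/memory/memory.limit_in_bytes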
Step 2: Investigate Memory Leaks

Before increasing memory, check whether your application has a memory leak.
Signs of a Memory Leak:
  • Memory usage steadily increases over time
  • Container was fine, then started crashing after recent code changes
  • Memory never levels off or decreases
Recent Changes to Review:
  • New dependencies or library updates
  • Code changes in recent deployments
  • New features that load data into memory
  • Caching implementations without expiration
Step 3: Optimize Memory Usage

Common optimization strategies:
  • Clear unused variables and objects
  • Implement pagination for large datasets
  • Use streaming for file processing
  • Add proper cache eviction policies
  • Profile your application to find memory-intensive code
Node.js Example - Memory Profiling:
// Check memory usage
console.log(process.memoryUsage());

// Force garbage collection (requires --expose-gc flag)
if (global.gc) {
  global.gc();
}
Python Example - Memory Profiling:
import tracemalloc

tracemalloc.start()
# Your code here
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

for stat in top_stats[:10]:
    print(stat)
Continuously increasing memory without investigating the root cause will lead to higher costs and may just delay the problem!
Symptom: Your application crashes within seconds of starting, making it impossible to connect and debug.
Challenge: The container restarts so quickly that you can't use qovery shell to investigate.
Solution:
Step 1: Temporarily Remove Application Port

  1. Go to your service Settings → Ports
  2. Remove or disable the application port
  3. Deploy the changes
Removing the port prevents Kubernetes from performing health checks and auto-restarting the container.
Step 2: Modify Dockerfile to Keep Container Running

Update your Dockerfile to override the entrypoint with a sleep command:
# Comment out your normal entrypoint/CMD
# ENTRYPOINT ["npm", "start"]
# CMD ["python", "app.py"]

# Add sleep to keep container alive
ENTRYPOINT ["sleep", "infinity"]
Or for debugging purposes:
# Run a shell instead
ENTRYPOINT ["/bin/sh"]
CMD ["-c", "while true; do sleep 30; done"]
Commit and deploy these changes.
Step 3: Access the Container

Once deployed, use the Qovery CLI to shell into the container:
qovery shell
Now your container stays running and you can debug interactively!
Step 4: Debug Manually

Inside the container, you can now:
Check environment variables:
env | grep -i app
Test your application manually:
# Node.js
node server.js

# Python
python app.py

# Go
./app
Check for missing dependencies:
# Node.js
npm list

# Python
pip list

# Check system packages
which <command>
Review configuration files:
cat config/app.json
cat .env
Step 5: Fix and Restore

  1. Identify and fix the issue in your code
  2. Restore the original Dockerfile entrypoint
  3. Re-add the application port
  4. Deploy the fixed version
Don’t forget to restore your port configuration and original entrypoint! The sleep command is only for debugging.
Symptom: When deploying Helm charts, you can't see logs or pod status in the Qovery Console.
Cause: Qovery requires specific labels and annotations on your Kubernetes resources to enable log access and pod status visibility.
Solution: Add Qovery-specific macros to your Helm chart templates:
Step 1: Add Labels and Annotations

Update your Helm chart’s deployment.yaml, service.yaml, or job.yaml to include Qovery macros:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mychart.fullname" . }}
  labels:
    {{- include "qovery.labels.service" . | nindent 4 }}
  annotations:
    {{- include "qovery.annotations.service" . | nindent 4 }}
spec:
  template:
    metadata:
      labels:
        {{- include "qovery.labels.service" . | nindent 8 }}
      annotations:
        {{- include "qovery.annotations.service" . | nindent 8 }}
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Step 2: Required Resources

Apply these macros to the following Kubernetes resources:
  • Deployments - For long-running applications
  • StatefulSets - For stateful applications
  • Jobs - For one-time tasks
  • CronJobs - For scheduled tasks
  • Services - For networking
  • Pods - If you create standalone pods
Both labels and annotations are required for full functionality!
Step 3: Override Values During Deployment

If you can’t modify the Helm chart directly, override the values:
# In your Qovery Helm service overrides
deployment:
  metadata:
    labels:
      qovery.com/service-id: "{{ QOVERY_SERVICE_ID }}"
      qovery.com/service-type: "{{ QOVERY_SERVICE_TYPE }}"
      qovery.com/environment-id: "{{ QOVERY_ENVIRONMENT_ID }}"
    annotations:
      qovery.com/service-version: "{{ QOVERY_SERVICE_VERSION }}"
Step 4: Redeploy and Verify

  1. Update your Helm chart with the macros
  2. Redeploy the Helm service in Qovery
  3. Verify logs are now visible in the Console
  4. Check that pod status appears correctly
These labels and annotations allow Qovery to identify and track your Helm-deployed resources within the cluster.
Symptoms:
  • Application becomes slow or unresponsive
  • CPU throttling warnings in logs
  • Pods getting OOMKilled even with sufficient memory
Quick Checks:
  1. Check CPU metrics in Qovery Console
  2. Review recent code changes that might be CPU-intensive
  3. Look for infinite loops or inefficient algorithms
  4. Check for CPU-intensive operations running on every request
Solutions:
  • Optimize hot paths in your code
  • Implement caching for expensive operations
  • Move heavy processing to background jobs
  • Increase CPU allocation if legitimately needed
  • Use profiling tools to identify bottlenecks
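To confirm CPU throttling from inside the container, you can read the cgroup CPU statistics (a sketch; the path differs between cgroup v1 and v2):
qovery shell

# cgroup v2: check nr_throttled and throttled_usec
cat /sys/fs/cgroup/cpu.stat

# cgroup v1: check nr_throttled and throttled_time
cat /sys/fs/cgroup/cpu/cpu.stat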
Possible Causes:
  • Database connection issues
  • External API timeouts
  • Insufficient resources
  • Inefficient code paths
  • Network latency
Debugging Steps:
  1. Check application logs for slow queries or timeouts
  2. Review database connection pools
  3. Monitor external API response times
  4. Profile your application to find slow endpoints
  5. Check network policies that might be blocking traffic
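For a quick end-to-end latency measurement of a specific endpoint, curl's timing variables are handy (replace the URL with one of your own endpoints):
curl -o /dev/null -s -w "dns: %{time_namelookup}s connect: %{time_connect}s ttfb: %{time_starttransfer}s total: %{time_total}s\n" \
  https://your-domain.com/health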

Cluster Issues

Find solutions for common errors you might encounter while deploying or updating Qovery clusters.
Symptom: When attempting to delete a Qovery cluster, you receive a DependencyViolation error.
Cause: Resources managed outside of Qovery remain attached to cluster infrastructure elements, preventing deletion.
Example Error:
DeleteError - Unknown error while performing Terraform command
(terraform destroy -lock=false -no-color -auto-approve), here is the error:

Error: deleting EC2 Subnet (subnet-xxx): operation error EC2: DeleteSubnet,
https response error StatusCode: 400, RequestID: xxx, api error DependencyViolation:
The subnet 'subnet-xxx' has dependencies and cannot be deleted.
Solution:
Step 1: Access Cloud Provider Console

Log into your cloud provider console (AWS, GCP, Azure, or Scaleway).
Step 2: Navigate to VPC Section

AWS Example:
  1. Go to VPC Console
  2. Find the VPC associated with your Qovery cluster
  3. Look for the subnet mentioned in the error message
AWS VPC subnet view
Step 3: Attempt to Delete the Resource

Try to delete the failing resource (subnet, security group, etc.):
  1. Select the resource
  2. Click Delete or Actions → Delete
  3. The cloud provider will show what’s blocking the deletion
Step 4: Identify Blocking Resources

Review the blocking resources, typically:
  • Network Interfaces - Check the Type and Description fields
  • NAT Gateways - May be attached to subnets
  • Load Balancers - Can block subnet deletion
  • EC2 Instances - Running or stopped instances
  • Lambda Functions - With VPC configuration
  • RDS Instances - In the VPC
AWS network interfaces blocking deletion
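To see which network interfaces are still attached to the subnet from the error message, you can query the AWS CLI (substitute your subnet ID):
aws ec2 describe-network-interfaces \
  --filters "Name=subnet-id,Values=subnet-xxx" \
  --query "NetworkInterfaces[].[NetworkInterfaceId,Description,Status,Attachment.InstanceId]" \
  --output table
The Description column usually reveals which service owns the interface (load balancer, Lambda, RDS, etc.).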
Step 5: Delete Blocking Resources

Delete any resources that were created outside of Qovery:
Only delete resources you created manually! Don’t delete resources managed by other applications.
  1. Note down which resources are blocking
  2. Delete them from the cloud console
  3. Wait for deletion to complete
Step 6: Retry Cluster Deletion

Return to Qovery and retry the cluster deletion. It should now succeed.
Common culprits: Lambda functions attached to VPC, manually created EC2 instances, or RDS databases not managed by Qovery.
Scenario: You no longer have access to the Qovery platform but need to clean up AWS resources.

Method A: AWS Resource Groups & Tag Editor

Step 1: Access Resource Groups

  1. Log into AWS Console
  2. Go to Resource Groups & Tag Editor service
  3. Click Create Resource Group
AWS Resource Groups search by tag
Step 2: Filter by Qovery Cluster Tag

  1. Choose Tag based group type
  2. Add tag filter:
    • Tag key: ClusterLongId
    • Tag value: Your Qovery cluster ID (found in cluster settings or URL)
  3. Click Preview group resources
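The same tag lookup can be done from the CLI with the Resource Groups Tagging API (replace the cluster ID with your own):
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=ClusterLongId,Values=<your-cluster-id> \
  --query "ResourceTagMappingList[].ResourceARN" \
  --output table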
Step 3: Review and Delete Resources

  1. Review all resources tagged with your cluster ID
  2. Note the resource types (VPC, EC2, RDS, etc.)
  3. Delete resources in this order:
    • EC2 instances
    • Load balancers
    • RDS databases
    • NAT gateways
    • Internet gateways
    • Route tables
    • Subnets
    • Security groups
    • VPC (last)

Method B: AWS CLI Script

Use this bash script to list all resources in a VPC by ID:
#!/bin/bash

# Set your VPC ID
VPC_ID="vpc-xxxxxxxxx"

echo "=== Resources in VPC: $VPC_ID ==="

# EC2 Instances
echo -e "\n=== EC2 Instances ==="
aws ec2 describe-instances \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "Reservations[].Instances[].[InstanceId,Tags[?Key=='Name'].Value|[0],State.Name]" \
  --output table

# Subnets
echo -e "\n=== Subnets ==="
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "Subnets[].[SubnetId,CidrBlock,AvailabilityZone]" \
  --output table

# Security Groups
echo -e "\n=== Security Groups ==="
aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "SecurityGroups[].[GroupId,GroupName]" \
  --output table

# NAT Gateways
echo -e "\n=== NAT Gateways ==="
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=$VPC_ID" \
  --query "NatGateways[].[NatGatewayId,State]" \
  --output table

# Internet Gateways
echo -e "\n=== Internet Gateways ==="
aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=$VPC_ID" \
  --query "InternetGateways[].[InternetGatewayId]" \
  --output table

# Route Tables
echo -e "\n=== Route Tables ==="
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "RouteTables[].[RouteTableId,Associations[0].Main]" \
  --output table

# Load Balancers (ALB/NLB)
echo -e "\n=== Load Balancers (ALB/NLB) ==="
aws elbv2 describe-load-balancers \
  --query "LoadBalancers[?VpcId=='$VPC_ID'].[LoadBalancerName,LoadBalancerArn,Type]" \
  --output table

# Network Interfaces
echo -e "\n=== Network Interfaces ==="
aws ec2 describe-network-interfaces \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "NetworkInterfaces[].[NetworkInterfaceId,Description,Status]" \
  --output table

# VPC Endpoints
echo -e "\n=== VPC Endpoints ==="
aws ec2 describe-vpc-endpoints \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "VpcEndpoints[].[VpcEndpointId,ServiceName,State]" \
  --output table
Usage:
chmod +x list-vpc-resources.sh
./list-vpc-resources.sh
Always double-check you’re deleting the correct resources! Deleting production resources by mistake can cause downtime.
Symptom: You receive one of these errors:
"This account is currently blocked by your cloud provider"
or
"This AWS account is currently blocked and not recognized as a valid account"
Common Causes:
  1. Billing Issues
    • Outstanding payment
    • Credit card expired
    • Payment method declined
  2. Free Tier Restrictions
    • Attempting to deploy in regions not supported by free tier
    • Exceeding free tier limits
  3. Account Compliance Violations
    • Terms of service violations
    • Abuse reports
    • Security issues
Solution:
Step 1: Contact Your Cloud Provider

Qovery cannot resolve account-level blocks. You must contact your cloud provider's support directly.
Step 2: Verify Billing

  1. Check your billing dashboard
  2. Ensure payment method is valid
  3. Resolve any outstanding payments
Step 3: Check Account Status

Review your account status page for:
  • Active incidents
  • Service health issues
  • Account restrictions
Step 4: Retry After Resolution

Once your cloud provider resolves the block:
  1. Wait 5-10 minutes for changes to propagate
  2. Retry your cluster operation in Qovery
Account blocks are typically resolved quickly once you contact your cloud provider. Most issues are billing-related and easy to fix.
Symptom: Cluster creation fails with a permissions error related to AWS SQS when using Karpenter.
Error Message:
Permissions issue. Check your AWS permissions to ensure you have
the necessary authorizations for this action.
Cause: The IAM credentials provided don't have the necessary SQS permissions for Karpenter's interruption handling.
Solution:
1

Verify IAM Policy

  1. Check the official Qovery IAM policy
  2. Compare with your current IAM user/role permissions
  3. Ensure all SQS permissions are included:
    {
      "Effect": "Allow",
      "Action": [
        "sqs:CreateQueue",
        "sqs:DeleteQueue",
        "sqs:GetQueueAttributes",
        "sqs:SetQueueAttributes",
        "sqs:TagQueue"
      ],
      "Resource": "arn:aws:sqs:*:*:qovery*"
    }
    
Step 2: Check AWS Organization SCPs

If you’re using AWS Organizations, Service Control Policies (SCPs) may be blocking permissions:
  1. Go to AWS Policy Simulator
  2. Select your IAM user or role
  3. Choose SQS service
  4. Test these actions:
    • CreateQueue
    • DeleteQueue
    • GetQueueAttributes
    • SetQueueAttributes
    • TagQueue
  5. For resource, use: arn:aws:sqs:::qovery*
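If you prefer the CLI over the web simulator, aws iam simulate-principal-policy runs a similar check against the policies attached to the principal (note that SCP evaluation may differ from the console simulator; substitute your user or role ARN):
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::<account-id>:user/<qovery-user> \
  --action-names sqs:CreateQueue sqs:DeleteQueue sqs:GetQueueAttributes sqs:SetQueueAttributes sqs:TagQueue \
  --query "EvaluationResults[].[EvalActionName,EvalDecision]" \
  --output table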
Step 3: Interpret Results

Scenario 1: Permissions Allowed (with SCPs enabled)
AWS SQS permissions allowed
If permissions show as allowed with SCPs enabled, SCPs are not blocking the action; the IAM policy attached to the credentials used by Qovery is the more likely culprit (see Step 4).
Scenario 2: Permissions Denied (with SCPs enabled but allowed when disabled)
AWS SQS permissions denied
If denied with SCPs but allowed without:
  • SCP is blocking Qovery access
  • Contact your AWS administrator
  • Request SQS permissions for Qovery resources
Step 4: Update IAM Policy

If using static credentials:
  1. Go to IAM Console
  2. Find your Qovery IAM user
  3. Update attached policies to include SQS permissions
  4. Retry cluster creation
If using STS Assume Role:
  1. Update the CloudFormation stack
  2. Add missing SQS permissions to the role
  3. Wait for stack update to complete
  4. Retry cluster creation
Karpenter uses SQS to handle EC2 spot instance interruption notifications. Without these permissions, the cluster cannot properly handle node lifecycle events.
Symptoms:
  • Cluster shows “Deploying” for more than 45 minutes
  • No progress in deployment logs
Possible Causes:
  • AWS service quotas exceeded
  • Region capacity issues
  • Network connectivity problems
  • Invalid cluster configuration
Solutions:
  1. Check Cluster Logs:
    • Go to Cluster Settings → Logs
    • Look for specific error messages
  2. Verify Service Quotas:
    • Check AWS Service Quotas for EC2, VPC, ELB (see the CLI example after this list)
    • Request increases if needed
  3. Try Different Region:
    • Some regions may have capacity issues
    • Try deploying to an alternative region
  4. Contact Support:
    • If issue persists > 1 hour, contact support
    • Provide cluster ID and deployment logs
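For the service-quota check in step 2, the AWS CLI can list your current limits (ec2 and vpc are shown as example service codes):
aws service-quotas list-service-quotas --service-code ec2 \
  --query "Quotas[].[QuotaName,Value]" --output table

aws service-quotas list-service-quotas --service-code vpc \
  --query "Quotas[].[QuotaName,Value]" --output table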

Need More Help?

If you don't find what you need in this troubleshooting guide, contact the Qovery team for help.