Troubleshooting Guide
This chapter covers every issue encountered during the real GCP deployment plus common problems reported by the community. Start from the top and work down.
Quick Diagnostic: Run This First
# 1. Are SigNoz containers up?
docker compose -p signoz ps
# 2. Are ports listening?
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b'
# 3. Does SigNoz respond locally?
curl -I http://localhost:3301
curl http://localhost:3301/api/v1/health
# 4. Is the OTEL Collector running?
sudo systemctl status otelcol-contrib
# 5. Are your apps running?
pm2 list
Problem: SigNoz Dashboard Not Opening in Browser
Symptom: You go to http://YOUR_SERVER_IP:3301 and get nothing: connection refused or a timeout.
Check 1: Is SigNoz actually running?
curl -I http://localhost:3301
- If this returns 200 OK → SigNoz is running. The problem is the firewall (see below).
- If this returns nothing or Connection refused → the SigNoz containers are not up. See container troubleshooting.
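The two outcomes above can be told apart in one step by asking curl for only the status code. This is a minimal sketch: classify_http_code is a hypothetical helper name, and it relies on curl printing 000 for %{http_code} when no HTTP response was received.

```shell
# Map an HTTP status code from the local SigNoz UI to the next troubleshooting step.
# classify_http_code is an illustrative helper, not part of SigNoz.
classify_http_code() {
  case "$1" in
    200) echo "running: check the firewall" ;;
    000) echo "not reachable: check the containers" ;;
    *)   echo "unexpected status $1: check the signoz container logs" ;;
  esac
}

# Usage on the server (curl reports 000 when the connection fails):
# classify_http_code "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:3301)"
```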
Check 2: GCP Firewall Rule Missing
This was the most common issue in our GCP deployment. The service is healthy locally but unreachable from outside.
Fix (GCP Console):
- Go to VPC Network → Firewall
- Click Create Firewall Rule
- Settings:
  - Name: allow-signoz-ui
  - Direction: Ingress
  - Action: Allow
  - Targets: All instances (or tag-based if your VM has a network tag)
  - Source IP ranges: 0.0.0.0/0 (or your team's IP range for security)
  - Ports: TCP 3301
- Save. Wait 30 seconds and try again.
For production, prefer your office/VPN CIDR over 0.0.0.0/0.
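The same rule can also be created from the command line. The sketch below assumes the gcloud CLI is installed and authenticated for your project; the function name signoz_firewall_cmd and the example range 203.0.113.0/24 (a documentation-only range) are placeholders. It only prints the command so you can review it before running it.

```shell
# Print a gcloud command that opens TCP 3301 to a single source range.
# Pass your office/VPN CIDR as the first argument.
signoz_firewall_cmd() {
  cidr="${1:?usage: signoz_firewall_cmd CIDR}"
  printf '%s\n' "gcloud compute firewall-rules create allow-signoz-ui --direction=INGRESS --action=ALLOW --rules=tcp:3301 --source-ranges=$cidr"
}

# Example with a placeholder range; replace it with your own.
signoz_firewall_cmd 203.0.113.0/24
```

If your VM uses a network tag, appending --target-tags=YOUR_TAG to the printed command limits the rule to tagged instances.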
Fix (AWS Security Group):
- EC2 → Security Groups → your instance's group
- Inbound Rules → Add Rule
- Type: Custom TCP, Port: 3301, Source: your IP
Check 3: UFW Local Firewall Blocking
sudo ufw status
If active, run:
sudo ufw allow 3301/tcp
sudo ufw reload
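To decide quickly whether UFW is involved at all, the status output can be summarized. This is a rough sketch that assumes the usual ufw status layout (a "Status:" line followed by rule rows); ufw_3301_state is a hypothetical helper name.

```shell
# Read `ufw status` output on stdin and report whether UFW could be blocking TCP 3301.
# ufw_3301_state is an illustrative helper, not a ufw subcommand.
ufw_3301_state() {
  awk '
    /^Status: inactive/ { inactive = 1 }
    /^Status: active/   { active = 1 }
    ($1 == "3301/tcp" || $1 == "3301") && $2 == "ALLOW" { allowed = 1 }
    END {
      if (inactive)      print "ufw inactive: not the cause"
      else if (!active)  print "no ufw status found"
      else if (allowed)  print "3301 allowed: not the cause"
      else               print "3301 not allowed: run sudo ufw allow 3301/tcp"
    }'
}

# Usage on the server:
# sudo ufw status | ufw_3301_state
```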
Check 4: Test via SSH Tunnel
This isolates whether the issue is the app or the network:
# Run this on YOUR LAPTOP (not the server)
ssh -L 3301:localhost:3301 username@YOUR_VM_IP
Then open http://localhost:3301 in your browser. If it works → firewall issue. If it doesn't → app issue.
Problem: SigNoz Containers Won't Start
Symptom: docker compose -p signoz ps shows containers as Exit 1 or Restarting.
Check container logs:
docker compose -p signoz logs signoz --tail=50
docker compose -p signoz logs clickhouse --tail=50
docker compose -p signoz logs otel-collector --tail=50
Common cause: Port already in use
sudo ss -ltnp | grep ':8080' # Is 8080 taken?
sudo ss -ltnp | grep ':4317' # Is 4317 taken?
If a port is taken, edit docker-compose.yaml and change the host-side port:
# Change this:
- "8080:8080"
# To this (using free port 3301):
- "3301:8080"
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
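Before editing the port mapping, it can help to check all candidate ports in one pass. The sketch below matches on the exact Local Address:Port column of ss -ltn output, assuming the standard column order; port_listening is a hypothetical helper name.

```shell
# Succeeds if `ss -ltn` output on stdin shows a listener on the given port.
# Anchoring on the end of the Local Address:Port column avoids a plain
# grep ':4317' also matching a longer port such as 43170.
port_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }'
}

# Usage on the server: report which ports are already taken.
# for p in 8080 4317 4318; do
#   ss -ltn | port_listening "$p" && echo "port $p is in use"
# done
```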
Common cause: Not enough memoryβ
free -h
SigNoz + ClickHouse needs ~2β3 GB free RAM. If available is less than 1.5 GB, ClickHouse will crash.
Fix: Stop other containers temporarily:
docker stop grafana prometheus # example
docker compose -p signoz up -d
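The 1.5 GB threshold mentioned above can be checked automatically before starting the stack. This sketch assumes a procps version of free that prints an "available" column as the seventh field of the Mem: row; both function names are hypothetical.

```shell
# Read `free -m` output on stdin and print the "available" value of the Mem: row (MiB).
available_mb() {
  awk '$1 == "Mem:" { print $7 }'
}

# Succeeds when at least 1536 MiB (the 1.5 GB threshold quoted above) is available.
enough_memory_for_clickhouse() {
  mb="$(available_mb)"
  [ -n "$mb" ] && [ "$mb" -ge 1536 ]
}

# Usage on the server:
# free -m | enough_memory_for_clickhouse || echo "Low memory: stop other containers first"
```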
Problem: No Traces Showing in SigNoz
Symptom: Apps are running, dashboard is open, but no data appears in the Services tab.
Check 1: Is tracing.js actually loading?
pm2 logs healthtune_dev_api --lines 30
Look for lines like:
@opentelemetry/sdk-node starting up
If missing → tracing.js is not being loaded. Make sure your PM2 start command includes -r tracing.js:
pm2 start "node -r /home/youruser/healthtune_api/tracing.js dist/main.js" --name healthtune_dev_api
Use your real app path in the command if different.
Check 2: Is the OTEL Collector receiving data?
sudo journalctl -u otelcol-contrib -n 50
Look for errors like connection refused (wrong endpoint) or failed to export (SigNoz unreachable).
Check 3: Is the collector endpoint correct?
In your tracing.js:
url: 'http://localhost:4317'
If SigNoz is on a different server, replace localhost with that server's IP:
url: 'http://34.31.206.197:4317'
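Whichever endpoint you configure, it is worth confirming that the port is reachable from the machine running your app. The sketch below is one way to do that, assuming bash and the coreutils timeout command are available; endpoint_host_port and tcp_open are hypothetical helper names.

```shell
# Split an OTLP endpoint URL such as http://localhost:4317 into "host port".
# Illustrative helper; it expects an explicit port in the URL.
endpoint_host_port() {
  hostport="${1#*://}"        # drop the scheme
  hostport="${hostport%%/*}"  # drop any trailing path
  printf '%s %s\n' "${hostport%:*}" "${hostport##*:}"
}

# Succeeds if a TCP connection to host:port opens within 3 seconds.
# Uses bash's /dev/tcp redirection inside an explicit bash -c call.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from the machine that runs your app:
# set -- $(endpoint_host_port 'http://localhost:4317')
# tcp_open "$1" "$2" && echo "collector port reachable" || echo "cannot reach $1:$2"
```

An open port only shows that something is listening; it does not prove the listener is the collector.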
Check 4: Generate actual traffic
Traces only appear when your API receives requests:
curl http://localhost:3000/
curl http://localhost:3000/api/users
Wait 30-60 seconds, then check the SigNoz dashboard.
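A couple of single requests can be easy to miss in the UI, so a short burst is sometimes more convincing. This is a small sketch; generate_traffic is a hypothetical helper, and the URL should be a route your API actually serves.

```shell
# Send COUNT requests to URL and print one HTTP status code per request.
# generate_traffic is an illustrative helper; COUNT defaults to 20.
generate_traffic() {
  url="${1:?usage: generate_traffic URL [COUNT]}"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$url"
    i=$((i + 1))
  done
}

# Usage on the server (summarizes how many responses had each status code):
# generate_traffic http://localhost:3000/api/users 20 | sort | uniq -c
```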
Problem: No Logs Showing in SigNoz
Symptom: Traces work but logs are empty.
Check 1: Permissions on PM2 log files
sudo journalctl -u otelcol-contrib | grep "permission denied"
If you see permission errors:
sudo apt install -y acl
sudo setfacl -m u:otelcol-contrib:r /home/youruser/.pm2/logs/*
sudo systemctl restart otelcol-contrib
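One detail the commands above do not cover: read access on the files only helps if the otelcol-contrib user can also traverse every directory above them, and home directories are often created without that access for other users. The sketch below lists the directories involved; ancestors is a hypothetical helper name, and whether you need it depends on your directory modes.

```shell
# Print every directory on the way to an absolute PATH, top down, one per line.
ancestors() {
  case "$1" in
    /*) ;;
    *) echo "ancestors: absolute path required" >&2; return 1 ;;
  esac
  path="${1%/}"
  out=""
  while [ -n "$path" ]; do
    out="$path
$out"
    path="${path%/*}"
  done
  printf '%s' "$out"
}

# Usage on the server: grant traverse on each directory, read+list on the logs
# directory, then verify as the collector user.
# ancestors /home/youruser/.pm2/logs | while read -r d; do sudo setfacl -m u:otelcol-contrib:x "$d"; done
# sudo setfacl -m u:otelcol-contrib:rx /home/youruser/.pm2/logs
# sudo -u otelcol-contrib ls /home/youruser/.pm2/logs
```

Log files that PM2 creates later may not carry the file ACL; a default ACL on the logs directory (setfacl -d) is one way to cover them.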
Check 2: Correct log file paths in collector config
cat /etc/otelcol-contrib/config.yaml | grep "include" -A 3
Verify these paths actually exist:
ls -la /home/youruser/.pm2/logs/
Problem: "cannot create agent without orgId" Errors in Logs
Symptom: docker compose -p signoz logs shows repeated messages:
cannot create agent without orgId
Server returned an error response
This is a known issue with the OpAMP-based agent management in SigNoz. It usually means the first admin user has not been created yet.
Fix:
- Open the SigNoz dashboard in your browser
- Create the first admin account
- Watch the logs; the error should stop
If it continues after creating the account, it may be a version-specific bug. It does not block basic functionality (traces, logs, metrics still work).
Problem: High Memory Usage / Server Slowing Down
Symptom: free -h shows very little available RAM. Other services become slow.
# Check what's using memory
docker stats --no-stream
free -h
Short-term fix: Reduce ClickHouse memory limit.
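Edit docker-compose.yaml and find the ClickHouse service. Add:
environment:
  - CLICKHOUSE_MAX_SERVER_MEMORY_USAGE_RATIO=0.3 # use max 30% of RAM
This variable only has an effect if your ClickHouse configuration reads it; the underlying server setting is max_server_memory_usage_to_ram_ratio.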
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
Long-term fix: Upgrade VM to at least 8 GB RAM if running SigNoz alongside other services.
Problem: PM2 Process Shows "Errored" or "Stopped"
pm2 list # check status
pm2 logs # see why it failed
pm2 restart healthtune_dev_api
If it keeps crashing, check for missing dependencies:
cd /home/youruser/healthtune_api
yarn install # reinstall if node_modules was deleted
Replace /home/youruser/healthtune_api with your app directory.
Problem: ClickHouse Unhealthy / Migration Errors on Startup
Symptom: During first startup, you see migration retry messages. ClickHouse shows unhealthy.
This is normal on first boot: ClickHouse runs database schema migrations, which can take 1-3 minutes. Wait, then check again:
# Wait 2 minutes, then:
docker compose -p signoz ps
If ClickHouse is still unhealthy after 5 minutes:
docker compose -p signoz logs clickhouse --tail=100
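Instead of checking by hand, the wait can be scripted. This is a sketch under assumptions: wait_until and signoz_settled are hypothetical helpers, and signoz_settled relies on docker compose ps showing "(unhealthy)" or "(health: starting)" in its status column. It also assumes the stack has been started, since an empty ps output counts as settled.

```shell
# Re-run a command every INTERVAL seconds until it succeeds or TIMEOUT seconds pass.
# wait_until is an illustrative helper, not a Docker or SigNoz command.
wait_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}

# Succeeds once no service in the project is reported as unhealthy or still starting.
signoz_settled() {
  ! docker compose -p signoz ps 2>/dev/null | grep -Eq 'unhealthy|health: starting'
}

# Usage on the server: wait up to 5 minutes, checking every 10 seconds,
# and print the ClickHouse logs only if the stack never settles.
# wait_until 300 10 signoz_settled || docker compose -p signoz logs clickhouse --tail=100
```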
Rollback: How to Completely Remove SigNoz
If SigNoz is causing problems (memory, port conflicts, etc.) and you need to remove it:
cd ~/signoz/deploy/docker
# Stop and remove all SigNoz containers + volumes (data will be deleted)
docker compose -p signoz down -v
# Optionally remove the repository
rm -rf ~/signoz
Warning: -v removes all data volumes. Your ClickHouse data (all traces, logs, metrics) will be permanently deleted. If you want to keep the data, omit the -v flag.
Full Diagnostic Script
Save this as signoz-check.sh and run it anytime:
#!/bin/bash
echo "=== SigNoz Container Status ==="
docker compose -p signoz ps 2>/dev/null || echo "SigNoz not running via compose"
echo ""
echo "=== Port Listeners ==="
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b' || echo "No SigNoz ports listening"
echo ""
echo "=== Local Health Check ==="
curl -s http://localhost:3301/api/v1/health || echo "SigNoz not responding"
echo ""
echo "=== OTEL Collector Status ==="
sudo systemctl status otelcol-contrib --no-pager | head -10
echo ""
echo "=== Memory ==="
free -h
echo ""
echo "=== PM2 Processes ==="
pm2 list 2>/dev/null || echo "PM2 not available"
Make it executable and run it:
chmod +x signoz-check.sh
./signoz-check.sh
Official Documentation Links
- SigNoz troubleshooting docs
- SigNoz FAQ
- SigNoz community Slack
- OpenTelemetry troubleshooting overview
- OpenTelemetry SDK environment variables
Read in Sequence
- Previous: 6-production-integration.md
- Next: 8-future-improvements.md