🔧 Troubleshooting Guide
This chapter covers every issue encountered during the real GCP deployment plus common problems reported by the community. Start from the top and work down.
🩺 Quick Diagnostic — Run This First
# 1. Are SigNoz containers up?
docker compose -p signoz ps
# 2. Are ports listening?
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b'
# 3. Does SigNoz respond locally?
curl -I http://localhost:3301
curl http://localhost:3301/api/v1/health
# 4. Is the OTEL Collector running?
sudo systemctl status otelcol-contrib
# 5. Are your apps running?
pm2 list
❌ Problem: SigNoz Dashboard Not Opening in Browser
Symptom: You go to http://YOUR_SERVER_IP:3301 and get nothing — connection refused or timeout.
Check 1 — Is SigNoz actually running?
curl -I http://localhost:3301
- If this returns 200 OK → SigNoz is running. The problem is the firewall (see below).
- If this returns nothing or Connection refused → SigNoz containers are not up. See container troubleshooting.
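The two outcomes above can be folded into a small helper. This is a minimal sketch, not part of SigNoz: classify_ui_status is a hypothetical function name, and it relies on curl's %{http_code} write-out variable, which reports 000 when no HTTP response was received.

```shell
# Hypothetical helper: map the HTTP status code of the local UI check to a next step.
classify_ui_status() {
  case "$1" in
    2??|3??) echo "SigNoz is running: check the firewall" ;;
    ""|000)  echo "No response: check the SigNoz containers" ;;
    *)       echo "Unexpected HTTP $1: check the SigNoz logs" ;;
  esac
}

# On the server, feed it the status code of the local request:
# classify_ui_status "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:3301)"
```

Treating 3xx as "running" is a design choice made here: a redirect still shows that the UI process answered.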
Check 2 — GCP Firewall Rule Missing
This was the most common issue in our GCP deployment. The service is healthy locally but unreachable from outside.
Fix (GCP Console):
- Go to VPC Network → Firewall
- Click Create Firewall Rule
- Settings:
  - Name: allow-signoz-ui
  - Direction: Ingress
  - Action: Allow
  - Targets: All instances (or tag-based if your VM has a network tag)
  - Source IP ranges: 0.0.0.0/0 (or your team's IP range for security)
  - Ports: TCP 3301
- Save. Wait 30 seconds and try again.
For production, prefer your office/VPN CIDR over 0.0.0.0/0.
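The same rule can also be created from the command line. The sketch below assumes the gcloud CLI is installed and authenticated and that the VM sits on the default VPC network (add --network otherwise). signoz_firewall_cmd is a hypothetical helper that only prints the command so it can be reviewed before running, and 203.0.113.0/24 is a documentation placeholder to replace with your office or VPN CIDR.

```shell
# Hypothetical helper: print (do not run) a gcloud command matching the console settings.
# $1 = source CIDR; the default 203.0.113.0/24 is a placeholder, not a real office range.
signoz_firewall_cmd() {
  printf '%s\n' "gcloud compute firewall-rules create allow-signoz-ui --direction=INGRESS --action=ALLOW --rules=tcp:3301 --source-ranges=${1:-203.0.113.0/24}"
}

# Review the printed command, then paste it into your shell:
signoz_firewall_cmd 203.0.113.0/24
```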
Fix (AWS Security Group):
- EC2 → Security Groups → your instance's group
- Inbound Rules → Add Rule
- Type: Custom TCP, Port: 3301, Source: your IP
Check 3 — UFW Local Firewall Blocking
sudo ufw status
If active, run:
sudo ufw allow 3301/tcp
sudo ufw reload
Check 4 — Test via SSH Tunnel
This isolates whether the issue is the app or the network:
# Run this on YOUR LAPTOP (not the server)
ssh -L 3301:localhost:3301 username@YOUR_VM_IP
Then open http://localhost:3301 in your browser. If it works → firewall issue. If it doesn't → app issue.
❌ Problem: SigNoz Containers Won't Start
Symptom: docker compose -p signoz ps shows containers as Exit 1 or Restarting.
Check container logs:
docker compose -p signoz logs signoz --tail=50
docker compose -p signoz logs clickhouse --tail=50
docker compose -p signoz logs otel-collector --tail=50
Common cause: Port already in use
sudo ss -ltnp | grep ':8080' # Is 8080 taken?
sudo ss -ltnp | grep ':4317' # Is 4317 taken?
If a port is taken, edit docker-compose.yaml and change the host-side port:
# Change this:
- "8080:8080"
# To this (using free port 3301):
- "3301:8080"
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
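Before editing the port mapping, it can help to confirm which host ports are actually free. The following is a rough sketch: port_in_use is a hypothetical helper that reads ss -ltn output on standard input and assumes the local address is in the fourth column, as in current iproute2 output.

```shell
# Hypothetical helper: succeed if PORT appears as a local listening port.
# Reads `ss -ltn` output on stdin, so it can be exercised without root.
port_in_use() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}

# Example: report the first free host port among a few candidates.
# for p in 8080 3301 3302; do
#   ss -ltn | port_in_use "$p" || { echo "free: $p"; break; }
# done
```

Splitting on the last colon keeps the check working for IPv6 listeners such as [::]:4317.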
Common cause: Not enough memory
free -h
SigNoz + ClickHouse need roughly 2–3 GB of free RAM. If the available column shows less than 1.5 GB, ClickHouse is likely to crash.
Fix: Stop other containers temporarily:
docker stop grafana prometheus # example
docker compose -p signoz up -d
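The memory check can be scripted so it runs before every start. This is a sketch under stated assumptions: both function names are hypothetical, the 1536 MB threshold is the 1.5 GB figure from the text, and the value comes from the MemAvailable line of /proc/meminfo (Linux only), which is what the available column of free reports.

```shell
# Hypothetical helpers. $1 is a meminfo-format file; defaults to /proc/meminfo.
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeeds when at least 1536 MB (1.5 GB) is available.
enough_memory_for_signoz() {
  [ "$(mem_available_mb "$1")" -ge 1536 ]
}

# enough_memory_for_signoz || echo "Free up memory before starting SigNoz"
```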
❌ Problem: No Traces Showing in SigNoz
Symptom: Apps are running and the dashboard is open, but no data appears in the Services tab.
Check 1 — Is tracing.js actually loading?
pm2 logs healthtune_dev_api --lines 30
Look for lines like:
@opentelemetry/sdk-node starting up
If missing → tracing.js is not being loaded. Make sure your PM2 start command includes -r tracing.js:
pm2 start "node -r /home/youruser/healthtune_api/tracing.js dist/main.js" --name healthtune_dev_api
Use your real app path in the command if different.
Check 2 — Is the OTEL Collector receiving data?
sudo journalctl -u otelcol-contrib -n 50
Look for errors like connection refused (wrong endpoint) or failed to export (SigNoz unreachable).
Check 3 — Is the collector endpoint correct?
In your tracing.js:
url: 'http://localhost:4317'
If SigNoz is on a different server, replace localhost with that server's IP:
url: 'http://34.31.206.197:4317'
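A quick way to test the endpoint from the app server is to check whether the TCP port accepts connections at all. The sketch below is an assumption-laden helper, not an OpenTelemetry tool: otlp_port_open is a hypothetical name, and it depends on bash's /dev/tcp pseudo-device and the coreutils timeout command being available. A successful connection only shows that the port is open, not that OTLP export works.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Host and port are passed as arguments to the inner shell rather than interpolated.
otlp_port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# otlp_port_open localhost 4317 && echo "collector port reachable" || echo "collector port closed or filtered"
```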
Check 4 — Generate actual traffic
Traces only appear when your API receives requests:
curl http://localhost:3000/
curl http://localhost:3000/api/users
Wait 30–60 seconds, then check the SigNoz dashboard.
❌ Problem: No Logs Showing in SigNoz
Symptom: Traces work but logs are empty.
Check 1 — Permissions on PM2 log files
sudo journalctl -u otelcol-contrib | grep "permission denied"
If you see permission errors:
sudo apt install -y acl
sudo setfacl -m u:otelcol-contrib:r /home/youruser/.pm2/logs/*
sudo systemctl restart otelcol-contrib
If the errors persist, check the parent directories as well: the collector user needs execute (traverse) permission on /home/youruser and /home/youruser/.pm2, and read plus execute on /home/youruser/.pm2/logs, before it can reach the files. Log files that PM2 creates later do not inherit an ACL set on existing files, so the setfacl command may need to be rerun after new apps are added.
Check 2 — Correct log file paths in collector config
grep -A 3 "include" /etc/otelcol-contrib/config.yaml
Verify these paths actually exist:
ls -la /home/youruser/.pm2/logs/
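Comparing the configured paths against the directory listing by eye is error-prone, so the check can be automated roughly as follows. Both function names are hypothetical, and the parsing is a line-based sketch that assumes include: is written as a simple block-style YAML list. It is not a YAML parser.

```shell
# Hypothetical helper: print the list entries under every `include:` key.
list_include_globs() {
  awk '
    /^[ \t]*include:[ \t]*$/ { in_list = 1; next }
    in_list && /^[ \t]*-[ \t]+/ { sub(/^[ \t]*-[ \t]+/, ""); print; next }
    { in_list = 0 }
  ' "$1" | tr -d "\"'"
}

# Report patterns that match no file.
# $pattern is left unquoted on purpose so the shell expands the glob.
check_include_globs() {
  list_include_globs "$1" | while IFS= read -r pattern; do
    ls -d $pattern >/dev/null 2>&1 || echo "no files match: $pattern"
  done
}

# check_include_globs /etc/otelcol-contrib/config.yaml
```

Run it as the collector user (for example through sudo -u otelcol-contrib) to catch permission problems as well as wrong paths. Paths that contain spaces are not handled by this sketch.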
❌ Problem: "cannot create agent without orgId" Errors in Logs
Symptom: docker compose -p signoz logs shows repeated messages:
cannot create agent without orgId
Server returned an error response
This is a known issue with the OpAMP-based agent management in SigNoz. It usually means the first admin user has not been created yet.
Fix:
- Open the SigNoz dashboard in your browser
- Create the first admin account
- Watch logs — the error should stop
If it continues after creating the account, it may be a version-specific bug. It does not block basic functionality (traces, logs, metrics still work).
❌ Problem: High Memory Usage / Server Slowing Down
Symptom: free -h shows very little available RAM. Other services become slow.
# Check what's using memory
docker stats --no-stream
free -h
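To see at a glance which containers are the heaviest, the docker stats output can be sorted by memory share. A minimal sketch: top_memory_containers is a hypothetical helper that expects "name percent" lines on standard input, as produced by the --format template shown in the usage comment.

```shell
# Hypothetical helper: read "name percent" lines on stdin, print the N heaviest (default 3).
# LC_ALL=C keeps the decimal point parsing independent of the system locale.
top_memory_containers() {
  LC_ALL=C sort -k2,2 -rn | head -n "${1:-3}"
}

# docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | top_memory_containers 3
```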
Short-term fix: Reduce ClickHouse memory limit.
Edit docker-compose.yaml and find the ClickHouse service. Add:
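environment:
  - CLICKHOUSE_MAX_SERVER_MEMORY_USAGE_RATIO=0.3 # intended cap: 30% of RAM
After restarting, confirm with docker stats that the cap takes effect. If it does not, your ClickHouse image may not read this variable; the ClickHouse server setting for this cap is max_server_memory_usage_to_ram_ratio, which is set in the server configuration file.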
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
Long-term fix: Upgrade VM to at least 8 GB RAM if running SigNoz alongside other services.
❌ Problem: PM2 Process Shows "Errored" or "Stopped"
pm2 list # check status
pm2 logs # see why it failed
pm2 restart healthtune_dev_api
If it keeps crashing, check for missing dependencies:
cd /home/youruser/healthtune_api
yarn install # reinstall if node_modules was deleted
Replace /home/youruser/healthtune_api with your app directory.
❌ Problem: ClickHouse Unhealthy / Migration Errors on Startup
Symptom: During first startup, you see migration retry messages. ClickHouse shows unhealthy.
This is normal on first boot — ClickHouse runs database schema migrations which can take 1–3 minutes. Wait and then check again:
# Wait 2 minutes, then:
docker compose -p signoz ps
If ClickHouse is still unhealthy after 5 minutes:
docker compose -p signoz logs clickhouse --tail=100
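Rather than watching the clock, the wait can be scripted with a generic polling loop. This is a sketch: wait_until is a hypothetical helper, and the usage line assumes the compose service is named clickhouse (as in the log commands above) and that its status column shows "(healthy)" once the container's health check passes.

```shell
# Hypothetical helper: retry a command until it succeeds or TIMEOUT seconds have passed.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS command [args...]
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Poll every 10 seconds for up to 5 minutes:
# wait_until 300 10 sh -c 'docker compose -p signoz ps clickhouse | grep -q "(healthy)"' \
#   || docker compose -p signoz logs clickhouse --tail=100
```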
🔄 Rollback — How to Completely Remove SigNoz
If SigNoz is causing problems (memory, port conflicts, etc.) and you need to remove it:
cd ~/signoz/deploy/docker
# Stop and remove all SigNoz containers + volumes (data will be deleted)
docker compose -p signoz down -v
# Optionally remove the repository
rm -rf ~/signoz
⚠️ -v removes all data volumes. Your ClickHouse data (all traces, logs, metrics) will be permanently deleted. If you want to keep data, remove the -v flag.
📋 Full Diagnostic Script
Save this as signoz-check.sh and run it anytime:
#!/bin/bash
echo "=== SigNoz Container Status ==="
docker compose -p signoz ps 2>/dev/null || echo "SigNoz not running via compose"
echo ""
echo "=== Port Listeners ==="
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b' || echo "No SigNoz ports listening"
echo ""
echo "=== Local Health Check ==="
curl -s http://localhost:3301/api/v1/health || echo "SigNoz not responding"
echo ""
echo "=== OTEL Collector Status ==="
sudo systemctl status otelcol-contrib --no-pager | head -10
echo ""
echo "=== Memory ==="
free -h
echo ""
echo "=== PM2 Processes ==="
pm2 list 2>/dev/null || echo "PM2 not available"
Make it executable and run it:
chmod +x signoz-check.sh
./signoz-check.sh
Official Documentation Links
- SigNoz troubleshooting docs
- SigNoz FAQ
- SigNoz community Slack
- OpenTelemetry troubleshooting overview
- OpenTelemetry SDK environment variables
Read in Sequence
- Previous: 6-production-integration.md
- Next: 8-future-improvements.md