Troubleshooting Guide
This chapter covers every issue encountered during the real GCP deployment plus common problems reported by the community. Start from the top and work down.
Quick Diagnostic: Run This First
# 1. Are SigNoz containers up?
docker compose -p signoz ps
# 2. Are ports listening?
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b'
# 3. Does SigNoz respond locally?
curl -I http://localhost:3301
curl http://localhost:3301/api/v1/health
# 4. Is the OTEL Collector running?
sudo systemctl status otelcol-contrib
# 5. Are your apps running?
pm2 list
Problem: SigNoz Dashboard Not Opening in Browser
Symptom: You open http://YOUR_SERVER_IP:3301 and get nothing: connection refused or a timeout.
Check 1: Is SigNoz actually running?
curl -I http://localhost:3301
- If this returns 200 OK, SigNoz is running. The problem is the firewall (see below).
- If this returns nothing or Connection refused, the SigNoz containers are not up. See container troubleshooting.
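The two outcomes above can be told apart from the HTTP status code alone. The sketch below is one way to script that decision; the function name classify_ui_status is illustrative, and it relies on curl printing 000 as the status code when no connection could be made.

```shell
#!/usr/bin/env bash
# Map the HTTP status code of the SigNoz UI to the next troubleshooting step.
# classify_ui_status is a hypothetical helper name, not part of SigNoz.
classify_ui_status() {
  case "$1" in
    200)    echo "SigNoz is running: check the firewall" ;;
    000|"") echo "no response: check the SigNoz containers" ;;
    *)      echo "unexpected HTTP status $1" ;;
  esac
}

# Usage on the server (curl prints 000 when the connection fails):
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:3301)
#   classify_ui_status "$code"
```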
Check 2: GCP Firewall Rule Missing
This was the most common issue in our GCP deployment. The service is healthy locally but unreachable from outside.
Fix (GCP Console):
- Go to VPC Network → Firewall
- Click Create Firewall Rule
- Settings:
  - Name: allow-signoz-ui
  - Direction: Ingress
  - Action: Allow
  - Targets: All instances (or tag-based if your VM has a network tag)
  - Source IP ranges: 0.0.0.0/0 (or your team's IP range for security)
  - Ports: TCP 3301
- Save. Wait 30 seconds and try again.
For production, prefer your office/VPN CIDR over 0.0.0.0/0.
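The same rule can also be created with the gcloud CLI instead of the console. This is a sketch under assumptions: it targets the default network, applies to all instances, and the helper name signoz_firewall_cmd is made up for illustration. The CIDR in the usage line is a documentation placeholder. The helper only prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Print (do not run) a gcloud command that mirrors the console settings above.
# signoz_firewall_cmd is a hypothetical helper; pass the source CIDR to allow.
signoz_firewall_cmd() {
  local source_range="$1"
  if [ "$source_range" = "0.0.0.0/0" ]; then
    echo "warning: 0.0.0.0/0 exposes the UI to the whole internet" >&2
  fi
  printf '%s\n' "gcloud compute firewall-rules create allow-signoz-ui --direction=INGRESS --action=ALLOW --rules=tcp:3301 --source-ranges=${source_range}"
}

# Usage: review the printed command, then run it where gcloud is authenticated.
#   signoz_firewall_cmd 203.0.113.0/24
```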
Fix (AWS Security Group):
- EC2 → Security Groups → your instance's group
- Inbound Rules → Add Rule
- Type: Custom TCP, Port: 3301, Source: your IP
Check 3: UFW Local Firewall Blocking
sudo ufw status
If active, run:
sudo ufw allow 3301/tcp
sudo ufw reload
Check 4: Test via SSH Tunnel
This isolates whether the issue is the app or the network:
# Run this on YOUR LAPTOP (not the server)
ssh -L 3301:localhost:3301 username@YOUR_VM_IP
Then open http://localhost:3301 in your browser. If it works, the firewall is the problem. If it doesn't, the app is.
Problem: SigNoz Containers Won't Start
Symptom: docker compose -p signoz ps shows containers as Exit 1 or Restarting.
Check container logs:
docker compose -p signoz logs signoz --tail=50
docker compose -p signoz logs clickhouse --tail=50
docker compose -p signoz logs otel-collector --tail=50
Common cause: Port already in use
sudo ss -ltnp | grep ':8080' # Is 8080 taken?
sudo ss -ltnp | grep ':4317' # Is 4317 taken?
If a port is taken, edit docker-compose.yaml and change the host-side port:
# Change this:
- "8080:8080"
# To this (using free port 3301):
- "3301:8080"
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
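Before settling on a replacement host port, it helps to confirm the candidate is actually free. A minimal sketch, assuming the usual `ss -ltn` column layout where the fourth field is the local address and port; port_in_use is an illustrative name.

```shell
#!/usr/bin/env bash
# Read `ss -ltn` output on stdin; succeed if PORT is a local listening port.
# port_in_use is a hypothetical helper name.
port_in_use() {
  local port="$1"
  # Field 4 looks like 0.0.0.0:8080 or [::]:8080; the port is the last ':'-separated part.
  awk -v p="$port" '
    NR > 1 { n = split($4, parts, ":"); if (parts[n] == p) found = 1 }
    END    { exit(found ? 0 : 1) }
  '
}

# Usage:
#   if ss -ltn | port_in_use 3301; then echo "3301 is taken"; else echo "3301 is free"; fi
```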
Common cause: Not enough memory
free -h
SigNoz + ClickHouse needs roughly 2 to 3 GB of free RAM. If available memory is below 1.5 GB, ClickHouse will crash.
Fix: Stop other containers temporarily:
docker stop grafana prometheus # example
docker compose -p signoz up -d
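The 1.5 GB threshold can also be checked from a script before starting the stack. The sketch below reads MemAvailable from /proc/meminfo, so it is Linux only; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Threshold from the text above: below about 1.5 GB available, ClickHouse is likely to crash.
MIN_AVAILABLE_KB=$((1536 * 1024))

# Print currently available memory in kB (MemAvailable in /proc/meminfo).
available_kb() {
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Succeed if the given kB value meets the threshold.
enough_memory() {
  [ "$1" -ge "$MIN_AVAILABLE_KB" ]
}

# Usage:
#   if enough_memory "$(available_kb)"; then
#     docker compose -p signoz up -d
#   else
#     echo "Less than 1.5 GB available: stop other containers first" >&2
#   fi
```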
Problem: No Traces Showing in SigNoz
Symptom: Apps are running and the dashboard is open, but no data appears in the Services tab.
Check 1: Is tracing.js actually loading?
pm2 logs healthtune_dev_api --lines 30
Look for lines like:
@opentelemetry/sdk-node starting up
If this line is missing, tracing.js is not being loaded. Make sure your PM2 start command includes -r tracing.js:
pm2 start "node -r /home/youruser/healthtune_api/tracing.js dist/main.js" --name healthtune_dev_api
Use your real app path in the command if different.
Check 2: Is the OTEL Collector receiving data?
sudo journalctl -u otelcol-contrib -n 50
Look for errors like connection refused (wrong endpoint) or failed to export (SigNoz unreachable).
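To avoid scanning the journal by eye, the two error patterns named above can be filtered and labeled. A rough sketch; the function name and the labels are illustrative, and real collector messages may be worded differently across versions.

```shell
#!/usr/bin/env bash
# Read collector log lines on stdin and label the two failure patterns described above.
# classify_collector_errors is a hypothetical helper name.
classify_collector_errors() {
  local line lower
  while IFS= read -r line; do
    # Lowercase a copy so the match is case-insensitive.
    lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      *"connection refused"*) echo "wrong endpoint? $line" ;;
      *"failed to export"*)   echo "SigNoz unreachable? $line" ;;
    esac
  done
}

# Usage:
#   sudo journalctl -u otelcol-contrib -n 50 --no-pager | classify_collector_errors
```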
Check 3: Is the collector endpoint correct?
In your tracing.js:
url: 'http://localhost:4317'
If SigNoz is on a different server, replace localhost with that server's IP:
url: 'http://34.31.206.197:4317'
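Whichever host you use, it is worth confirming that the OTLP gRPC port is reachable from the machine running the app. A sketch using bash's built-in /dev/tcp redirection plus the coreutils timeout command; both function names are illustrative, and parse_endpoint assumes the URL includes an explicit port.

```shell
#!/usr/bin/env bash
# Split an endpoint URL such as http://localhost:4317 into "host port".
# Assumes an explicit port is present in the URL.
parse_endpoint() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  echo "${hostport%:*} ${hostport##*:}"
}

# Succeed if a TCP connection to host:port opens within 3 seconds.
port_reachable() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage:
#   read -r host port <<< "$(parse_endpoint 'http://localhost:4317')"
#   if port_reachable "$host" "$port"; then echo "reachable"; else echo "not reachable"; fi
```

A successful connection only shows that something is listening on the port; it does not prove the listener is the collector.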
Check 4: Generate actual traffic
Traces only appear when your API receives requests:
curl http://localhost:3000/
curl http://localhost:3000/api/users
Wait 30 to 60 seconds, then check the SigNoz dashboard.
Problem: No Logs Showing in SigNoz
Symptom: Traces work but logs are empty.
Check 1: Permissions on PM2 log files
sudo journalctl -u otelcol-contrib | grep "permission denied"
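If you see permission errors, grant the collector user read access to the log files and traverse access to the directories above them:
sudo apt install -y acl
sudo setfacl -m u:otelcol-contrib:r /home/youruser/.pm2/logs/*
sudo setfacl -m u:otelcol-contrib:rx /home/youruser/.pm2/logs
sudo setfacl -m u:otelcol-contrib:x /home/youruser /home/youruser/.pm2
sudo systemctl restart otelcol-contrib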
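To confirm the ACL took effect, you can test readability as the collector user. A sketch; unreadable_logs is an illustrative helper, and the *.log pattern is an assumption about how the PM2 log files are named on your system.

```shell
#!/usr/bin/env bash
# Print every *.log file in a directory that the current user cannot read.
# unreadable_logs is a hypothetical helper name.
unreadable_logs() {
  local dir="$1" f
  # Without read and traverse access to the directory, no file in it can be checked.
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "$dir (directory not listable)"
    return 0
  fi
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue   # glob matched nothing
    [ -r "$f" ] || echo "$f"
  done
}

# Usage: run as the collector user so the result reflects its permissions.
#   sudo -u otelcol-contrib bash -c "$(declare -f unreadable_logs); unreadable_logs /home/youruser/.pm2/logs"
```

Empty output means every matching log file is readable by that user.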
Check 2: Correct log file paths in collector config
grep -A 3 "include" /etc/otelcol-contrib/config.yaml
Verify these paths actually exist:
ls -la /home/youruser/.pm2/logs/
Problem: "cannot create agent without orgId" Errors in Logs
Symptom: docker compose -p signoz logs shows repeated messages:
cannot create agent without orgId
Server returned an error response
This is a known issue with the OpAMP-based agent management in SigNoz. It usually means the first admin user has not been created yet.
Fix:
- Open the SigNoz dashboard in your browser
- Create the first admin account
- Watch the logs: the error should stop
If it continues after creating the account, it may be a version-specific bug. It does not block basic functionality (traces, logs, metrics still work).
Problem: High Memory Usage / Server Slowing Down
Symptom: free -h shows very little available RAM, and other services become slow.
# Check what's using memory
docker stats --no-stream
free -h
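Short-term fix: Reduce the ClickHouse memory limit.
Edit docker-compose.yaml, find the ClickHouse service, and add the variable below. If memory use does not drop after the restart, your ClickHouse image may not read this variable; the corresponding ClickHouse server setting is max_server_memory_usage_to_ram_ratio, which is set in a server config override file.
environment:
  - CLICKHOUSE_MAX_SERVER_MEMORY_USAGE_RATIO=0.3 # use at most 30% of RAM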
Then restart:
docker compose -p signoz down
docker compose -p signoz up -d
Long-term fix: Upgrade VM to at least 8 GB RAM if running SigNoz alongside other services.
Problem: PM2 Process Shows "Errored" or "Stopped"
pm2 list # check status
pm2 logs # see why it failed
pm2 restart healthtune_dev_api
If it keeps crashing, check for missing dependencies:
cd /home/youruser/healthtune_api
yarn install # reinstall if node_modules was deleted
Replace /home/youruser/healthtune_api with your app directory.
Problem: ClickHouse Unhealthy / Migration Errors on Startup
Symptom: During first startup, you see migration retry messages. ClickHouse shows unhealthy.
This is normal on first boot: ClickHouse runs database schema migrations, which can take 1 to 3 minutes. Wait, then check again:
# Wait 2 minutes, then:
docker compose -p signoz ps
If ClickHouse is still unhealthy after 5 minutes:
docker compose -p signoz logs clickhouse --tail=100
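Instead of re-running the status command by hand, you can poll until ClickHouse reports healthy or a deadline passes. A generic sketch; wait_until is an illustrative helper, and the usage line assumes the compose service is named clickhouse and that its status column contains the text (healthy).

```shell
#!/usr/bin/env bash
# Re-run a command every INTERVAL seconds until it succeeds or TIMEOUT seconds pass.
# wait_until is a hypothetical helper name.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # gave up: the command never succeeded in time
    fi
    sleep "$interval"
  done
  return 0
}

# Usage: wait up to 5 minutes, checking every 10 seconds, and show logs on failure.
#   wait_until 300 10 sh -c 'docker compose -p signoz ps clickhouse | grep -q "(healthy)"' \
#     || docker compose -p signoz logs clickhouse --tail=100
```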
Rollback: How to Completely Remove SigNoz
If SigNoz is causing problems (memory, port conflicts, etc.) and you need to remove it:
cd ~/signoz/deploy/docker
# Stop and remove all SigNoz containers + volumes (data will be deleted)
docker compose -p signoz down -v
# Optionally remove the repository
rm -rf ~/signoz
Warning: -v removes all data volumes. Your ClickHouse data (all traces, logs, metrics) will be permanently deleted. To keep the data, omit the -v flag.
Full Diagnostic Script
Save this as signoz-check.sh and run it anytime:
#!/bin/bash
echo "=== SigNoz Container Status ==="
docker compose -p signoz ps 2>/dev/null || echo "SigNoz not running via compose"
echo ""
echo "=== Port Listeners ==="
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b' || echo "No SigNoz ports listening"
echo ""
echo "=== Local Health Check ==="
curl -s http://localhost:3301/api/v1/health || echo "SigNoz not responding"
echo ""
echo "=== OTEL Collector Status ==="
sudo systemctl status otelcol-contrib --no-pager | head -10
echo ""
echo "=== Memory ==="
free -h
echo ""
echo "=== PM2 Processes ==="
pm2 list 2>/dev/null || echo "PM2 not available"
Make it executable and run it:
chmod +x signoz-check.sh
./signoz-check.sh
Official Documentation Links
- SigNoz troubleshooting docs
- SigNoz FAQ
- SigNoz community Slack
- OpenTelemetry troubleshooting overview
- OpenTelemetry SDK environment variables
Read in Sequence
- Previous: 6-production-integration.md
- Next: 8-future-improvements.md