
🔧 Troubleshooting Guide


This chapter covers every issue encountered during the real GCP deployment plus common problems reported by the community. Start from the top and work down.


🩺 Quick Diagnostic — Run This First

# 1. Are SigNoz containers up?
docker compose -p signoz ps

# 2. Are ports listening?
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b'

# 3. Does SigNoz respond locally?
curl -I http://localhost:3301
curl http://localhost:3301/api/v1/health

# 4. Is the OTEL Collector running?
sudo systemctl status otelcol-contrib

# 5. Are your apps running?
pm2 list

❌ Problem: SigNoz Dashboard Not Opening in Browser

Symptom: You go to http://YOUR_SERVER_IP:3301 and get nothing — connection refused or timeout.

Check 1 — Is SigNoz actually running?

curl -I http://localhost:3301
  • If this returns 200 OK → SigNoz is running. The problem is the firewall (see below).
  • If this returns nothing or Connection refused → SigNoz containers are not up. See container troubleshooting.
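The two outcomes above can also be told apart in a script by asking curl for only the status code. This is a minimal sketch, assuming bash; the function name classify_status is illustrative and not part of SigNoz. curl prints 000 as the status code when it cannot connect at all.

```shell
#!/usr/bin/env bash
# Map an HTTP status code (as printed by curl -w '%{http_code}') to a next step.
classify_status() {
  case "$1" in
    2??|3??) echo "SigNoz is answering locally: check the firewall" ;;
    000)     echo "No connection: check the SigNoz containers" ;;
    *)       echo "Unexpected HTTP status $1: check the SigNoz logs" ;;
  esac
}

# Usage on the server:
#   classify_status "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3301)"
```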

Check 2 — GCP Firewall Rule Missing

This was the most common issue in our GCP deployment. The service is healthy locally but unreachable from outside.

Fix (GCP Console):

  1. Go to VPC Network → Firewall
  2. Click Create Firewall Rule
  3. Settings:
    • Name: allow-signoz-ui
    • Direction: Ingress
    • Action: Allow
    • Targets: All instances (or tag-based if your VM has a network tag)
    • Source IP ranges: 0.0.0.0/0 (or your team's IP range for security)
    • Ports: TCP 3301
  4. Save. Wait 30 seconds and try again.

For production, prefer your office/VPN CIDR over 0.0.0.0/0.
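The same rule can be created from the command line. This is a sketch that assumes the gcloud CLI is installed and authenticated and that the VM sits on the project's default network. The helper name firewall_cmd and the example range 203.0.113.0/24 are placeholders. The helper only prints the command so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Print the gcloud command that opens TCP 3301 to the given source range.
firewall_cmd() {
  local source_range="$1"   # e.g. your office/VPN CIDR
  printf '%s\n' "gcloud compute firewall-rules create allow-signoz-ui --direction=INGRESS --action=ALLOW --rules=tcp:3301 --source-ranges=${source_range}"
}

# Review the output first, then run it:
#   firewall_cmd 203.0.113.0/24
#   firewall_cmd 203.0.113.0/24 | sh
```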

Fix (AWS Security Group):

  1. EC2 → Security Groups → your instance's group
  2. Inbound Rules → Add Rule
  3. Type: Custom TCP, Port: 3301, Source: your IP

Check 3 — UFW Local Firewall Blocking

sudo ufw status

If active, run:

sudo ufw allow 3301/tcp
sudo ufw reload

Check 4 — Test via SSH Tunnel

This isolates whether the issue is the app or the network:

# Run this on YOUR LAPTOP (not the server)
ssh -L 3301:localhost:3301 username@YOUR_VM_IP

Then open http://localhost:3301 in your browser. If it works → firewall issue. If it doesn't → app issue.


❌ Problem: SigNoz Containers Won't Start

Symptom: docker compose -p signoz ps shows containers as Exit 1 or Restarting.

Check container logs:

docker compose -p signoz logs signoz --tail=50
docker compose -p signoz logs clickhouse --tail=50
docker compose -p signoz logs otel-collector --tail=50

Common cause: Port already in use

sudo ss -ltnp | grep ':8080' # Is 8080 taken?
sudo ss -ltnp | grep ':4317' # Is 4317 taken?
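To check several ports in one pass, the ss output can be filtered by exact port suffix, so that 18080 is not mistaken for 8080. This is a minimal sketch; the helper name port_in_use is illustrative, and it assumes the column layout of ss -ltn, where the fourth column is the local address and port.

```shell
#!/usr/bin/env bash
# Succeeds if a listener in `ss -ltn` output (read from stdin) uses the given port.
port_in_use() {
  awk -v suffix=":$1" '
    { n = length(suffix) }
    substr($4, length($4) - n + 1) == suffix { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   for p in 8080 4317 4318; do
#     if ss -ltn | port_in_use "$p"; then echo "port $p is taken"; fi
#   done
```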

If a port is taken, edit docker-compose.yaml and change the host-side port:

# Change this:
- "8080:8080"
# To this (using free port 3301):
- "3301:8080"

Then restart:

docker compose -p signoz down
docker compose -p signoz up -d

Common cause: Not enough memory

free -h

SigNoz plus ClickHouse need roughly 2–3 GB of free RAM. If the available column shows less than 1.5 GB, ClickHouse is likely to crash.
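The 1.5 GB threshold can be checked in a script by reading MemAvailable, the kernel's estimate that free -h reports as "available". This is a sketch for Linux; the helper names and the default of 1536 MiB are illustrative choices.

```shell
#!/usr/bin/env bash
# Print MemAvailable in MiB from a meminfo-format file (default: /proc/meminfo).
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeeds when at least min_mib (default 1536, i.e. 1.5 GB) is available.
enough_memory() {
  local min_mib="${1:-1536}" file="${2:-/proc/meminfo}"
  [ "$(mem_available_mib "$file")" -ge "$min_mib" ]
}

# Usage:
#   enough_memory || echo "Less than 1.5 GB available: ClickHouse may crash"
```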

Fix: Stop other containers temporarily:

docker stop grafana prometheus # example
docker compose -p signoz up -d

❌ Problem: No Traces Showing in SigNoz

Symptom: Apps are running and the dashboard is open, but no data appears in the Services tab.

Check 1 — Is tracing.js actually loading?

pm2 logs healthtune_dev_api --lines 30

Look for lines like:

@opentelemetry/sdk-node starting up

If missing → tracing.js is not being loaded. Make sure your PM2 start command includes -r tracing.js:

pm2 start "node -r /home/youruser/healthtune_api/tracing.js dist/main.js" --name healthtune_dev_api

Use your real app path in the command if different.

Check 2 — Is the OTEL Collector receiving data?

sudo journalctl -u otelcol-contrib -n 50

Look for errors like connection refused (wrong endpoint) or failed to export (SigNoz unreachable).

Check 3 — Is the collector endpoint correct?

In your tracing.js:

url: 'http://localhost:4317'

If SigNoz is on a different server, replace localhost with that server's IP:

url: 'http://34.31.206.197:4317'
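Before changing application code, it can help to confirm that the endpoint's port is reachable from the application server. Port 4317 is the OTLP gRPC port and 4318 is OTLP over HTTP, so the port in the URL has to match the exporter type. This is a minimal sketch using bash's built-in /dev/tcp; the helper name can_connect and the 3-second timeout are illustrative.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port opens within 3 seconds.
# Uses bash's /dev/tcp redirection plus coreutils' timeout; no extra tools needed.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage from the application server:
#   can_connect localhost 4317 && echo "collector port reachable"
```

If the check fails against a remote SigNoz server while it succeeds on that server itself, the firewall steps from the dashboard section apply to port 4317 as well.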

Check 4 — Generate actual traffic

Traces only appear when your API receives requests:

curl http://localhost:3000/
curl http://localhost:3000/api/users

Wait 30–60 seconds, then check the SigNoz dashboard.
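A single request is easy to miss in the UI, so a short burst makes the service easier to spot. This is a sketch; the helper name generate_traffic and the default of 20 requests are illustrative, and the URL should be a route your API actually serves.

```shell
#!/usr/bin/env bash
# Send `count` GET requests to `url`, pausing briefly between them.
# Prints how many requests completed (any HTTP response counts, even a 404).
generate_traffic() {
  local url="$1" count="${2:-20}" ok=0 i
  for ((i = 0; i < count; i++)); do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      ok=$((ok + 1))
    fi
    sleep 0.2
  done
  echo "$ok/$count requests completed"
}

# Usage:
#   generate_traffic http://localhost:3000/api/users 20
```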


❌ Problem: No Logs Showing in SigNoz

Symptom: Traces work but logs are empty.

Check 1 — Permissions on PM2 log files

sudo journalctl -u otelcol-contrib | grep "permission denied"

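If you see permission errors, give the collector user read access to the log files and let it traverse the directories above them (home directories are often not accessible to other users by default):

sudo apt install -y acl
sudo setfacl -m u:otelcol-contrib:x /home/youruser /home/youruser/.pm2
sudo setfacl -m u:otelcol-contrib:rx /home/youruser/.pm2/logs
sudo setfacl -m u:otelcol-contrib:r /home/youruser/.pm2/logs/*
sudo systemctl restart otelcol-contrib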

Check 2 — Correct log file paths in collector config

grep -A 3 "include" /etc/otelcol-contrib/config.yaml

Verify these paths actually exist:

ls -la /home/youruser/.pm2/logs/
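Listing the directory as your own user does not show what the collector can see. This sketch reports each path as missing, readable, or unreadable for whichever user runs it. The helper name check_readable is illustrative, and the otelcol-contrib user name is taken from the setfacl command above.

```shell
#!/usr/bin/env bash
# Report, for each path given, whether the current user can read it.
check_readable() {
  local f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING     $f"
    elif [ -r "$f" ]; then
      echo "OK          $f"
    else
      echo "UNREADABLE  $f"
    fi
  done
}

# Usage: run the check as the collector's user
#   sudo -u otelcol-contrib bash -c "$(declare -f check_readable); check_readable /home/youruser/.pm2/logs/*.log"
```

If the collector user cannot list the directory, the glob stays unexpanded and is reported as MISSING, which also points to a permissions problem.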

❌ Problem: "cannot create agent without orgId" Errors in Logs

Symptom: docker compose -p signoz logs shows repeated messages:

cannot create agent without orgId
Server returned an error response

This is a known issue with the OpAMP-based agent management in SigNoz. It usually means the first admin user has not been created yet.

Fix:

  1. Open the SigNoz dashboard in your browser
  2. Create the first admin account
  3. Watch logs — the error should stop

If it continues after creating the account, it may be a version-specific bug. It does not block basic functionality (traces, logs, metrics still work).
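To see whether the error has actually stopped, rather than scrolling through logs, the recent occurrences can be counted before and after creating the account. This is a minimal sketch; the helper name count_orgid_errors and the 5-minute window are illustrative.

```shell
#!/usr/bin/env bash
# Count log lines on stdin that contain the orgId error (prints 0 if none).
count_orgid_errors() {
  grep -c 'cannot create agent without orgId' || true
}

# Usage: compare the count before and after creating the admin account
#   docker compose -p signoz logs --since 5m 2>&1 | count_orgid_errors
```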


❌ Problem: High Memory Usage / Server Slowing Down

Symptom: free -h shows very little available RAM, and other services become slow.

# Check what's using memory
docker stats --no-stream
free -h
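With many containers, it is quicker to rank them by memory share than to read the full table. This sketch sorts the output of docker stats with a custom format. The helper name top_mem and the default of five rows are illustrative.

```shell
#!/usr/bin/env bash
# Sort "PERCENT NAME" lines from stdin by memory share, highest first,
# and keep the top N (default 5).
top_mem() {
  LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Usage:
#   docker stats --no-stream --format '{{.MemPerc}} {{.Name}}' | top_mem 5
```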

Short-term fix: Reduce ClickHouse memory limit.

Edit docker-compose.yaml and find the ClickHouse service. Add:

environment:
- CLICKHOUSE_MAX_SERVER_MEMORY_USAGE_RATIO=0.3 # use max 30% of RAM

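This variable only has an effect if the ClickHouse configuration maps it to the max_server_memory_usage_to_ram_ratio server setting, so confirm with docker stats after restarting that the limit took effect.

Then restart: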

docker compose -p signoz down
docker compose -p signoz up -d

Long-term fix: Upgrade VM to at least 8 GB RAM if running SigNoz alongside other services.


❌ Problem: PM2 Process Shows "Errored" or "Stopped"

pm2 list # check status
pm2 logs # see why it failed
pm2 restart healthtune_dev_api

If it keeps crashing, check for missing dependencies:

cd /home/youruser/healthtune_api
yarn install # reinstall if node_modules was deleted

Replace /home/youruser/healthtune_api with your app directory.


❌ Problem: ClickHouse Unhealthy / Migration Errors on Startup

Symptom: During first startup, you see migration retry messages. ClickHouse shows unhealthy.

This is normal on first boot — ClickHouse runs database schema migrations which can take 1–3 minutes. Wait and then check again:

# Wait 2 minutes, then:
docker compose -p signoz ps

If ClickHouse is still unhealthy after 5 minutes:

docker compose -p signoz logs clickhouse --tail=100
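The wait-and-check routine above can be automated with a small polling loop. This is a sketch under two assumptions: the compose service is named clickhouse, as in the log commands in this guide, and its container defines a Docker health check. The helper names wait_until and clickhouse_healthy are illustrative.

```shell
#!/usr/bin/env bash
# Retry a command every `interval` seconds until it succeeds
# or `timeout` seconds have passed.
wait_until() {
  local timeout="$1" interval="$2"; shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds when Docker reports the clickhouse service's container as healthy.
clickhouse_healthy() {
  local id
  id="$(docker compose -p signoz ps -q clickhouse)"
  [ -n "$id" ] && [ "$(docker inspect -f '{{.State.Health.Status}}' "$id")" = "healthy" ]
}

# Usage: wait up to 5 minutes, checking every 10 seconds
#   wait_until 300 10 clickhouse_healthy || docker compose -p signoz logs clickhouse --tail=100
```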

🔄 Rollback — How to Completely Remove SigNoz

If SigNoz is causing problems (memory, port conflicts, etc.) and you need to remove it:

cd ~/signoz/deploy/docker

# Stop and remove all SigNoz containers + volumes (data will be deleted)
docker compose -p signoz down -v

# Optionally remove the repository
rm -rf ~/signoz

⚠️ -v removes all data volumes. Your ClickHouse data (all traces, logs, metrics) will be permanently deleted. If you want to keep data, remove the -v flag.
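Because the deletion cannot be undone, it may be worth putting a confirmation step in front of the command. This is a minimal sketch; the helper name confirm_delete is illustrative. The volume listing assumes Docker Compose labels the project's volumes with com.docker.compose.project.

```shell
#!/usr/bin/env bash
# Ask for explicit confirmation on stdin; succeeds only if the reply is "delete".
confirm_delete() {
  local reply
  printf 'Type "delete" to remove all SigNoz data: ' >&2
  read -r reply
  [ "$reply" = "delete" ]
}

# Usage:
#   docker volume ls --filter label=com.docker.compose.project=signoz   # what will be removed
#   confirm_delete && docker compose -p signoz down -v
```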


📋 Full Diagnostic Script

Save this as signoz-check.sh and run it anytime:

#!/bin/bash
echo "=== SigNoz Container Status ==="
docker compose -p signoz ps 2>/dev/null || echo "SigNoz not running via compose"

echo ""
echo "=== Port Listeners ==="
sudo ss -ltnp | grep -E ':(3301|4317|4318)\b' || echo "No SigNoz ports listening"

echo ""
echo "=== Local Health Check ==="
curl -s http://localhost:3301/api/v1/health || echo "SigNoz not responding"

echo ""
echo "=== OTEL Collector Status ==="
sudo systemctl status otelcol-contrib --no-pager | head -10

echo ""
echo "=== Memory ==="
free -h

echo ""
echo "=== PM2 Processes ==="
pm2 list 2>/dev/null || echo "PM2 not available"

Make it executable and run it:

chmod +x signoz-check.sh
./signoz-check.sh

