🚀 Future Improvements

This document outlines what to add next — in order of priority — to make your SigNoz setup more robust, secure, and useful.

1. 🔔 Alerting Setup

SigNoz supports alerts that notify you when something breaks — before your users report it.

What to set up:

Alert when error rate > 5%
Alert when API response time > 2000ms
Alert when a service goes down
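
The three conditions above can be written as a small check, which helps when deciding what each alert should fire on before configuring it in the UI. This is a minimal Python sketch, not SigNoz's alert engine; the function name `breached_alerts` and its inputs are hypothetical.

```python
from typing import List

ERROR_RATE_THRESHOLD = 0.05   # alert when more than 5% of calls fail
LATENCY_THRESHOLD_MS = 2000   # alert when response time exceeds 2000 ms


def breached_alerts(total_calls: int, error_calls: int,
                    latency_ms: float, service_up: bool) -> List[str]:
    """Return the names of the alert conditions that currently hold."""
    alerts = []
    # Guard against division by zero when no traffic was observed.
    error_rate = error_calls / total_calls if total_calls else 0.0
    if error_rate > ERROR_RATE_THRESHOLD:
        alerts.append("error_rate")
    if latency_ms > LATENCY_THRESHOLD_MS:
        alerts.append("latency")
    if not service_up:
        alerts.append("service_down")
    return alerts


# 60 failures out of 1000 calls is a 6% error rate, above the 5% threshold.
print(breached_alerts(1000, 60, 150.0, True))
```

Both comparisons are strict, so an error rate of exactly 5% does not fire the alert.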

How:

Open SigNoz dashboard → Alerts
Click New Alert
Choose metric (e.g., signoz_calls_total filtered on status_code = STATUS_CODE_ERROR)
Set threshold and notification channel

Notification channels you can add:

Slack webhook
PagerDuty
Email (SMTP)
OpsGenie

Future task: Set up at least a Slack channel for production error alerts.

2. 🐳 Kubernetes Integration

If your apps move to Kubernetes (K8s), the setup changes slightly:

OTEL Collector runs as a DaemonSet (one per node)
Apps reach the collector through its Kubernetes Service DNS name, set as the OTLP endpoint in each app's configuration
Logs are collected from pod stdout instead of PM2 files
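
Inside a cluster, a Service is reachable at `<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds that address for the collector. It assumes the default cluster domain `cluster.local` and the default OTLP gRPC port 4317; the Service name `otel-collector` is hypothetical and depends on how the collector is deployed.

```python
def collector_endpoint(service: str, namespace: str, port: int = 4317) -> str:
    """Build the in-cluster URL of a Kubernetes Service.

    Assumes the default cluster domain (cluster.local).
    """
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


# "otel-collector" is a hypothetical Service name; "platform" is the
# namespace used for the Helm install in this section.
print(collector_endpoint("otel-collector", "platform"))
```

The resulting URL is the kind of value an app would receive in `OTEL_EXPORTER_OTLP_ENDPOINT`.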

What to do:

# Optional: cert-manager is only needed if you plan to issue TLS certificates in-cluster (e.g., for an Ingress)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Install SigNoz on K8s via Helm
helm repo add signoz https://charts.signoz.io
helm repo update
helm install my-release signoz/signoz --namespace platform --create-namespace

Future task: Migrate SigNoz from Docker Compose to Helm when moving to K8s.

3. 🖥️ Advanced Dashboards

The default SigNoz dashboards cover basics. Build custom dashboards for your services:

Ideas for HealthTune API dashboard:

Requests per second by endpoint
Top 10 slowest API endpoints
DB query time distribution
Error rate by endpoint

Ideas for TrackX API dashboard:

Active users (request count per hour)
Authentication success / failure rate
Slow queries heatmap

How to create a custom dashboard:

SigNoz → Dashboards → New Dashboard
Add panels using PromQL or SigNoz query builder
Save and pin to your team's home view

4. 📝 Log Pipelines & Parsing

Currently logs are collected raw. Add log parsing to extract structured fields:

For example, if your app logs JSON:

{"level":"error","message":"DB timeout","duration":5000,"service":"healthtune"}

Add a log processor in the collector config:

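processors:
  logstransform:
    operators:
      # Parse the JSON log body and store its keys as log attributes
      - type: json_parser
        parse_from: body
        parse_to: attributes
# Remember to list logstransform under service.pipelines.logs.processors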

This lets you filter and search logs by level, service, and duration in SigNoz.
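
To make the effect of JSON parsing concrete, here is a minimal Python sketch of what such a step does to one log line: JSON keys in the raw body become separate attributes. It only illustrates the idea and is not the collector's actual implementation; `parse_log_body` is a hypothetical name.

```python
import json
from typing import Any, Dict


def parse_log_body(body: str) -> Dict[str, Any]:
    """Lift JSON fields out of a raw log body into attributes.

    Lines that are not JSON objects are kept unchanged with no attributes.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return {"body": body, "attributes": {}}
    if not isinstance(parsed, dict):
        return {"body": body, "attributes": {}}
    return {"body": body, "attributes": parsed}


record = parse_log_body(
    '{"level":"error","message":"DB timeout","duration":5000,"service":"healthtune"}'
)
print(record["attributes"]["level"])     # error
print(record["attributes"]["duration"])  # 5000
```

Once the fields are attributes rather than part of one opaque string, a query such as "level is error and duration is above 1000" becomes a simple filter.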

5. 🔒 TLS / HTTPS for the Dashboard

Currently SigNoz is accessed over plain HTTP (http://server:3301). For production, add HTTPS.

Options:

Option A — Nginx reverse proxy with Let's Encrypt:

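server {
    listen 443 ssl;
    server_name signoz.yourcompany.com;

    ssl_certificate /etc/letsencrypt/live/signoz.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/signoz.yourcompany.com/privkey.pem;

    location / {
        # Forward requests to the SigNoz frontend on the same host
        proxy_pass http://localhost:3301;
        proxy_set_header Host $host;
        # Pass the original client address and scheme to the backend
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}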

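Option B (Cloudflare proxy, simplest for GCP): Point your domain to the VM and enable the Cloudflare proxy (orange cloud). Cloudflare then terminates TLS for visitors automatically. Traffic between Cloudflare and the VM stays on plain HTTP unless the origin also serves HTTPS and the Cloudflare SSL mode is set to Full.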

6. 💾 Data Retention Policy

ClickHouse keeps data until a TTL expires it. SigNoz applies default retention periods, but check that they fit your disk budget and adjust them to avoid running out of disk:

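Either edit /home/signoz/deploy/docker/clickhouse-config.xml or set a TTL via SQL:

-- Keep traces for 30 days (logs and metrics need their own statements).
-- Set the TTL on the local table; the distributed_* tables store no data themselves.
ALTER TABLE signoz_traces.signoz_index_v3
  MODIFY TTL toDateTime(timestamp) + INTERVAL 30 DAY;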

Or configure in the SigNoz UI under Settings → Retention.

Recommended retention:

Traces: 15–30 days
Logs: 7–14 days
Metrics: 30–90 days
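
A rough way to choose values within these ranges is to multiply daily ingest by retention days for each signal. The sketch below does that arithmetic in Python. The ingest figures are made-up placeholders, not measurements, and the estimate ignores compression and replication.

```python
from typing import Dict


def disk_needed_gb(daily_ingest_gb: Dict[str, float],
                   retention_days: Dict[str, int]) -> float:
    """Estimate steady-state disk usage: sum of daily ingest x retention."""
    return sum(daily_ingest_gb[signal] * retention_days[signal]
               for signal in daily_ingest_gb)


# Hypothetical daily ingest per signal, in GB (replace with your own numbers).
daily = {"traces": 2.0, "logs": 5.0, "metrics": 0.5}
# Upper ends of the recommended retention ranges.
retention = {"traces": 30, "logs": 14, "metrics": 90}

print(disk_needed_gb(daily, retention))  # 175.0
```

With these placeholder numbers, logs dominate disk usage despite having the shortest retention, which is why log retention is usually the first setting to tighten.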

7. 📊 More Services to Instrument

Add OTEL tracing to additional services on the VM:

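| Service | How |
| --- | --- |
| Redis | Use `@opentelemetry/instrumentation-ioredis` (ioredis client) or `@opentelemetry/instrumentation-redis` (node-redis client) |
| Elasticsearch | Use the community package `opentelemetry-instrumentation-elasticsearch` (not published under the official `@opentelemetry` scope) |
| n8n workflows | Add OTEL env variables to n8n Docker config |
| Jenkins | Install the Jenkins OpenTelemetry plugin |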

8. 📦 Backup Strategy for ClickHouse

SigNoz data is stored in ClickHouse volumes. Set up backups:

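# Native ClickHouse backup of the traces database.
# Requires a disk named 'backups' to be configured in the ClickHouse server config;
# repeat for signoz_logs and signoz_metrics to cover all telemetry.
docker exec clickhouse clickhouse-client --query "BACKUP DATABASE signoz_traces TO Disk('backups', 'signoz_backup.zip')"

# Or archive the Docker volume (stop ClickHouse first so the files are consistent)
docker run --rm \
  -v signoz_clickhouse-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/clickhouse-backup.tar.gz -C /data .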

Future task: Set up a cron job for weekly backups to GCP Cloud Storage.
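
One possible shape for that weekly job is a small Python script that archives the volume under a dated name and uploads it with `gsutil cp`. This is a sketch under assumptions: the bucket name `gs://example-signoz-backups` is hypothetical, the volume name is the one from the snapshot command above, and `gsutil` must already be installed and authenticated on the VM.

```python
import datetime
import subprocess
from typing import List

VOLUME = "signoz_clickhouse-data"        # volume name from the snapshot command
BUCKET = "gs://example-signoz-backups"   # hypothetical bucket name


def archive_name(day: datetime.date) -> str:
    """Dated file name so weekly archives do not overwrite each other."""
    return f"clickhouse-backup-{day.isoformat()}.tar.gz"


def snapshot_command(backup_dir: str, archive: str) -> List[str]:
    """docker command that tars the volume (mounted read-only) into backup_dir."""
    return [
        "docker", "run", "--rm",
        "-v", f"{VOLUME}:/data:ro",
        "-v", f"{backup_dir}:/backup",
        "alpine", "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


def upload_command(backup_dir: str, archive: str) -> List[str]:
    """gsutil command that copies the archive to the bucket."""
    return ["gsutil", "cp", f"{backup_dir}/{archive}", f"{BUCKET}/{archive}"]


def run_backup(backup_dir: str) -> None:
    """Create today's archive and upload it; raises if either step fails."""
    archive = archive_name(datetime.date.today())
    subprocess.run(snapshot_command(backup_dir, archive), check=True)
    subprocess.run(upload_command(backup_dir, archive), check=True)
```

Calling `run_backup("/var/backups/signoz")` from a script scheduled with a crontab entry such as `0 3 * * 0` (Sundays at 03:00) would give the weekly cadence; both the directory and the schedule are illustrative. Building the commands as lists keeps them easy to inspect and avoids shell quoting issues.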

9. 👥 Multi-User Access Control

SigNoz supports team accounts. Add your team members:

SigNoz → Settings → Organization
Invite Member → enter email
Assign role: Admin, Editor, or Viewer

Recommended structure:

Dev team: Editor (can see everything, create dashboards)
Manager: Viewer (read-only)
DevOps lead: Admin

🗓️ Suggested Roadmap

| Priority | Task | Effort |
| --- | --- | --- |
| 🔴 High | Set up Slack alerting for errors | 30 mins |
| 🔴 High | Set data retention to 30 days | 15 mins |
| 🟡 Medium | Add HTTPS via Cloudflare or Nginx | 1–2 hours |
| 🟡 Medium | Build custom dashboards per service | 2–3 hours |
| 🟢 Low | Structured log parsing | 1 hour |
| 🟢 Low | Kubernetes migration | When ready |
| 🟢 Low | ClickHouse backup automation | 1–2 hours |

Official Documentation Links

Read in Sequence

Previous: 7-troubleshooting.md
Back to start: README.md
Restart sequence: 1-introduction.md
