📘 Introduction — The Grafana Observability Stack
This documentation covers setting up four open-source tools that work together to give you visibility into your applications through metrics, logs, and traces. Think of it as building your own self-hosted version of Datadog or New Relic, with no license fees.
🧩 The Four Tools Explained
1. Prometheus — The Metrics Collector
Prometheus works by pulling (scraping) numbers from your apps and servers on a schedule (e.g., every 15 seconds).
Examples of what it collects:
http_requests_total{method="GET", status="200"} = 4823
process_cpu_seconds_total = 12.4
node_memory_MemAvailable_bytes = 3221225472
It stores these as time-series data — the same number measured repeatedly over time — so you can ask questions like:
- "What was my CPU usage at 3 AM yesterday?"
- "How many errors per minute did I have during the deploy?"
Prometheus has its own query language, PromQL, which Grafana uses to build charts.
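To make the "errors per minute" question concrete, here is a minimal Python sketch of how a per-minute rate can be derived from repeated scrapes of a counter. It is a simplified model of what PromQL's `rate()` does, not Prometheus's actual implementation (the real function also extrapolates to the edges of the query window). The function name `per_minute_rate` and the sample values are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def per_minute_rate(samples: List[Sample]) -> float:
    """Average per-minute increase of a counter across the given samples.

    Simplified model: a drop in value is treated as a counter reset
    (the process restarted), so the post-reset value counts as new increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A counter only goes up; a lower value means it restarted from zero.
        increase += (curr - prev) if curr >= prev else curr
    return increase * 60.0 / elapsed


# Hypothetical scrapes of an error counter, taken 15 seconds apart.
scrapes = [(0.0, 100.0), (15.0, 103.0), (30.0, 109.0), (45.0, 109.0), (60.0, 115.0)]
print(per_minute_rate(scrapes))  # 15.0
```

The counter grew by 15 over 60 seconds, so the rate is 15 per minute. This is why Prometheus stores raw counter values: the rate for any time window can be computed later at query time.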
2. Grafana — The Dashboard & Visualization Layer
Grafana does not store any of your metrics, logs, or traces itself. It is purely a visualization and dashboarding tool.
It connects to data sources:
- Prometheus → for metrics charts
- Loki → for log search
- Tempo → for trace exploration
You build panels (graphs, tables, heatmaps) that query these sources and display them in a unified dashboard.
One Grafana, three data sources, one view.
3. Loki — The Log Aggregator
Loki is sometimes called "Prometheus for logs." It was built by the Grafana team to integrate naturally with Grafana.
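Unlike Elasticsearch (the ELK stack), Loki does not index the full content of every log line. Instead, it indexes only the labels (like app=healthtune_api, level=error). This keeps the index small and makes Loki much cheaper on CPU, memory, and storage; the trade-off is that a text search has to scan the lines of every stream the labels select.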
You search logs using LogQL — a query language similar to PromQL:
{app="healthtune_api", level="error"} |= "database"
This finds all error logs from the HealthTune app that mention "database."
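The two-step behaviour of that query (select streams by label, then filter lines by text) can be illustrated with a toy Python model. This is only a sketch of the idea of label-only indexing; the class `ToyLogStore` and its methods are hypothetical and say nothing about how Loki is implemented internally.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]


class ToyLogStore:
    """Toy model of label-only indexing; not how Loki is implemented."""

    def __init__(self) -> None:
        # Only label sets act as keys; log lines are stored unindexed per stream.
        self._streams: Dict[Labels, List[str]] = defaultdict(list)

    def push(self, labels: Dict[str, str], line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: Dict[str, str], contains: str) -> List[str]:
        wanted = set(selector.items())
        matches: List[str] = []
        for labels, lines in self._streams.items():
            # Step 1: the label match narrows down the candidate streams.
            if wanted <= labels:
                # Step 2: plain substring scan, analogous to the |= filter.
                matches.extend(line for line in lines if contains in line)
        return matches


store = ToyLogStore()
store.push({"app": "healthtune_api", "level": "error"}, "database connection refused")
store.push({"app": "healthtune_api", "level": "error"}, "invalid token")
store.push({"app": "healthtune_api", "level": "info"}, "database pool ready")

print(store.query({"app": "healthtune_api", "level": "error"}, "database"))
# ['database connection refused']
```

The info-level line also mentions "database", but its stream is never scanned because the label selector excludes it first.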
Promtail is the agent that ships logs from your server into Loki. It watches log files (PM2 logs, Docker logs, syslog) and pushes them.
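For a sense of what "pushing" means, the sketch below builds the JSON body that Loki's HTTP push endpoint (`/loki/api/v1/push`) expects: a list of streams, each with a label set and `[timestamp in nanoseconds, line]` pairs. The helper name `build_push_payload` is hypothetical, and the code only builds the payload rather than sending it. A real agent also handles batching, retries, and tracking file positions.

```python
import json
import time
from typing import Dict, List, Optional


def build_push_payload(labels: Dict[str, str], lines: List[str],
                       now_ns: Optional[int] = None) -> str:
    """Build a JSON body for Loki's push API from one label set and some lines."""
    # Loki expects the timestamp as a string of nanoseconds since the Unix epoch.
    # For simplicity, every line in this sketch gets the same timestamp.
    ts = str(now_ns if now_ns is not None else time.time_ns())
    body = {
        "streams": [
            {
                "stream": labels,
                "values": [[ts, line] for line in lines],
            }
        ]
    }
    return json.dumps(body)


payload = build_push_payload(
    {"app": "healthtune_api", "level": "error"},
    ["database connection refused"],
    now_ns=1700000000000000000,  # fixed value so the output is reproducible
)
print(payload)
```

Sending it would be an HTTP POST with `Content-Type: application/json` to the push endpoint, for example `http://localhost:3100/loki/api/v1/push` if Loki listens on its default port on the same host (an assumption about your setup).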
4. Tempo — The Distributed Tracing Backend
Tempo stores traces — the full journey of a single user request across your services.
When a user logs in to your app:
User Request (0ms) → API Gateway (5ms) → Auth Service (12ms) → DB (67ms) → Response (3ms)
Each step is a span, and all the spans together form a trace. Tempo stores the traces it receives and lets you search and visualize them in Grafana.
Tempo accepts data in the OpenTelemetry Protocol (OTLP) format, the same standard used by SigNoz, Jaeger, and others.
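The span and trace relationship can be sketched as a small Python data structure. This is an illustrative model only: real OpenTelemetry spans carry more fields (start and end timestamps, attributes, status), and the `Span` class, the IDs, and the reading of each number in the login example as that step's duration are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str
    parent_id: Optional[str]  # None for the first span of the request
    name: str
    duration_ms: int


def total_ms(trace: List[Span]) -> int:
    """Sum of the step durations (assumes the steps run one after another)."""
    return sum(span.duration_ms for span in trace)


def slowest_span(trace: List[Span]) -> Span:
    """The single step that took the longest."""
    return max(trace, key=lambda span: span.duration_ms)


# The login request from the example above, with hypothetical IDs.
trace = [
    Span("t1", "s1", None, "API Gateway", 5),
    Span("t1", "s2", "s1", "Auth Service", 12),
    Span("t1", "s3", "s2", "DB", 67),
    Span("t1", "s4", "s1", "Response", 3),
]

print(total_ms(trace))           # 87
print(slowest_span(trace).name)  # DB
```

This is the kind of answer a trace view gives you at a glance: most of the request time was spent in the database step.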
🔗 How the Four Tools Connect
Your Node.js Apps (PM2)
│
├─── OTEL SDK (tracing.js)
│         ↓
│    OTEL Collector / Tempo (port 4317)
│         ↓
│    Tempo ──────────────────────────────┐
│                                        │
├─── Prometheus metrics endpoint         │
│    (e.g., /metrics on port 3001)       │
│         ↑ scrapes every 15s            │
│    Prometheus ───────────────────┐     │
│                                  │     │
└─── Log files (PM2 logs)          │     │
     Promtail watches them         │     │
          ↓                        │     │
     Loki ───────────────────┐     │     │
                             │     │     │
                           Grafana Dashboard
                              (port 3000)
                            Shows all three
📊 What Each Tool Is Responsible For
| Question | Tool That Answers It |
|---|---|
| "Is my server CPU above 80%?" | Prometheus + Grafana |
| "How many requests per second right now?" | Prometheus + Grafana |
| "What did my app log at 3 AM?" | Loki + Grafana |
| "Which request caused this error?" | Tempo + Grafana |
| "Why did this API call take 800ms?" | Tempo (shows each step) |
| "When did memory start climbing?" | Prometheus + Grafana |
🆚 Grafana Stack vs SigNoz
| | Grafana Stack | SigNoz |
|---|---|---|
| Setup complexity | Higher (4 separate tools) | Lower (one Docker Compose) |
| Maturity | Very mature, large community | Newer, growing fast |
| Customization | Extremely flexible | Opinionated but simpler |
| Resource usage | Medium (scales well) | Medium (ClickHouse-based) |
| OpenTelemetry | Supported (Tempo) | Native |
| Best for | Teams wanting full control | Teams wanting quick start |
Both are excellent choices. This documentation covers the Grafana Stack because it is one of the most widely used open-source observability stacks in the industry, and learning it transfers to many of the companies you might join.
🔌 Port Overview
| Tool | Default Port | Can Be Changed? |
|---|---|---|
| Grafana | 3000 | Yes |
| Prometheus | 9090 | Yes |
| Loki | 3100 | Yes |
| Tempo (HTTP) | 3200 | Yes |
| Tempo / OTEL (gRPC) | 4317 | Yes |
| OTEL Collector (HTTP) | 4318 | Yes |
| Node Exporter | 9100 | Yes |
| Promtail | 9080 | Yes |
All of these run inside Docker containers. You only need to publish the ports you actually access from outside Docker: Grafana (3000) from your browser, and 4317/4318 from your apps.
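After starting the containers, a quick way to see which of these ports are actually reachable from the host is a plain TCP connection check. The sketch below uses only Python's standard library; the `DEFAULT_PORTS` mapping copies the table above, and the helper name `is_port_open` and the `127.0.0.1` host are assumptions for illustration. An open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict

# Default ports from the table above; adjust if you changed any of them.
DEFAULT_PORTS: Dict[str, int] = {
    "Grafana": 3000,
    "Prometheus": 9090,
    "Loki": 3100,
    "Tempo (HTTP)": 3200,
    "Tempo / OTEL (gRPC)": 4317,
    "OTEL Collector (HTTP)": 4318,
    "Node Exporter": 9100,
    "Promtail": 9080,
}


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "open" if is_port_open("127.0.0.1", port) else "closed"
        print(f"{name:<24} {port:>5}  {state}")
```

Ports that are intentionally not published to the host will show as closed here, which is the expected result for anything you only reach from inside the Docker network.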