
📘 Introduction — The Grafana Observability Stack

This documentation covers setting up four open-source tools that work together to give you complete visibility into your applications. Think of it as building your own self-hosted version of Datadog or New Relic — for free.


🧩 The Four Tools Explained

1. Prometheus — The Metrics Collector

Prometheus works by pulling (scraping) numbers from your apps and servers on a schedule (e.g., every 15 seconds).

Examples of what it collects, in the plain-text format that apps expose for scraping:

http_requests_total{method="GET",status="200"} 4823
process_cpu_seconds_total 12.4
node_memory_MemAvailable_bytes 3221225472

It stores these as time-series data — the same number measured repeatedly over time — so you can ask questions like:

  • "What was my CPU usage at 3 AM yesterday?"
  • "How many errors per minute did I have during the deploy?"
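A scrape is an HTTP GET to an endpoint such as /metrics. The response is plain text with one sample per line, in the form name{label="value",...} value. The sketch below renders samples in that text exposition format. Sample, renderSample and renderMetrics are hypothetical names used for illustration. In a real Node.js app, a client library such as prom-client would generate this output for you.

```typescript
// Sketch: render samples in Prometheus's text exposition format.
// Sample, renderSample and renderMetrics are hypothetical illustration names.

type Labels = Record<string, string>;

interface Sample {
  name: string;
  labels: Labels;
  value: number;
}

// Label values must escape backslashes, double quotes, and newlines.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// One line per sample: name{k="v",...} value (braces omitted when there are no labels).
function renderSample(s: Sample): string {
  const pairs = Object.entries(s.labels).map(([k, v]) => `${k}="${escapeLabelValue(v)}"`);
  const labelPart = pairs.length > 0 ? `{${pairs.join(",")}}` : "";
  return `${s.name}${labelPart} ${s.value}`;
}

// The body a /metrics endpoint would return; every line, including the last, ends with "\n".
function renderMetrics(samples: Sample[]): string {
  return samples.map(renderSample).join("\n") + "\n";
}

const body = renderMetrics([
  { name: "http_requests_total", labels: { method: "GET", status: "200" }, value: 4823 },
  { name: "process_cpu_seconds_total", labels: {}, value: 12.4 },
]);
console.log(body);
```

Client libraries also emit optional "# HELP" and "# TYPE" comment lines. They are left out here for brevity.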

Prometheus has its own query language, PromQL, which Grafana uses to query Prometheus when building charts.
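Counters such as http_requests_total only ever increase, until the process restarts and they reset to zero. For that reason, dashboards usually chart their per-second rate, for example with the PromQL expression rate(http_requests_total[5m]). The sketch below shows the core of that calculation under simplifying assumptions. simpleRate is a hypothetical helper. Unlike the real rate() function, it does not extrapolate to the edges of the time window.

```typescript
// Simplified sketch of what PromQL's rate() computes from counter samples:
// total increase divided by elapsed seconds. No window-edge extrapolation.

interface Point {
  t: number; // Unix time in seconds
  v: number; // counter value at that time
}

function simpleRate(points: Point[]): number {
  if (points.length < 2) return NaN; // a rate needs at least two samples
  let increase = 0;
  for (let i = 1; i < points.length; i++) {
    const delta = points[i].v - points[i - 1].v;
    // A drop means the counter was reset (e.g. the app restarted),
    // so the new value itself is the increase since the reset.
    increase += delta >= 0 ? delta : points[i].v;
  }
  const seconds = points[points.length - 1].t - points[0].t;
  return increase / seconds;
}

// Four scrapes, 15 s apart; the app restarted between the 3rd and 4th.
const errors: Point[] = [
  { t: 0, v: 100 },
  { t: 15, v: 130 },
  { t: 30, v: 160 },
  { t: 45, v: 20 },
];
console.log(simpleRate(errors)); // 80 / 45 ≈ 1.78 errors per second
```

Multiply the result by 60 to get a per-minute figure, about 107 errors per minute in this example.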


2. Grafana — The Dashboard & Visualization Layer

Grafana does not store your metrics, logs, or traces itself. It is purely a visualization and dashboarding tool.

It connects to data sources:

  • Prometheus → for metrics charts
  • Loki → for log search
  • Tempo → for trace exploration

You build panels (graphs, tables, heatmaps) that query these sources and display them in a unified dashboard.

One Grafana, three data sources, one view.
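Connecting Grafana to the three backends comes down to registering one data source per tool. The code below sketches this by building request bodies for Grafana's HTTP API (POST /api/datasources), using the built-in type IDs prometheus, loki, and tempo. It assumes Docker Compose services named prometheus, loki, and tempo on a shared network. buildRequest is a hypothetical helper, and authentication is left out.

```typescript
// Sketch: build POST /api/datasources requests for Grafana's HTTP API.
// Hostnames assume Docker Compose service names on a shared network (assumption).

interface DataSource {
  name: string;
  type: "prometheus" | "loki" | "tempo"; // Grafana's built-in data source types
  url: string;
  access: "proxy"; // the Grafana server queries the backend, not the browser
}

interface GrafanaRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

const dataSources: DataSource[] = [
  { name: "Prometheus", type: "prometheus", url: "http://prometheus:9090", access: "proxy" },
  { name: "Loki", type: "loki", url: "http://loki:3100", access: "proxy" },
  { name: "Tempo", type: "tempo", url: "http://tempo:3200", access: "proxy" },
];

// Hypothetical helper: describes a request you could send with fetch()
// after adding an Authorization header (e.g. a service account token).
function buildRequest(grafanaUrl: string, ds: DataSource): GrafanaRequest {
  return {
    method: "POST",
    url: `${grafanaUrl.replace(/\/+$/, "")}/api/datasources`, // strip trailing slashes
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(ds),
  };
}

for (const ds of dataSources) {
  const req = buildRequest("http://localhost:3000", ds);
  console.log(req.method, req.url, req.body);
}
```

Grafana can also load the same settings from provisioning files at startup. Many teams prefer that approach because the configuration then lives in version control.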


3. Loki — The Log Aggregator

Loki is sometimes called "Prometheus for logs." It was built by the Grafana team to integrate naturally with Grafana.

Unlike Elasticsearch (the ELK stack), Loki does not index the full content of every log line. Instead it indexes only the labels (like app=healthtune, level=error). This makes it much cheaper on CPU and memory.

You search logs using LogQL — a query language similar to PromQL:

{app="healthtune_api", level="error"} |= "database"

This returns log lines from streams labeled app="healthtune_api" and level="error" whose text contains "database". The |= filter is a case-sensitive substring match.
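Loki indexes labels but not log text, and this design shows up in how such a query is answered. It first picks streams by their labels, then scans only those streams' lines. The toy model below imitates that two-step evaluation in memory for exact-match (=) label selectors and the |= line filter. It is an illustration with hypothetical names (LogStream, runQuery), not Loki's implementation.

```typescript
// Toy in-memory model of evaluating {app="healthtune_api", level="error"} |= "database".
// Illustration only (hypothetical names), not Loki's actual implementation.

interface LogStream {
  labels: Record<string, string>; // indexed: used to pick streams
  lines: string[]; // not indexed: only scanned after a stream is picked
}

function runQuery(
  streams: LogStream[],
  selector: Record<string, string>,
  contains: string,
): string[] {
  const results: string[] = [];
  for (const stream of streams) {
    // Step 1: exact-match (=) label selectors, checked against labels only.
    const matches = Object.entries(selector).every(([k, v]) => stream.labels[k] === v);
    if (!matches) continue; // this stream's lines are never read
    // Step 2: the |= line filter, a case-sensitive substring match.
    for (const line of stream.lines) {
      if (line.includes(contains)) results.push(line);
    }
  }
  return results;
}

const streams: LogStream[] = [
  { labels: { app: "healthtune_api", level: "error" }, lines: ["database timeout after 5s", "invalid token"] },
  { labels: { app: "healthtune_api", level: "info" }, lines: ["connected to database"] },
  { labels: { app: "billing", level: "error" }, lines: ["database unreachable"] },
];

console.log(runQuery(streams, { app: "healthtune_api", level: "error" }, "database"));
// → [ 'database timeout after 5s' ]
```

Real LogQL also supports !=, =~ and !~ label matchers and further line filters, which this sketch leaves out.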

Promtail is the agent that ships logs from your server into Loki. It watches log files (PM2 logs, Docker logs, syslog), attaches labels to each stream, and pushes new lines to Loki.
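Each push to Loki groups log lines into streams, where a stream is one unique set of labels. The code below sketches that shape by building the JSON body that Loki's push endpoint (POST /loki/api/v1/push) accepts. The body holds labels per stream plus [timestamp, line] pairs, with timestamps as strings of nanoseconds since the Unix epoch. buildPushBody is a hypothetical helper and the labels are illustrative. Promtail may use a more compact wire encoding, but the structure is the same.

```typescript
// Sketch of a Loki push request body (POST /loki/api/v1/push, JSON form).
// buildPushBody and the label values are hypothetical, for illustration only.

type PushBody = {
  streams: { stream: Record<string, string>; values: [string, string][] }[];
};

// Loki expects timestamps as strings of nanoseconds since the Unix epoch.
// Assumes a positive epoch time in milliseconds (as returned by Date.now()).
function toNanos(ms: number): string {
  return `${Math.trunc(ms)}000000`;
}

function buildPushBody(
  labels: Record<string, string>,
  entries: { timeMs: number; line: string }[],
): PushBody {
  return {
    streams: [
      {
        stream: labels, // one stream = one unique label set
        values: entries.map((e): [string, string] => [toNanos(e.timeMs), e.line]),
      },
    ],
  };
}

const payload = buildPushBody(
  { app: "healthtune_api", level: "error" },
  [{ timeMs: 1700000000000, line: "database timeout after 5s" }],
);
console.log(JSON.stringify(payload));
```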


4. Tempo — The Distributed Tracing Backend

Tempo stores traces — the full journey of a single user request across your services.

When a user logs in to your app:

User Request (0ms) → API Gateway (5ms) → Auth Service (12ms) → DB (67ms) → Response (3ms)

Each step is a span, and all the spans of one request together form a trace. Tempo stores the traces it receives and lets you search and visualize them in Grafana.
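A trace can be modeled as a list of spans that share one trace ID, each with a start time and a duration. The sketch below uses the step timings from the login example and treats the steps as running one after another. Real spans are usually nested, with the gateway span enclosing the others. The sketch answers the two questions a trace view makes easy: how long the whole request took, and which step was slowest. Span, traceDuration and slowestSpan are hypothetical names.

```typescript
// Sketch: a trace as a flat list of spans sharing one trace ID.
// Steps are treated as sequential here; real spans are usually nested.

interface Span {
  traceId: string;
  name: string;
  startMs: number; // offset from the start of the request
  durationMs: number;
}

// Total wall-clock time from the earliest start to the latest end.
function traceDuration(spans: Span[]): number {
  if (spans.length === 0) return 0;
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.startMs + s.durationMs));
  return end - start;
}

// The span with the longest duration (assumes at least one span).
function slowestSpan(spans: Span[]): Span {
  return spans.reduce((a, b) => (b.durationMs > a.durationMs ? b : a));
}

const traceId = "4bf92f3577b34da6a3ce929d0e0e4736"; // illustrative 32-hex-digit trace ID
const login: Span[] = [
  { traceId, name: "API Gateway", startMs: 0, durationMs: 5 },
  { traceId, name: "Auth Service", startMs: 5, durationMs: 12 },
  { traceId, name: "DB", startMs: 17, durationMs: 67 },
  { traceId, name: "Response", startMs: 84, durationMs: 3 },
];

console.log(traceDuration(login)); // 87
console.log(slowestSpan(login).name); // DB
```

This is the kind of answer Tempo's trace view gives at a glance: the database step accounts for most of the request time.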

Tempo accepts data in the OpenTelemetry (OTEL) format, the same standard used by SigNoz, Jaeger, and others.
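For Tempo to assemble spans from different services into one trace, every service must know the trace ID of the incoming request. OpenTelemetry SDKs propagate it by default in the W3C Trace Context "traceparent" HTTP header, formatted as version-traceId-parentSpanId-flags. The parser below is a minimal sketch that accepts only version 00. parseTraceParent is a hypothetical name. In practice, the OTEL SDK set up in tracing.js reads and writes this header for you.

```typescript
// Sketch: parse a W3C "traceparent" header (version 00 only).
// Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

interface TraceParent {
  traceId: string; // 32 lowercase hex chars, shared by every span in the trace
  parentId: string; // 16 lowercase hex chars, the calling span's ID
  sampled: boolean; // lowest bit of the flags byte
}

function parseTraceParent(header: string): TraceParent | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (m === null) return null;
  const [, traceId, parentId, flags] = m;
  // All-zero trace or parent IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

console.log(parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"));
```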


🔗 How the Four Tools Connect

Your Node.js Apps (PM2)
│
├─── OTEL SDK (tracing.js)
│      ↓ sends spans over OTLP
│    OTEL Collector / Tempo (port 4317)
│      ↓
│    Tempo ─────────────────────────────┐
│                                       │
├─── Prometheus metrics endpoint        │
│    (e.g., /metrics on port 3001)      │
│      ↑ scraped every 15s              │
│    Prometheus ────────────────────────┤
│                                       │
└─── Log files (PM2 logs)               │
       ↓ watched and pushed by Promtail │
     Loki ──────────────────────────────┤
                                        ↓
                          Grafana Dashboard (port 3000)
                         shows metrics, logs, and traces

📊 What Each Tool Is Responsible For

Question | Tool That Answers It
"Is my server CPU above 80%?" | Prometheus + Grafana
"How many requests per second right now?" | Prometheus + Grafana
"What did my app log at 3 AM?" | Loki + Grafana
"Which request caused this error?" | Tempo + Grafana
"Why did this API call take 800ms?" | Tempo (shows each step)
"When did memory start climbing?" | Prometheus + Grafana

🆚 Grafana Stack vs SigNoz

Aspect | Grafana Stack | SigNoz
Setup complexity | Higher (4 separate tools) | Lower (one Docker Compose)
Maturity | Very mature, large community | Newer, growing fast
Customization | Extremely flexible | Opinionated but simpler
Resource usage | Medium (scales well) | Medium (ClickHouse-based)
OpenTelemetry | Supported (Tempo) | Native
Best for | Teams wanting full control | Teams wanting a quick start

Both are excellent choices. This documentation covers the Grafana Stack because it is one of the most widely used open-source observability stacks, so the skills you build here carry over to many teams and companies.


🔌 Port Overview

Tool | Default Port | Can Be Changed?
Grafana | 3000 | Yes
Prometheus | 9090 | Yes
Loki | 3100 | Yes
Tempo (HTTP) | 3200 | Yes
Tempo / OTEL (gRPC) | 4317 | Yes
OTEL Collector (HTTP) | 4318 | Yes
Node Exporter | 9100 | Yes
Promtail | 9080 | Yes

All of these run inside Docker containers. You only need to expose the ports you actually access — Grafana (3000) from your browser, and 4317/4318 from your apps.