Enhancing Observability With Grafana Dashboards And Loki Log Export For MCP Gateway
Observability is essential for running a service in production: a clear view of system behavior makes it possible to identify issues, tune performance, and keep the service stable. This article walks through a feature request that adds pre-built Grafana dashboards and a Loki log export pipeline to the MCP Gateway, with the aim of giving developers and operators a turn-key observability bundle.
Epic Grafana Dashboards & Loki Log Export
The core goal of this feature is to deliver a turn-key observability bundle for MCP Gateway: pre-built Grafana dashboards plus a Loki log export pipeline that together give insight into the gateway's performance and behavior.
Goal
The primary goal is to ship a comprehensive turn-key observability bundle for MCP Gateway, focusing on:
- Pre-built Grafana dashboards: These dashboards will be provided in JSON format, covering critical metrics such as request latency, error rates, policy-deny counts, CPU/memory utilization, and per-tool call volume. These pre-built dashboards serve as a starting point, enabling users to quickly visualize and analyze key performance indicators.
- Loki log export pipeline: This pipeline will include a Promtail configuration and a sample docker-compose.yml file. It will export structured gateway logs to Loki, a log aggregation system. Grafana log panels will be pre-wired so that users can query and analyze logs within the Grafana interface, which helps with troubleshooting and root cause analysis.
- Helm-chart options: Helm chart options will enable the stack out of the box with a single command (helm install gateway charts/mcpgateway --set observability.enabled=true), which simplifies deployment and configuration.
The milestone for this feature is Release 0.5.0, which emphasizes Enterprise Operability & Observability. The repository impact is primarily on the charts/mcpgateway/ directory and a new observability/ folder for JSON files and sample compose configurations.
Type of Feature
This feature falls under two main categories:
- Observability / dashboards: This aspect focuses on providing tools and visualizations for monitoring system behavior.
- Developer & operator tooling: This aspect aims to enhance the overall experience for developers and operators by providing tools that simplify management and troubleshooting.
By addressing both observability and tooling, this feature ensures that users have the necessary resources to effectively manage and monitor their MCP Gateway deployments.
Deliverables: The Building Blocks of Observability
To achieve the goals outlined above, a set of specific deliverables has been defined. These artifacts will form the foundation of the observability bundle, providing users with the tools and resources they need to gain deep insights into their systems. Here's a breakdown of the key deliverables:
Core Grafana Dashboards
These dashboards are designed to provide a high-level overview of the gateway's performance. They focus on key metrics that are essential for understanding the overall health and behavior of the system. The deliverables include:
- dashboards/mcpgateway_core.json: This Grafana JSON file will contain a dashboard covering latency (p50/p95), requests per second (req/sec), and status code mix. Latency helps identify potential bottlenecks, requests per second shows traffic patterns, and the status code mix surfaces errors and other issues.
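As a rough illustration of what a file such as mcpgateway_core.json might contain, the Python sketch below assembles a minimal dashboard with latency, request-rate, and status-code panels and prints it as JSON. The metric names, the status label, the uid, and the panel layout are assumptions made for this example; the feature request does not specify them.

```python
import json

# Hypothetical metric names; the real gateway metrics may differ.
DURATION_BUCKET = "mcpgateway_request_duration_seconds_bucket"
REQUESTS_TOTAL = "mcpgateway_requests_total"


def quantile_expr(q: float) -> str:
    """PromQL for a latency quantile computed from a histogram metric."""
    return f"histogram_quantile({q}, sum(rate({DURATION_BUCKET}[5m])) by (le))"


def panel(panel_id: int, title: str, exprs: list, x: int, y: int) -> dict:
    """Build one time-series panel with one query target per expression."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [
            {"refId": chr(ord("A") + i), "expr": expr}
            for i, expr in enumerate(exprs)
        ],
    }


def build_core_dashboard() -> dict:
    """Assemble the three core panels into one dashboard document."""
    return {
        "uid": "mcpgateway-core",  # assumed UID
        "title": "MCP Gateway • Core",
        "panels": [
            panel(1, "Latency p50/p95", [quantile_expr(0.5), quantile_expr(0.95)], 0, 0),
            panel(2, "Requests per second", [f"sum(rate({REQUESTS_TOTAL}[1m]))"], 12, 0),
            panel(3, "Status code mix", [f"sum(rate({REQUESTS_TOTAL}[1m])) by (status)"], 0, 8),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_core_dashboard(), indent=2, ensure_ascii=False))
```

Generating the JSON from code is only one option; the roll-out plan in this article authors the dashboards in Grafana and exports them.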
Per-Tool Grafana Dashboards
These dashboards provide a more granular view of performance, breaking down metrics by individual tools. This level of detail is crucial for identifying issues that may be specific to certain components of the system. The deliverable includes:
- dashboards/mcpgateway_per_tool.json: This Grafana JSON file will contain a dashboard that displays calls, error percentage, and average duration grouped by the tool.name label. This lets users spot tools that are slow or failing and pinpoint the root cause of problems more quickly.
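The grouping behind the per-tool view can be illustrated with a small Python sketch that computes calls, error percentage, and average duration per tool from in-memory records. The Call record shape and the fs.read tool name are hypothetical; in the real dashboard these figures would come from metric queries grouped by the tool.name label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    tool_name: str      # value of the tool.name label
    duration_ms: float  # how long the call took
    is_error: bool      # whether the call failed


def per_tool_stats(calls):
    """Return {tool: {"calls", "error_pct", "avg_ms"}} for a list of Call records."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.tool_name].append(call)

    stats = {}
    for tool, items in grouped.items():
        errors = sum(1 for c in items if c.is_error)
        stats[tool] = {
            "calls": len(items),
            "error_pct": 100.0 * errors / len(items),
            "avg_ms": sum(c.duration_ms for c in items) / len(items),
        }
    return stats


if __name__ == "__main__":
    sample = [
        Call("db.backup", 100.0, False),
        Call("db.backup", 300.0, True),
        Call("fs.read", 50.0, False),
    ]
    for tool, row in per_tool_stats(sample).items():
        print(tool, row)
```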
Kubernetes Grafana Dashboards
For deployments within Kubernetes, these dashboards provide insights into resource utilization and pod health. This is essential for ensuring that the gateway is running efficiently and that resources are being allocated appropriately. The deliverable includes:
- dashboards/mcpgateway_k8s.json: This Grafana JSON file will contain a dashboard that displays CPU usage, memory usage, and restarts (via kube-state-metrics). CPU and memory usage show resource consumption, while restarts can indicate underlying issues that need to be addressed.
Loki Log Export Pipeline
This pipeline facilitates the export of gateway logs to Loki, enabling efficient log aggregation and analysis. The deliverables include:
- loki/promtail.yaml: This Promtail configuration file will define the pipeline for collecting and forwarding logs to Loki. It will include a multiline JSON parser for gateway logs so that logs are properly structured and searchable.
- loki/docker-compose.yml: This stand-alone Docker Compose file will provide a quick-start environment with Loki, Grafana, and Promtail, so that users can spin up a local observability stack for testing and development.
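To make the multiline JSON parsing step more concrete, the Python sketch below mimics the idea outside of Promtail: it buffers lines until they form a complete JSON object, then extracts the level, tool, and status fields as labels. This is an illustration of the technique only, not Promtail's implementation or configuration syntax, and the buffer limit is an arbitrary assumption.

```python
import json

LABEL_KEYS = ("level", "tool", "status")  # labels named in the acceptance checklist
MAX_BUFFERED_LINES = 50  # assumed safety limit for a single log entry


def iter_json_entries(lines):
    """Yield dicts from a stream where one JSON object may span several lines."""
    buffer = []
    for line in lines:
        if not buffer and not line.lstrip().startswith("{"):
            continue  # skip non-JSON noise between entries
        buffer.append(line)
        try:
            entry = json.loads("\n".join(buffer))
        except json.JSONDecodeError:
            if len(buffer) >= MAX_BUFFERED_LINES:
                buffer = []  # give up on a malformed entry
            continue  # otherwise the object is not complete yet
        buffer = []
        yield entry


def extract_labels(entry):
    """Pick the label fields that are present and coerce them to strings."""
    return {key: str(entry[key]) for key in LABEL_KEYS if key in entry}


if __name__ == "__main__":
    raw = [
        '{"level":"warn",',
        ' "msg":"policy_deny",',
        ' "tool":"db.backup"}',
        "plain text line",
        '{"level":"info","msg":"ok","status":200}',
    ]
    for entry in iter_json_entries(raw):
        print(extract_labels(entry), entry["msg"])
```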
Helm Chart Integration
To simplify deployment and configuration within Kubernetes, Helm chart options will be added. The deliverable includes:
- charts/mcpgateway/values.yaml: A new section, observability.*, will be added to the Helm chart values file. This section will include options such as dashboards.enabled, loki.enabled, and retentionDays, which let users configure and customize the observability stack during deployment.
Documentation
Comprehensive documentation is essential for users to understand how to use the new features. The deliverable includes:
- Docs: A new document, docs/observability/grafana.md, will be created. It will provide import steps, screenshots, and alert examples, and will guide users through setting up and using the Grafana dashboards and Loki integration.
By delivering these artifacts, the feature request aims to provide a comprehensive and user-friendly observability solution for MCP Gateway. Each component plays a crucial role in providing insights into the system's behavior and performance.
User Stories & Acceptance Criteria: Real-World Scenarios
To ensure that the new features meet the needs of users, a set of user stories and acceptance criteria have been defined. These scenarios outline how users will interact with the observability bundle and what outcomes they expect. By focusing on real-world use cases, the development team can ensure that the features are practical and effective.
Story 1 — One-Command Local Stack
This story focuses on the ease of setting up a local observability stack using Docker Compose. It ensures that developers can quickly spin up a full environment for testing and development.
Scenario: Spin up full stack via docker-compose
Given I clone observability folder
When I run "docker compose up -d"
Then Grafana UI on localhost:3000 shows dashboard "MCP Gateway • Core"
And panels populate after 10 seconds of traffic
This scenario outlines the steps a developer would take to set up a local observability stack. The acceptance criteria ensure that the process is straightforward and that the dashboards are functional.
Story 2 — Helm Chart Auto-Import
This story focuses on the automated import of dashboards when deploying the gateway using Helm. It ensures that dashboards are automatically loaded into Grafana, simplifying the setup process for Kubernetes deployments.
Scenario: Dashboards auto-load in cluster
When I helm install gateway charts/mcpgateway --set observability.enabled=true
Then a ConfigMap "gateway-grafana-dashboards" contains JSON dashboards
And Grafana side-car picks them up within 1 minute
This scenario demonstrates how dashboards should be automatically loaded when deploying the gateway using Helm. The acceptance criteria ensure that the dashboards are correctly configured and available in Grafana.
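A ConfigMap of the kind this scenario expects could be assembled as in the Python sketch below, which embeds each dashboard as a JSON string under its file name and prints the manifest as JSON (which is also valid YAML). The grafana_dashboard label is a convention commonly watched by Grafana dashboard side-cars; whether this chart uses that label, and with which value, is an assumption here.

```python
import json


def dashboards_configmap(dashboards, name="gateway-grafana-dashboards"):
    """Build a ConfigMap manifest from a mapping of file name -> dashboard dict."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            # Assumed label; a side-car is typically configured to watch for it.
            "labels": {"grafana_dashboard": "1"},
        },
        "data": {
            file_name: json.dumps(body, indent=2, ensure_ascii=False)
            for file_name, body in dashboards.items()
        },
    }


if __name__ == "__main__":
    manifest = dashboards_configmap(
        {"mcpgateway_core.json": {"uid": "mcpgateway-core", "title": "MCP Gateway • Core"}}
    )
    print(json.dumps(manifest, indent=2, ensure_ascii=False))
```

In the chart itself this would more likely be a Helm template than a script; the sketch only shows the shape of the resulting object.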
Story 3 — Loki Log Query Panel
This story focuses on the ability to query logs stored in Loki using Grafana. It ensures that users can easily search and analyze logs to identify issues and understand system behavior.
Scenario: Query deny logs
Given gateway emits log {"level":"warn","msg":"policy_deny","tool":"db.backup"}
When I open Grafana Explore and run {app="gateway"} |= "policy_deny"
Then results include the deny entry within 10 s
This scenario highlights the log querying capabilities provided by the Loki integration. The acceptance criteria ensure that logs can be queried and that relevant entries are returned in a timely manner.
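The same LogQL query can also be issued outside of Grafana through Loki's HTTP API. The Python sketch below only builds the request URL for the query_range endpoint and does not send it; the base URL http://localhost:3100 assumes a local Loki on its default port, and the time window is an arbitrary example.

```python
import time
from urllib.parse import urlencode


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Return a Loki query_range URL for a LogQL query over [start_ns, end_ns]."""
    params = urlencode(
        {
            "query": logql,
            "start": start_ns,  # nanosecond Unix timestamps
            "end": end_ns,
            "limit": limit,
        }
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    now_ns = time.time_ns()
    ten_seconds_ns = 10 * 1_000_000_000
    url = build_query_range_url(
        "http://localhost:3100",  # assumed local Loki address
        '{app="gateway"} |= "policy_deny"',
        now_ns - ten_seconds_ns,
        now_ns,
    )
    print(url)
```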
Story 4 — Alert Rule Example
This story focuses on the ability to set up alerts based on metrics and logs. It ensures that users can be notified of potential issues before they impact the system.
Scenario: High 5xx alert fires
Given avg rate of status=5xx > 1 rps for 5 minutes
Then Alertmanager (optional) sends "High Error Rate" alert
This scenario demonstrates how alerts can be configured to notify users of potential issues. The acceptance criteria ensure that alerts are triggered when specific conditions are met.
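The "above 1 rps for 5 minutes" condition can be read as: the alert fires only once the 5xx rate has stayed above the threshold continuously for the whole window. The Python sketch below evaluates that rule over a list of samples; the sample format and the one-minute spacing in the usage example are assumptions, and a real deployment would express this as an alert rule rather than code.

```python
def alert_fires(samples, threshold_rps=1.0, for_seconds=300):
    """Return True once the rate has stayed above the threshold for for_seconds.

    samples: list of (timestamp_seconds, rate_5xx_rps) tuples, sorted by time.
    """
    breach_start = None
    for timestamp, rate in samples:
        if rate > threshold_rps:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
            if timestamp - breach_start >= for_seconds:
                return True
        else:
            breach_start = None  # any dip below the threshold resets the timer
    return False


if __name__ == "__main__":
    sustained = [(t, 2.0) for t in range(0, 301, 60)]
    interrupted = [(0, 2.0), (60, 2.0), (120, 2.0), (180, 0.5),
                   (240, 2.0), (300, 2.0), (360, 2.0)]
    print("sustained:", alert_fires(sustained))
    print("interrupted:", alert_fires(interrupted))
```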
By defining these user stories and acceptance criteria, the development team can ensure that the new features meet the needs of users and provide a valuable observability solution.
Architecture Sketch (Mermaid): Visualizing the System
To provide a clear understanding of the system architecture, a Mermaid diagram has been created. This diagram visually represents the components of the observability stack and how they interact with each other. The diagram is a valuable tool for understanding the overall system design.
flowchart TD
subgraph Cluster
Gateway((MCP Gateway))
Promtail[[Promtail sidecar]]
Loki[(Loki)]
Grafana[(Grafana)]
DashboardsCM[[Dashboards ConfigMap]]
Gateway --> Promtail
Promtail --> Loki
Grafana --> Loki
DashboardsCM --> Grafana
end
The diagram illustrates the following key components:
- MCP Gateway: The central component that is being monitored.
- Promtail sidecar: Collects logs from the gateway and forwards them to Loki.
- Loki: A log aggregation system that stores and indexes logs.
- Grafana: A visualization tool that displays metrics and logs.
- Dashboards ConfigMap: Stores Grafana dashboard configurations.
The diagram shows logs flowing from the gateway to Promtail and then to Loki, with Grafana querying Loki to display them. It also shows Grafana loading its dashboard definitions from the Dashboards ConfigMap.
Component Matrix: A Detailed Breakdown
To provide a comprehensive overview of the components involved in this feature, a component matrix has been created. This matrix details the purpose of each component and its location within the repository. The matrix is a valuable reference for developers and operators alike.
Component / Path | Purpose |
---|---|
observability/dashboards/*.json | Pre-built Grafana dashboards |
observability/loki/promtail.yaml | Promtail pipeline (k8s & docker) |
observability/loki/docker-compose.yml | Quick-start stack |
charts/mcpgateway/templates/grafana-dashboards.yaml | ConfigMap embedding dashboards |
charts/mcpgateway/values.yaml | observability.enabled, observability.loki.enabled, observability.retentionDays |
docs/observability/grafana.md | Setup guide, screenshots, sample alerts |
This matrix provides a clear understanding of the role each component plays in the observability solution. It also helps to identify the specific files and configurations that need to be modified or created. The component matrix is a valuable tool for navigating the codebase and understanding the overall system design.
Global Acceptance Checklist: Ensuring Quality
To ensure that the new features meet the highest standards of quality, a global acceptance checklist has been created. This checklist outlines the key criteria that must be met before the feature can be considered complete. By adhering to this checklist, the development team can ensure that the observability bundle is robust, reliable, and user-friendly.
- [ ] Local docker compose brings up Gateway + Loki + Grafana; dashboards auto-populate.
- [ ] Helm install with observability.enabled=true loads dashboards via side-car.
- [ ] Promtail parses multiline JSON logs; labels level, tool, status.
- [ ] Example alert rule (High 5xx) included and documented.
- [ ] Dashboards pass grafana-dashboard-validator CI step.
- [ ] CI workflow publishes dashboard JSONs as release asset.
This checklist covers a range of criteria, including:
- Ease of setup: Ensuring that the observability stack can be easily set up using Docker Compose and Helm.
- Automated dashboard loading: Verifying that dashboards are automatically loaded into Grafana.
- Log parsing: Confirming that Promtail correctly parses multiline JSON logs and extracts relevant labels.
- Alerting: Ensuring that example alert rules are included and documented.
- Dashboard validation: Verifying that dashboards pass validation checks.
- CI integration: Ensuring that dashboards are published as release assets.
By systematically working through this checklist, the development team can ensure that the observability bundle is of the highest quality.
Roll-Out Plan: A Phased Approach
To ensure a smooth and successful rollout of the new features, a detailed roll-out plan has been created. This plan outlines the steps that will be taken to develop and deploy the observability bundle. By following a phased approach, the development team can minimize risks and ensure that the features are thoroughly tested and validated.
The roll-out plan consists of the following steps:
- Create observability/ directory with dashboards and compose stack.
- Build Promtail pipeline (JSON parser + label mapping).
- Author three dashboards in Grafana 10, export JSON.
- Add Helm chart fields & ConfigMap template; wire Grafana side-car annotation.
- Write docs with screenshots & Loki query snippets.
- Add GitHub Action dashboard_test.yml (run json-lint + validate UID uniqueness).
- QA on Minikube; update README quick-start.
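The JSON lint and UID-uniqueness check mentioned for dashboard_test.yml could be sketched in Python as follows. The function name, the error message wording, and the assumption that every dashboard carries a top-level uid field are illustrative choices; the actual CI step is not specified in the feature request.

```python
import json
import sys
from collections import defaultdict
from pathlib import Path


def validate_dashboards(directory):
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    uid_to_files = defaultdict(list)

    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        uid = dashboard.get("uid") if isinstance(dashboard, dict) else None
        if not uid:
            errors.append(f"{path.name}: missing uid")
            continue
        uid_to_files[uid].append(path.name)

    # A UID shared by more than one file would make dashboards overwrite each other.
    for uid, files in uid_to_files.items():
        if len(files) > 1:
            errors.append(f"duplicate uid {uid!r} in {', '.join(files)}")
    return errors


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "observability/dashboards"
    problems = validate_dashboards(target)
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Run as a script, it exits with a non-zero status when any check fails, which is what a CI step would need in order to block the build.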
This plan covers all aspects of the development and deployment process, from creating the initial directory structure to testing the features in a Minikube environment. By following this plan, the development team can ensure that the rollout is well-organized and efficient.
Conclusion: Empowering Observability
In conclusion, the feature request for pre-built Grafana dashboards and Loki log export is a significant step toward better observability for the MCP Gateway. The deliverables, user stories, and acceptance criteria define a turn-key bundle that gives developers and operators the insight they need to monitor and troubleshoot their deployments, and the roll-out plan lays out how it will be built, validated, and shipped.