Grafana Cloud Observability Platform Engineer
Top Skills:
1. Production expertise across the full Grafana stack: Mimir, Loki, Tempo, Alloy, Beyla, Grafana Application Observability, Unified Alerting.
2. Strong PromQL, LogQL, and TraceQL authoring skills; able to write recording rules and SLO queries from scratch.
3. OpenTelemetry practitioner: OTLP, collectors, and SDK/agent instrumentation for at least three of Java, .NET, Go, Python, Node.js.
4. eBPF-based auto-instrumentation experience with Beyla (or an equivalent such as Pixie or Cilium Tetragon) in a production context.
5. Experience integrating Grafana alerts into ServiceNow Event Management (native inbound integration, not webhook-only patterns); familiarity with ServiceNow ITOM, AIOps event correlation, and CMDB CI attachment.
6. Multi-environment hosting fluency (on-prem, AWS, Azure) and Linux/Windows host agent deployment at scale.
7. Dashboard-as-code and GitOps patterns (Grafana provisioning, Terraform provider, or Grizzly).
8. Excellent written communication: solution architecture documents, runbooks, and stakeholder-facing status reporting.
Role Summary: Own the end-to-end technical design, build, and operationalization of the Grafana Cloud observability platform for a 50-application estate spanning Java, .NET, Go, Python, and Node.js workloads hosted across on-premises data centres, AWS, and Azure. The SME serves as the senior technical authority across all eight in-scope Grafana Cloud modules and is accountable for instrumentation strategy, alerting design, dashboarding standards, and integration into ServiceNow ITOM via native Event Management.
Key Responsibilities:
- Platform architecture and configuration across all eight in-scope Grafana Cloud modules.
- Tenancy and access design: organizations, folders, teams, role-based access control, dashboard variables, template links, and annotations.
- Application instrumentation strategy by technology stack.
- Log pipeline engineering via Alloy.
- Alerting design.
- Single Pane of Glass.
- Business Dashboards and Reporting.
- ServiceNow ITOM integration.
- Quality assurance authority across all technical deliverables.
- Phased delivery execution.
- Knowledge transfer.
Required Skills & Experience:
- 7+ years in observability/monitoring engineering with deep, recent hands-on Grafana Cloud experience (not just OSS Grafana).
- Production expertise across the full Grafana stack.
- Strong PromQL, LogQL, and TraceQL authoring skills.
- OpenTelemetry practitioner.
- eBPF-based auto-instrumentation experience.
- Experience integrating Grafana alerts into ServiceNow Event Management.
- Multi-environment hosting fluency.
- Dashboard-as-code and GitOps patterns.
- Excellent written communication.
Nice to Have:
- Grafana Certified Professional or equivalent vendor certification.
- Prior experience in a regulated utility, energy, or critical-infrastructure environment.
- Familiarity with SolarWinds and Uptrends.
- Experience with ServiceNow CSDM and Service Mapping governance.
- Exposure to FinOps for observability.
Out of Scope for This Role:
- Server health and network monitoring.
- URL/synthetic endpoint monitoring.
- ServiceNow ITSM workflow ownership.