Government Careers
  • Grafana Observability SME

  • Omni Inclusive
  • Poughkeepsie, New York 12601 United States

Grafana Cloud Observability Platform Engineer

Top Skills:

  1. Production expertise across the full Grafana stack: Mimir, Loki, Tempo, Alloy, Beyla, Grafana Application Observability, Unified Alerting.
  2. Strong PromQL, LogQL, and TraceQL authoring skills; able to write recording rules and SLO queries from scratch.
  3. OpenTelemetry practitioner: OTLP, collectors, and SDK/agent instrumentation for at least three of Java, .NET, Go, Python, Node.js.
  4. eBPF-based auto-instrumentation experience with Beyla (or an equivalent such as Pixie or Cilium Tetragon) in a production context.
  5. Experience integrating Grafana alerts into ServiceNow Event Management (native inbound integration, not webhook-only patterns); familiarity with ServiceNow ITOM, AIOps event correlation, and CMDB CI attachment.
  6. Multi-environment hosting fluency (on-prem, AWS, Azure) and Linux/Windows host agent deployment at scale.
  7. Dashboard-as-code and GitOps patterns (Grafana provisioning, Terraform provider, or Grizzly).
  8. Excellent written communication: solution architecture documents, runbooks, and stakeholder-facing status reporting.

Role Summary: Own the end-to-end technical design, build, and operationalization of the Grafana Cloud observability platform for a 50-application estate spanning Java, .NET, Go, Python, and Node.js workloads hosted across on-premises data centres, AWS, and Azure. The SME serves as the senior technical authority across all eight in-scope Grafana Cloud modules and is accountable for instrumentation strategy, alerting design, dashboarding standards, and integration into ServiceNow ITOM via native Event Management.

Key Responsibilities:

  • Platform architecture and configuration across all eight in-scope Grafana Cloud modules.
  • Tenancy and access design: organizations, folders, teams, role-based access control, dashboard variables, template links, and annotations.
  • Application instrumentation strategy by technology stack.
  • Log pipeline engineering via Alloy.
  • Alerting design.
  • Single Pane of Glass.
  • Business Dashboards and Reporting.
  • ServiceNow ITOM integration.
  • Quality assurance authority across all technical deliverables.
  • Phased delivery execution.
  • Knowledge transfer.

Required Skills & Experience:

  • 7+ years in observability/monitoring engineering with deep, recent hands-on Grafana Cloud experience (not just OSS Grafana).
  • Production expertise across the full Grafana stack.
  • Strong PromQL, LogQL, and TraceQL authoring skills.
  • OpenTelemetry practitioner.
  • eBPF-based auto-instrumentation experience.
  • Experience integrating Grafana alerts into ServiceNow Event Management.
  • Multi-environment hosting fluency.
  • Dashboard-as-code and GitOps patterns.
  • Excellent written communication.

Nice to Have:

  • Grafana Certified Professional or equivalent vendor certification.
  • Prior experience in a regulated utility, energy, or critical-infrastructure environment.
  • Familiarity with SolarWinds and Uptrends.
  • Experience with ServiceNow CSDM and Service Mapping governance.
  • Exposure to FinOps for observability.

Out of Scope for This Role:

  • Server health and network monitoring.
  • URL/synthetic endpoint monitoring.
  • ServiceNow ITSM workflow ownership.
