Sunil Marella

Observability, AIOps & MLOps Platform Architect
I architect, deploy, and operate scalable observability and AI platforms across cloud and cloud-native ecosystems—covering telemetry pipelines, automation, AIOps, and emerging MLOps practices to deliver resilient, insight-driven systems.
Splunk: Core, ITSI, ES & Observability Cloud
Grafana • Dynatrace • AppDynamics • OpenTelemetry
Moogsoft • BigPanda
AWS, Azure, Kubernetes
Terraform • Ansible • CI/CD
MLflow • Kubernetes • CI/CD • Model Monitoring • Drift Concepts
Enterprise Platforms (Observability & AIOps): Designed & Operated
Cloud & Cloud-Native: AWS, Azure, Kubernetes-Based
AIOps Enabled: Correlation & Noise Reduction
MLOps Focus: Production ML Lifecycle

About

I architect next-generation cloud platforms where observability, AI-driven operations, and MLOps are foundational — enabling autonomous reliability, scalable AI workloads, and data-driven decision-making.

What I do

I build end-to-end observability platforms (metrics, logs, traces), implement SLO-driven reliability, enable AIOps for event intelligence, and expand MLOps capability to operationalize ML workloads in production.

What I’m known for

Platform Architecture: Secure and scalable platform designs
Operational Excellence: SLOs, alert strategy, incident insights, noise reduction
Automation: Terraform/Ansible + CI/CD

Skills

Grouped capabilities across observability, platform engineering, AIOps, and MLOps.

📈 Observability

  • Splunk: Core (Enterprise/Cloud), ITSI, ES, Observability Cloud
  • AppDynamics, Dynatrace, Grafana, New Relic, Datadog
  • Telemetry pipelines: logs, metrics, traces
  • OpenTelemetry instrumentation & collector concepts
  • Python and Go for automation & integrations

🏗 Platform Engineering

  • Cloud & cloud-native: AWS, Azure, Kubernetes
  • IaC: Terraform, Ansible
  • Git workflows, release practices
  • CI/CD for repeatable platform delivery
  • Security-aware, scalable architecture patterns

🤖 AIOps

  • Moogsoft: correlation, noise reduction, enrichment
  • BigPanda: aggregation, correlation, operational visibility
  • ServiceNow ITOM / CMDB alignment
  • Operational intelligence to reduce toil and accelerate MTTR

🧠 MLOps

  • ML lifecycle: train → deploy → monitor → retrain
  • Deployment patterns: batch vs real-time inference
  • Model monitoring concepts: performance & drift
  • Cloud-native operationalization mindset

My Knowledge Hubs

Dedicated domains where I document architecture patterns, implementation practices, and applied engineering experiments.

🔎 Observability Platform Hub

End-to-end telemetry architecture, monitoring patterns, OpenTelemetry pipelines, and service reliability strategies.

🤖 AIOps Platform Hub

Event correlation concepts, alert intelligence, automation workflows, and operational analytics.

🏗 Architecture Hub

Enterprise architecture thinking, platform design principles, and structured delivery models.

🧠 AI & MLOps Hub

AI learning experiments, ML operationalization insights, and emerging MLOps practices.