Sunil Marella
Observability, AIOps & MLOps Platform Architect
I architect, deploy, and operate scalable observability and AI platforms across cloud and cloud-native ecosystems—covering telemetry pipelines, automation, AIOps, and emerging MLOps practices to deliver resilient, insight-driven systems.
Splunk - Core, ITSI, ES & Observability Cloud
Grafana • Dynatrace • AppDynamics • OpenTelemetry
Moogsoft • BigPanda
AWS, Azure, Kubernetes
Terraform • Ansible • CI/CD
MLflow • Kubernetes • CI/CD • Model Monitoring • Drift Concepts
Enterprise Platforms (Observability & AIOps)
Designed & Operated
Cloud & Cloud-Native
AWS, Azure, Kubernetes-Based
AIOps Enabled
Correlation & Noise Reduction
MLOps Focus
Production ML Lifecycle
About
I architect next-generation cloud platforms where observability, AI-driven operations, and MLOps are foundational — enabling autonomous reliability, scalable AI workloads, and data-driven decision-making.
What I do
I build end-to-end observability platforms (metrics, logs, traces), implement SLO-driven reliability, enable AIOps for event intelligence, and expand MLOps capability to operationalize ML workloads in production.
What I’m known for
Platform Architecture Secure, scalable platform designs
Operational Excellence SLOs, alert strategy, incident insights, noise reduction
Automation Terraform/Ansible + CI/CD
Skills
Grouped capabilities across observability, platform engineering, AIOps, and MLOps.
📈 Observability
- Splunk: Core (Enterprise/Cloud), ITSI, ES, Observability Cloud
- AppDynamics, Dynatrace, Grafana, New Relic, Datadog
- Telemetry pipelines: logs, metrics, traces
- OpenTelemetry instrumentation & collector concepts
- Python, Golang for automation & integrations
🏗 Platform Engineering
- Cloud & cloud-native: AWS, Azure, Kubernetes
- IaC: Terraform, Ansible
- Git workflows, release practices
- CI/CD for repeatable platform delivery
- Security-aware, scalable architecture patterns
🤖 AIOps
- Moogsoft: correlation, noise reduction, enrichment
- BigPanda: aggregation, correlation, operational visibility
- ServiceNow ITOM / CMDB alignment
- Operational intelligence to reduce toil and accelerate MTTR
🧠 MLOps
- ML lifecycle: train → deploy → monitor → retrain
- Deployment patterns: batch vs real-time inference
- Model monitoring concepts: performance & drift
- Cloud-native operationalization mindset
My Knowledge Hubs
Dedicated domains where I document architecture patterns, implementation practices, and applied engineering experiments.
🔎 Observability Platform Hub
End-to-end telemetry architecture, monitoring patterns, OpenTelemetry pipelines, and service reliability strategies.
🤖 AIOps Platform Hub
Event correlation concepts, alert intelligence, automation workflows, and operational analytics.
🏗 Architecture Hub
Enterprise architecture thinking, platform design principles, and structured delivery models.
🧠 AI & MLOps Hub
AI learning experiments, ML operationalization insights, and emerging MLOps practices.