Sujit Patel

Lead Site Reliability Engineer

📍 Bengaluru, India

Helping teams run dependable infrastructure without unnecessary complexity.

About

I’m a Lead Site Reliability Engineer based in Bengaluru, focused on building reliable, scalable, and cost-efficient infrastructure on Kubernetes and AWS.

Over the last 9+ years, I’ve worked across startups and product companies, taking responsibility for reliability, observability, and platform engineering. I like working on systems that need to stay stable under load and enjoy finding practical ways to make them easier to operate.

My core areas of work include EKS, AWS infrastructure, Terraform, and automation with Bash and Python. I’m especially interested in improving how systems scale, how we observe them, and how we reduce day-to-day operational effort through better tooling and automation.

Skills

Platform & Cloud

Kubernetes (EKS), AWS, Linux, Containerization (Docker)

Infrastructure as Code & CI/CD

Terraform, Helm, ArgoCD, Jenkins, GitHub Actions, GitLab CI

Observability

Prometheus, Grafana, Alertmanager, ELK / EFK stacks

Automation & Scripting

Bash, Python, Infrastructure automation

Soft Skills

Incident management & postmortems, Cross-team collaboration, Technical mentoring, Documentation & knowledge sharing

Experience

Lead Site Reliability Engineer

Freshworks

Bengaluru, India | Sep 2023 - Present

  • Leading reliability, scalability, and platform initiatives across large-scale Kubernetes workloads.
  • Designed and executed EKS upgrade strategies, Karpenter migration, and cluster hardening.
  • Improved observability and alerting systems to reduce MTTR and improve signal quality.
  • Driving automation across infra workflows using Terraform, Python, and GitOps practices.
  • Collaborating with product, security, and engineering teams to ensure platform stability.

Senior Production Engineer

nurture.farm

Bengaluru, India | Apr 2022 - Sep 2023

  • Reduced cloud infrastructure costs by 30% through compute optimization, database tuning, and improved provisioning.
  • Streamlined alerting/monitoring systems, reducing noise and improving issue detection.
  • Automated infra management using Terraform for more consistent and scalable deployments.
  • Improved RDS performance & reduced instance footprint through right-sizing.
  • Mentored junior engineers and participated in on-call and incident response rotations.

Site Reliability Engineer

Oye Rickshaw

India | Sep 2020 - Apr 2022

  • Led the implementation of a container-based microservices architecture using Docker and Kubernetes.
  • Designed and implemented a Kubernetes cluster, using tools such as Kong, Nginx-Ingress, Fluentd, and Prometheus.
  • Worked closely with development teams to implement automated CI/CD production/stage/dev pipelines using Jenkins, GitLab CI and ArgoCD, resulting in faster delivery of new features.
  • Implemented logging, monitoring, and alerting using open-source tools such as Node Exporter, Promtail, Loki, Prometheus, and Grafana, enabling the team to quickly identify and resolve issues.
  • Implemented automation using Jenkins, Bash, and Python scripts, resulting in a 50% reduction in time spent on manual tasks.
  • Implemented a data warehouse using DMS and Redshift for the analytics team, to enable them to easily access and analyze large sets of data.
  • Worked with IoT applications built on the Mosquitto MQTT broker.
  • Led the migration of systems and infrastructure to Microsoft Azure, ensuring a smooth transition and minimal disruption to operations.

Senior DevOps Engineer

Unify (Airtel Africa)

Remote | Sep 2019 - Aug 2020

  • Worked with CI/CD practices, pipelines, and workflows.
  • Used automation software such as Jenkins and Ansible.
  • Managed and supported enterprise logging, alerting, and monitoring technologies.
  • Handled incident management and troubleshot complex application problems.
  • Worked effectively in a team-oriented environment while managing numerous priorities and projects simultaneously.

DevOps Engineer

NeoStencil Inc.

India | Mar 2019 - Aug 2020

  • Maintained infrastructure on Amazon Web Services (EC2, RDS, Route53, IAM, SNS), Google Cloud Platform (Compute Engine, Storage, VPC Network), and Microsoft Azure (Virtual Machines, Storage Accounts).
  • Used Nginx for load balancing, reverse proxying, and web serving.
  • Monitored and troubleshot production systems to ensure optimal performance and availability.
  • Handled Linux administration and scripting (Bash, Python).

Tech Support

NeoStencil Inc.

India | Jan 2016 - Mar 2019

  • Implemented and installed new system configurations at client sites.
  • Performed regular maintenance on network infrastructure.
  • Developed and uploaded various lecture units to course pages on the website.
  • Monitored classroom videos and resolved any technical issues that arose.
  • Ensured complete data management, maintenance, and backups.
  • Managed and supported on-field operations team.
  • Installed and configured computer hardware, operating systems, and applications.
  • Assisted management with scheduling, service protocol improvements, and quality assurance.
  • Provided support, including procedural documentation and relevant reports.
  • Troubleshot system and network problems, diagnosing and solving hardware or software faults.

Get in Touch

Whether it's SRE, platform engineering, cost optimization, or just a complex production problem, feel free to reach out.