
Senior DevOps with Python

Serbia 🇷🇸 · Poland 🇵🇱 · Portugal 🇵🇹 · Spain 🇪🇸 · Croatia 🇭🇷 · Kazakhstan 🇰🇿 · Georgia 🇬🇪
Engineering · Remote · English: C1 (Advanced)

Join Akvelon to level up your skills and work with top tech companies

Akvelon Inc. (USA) is a software engineering company working across a wide range of technologies. Our company is a member of the vendor program, which gives employees the opportunity to work with clients in the United States and other countries, such as Microsoft, Facebook, Airbnb, Dropbox, Pinterest, and many others. This is a great opportunity to learn from and work alongside leading engineers at world-renowned companies.


About the project


The project extends an advanced Kubernetes-based inference benchmarking framework to provide end-to-end performance measurements, cross-cloud comparisons, and updated model test scenarios. It adds automated benchmarking of startup latency and scale-out behavior across multiple model families and hardware configurations, integrates dashboards and reporting, and supports next-generation autoscaling, scheduling, storage, and multi-host workloads so that real-world inference performance is measured reliably.


Responsibilities

  • Provide consistent, platform-wide performance signals across all inference workloads and teams, giving clear visibility into system efficiency and bottlenecks
  • Deliver standardized cross-cloud benchmarking across major Kubernetes providers to enable reliable performance comparisons
  • Support leadership reporting with monthly cloud-wide performance results that enable accurate insights and data-driven decision-making
  • Enable teams to validate new features, including autoscaling, scheduling, storage, node provisioning, vLLM optimizations, and accelerator support, using unified benchmarking frameworks aligned with organizational OKRs
  • Extend the benchmarking framework to cover startup latency and scale-out behavior for multiple model families and hardware configurations
  • Integrate automated benchmarking APIs, dashboards, and reporting pipelines to streamline performance evaluation
  • Collaborate with engineering teams to maintain reusable inference components while ensuring accurate scheduling, infrastructure provisioning, and reporting


Requirements

  • Advanced Kubernetes expertise, including a deep understanding of the pod lifecycle, deployments, services, autoscaling, and troubleshooting, with hands-on GKE experience
  • Experience with observability (monitoring, logging, tracing) and performance benchmarking
  • Practical experience with GCP services, including GKE, GCS, Cloud Monitoring, and Cloud Logging
  • MLOps experience, including deploying and operating ML models, working with vLLM or similar frameworks, and managing GPU workloads in Kubernetes
  • Solid understanding of autoscaling with the Horizontal Pod Autoscaler (HPA) and Metrics Server, plus basic knowledge of the Cluster Autoscaler
  • Experience designing and maintaining CI/CD pipelines using GitHub Actions
  • Strong Python skills for automation, scripting, and infrastructure tasks

Nice to Have

  • Knowledge of AWS and Azure cloud services

Overlap time requirements

  • 11 AM PST

We Offer

  • Career Development
  • Professional Certification
  • Mentorship
  • Medical Insurance
  • Relocation Support
  • Corporate Events
  • Flexible Work Options (hybrid/remote)
  • Paid Time Off and Sick Leave


Want to Apply?

Fill in the form and we’ll get back to you

Didn't find a match?

Just submit your CV through our Talent Pool form so we can discover your potential and stay in touch.

