Akvelon.Dev | Job | Senior DevOps with Python

Senior DevOps with Python

Serbia 🇷🇸, Poland 🇵🇱, Portugal 🇵🇹, Spain 🇪🇸, Croatia 🇭🇷, Kazakhstan 🇰🇿, Georgia 🇬🇪
Engineering
Remote
English: C1 (Advanced)

Join Akvelon to level up your skills and work with top tech companies

Akvelon Inc. (USA) is a software engineering company that works with a wide range of technologies. Our company is a member of the vendor program, which gives employees the opportunity to work with clients in the United States and other countries (Microsoft, Facebook, Airbnb, Dropbox, Pinterest, and many others). This is a great opportunity to learn from and work with leading engineers at world-renowned companies.

About the project

The project expands an advanced Kubernetes-based inference benchmarking framework to provide end-to-end performance measurements, cross-cloud comparisons, and updated model test scenarios. It includes automated benchmarking for startup latency and scale-out behavior across multiple model families and hardware configurations, integrates dashboards and reporting, and supports next-generation autoscaling, scheduling, storage, and multi-host workloads to ensure reliable measurement of real-world inference performance.

Responsibilities

Provide consistent, platform-wide performance signals across all inference workloads and teams, ensuring clear visibility into system efficiency and bottlenecks
Deliver standardized cross-cloud benchmarking across major Kubernetes providers to ensure reliable performance comparisons
Support leadership reporting through monthly cloud-wide performance results that enable accurate insights and data-driven decision making
Enable teams to validate new features including autoscaling, scheduling, storage, node provisioning, vLLM optimizations, and accelerator support using unified benchmarking frameworks aligned with organizational OKRs
Extend the benchmarking framework to cover startup latency and scale-out behavior for multiple model families and hardware configurations
Integrate automated benchmarking APIs, dashboards, and reporting pipelines to streamline performance evaluation
Collaborate with engineering teams to maintain reusable inference components while ensuring accurate scheduling, infrastructure provisioning, and reporting

Requirements

Advanced Kubernetes expertise, including a deep understanding of pod lifecycle, deployments, services, autoscaling and troubleshooting, with hands-on experience in GKE
Strong Python skills for automation, scripting and infrastructure tasks
Experience with observability, monitoring, logging, tracing and performance benchmarking
Practical experience with GCP services, including GKE, GCS, Cloud Monitoring and Cloud Logging
MLOps experience, including deploying and operating ML models, working with vLLM or similar frameworks, and managing GPU workloads in Kubernetes
Solid understanding of autoscaling with HPA and Metrics Server, plus basic knowledge of Cluster Autoscaler
Experience designing and maintaining CI/CD pipelines using GitHub Actions

Nice to Have

Knowledge of AWS and Azure cloud services

Overlap time requirements

11 AM PST

We Offer

Career Development
Professional Certification
Mentorship
Medical Insurance
Relocation Support
Corporate Events
Flexible Work Options (hybrid/remote)
Paid Time Off and Sick Leave
