DevOps Engineer

Vietnam
Product Engineering
OPSWAT

Protecting the World’s Critical Infrastructure

OPSWAT, a global leader in IT, OT, and ICS critical infrastructure cybersecurity, delivers an end-to-end platform that gives public and private sector organizations and enterprises the critical advantage needed to protect their complex networks, secure their devices, and ensure compliance. Over the last 20 years, our commitment to innovative technology has earned the trust of more than 1,700 organizations, governments, and institutions globally, solidifying our role in protecting the world’s critical infrastructure and securing our way of life.

The Position

  • Join the infrastructure team behind MDaaS — a real-time malware scanning platform handling 30M+ requests/day, built on AWS, Kubernetes, and event streaming.

  • You'll work in an agile scrum team, owning infrastructure-as-code, CI/CD pipelines, and observability. Security and compliance are first-class requirements, not afterthoughts.

What You Will Be Doing

  • Deploy and maintain workloads on EKS via ArgoCD — review GitOps PRs, handle sync failures, approve image updates

  • Write and update Helm charts / Kustomize overlays across dev / staging / prod

  • Triage alerts from Prometheus / Grafana / Coralogix — root cause analysis, resolve or escalate

  • Review and apply Terraform changes — plan, validate, and merge infra PRs (EKS, MSK, ALB, IAM)

  • Maintain CI/CD pipelines on Bitbucket Pipelines and GitHub Actions — fix broken builds, integrate security scans

  • Configure and tune KEDA ScaledObjects for Kafka / RabbitMQ consumers

  • Triage CVEs from Blackduck / Trivy reports — prioritize CVSS ≥ 7.0, coordinate patches with the development team

  • Rotate secrets, verify External Secrets Operator sync, enforce no-hardcoded-credentials policy

  • Document infrastructure and application changes for engineers and QA

  • Participate in on-call rotation — incident response, post-mortems, runbook updates

  • Research new tools and technologies to address current pain points and improve system reliability, scalability, and security — evaluate, prototype, and propose adoption when appropriate

What We Need from You 

  • Experience developing and troubleshooting distributed, cloud-native, containerized applications for maximum performance.

  • Experience with automated testing solutions for regression and performance testing.

  • Experience optimizing containers and orchestration (Docker and Kubernetes, including Helm) and cloud automation (Terraform, CloudFormation, or similar).

  • Experience with at least one major cloud provider (access management, networking, compute, serverless, databases, monitoring).

  • Experience with Node.js/NestJS or Python for scripting.

  • Experience with a secure software delivery lifecycle (GitOps, CI/CD using AWS and TeamCity).

  • Familiar with application and environment security practices.

  • Familiar with monitoring and alerting using Datadog, Prometheus, and Grafana.

  • Engage with AWS architects, OPSWAT architects, and engineering managers to optimize existing services.

  • Contribute to the evolution of the application and infrastructure ecosystem.

  • Communicate efficiently with stakeholders and customers.

  • Take action to improve SLIs and SLOs for all services.

Education & Background

  • BA/BS in Computer Science, Engineering, or equivalent hands-on experience

Soft Skills

  • Strong verbal and written communication in English

  • Self-motivated; works well in a fast-paced, collaborative team

  • Eager to learn new tools and apply them quickly

  • Passionate about solving problems in a principled, elegant way

  • Comfortable both teaching and learning from teammates

Cloud & Infra

  • AWS hands-on: EKS, ECR, IAM/IRSA, MSK, S3, ALB, VPC, Security Groups

  • Terraform: write modules, manage remote state, integrate with CI

  • Kubernetes: RBAC, ingress, network policies, HPA, resource tuning — cluster management via Rancher or K9s

  • Helm + Ansible: author charts and playbooks, manage versioning

  • Docker: multi-stage builds, image optimization

  • Linux/Windows systems administration

CI/CD & GitOps

  • Bitbucket Pipelines, GitHub Actions, or TeamCity — write and maintain, not just use

  • ArgoCD: sync policy, health checks, rollback

  • PR-based deployments; no direct commits to main/prod

Observability

  • Prometheus, Grafana, CloudWatch, Elasticsearch — set up and maintain

  • Structured logging, alert routing, dashboard authoring

Security

  • Least privilege: IAM, IRSA, K8s RBAC — no wildcard permissions

  • Secret management: External Secrets / AWS Secrets Manager, zero hardcoded credentials

  • Supply chain: dependency scanning (Blackduck / Snyk / Trivy), CVE triage by CVSS score

  • Network segmentation: private subnets, Security Groups, ingress/egress control

  • Working knowledge of ISO/IEC 27001 and SOC 2 Type II — access control, audit trail, change management

  • Familiar with CIS Benchmarks for Kubernetes and Linux hardening

Development

  • Python and/or Go — scripting, tooling, automation

  • Able to read Node.js/TypeScript code to debug service issues independently

AI & Tooling

  • Actively uses AI coding tools (GitHub Copilot, Cursor, Claude) in daily workflow — writing scripts, Terraform modules, Helm templates, and debugging

  • Knows how to prompt effectively, verify AI output, and avoid blindly trusting generated infrastructure code

Nice-to-have

  • Experience in the cybersecurity industry

  • Knowledge of compliance frameworks: NIST CSF, HIPAA, GDPR — applied to real infrastructure

  • Istio: mTLS, VirtualService/DestinationRule, traffic management

  • KEDA advanced: custom metrics, scale-to-zero, cooldown tuning

  • Kafka (MSK) operations: topic management, consumer lag, AKHQ

  • Policy-as-code: OPA/Gatekeeper or Kyverno

  • OWASP container and API security principles

  • Coralogix / Datadog with OpenTelemetry — custom pipelines, alert routing

  • Kubecost cost analysis, Kubeshark traffic capture

  • Experience with large-scale systems: 30M+ requests/day

  • Azure exposure (secondary to AWS)

  • Has used AI to generate, review, or optimize infrastructure-as-code (Terraform, Helm, bash scripts) and understands its limitations: hallucinations, outdated API references, security blind spots

  • Experimented with AI APIs (OpenAI, Anthropic) to build internal automation or tooling

OPSWAT is an equal opportunity employer. We celebrate diversity and are committed to providing an environment where equal employment opportunities are extended to all employees and applicants, free of discrimination and harassment of any type. All employment decisions are based on individual qualifications, job requirements, and business needs without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other category protected by federal, state, or local laws.

Recruiting Agencies: We do not accept unsolicited resumes from third-party agencies for any of our open positions. To submit resumes for our jobs, there must be a recruiting contract approved by our legal team and endorsed by both parties. We are not accepting additional third-party agencies at this time.