Replication & High Availability on K8S

When MetaDefender Storage Security is installed in Kubernetes, the platform's self-healing behavior means that running at least 2 replicas of each pod is not strictly necessary for High Availability. It is nevertheless a best practice to run 2 replicas of each pod on different worker nodes. For components that might not run inside the cluster, we provide alternative solutions that are supported by our application.
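One way to check that the replicas of a service really run on different worker nodes is to count the distinct nodes hosting its pods. The sketch below is only illustrative: the function name `count_distinct_nodes`, the `mdss` namespace and the `app=apigateway` label are assumptions, not names taken from the MDSS chart.

```bash
# Count the distinct worker nodes hosting a set of pods.
# Reads "<pod-name> <node-name>" lines on stdin and prints the node count.
# Pods that are not scheduled yet (node shown as "<none>") are ignored.
count_distinct_nodes() {
  awk 'NF >= 2 && $2 != "<none>" { seen[$2] = 1 }
       END { n = 0; for (k in seen) n++; print n }'
}

# Example against a live cluster (namespace and label are assumptions,
# adjust them to your deployment):
#   kubectl get pods -n mdss -l app=apigateway --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | count_distinct_nodes
```

A result of 1 for a service with 2 replicas means both pods share a node, so a single node failure would take the service down until the pods are rescheduled.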

However, a few components can come under high load in some situations, so running more than 1 replica of them is recommended. These components are the following.

MDSS services

  • Web Client -> to provide high availability to outside requests
  • API Gateway -> to provide high availability to outside requests
  • Scanning Service -> to provide high availability when a large number of files has to be scanned

3rd Party components

  • Database -> Deploy an external PostgreSQL service with HA, or a PostgreSQL operator running on the K8S cluster
  • RabbitMQ -> Use an external RabbitMQ service with HA
  • Redis Cache -> Use an external Redis cache service with HA

HA solutions for PostgreSQL

For a database service provided by a CSP, the following have been tested and are supported:

If a highly available database is required inside the k8s cluster, then there are publicly available solutions that can deploy a PostgreSQL Replica Set. For a k8s cluster, Zalando provides an operator for deploying a Replica Set: https://github.com/zalando/postgres-operator/blob/master/docs/quickstart.md
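As a rough illustration of what the Zalando operator expects once it is installed, the sketch below renders a minimal `postgresql` custom resource. The helper name `render_postgres_cluster`, the database name `mdss`, the role `mdss_owner` and the 10Gi volume size are assumptions for this example; the PostgreSQL version must be one that both MDSS and the installed operator release support.

```bash
# Render a minimal Zalando "postgresql" custom resource.
# Arguments: <teamId> <cluster-suffix> <instances> <postgres-version>
# The operator requires the cluster name to start with the teamId.
render_postgres_cluster() {
  local team="$1" name="$2" instances="$3" version="$4"
  cat <<EOF
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: ${team}-${name}
spec:
  teamId: "${team}"
  numberOfInstances: ${instances}
  volume:
    size: 10Gi            # assumption, size it for your workload
  users:
    mdss_owner:           # hypothetical role name
      - createdb
  databases:
    mdss: mdss_owner      # hypothetical database name and owner
  postgresql:
    version: "${version}"
EOF
}

# Example (not run here):
#   render_postgres_cluster acid mdss-db 2 16 | kubectl apply -f -
```

With `numberOfInstances` set to 2 or more, the operator runs one primary and the remaining pods as streaming replicas, which is what provides the failover capability.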

HA Solution for Redis Cache

OPSWAT has tested the following services to provide HA for a Redis service.

HA Solution for RabbitMQ

OPSWAT has tested the following services to provide high availability for a RabbitMQ service.

RabbitMQ Cluster Operator

  1. Install Cert-Manager (required for operator webhooks).
  2. Install RabbitMQ Cluster Operator:

You can download the file using this link: RabbitMQ Cluster Operator

  3. Install RabbitMQ Messaging Topology Operator:

You can download the file using this link: RabbitMQ Messaging Topology Operator
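The three installation steps can be sketched as a single helper. This is an illustration only: it applies the manifests published by the upstream cert-manager and RabbitMQ projects on GitHub, which may differ from the files linked on this page, and the function name `install_rabbitmq_operators` and the `KUBECTL` override are assumptions made for the example. If you downloaded the files from the links above, pass their local paths to `kubectl apply -f` instead.

```bash
# Install cert-manager and the two RabbitMQ operators, in order.
# Set KUBECTL to override the kubectl binary (for example for a dry run).
install_rabbitmq_operators() {
  local kubectl="${KUBECTL:-kubectl}"

  # 1. cert-manager, required for the operator webhooks.
  "$kubectl" apply -f "https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml"
  # Wait until the cert-manager deployments are available before continuing.
  "$kubectl" wait --for=condition=Available deployment --all -n cert-manager --timeout=180s

  # 2. RabbitMQ Cluster Operator.
  "$kubectl" apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"

  # 3. RabbitMQ Messaging Topology Operator (cert-manager variant).
  "$kubectl" apply -f "https://github.com/rabbitmq/messaging-topology-operator/releases/latest/download/messaging-topology-operator-with-certmanager.yaml"
}

# Example (not run here):
#   install_rabbitmq_operators
```

For repeatable installations, pinning each URL to a specific release tag instead of `latest` avoids picking up a new operator version unexpectedly.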


Update your Helm values to enable the operator:


The operator will automatically create a 3-node RabbitMQ cluster with quorum-based HA. During node failures, the cluster continues operating and services automatically reconnect to healthy nodes.
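For reference, a 3-node cluster managed by the RabbitMQ Cluster Operator is described by a `RabbitmqCluster` resource like the one rendered below. The MDSS chart is expected to create its own resource, so this is only a sketch of the shape; the helper names and the cluster name `mdss-rabbitmq` are assumptions. The second helper shows the quorum arithmetic: a majority of nodes must stay up, so a cluster of n nodes tolerates (n - 1) / 2 failures, rounded down.

```bash
# Render a RabbitmqCluster resource with the given name and replica count.
render_rabbitmq_cluster() {
  local name="$1" replicas="${2:-3}"
  cat <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
EOF
}

# Number of node failures a quorum of <nodes> members can tolerate.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Examples (not run here):
#   render_rabbitmq_cluster mdss-rabbitmq 3      # print the manifest
#   kubectl get rabbitmqclusters.rabbitmq.com -A # list clusters the operator manages
#   tolerated_failures 3                         # prints 1
```

This is why 3 nodes is the usual minimum: a 2-node cluster cannot lose any node without losing its majority.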

HA deployment for MDSS

MDSS containers can be scaled independently depending on the availability and performance requirements. For example, just the webclient and apigateway pods can be replicated to provide high availability for outside requests and for the web interface.
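As a quick way to try this on a running cluster, the two front-facing services can also be scaled directly with `kubectl scale`. The sketch assumes the Deployments are named `webclient` and `apigateway`, matching the pod names above, and that MDSS runs in a namespace you pass in; both are assumptions to verify with `kubectl get deployments`.

```bash
# Scale only the web client and API gateway to the requested replica count.
# Arguments: <replicas> <namespace>
# Set KUBECTL to override the kubectl binary (for example for a dry run).
scale_frontend() {
  local replicas="$1" namespace="$2"
  local kubectl="${KUBECTL:-kubectl}"
  "$kubectl" scale deployment/webclient deployment/apigateway \
    --replicas="$replicas" -n "$namespace"
}

# Example (not run here):
#   scale_frontend 2 mdss
```

A manual scale like this is overwritten by the next Helm upgrade, so the replica counts should also be set in the Helm values to make the change permanent.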


The mongomigrationsservice is used to migrate data from older versions of MDSS at startup, if required, and provides no benefit when replicated.

Create environment with HA components

To deploy all the components that provide high availability, OPSWAT has prepared a Terraform module that deploys all the 3rd party applications (Redis, RabbitMQ & PostgreSQL DB).

There are 2 ways of using that Terraform module

Automatic replicas using horizontal pod autoscaling (HPA)

The HPA can be enabled on all MDSS services from the values:


In the example above, the number of replicas is adjusted within the specified limits depending on the measured CPU usage. The HPA is applied to all MDSS services; for more granular autoscaling, it is recommended to manually create an HPA, separately from the Helm deployment, just for the desired services.
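A manually created HPA for a single service could look like the sketch below, which renders a standard `autoscaling/v2` HorizontalPodAutoscaler targeting one Deployment. The helper name `render_hpa`, the `-hpa` name suffix and the example Deployment name `scanningservice` are assumptions; check the real Deployment names with `kubectl get deployments`. CPU-based scaling also assumes that a metrics source such as metrics-server is installed and that the target containers declare CPU requests.

```bash
# Render a CPU-based HorizontalPodAutoscaler for one Deployment.
# Arguments: <deployment> <min-replicas> <max-replicas> <target-cpu-percent>
render_hpa() {
  local deployment="$1" min="$2" max="$3" cpu="$4"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${deployment}-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${deployment}
  minReplicas: ${min}
  maxReplicas: ${max}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: ${cpu}
EOF
}

# Example (not run here; Deployment name and namespace are assumptions):
#   render_hpa scanningservice 2 6 70 | kubectl apply -n mdss -f -
```

When using a separate HPA like this, keep the chart-wide HPA disabled so that two autoscalers do not compete for the same Deployment.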
