Upgrade

This document explains how to move an existing deployment from the previous release (app build 10.5.2603) to this release (app build 10.6.2605) without destroying any live infrastructure.

Conventions used in this document

The actual on-disk folder names in your environment may vary (e.g. mocm-10.5.2603/, mocm-2026.01/, releases/2026-05/, or anything you choose). To stay release-name-agnostic, this document uses two placeholders:

| Placeholder | Meaning |
|---|---|
| <previous-release>/ | The release directory currently running in production (app build 10.5.2603). It holds your live OpenTofu state. |
| <current-release>/ | The release directory of this package (app build 10.6.2605), i.e. the new version you are upgrading to. |

Replace both placeholders with the real paths on your machine before running any command.

What's new in this release (10.5.2603 → 10.6.2605)

| Layer | Change |
|---|---|
| Application images | Bumped from 10.5.2603 to 10.6.2605 (see images/images.txt). |
| Terraform | New example: terraform/aws/terraform.tfvars.cost-optimized.example. |
| Terraform | New operational guides under terraform/aws/docs/ (deployment, cost, DR, EKS upgrade, observability, WAF, …). |
| Helm | mocm/values.yaml reset to template form; placeholders must be filled in again. |

Review the full diff before you start:


Pre-upgrade checklist

  1. Snapshot everything.

    • MongoDB EBS snapshots (DLM should already be creating them; verify that it is).
    • Copy the contents of all 7 buckets to a safe location with aws s3 cp --recursive, or rely on versioning if it is enabled.
    • Take an RDS/ElastiCache final snapshot if applicable.
  2. Back up the local OpenTofu state.

  3. Check that your tool versions match the table in README.md; in particular, tofu must be >= 1.10.4.
  4. Read the tofu plan output carefully before any apply. If you ever see destroy or replace on resources you intend to keep, STOP and investigate.
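The state backup and the tofu version check from the checklist above can be sketched as two helpers. The sketch assumes the local state lives in terraform/aws/ and that the first line of `tofu version` has the form "OpenTofu vX.Y.Z". Both function names are hypothetical.

```bash
# Hypothetical helper: copy the local state files into a timestamped folder.
backup_local_state() {
  local tfdir="$1" dest f
  [ -f "$tfdir/terraform.tfstate" ] || { echo "no terraform.tfstate in $tfdir" >&2; return 1; }
  dest="$tfdir/state-backup-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dest" || return 1
  # terraform.tfstate.backup is the copy OpenTofu keeps from its previous write.
  for f in terraform.tfstate terraform.tfstate.backup; do
    if [ -f "$tfdir/$f" ]; then
      cp -p "$tfdir/$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}

# Hypothetical helper: succeed only if the installed tofu is >= the given version.
tofu_at_least() {
  local want="$1" have
  # Assumed format of the first line: "OpenTofu v1.10.4".
  have=$(tofu version | head -n 1 | sed -e 's/^OpenTofu v//')
  # sort -V orders version strings; if $want sorts first (or ties), $have >= $want.
  [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Usage:
# backup_local_state "$PREV/terraform/aws"
# tofu_at_least 1.10.4 || echo "upgrade tofu before continuing"
```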

Option A — Keep using the local backend (simplest, POC/single-operator)

Use this if you originally ran tofu apply inside <previous-release>/terraform/aws/ without configuring a remote backend.
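With a local backend, the state file has to travel with you into the new release directory. The sketch below assumes terraform.tfstate and terraform.tfvars both sit in terraform/aws/ of each release. It copies rather than moves, so <previous-release>/ stays intact as a rollback point. The name carry_local_state is hypothetical.

```bash
# Hypothetical helper: copy local state and tfvars from the previous release
# into this release, refusing to overwrite anything already there.
carry_local_state() {
  local src="$1/terraform/aws" dst="$2/terraform/aws" f
  # Check everything first so a failure never leaves a half-copied release.
  for f in terraform.tfstate terraform.tfvars; do
    [ -f "$src/$f" ] || { echo "missing $src/$f" >&2; return 1; }
    [ ! -e "$dst/$f" ] || { echo "refusing to overwrite $dst/$f" >&2; return 1; }
  done
  for f in terraform.tfstate terraform.tfvars; do
    cp -p "$src/$f" "$dst/$f" || return 1
  done
}

# Usage:
# carry_local_state "$PREV" "$CUR"
```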


Open <current-release>/terraform/aws/terraform.tfvars and append the Valkey block below (new in 10.6.2605). Skip if your file already contains valkey_*.
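The "skip if already present" check can be scripted. This sketch does not reproduce the Valkey block itself. It only tests whether any uncommented valkey_* assignment already exists in the file. The function name is hypothetical.

```bash
# Hypothetical helper: succeed if the tfvars file already assigns any valkey_* variable.
# Only lines that start (after optional indentation) with valkey_ count, so
# commented-out lines such as "# valkey_x = 1" do not match.
has_valkey_settings() {
  grep -Eq '^[[:space:]]*valkey_[A-Za-z0-9_]*[[:space:]]*=' "$1"
}

# Usage:
# if has_valkey_settings "$CUR/terraform/aws/terraform.tfvars"; then
#   echo "valkey_* already present, skip the append"
# fi
```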


Activate the ElastiCache (Redis) file for the parallel run. The previous release's ElastiCache resource ships as 09-elasticache.tf.2603, so OpenTofu ignores it by default. Because the old cluster is still recorded in state, leaving the file inactive would make the plan destroy it. Rename the file so that the old Redis cluster keeps running alongside the new Valkey replication group during data migration:
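A minimal sketch of the rename, assuming you pass the release's terraform/aws directory. It is idempotent: if the file is already active, it does nothing. The function name is hypothetical.

```bash
# Hypothetical helper: activate the shipped-but-disabled ElastiCache file.
activate_elasticache_file() {
  local tfdir="$1"
  local disabled="$tfdir/09-elasticache.tf.2603" active="$tfdir/09-elasticache.tf"
  if [ -f "$active" ]; then
    echo "already active: $active"
    return 0
  fi
  [ -f "$disabled" ] || { echo "not found: $disabled" >&2; return 1; }
  # OpenTofu does not load *.tf.2603 files; dropping the suffix is what enables it.
  mv "$disabled" "$active"
}

# Usage:
# activate_elasticache_file "$CUR/terraform/aws"
```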


After this rename, aws_elasticache_cluster.elasticache (old Redis) should appear in the plan as a no-op and aws_elasticache_replication_group.valkey (new) as + create. Leave this file active until applications are fully cut over to Valkey (see Stage 4).
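Initializing, planning and checking the plan for destructive changes can be combined into one guard. This is a sketch, not the package's own procedure. It saves the plan to a file with `-out` and renders it with `tofu show -no-color`. It stops on any line of the form "will be destroyed" or "must be replaced", which is stricter than the checklist. The name plan_and_guard and the file name upgrade.tfplan are hypothetical.

```bash
# Hypothetical helper: init + saved plan, then refuse to continue if the plan
# would destroy or replace anything.
plan_and_guard() {
  local tfdir="$1" plan="upgrade.tfplan" hits
  tofu -chdir="$tfdir" init || return 1
  tofu -chdir="$tfdir" plan -out="$plan" || return 1
  # Human-readable plan lines read "# <address> will be destroyed" / "must be replaced".
  hits=$(tofu -chdir="$tfdir" show -no-color "$plan" | grep -E 'will be destroyed|must be replaced')
  if [ -n "$hits" ]; then
    printf 'STOP, plan contains destructive changes:\n%s\n' "$hits" >&2
    return 1
  fi
  echo "no destroy/replace found; review the plan, then: tofu -chdir=$tfdir apply $plan"
}

# Usage:
# plan_and_guard "$CUR/terraform/aws"
```

Applying the saved plan file guarantees that exactly the reviewed changes are executed.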


Option B: Migrate to a remote S3 backend

Do this once; every future release then points at the same bucket.

B.1 First-time migration (still inside <previous-release>)
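A sketch of the one-time migration. It assumes backend.tf.example sits in terraform/aws/ and that the S3 bucket named in it already exists with versioning enabled. `tofu init -migrate-state` is OpenTofu's standard way to copy existing state into a newly configured backend, and it asks for confirmation before copying. The function name is hypothetical.

```bash
# Hypothetical helper: run once, against the PREVIOUS release's terraform/aws directory.
migrate_to_s3_backend() {
  local tfdir="$1"
  if [ ! -f "$tfdir/backend.tf" ]; then
    # First call: create backend.tf from the example, then stop so you can
    # fill in bucket, key and region before anything touches state.
    cp "$tfdir/backend.tf.example" "$tfdir/backend.tf" || return 1
    echo "edit $tfdir/backend.tf (bucket/key/region), then re-run" >&2
    return 2
  fi
  # init detects the backend change and offers to copy the local state to S3.
  tofu -chdir="$tfdir" init -migrate-state || return 1
  # Confirm the state can be read back from the new backend.
  tofu -chdir="$tfdir" state list >/dev/null
}

# Usage:
# migrate_to_s3_backend "$PREV/terraform/aws"
```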


After this step the local terraform.tfstate is no longer used, because OpenTofu now reads and writes state in S3. You may delete it, but archiving it is safer:
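A minimal archiving sketch. It moves the local state files into a timestamped folder so that they can no longer be picked up by accident, while remaining available for reference. The function and folder names are hypothetical.

```bash
# Hypothetical helper: move the now-unused local state files into a timestamped
# archive directory instead of deleting them.
archive_local_state() {
  local tfdir="$1" dest f
  dest="$tfdir/local-state-archive-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dest" || return 1
  for f in terraform.tfstate terraform.tfstate.backup; do
    # Skip files that do not exist (e.g. no .backup was ever written).
    if [ -f "$tfdir/$f" ]; then
      mv "$tfdir/$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}

# Usage:
# archive_local_state "$PREV/terraform/aws"
```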


B.2 Apply this release
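A sketch of pointing this release at the shared backend. It assumes the backend configuration lives in terraform/aws/backend.tf in both releases. It refuses to overwrite an existing backend.tf that differs from the previous one. A plain `tofu init` (without -migrate-state) is enough here because the state is already in S3. The function name is hypothetical.

```bash
# Hypothetical helper: reuse the previous release's backend.tf so both
# directories read and write the same remote state object.
use_shared_backend() {
  local prev_tf="$1/terraform/aws" cur_tf="$2/terraform/aws"
  [ -f "$prev_tf/backend.tf" ] || { echo "missing $prev_tf/backend.tf" >&2; return 1; }
  if [ -e "$cur_tf/backend.tf" ] && ! cmp -s "$prev_tf/backend.tf" "$cur_tf/backend.tf"; then
    echo "$cur_tf/backend.tf exists and differs; compare them by hand" >&2
    return 1
  fi
  cp -p "$prev_tf/backend.tf" "$cur_tf/backend.tf" || return 1
  # Nothing to migrate: init configures the backend and fetches providers/modules.
  tofu -chdir="$cur_tf" init || return 1
  tofu -chdir="$cur_tf" plan -out=upgrade.tfplan
}

# Usage:
# use_shared_backend "$PREV" "$CUR"
```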


No state file copying is required — both release directories point to the same remote state.

Rollback

From Option A (local backend)


From Option B (remote backend)

S3 bucket versioning (enabled in backend.tf.example) lets you restore the previous state object:
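A sketch of the restore using real `aws s3api` subcommands. It lists the versions of the state object, then copies an older version back onto the same key. S3 keeps the version being replaced, so the restore itself can be undone. BUCKET and KEY stand for the values in your backend.tf, and the function names are hypothetical.

```bash
# Hypothetical helper: list versions of the state object (most recent first).
list_state_versions() {
  local bucket="$1" key="$2"
  aws s3api list-object-versions --bucket "$bucket" --prefix "$key" \
    --query "Versions[?Key=='$key'].[VersionId,LastModified,IsLatest]" --output text
}

# Hypothetical helper: make an older version current again by copying it onto
# the same key. Keys with special characters must be URL-encoded in --copy-source.
restore_state_version() {
  local bucket="$1" key="$2" version_id="$3"
  aws s3api copy-object --bucket "$bucket" --key "$key" \
    --copy-source "$bucket/$key?versionId=$version_id"
}

# Usage:
# list_state_versions "$BUCKET" "$KEY"
# restore_state_version "$BUCKET" "$KEY" "$VERSION_ID"
# then run tofu plan from <previous-release>/terraform/aws and review it
```

Resources created by the upgrade apply (for example, the Valkey replication group) are not recorded in an older state version. After such a restore they are no longer tracked by OpenTofu.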


Helm / Helmfile upgrade (application layer)

The chart upgrade procedure itself (lint → helmfile diff → helmfile sync, release ordering, rollback) is already documented in mocm/README.md → Upgrade. Follow that document for the actual commands.

What is specific to a cross-release upgrade and easy to get wrong:

  1. Do NOT copy <previous-release>/mocm/values.yaml over <current-release>/mocm/values.yaml. The new values.yaml ships with placeholders (< REPLACE_VALUE_* >) and may contain new keys that did not exist in the previous release. A blind copy will silently drop those new keys. Correct workflow — merge, don't overwrite:
  2. Bump the image tag in global.image.tag:

Make sure the new images are already pushed to ECR (cd <current-release>/images && ./loadimage.sh) before running helmfile sync; otherwise pods will go into ImagePullBackOff.

  3. Run a diff first. The three-release ordering (mocm-bootstrap-1 → mocm-bootstrap-2 → mocm-service) is enforced by Helmfile, but you should still preview the diff:


Then apply with the standard command from mocm/README.md:

  4. Helm rollback is per release, not per package. If mocm-service fails to upgrade, roll back only that release:

You do not need to roll back the Terraform layer just because the Helm layer failed.

  5. Chart-level changes are tracked in mocm/CHANGELOG.md. Read the entry for the version shipped with this release before upgrading; any ### Changed or ### Removed item there may require values-file edits beyond the steps above.
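The first two checks in the list above can be sketched as small helpers: merge instead of overwrite, and confirm the images exist before syncing. The sketch assumes values.yaml lives at mocm/values.yaml in each release and that every unfilled placeholder contains the string REPLACE_VALUE_. It also assumes you take the ECR repository names from images/images.txt. All function names are hypothetical.

```bash
# Hypothetical helper: show what changed between the old and new values.yaml,
# so keys that are new in this release are merged instead of lost.
diff_values() {
  diff -u "$1/mocm/values.yaml" "$2/mocm/values.yaml"
  [ $? -lt 2 ]   # exit 1 just means "files differ"
}

# Hypothetical helper: print any unfilled placeholders; succeed only if none remain.
check_placeholders() {
  if grep -n 'REPLACE_VALUE_' "$1"; then
    echo "unfilled placeholders remain in $1" >&2
    return 1
  fi
}

# Hypothetical helper: succeed only if the tag exists in the ECR repository
# (describe-images fails with ImageNotFoundException otherwise).
image_in_ecr() {
  local repo="$1" tag="$2"
  aws ecr describe-images --repository-name "$repo" --image-ids imageTag="$tag" \
    --query 'imageDetails[0].imageTags' --output text >/dev/null
}

# Usage:
# diff_values "$PREV" "$CUR" | less
# check_placeholders "$CUR/mocm/values.yaml"
# image_in_ecr "$REPO" 10.6.2605 || echo "push images first"
```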

Stage 4 — Decommission ElastiCache (Redis) after Valkey cutover

After Option A or B and the Helm upgrade above, both caches are running, so applications can be migrated without downtime:

| Resource | State after the upgrade apply |
|---|---|
| aws_elasticache_cluster.elasticache (Redis) | Still serving live traffic |
| aws_elasticache_replication_group.valkey | Created, idle, ready to use |

Run Stage 4 only after every workload that used the old Redis is pointing at the Valkey endpoint (stored in Secrets Manager as <name_prefix>/valkey/<valkey_username>) and you have observed Valkey serving traffic for at least one full business cycle.

4.1 Verify nothing still depends on ElastiCache (Redis)
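One way to check is the CloudWatch CurrConnections metric of the old cluster. This sketch assumes a single-node Redis cluster whose node id is 0001 (pass another id as the fourth argument) and hourly Maximum datapoints. If the peak does not fall after your applications are cut over, something is still connected. The function name is hypothetical.

```bash
# Hypothetical helper: peak CurrConnections of the old Redis node over a window.
# start/end are ISO 8601 timestamps, e.g. 2026-05-01T00:00:00Z.
redis_peak_connections() {
  local cluster_id="$1" start="$2" end="$3" node_id="${4:-0001}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ElastiCache --metric-name CurrConnections \
    --dimensions Name=CacheClusterId,Value="$cluster_id" Name=CacheNodeId,Value="$node_id" \
    --start-time "$start" --end-time "$end" \
    --period 3600 --statistics Maximum \
    --query 'Datapoints[].Maximum' --output text |
    tr '\t' '\n' |
    awk '$1 ~ /^[0-9.]+$/ { v = $1 + 0; if (!n++ || v > max) max = v }
         END { if (n) print max; else print "no datapoints" }'
}

# Usage:
# redis_peak_connections "$CLUSTER_ID" 2026-05-01T00:00:00Z 2026-05-08T00:00:00Z
```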


Also grep the live Helm values for any hostnames still pointing at the old endpoint:
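A sketch of that search across the three releases named in this document. It uses `helm get values --all` so that chart defaults are searched as well as your overrides. The namespace argument is an assumption, and the function name is hypothetical. Like grep, the helper succeeds when it finds a match.

```bash
# Hypothetical helper: succeed (return 0) if any release still references the endpoint.
find_endpoint_in_releases() {
  local endpoint="$1" ns="$2" rel status=1
  for rel in mocm-bootstrap-1 mocm-bootstrap-2 mocm-service; do
    if helm get values "$rel" -n "$ns" --all | grep -nF "$endpoint"; then
      echo "^ still referenced by release $rel" >&2
      status=0
    fi
  done
  return "$status"
}

# Usage (one way to look up the old endpoint first):
# endpoint=$(aws elasticache describe-cache-clusters --cache-cluster-id "$CLUSTER_ID" \
#   --show-cache-node-info --query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' --output text)
# find_endpoint_in_releases "$endpoint" "$NAMESPACE" && echo "finish the cutover first"
```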


If anything matches, finish the application-layer cutover before continuing.

4.2 (Optional) Take a final snapshot

If you want a last-chance restore point before destroying the cluster:
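A sketch using `aws elasticache create-snapshot` and polling `describe-snapshots` until the status becomes "available". It assumes the old resource is a standalone cache cluster (not a replication group member). The snapshot name, retry count and interval are your choice, and the function names are hypothetical.

```bash
# Hypothetical helper: start a manual snapshot; prints the initial status (normally "creating").
snapshot_old_redis() {
  local cluster_id="$1" name="$2"
  aws elasticache create-snapshot --cache-cluster-id "$cluster_id" \
    --snapshot-name "$name" --query 'Snapshot.SnapshotStatus' --output text
}

# Hypothetical helper: poll until the snapshot is "available" or attempts run out.
wait_for_snapshot() {
  local name="$1" tries="${2:-60}" interval="${3:-30}" status
  while [ "$tries" -gt 0 ]; do
    status=$(aws elasticache describe-snapshots --snapshot-name "$name" \
      --query 'Snapshots[0].SnapshotStatus' --output text) || return 1
    echo "snapshot $name: $status"
    [ "$status" = "available" ] && return 0
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}

# Usage:
# snapshot_old_redis "$CLUSTER_ID" redis-final-before-teardown
# wait_for_snapshot redis-final-before-teardown
```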


4.3 Rename the file back so OpenTofu plans a destroy
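A minimal sketch of the reverse rename, assuming you pass the release's terraform/aws directory. It refuses to clobber an existing .2603 file. The function name is hypothetical.

```bash
# Hypothetical helper: deactivate 09-elasticache.tf so the next plan destroys the old cluster.
deactivate_elasticache_file() {
  local tfdir="$1"
  local active="$tfdir/09-elasticache.tf" disabled="$tfdir/09-elasticache.tf.2603"
  [ -f "$active" ] || { echo "not active: $active" >&2; return 1; }
  [ ! -e "$disabled" ] || { echo "refusing to overwrite $disabled" >&2; return 1; }
  mv "$active" "$disabled"
}

# Usage:
# deactivate_elasticache_file "$CUR/terraform/aws"
```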


Renaming back to .2603 deactivates the resource block and its co-located elasticache_* variables in one step (they live in the same file — see the header comment in 09-elasticache.tf.2603).

4.4 Plan and apply the teardown
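Before applying the teardown, it helps to confirm that the plan destroys only what you expect. This sketch saves the plan and reads the rendered `# <address> will be destroyed` / `must be replaced` lines. It fails if any address is not on the allowlist you pass. The name plan_teardown and the file teardown.tfplan are hypothetical.

```bash
# Hypothetical helper: plan the teardown and fail if anything other than the
# listed addresses would be destroyed or replaced.
plan_teardown() {
  local tfdir="$1"; shift
  local allow=" $* " plan="teardown.tfplan"
  tofu -chdir="$tfdir" plan -out="$plan" || return 1
  # Field 2 of "  # aws_x.y will be destroyed" is the resource address;
  # awk exits 1 if any such address is missing from the allowlist.
  tofu -chdir="$tfdir" show -no-color "$plan" |
    awk -v allow="$allow" '
      /# .* (will be destroyed|must be replaced)/ {
        if (index(allow, " " $2 " ")) print "expected:   " $2
        else { print "UNEXPECTED: " $2; bad = 1 }
      }
      END { exit bad }'
}

# Usage:
# plan_teardown "$CUR/terraform/aws" aws_elasticache_cluster.elasticache &&
#   tofu -chdir="$CUR/terraform/aws" apply teardown.tfplan
```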


The matching security group aws_security_group.elasticache is defined in 02-sg.tf and is not touched by renaming 09-elasticache.tf. If nothing else references it after teardown, remove it in a follow-up commit.
