This article applies to MD Core containers running in AWS ECS.
Background
In some ECS deployments, deployment IDs under a license key become zombies: the container is terminated before it can properly deactivate its deployment ID, so the ID remains registered against the license key.
Goal:
When an MD Core task starts, record its deployment ID; when the task stops (scale-in, crash, OOM, deploy), deactivate that deployment reliably—even if the container can’t run its own shutdown hook.
1 - Architecture (at a glance)
Trigger: EventBridge rule on ECS Task State Change (we filter by desiredStatus for speed).
Lambda (single function):
- On desiredStatus=RUNNING → wait until lastStatus=RUNNING, call MD Core Admin API GET http://<task-private-ip>:8008/admin/license (header apikey: <MDCORE_APIKEY>) → read deployment.
- Store mapping { task_arn → deployment } in DynamoDB (TTL’d).
- On desiredStatus=STOPPED → wait until lastStatus=STOPPED, read activation key + deployment → call GET https://activation.dl.opswat.com/deactivation?key=<activation-key>&deployment=<deployment>.
Secrets: Activation key and Admin API key live in AWS Secrets Manager; they are injected into the Task Definition as environment secrets (not plain env).
Networking: Lambda is placed in the same VPC as ECS and allowed to reach the task’s private IP on TCP/8008.
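The flow above boils down to a small dispatch on desiredStatus. The following is a minimal sketch of that branching only, assuming the standard shape of an ECS Task State Change event; the function name route_event and the action labels are hypothetical and are not part of the Lambda code in section 6.

```python
def route_event(event):
    """Return which action to take for an ECS Task State Change event."""
    detail = event.get("detail", {})
    desired = detail.get("desiredStatus")
    # Without both ARNs the task cannot be looked up, so skip the event.
    if not (detail.get("clusterArn") and detail.get("taskArn")):
        return "skip"
    if desired == "RUNNING":
        return "map"         # query /admin/license, store task_arn -> deployment
    if desired == "STOPPED":
        return "deactivate"  # read stored deployment, call the deactivation URL
    return "skip"


# Illustrative event with placeholder ARNs.
sample = {"detail": {"clusterArn": "arn:cluster", "taskArn": "arn:task",
                     "desiredStatus": "STOPPED"}}
print(route_event(sample))
```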
2 - Prerequisites
- ECS Fargate service for MD Core (container name recommended: mdcore).
- VPC with private subnets and either a NAT gateway or VPC endpoints for Secrets Manager & DynamoDB.
- Basic AWS permissions to create Lambda, EventBridge rules, DynamoDB table, and Secrets.
3 - Store secrets in AWS Secrets Manager
Create two secrets (names are examples; choose your own):
    # 1) MD Core activation key (for deactivation calls)
    aws secretsmanager create-secret \
      --name mdcore/license-key \
      --secret-string '<YOUR_ACTIVATION_KEY>'

    # 2) MD Core Admin API key (for /admin/license)
    aws secretsmanager create-secret \
      --name mdcore/admin-apikey \
      --secret-string '<STRONG_RANDOM_APIKEY>'

4 - Update the Task Definition (container mdcore)
Add environment secrets:
- MDCORE_LICENSE_KEY → value from Secrets Manager mdcore/license-key
- MDCORE_APIKEY → value from Secrets Manager mdcore/admin-apikey
Tip: You do not need to expose port 8008 externally. Lambda will call the task’s private IP on port 8008.
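For reference, the two environment secrets map to the secrets list of the container definition, where each entry pairs an environment variable name with a valueFrom ARN. The sketch below only builds and prints that fragment; the helper name container_secrets is hypothetical, and the ARNs are placeholders (real secret ARNs carry a random suffix returned by create-secret).

```python
import json


def container_secrets(license_secret_arn, apikey_secret_arn):
    """Build the `secrets` list for the mdcore container definition."""
    return [
        {"name": "MDCORE_LICENSE_KEY", "valueFrom": license_secret_arn},
        {"name": "MDCORE_APIKEY", "valueFrom": apikey_secret_arn},
    ]


# Placeholder ARNs; replace with the ARNs of your own secrets.
fragment = {
    "name": "mdcore",
    "secrets": container_secrets(
        "arn:aws:secretsmanager:<region>:<acct>:secret:mdcore/license-key",
        "arn:aws:secretsmanager:<region>:<acct>:secret:mdcore/admin-apikey",
    ),
}
print(json.dumps(fragment, indent=2))
```

The task execution role typically also needs secretsmanager:GetSecretValue on these secrets so that ECS can inject them at task start.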
5 - Create the DynamoDB table (idempotency & mapping)
    aws dynamodb create-table \
      --table-name mdcore-deployments \
      --attribute-definitions AttributeName=task_arn,AttributeType=S \
      --key-schema AttributeName=task_arn,KeyType=HASH \
      --billing-mode PAY_PER_REQUEST

Then enable TTL on attribute ttl (Console → DynamoDB → Table → TTL).
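If you prefer to enable TTL from a script instead of the console, the sketch below shows the arguments for the DynamoDB UpdateTimeToLive call and how the ttl value is computed. The helper names ttl_spec and expiry_epoch are hypothetical; the 14-day window mirrors the value used by the Lambda code in this article.

```python
import time

TTL_DAYS = 14  # same retention window as the Lambda code in section 6


def ttl_spec(table_name="mdcore-deployments", attribute="ttl"):
    """Arguments for DynamoDB UpdateTimeToLive on the mapping table."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": attribute},
    }


def expiry_epoch(now=None, days=TTL_DAYS):
    """Epoch seconds after which an item written at `now` may be deleted."""
    now = int(time.time()) if now is None else int(now)
    return now + days * 24 * 3600


# With boto3 installed and credentials configured, this would be applied as:
#   boto3.client("dynamodb").update_time_to_live(**ttl_spec())
print(ttl_spec())
```

DynamoDB expects the TTL attribute to be a Number holding epoch seconds, which is what expiry_epoch returns.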

6 - Create the Lambda function

A. Basics
- Author from scratch → Name: mdcore-license-handler
- Runtime: Python 3.11, Architecture: x86_64
- Create.
B. VPC
Configuration → VPC → Edit
- Select the same VPC and private subnets as ECS.
- Security group for Lambda: allow egress to the ECS task SG.
ECS task SG: allow inbound TCP 8008 from the Lambda SG.
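The inbound rule on the ECS task security group can also be created programmatically. A minimal sketch of the arguments for the EC2 AuthorizeSecurityGroupIngress call, assuming hypothetical security group IDs; the helper name admin_port_ingress is illustrative only.

```python
def admin_port_ingress(task_sg_id, lambda_sg_id, port=8008):
    """Arguments for EC2 AuthorizeSecurityGroupIngress on the ECS task SG."""
    return {
        "GroupId": task_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Allow only traffic that originates from the Lambda security group.
            "UserIdGroupPairs": [{"GroupId": lambda_sg_id}],
        }],
    }


# "sg-task" and "sg-lambda" are placeholders. With boto3 this would be sent as:
#   boto3.client("ec2").authorize_security_group_ingress(**rule)
rule = admin_port_ingress("sg-task", "sg-lambda")
print(rule)
```

Referencing the Lambda security group instead of a CIDR range keeps the rule valid when subnets or IP addresses change.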
C. Environment variables
    DDB_TABLE = mdcore-deployments
    MDCORE_CONTAINER_NAME = mdcore
    MDCORE_APIKEY_NAME = MDCORE_APIKEY
    MDCORE_ACTIVATION_KEY_NAME = MDCORE_LICENSE_KEY
    MDCORE_ADMIN_PORT = 8008
    WAIT_RUNNING_SEC = 120
    WAIT_STOPPED_SEC = 90
    DEACTIVATE_URL = https://activation.dl.opswat.com/deactivation

D. Permissions (IAM)
From Lambda → Configuration → Permissions → click the Role name → Add permissions → Create inline policy → JSON:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ecs:DescribeTasks", "ecs:DescribeTaskDefinition"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["secretsmanager:GetSecretValue"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
          "Resource": "*"
        }
      ]
    }

Later, tighten Resource to specific ARNs (table + secrets). If secrets use a CMK, also grant kms:Decrypt.
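As an illustration of the tightening step, the sketch below generates the same policy with the DynamoDB and Secrets Manager statements scoped to specific ARNs. The function name tightened_policy and all argument values are hypothetical; the ECS and CloudWatch Logs statements are left unchanged here, and you may scope them further.

```python
import json


def tightened_policy(region, account, table_name, secret_arns):
    """Inline policy with table and secret access limited to named ARNs."""
    table_arn = f"arn:aws:dynamodb:{region}:{account}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["ecs:DescribeTasks", "ecs:DescribeTaskDefinition"],
             "Resource": "*"},
            {"Effect": "Allow",
             "Action": ["secretsmanager:GetSecretValue"],
             "Resource": list(secret_arns)},
            {"Effect": "Allow",
             "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
                        "dynamodb:UpdateItem"],
             "Resource": table_arn},
            {"Effect": "Allow",
             "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                        "logs:PutLogEvents"],
             "Resource": "*"},
        ],
    }


# Placeholder region, account ID and secret ARNs.
policy = tightened_policy("us-east-1", "123456789012", "mdcore-deployments",
                          ["arn:aws:secretsmanager:us-east-1:123456789012:secret:mdcore/license-key-AbCdEf",
                           "arn:aws:secretsmanager:us-east-1:123456789012:secret:mdcore/admin-apikey-GhIjKl"])
print(json.dumps(policy, indent=2))
```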
E. Code (paste into lambda_function.py, handler = lambda_function.handler)
    import os, time, json
    import urllib.error, urllib.parse, urllib.request
    import boto3

    ecs = boto3.client('ecs')
    secrets = boto3.client('secretsmanager')
    ddb = boto3.resource('dynamodb')
    table = ddb.Table(os.environ['DDB_TABLE'])

    MDCORE_CONTAINER = os.environ.get('MDCORE_CONTAINER_NAME', 'mdcore')
    APIKEY_ENV = os.environ.get('MDCORE_APIKEY_NAME', 'MDCORE_APIKEY')
    ACTKEY_ENV = os.environ.get('MDCORE_ACTIVATION_KEY_NAME', 'MDCORE_LICENSE_KEY')
    ADMIN_PORT = int(os.environ.get('MDCORE_ADMIN_PORT', '8008'))
    DEACTIVATE_URL = os.environ.get('DEACTIVATE_URL', 'https://activation.dl.opswat.com/deactivation')
    WAIT_RUNNING_SEC = int(os.environ.get('WAIT_RUNNING_SEC', '120'))
    WAIT_STOPPED_SEC = int(os.environ.get('WAIT_STOPPED_SEC', '90'))

    def _get_secret(maybe_arn):
        if not maybe_arn:
            return None
        v = secrets.get_secret_value(SecretId=maybe_arn)
        return v.get('SecretString') or v.get('SecretBinary')

    def _env_from_taskdef(taskdef_arn, name, container_name):
        # Resolve a variable from the container's plain env or its secrets.
        td = ecs.describe_task_definition(taskDefinition=taskdef_arn)['taskDefinition']
        cdefs = td.get('containerDefinitions', [])
        c = next((x for x in cdefs if x.get('name') == container_name), (cdefs[0] if cdefs else {}))
        for e in c.get('environment', []):
            if e.get('name') == name:
                return e.get('value')
        for s in c.get('secrets', []):
            if s.get('name') == name:
                return _get_secret(s.get('valueFrom'))
        return None

    def _describe_task(cluster_arn, task_arn):
        # Returns (lastStatus, task definition ARN, private IPv4 of the task ENI).
        t = ecs.describe_tasks(cluster=cluster_arn, tasks=[task_arn])['tasks'][0]
        last = t.get('lastStatus')
        taskdef_arn = t.get('taskDefinitionArn')
        ip = None
        for att in t.get('attachments', []):
            if att.get('type') == 'ElasticNetworkInterface':
                for d in att.get('details', []):
                    if d.get('name') == 'privateIPv4Address':
                        ip = d.get('value')
                        break
        return last, taskdef_arn, ip

    def _wait_until(cluster_arn, task_arn, want_status, timeout_sec):
        t0 = time.time()
        last, taskdef, ip = None, None, None
        while time.time() - t0 < timeout_sec:
            last, taskdef, ip = _describe_task(cluster_arn, task_arn)
            if last == want_status:
                return last, taskdef, ip
            time.sleep(3)
        return last, taskdef, ip

    def _call_admin_license(ip, apikey, retries=20, sleep=3):
        url = f'http://{ip}:{ADMIN_PORT}/admin/license'
        for _ in range(retries):
            try:
                req = urllib.request.Request(url, method='GET', headers={'apikey': apikey})
                with urllib.request.urlopen(req, timeout=5) as resp:
                    if resp.status == 200:
                        return json.loads(resp.read().decode())
            except Exception:
                time.sleep(sleep)
        raise RuntimeError("Cannot query /admin/license (timeout)")

    def _put_mapping(task_arn, container_name, cluster_arn, taskdef_arn, ip, deployment):
        table.put_item(Item={
            'task_arn': task_arn,
            'container_name': container_name,
            'cluster_arn': cluster_arn,
            'taskdef_arn': taskdef_arn,
            'private_ip': ip or '',
            'deployment': deployment,
            'created_at': int(time.time()),
            'status': 'ACTIVE',
            'ttl': int(time.time()) + 14*24*3600
        })

    def _get_mapping(task_arn):
        r = table.get_item(Key={'task_arn': task_arn})
        return r.get('Item')

    def _mark_deactivated(task_arn):
        table.update_item(
            Key={'task_arn': task_arn},
            UpdateExpression="SET #s=:s, deactivated_at=:t",
            ExpressionAttributeNames={'#s': 'status'},
            ExpressionAttributeValues={':s': 'DEACTIVATED', ':t': int(time.time())}
        )

    def _deactivate(activation_key, deployment):
        # safe='' also percent-encodes '/' so the values cannot alter the query string.
        q = (f"{DEACTIVATE_URL}?key={urllib.parse.quote(activation_key, safe='')}"
             f"&deployment={urllib.parse.quote(deployment, safe='')}")
        req = urllib.request.Request(q, method='GET')
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status not in (200, 204):
                    body = resp.read().decode()
                    raise RuntimeError(f"Deactivate failed: {resp.status} {body}")
        except urllib.error.HTTPError as e:
            # urlopen raises on 4xx/5xx; re-raise with the same prefix so log filters match.
            raise RuntimeError(f"Deactivate failed: {e.code} {e.reason}") from e

    def _handle(event):
        d = event.get('detail', {})
        desired = d.get('desiredStatus')
        cluster_arn = d.get('clusterArn')
        task_arn = d.get('taskArn')
        if not (cluster_arn and task_arn and desired):
            return {"skipped": True, "reason": "missing fields"}

        last, taskdef_arn, ip = _describe_task(cluster_arn, task_arn)

        if desired == 'RUNNING':
            last, taskdef_arn, ip = _wait_until(cluster_arn, task_arn, 'RUNNING', WAIT_RUNNING_SEC)
            apikey = _env_from_taskdef(taskdef_arn, APIKEY_ENV, MDCORE_CONTAINER)
            if not apikey:
                raise RuntimeError(f"Missing API key env {APIKEY_ENV}")
            # The ENI address can appear slightly after the task is reported RUNNING.
            t0 = time.time()
            while not ip and time.time() - t0 < 60:
                time.sleep(3)
                _, _, ip = _describe_task(cluster_arn, task_arn)
            if not ip:
                raise RuntimeError("No private IP; cannot reach admin port")
            lic = _call_admin_license(ip, apikey)
            dep = lic.get('deployment')
            if not dep:
                raise RuntimeError("No 'deployment' in /admin/license")
            _put_mapping(task_arn, MDCORE_CONTAINER, cluster_arn, taskdef_arn, ip, dep)
            return {"status": "mapped", "desired": desired, "last": last, "deployment": dep}

        if desired == 'STOPPED':
            item = _get_mapping(task_arn)
            if not item or not item.get('deployment'):
                return {"status": "no-mapping", "desired": desired, "last": last}
            # Several events carry desiredStatus=STOPPED; skip work already done.
            if item.get('status') == 'DEACTIVATED':
                return {"status": "already-deactivated", "desired": desired, "last": last}
            dep = item['deployment']
            act_key = _env_from_taskdef(taskdef_arn, ACTKEY_ENV, MDCORE_CONTAINER)
            if not act_key:
                raise RuntimeError(f"Missing activation key env {ACTKEY_ENV}")
            last, _, _ = _wait_until(cluster_arn, task_arn, 'STOPPED', WAIT_STOPPED_SEC)
            _deactivate(act_key, dep)
            _mark_deactivated(task_arn)
            return {"status": "deactivated", "desired": desired, "last": last}

        return {"skipped": True, "reason": f"desired={desired}"}

    def handler(event, _ctx):
        result = _handle(event)
        # Lambda does not log return values; print the outcome for CloudWatch Logs.
        print(json.dumps(result))
        return result

7 - Create the EventBridge rule
Console → EventBridge → Rules → Create rule → Rule with an event pattern → AWS services → Service: Elastic Container Service (ECS) → Event type: ECS Task State Change → Switch to JSON editor and paste:
    {
      "source": ["aws.ecs"],
      "detail-type": ["ECS Task State Change"],
      "detail": {
        "clusterArn": ["arn:aws:ecs:<region>:<acct>:cluster/<your-cluster>"],
        "group": ["service:<your-mdcore-service>"],
        "desiredStatus": ["RUNNING", "STOPPED"]
      }
    }

Target: Lambda function → select mdcore-license-handler.
In Additional settings: enable a DLQ (SQS) and keep default retries.
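To sanity-check a pattern before deploying it, you can test it against a sample event locally. The matcher below is a deliberately simplified sketch that supports only the exact-value lists and nested objects used in the pattern above, not the full EventBridge pattern syntax; the service name mdcore-svc and the event values are hypothetical.

```python
def matches(pattern, event):
    """Simplified check: dicts recurse, lists mean 'value is one of these'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {
        "group": ["service:mdcore-svc"],          # hypothetical service name
        "desiredStatus": ["RUNNING", "STOPPED"],
    },
}

EVENT = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {"group": "service:mdcore-svc", "desiredStatus": "STOPPED",
               "lastStatus": "RUNNING"},
}

print(matches(PATTERN, EVENT))
```

Fields present in the event but absent from the pattern (such as lastStatus here) are ignored, which is why filtering on desiredStatus alone delivers several events per task transition.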

8 - Test & validate
- Scale service to 0 and back to 1:
  - On scale-up, CloudWatch Logs should show: status=mapped ... deployment=<id>. Verify DynamoDB row { task_arn, deployment, status=ACTIVE }.
  - On scale-in/stop, Logs should show: status=deactivated. Row updated to status=DEACTIVATED.
- Manual curl (optional)
  - Get deployment from inside the task:
    curl -H "apikey: $MDCORE_APIKEY" http://localhost:8008/admin/license
  - Expected JSON contains "deployment": "<MSCL...>".
- Security Group check
  - If mapping fails, ensure Lambda SG → ECS task SG allows TCP/8008.
9 - Troubleshooting quick list
- EventBridge rule not firing: wrong region/bus; pattern mismatch (check clusterArn/group); use CloudWatch “Metrics for matched events”.
- Mapping fails (RUNNING): Lambda not in VPC / no IP on task yet / SG not allowing 8008 / wrong MDCORE_APIKEY_NAME.
- Deactivation fails (STOPPED): No DDB mapping (task died before RUNNING), wrong MDCORE_LICENSE_KEY secret, outbound internet blocked (need NAT for the activation URL) or corporate proxy.
- Timeouts: Increase WAIT_RUNNING_SEC/WAIT_STOPPED_SEC, and make sure the Lambda function timeout is longer than the combined waits (the default of 3 seconds is too short for this handler); keep EventBridge target retry window ≥ 1h.
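When sizing the Lambda function timeout, it helps to add up the waits in the handler. The sketch below is a rough worst-case estimate based on the default values in this article (status wait, up to 60 seconds waiting for the task IP, and 20 admin API retries with a 5-second HTTP timeout plus a 3-second sleep). The function names are hypothetical, and AWS API latency is not included.

```python
def worst_case_running_seconds(wait_running=120, ip_wait=60, retries=20,
                               http_timeout=5, retry_sleep=3):
    """Approximate upper bound for the RUNNING (mapping) path."""
    return wait_running + ip_wait + retries * (http_timeout + retry_sleep)


def worst_case_stopped_seconds(wait_stopped=90, deactivate_timeout=10):
    """Approximate upper bound for the STOPPED (deactivation) path."""
    return wait_stopped + deactivate_timeout


print(worst_case_running_seconds())  # 340
print(worst_case_stopped_seconds())  # 100
```

With the defaults, a function timeout of around six minutes leaves some headroom; Lambda allows up to 15 minutes.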
10 - Hardening & Ops
- Least privilege: restrict IAM to specific ARNs; add kms:Decrypt if secrets use a CMK.
- Observability: add CloudWatch metric filters for "Deactivate failed" and alarm.
- Idempotency: deactivation endpoint should be safe to call multiple times; DDB state prevents repeats.
- Cost: all serverless; negligible under normal loads.
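One way to let the DynamoDB state guard against repeats is a conditional update that succeeds only while the row is still ACTIVE. This is a sketch of that idea rather than the exact code from section 6: the function name mark_deactivated_once is hypothetical, and table stands for any object exposing a boto3-style Table.update_item method.

```python
import time


def mark_deactivated_once(table, task_arn, now=None):
    """Flip status ACTIVE -> DEACTIVATED; return the kwargs that were sent.

    If the row is not ACTIVE, DynamoDB rejects the write with a
    ConditionalCheckFailedException, which the caller can treat as
    "already handled".
    """
    kwargs = {
        "Key": {"task_arn": task_arn},
        "UpdateExpression": "SET #s = :done, deactivated_at = :t",
        "ConditionExpression": "#s = :active",
        "ExpressionAttributeNames": {"#s": "status"},  # status is a reserved word
        "ExpressionAttributeValues": {
            ":done": "DEACTIVATED",
            ":active": "ACTIVE",
            ":t": int(time.time()) if now is None else int(now),
        },
    }
    table.update_item(**kwargs)
    return kwargs


class RecordingTable:
    """Stand-in for a DynamoDB Table that only records the call (for illustration)."""
    def __init__(self):
        self.calls = []

    def update_item(self, **kwargs):
        self.calls.append(kwargs)


demo = RecordingTable()
mark_deactivated_once(demo, "arn:task/example", now=1700000000)
print(demo.calls[0]["ConditionExpression"])
```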
11 - Alternative "sidecar register" (if opening 8008 to Lambda is hard)
Add a tiny sidecar container that waits for MD Core to be ready, calls /admin/license, and writes the mapping directly to DynamoDB (using the task role). Then the Lambda only needs to handle STOPPED events to deactivate. This avoids Lambda→8008 traffic.
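A minimal sketch of what such a sidecar could run, under several assumptions: the task ARN is read from the ECS task metadata endpoint (ECS_CONTAINER_METADATA_URI_V4), MD Core is reachable on localhost because containers in a Fargate task share a network namespace, and the item layout mirrors the one written by the Lambda. The function names fetch_json and register are hypothetical, and the fetch and put_item arguments are injected so the logic can be exercised without AWS.

```python
import json
import time
import urllib.request


def fetch_json(url, headers=None, timeout=5):
    """GET a URL and decode the JSON body."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


def register(fetch, put_item, apikey, metadata_uri, retries=20, sleep=3):
    """Wait for MD Core, then store {task_arn -> deployment} via put_item."""
    task_arn = fetch(f"{metadata_uri}/task")["TaskARN"]
    for _ in range(retries):
        try:
            lic = fetch("http://localhost:8008/admin/license", {"apikey": apikey})
            deployment = lic["deployment"]
        except (OSError, KeyError, ValueError):
            # MD Core not ready yet (connection refused, bad JSON, no deployment).
            time.sleep(sleep)
            continue
        now = int(time.time())
        item = {"task_arn": task_arn, "deployment": deployment,
                "status": "ACTIVE", "created_at": now,
                "ttl": now + 14 * 24 * 3600}
        put_item(Item=item)
        return item
    raise RuntimeError("MD Core admin API not ready")


# In the sidecar this might be wired up as (requires boto3 and the task role):
#   table = boto3.resource("dynamodb").Table("mdcore-deployments")
#   register(fetch_json, table.put_item, os.environ["MDCORE_APIKEY"],
#            os.environ["ECS_CONTAINER_METADATA_URI_V4"])
```

The task role then needs dynamodb:PutItem on the mapping table, and the Lambda no longer needs network access to port 8008.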
If further assistance is required, please log a support case or chat with our support engineers.
