
ECS - Elastic Container Service

Orchestrate Docker containers at scale with ECS Fargate (serverless) or EC2.

Prerequisite: AWSProvider Configuration

Before creating any AWS resource, you need to configure an AWSProvider that manages credentials and authentication with AWS.

IRSA:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: AWSProvider
metadata:
  name: production-aws
  namespace: default
spec:
  region: us-east-1
  roleARN: arn:aws:iam::123456789012:role/infra-operator-role
  defaultTags:
    managed-by: infra-operator
    environment: production

Static Credentials:

apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: default
type: Opaque
stringData:
  access-key-id: test
  secret-access-key: test
---
apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: AWSProvider
metadata:
  name: localstack
  namespace: default
spec:
  region: us-east-1
  accessKeyIDRef:
    name: aws-credentials
    key: access-key-id
  secretAccessKeyRef:
    name: aws-credentials
    key: secret-access-key
  defaultTags:
    managed-by: infra-operator
    environment: test

Check Status:

kubectl get awsprovider
kubectl describe awsprovider production-aws
Warning

For production, always use IRSA (IAM Roles for Service Accounts) instead of static credentials.

Create IAM Role for IRSA

To use IRSA in production, you need to create an IAM Role with the required permissions:

Trust Policy (trust-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:infra-operator-system:infra-operator-controller-manager",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
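
The account ID, OIDC issuer, namespace, and service account in the trust policy are specific to each EKS cluster. The following is a minimal sketch, assuming bash, of how the policy could be generated rather than edited by hand. The function name `render_trust_policy` and its argument order are hypothetical and not part of the operator; the defaults mirror the service account used in the policy above.

```shell
#!/usr/bin/env bash
# render_trust_policy ACCOUNT_ID ISSUER_URL [NAMESPACE] [SERVICE_ACCOUNT]
# Prints an IRSA trust policy for the given EKS OIDC issuer.
# Hypothetical helper for illustration; not shipped with the operator.
render_trust_policy() {
  local account_id="$1"
  local issuer="${2#https://}"   # the provider ARN uses the issuer without the scheme
  local namespace="${3:-infra-operator-system}"
  local service_account="${4:-infra-operator-controller-manager}"

  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${issuer}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${issuer}:sub": "system:serviceaccount:${namespace}:${service_account}",
          "${issuer}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}
```

The issuer URL can be read with `aws eks describe-cluster --name <eks-cluster-name> --query 'cluster.identity.oidc.issuer' --output text` and passed as the second argument, redirecting the output to trust-policy.json.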

IAM Policy - ECS (ecs-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeleteCluster",
        "ecs:DescribeClusters",
        "ecs:UpdateCluster",
        "ecs:UpdateClusterSettings",
        "ecs:PutClusterCapacityProviders",
        "ecs:TagResource",
        "ecs:UntagResource",
        "ecs:ListTagsForResource"
      ],
      "Resource": "*"
    }
  ]
}

Create Role:

# Create IAM Role
aws iam create-role \
--role-name infra-operator-ecs-role \
--assume-role-policy-document file://trust-policy.json

# Attach policy
aws iam put-role-policy \
--role-name infra-operator-ecs-role \
--policy-name ecs-policy \
--policy-document file://ecs-policy.json

# Annotate Service Account in K8s
kubectl annotate serviceaccount infra-operator-controller-manager \
-n infra-operator-system \
eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/infra-operator-ecs-role

Creating ECS Cluster

Basic Cluster (Fargate):

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: fargate-cluster
  namespace: default
spec:
  providerRef:
    name: production-aws

  clusterName: my-fargate-cluster

  # Enable CloudWatch Container Insights
  settings:
    - name: containerInsights
      value: enabled

  tags:
    Name: fargate-cluster
    Environment: production
    Team: platform

Cluster with Fargate + EC2:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: hybrid-cluster
  namespace: default
spec:
  providerRef:
    name: production-aws

  clusterName: hybrid-cluster

  # Capacity providers (Fargate + EC2)
  capacityProviders:
    - FARGATE
    - FARGATE_SPOT
    - my-ec2-capacity-provider

  # Default strategy (80% Fargate, 20% Fargate Spot)
  defaultCapacityProviderStrategy:
    - capacityProvider: FARGATE
      weight: 80
      base: 10 # Ensures 10 tasks on regular Fargate
    - capacityProvider: FARGATE_SPOT
      weight: 20

  settings:
    - name: containerInsights
      value: enabled

  tags:
    Name: hybrid-cluster
    Environment: production

Cluster with ECS Exec:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: debug-cluster
  namespace: development
spec:
  providerRef:
    name: dev-aws

  clusterName: debug-cluster

  # Configuration for ECS Exec (debugging)
  configuration:
    executeCommandConfiguration:
      logging: OVERRIDE
      kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
      logConfiguration:
        cloudWatchLogGroupName: /ecs/execute-command
        cloudWatchEncryptionEnabled: true

  settings:
    - name: containerInsights
      value: enabled

  tags:
    Name: debug-cluster
    Environment: development

Check Status:

# List clusters
kubectl get ecscluster

# View details
kubectl describe ecscluster fargate-cluster

# View cluster ARN
kubectl get ecscluster fargate-cluster -o jsonpath='{.status.clusterARN}'

# View statistics
kubectl get ecscluster fargate-cluster -o jsonpath='{.status}'
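
In scripts it can be useful to block until the cluster is usable instead of checking by hand. The sketch below polls the `ready` status field until it reports true. The function name `wait_for_cluster_ready` and the default timeout and interval are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# wait_for_cluster_ready NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls .status.ready of an ECSCluster until it is "true" or the timeout expires.
# Hypothetical helper; the defaults (120s, 5s) are arbitrary.
wait_for_cluster_ready() {
  local name="$1" timeout="${2:-120}" interval="${3:-5}"
  local elapsed=0 ready
  while [ "$elapsed" -lt "$timeout" ]; do
    # A failed lookup is treated the same as "not ready yet".
    ready=$(kubectl get ecscluster "$name" -o jsonpath='{.status.ready}' 2>/dev/null || true)
    if [ "$ready" = "true" ]; then
      echo "ecscluster/$name is ready"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s waiting for ecscluster/$name" >&2
  return 1
}
```

For example, `wait_for_cluster_ready fargate-cluster 180` waits up to three minutes and returns a non-zero exit code on timeout, which makes it usable as a gate in a deployment script.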

Specification Fields

| Field | Type | Required | Description |
|---|---|---|---|
| clusterName | string | | ECS cluster name (1-255 characters) |
| capacityProviders | []string | | List of capacity providers (FARGATE, FARGATE_SPOT, or custom) |
| defaultCapacityProviderStrategy | []object | | Default distribution strategy among providers |
| settings | []object | | Cluster settings (containerInsights) |
| configuration | object | | Advanced configuration (execute command) |
| serviceConnectDefaults | object | | Defaults for Service Connect |
| tags | map[string]string | | Custom tags for cluster |
| deletionPolicy | string | | Deletion policy: Delete (default) or Retain |

Capacity Provider Strategy

| Field | Type | Required | Description |
|---|---|---|---|
| capacityProvider | string | | Capacity provider name |
| weight | int32 | | Relative weight (0-1000, default: 0) |
| base | int32 | | Minimum number of tasks on this provider |
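
To make the interaction between `base` and `weight` concrete, the sketch below works through the arithmetic for two providers: the base is filled first, and the remaining tasks are split in proportion to the weights. The function name `split_tasks` is hypothetical, and the integer rounding is a simplification; actual placement by ECS only approximates this ratio.

```shell
#!/usr/bin/env bash
# split_tasks TOTAL BASE WEIGHT_A WEIGHT_B
# Prints how many tasks land on provider A (which holds the base) and on provider B.
# Illustrative arithmetic only; not how the operator or ECS is implemented.
split_tasks() {
  local total="$1" base="$2" weight_a="$3" weight_b="$4"
  local weight_sum=$((weight_a + weight_b))
  if [ "$weight_sum" -eq 0 ]; then
    echo "at least one weight must be greater than 0" >&2
    return 1
  fi
  # The base is satisfied first, capped at the total number of tasks.
  local on_base=$(( total < base ? total : base ))
  local remaining=$(( total - on_base ))
  # The remainder is split in proportion to the weights.
  local share_a=$(( remaining * weight_a / weight_sum ))
  local share_b=$(( remaining - share_a ))
  echo "A=$(( on_base + share_a )) B=${share_b}"
}
```

With the hybrid example (base 10 and weight 80 on FARGATE, weight 20 on FARGATE_SPOT), `split_tasks 60 10 80 20` prints `A=50 B=10`: 10 base tasks plus 40 of the remaining 50 on FARGATE, and 10 on FARGATE_SPOT.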

Execute Command Configuration

| Field | Type | Description |
|---|---|---|
| logging | string | Logging type: NONE, DEFAULT, OVERRIDE |
| kmsKeyId | string | KMS key for log encryption |
| logConfiguration.cloudWatchLogGroupName | string | CloudWatch log group |
| logConfiguration.cloudWatchEncryptionEnabled | bool | Enable CloudWatch encryption |
| logConfiguration.s3BucketName | string | S3 bucket for logs |
| logConfiguration.s3EncryptionEnabled | bool | Enable S3 encryption |

Status Fields

| Field | Type | Description |
|---|---|---|
| ready | bool | If the cluster is active (status = "ACTIVE") |
| clusterARN | string | ARN of created cluster |
| status | string | State: PROVISIONING, ACTIVE, DEPROVISIONING, FAILED, INACTIVE |
| registeredContainerInstancesCount | int32 | Number of registered EC2 instances |
| runningTasksCount | int32 | Number of running tasks |
| pendingTasksCount | int32 | Number of pending tasks |
| activeServicesCount | int32 | Number of active services |
| lastSyncTime | time | Last synchronization with AWS |

Use Cases

Serverless Cluster (Fargate Only)

Example:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: serverless-cluster
  namespace: production
spec:
  providerRef:
    name: production-aws

  clusterName: serverless-prod

  capacityProviders:
    - FARGATE
    - FARGATE_SPOT

  defaultCapacityProviderStrategy:
    - capacityProvider: FARGATE
      weight: 70
      base: 5
    - capacityProvider: FARGATE_SPOT
      weight: 30

  settings:
    - name: containerInsights
      value: enabled

  tags:
    Type: serverless
    Environment: production

Cluster for Development with Debugging

Example:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: dev-cluster
  namespace: development
spec:
  providerRef:
    name: dev-aws

  clusterName: development

  configuration:
    executeCommandConfiguration:
      logging: OVERRIDE
      logConfiguration:
        cloudWatchLogGroupName: /ecs/dev/exec
        cloudWatchEncryptionEnabled: false

  settings:
    - name: containerInsights
      value: enabled

  tags:
    Environment: development
    Debug: enabled

Troubleshooting

Cluster does not become ACTIVE

Check cluster state

Command:

kubectl describe ecscluster my-cluster

Look for:

  • Status: PROVISIONING - Cluster still being created (normally 1-2 minutes)
  • Status: FAILED - Creation failed (check operator logs)

Check IAM permissions

Command:

# View operator logs
kubectl logs -n infra-operator-system -l control-plane=controller-manager

# Look for errors like "AccessDenied"

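Common error: The IAM role has no ECS permissions.
Solution: Grant the IRSA role the required ECS actions, for example with the ecs-policy.json above (ecs:* also works but is broader than needed).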
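
Authorization failures are easy to miss in a busy log. A small filter, sketched below, narrows the operator output to lines that look like IAM denials. The function name `filter_auth_errors` is hypothetical and the pattern list is only a starting point, since the exact wording depends on the AWS API that rejected the call.

```shell
#!/usr/bin/env bash
# filter_auth_errors: reads log lines on stdin and prints only those that
# look like IAM authorization failures. Prints nothing when none match.
# Hypothetical helper; extend the pattern as needed.
filter_auth_errors() {
  grep -iE 'AccessDenied|UnauthorizedOperation|is not authorized to perform' || true
}
```

Usage: `kubectl logs -n infra-operator-system -l control-plane=controller-manager --tail=500 | filter_auth_errors`. The explicit `--tail` matters because kubectl returns only the last 10 lines per pod when a label selector is used.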

Check capacity providers

Command:

# Check if capacity providers exist
aws ecs describe-capacity-providers \
--capacity-providers my-ec2-capacity-provider

Common error: The capacity provider does not exist.
Solution: Create the capacity provider first, or use only FARGATE/FARGATE_SPOT.

Tasks do not start

Check capacity providers

Command:

# View cluster details
kubectl get ecscluster my-cluster -o yaml

If there are no capacityProviders configured:

kubectl patch ecscluster my-cluster --type=merge -p '{"spec":{"capacityProviders":["FARGATE"]}}'

Check subnets and security groups

Fargate tasks need:

  • Subnets with internet connectivity (or NAT Gateway)
  • Security groups allowing necessary traffic

Command:

# Check task subnets
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
--query 'tasks[0].attachments[0].details'

Check account limits

Command:

# Check Fargate vCPUs quota
aws service-quotas get-service-quota \
--service-code fargate \
--quota-code L-3032A538

# View running tasks
kubectl get ecscluster my-cluster -o jsonpath='{.status.runningTasksCount}'

If limit reached:

  • Request quota increase via AWS Console
  • Scale tasks horizontally

Error deleting cluster

Cluster with active services or tasks

Command:

# View tasks and services
kubectl get ecscluster my-cluster -o jsonpath='{.status}'

If there are tasks/services running:

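# Delete all services first (a running service relaunches tasks that are stopped)
aws ecs list-services --cluster my-cluster --query 'serviceArns[]' --output text | \
tr '\t' '\n' | \
xargs -I {} aws ecs delete-service --cluster my-cluster --service {} --force

# Stop any remaining standalone tasks
aws ecs list-tasks --cluster my-cluster --query 'taskArns[]' --output text | \
tr '\t' '\n' | \
xargs -I {} aws ecs stop-task --cluster my-cluster --task {}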

Wait 1-2 minutes and try deleting again.

Registered container instances

Command:

# View instances
kubectl get ecscluster my-cluster -o jsonpath='{.status.registeredContainerInstancesCount}'

If there are EC2 instances:

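# Deregister instances (extract the ARNs as plain text, one per line)
aws ecs list-container-instances --cluster my-cluster --query 'containerInstanceArns[]' --output text | \
tr '\t' '\n' | \
xargs -I {} aws ecs deregister-container-instance --cluster my-cluster --container-instance {} --force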

Deletion Policies

Delete (Default)

When the CR is deleted, the ECS cluster is automatically deleted:

spec:
  deletionPolicy: Delete # Default
Warning

The cluster can only be deleted if it has no tasks, services, or container instances.
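
Before deleting the CR, the status counters can be used as a pre-flight check. The sketch below succeeds only when the running tasks, pending tasks, active services, and registered container instances reported in the status are all zero. The function name `cluster_is_empty` is hypothetical, and the check is only as fresh as the operator's last synchronization with AWS.

```shell
#!/usr/bin/env bash
# cluster_is_empty NAME
# Exit code 0: all status counters are zero; 1: something is still running;
# 2: the status could not be read. Hypothetical helper for illustration.
cluster_is_empty() {
  local name="$1" counts value
  counts=$(kubectl get ecscluster "$name" -o \
    jsonpath='{.status.runningTasksCount} {.status.pendingTasksCount} {.status.activeServicesCount} {.status.registeredContainerInstancesCount}') || return 2
  # Counters that are absent from the status are treated as zero.
  for value in $counts; do
    if [ "$value" -ne 0 ]; then
      return 1
    fi
  done
  return 0
}
```

A typical use is `cluster_is_empty my-cluster && kubectl delete ecscluster my-cluster`, which skips the deletion while workloads are still registered.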

Retain

The ECS cluster is kept even after deleting the CR:

spec:
  deletionPolicy: Retain

Use case: Clusters with complex configurations or important data.

Advanced Examples

Cluster with Service Connect

Example:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: service-mesh-cluster
spec:
  providerRef:
    name: production-aws

  clusterName: service-mesh

  serviceConnectDefaults:
    namespace: my-app-namespace

  settings:
    - name: containerInsights
      value: enabled

  tags:
    ServiceMesh: enabled

Cluster with Complete Logging

Example:

apiVersion: aws-infra-operator.runner.codes/v1alpha1
kind: ECSCluster
metadata:
  name: audit-cluster
spec:
  providerRef:
    name: production-aws

  clusterName: audit-prod

  configuration:
    executeCommandConfiguration:
      logging: OVERRIDE
      kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/audit-key
      logConfiguration:
        cloudWatchLogGroupName: /ecs/audit/exec
        cloudWatchEncryptionEnabled: true
        s3BucketName: audit-logs-bucket
        s3EncryptionEnabled: true
        s3KeyPrefix: ecs-exec/

  settings:
    - name: containerInsights
      value: enabled

  tags:
    Compliance: sox
    Audit: enabled

Next Steps

After creating the ECS cluster:

  1. Create Task Definitions defining containers and resources
  2. Configure Services to run tasks permanently
  3. Setup Auto Scaling to scale automatically
  4. Configure Load Balancers (ALB/NLB) to distribute traffic
  5. Implement Service Discovery with AWS Cloud Map
  6. Configure CI/CD for automated deployment
Info

The operator manages only the cluster. Task definitions, services, and tasks must be managed separately.

Monitoring

CloudWatch Container Insights

With containerInsights: enabled, you have access to:

  • Cluster metrics: CPU, memory, and network usage for the whole cluster
  • Service metrics: CPU and memory per service
  • Task metrics: CPU and memory per task
  • Container metrics: per-container metrics

Command:

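# View cluster CPU utilization in CloudWatch (standard AWS/ECS namespace;
# Container Insights metrics are published under ECS/ContainerInsights)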
aws cloudwatch get-metric-statistics \
--namespace AWS/ECS \
--metric-name CPUUtilization \
--dimensions Name=ClusterName,Value=my-cluster \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T23:59:59Z \
--period 3600 \
--statistics Average

ECS Exec (Debugging)

With execute command enabled, you can execute commands in running containers:

# ECS Exec must first be enabled on the service or task (--enable-execute-command)
# Then, execute commands:
aws ecs execute-command \
--cluster my-cluster \
--task <task-id> \
--container my-container \
--interactive \
--command "/bin/sh"
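
ECS Exec only works on tasks that were launched with it enabled. The sketch below shows one way to turn it on for an existing service and to confirm that a given task has it, using the AWS CLI. The function names `enable_ecs_exec` and `exec_enabled` are hypothetical, as are the cluster, service, and task identifiers passed to them; the task role is also assumed to already allow the SSM messaging actions that ECS Exec relies on.

```shell
#!/usr/bin/env bash
# enable_ecs_exec CLUSTER SERVICE
# Turns on ECS Exec for a service and forces a new deployment so that
# replacement tasks start with it enabled. Hypothetical helper.
enable_ecs_exec() {
  local cluster="$1" service="$2"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --enable-execute-command \
    --force-new-deployment
}

# exec_enabled CLUSTER TASK_ID
# Succeeds when the task reports enableExecuteCommand as true.
exec_enabled() {
  local cluster="$1" task="$2" enabled
  enabled=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task" \
    --query 'tasks[0].enableExecuteCommand' --output text) || return 2
  # Compare case-insensitively so both "True" and "true" are accepted.
  [ "$(printf '%s' "$enabled" | tr '[:upper:]' '[:lower:]')" = "true" ]
}
```

For example, run `enable_ecs_exec my-cluster my-service`, wait for the new tasks to start, and then check one of them with `exec_enabled my-cluster <task-id>` before calling `aws ecs execute-command`.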

Comparison: ECS vs EKS

| Aspect | ECS | EKS (Kubernetes) |
|---|---|---|
| Complexity | Simple | Complex |
| Lock-in | AWS only | Multi-cloud |
| Cost | Free (pay for Fargate/EC2) | $0.10/hour per cluster |
| AWS Integration | Native and deep | Via AWS Load Balancer Controller |
| Community | Smaller | Giant |
| Learning curve | Fast | Slow |

Use ECS if:

  • You run entirely on AWS
  • You want simplicity
  • You have a small or medium team

Use EKS if:

  • You need multi-cloud
  • You already use Kubernetes
  • You want portability
