Effective IAM for AWS

Architect AWS organization for scale

Do you have any dramatic stories of accidents triggered by important activities colliding in a single AWS account? Like:

An application engineer almost deleted a production database.

Load testing an application triggered a production outage by exceeding the AWS API rate limits.

A team new to AWS wants to deploy a prototype and they've asked for IAM admin privileges.

In this chapter you will learn how to solve those problems and architect your AWS organization for scale.

First, architect security domains using AWS accounts to limit the access control problems in any one environment. Then govern the activities that can occur in those accounts with Service Control Policies.

This builds the foundation of AWS access control and moves your organization onto a safe, sustainable path.

Create security domains with AWS accounts

The most fundamental tool to organize and protect AWS cloud resources is the AWS account. AWS accounts are architectural elements that create management, fault, and security domains with well-defined boundaries. Identities and cloud resources always reside within one and only one account. But many organizations do not use accounts properly, which puts the organization and its customers at risk.

The Single Use Case Rule: Operate each major use case in a dedicated AWS account.

Important resources and limits are shared within an account: IAM, resource limits, and API limits. Separate use cases so they do not interfere with each other.

Examples of use cases include:

Developing and testing a customer-facing Ecommerce application so it can be deployed to production
Operating the Ecommerce application so customers can place orders
Continuous integration and delivery of changes to environments
Collecting application and infrastructure telemetry for internal analysis

Use cases are supported by one to several workloads and are generally environment-specific.

The AWS Well-Architected program defines a workload as:

A collection of resources and code that delivers business value, such as a customer-facing application or a backend process.

A workload might consist of a subset of resources in a single AWS account or be a collection of multiple resources spanning multiple AWS accounts. A small business might have only a few workloads while a large enterprise might have thousands.

— AWS Well-Architected

AWS recommends isolating workloads in accounts, but this may not be practical in your organization. Review AWS Security's account reference architecture for their view on how to organize accounts and deploy core Security services.

This section shows how to organize a medium or large organization’s Cloud accounts to deliver changes quickly and operate safely. Tailor this to fit your needs.

Partition accounts by Use Case

Let’s start with use cases shared across the organization, then examine those for running end-user applications.

Figure 3.1 AWS Organization Reference Architecture

Enterprise use cases

There are several use cases every Enterprise must support in their Cloud deployment (green). Provision accounts for each use case:

Management

The Management account is the root of trust for the entire organization and its security is paramount. Use the Management account to manage accounts within the organization, consolidate billing, and (optionally) provision people's access with AWS IAM Identity Center (formerly AWS Single Sign-On). No other workloads should run in the Management account.

Security

The Security account contains the organization’s Cloud API activity logs (CloudTrail), access logs (S3, ELB, VPC, etc.), and resource configuration inventory (Config). Ingest these logs into log search tools in the Shared Services account.

Shared Services

Operate monitoring, logging, DNS, directory, and security tools in a Shared Services account. Collect telemetry from AWS, third parties, your infrastructure, and your applications running in other accounts. People with high privileges in other accounts may use this data and services, but should not be able to modify operational telemetry.

Delivery

The Delivery account operates the powerful CI/CD systems that build applications and manage infrastructure. This account will have privileged access to most of the organization. Operating CI/CD in a dedicated account simplifies securing that function.

You may find more Enterprise accounts useful. Many organizations use dedicated accounts for centralized networking use cases. One common case is to route between cloud and traditional data center networks. Review the AWS Secure Environment Accelerator architecture for how to build highly secure networks across many accounts.

With Enterprise-wide accounts set, let's focus on running your applications.

Runtime use cases

Some organizations start with a single AWS account. Or they may have an account for each business unit. Those organizations often run into problems managing security and costs because use cases are not isolated.

Figure 3.2 Isolate Business Unit use cases with dedicated Runtime accounts

Solve these problems with a set of ‘Runtime’ accounts for each business unit to develop, test, and operate their applications (blue).

Let’s see how partitioning Runtime accounts by business unit and then software delivery phase influences autonomy, security, and cost.

Partition by Business Unit

Most organizations have multiple business units (or departments). Decouple decision making and access management between business units by provisioning a set of Runtime accounts for each business unit. This provides the freedom necessary for business units to do their jobs with minimal coordination.

Then collect each business unit's accounts into an AWS Organizational Unit (OU).

Recognize these choices will guide relationships between people and services within the enterprise going forward. Conway's Law is real and so is the effort to move workloads between accounts.

Autonomy

Business Units use different architectures, team structures, deployment technology, and operational practices. Recognize and accept these differences to help business units coexist and adopt the Cloud in harmony. Avoid battling over standards and shared implementations. Partitioning accounts by business unit creates a clear boundary for leaders to exercise authority within a business unit.

Don't force VPs to lobby their peers to use a technology or make a change within an AWS account. This frustrates everyone. In practice, an Ecommerce BU operates differently than a Data Warehouse. Enable those operational differences with dedicated Runtime accounts so each BU can get their work done.

Security and Safety

IAM users, roles, and policies are scoped to an account. Consequently, an engineer or application in one business unit can use resources without affecting another business unit. This limits the risk of security compromises, too. An attacker with a foothold in one business unit cannot automatically access another. Cross-account access can be enabled, but only explicitly.

Cost Management

Tracking and managing AWS operational costs at the business unit level is much easier when each business unit has its own accounts, whether you use AWS or third-party Cloud cost management tooling.

Partition by Delivery Phase

Most business units deliver applications through multiple phases. Create accounts that match each business unit’s delivery phases so they can satisfy the requirements of each phase safely and efficiently.

Figure 3.3 Isolate environments by delivery phase within each Business Unit

For example, the Ecommerce BU with delivery phases dev, stage, and prod would have accounts: ecommerce-dev, ecommerce-stage, and ecommerce-prod.
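As a minimal sketch of this naming convention, the following Python derives one account name per business unit and delivery phase. The `BUSINESS_UNITS` mapping and the Data Warehouse phases are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

# Hypothetical inputs: each business unit and its delivery phases.
BUSINESS_UNITS = {
    "ecommerce": ["dev", "stage", "prod"],
    "datawarehouse": ["dev", "prod"],  # assumed phases, not from the text
}


def runtime_account_names(business_units: dict[str, list[str]]) -> list[str]:
    """Return one '<business unit>-<phase>' account name per delivery phase."""
    return [
        f"{bu}-{phase}"
        for bu, phases in business_units.items()
        for phase in phases
    ]


if __name__ == "__main__":
    for name in runtime_account_names(BUSINESS_UNITS):
        print(name)
```

A predictable name per phase makes it easier to target automation and policies at, say, every account ending in `-prod`.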

Generally these phases map to what the organization calls 'environments.' Some organizations deploy multiple environments into a single account when their purposes and security requirements are similar. For example, the Ecommerce team might deploy both the development and user acceptance testing environments into the ecommerce-dev account.

Deploy changes to environments using automation running in a delivery account. When the organization uses centralized CI/CD services, the delivery account will probably be shared too. If the business unit runs their own CI/CD services, they should use their own delivery account as well.

Autonomy

Application development teams can deploy changes and get feedback rapidly without fear of breaking downstream environments, particularly production.

Security and Safety

Varying a person’s permissions by delivery phase is straightforward when each phase has its own account. An IAM user or role in the dev account won’t automatically get the same permissions in stage or prod. This simplifies giving the right level of access to data and operations at each phase of delivery. Deleting databases may be ok in dev, but almost never in prod.

Partitioning accounts by delivery phase also demarcates audit boundaries, keeping non-production accounts out of scope.

Cost Management

Partitioning by delivery phase helps you understand how money is spent on each environment and set resource usage limits appropriately. You can configure modest resource limits in dev and high limits in prod. These limits often vary by an order of magnitude, so they are only useful with account-level separation.

Let's pull all of this together in an example AWS Organization.

Full example of an AWS Organization

When you apply these principles to a business operating Ecommerce and Data Warehouse business units in AWS, you'll get an AWS organization that looks like:

Figure 3.4 Example: AWS Organization architecture

Organize AWS accounts into Organizational Units (OUs) within your AWS Organization. OUs help you administer accounts as a single unit. OUs may be nested into a hierarchy up to five levels deep. You can attach Service Control Policies to an OU, and all accounts within the OU will inherit those security policies.

Model your organization's management and operational requirements with OUs. Then collect accounts into the appropriate OU. Each account belongs to a single OU, at any level of the hierarchy.

The example in Figure 3.4 collects accounts for Enterprise use cases into the Enterprise OU, except for the management account. The management account is left alone within the Root OU as a reminder that Service Control Policies do not apply to an AWS Organization's Management account.

The Runtime use cases are organized into a Runtime OU, which contains dedicated OUs for the Ecommerce and Data Warehouse delivery phases. The Runtime OU also contains a shared sandbox account which supports exploratory development work.
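One way to reason about this hierarchy is to model it as nested data and look up the path from the root to an account. The sketch below is illustrative only: the `ORG` structure, the OU names, and the account names are assumptions based on the description above, not an export from AWS Organizations.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical model of the example organization. Names are illustrative.
ORG = {
    "Root": {
        "accounts": ["management"],
        "ous": {
            "Enterprise": {
                "accounts": ["security", "shared-services", "delivery"],
                "ous": {},
            },
            "Runtime": {
                "accounts": ["sandbox"],
                "ous": {
                    "Ecommerce": {
                        "accounts": ["ecommerce-dev", "ecommerce-stage", "ecommerce-prod"],
                        "ous": {},
                    },
                    "DataWarehouse": {
                        "accounts": ["datawarehouse-dev", "datawarehouse-prod"],
                        "ous": {},
                    },
                },
            },
        },
    }
}


def path_to_account(ous: dict, account: str) -> Optional[list[str]]:
    """Return the list of OU names from the root down to the account's parent."""
    for name, node in ous.items():
        if account in node["accounts"]:
            return [name]
        below = path_to_account(node["ous"], account)
        if below is not None:
            return [name] + below
    return None


if __name__ == "__main__":
    print(path_to_account(ORG, "ecommerce-prod"))  # ['Root', 'Runtime', 'Ecommerce']
```

The path matters because policies attached at each level along it apply to the account.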

Now let's enable just the capabilities needed to support these accounts' workloads.

Govern capabilities with Service Control Policies

Govern which AWS service capabilities are available in your organization with Service Control Policies. AWS has more than 150 services and 20 regions. You're not going to use them all. This excess capability creates latent risk of accidental use or abuse.

A Service Control Policy (SCP) is a security policy that limits an entire AWS organizational unit or account's use of an AWS service's API actions. SCPs establish the maximum set of allowed actions that IAM can allow within a given account. SCPs have properties that are very useful for governance and security:

SCPs can be applied at any point in an AWS Organizational hierarchy, providing the only policy inheritance capability in the AWS IAM ecosystem.
SCPs are defined within the Management account, and cannot be overridden in a managed account, even by an IAM administrator.

The SCP inheritance model is sequential and reductive.

SCPs filter which permissions flow through the organizational hierarchy to the accounts below.

IAM evaluates each AWS API request against the SCPs at each level of the hierarchy. This determines whether the request proceeds to the account's IAM policies:

Figure 3.5 Example: AWS Organization with Deny list SCPs

SCPs are not gathered from the path to the account then evaluated as a single set.

Each level of the hierarchy filters permissions: root OU, intermediate OUs, then account.
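The level-by-level filtering can be sketched in a few lines of Python. This is a deliberately simplified model, not how AWS implements evaluation: it only understands `Effect` and `Action` with wildcards, and ignores `NotAction`, `Resource`, and `Condition`. The function names and the sample policies are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def _matches(action: str, patterns: list[str]) -> bool:
    """Case-insensitive wildcard match of an action against patterns."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def level_permits(statements: list[dict], action: str) -> bool:
    """A level passes a request only if something allows it and nothing denies it."""
    denied = any(
        s["Effect"] == "Deny" and _matches(action, s["Action"]) for s in statements
    )
    allowed = any(
        s["Effect"] == "Allow" and _matches(action, s["Action"]) for s in statements
    )
    return allowed and not denied


def scps_permit(levels: list[list[dict]], action: str) -> bool:
    """Every level on the path (root, OUs, account) must pass the request."""
    return all(level_permits(statements, action) for statements in levels)


# Illustrative policies: allow everything, then deny selected actions per level.
ALLOW_ALL = [{"Effect": "Allow", "Action": ["*"]}]
root = ALLOW_ALL
runtime_ou = ALLOW_ALL + [{"Effect": "Deny", "Action": ["organizations:LeaveOrganization"]}]
prod_account = ALLOW_ALL + [{"Effect": "Deny", "Action": ["rds:Delete*"]}]
path = [root, runtime_ou, prod_account]

if __name__ == "__main__":
    print(scps_permit(path, "rds:DescribeDBClusters"))  # True
    print(scps_permit(path, "rds:DeleteDBCluster"))     # False
```

Note that a level with no allowing statement blocks everything below it, which is why each level needs a policy that allows before denies are layered on.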

There are two strategies for implementing SCPs.

The allow list strategy explicitly allows only the desired actions at each level of the hierarchy. Implementing this can be difficult. To allow a new capability in an account, you must allow those actions at each level on the path from the root OU to the account. You can attach up to 5 policies at each level of the hierarchy.

The deny list strategy allows everything, then denies unwanted actions where you choose. This approach is generally more maintainable and flexible. Also, SCPs only support conditions in Deny statements. So some rules like region restrictions are only possible with denies.

To use a deny list strategy, first attach a policy that allows actions at that level. This is usually the FullAWSAccess managed policy. Without a policy that allows, the level will allow no access because no permissions will flow through (implicit deny). Then attach specific policies that explicitly deny unwanted actions at that level.

Let's illustrate the utility of SCPs with a few examples that satisfy common operational and regulatory requirements. We'll start by enforcing the use of approved services.

Limit use to certified or approved services

Many organizations want to limit the AWS services used by their organization. Limiting available services eases compliance with PCI, HIPAA, FedRAMP, ISO27001, and other standards. These standards commonly require use of encryption at rest, access audit, and more. Preventing teams from adopting a non-compliant service dependency can save a lot of trouble down the road. SCPs are a great way to do that.

This service control policy denies every action outside the listed services, so only services that AWS certifies as PCI-compliant remain available:

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "AllowPCICompliantServices",
    "Effect": "Deny",
    "Resource": "*",
    "NotAction": [
      "access-analyzer:*",
      "account:*",
      "acm:*",
      "amplify:*",
      "amplifybackend:*",
      "apigateway:*",
      "application-autoscaling:*",
      "appstream:*",
      "appsync:*",
      "artifact:*",
      "athena:*",
      "autoscaling:*",
      "autoscaling-plans:*",
      "aws-portal:*",
      "backup:*",
      "... snip ~140 services..."
    ]
  }
}

The full policy allows 157 services that AWS certifies as PCI compliant or that are security services out of scope for PCI. The list of AWS services compliant with one scheme or another grows constantly. So you'll need a strategy to get started and keep up. Fortunately, there are open source tools that generate an SCP that only allows access to services supporting a particular compliance scheme (examples: aws-allowlister and terraform_aws_scp).
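If you maintain your own list of approved services, a policy of this shape can be generated rather than hand-edited. The following is a minimal sketch, assuming you supply the service prefixes yourself. The `allowlist_scp` function and its default Sid are hypothetical and are not part of the tools named above.

```python
from __future__ import annotations

import json


def allowlist_scp(service_prefixes: list[str], sid: str = "AllowApprovedServices") -> dict:
    """Build an SCP that denies every action outside the given service prefixes."""
    # Normalize, de-duplicate, and sort so the generated policy is stable.
    not_actions = sorted({f"{p.strip().lower()}:*" for p in service_prefixes})
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Sid": sid,
            "Effect": "Deny",
            "Resource": "*",
            "NotAction": not_actions,
        },
    }


if __name__ == "__main__":
    # Illustrative input only; use your organization's approved list.
    print(json.dumps(allowlist_scp(["acm", "athena", "backup"]), indent=2))
```

Keeping the approved list in version control and regenerating the policy gives you a reviewable history of which services were adopted and when.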

You can stay out of trouble with regulators by only using AWS services certified for your compliance requirements. But you should go further.

Limit AWS services used within your organization to a strategic and sustainable set. The greatest adoption cost for an AWS service is often not found in the monthly AWS bill. Rather it's the effort spent understanding and making things work every day.

You'll need to find a balance between "using the best tool for the job" and "doing the job with the tools we know best".

AWS service sprawl is a real problem in many organizations. What portion of AWS services do you think delivery teams actually need to support the business mission? One-third? One half? Surely not all of them.

You don't need to be heavy-handed, but you do need to be deliberate about adopting new services. At least a few people will need to become skilled in using each of them. Define a process for reviewing and adopting new services. Let teams champion a new service by proposing how to use and support that service.

Next, we'll restrict operations to certain regions by leveraging request context keys.

Limit use to particular regions

Suppose you want to ensure your organization uses only two AWS regions: us-east-1 and us-west-2. You can't say that directly in the AWS console, but you can prevent use of AWS services outside of those regions using a Service Control Policy like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnsupportedRegions",
      "Effect": "Deny",
      "Resource": "*",
      "NotAction": [
        "... list of services used by your organization ..."
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-west-2"
          ]
        }
      }
    }
  ]
}

This policy denies any request that targets a region outside the approved list, except requests to the services listed in NotAction. AWS records the region each request targets in the aws:RequestedRegion context key so you can write logic against it.

Without the ability to use global services like iam, it's practically impossible to use a region. Exempt the global services your organization uses from the region restriction by adding them to the NotAction list. When you need to accept requests from across the world, you may need to enable services like CloudFront in more regions.
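To make the logic of the Deny statement concrete, here is a small Python sketch of how its NotAction list and region condition combine. It models only this one statement and is not a general policy evaluator. The `EXEMPT_ACTIONS` values are assumptions picked for illustration.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

APPROVED_REGIONS = frozenset({"us-east-1", "us-west-2"})
# Assumed NotAction entries, for illustration only.
EXEMPT_ACTIONS = ("iam:*", "cloudfront:*")


def region_scp_denies(
    action: str,
    requested_region: str,
    exempt: tuple[str, ...] = EXEMPT_ACTIONS,
    approved: frozenset[str] = APPROVED_REGIONS,
) -> bool:
    """Return True when the Deny statement applies to the request."""
    # NotAction: the statement applies to every action NOT in the list.
    action_in_scope = not any(
        fnmatchcase(action.lower(), pattern.lower()) for pattern in exempt
    )
    # StringNotEquals: the condition holds when the region matches none of the values.
    outside_approved = requested_region not in approved
    return action_in_scope and outside_approved


if __name__ == "__main__":
    print(region_scp_denies("ec2:RunInstances", "eu-west-1"))  # True
    print(region_scp_denies("ec2:RunInstances", "us-west-2"))  # False
    print(region_scp_denies("iam:CreateRole", "eu-west-1"))    # False
```

The statement only denies when both parts hold: the action is not exempt and the region is not approved.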

If the entire organization operates in the same regions, you can attach the policy to the root OU so that it applies to all accounts. When business units operate in different regions of the world, apply an appropriate SCP to each business unit's OU.

Finally, let's use SCPs to prevent known-dangerous actions.

Prevent Dangerous Actions from Being Executed

You can prevent specific dangerous actions from being executed with SCPs. For example, once you've set up an AWS account there's a good chance you don't want anyone to be able to:

Remove or disassociate the AWS account from the AWS Organization
Stop CloudTrail logs
Stop or delete AWS Config recording

A "protect account" policy that denies those actions enforces these rules. Consider attaching a similar policy to the organizational root so it applies to all accounts.
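A policy along these lines might look like the following sketch, written as a Python dictionary so it can be validated and serialized to JSON. The Sid and the exact set of actions are assumptions. Review them against your own requirements before attaching anything.

```python
import json

# Illustrative policy; the Sid and action selection are assumptions.
PROTECT_ACCOUNT_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectAccount",
            "Effect": "Deny",
            "Action": [
                # Leaving the AWS Organization
                "organizations:LeaveOrganization",
                # Stopping or removing CloudTrail logging
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                # Stopping or deleting AWS Config recording
                "config:StopConfigurationRecorder",
                "config:DeleteConfigurationRecorder",
                "config:DeleteDeliveryChannel",
            ],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(PROTECT_ACCOUNT_SCP, indent=2))
```

Because this is an explicit Deny, it holds even for IAM administrators inside the affected accounts.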

You may want to protect data by denying the ability to delete RDS database clusters or DynamoDB tables. This policy might be attached only to production accounts. Or, if deleting databases is a normal activity only in dev, attach it to all non-development accounts.
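A data protection policy of this kind can be sketched the same way. The function name, the Sid, and the choice to include `rds:DeleteDBInstance` alongside cluster and table deletion are assumptions for illustration.

```python
import json


def deny_database_deletion_scp() -> dict:
    """Build an illustrative SCP that denies deleting RDS databases and DynamoDB tables."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDatabaseDeletion",
                "Effect": "Deny",
                "Action": [
                    "rds:DeleteDBCluster",
                    "rds:DeleteDBInstance",
                    "dynamodb:DeleteTable",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_database_deletion_scp(), indent=2))
```

Attaching it to an OU that holds only production accounts keeps development teams free to tear down their own databases.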

For more SCP ideas, review the service's examples and the Secure Environment Accelerator's reference policies.

You can reduce many of the biggest risks to the organization with Service Control Policies. Think through what your risks are, and what your target delivery process and architecture look like. This enables you to write and attach service control policies to the right accounts in the organization.

Summary

Scale AWS security with a strong account architecture that supports your desired delivery process and team autonomy.

Organize AWS accounts to support your organization's use cases, structure, and delivery processes. Partition major use cases into dedicated AWS accounts to create safe environments for data and workloads. Then control risk by limiting the AWS services and actions in use with Service Control Policies.

Next, create IAM principals for people and applications, then provision only the access they need.
