How to remove AWS Credentials from your CI

Ben Riou
Published in Inside Doctrine
11 min read · Mar 10, 2023

Keep your secrets… secret… thanks to OIDC Federation

The code written by our engineers comes to life thanks to automated pipelines called “Continuous Integration and Continuous Deployment systems” (or CI/CD for short). These automated systems run the various tasks needed to deploy to Production. Various CI/CD providers exist on the market today, and at Doctrine, we use CircleCI. The concept is relatively straightforward: any code modification on GitHub triggers a notification to the CI/CD provider, which then executes a predefined list of tasks.

To carry out their mission, these automated systems need credentials for the systems they target: pushing images to ECR, deploying to EKS, changing a database, and so on.

What if, just back from New Year’s Eve, you received a security notification from your CI/CD provider at 3 AM (Paris time), informing you that all the secrets stored in your CI may have leaked into the wild? Yes, you guessed it: this happened to the Foundation Team right after New Year’s Eve (la Saint-Sylvestre), with the January 2023 CircleCI security incident. Happy new year, team!

Dismay and Reactivity

Inventory the list of secrets deployed on the CI systems

When we first learned of the incident, we grasped its potential severity only once we had listed every credential used by each CI pipeline. After a thorough inventory, we counted nearly 200 credentials stored in CircleCI, targeting 15 different systems or services.

Break the CI pipelines to safeguard the infrastructure and data.

The Foundation Team, responsible for infrastructure and CI systems, decided to locate and revoke all leaked secrets to ensure unauthorized access was impossible. This meant disabling developers’ ability to build or release changes, but that was the cost of reacting quickly. We immediately disabled or deleted any known AWS secrets, GitHub tokens, or SSH keys used by CircleCI.

Make sure we’ve not been compromised.

We had to investigate thoroughly to ensure that no unauthorized individuals had accessed or targeted our sensitive systems and data. Our team therefore reviewed, in parallel, all audit logs and activity across our systems, searching for suspicious or abnormal behavior. After several days of in-depth log analysis, we are confident that none of our secrets were used to gain unauthorized access.

Keep our secrets… secrets.

So now that we’ve broken everything, we need to fix all our CI pipelines to enable our engineers to deploy their new features. A race against time began as the 40 developers stared at our team. “When will the CI be restored? When will you be able to test and deploy our features?”

Trust is broken; let’s not make the same mistake twice.

We identified the most critical secrets needed for our CI pipelines to run and investigated how to avoid providing them to CircleCI. One of the most critical secrets was a super-admin service account (IAM User) used to deploy any service on AWS.

We decided to remove all IAM keys from CircleCI and to rely instead on OIDC federation and short-lived STS credentials.

OIDC Federation — Traveler analogy

If you’ve ever booked a hotel on a third-party booking platform, you already have a good idea of what an OpenID Connect federation is.

Simply by contracting with a third party (Booking.com in our example), you can gain access to some resources (like a hotel room via a keycard) at an establishment you’ve never contacted before (the hotel). This magic is possible because the third-party booking platform and the hotel established a partnership beforehand.

OIDC Federation — Real-life example

The only difference in the real-life example is that no one goes anywhere (we know, that’s sad!). The hotel is Amazon Web Services, and the booking platform is the CI provider. The traveler is the CI Docker runner. The room is replaced by the actual resources the CI can access. Your CI runner presents a token to AWS to prove its identity.

If the token holder has permission to perform actions on AWS, IAM will associate a role with the token holder (the CI). The token holder can then request temporary credentials from STS based on the assigned role.

The OIDC Token

An OIDC token is a JSON Web Token (JWT) provided by CircleCI as an environment variable to the Docker container. A JWT contains three parts:

  • The header contains metadata about the JWT itself.
  • The payload holds the details used for OpenID Connect (OIDC) authentication.
  • The signature, produced by the OIDC token issuer, lets the recipient verify the token.

If you want an exhaustive explanation, see this link. Here’s a simplified example:

https://jwt.io/ has been used to parse the JWT token
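The same inspection can be done locally rather than on a website. The following is a minimal sketch, not the tooling used in the article: the token is a fake, unsigned one built inside the example, all IDs are placeholders, and `decode_jwt_payload` and `make_demo_token` are hypothetical helper names. The signature is deliberately not verified, so this is suitable for reading claims only, never for authentication.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has exactly three dot-separated parts")
    return json.loads(b64url_decode(parts[1]))


def make_demo_token(claims: dict) -> str:
    """Build a fake, unsigned token so the example runs offline."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = {"alg": "RS256", "typ": "JWT"}
    return ".".join([encode(header), encode(claims), "fake-signature"])


# Placeholder claims (assumed shape, not copied from a real CircleCI token).
demo_claims = {
    "iss": "https://oidc.circleci.com/org/ORG_ID",
    "aud": "ORG_ID",
    "sub": "org/ORG_ID/project/PROJECT_ID/user/USER_ID",
    "iat": 1678406400,
    "exp": 1678410000,
}

claims = decode_jwt_payload(make_demo_token(demo_claims))
print(claims["sub"])
```

Inside a CircleCI job, the input would presumably be the `CIRCLE_OIDC_TOKEN` environment variable instead of the demo token.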

The most interesting part is, of course, the payload, which contains standardized values.

  • SUBject (sub): who the OIDC token was granted to. It contains three unique IDs: the CircleCI organization, the project (repository), and the user who ran the CI process.
  • AUDience (aud): the CircleCI organization. This is a unique ID for our company’s interactions with CircleCI.
  • ISSuer (iss): the authority that generated the OIDC token.
  • IAT (Issued At): the date the OIDC token was created.
  • EXPiration (exp): when the OIDC token will expire.

All these fields are readable once the token is decoded (the payload is signed, not encrypted), and they can be referenced in IAM policy conditions. More on that soon.
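In a JWT, the issued-at and expiration claims are expressed as Unix timestamps (seconds since the epoch). As a small illustration, with placeholder timestamps and hypothetical helper names, the lifetime and expiry of a token can be computed like this:

```python
from datetime import datetime, timezone


def token_lifetime_seconds(claims: dict) -> int:
    """Seconds between issuance (iat) and expiry (exp)."""
    return claims["exp"] - claims["iat"]


def is_expired(claims: dict, now: float) -> bool:
    """True once the given Unix time has reached the exp claim."""
    return now >= claims["exp"]


# Placeholder timestamps, not taken from a real token.
claims = {"iat": 1678406400, "exp": 1678410000}

print(token_lifetime_seconds(claims))
print(datetime.fromtimestamp(claims["exp"], tz=timezone.utc).isoformat())
print(is_expired(claims, now=claims["iat"] + 1))
```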

Setting up the OIDC Federation (CircleCI > AWS)

To set up the OIDC federation, we need to declare a new Identity Provider. This can be achieved via IaC, or in the AWS Console under IAM / Access Management / Identity Providers.

The IAM user guide provides very detailed information on the process. Obtaining the certificate thumbprint of the OIDC provider is the only challenging part, but this process is well documented.

The OpenID Connect federation utilizes the “well-known openid configuration” endpoint provided by your OIDC provider. This standardized configuration file is automatically retrieved by AWS IAM when the configuration is added.

If you’re curious, append .well-known/openid-configuration to the OIDC provider URL to retrieve it. The result should look like this:

Example of openid-configuration file
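For readers who prefer to fetch the file themselves, here is a minimal sketch using only the Python standard library. The issuer URL uses a placeholder `ORG_ID`, and `discovery_url` and `fetch_discovery_document` are hypothetical helper names. The fetch function needs network access and is defined but not called here.

```python
import json
import urllib.request


def discovery_url(issuer: str) -> str:
    """Append the standard discovery path to an issuer URL."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def fetch_discovery_document(issuer: str, timeout: float = 10.0) -> dict:
    """Download and parse the discovery document (requires network access)."""
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as response:
        return json.load(response)


# ORG_ID is a placeholder for a CircleCI organization ID.
issuer = "https://oidc.circleci.com/org/ORG_ID"
print(discovery_url(issuer))
```

A discovery document normally exposes fields such as `issuer` and `jwks_uri`, the latter pointing to the public keys used to check token signatures.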

The OIDC Federation can be configured via the AWS Console, Terraform or CloudFormation.

The OIDC-Provider IAM object

Once set up, an OpenID Connect (OIDC) provider resource appears in IAM. Like an IAM user, it has no permissions by default: on its own, the OIDC provider can only verify an OIDC token submitted to AWS.

Example of OIDC provider configured for CircleCI on IAM

Each “audience” (an OIDC token’s AUD field, which matches a CircleCI organization) can have a default set of permissions (via a default role). More interestingly, a token validated by the OIDC provider can be used to call AssumeRoleWithWebIdentity on a role, provided that the role’s trust relationship allows it.

Example of roles assumable by an OIDC trusted entity

Here’s a simple example of such a trust relationship.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "circlecioidcassumerole",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::XXXXXXXXXXX:oidc-provider/oidc.circleci.com/org/7b75b559-4f9d-4ad0-92c5-XXXXXXXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.circleci.com/org/7b75b559-4f9d-4ad0-92c5-XXXXXXXXXXX:sub": "org/7b75b559-4f9d-4ad0-92c5-XXXXXXXXXXX/project/2262bc9f-3590-4d3b-9457-45de4b01051b/*"
        }
      }
    }
  ]
}

Additional conditions to assume a role

Something important here: looking at the Principal, we see that the CircleCI OIDC provider is declared as the trusted entity.

We have a single CircleCI OIDC provider for our entire organization, covering all CI workflows.

By default, any CI pipeline with a valid OIDC token issued by CircleCI should be able to assume any role where the trusted entity is the OIDC provider. However, this also means we cannot control which roles each CI can assume.

That’s problematic, as we want to limit each CI to a single dedicated role.

This is where the condition block becomes handy. We can inspect the OIDC token details and filter based on the conditions. Luckily, the OIDC token contains a “Subject” field, and each pipeline has its own project ID in that field.

This is how you can specify which CI/CD pipeline is allowed to assume a particular role.
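To make the filtering concrete, here is a rough local approximation of how a StringLike condition on the sub claim behaves. This is only a sketch: the real evaluation happens inside AWS, `string_like` and `may_assume` are hypothetical helper names, and the IDs are placeholders. It assumes the documented wildcard semantics, where `*` matches any run of characters and `?` matches a single character.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate StringLike: '*' matches any run of characters, '?' matches
    exactly one character, everything else is literal and case-sensitive."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def may_assume(role_sub_pattern: str, token_claims: dict) -> bool:
    """True when the token's sub claim satisfies the role's condition."""
    return string_like(role_sub_pattern, token_claims.get("sub", ""))


# Placeholder IDs, not real CircleCI identifiers.
pattern = "org/ORG_ID/project/PROJECT_A/*"
token_a = {"sub": "org/ORG_ID/project/PROJECT_A/user/USER_1"}
token_b = {"sub": "org/ORG_ID/project/PROJECT_B/user/USER_1"}

print(may_assume(pattern, token_a))  # True: same project
print(may_assume(pattern, token_b))  # False: different project
```

With one such pattern per role, a token issued for project A cannot be used to assume the role reserved for project B, even though both tokens come from the same OIDC provider.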

On the CI pipeline (the client) side

We’ve created a private orb on CircleCI that manages several steps automatically. For the developers, a single line is required: doctrine-common-aws/setup-oidc

jobs:
  terraform-plan-apply:
    executor: python
    parameters:
      stack:
        type: string
    steps:
      - checkout:
          path: ~/project/
      - doctrine-common-aws/setup-oidc  # <<<<< here

Invoking this private orb runs five steps.

Let’s dive into each step to understand each detail.

As a friendly reminder, the AWS CLI can be configured in multiple ways. The precedence order matters, because some configuration sources override others.

The first step is just a user-friendly information notice about the role that we want to assume.

The role and session name are automatically defined using predefined CircleCI environment variables.

******** Checking requirements... ********
About to assume the following role circleci-oidc-doctrine-infra on account production
The session name is : doctrine-infra
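The article does not say which environment variables the orb reads, so the following is purely an illustrative guess. It assumes the project name comes from CircleCI's built-in `CIRCLE_PROJECT_REPONAME` variable and that roles follow the `circleci-oidc-<project>` naming seen in the log above. The account ID is a placeholder, and both helper names are hypothetical.

```python
import re


def role_arn_for_project(account_id: str, project: str,
                         prefix: str = "circleci-oidc-") -> str:
    """Build the ARN of the role dedicated to one CI project."""
    return f"arn:aws:iam::{account_id}:role/{prefix}{project}"


def session_name_for_project(project: str) -> str:
    """Replace characters STS rejects in a session name and cap it at 64."""
    return re.sub(r"[^\w+=,.@-]", "-", project, flags=re.ASCII)[:64]


# Simulated environment; on CircleCI this would be read from os.environ.
env = {"CIRCLE_PROJECT_REPONAME": "doctrine-infra"}
project = env["CIRCLE_PROJECT_REPONAME"]

print(role_arn_for_project("123456789012", project))
print(session_name_for_project(project))
```

Deriving the role name from the project name is what allows a single orb invocation to work unchanged in every repository.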

In the second step, the AWS CLI is retrieved and installed.

In the third step, the role is assumed via the following command. The STS (Security Token Service) is invoked, and we retrieve three values in a text format:

  • AccessKeyId
  • SecretAccessKey
  • SessionToken

aws sts assume-role-with-web-identity \
  --role-arn "${PARAM_AWS_CLI_ROLE_ARN}" \
  --role-session-name "${PARAM_ROLE_SESSION_NAME}" \
  --web-identity-token "${CIRCLE_OIDC_TOKEN}" \
  --duration-seconds "${PARAM_SESSION_DURATION}" \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text

The PARAM values are automatically defined based on the AWS CLI public orb example. Each CI project will assume a role with its own name, allowing a single role per CI project.
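With a list query and text output, the AWS CLI prints the three values on a single line separated by tabs. A minimal sketch of turning that line into the standard AWS environment variable names could look like this. The sample values are fake and `parse_sts_text_output` is a hypothetical helper, not part of the orb.

```python
def parse_sts_text_output(output: str) -> dict:
    """Map the three whitespace-separated values to AWS variable names."""
    fields = output.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    access_key_id, secret_access_key, session_token = fields
    return {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        "AWS_SESSION_TOKEN": session_token,
    }


# Fake values standing in for the real command output.
sample_output = "ASIAEXAMPLEKEYID\texampleSecretKey\texampleSessionToken\n"
credentials = parse_sts_text_output(sample_output)
print(sorted(credentials))
```

The session token is as important as the two keys: temporary credentials are rejected if it is missing.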

In the fourth step, we write these credentials to the AWS CLI shared credentials file (~/.aws/credentials) with aws configure set. This is the correct method to preserve these credentials across multiple CI steps.

aws configure set aws_access_key_id \
  "$PARAM_AWS_CLI_ACCESS_KEY_ID" \
  --profile "$PARAM_AWS_CLI_PROFILE_NAME"
aws configure set aws_secret_access_key \
  "$PARAM_AWS_CLI_SECRET_ACCESS_KEY" \
  --profile "$PARAM_AWS_CLI_PROFILE_NAME"
# Temporary credentials also need their session token.
if [ -n "${AWS_SESSION_TOKEN}" ]; then
  aws configure set aws_session_token \
    "${AWS_SESSION_TOKEN}" \
    --profile "$PARAM_AWS_CLI_PROFILE_NAME"
fi

In the final step, we perform a simple invocation to ensure that the role has been correctly assumed and the credentials are valid.

AWS_PAGER= aws sts get-caller-identity
{
"UserId": "AROA6DJELIKAIYEGYPYNO:doctrine-infra",
"Account": "************",
"Arn": "arn:aws:sts::************:assumed-role/circleci-oidc-doctrine-infra/doctrine-infra"
}
CircleCI received exit code 0
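The Arn field returned by this call can also be checked programmatically, for example to fail a job early if the wrong role was assumed. The sketch below assumes the usual assumed-role ARN layout (`arn:aws:sts::<account>:assumed-role/<role>/<session>`). The account ID is a placeholder and `parse_assumed_role_arn` is a hypothetical helper.

```python
def parse_assumed_role_arn(arn: str) -> dict:
    """Split an STS assumed-role ARN into account, role and session name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "sts":
        raise ValueError("not an STS ARN")
    resource_type, role_name, session_name = parts[5].split("/", 2)
    if resource_type != "assumed-role":
        raise ValueError("not an assumed-role ARN")
    return {"account": parts[4], "role": role_name, "session": session_name}


identity = parse_assumed_role_arn(
    "arn:aws:sts::123456789012:assumed-role/circleci-oidc-doctrine-infra/doctrine-infra"
)
print(identity["role"], identity["session"])
```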

Who does what?

One issue we faced was that a single “super-admin” role was shared by all our CI projects. Now that each CI project has a dedicated, limited role, it’s simpler to define restricted IAM permissions per CI.

However, we did not have the time to write a separate IAM statement for each CI, so we tried to automate the creation of IAM statements based on the needs of each CI.

The idea is simple:

  1. deploy the role with unrestricted access (the IAM statement allows anything);
  2. let the CI project use its dedicated role for a few days;
  3. analyze the CloudTrail events logged on behalf of the assumed role;
  4. deduce a tightly scoped IAM policy from them.

This assumes that you have already enabled a CloudTrail trail that logs management events, as well as data events for S3, Lambda and DynamoDB.
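The core of this idea can be sketched in a few lines. The example below is deliberately naive and is not how the tools discussed next work internally. It assumes each CloudTrail record carries `eventSource` and `eventName`, and that `<service>:<eventName>` is a valid IAM action, which is not true for every API. The sample events are invented and `policy_from_events` is a hypothetical helper.

```python
import json


def policy_from_events(events: list) -> dict:
    """Collect one IAM action per distinct (eventSource, eventName) pair."""
    actions = set()
    for event in events:
        if event.get("errorCode"):  # ignore calls that failed
            continue
        # "s3.amazonaws.com" -> "s3" (naive mapping, an assumption)
        service = event["eventSource"].split(".")[0]
        actions.add(f"{service}:{event['eventName']}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resources are NOT narrowed here; that still needs manual work.
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }


# Invented events, shaped like simplified CloudTrail records.
sample_events = [
    {"eventSource": "ecr.amazonaws.com", "eventName": "PutImage"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "DeleteBucket",
     "errorCode": "AccessDenied"},
]

policy = policy_from_events(sample_events)
print(json.dumps(policy, indent=2))
```

Even in this toy form, the limits are visible: actions the CI did not happen to call during the observation window are missing, and resources are left wide open.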

IAM Policy Generator

In the AWS Console (IAM), the IAM Policy Generator can produce a policy that covers the activity of a user or role over a period of up to 90 days.

Wizard for IAM Policy Generator

Connect to the AWS Console, select IAM, then pick a role or user. At the bottom of the page, the “Generate Policy” button takes you to a dedicated wizard.

The main shortcoming of the IAM Policy Generator is that the policies it generates are far from comprehensive (it does not generate policy statements for all available services).

We found two alternatives to attempt to achieve the same result: TrailScraper and CloudTracker.

TrailScraper

TrailScraper works the same way as the IAM Policy Generator, but more exhaustively.

Python-based, TrailScraper can easily be deployed on a local computer without additional setup. However, it first requires downloading the CloudTrail logs from S3.

The generated IAM policies are very detailed (at object-level for S3).

CloudTracker

CloudTracker compares an existing IAM role or user with a list of CloudTrail events. It detects over-privilege and offers remediation.

It relies on Amazon Athena, and the project hasn’t been updated for three years, so we are not sure it is still maintained.

In the end, we were unable to automatically generate IAM statements based on role activities. This demanding job requires manual testing and iteration to tailor a correct set of permissions.

Wrap Up

What a journey!

We have been able to remove all long-term AWS credentials from CircleCI, and we’re now generating custom short-term credentials. Once set, the operations are very reliable, and we can sleep better.

This was only a part of the secret rotation effort required by this leak.

Here are some takeaways about this OIDC migration:

  • This process is straightforward to set up and well-documented.
  • Having dedicated roles for continuous integration allows for better auditing and monitoring.
  • The AWS operations behave the same way with short-lived keys as with permanent programmatic keys.

Switching to OIDC is safer:

  • There are no more access keys to rotate in AWS.
  • The programmatic access keys generated on the fly are temporary and expire after a set period.
  • We have lowered the risk of our access keys being leaked. This risk was very high, especially since programmatic access keys used for CI do not use multi-factor authentication.

However, if you haven’t adopted a least-privilege access policy from the start, it’s very difficult, or simply impossible, to automatically generate the appropriate, limited policy.

This is likely the key takeaway here. Turn a security vulnerability into an opportunity, and always consider security and isolation at the start of projects and infrastructure.

What’s next?

Thanks to the Vault SSH Engine, we have also removed all SSH keys from CircleCI. We are also progressively removing any secret key value from CircleCI. With about 200 secrets to migrate, this process takes time.

We’ll chat more about this in our next update!
