jasonbutz.info

IAM Policy Conditions & SQS Queue Access

AWS, IAM, SQS, TIL

The other week, I was helping a client work through an interesting challenge. The entire problem stemmed from decisions the company made when designing how they would build and connect their AWS accounts. They had decided there would be two kinds of AWS accounts: ones with access to their internal network and ones without. The accounts with access to the internal network cannot have any ingress from the public internet; all ingress must come through the corporate network. From a security perspective, I see why they made that choice. They decided to trust AWS and allow ingress to their network from these accounts, but to disallow ingress from the internet to reduce the chances of an adversary gaining access to their network.

For the team I am working with, this presents a problem. They need to integrate with an outside SaaS service and receive its webhook requests, but the Kubernetes cluster they use is part of a platform running in one of the accounts without internet ingress, so they have no way to receive the webhook events. I proposed building the integration in one of the accounts with public internet ingress and using SQS to deliver the webhook events to the cluster. Because the cluster runs on EKS, which supports IAM roles for pods, the pods can be granted access to the queues in the other account.

AWS architecture diagram showing two AWS accounts. In one account is an API Gateway with an arrow to a Lambda function with an arrow to an SNS Topic with arrows to two SQS Queues. The other account shows an EKS Cluster with two pods, each with an arrow to one of the SQS Queues in the other account.

I created an API Gateway with a single endpoint to receive the webhook requests from the SaaS application. I built a Lambda function to verify each webhook request, perform some light validation of its contents, and then publish the event to an SNS topic, and I connected that Lambda to the API Gateway with a Lambda integration. I didn’t use a Lambda Authorizer with the API Gateway because verifying the webhook required access to the HTTP request’s body, which isn’t available to Lambda Authorizers. Subscription filter policies on the SNS topic ensure only the right messages reach each SQS queue. Overall, it’s a simple integration, but it’s very effective.
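The Lambda function’s job can be sketched in Python. This is a minimal sketch, not the code from this project: it assumes the SaaS vendor signs the raw request body with HMAC-SHA256 and sends the hex digest in a hypothetical x-webhook-signature header, and that payloads carry a hypothetical type field. Checking a signature like this needs the exact bytes of the body, which is why the step can’t live in a Lambda Authorizer. The SNS publish call is passed in as a parameter so the logic can be exercised without AWS, and the event type is copied into a message attribute that subscription filter policies can match on.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Callable, Dict

# Hypothetical names: the real header, payload shape, and topic ARN depend on the vendor and deployment.
SIGNATURE_HEADER = "x-webhook-signature"
TOPIC_ARN = "arn:aws:sns:us-east-2:000000000000:MyTopic"


def signature_is_valid(raw_body: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def handle_webhook(
    event: Dict[str, Any],
    secret: bytes,
    publish: Callable[..., Any],
) -> Dict[str, Any]:
    """Handle an API Gateway Lambda proxy event: verify, lightly validate, publish to SNS."""
    body = event.get("body") or ""
    # API Gateway may base64-encode the body; the signature covers the original bytes.
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")

    # Header casing varies, so normalize before looking up the signature.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if not signature_is_valid(raw, headers.get(SIGNATURE_HEADER, ""), secret):
        return {"statusCode": 401, "body": "invalid signature"}

    # Light validation: the body must be a UTF-8 JSON object with an event type.
    try:
        text = raw.decode("utf-8")
        payload = json.loads(text)
    except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
        return {"statusCode": 400, "body": "body is not JSON"}
    if not isinstance(payload, dict) or "type" not in payload:
        return {"statusCode": 400, "body": "missing event type"}

    # The message attribute lets SNS subscription filter policies route by event type.
    publish(
        TopicArn=TOPIC_ARN,
        Message=text,
        MessageAttributes={
            "eventType": {"DataType": "String", "StringValue": str(payload["type"])}
        },
    )
    return {"statusCode": 202, "body": "accepted"}
```

In a deployed function, a thin lambda_handler(event, context) would load the shared secret (for example, from Secrets Manager) and pass boto3.client("sns").publish as the publish argument.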

Identity & Access Management

I have multiple AWS certifications and a good working knowledge of IAM policies, but IAM sometimes feels like magic. The Policy Evaluation Logic page in the AWS IAM documentation is a must-have reference when doing more complex work. One of my weak points with IAM policies is conditions. I haven’t used them enough to be entirely confident, so I fall back on trial and error. In this case, writing an SQS resource policy that grants cross-account access meant I spent a lot of time in the IAM documentation.

The easy part of the queues’ resource policy was allowing the SNS topic to send messages. I found an example in AWS’s documentation and barely had to think about the conditions. I knew it should “just work.” The policy below lets the SNS service send messages to the specified SQS queue when the source is the listed SNS topic.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-2:000000000000:MyQueue",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-2:000000000000:MyTopic"
        }
      }
    }
  ]
}
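If the queue policy is managed from code rather than the console, a small helper keeps the topic-scoped statement in one place. This is a sketch under assumptions: the function names are hypothetical, the sqs argument is expected to be a boto3 SQS client (boto3.client("sqs")), and the helper adds the standard "Version": "2012-10-17" element that the policy above omits. SetQueueAttributes replaces the queue’s whole Policy attribute, so every statement the queue needs has to go into the same document.

```python
import json
from typing import Any, Dict, List


def sns_send_statement(queue_arn: str, topic_arn: str) -> Dict[str, Any]:
    """Allow the SNS service to send to the queue, but only on behalf of one topic."""
    return {
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        # When SNS delivers to SQS, aws:SourceArn carries the ARN of the topic it is delivering for.
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }


def build_policy(statements: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap all statements in a single policy document."""
    return {"Version": "2012-10-17", "Statement": statements}


def put_queue_policy(sqs: Any, queue_url: str, policy: Dict[str, Any]) -> None:
    """Overwrite the queue's resource policy; SetQueueAttributes replaces the whole Policy attribute."""
    sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})
```

Usage would look like put_queue_policy(boto3.client("sqs", region_name="us-east-2"), queue_url, build_policy([sns_send_statement(queue_arn, topic_arn)])).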

To allow the pods in the other account to receive and delete messages, I figured I could do something similar, so I copied and modified the policy statement. Below is an example of what I came up with. For those of you who know your IAM policies, you might already see where I messed up.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-2:000000000000:MyQueue",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-2:000000000000:MyTopic"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
      "Resource": "arn:aws:sqs:us-east-2:000000000000:MyQueue",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": [
            "arn:aws:iam::999999999999:role/MySemiPredictableRoleName*"
          ]
        }
      }
    }
  ]
}
Note: This policy does not work as expected; do not use it.

This policy didn’t work out; the application running on the pods was logging IAM errors saying it lacked access because of the queue’s resource policy. I began working through the IAM policy evaluation logic, focusing on resource policies. I tried using the ArnEquals condition operator. I tried using the IAM role session ARN instead of the IAM role ARN. I tried moving the wildcard (*). I couldn’t get it to work. So, after about 30 minutes of frustration and repeated policy redeployments, I started digging into the IAM condition keys.

Looking into the AWS docs for aws:SourceArn, I was able to find my answer. The aws:SourceArn condition key is meant for service-to-service requests where the principal is an AWS service. Instead, I needed to use aws:PrincipalArn. It was all spelled out in the documentation; I’ve included an excerpt below.

Use this key to compare the Amazon Resource Name (ARN) of the resource making a service-to-service request with the ARN that you specify in the policy, but only when the request is made by an AWS service principal. When the source’s ARN includes the account ID, it is not necessary to use aws:SourceAccount with aws:SourceArn.

This key does not work with the ARN of the principal making the request. Instead, use aws:PrincipalArn.

As it turned out, aws:SourceArn is a condition key used to guard against the confused deputy problem in service-to-service requests. The documentation says to use the condition key only in resource-based policies where the Principal is an AWS service principal. The confused deputy problem is a security issue where an entity without access to a resource coerces a more privileged entity into accessing the resource on its behalf. The IAM documentation has a great page discussing the confused deputy problem.
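To make the failure concrete, here is a toy model of how these two conditions get checked. It is a deliberately simplified sketch, not AWS’s evaluator: the request contexts are hypothetical and contain only the keys that matter here, and arn_like approximates the documented ArnLike behavior, where each of the six colon-delimited ARN parts is compared separately and may use * and ? wildcards. The IAM condition operator reference describes ArnEquals and ArnLike the same way, which would explain why switching operators made no difference. Two rules account for what I saw: a request the pod makes with its role carries aws:PrincipalArn (the role’s ARN, not the assumed-role session ARN) but no aws:SourceArn, and a condition on a key missing from the request context does not match. The same model also shows the confused-deputy guard working: SNS delivering for a different topic carries a different aws:SourceArn, so the SNS statement does not match it.

```python
import re
from typing import Dict, List


def arn_like(pattern: str, value: str) -> bool:
    """Simplified ArnLike: match the six colon-delimited ARN parts one by one, with * and ? wildcards."""
    pattern_parts = pattern.split(":", 5)
    value_parts = value.split(":", 5)
    if len(pattern_parts) != 6 or len(value_parts) != 6:
        return False
    for p, v in zip(pattern_parts, value_parts):
        # Translate the wildcards into a regular expression; every other character is literal.
        regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in p)
        if re.fullmatch(regex, v) is None:
            return False
    return True


def condition_matches(key: str, patterns: List[str], context: Dict[str, str]) -> bool:
    """A condition on a key absent from the request context does not match (no ...IfExists here)."""
    value = context.get(key)
    if value is None:
        return False
    return any(arn_like(p, value) for p in patterns)


# Hypothetical, heavily simplified request contexts.
pod_request = {
    # For a role, aws:PrincipalArn is the role ARN, not the assumed-role session ARN.
    "aws:PrincipalArn": "arn:aws:iam::999999999999:role/MySemiPredictableRoleName-abc123",
}
sns_request = {"aws:SourceArn": "arn:aws:sns:us-east-2:000000000000:MyTopic"}
other_topic_request = {"aws:SourceArn": "arn:aws:sns:us-east-2:111111111111:OtherTopic"}

role_patterns = ["arn:aws:iam::999999999999:role/MySemiPredictableRoleName*"]
topic_patterns = ["arn:aws:sns:us-east-2:000000000000:MyTopic"]

print(condition_matches("aws:SourceArn", role_patterns, pod_request))           # False: key is absent
print(condition_matches("aws:PrincipalArn", role_patterns, pod_request))        # True
print(condition_matches("aws:SourceArn", topic_patterns, sns_request))          # True
print(condition_matches("aws:SourceArn", topic_patterns, other_topic_request))  # False
```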

Once I switched to the correct condition key, the application could access the SQS queues. Below is what my resource policy looked like.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-2:000000000000:MyQueue",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-2:000000000000:MyTopic"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
      "Resource": "arn:aws:sqs:us-east-2:000000000000:MyQueue",
      "Condition": {
        "ArnLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::999999999999:role/MySemiPredictableRoleName*"
          ]
        }
      }
    }
  ]
}
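On the consuming side, the pods only need sqs:ReceiveMessage and sqs:DeleteMessage, which is exactly what the second statement grants. Here is a minimal Python sketch of one polling pass, under stated assumptions: sqs is a boto3 SQS client created in the pod (for example, boto3.client("sqs", region_name="us-east-2")) whose default credential chain picks up the pod’s IAM role, the queue is addressed by its cross-account URL, raw message delivery is off so each body is an SNS notification envelope whose Message field holds the published text, and that text is JSON. For cross-account access, the role’s own identity policy in account 999999999999 also has to allow these actions on the queue ARN; the queue’s resource policy alone isn’t enough.

```python
import json
from typing import Any, Callable, Dict

# The queue lives in the other account, so it is addressed by its full URL.
QUEUE_URL = "https://sqs.us-east-2.amazonaws.com/000000000000/MyQueue"


def poll_once(sqs: Any, queue_url: str, handle: Callable[[Dict[str, Any]], None]) -> int:
    """Receive up to 10 messages, process each, and delete it only after it was handled."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    processed = 0
    for message in response.get("Messages", []):
        # Without raw message delivery, SNS wraps the published text in a JSON envelope.
        envelope = json.loads(message["Body"])
        handle(json.loads(envelope["Message"]))
        # If handle() raises, this message is not deleted and reappears after the visibility timeout.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```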

The updated policy did precisely what I needed, and the application could pull messages from the SQS queues as soon as I deployed it. It’s such a small change, and there isn’t anything profound about it. I usually write posts to share something interesting I found, how I do things, or to show something off. This post does show something off, but it’s one of my mistakes. It reminds me to double-check that I’m using IAM condition keys correctly, and hopefully, next time, I won’t make this same mistake.