Service Discovery for Serverless Microservices on AWS

March 13, 2023 • AWS, Serverless, Microservices, Service Discovery, Architecture

Across all my experience with microservices, coupling has been the single most significant cause of headaches. It requires effort and conscious decisions to avoid tightly coupling microservices together. Whether AMIs and EC2 instances, containers, or serverless, applications tend to become tightly coupled over time. The earlier in the development process foundations are laid to prevent tight coupling, the more effort you can save.

Service discovery is discussed in many microservices articles as a way to reduce tight coupling between services. Most of the articles I recall seeing were heavily focused on containers and didn’t engage with serverless applications. There are a variety of solutions available, be they from a cloud provider or open-source projects like some of those offered by the CNCF. But what solutions exist and work well for serverless applications?

Before diving into the options, consider why we want service discovery. A pattern I see frequently is hard-coding the URLs of other microservices, per environment, in a service’s configuration. This is not an issue if DNS is used appropriately, with predictable service-specific subdomains. Kubernetes takes this approach, and it works well: clients do not need to know how to look up the location of a given service, because network communication uses DNS much as we generally expect. Hard-coding becomes a problem when URLs are specific to a given deployment; for example, they might contain an API Gateway ID. Under standard conditions, there are no issues. In a disaster recovery (DR) scenario, when new API Gateways are created in a new environment, the generated IDs will be different, and every service that hard-codes one of those IDs must be redeployed. DR scenarios aside, recreating a single API Gateway can cause a cascading wave of deployments if it belongs to a central microservice.

There are many kinds of tight coupling; here, I focus on the coupling between services that can exist at deployment time. Coupling in the communication between services at runtime is a topic for another post.

In exploring potential service discovery solutions, I’ve limited myself to AWS and to options that are, effectively, serverless. I’ve also focused on solutions that pick up changes without requiring redeployment. If Service A calls Service B’s API, I want to avoid redeploying Service A if Service B’s URL changes. Instead, I want to receive those changes dynamically.

For these reasons, I’ve excluded solutions that don’t avoid this redeployment issue. One example is using CloudFormation Custom Resources to look up values based on tags. I’ve seen this used to look up network details, such as VPCs and subnets. So long as the custom resource is well made, this works fine for that purpose, because you would only want to move a resource to a new VPC or subnet as part of a deployment anyway.

I’ve created a proof-of-concept project showing my sample use case of the services below deployed using the AWS CDK. A link to the repository is at the bottom of the page.

AWS Cloud Map

AWS’s service discovery solution is called Cloud Map. I’d never seen it used for serverless applications, though there is an AWS Blog article about it. This was an excellent opportunity for a proof-of-concept project.

Cloud Map supports DNS-based service discovery as well as API-based service discovery. The DNS-based service discovery supports both public and private DNS. Generally, it’s best to keep internal application details private, so public DNS isn’t an appealing solution. As I am looking at serverless scenarios, the private DNS-based approach didn’t make sense either. If I am creating Lambda functions, I don’t attach them to a VPC unless that is required. AWS has mostly mitigated the performance impact of attaching Lambda functions to a VPC, but it’s still an unnecessary use of resources in many cases. That leaves API-based discovery.

Cloud Map breaks things down into namespaces, services, and service instances. In my proof of concept (POC), I defined a namespace in a shared infrastructure stack and made some of its properties available to the service stacks. For mock services, I created API Gateways that proxy HTTP calls to a cat facts API I found online. Each mock service stack defines a service in the Cloud Map namespace and then creates a service instance with details about the API Gateway.
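On the calling side, API-based discovery comes down to a lookup by namespace name and service name using Cloud Map’s DiscoverInstances operation. The following is a minimal Python sketch of that lookup, not the POC’s code. It assumes a boto3 "servicediscovery" client is passed in, and the namespace and service names in the usage comment are hypothetical placeholders.

```python
from typing import Any, Dict, List


def discover_attributes(client: Any, namespace: str, service: str) -> List[Dict[str, str]]:
    """Return the attribute map of every instance registered for a service.

    `client` is assumed to be a boto3 "servicediscovery" client; it is passed
    in so the helper can be exercised with a stub in tests.
    """
    response = client.discover_instances(
        NamespaceName=namespace,
        ServiceName=service,
    )
    # Each discovered instance carries the custom key/value attributes that
    # were supplied when the instance was registered.
    return [
        instance.get("Attributes", {})
        for instance in response.get("Instances", [])
    ]


# Usage inside a Lambda handler (names are hypothetical):
#   import boto3
#   sd = boto3.client("servicediscovery")
#   attrs = discover_attributes(sd, "shared-services", "cat-facts")
```

Passing the client in rather than creating it inside the function is only a convenience for testing; creating it once at module scope would work just as well in a Lambda function.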

One disappointment was the lack of built-in support for different resource types when defining service instances. If you want to reference an EventBridge bus, an SQS Queue, an API Gateway, or some other resource, you must define the structure of the stored data yourself. I decided on a design that takes some inspiration from CloudFormation: a type, an ARN, and a URL. This structure makes sense for an API Gateway or an SQS Queue but might need expansion for other resources.

{
  "type": "AWS::ApiGateway::RestApi",
  "arn": "arn:aws:apigateway:us-east-1::/restapis/abcdef123",
  "url": "https://abcdef123.execute-api.us-east-1.amazonaws.com/dev"
}
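Because this structure is a convention rather than something Cloud Map enforces, a consumer may want to validate it before use. Below is a small Python sketch of one way to do that. The ServiceEndpoint class and its from_attributes helper are hypothetical names introduced for illustration; only the three keys come from the structure above.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_KEYS = ("type", "arn", "url")


@dataclass(frozen=True)
class ServiceEndpoint:
    """Typed view of the type/arn/url attributes stored on a service instance."""

    type: str
    arn: str
    url: str

    @classmethod
    def from_attributes(cls, attributes: Mapping[str, str]) -> "ServiceEndpoint":
        # Fail loudly when an instance was registered without the agreed keys,
        # instead of surfacing a confusing error at call time.
        missing = [key for key in REQUIRED_KEYS if not attributes.get(key)]
        if missing:
            raise ValueError("instance attributes missing: " + ", ".join(missing))
        return cls(
            type=attributes["type"],
            arn=attributes["arn"],
            url=attributes["url"],
        )


endpoint = ServiceEndpoint.from_attributes({
    "type": "AWS::ApiGateway::RestApi",
    "arn": "arn:aws:apigateway:us-east-1::/restapis/abcdef123",
    "url": "https://abcdef123.execute-api.us-east-1.amazonaws.com/dev",
})
```

Extra keys are ignored, so the structure can grow for other resource types without breaking existing consumers.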

I was also disappointed with how much information the AWS CDK required about the namespace to create a service. In a real-world scenario, I want to hard-code or pass as little configuration into each service as possible. In working with the HttpNamespace class, I had to provide the namespace’s name, ID, and ARN. Looking at the CloudFormation resource specification, only the namespace ID is required or used in that situation. It may be possible to add some extra casting, coerce the types, and provide fewer details. In the worst-case scenario, you could develop custom L2 CDK constructs to wrap the CfnService and CfnInstance constructs.

Systems Manager Parameter Store

I wanted to examine another solution to contrast with Cloud Map. While AWS doesn’t describe Parameter Store as a service discovery solution, it can fit that purpose. In the end, we need to be able to retrieve details about our configuration. With Parameter Store, DNS is not an option; you must retrieve the values with an API call.

One benefit of Parameter Store is a Lambda Extension that assists with retrieving and caching parameters. A walkthrough published on the AWS Compute blog gives a good introduction. Additional details are available in the AWS documentation for Systems Manager; the documentation could use expansion, but it’s enough to help you figure things out. One thing I wish the documentation called out more explicitly is that the response from the extension is the same as the response from the Systems Manager GetParameter API request. One feature that would make the extension even more helpful is support for the GetParametersByPath request.

I initially used the AWS SDK and the GetParametersByPath request to retrieve the values but switched to the Lambda Extension. One unexpected change was that I needed to increase the Lambda function’s timeout. In the end, it’s unsurprising: I moved from a single API request to retrieve my service URLs to four, two from my handler to the Lambda Extension and two from the extension to the Systems Manager API.
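To make the extension’s request flow concrete, here is a minimal Python sketch of a handler-side lookup, not the POC’s code. It relies on the extension’s documented local endpoint (localhost, port 2773 unless PARAMETERS_SECRETS_EXTENSION_HTTP_PORT overrides it) and on the X-Aws-Parameters-Secrets-Token header, which carries the value of AWS_SESSION_TOKEN. The parameter names in the usage comment are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Dict, Iterable


def build_request(name: str) -> urllib.request.Request:
    """Build the local HTTP request the extension expects for one parameter."""
    port = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")
    # The parameter name must be URL-encoded, including its slashes.
    query = urllib.parse.urlencode({"name": name})
    url = f"http://localhost:{port}/systemsmanager/parameters/get?{query}"
    token = os.environ.get("AWS_SESSION_TOKEN", "")
    return urllib.request.Request(
        url, headers={"X-Aws-Parameters-Secrets-Token": token}
    )


def parse_parameter_value(body: bytes) -> str:
    """Extract the value from a body shaped like a GetParameter response."""
    return json.loads(body)["Parameter"]["Value"]


def get_service_urls(names: Iterable[str], timeout: float = 2.0) -> Dict[str, str]:
    """Fetch each parameter separately: one extension request per name."""
    urls = {}
    for name in names:
        with urllib.request.urlopen(build_request(name), timeout=timeout) as resp:
            urls[name] = parse_parameter_value(resp.read())
    return urls


# Usage inside a Lambda handler (parameter names are hypothetical):
#   urls = get_service_urls(["/services/cat-facts/url", "/services/orders/url"])
```

The loop in get_service_urls makes the cost visible: each service URL is its own request to the extension, which is why two lookups turn into four requests on a cold cache.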

Other Potential Solutions