WorkOS: enterprise-grade auth for modern SaaS apps (Sponsored)
WorkOS helps companies become Enterprise Ready ridiculously fast.
→ It supports a complete User Management solution along with SSO, SCIM, RBAC, & FGA.
→ Unlike other auth providers that rely on user-centric models, WorkOS is designed for B2B SaaS with an org modeling approach.
→ The APIs are flexible, easy to use, and modular. Pick and choose what you need and integrate in minutes.
→ Best of all, User Management is free up to 1 million MAUs and includes bot protection, impersonation, MFA, & more.
Disclaimer: The details in this post have been derived from multiple AWS re:Invent talks and articles by the Amazon engineering team. All credit for information about AWS Lambda’s internals goes to the presenters and authors. Links to these videos and articles are available in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
AWS Lambda is a serverless compute service that allows developers to run code without provisioning or managing servers.
Lambda functions can be triggered by various AWS services, HTTP endpoints, or custom events generated by applications. This makes it highly versatile for building event-driven architectures.
There are some interesting stats about AWS Lambda:
Processes over 10 trillion invocations per month.
More than a million customers have built applications using AWS Lambda.
The customer base includes individuals, startups, enterprises, and large organizations.
Supports various applications, such as IT automation, data processing pipelines, microservices-based applications, web applications, and machine learning workloads.
The timeline below shows the evolution of AWS Lambda over the years.
In this post, we will look at the architecture and internals of AWS Lambda.
How is a Lambda Function Invoked?
Lambda supports two invocation models.
1 - Synchronous Invocation
In synchronous invocation, the caller directly calls the Lambda function. This can be done using the AWS CLI, SDK, or other services like the API Gateway.
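As a concrete illustration, here is a minimal sketch of a synchronous invocation using the Python SDK (boto3); the function name and payload are hypothetical:

```python
import json
import boto3

client = boto3.client("lambda")

# "RequestResponse" is the synchronous invocation type: the call blocks
# until the function returns, and the result comes back in the response.
response = client.invoke(
    FunctionName="my-function",            # hypothetical function name
    InvocationType="RequestResponse",
    Payload=json.dumps({"orderId": 123}),  # hypothetical payload
)

result = json.loads(response["Payload"].read())
print(result)
```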
The diagram below shows the entire process:
The process involves the following components:
Front-End Service: The front-end service performs authentication and authorization of the request. It also loads and caches metadata to optimize performance.
Counting Service: This service checks quota limits based on account, burst, or reserved concurrency (see the configuration sketch after this list). It ensures high throughput and low latency while operating across multiple availability zones (AZs).
Assignment Service: The assignment service routes invocation requests to available worker hosts. It coordinates the creation of secure execution environments, setting up limits such as memory and CPU allocation. We will talk more about it in a later section.
Placement Service: The assignment service communicates with the placement service to determine the best worker host for a new execution environment. The placement service uses machine learning models to decide where to place new execution environments, taking into account variables such as packing density and fleet utilization.
Worker Hosts: They are responsible for downloading, mounting, and running the function code within a microVM. They manage multiple runtimes (such as Node.js, Java, and Python) and Lambda agents for monitoring and operation.
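To make the counting service’s checks concrete, the concurrency limits it enforces are set through ordinary function configuration. A minimal boto3 sketch (the function name and limit are hypothetical):

```python
import boto3

client = boto3.client("lambda")

# Cap this function at 100 concurrent executions. On every invocation,
# the counting service checks quotas like this before the request proceeds.
client.put_function_concurrency(
    FunctionName="my-function",          # hypothetical function name
    ReservedConcurrentExecutions=100,
)
```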
When an invocation occurs, it can have a warm or cold start.
If a warm (already running) execution environment is available, the payload is sent directly to it, and the function runs immediately.
If no warm environment is available, a new execution environment is created. The process involves initializing the runtime, downloading function code, and preparing the environment for execution.
The assignment service keeps track of execution environments. If there are errors during initialization, the environment is marked unavailable. When environments are near the end of their lease, they are gracefully shut down to ensure stability and resource efficiency.
2 - Asynchronous Invocation
In asynchronous invocation, the caller doesn’t wait for the function’s response. Instead, the event is placed in an internal queue and processed separately.
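On the caller’s side, the only change is the invocation type. A minimal boto3 sketch (function name and payload hypothetical):

```python
import json
import boto3

client = boto3.client("lambda")

# InvocationType="Event" queues the request and returns immediately
# with HTTP 202; the function result is not returned to the caller.
response = client.invoke(
    FunctionName="my-function",            # hypothetical function name
    InvocationType="Event",
    Payload=json.dumps({"orderId": 123}),  # hypothetical payload
)

print(response["StatusCode"])  # 202 on successful enqueue
```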
The process involves:
Event Invoke Frontend Service: This service handles asynchronous requests. It authenticates and authorizes the request, then places the event in an internal SQS queue.
Polling Instances: A fleet of pollers reads messages from the SQS queues and sends them to the synchronous front-end service for processing.
Synchronous Processing: Despite being an async request, the actual function execution uses the same synchronous processing pathway as described earlier.
Event Destinations: These can be configured to handle callbacks after processing, providing notifications of success or failure.
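As a sketch of what that destination configuration looks like via boto3 (the function name and queue ARNs are hypothetical):

```python
import boto3

client = boto3.client("lambda")

# OnSuccess/OnFailure destinations receive a record describing the
# invocation result after asynchronous processing completes.
client.put_function_event_invoke_config(
    FunctionName="my-function",
    MaximumRetryAttempts=2,
    MaximumEventAgeInSeconds=3600,
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-queue"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failure-queue"},
    },
)
```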
The diagram below shows the asynchronous invocation process:
Lambda manages multiple internal queues, scaling them dynamically based on load. The queue manager oversees the creation, deletion, and monitoring of these queues to ensure efficient processing.
As far as polling is concerned, Lambda can poll from various event sources like Kinesis, DynamoDB, SQS, Kafka, and Amazon MQ. The pollers perform the following functions:
Read and Filter: Poll messages from the sources, optionally filtering them to reduce traffic and simplify function code.
Batch Processing: Batch records together before sending them to the function, optimizing performance and reducing costs.
Synchronous Invocation: Use the same synchronous pathway to execute the function, ensuring consistency.
Control plane services (such as State Manager and Stream Tracker) manage the pollers, ensuring they are assigned correctly, scaling them up or down, and handling failures gracefully.
In case a function fails, the message is returned to the queue with a visibility timeout, allowing for retries.
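Batching and filtering are configured on the event source mapping itself. Here is a hedged boto3 sketch, with a hypothetical SQS queue, function name, and filter pattern:

```python
import json
import boto3

client = boto3.client("lambda")

# Lambda's pollers apply the filter and assemble batches before the
# function is ever invoked, reducing traffic and simplifying function code.
client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-queue",
    FunctionName="my-function",
    BatchSize=10,  # records batched per invocation
    FilterCriteria={
        "Filters": [
            {"Pattern": json.dumps({"body": {"type": ["order_created"]}})}
        ]
    },
)
```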
State Management in Lambda
In the earlier architecture of AWS Lambda, the Worker Manager service was responsible for coordinating the execution environments between the frontend invoke service and the worker hosts.
The worker manager performed a couple of important tasks:
State Management: The Worker Manager kept track of which execution environments were available and on which hosts they were running.
Execution Lifecycle: It managed the lifecycle of these environments, ensuring that they were ready to process invocations and handling termination when no longer needed.
There were some challenges with the worker manager:
Single Point of Failure: Each Worker Manager instance stored the state of execution environments in memory. If a Worker Manager failed, the execution environments it managed could become orphaned.
Orphaned Environments: Orphaned environments could continue processing current invocations but could not be used for new invocations, leading to cold starts and inefficient resource utilization.
Zonal Failures: If an Availability Zone (AZ) failed, the Worker Managers in that AZ would become unavailable, along with the environments they managed, resulting in capacity loss and increased cold start latencies.
Introduction of the Assignment Service
To address the challenges with the Worker Manager, AWS Lambda introduced the Assignment Service, which offers a more robust and resilient way to manage execution environments.
The Assignment Service is written in Rust, chosen for its performance, memory safety, and low-latency characteristics.
The diagram below shows the high-level design of the Assignment Service.
The architecture of the Assignment Service has some key properties:
1 - Partitioned Design
The Assignment Service is divided into multiple partitions. Each partition consists of one leader and two followers, spread across different Availability Zones (AZs). This ensures high availability and fault tolerance.
Multiple partitions run simultaneously, with each Assignment Service host managing several partitions. This helps with load distribution and provides redundancy.
2 - External Journal Log Service
The leader in each partition writes the state of execution environments to an external journal log service. The followers read from this log to keep their state in sync with the leader.
In case the leader fails, one of the followers can quickly take over as the new leader using the state information from the journal log.
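To make the idea concrete, here is a heavily simplified, hypothetical Python sketch of leader/follower replication through a shared journal log. All names here are invented for illustration; the actual service is written in Rust and is far more involved:

```python
# Conceptual sketch of leader/follower state replication via a journal log.
journal = []  # stand-in for the external journal log service

class PartitionNode:
    def __init__(self, name):
        self.name = name
        self.state = {}    # execution-environment ID -> status
        self.applied = 0   # index of the next journal entry to apply

    def append(self, env_id, status):
        # Leader path: write the state change to the journal first,
        # then apply it locally.
        journal.append((env_id, status))
        self.catch_up()

    def catch_up(self):
        # Follower path: replay journal entries written since the last sync.
        while self.applied < len(journal):
            env_id, status = journal[self.applied]
            self.state[env_id] = status
            self.applied += 1

leader = PartitionNode("leader")
follower = PartitionNode("follower-az2")

leader.append("env-42", "ready")
follower.catch_up()
# If the leader fails, the follower's replayed state lets it take over:
assert follower.state == {"env-42": "ready"}
```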
3 - Frontend Interaction
The frontend invoke service interacts with the leader of the relevant partition to manage invocations.
The leader coordinates with the placement service to create new execution environments and writes this information to the log for followers to update their state.
4 - Failure Handling
If an AZ experiences an outage, the leader and followers in other AZs continue to operate. This means that execution environments in the unaffected AZs remain available and aren’t orphaned.
If a leader fails, a follower can take over quickly using the replicated state, ensuring that there are no interruptions in service and reducing the chances of cold starts.
The Role of Firecracker MicroVM
Firecracker is a lightweight virtual machine manager (VMM) designed specifically for running serverless workloads such as AWS Lambda and AWS Fargate. It uses Linux’s Kernel-based Virtual Machine (KVM) to create and manage microVMs.
The primary goal of Firecracker is to provide secure and fast-booting microVMs.
Initially, AWS Lambda used EC2 instances to run functions, dedicating a T2 instance to each tenant. This was secure but inefficient, leading to high overhead due to the need for provisioning entire EC2 instances for each function.
In 2018, Lambda transitioned to using Firecracker for managing execution environments. This allowed for much smaller, lightweight microVMs, significantly reducing resource overhead and improving the fleet’s efficiency.
Here are some key technical details of the Firecracker implementation.
1 - MicroVMs and Isolation
Firecracker enables the creation of microVMs, which are smaller and more efficient than traditional VMs. Each microVM runs its own isolated execution environment, including the runtime, function code, and any extensions.
Firecracker ensures strong isolation between microVMs, which is crucial for multi-tenant environments where different customer functions run on the same physical host.
2 - Startup Times
Firecracker microVMs are designed to boot in milliseconds, significantly reducing the cold start latency for Lambda functions.
For further optimization, Lambda uses a feature called SnapStart (initially available for the Corretto Java 11 runtime). SnapStart pre-initializes execution environments and takes snapshots, which are restored on demand when needed. We will talk more about it in the next section.
3 - Resource Utilization
By running multiple microVMs on a single physical host, Firecracker improves the utilization of CPU, memory, and storage resources. This allows Lambda to handle more functions with fewer resources.
Also, Firecracker supports the dynamic scaling of resources for quick adjustment to changing workloads.
4 - Storage and Networking
Lambda uses a virtual file system driver to present an EXT4 file system to the microVMs. This driver handles file system calls and maps them to the underlying storage.
Firecracker optimizes network performance by providing high-speed virtual network interfaces, ensuring fast and reliable communication.
Lambda as a Storage Service
AWS Lambda is primarily known as a serverless compute service. However, it also incorporates significant storage service functionalities.
Think of Lambda not just as a compute service but also as a storage service because it deals with storing and managing a lot of data behind the scenes.
But why is storage important for Lambda?
It’s because when you run a function on Lambda, it needs three main things:
Input Data: The information or data your function will process.
Function Code: The code you wrote that tells Lambda what to do.
Execution Environment: The virtual machine (VM) where your code runs.
To make all of this work smoothly, Lambda needs to manage and store these components efficiently.
1 - Storing Code and Data Efficiently
Instead of storing your entire code or data as one big piece, Lambda breaks it down into smaller pieces called chunks.
This process is known as chunking, and it allows Lambda to handle large and complex functions more effectively.
Here’s how it works:
Breaking down container images: When you upload a container image (which can be up to 10 GB) to AWS Lambda, it doesn’t store the entire image as one big file. Instead, Lambda breaks this image down into smaller pieces or chunks. For example, the base OS layer is broken down into smaller pieces, the runtime layer into another set of pieces, and so on.
Storage and Access: These chunks are then stored separately, making it easier to manage and retrieve them as needed. When your function runs, Lambda only loads the chunks it needs at that moment, rather than the entire image. This is much faster and more efficient.
Retrieving Chunks: When your function is invoked, Lambda checks which parts of the container image it needs to execute the function. It then retrieves only those specific chunks from storage.
Optimizing Performance: Since only the required chunks are loaded, this reduces the amount of data that needs to be transferred and processed, leading to faster startup times and better performance. Frequently accessed chunks can be cached to avoid fetching them from the main storage repeatedly.
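The sketch below illustrates the basic idea of content-addressed chunking. The chunk size and hashing scheme are illustrative assumptions, not Lambda’s internal parameters:

```python
import hashlib

CHUNK_SIZE = 512 * 1024  # hypothetical chunk size; the real value is an internal detail

def chunk_image(data: bytes) -> dict:
    """Split a flattened image into fixed-size chunks, keyed by content hash.

    Identical chunks (e.g., a shared base OS layer) hash to the same key,
    so they can be stored once and fetched individually on demand.
    """
    chunks = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset : offset + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        chunks[key] = chunk  # duplicate chunks map to the same key
    return chunks
```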
2 - Sharing Common Data
Lambda often runs many functions that use similar code or data, such as the base operating system layer or common runtime libraries.
Instead of storing multiple copies of the same thing, it stores one copy and reuses it.
To make sure the shared data is secure, Lambda uses a technique called convergent encryption.
But how does convergent encryption work?
Plaintext Chunk Creation: Lambda starts with a plaintext chunk, which is a segment of the flattened file system of a container image. Each chunk represents a specific range of the block device that makes up the file system.
Appending Extra Data: To ensure security and integrity, additional data is appended to the chunk. This extra data is generated by the Lambda service and can include metadata or unique identifiers. The purpose of this extra data is to add a layer of uniqueness to the chunk.
Hash Computation: A cryptographic hash function is used to compute a hash of the combined plaintext chunk and extra data. This hash serves as the encryption key for that specific chunk, ensuring that identical chunks produce the same hash and thus the same encryption key.
Encryption: The plaintext chunk, along with the extra data, is then encrypted using the computed hash as the encryption key.
Manifest Creation: Lambda creates a manifest file that contains the keys (hashes) and pointers to the encrypted chunks in the storage subsystem. The manifest itself is encrypted with a customer-specific AWS Key Management Service (KMS) key, ensuring that only authorized entities can access it.
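The following is a simplified sketch of the scheme in Python, using the third-party cryptography package. The cipher choice, nonce handling, and extra data shown here are assumptions for illustration, not Lambda’s exact implementation:

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(plaintext_chunk: bytes, extra_data: bytes):
    """Convergent encryption sketch: the key is derived from the content.

    Identical (chunk, extra_data) pairs always derive the same key and
    produce the same ciphertext, so shared chunks can be deduplicated
    in storage while remaining encrypted.
    """
    # The content hash doubles as the encryption key (32 bytes -> AES-256).
    key = hashlib.sha256(plaintext_chunk + extra_data).digest()
    # A fixed nonce keeps ciphertext deterministic; safe here only because
    # each key is ever used to encrypt exactly one (identical) plaintext.
    nonce = b"\x00" * 12
    ciphertext = AESGCM(key).encrypt(nonce, plaintext_chunk + extra_data, None)
    return key, ciphertext  # the key is recorded in the (separately encrypted) manifest
```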
3 - Making Initialization Faster with SnapStart
When a Lambda function is invoked and there isn’t an already running execution environment, Lambda has to create a new one. This involves several steps:
Provisioning: Setting up a new virtual machine.
Downloading: Retrieving the function code from storage.
Initializing: Starting the runtime and initializing the function code.
These steps can take time, especially for complex functions or runtimes like Java, which have longer startup times. This delay is known as cold start latency.
SnapStart is a feature designed to significantly reduce cold start latency by pre-initializing the execution environment and then using snapshots.
Here’s how it works:
When you update your function code or deploy a new version, Lambda performs an initial setup and creates a new execution environment.
Once the execution environment is fully set up and ready to handle invocations, Lambda takes a snapshot of this environment. The snapshot captures the entire state of the VM, including the memory state, runtime state, and initialized data. The snapshot is then broken down into smaller pieces and encrypted to ensure security.
These encrypted chunks are stored securely.
When a new invocation request arrives, Lambda first checks if there’s an already running execution environment (a warm start). If not, it uses the snapshot: the necessary chunks are loaded and decrypted, and the environment is brought back to the state it was in when the snapshot was taken.
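From a user’s perspective, SnapStart is enabled through function configuration. A minimal boto3 sketch for a hypothetical Java function:

```python
import boto3

client = boto3.client("lambda")

# Enable SnapStart on published versions of a (hypothetical) function
# running a supported runtime, such as Java.
client.update_function_configuration(
    FunctionName="my-java-function",
    SnapStart={"ApplyOn": "PublishedVersions"},
)

# Lambda snapshots the initialized environment when a new version is
# published, then restores from that snapshot on subsequent cold starts.
client.publish_version(FunctionName="my-java-function")
```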
Conclusion
AWS Lambda revolutionized the way developers build and deploy applications by providing a serverless environment that abstracts away the complexities of infrastructure management.
Here are the main takeaways from the internals of AWS Lambda:
To achieve the impressive statistics associated with Lambda, AWS leverages multiple advanced techniques such as chunking and convergent encryption to manage and share common data.
The introduction of SnapStart addressed the critical challenge of cold start latency, particularly for runtimes like Java.
The transition from the Worker Manager to the Assignment Service marked a significant improvement in state management.
With Firecracker microVMs, Lambda achieves lightweight and secure virtualization.
References:
SPONSOR US
Get your product in front of more than 500,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing hi@bytebytego.com