Integrate API users 50% faster (Sponsored)
Creating a frictionless API experience for your partners and customers no longer requires an army of engineers. Speakeasy’s platform makes crafting type-safe, idiomatic SDKs for enterprise APIs easy. That means you can unlock API revenue while keeping your team focused on what matters most: shipping new products. Make SDK generation part of your API’s CI/CD and distribute libraries that users love at a fraction of the cost of maintaining them in-house.
Uber has a diverse customer base consisting of riders, drivers, eaters, couriers, and merchants.
Each user persona has different support requirements when reaching out to Uber’s customer agents through various live and non-live channels. Live channels are chat and phone, while the non-live channel is Uber’s in-app messaging.
For users, timely resolution of issues is what matters most. For Uber, the concerns are customer satisfaction and the cost of resolving tickets: it needs to maintain a low CPC (cost-per-contact) while keeping customer satisfaction ratings high.
Based on their analysis, they found that the live chat channel offers the most value when compared to other channels. It allows for:
Higher automation rate
Higher staffing efficiency since agents can handle multiple chats at the same time
Higher FCR (first contact resolution)
However, from 2019 to 2024, only 1% of all support interactions (also known as contacts) were served via the live chat channel because the chat infrastructure at Uber wasn’t capable of meeting the demand.
In this post, we look at how Uber built their real-time chat channel to work at the required scale.
The Legacy Chat Architecture
The legacy architecture for live chat at Uber was built using the WAMP protocol. WAMP (Web Application Messaging Protocol) is a WebSocket subprotocol used to exchange messages between application components.
It was primarily used for message passing and PubSub over WebSockets to relay contact information to the agent’s machine.
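For illustration, here is a minimal sketch of what WAMP-style PubSub over WebSockets looks like using the autobahn-js client. The router URL, realm, and topic name are hypothetical; Uber’s actual legacy code is not public.

```typescript
import autobahn from "autobahn";

// Connect to a WAMP router (the URL and realm are made up for this sketch).
const connection = new autobahn.Connection({
  url: "wss://chat-router.example.com/ws",
  realm: "support",
});

connection.onopen = (session) => {
  // The agent's machine subscribes to a topic; the backend publishes
  // contact events to it, and WAMP relays them over the WebSocket.
  session.subscribe("contact.events", (args) => {
    console.log("Contact event received:", args?.[0]);
  });
};

connection.open();
```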
The below diagram shows a high-level flow of the chat contact from being created to being routed to an agent on the front end.
This architecture had some core issues as follows:
1 - Reliability
Once the traffic scaled to 1.5X, the system started to face reliability issues.
Almost 46% of the events from the backend were not delivered to the agent’s browser, adding to the customer’s wait time to speak to an agent. It also created delays for agents, wasting their bandwidth.
2 - Scale
Once requests per second crossed 10, the system’s performance deteriorated due to high memory usage and file descriptor leaks.
Also, it wasn’t possible to horizontally scale the system due to limitations with older versions of the WAMP library.
3 - Observability and Debugging
There were major issues related to observability and debugging:
It wasn’t possible to track the health of the chat contacts. This made it difficult to understand whether chat contacts were missing SLAs due to engineering or staffing issues.
Almost 8% of the chat volume wasn’t getting routed to any customer agent.
The WAMP protocol and its libraries were deprecated and didn’t provide sufficient insight into their inner workings. This made debugging issues difficult.
4 - Stateful
The services in the architecture were stateful, which complicated maintenance and restarts.
This caused frequent spikes in message delivery times, as well as message losses.
Goals of the New Chat Architecture
Due to these challenges, Uber decided to build a new real-time chat infrastructure with the following goals:
Scale up the chat traffic from 1% to 80% of the overall contact volume by the end of 2023. This came to around 3 million tickets per week.
The process of connecting a customer to an agent after identification should have a greater than 99.5% success rate on the first attempt.
Build end-to-end observability and debuggability over the entire chat flow.
Build stateless services that can be scaled horizontally.
The New Live Chat Architecture
It was important for the new architecture to be simple to improve transparency and scalability.
The team at Uber decided to go with the Push Pipeline: a simple WebSocket server that agent machines connect to, through which they can send and receive messages over one generic socket channel.
The below diagram shows the new architecture.
Below are the details of the various components:
Front End UI
This is used by the agents to interact with the customers.
Widgets and various actions are made available so that agents can take the appropriate steps for the customer.
Contact Reservation
The router is the service that finds the most appropriate match between an agent and a contact based on the contact details.
An agent is selected based on the concurrency configured in the agent’s profile, such as the number of chats the agent can handle simultaneously. Other considerations include:
SLA-based routing: Chat contacts are prioritized based on SLA.
Sticky routing: Reopened contacts are sent to the same agents that handled the ticket earlier.
Priority routing: Contacts are prioritized based on different routing rules.
On finding the match, the contact is pushed into a reserved state for the agent.
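To make the matching concrete, here is a simplified sketch of how such a router might pick an agent. The field names and logic are assumptions for illustration, not Uber’s actual implementation.

```typescript
interface Agent {
  id: string;
  online: boolean;
  maxConcurrentChats: number; // concurrency from the agent's profile
  activeChats: number;
}

interface Contact {
  id: string;
  slaDeadline: number;      // epoch millis by which the contact must be served
  previousAgentId?: string; // set when a reopened contact should be sticky
}

// Find the most appropriate agent for the highest-priority contact.
function matchContact(contacts: Contact[], agents: Agent[]): [Contact, Agent] | null {
  // SLA-based routing: serve the contact closest to breaching its SLA first.
  const byDeadline = [...contacts].sort((a, b) => a.slaDeadline - b.slaDeadline);

  for (const contact of byDeadline) {
    // Only agents who are online and under their concurrency limit qualify.
    const available = agents.filter(
      (a) => a.online && a.activeChats < a.maxConcurrentChats
    );
    if (available.length === 0) break; // no agent has free capacity right now

    // Sticky routing: prefer the agent who handled this ticket earlier.
    const sticky = available.find((a) => a.id === contact.previousAgentId);
    return [contact, sticky ?? available[0]];
  }
  return null;
}
```

Once a match is found, the contact moves into the reserved state for that agent and the reservation is handed to the push pipeline.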
Push Pipeline
When the contact is reserved for an agent, the information is published to Kafka and is received by the GQL Subscription Service.
On receiving the information through the socket via GraphQL subscriptions, the Front End loads the contact for the agent along with all the necessary widgets and actions.
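As a rough sketch, this part of the pipeline can be thought of as a Kafka consumer that fans reservation events out to per-agent GraphQL subscription channels. The topic, event names, and the in-memory PubSub from graphql-subscriptions below are assumptions standing in for whatever Uber actually uses.

```typescript
import { Kafka } from "kafkajs";
import { PubSub } from "graphql-subscriptions";

// In-memory pub/sub bridging Kafka events to GraphQL subscription resolvers.
const pubsub = new PubSub();

const kafka = new Kafka({ clientId: "gql-subscription-service", brokers: ["kafka:9092"] });
const consumer = kafka.consumer({ groupId: "contact-reservations" });

async function run(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topics: ["contact-reserved"] }); // hypothetical topic

  await consumer.run({
    eachMessage: async ({ message }) => {
      const reservation = JSON.parse(message.value!.toString());
      // Push the event to the channel that the reserved agent's socket listens on.
      await pubsub.publish(`CONTACT_RESERVED.${reservation.agentId}`, {
        contactReserved: reservation,
      });
    },
  });
}

// Each agent's browser subscribes to its own channel over the WebSocket.
const resolvers = {
  Subscription: {
    contactReserved: {
      subscribe: (_: unknown, { agentId }: { agentId: string }) =>
        pubsub.asyncIterator(`CONTACT_RESERVED.${agentId}`),
    },
  },
};

run().catch(console.error);
```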
Agent State
When agents start working, they go online via a toggle on the Front End.
This updates the Agent State service, allowing the agent to be mapped to a contact.
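A minimal sketch of what the Agent State service tracks; the names and in-memory store are illustrative only.

```typescript
type AgentStatus = "ONLINE" | "OFFLINE";

// In-memory stand-in for the Agent State store.
const agentStates = new Map<string, AgentStatus>();

// Called when the agent flips the online toggle on the Front End.
function setAgentStatus(agentId: string, status: AgentStatus): void {
  agentStates.set(agentId, status);
}

// The router only maps contacts to agents that are currently online.
function isRoutable(agentId: string): boolean {
  return agentStates.get(agentId) === "ONLINE";
}
```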
GQL Subscription Service
The front-end team was already using GraphQL for HTTP calls to the services. Due to this familiarity, the team selected GraphQL subscriptions for pushing data from the server to the client.
The below diagram shows how GraphQL subscriptions work on a high level.
In GraphQL subscriptions, the client sends messages to the server via subscription requests. The server matches the queries and sends back messages to the client machines. In this case, the client machines are the agent machines.
Uber’s engineering team used GraphQL over WebSockets by leveraging the graphql-ws library. The library had almost 2.3 million weekly downloads and was also recommended by Apollo.
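On the client side, a graphql-ws subscription looks roughly like the sketch below. The endpoint URL and the subscription fields are assumptions for illustration.

```typescript
import { createClient } from "graphql-ws";

// One generic socket channel between the agent's browser and the server.
const client = createClient({ url: "wss://agents.example.com/graphql" });

// The server matches this query and pushes a message over the socket
// every time a contact is reserved for this agent.
const dispose = client.subscribe(
  {
    query: `subscription ($agentId: ID!) {
      contactReserved(agentId: $agentId) { contactId channel }
    }`,
    variables: { agentId: "agent-42" },
  },
  {
    next: (event) => console.log("Contact pushed to agent:", event),
    error: (err) => console.error("Subscription error:", err),
    complete: () => console.log("Subscription closed"),
  }
);
```

Calling `dispose()` unsubscribes and frees the server-side resources for that subscription.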
To improve availability, they used a few techniques (sketched in code after this list):
Enabled bidirectional ping-pong messages to detect hung sockets and disconnect them automatically when the connection becomes unreliable.
Backed-off reconnects are automatically attempted after any disconnects. On successful reconnection, all the reserved or assigned contacts are fetched so the agent can receive them.
For a reserved contact, if the front end doesn’t send an acknowledgment, the chat service reserves the same contact for another available agent. The health of the WebSocket and HTTP protocols is checked by sending a heartbeat over the GraphQL subscriptions.
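graphql-ws exposes options that map onto these techniques. Below is a sketch of how they might be wired up; the exact timings and the refetch helper are assumptions.

```typescript
import { createClient } from "graphql-ws";

const client = createClient({
  url: "wss://agents.example.com/graphql",
  // Ping the server every 10 seconds; if the pong doesn't come back in
  // time, the library terminates the hung socket so a reconnect can start.
  keepAlive: 10_000,
  // Backed-off reconnects: wait exponentially longer (with jitter) between
  // attempts after a disconnect.
  retryAttempts: 10,
  retryWait: async (retries) => {
    const backoffMs = Math.min(1_000 * 2 ** retries, 30_000);
    await new Promise((resolve) => setTimeout(resolve, backoffMs + Math.random() * 1_000));
  },
  on: {
    // After a successful (re)connection, refetch all reserved or assigned
    // contacts so the agent doesn't miss events pushed while offline.
    connected: () => void refetchReservedContacts(),
  },
});

// Hypothetical helper that re-pulls the agent's open contacts over HTTP.
async function refetchReservedContacts(): Promise<void> {
  /* fetch reserved/assigned contacts and rehydrate the UI */
}
```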
Test Results from the New Chat Architecture
Uber performed functional and non-functional tests to ensure that both customers and agents received the best experience.
Some results from the tests were as follows:
Each machine was able to sustain 10K concurrent socket connections. The team was able to horizontally scale and test for 20 times the number of events supported by the previous architecture.
Shadow traffic flows were tested with a capacity of 40K contacts and 2,000 agents daily. The process didn’t reveal any problems, and latency and availability were satisfactory.
Existing traffic was directed through the new system with the old user interface for agents to serve as a reliability test. They were able to successfully manage the traffic while maintaining latency within the defined SLAs.
The earlier system had a bug where agents who had finished work for the day remained online in the system if they closed their tabs. This increased customer wait times because the pipeline kept trying to push events to these offline agents. With the new system, agents are automatically logged out based on recent acknowledgment misses.
The chat channel has been scaled to around 36% of the overall Uber contact volume that is routed to agents. Over the next few months, the plan is to increase this share further.
The error rate of delivering the contact has gone down from 46% to just 0.45% in the new architecture.
The new architecture has fewer services and protocols, and better observability, thereby improving the overall simplicity of the setup.
SPONSOR US
Get your product in front of more than 500,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing hi@bytebytego.com.