Disclaimer: The details in this post have been derived from the Facebook/Meta Engineering Blog. All credit for the technical details goes to the Facebook engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
A clock showing the wrong time is worse than no clock at all.
This is the challenge Facebook had to deal with while operating millions of servers connected to the Internet and each other.
All of these devices have onboard clocks that are expected to be accurate. However, many onboard clocks rely on imprecise internal oscillators that can drift by seconds per day and therefore need periodic correction.
Think of these internal oscillators as the “heartbeat” of the clock. Just like how an irregular heartbeat can affect a person’s health, an inaccurate oscillator can cause the clock to gain or lose time.
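To get a sense of the scale of this drift, here is a quick back-of-the-envelope calculation (the 10 parts-per-million figure is our illustrative assumption, not a number from the original post):

```python
# How much can an imprecise oscillator drift in a day?
# A cheap quartz oscillator might be off by ~10 parts per million (assumed figure).
drift_ppm = 10
seconds_per_day = 24 * 60 * 60  # 86,400 seconds

drift_per_day = drift_ppm * 1e-6 * seconds_per_day
print(f"Drift: ~{drift_per_day:.2f} seconds per day")  # ~0.86 seconds per day
```

Even a modest error of a few parts per million adds up to nearly a second of drift per day, which is why periodic correction is essential.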
Incorrect time can lead to issues with varying degrees of impact, from a missed reminder to a failed spacecraft launch.
As Facebook’s infrastructure has grown, time precision has become extremely important. For example, knowing the accurate time difference between any two servers in a data center is critical to preserving the order of transactions across those servers.
In this post, we’ll learn how Facebook achieved time precision across its millions of servers with NTP and later with PTP.
Network Time Protocol
Facebook started with Network Time Protocol (NTP) to keep the devices in sync.
NTP is a way for computers to synchronize their clocks over a network. It helps ensure that all devices on the network have the same, accurate time.
Having synchronized clocks is critical for many tasks, such as:
Scheduling events and meetings
Logging and tracking activities
Ensuring a proper sequence of transactions
Coordinating actions between different systems
NTP uses a hierarchical system of time servers where the most accurate servers are at the top. There is a 3-step process to how NTP works:
Computers on the network periodically request the current time from these servers.
The servers respond with their current time, along with timestamps that let the client account for network delay.
The requesting computer adjusts its clock based on the information received from the server.
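The adjustment in step 3 relies on four timestamps captured during the request/response exchange. Here is a minimal sketch of the standard NTP offset and round-trip delay formulas (variable names are ours, for illustration):

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP clock math from one request/response exchange.

    t0: client sends request     (client clock)
    t1: server receives request  (server clock)
    t2: server sends response    (server clock)
    t3: client receives response (client clock)
    """
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    # Estimated clock offset, assuming the path is symmetric
    # (half the delay in each direction).
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return offset, delay

# Example: the server is 5 ms ahead of the client, with a 10 ms round trip.
offset, delay = ntp_offset_and_delay(t0=100.000, t1=100.010, t2=100.010, t3=100.010)
print(f"offset={offset * 1000:.1f} ms, delay={delay * 1000:.1f} ms")  # 5.0 ms, 10.0 ms
```

The client then slews or steps its clock by the estimated offset.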
The diagram below shows the hierarchical system of servers used by NTP.
Facebook built an NTP service at scale using chrony, a modern NTP server implementation. They initially ran the ntpd service, but testing revealed that chrony was far more accurate and scalable.
Chrony was a fairly new daemon at the time, but it offered a chance to bring precision down to the nanosecond range. It also consumed less RAM than ntpd.
The diagram below shows the difference of around 1 MB in RAM consumption between chrony and ntpd.
They designed the NTP service in four layers based on the hierarchical structure of NTP.
Stratum 0 is a layer of satellites with precise atomic clocks from a global navigation satellite system (GNSS), such as GPS, GLONASS, or Galileo.
Stratum 1 is a Facebook atomic clock synchronizing with a GNSS.
Stratum 2 is a pool of NTP servers synchronizing to the Stratum 1 devices.
Lastly, Stratum 3 is a tier of servers configured for a larger scale.
The diagram below shows the layers of Facebook’s NTP service.
There are a couple of interesting concepts to note over here:
Leap second
Smearing
The Earth’s rotation is not consistent and can vary slightly over time. Therefore, clocks are kept in sync with the Earth’s rotation by occasionally adding or removing a second. This is called a leap second.
While adding or removing a leap second is hardly noticeable to humans, it can cause server issues. Servers expect time to move forward continuously, and a sudden change of a second can cause them to miss important tasks.
To mitigate the impact of leap seconds on servers, a technique called “smearing” is used.
Instead of adding or removing a full second at once, the time is gradually adjusted in small increments over several hours. It’s similar to masking a train’s delay by spreading the adjustment across multiple stations.
In Facebook’s NTP service, leap-second smearing happens at Stratum 2. The Stratum 3 servers receive smeared time and remain unaware of leap seconds.
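Here is a minimal sketch of linear smearing, where one inserted leap second is spread over a fixed window (the 17-hour window is an illustrative choice; real deployments vary in window size and smearing curve):

```python
def smeared_correction(now, smear_start, smear_window=17 * 3600, leap=1.0):
    """Return how much of the leap second has been applied at time `now`.

    Instead of stepping the clock by a full second at once, the
    correction ramps linearly from 0 to `leap` over `smear_window` seconds.
    """
    if now <= smear_start:
        return 0.0
    if now >= smear_start + smear_window:
        return leap
    return leap * (now - smear_start) / smear_window

# Halfway through the window, half of the leap second has been applied.
print(smeared_correction(now=8.5 * 3600, smear_start=0.0))  # 0.5
```

Because each adjustment step is tiny, downstream servers never observe time jumping backward or skipping ahead.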
The Arrival of Precision Time Protocol
NTP adoption was quite successful for Facebook. It helped them improve accuracy from 10 milliseconds to 100 microseconds.
However, as Facebook scaled to more advanced systems and began building the metaverse, it needed even greater levels of accuracy.
Therefore, in late 2022, Facebook moved from NTP to Precision Time Protocol (PTP).
NTP had some problems, which are as follows:
NTP and Asynchronous Systems: Systems using NTP are asynchronous, meaning they work independently without a shared global clock. These systems periodically check in with each other to ensure synchronization. However, as the system grows larger, more check-ins are required, which can slow down the network.
NTP and Timekeeping Methods: NTP is susceptible to variance and latency because of how it keeps time with on-device physical clocks. In other words, NTP is like a microwave clock that keeps time on-device: if there’s a time change (e.g., daylight saving time), the clock must be adjusted manually.
In contrast, PTP works like a smartphone clock that updates its time automatically. When there’s a time change or the phone moves to a new time zone, the clock updates itself by referencing the time over a network.
While NTP provided millisecond-level synchronization, PTP networks could hope to achieve nanosecond-level precision.
What Makes PTP More Effective?
As discussed earlier, a special time server (referred to here as the Stratum) acts as a time reference for other computers on a network. When a computer needs to know the current time, it sends a request to the Stratum, which replies with the current time. This process is known as sync messaging.
When the Stratum sends the current time to another computer, the information travels across the network, resulting in some latency. Several factors can increase this latency, such as:
The speed at which signals travel through the fiber optic cables.
The time it takes for the network devices to convert signals.
The quality of network equipment, like routers and switches.
The time it takes for software and drivers to process the time information.
Due to this latency, the time is no longer accurate by the moment it arrives at the receiving computer.
The obvious solution is to measure the latency and add it to the received time to get a more accurate result. However, measuring latency is challenging because each computer has its own clock, and there is no universal clock to compare against.
To measure latency, two assumptions about consistency and symmetry are made:
The latency a packet experiences while traveling across the network is consistent.
The latency from the Stratum to the other computer is equal to the latency from the other computer back to the Stratum. In other words, the network is symmetric.
Therefore, it follows that the accuracy of time synchronization can be improved by maximizing consistency and symmetry in the network.
PTP is a solution that helps achieve this.
PTP uses hardware timestamping to improve consistency. This means that timestamps are added to the time information at the hardware level, reducing the impact of software and driver delays.
PTP also uses transparent clocks, which are special devices that measure and compensate for the time the information spends passing through network equipment.
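Putting these pieces together, here is a sketch of how a PTP client might combine hardware timestamps with the residence times accumulated by transparent clocks (simplified; the field names follow the spirit of IEEE 1588, not any specific implementation):

```python
def ptp_offset(t1, t2, t3, t4, corr_sync=0.0, corr_delay_req=0.0):
    """Estimate the clock offset from one Sync / Delay_Req exchange.

    t1: server sends Sync          (server hardware timestamp)
    t2: client receives Sync       (client hardware timestamp)
    t3: client sends Delay_Req     (client hardware timestamp)
    t4: server receives Delay_Req  (server hardware timestamp)

    corr_sync / corr_delay_req: residence times that transparent clocks
    (switches) accumulated in each packet's correction field, so time
    spent queued inside network gear is subtracted out.
    """
    mean_path_delay = ((t2 - t1 - corr_sync) + (t4 - t3 - corr_delay_req)) / 2
    offset = (t2 - t1 - corr_sync) - mean_path_delay
    return offset, mean_path_delay
```

Hardware timestamping keeps t1 through t4 free of software jitter, and the correction fields remove switch queuing delays, so only the (assumed symmetric) wire delay remains.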
The Need for PTP
Let’s look at a practical case where PTP is needed.
Imagine you’re using Facebook and you post a new status update. When you try to view your post, there’s a chance that your request to see the post is handled by a different server than the one that originally processed your post.
If the server handling your view request doesn’t have the latest updated data, you might not see your post. This is annoying for users and goes against the promise that interacting with a distributed system like Facebook should work the same as interacting with a single server that has all the data.
In the old approach, Facebook sent your view request to multiple servers and waited for a majority of them to agree on the data before showing it to you. This consumed extra computing resources and added delay because of the back-and-forth communication over the network.
By using PTP to keep precise time synchronization across all its servers, Facebook can simply have the view request wait until the serving server has caught up to the timestamp of your original post. There is no need for multiple requests and responses.
The diagram below shows this scenario.
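Here is a sketch of that idea in code (the replica API is hypothetical, purely for illustration):

```python
import time

def consistent_read(replica, key, write_ts, error_bound):
    """Serve a read only after this replica has caught up to the write.

    Hypothetical API: `replica.caught_up_to()` returns the timestamp up to
    which all replicated writes have been applied on this server. Instead
    of querying a quorum of servers, we wait until that watermark passes
    the write's timestamp (padded by the known clock error bound) and
    then answer from local state.
    """
    while replica.caught_up_to() < write_ts + error_bound:
        time.sleep(0.0001)  # with tight sync, this wait is typically brief
    return replica.get(key)
```

The `error_bound` padding accounts for the worst-case difference between the server’s clock and true time.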
However, this only works if all the server clocks are closely synchronized. Also, the difference between a server’s clock and the reference time needs to be known.
PTP provides this tight synchronization. It can synchronize time about 100 times more precisely than NTP, which is what Facebook’s use cases required.
This was just one example. There were several additional use cases where PTP excelled, such as:
Event tracing
Cache invalidations
Privacy violation detection improvements
Latency compensation in the metaverse
The PTP Architecture
Facebook’s PTP architecture consists of three main components:
The PTP Rack
The PTP Network
The PTP Client
See the diagram below for a high-level view of the architecture:
Let’s look at each component and understand how they work together to provide precise timekeeping.
The PTP Rack
The PTP rack houses the hardware and software that serve time to clients.
It consists of critical components such as:
GNSS Antenna: This is where time originates on Earth. The antenna receives time signals from GPS, Galileo, and other satellite constellations. Facebook uses a GNSS-over-fiber technology to distribute the signal, which is more reliable and easier to install than traditional coaxial cables.
Time Appliance: This is the heart of the timing infrastructure. It disciplines the time received from the GNSS antenna using atomic clocks for improved accuracy and stability. Facebook has developed a new Time Appliance that can support up to 1 million clients without compromising accuracy.
Oscillatord: This is a software component that configures and monitors the Time Appliance, including the GNSS receiver and atomic clocks. It exports data that helps decide whether the Time Appliance should serve clients or be taken offline.
Network Card (NIC): It’s the interface between the Time Appliance and the network. It timestamps PTP packets using its clock, which is synchronized with the Time Appliance for nanosecond-level accuracy.
Ptp4u: This is Facebook’s custom-built PTP server software that can handle over 1 million clients per server, far more than existing solutions. It runs on the Time Appliance and sends PTP messages to the clients.
The PTP Network
The PTP network is responsible for distributing time from the PTP rack to clients. Facebook uses PTP over a standard IP network with a few key enhancements:
PTP Unicast: They use unicast PTP instead of multicast for simpler network configuration and better scalability. Clients request time from servers, and servers grant requests and send PTP messages.
Transparent Clocks: Each network switch between the client and server acts as a transparent clock. It measures the time each PTP packet spends in the switch and records it in the packet. This allows clients to accurately account for network delays.
Boundary Clock Avoidance: Facebook avoided using boundary clocks, which act as both clients and servers, to reduce complexity. They rely solely on transparent clocks in the network switches.
A typical PTP unicast flow consists of the following steps:
Client Initiates Negotiation: The PTP client starts the process by requesting unicast transmission from the PTP server. It sends three types of requests:
Sync Grant Request: The client asks the server to send a specified number of Sync and Follow-Up messages per second, containing the current time, for a certain duration. This helps the client adjust its clock to match the server’s clock.
Announce Grant Request: The client requests the server to send a specific number of Announce messages per second, containing the server’s status, for a certain duration. This helps the client make sure that the server hasn’t stopped or gone haywire.
Delay Response Grant Request: The client informs the server that it will send Delay Request messages, and asks the server to respond with Delay Response packets for a specified duration. This helps the client account for any delay in communication.
Server Grants Requests: The PTP server needs to grant these requests and send corresponding grant responses to the client.
Server Sends PTP Messages: Once the requests are granted, the server starts sending the requested PTP messages.
Client Sends Delay Requests: The client sends Delay Request messages to the server at the agreed-upon interval to determine the network path delay.
Client Refreshes Grant: The client needs to periodically refresh the grant by repeating the negotiation process before the current grant expires.
The diagram below shows the PTP exchange process.
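A condensed sketch of the client side of this flow (message names follow IEEE 1588 unicast negotiation; the transport object and its methods are assumptions for illustration):

```python
import time

class UnicastPtpClient:
    """Hypothetical sketch of the PTP unicast negotiation loop."""

    def __init__(self, transport, grant_duration=300):
        self.transport = transport            # stands in for a UDP socket speaking PTP
        self.grant_duration = grant_duration  # seconds each grant remains valid

    def negotiate(self):
        # Step 1: ask the server to grant the three unicast message streams.
        for msg_type in ("Sync", "Announce", "Delay_Resp"):
            self.transport.request_grant(msg_type, rate=1, duration=self.grant_duration)

    def run(self):
        while True:
            self.negotiate()
            expires = time.monotonic() + self.grant_duration
            # Steps 2-4: consume Sync/Announce, send Delay Requests.
            while time.monotonic() < expires - 30:  # refresh 30 s before expiry
                self.transport.send_delay_request()     # measure the path delay
                sync, announce = self.transport.poll()  # server time + server status
            # Step 5: loop back and re-negotiate before the grant expires.
```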
The PTP Client
The PTP client software runs on each server that needs accurate time. Facebook uses a few different components:
ptp4l: An open-source PTP client that receives PTP messages from the server and disciplines the NIC hardware clock. Facebook has made several enhancements to ptp4l to handle its scale and unique requirements.
fbclock: This is Facebook’s custom API that provides PTP time to applications. Instead of a single timestamp, it gives a “window of uncertainty” - a time range that is guaranteed to contain the true time with a high degree of certainty.
Kernel Timestamping: The Linux kernel on each server timestamps incoming and outgoing PTP packets in hardware for maximum accuracy. This relies on NIC driver support and careful configuration.
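To see why a “window of uncertainty” is more useful than a single timestamp, consider ordering two events (the types and function below are illustrative, not fbclock’s actual interface):

```python
from dataclasses import dataclass

@dataclass
class TimeWindow:
    """A window of uncertainty: the true time is guaranteed to lie
    between `earliest` and `latest`."""
    earliest: float
    latest: float

def happened_before(a: TimeWindow, b: TimeWindow) -> bool:
    """Event a definitely happened before event b only if a's window
    closes before b's opens; overlapping windows are inconclusive."""
    return a.latest < b.earliest

# With microsecond-wide windows, events a few microseconds apart
# can still be ordered with certainty.
a = TimeWindow(earliest=10.000001, latest=10.000003)
b = TimeWindow(earliest=10.000005, latest=10.000007)
print(happened_before(a, b))  # True
```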
Conclusion
To conclude, Facebook’s adoption of Precision Time Protocol (PTP) across its infrastructure is a significant step forward in ensuring precise and reliable timekeeping at an unprecedented scale. By redesigning and rebuilding various components, Facebook has pushed the boundaries of what’s possible with PTP.
Also, since most of this work is open source, we can learn directly from Facebook’s PTP implementation.
References: