Latency Numbers Everyone Should Know

Latency

In a computer network, latency is defined as the amount of time it takes for a packet of data to get from one designated point to another.

In more general terms, it is the amount of time between the cause and the observation of the effect.

As you would expect, latency matters, and it matters a lot. As programmers, we all know that reading from disk takes longer than reading from memory, and that the L1 cache is faster than the L2 cache.
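You can get a rough feel for that gap on your own machine. Below is a minimal Python sketch (not a rigorous benchmark) that times a 1 MB sequential read from a scratch file against a 1 MB copy through memory. The file name is just an illustrative placeholder, and note that the OS page cache can mask the true disk cost unless the file is cold.

```python
import os
import time

SIZE = 1024 * 1024                 # 1 MB
PATH = "latency_scratch.bin"       # hypothetical scratch file

# Write 1 MB of random bytes to disk.
with open(PATH, "wb") as f:
    f.write(os.urandom(SIZE))

# Time a sequential read of the file from disk.
# Caveat: the OS page cache may serve a warm file straight from RAM.
start = time.perf_counter_ns()
with open(PATH, "rb") as f:
    data = f.read()
disk_ns = time.perf_counter_ns() - start

# Time a sequential 1 MB copy entirely in memory.
start = time.perf_counter_ns()
buf = bytearray(data)              # copies all 1 MB through RAM
mem_ns = time.perf_counter_ns() - start

print(f"1 MB from disk  : {disk_ns:>12,} ns")
print(f"1 MB from memory: {mem_ns:>12,} ns")
os.remove(PATH)
```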

But do you know the orders of magnitude by which these operations are faster or slower relative to one another?

Latency for common operations

Jeff Dean from Google studied exactly that and came up with figures for latency in various situations.

With improving hardware, latencies at the higher end of the spectrum are shrinking, but not enough to ignore them completely! For instance, reading 1 MB sequentially from disk might have taken 20,000,000 ns a decade ago, and with the advent of SSDs may take around 1,000,000 ns today. But even that is still four times slower than the 250,000 ns it takes to read the same 1 MB directly from memory.

The table below presents the latency of the most common operations on commodity hardware. The figures are only approximations and will vary with the hardware and the execution environment of your code. However, they serve their primary purpose: to enable us to make informed technical decisions to reduce latency.

To better grasp the multi-fold increases in latency, scaled figures in relation to the L1 cache are also provided, assuming that an L1 cache reference takes 1 second instead of 0.5 ns.
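The scaling itself is just multiplication by a constant factor: stretching 0.5 ns to 1 second means multiplying every figure by two billion. Here is a small Python sketch of that conversion (the operation list is abbreviated):

```python
# Factor that stretches 0.5 ns to 1 s.
SCALE = 2_000_000_000  # 1 s / 0.5 ns

latencies_ns = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from disk": 20_000_000,
}

def humanize(seconds: float) -> str:
    """Render a duration as days/hours/minutes/seconds."""
    days, rem = divmod(int(seconds), 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    return " ".join(f"{v}{u}" for v, u in parts if v) or "0s"

for op, ns in latencies_ns.items():
    scaled_seconds = ns * 1e-9 * SCALE
    print(f"{op:35s} {humanize(scaled_seconds)}")
```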

| Operation | Note | Latency | Scaled Latency |
|---|---|---|---|
| L1 cache reference | Level-1 cache, usually built onto the microprocessor chip itself. | 0.5 ns | 1 s (the chosen baseline) |
| Branch mispredict | During execution, the CPU predicts the next instructions to run. A branch misprediction is when it guesses wrong: the speculatively executed instructions must be discarded and the pipeline refilled along the correct path. | 5 ns | 10 s |
| L2 cache reference | Level-2 cache, larger and slower than L1; historically on a separate chip, on the same die in modern CPUs. | 7 ns | 14 s |
| Mutex lock/unlock | A simple synchronization primitive used to ensure exclusive access to a resource shared between threads. | 25 ns | 50 s |
| Main memory reference | Time to reference main memory, i.e. RAM. | 100 ns | 3m 20s |
| Compress 1K bytes with Snappy | Snappy is a fast compression/decompression library written in C++ by Google and used in projects such as BigTable and MapReduce, as well as other open source projects. | 3,000 ns | 1h 40m |
| Send 1K bytes over 1 Gbps network | | 10,000 ns | 5h 33m 20s |
| Read 1 MB sequentially from memory | Read from RAM. | 250,000 ns | 5d 18h 53m 20s |
| Round trip within same datacenter | A network round trip inside a datacenter is far faster than one that crosses external routers. | 500,000 ns | 11d 13h 46m 40s |
| Read 1 MB sequentially from SSD | Assumes an SSD, which offers random access times of around 100,000 ns or less. | 1,000,000 ns | 23d 3h 33m 20s |
| Disk seek | Moving the disk head to the track and sector where the required data resides. | 10,000,000 ns | 231d 11h 33m 20s |
| Read 1 MB sequentially from disk | Assumes a regular spinning disk, not an SSD. Note the difference compared to the SSD figure! | 20,000,000 ns | 462d 23h 6m 40s |
| Send packet CA -> Netherlands -> CA | Round trip for a packet from the U.S. (California) to Europe (the Netherlands) and back. | 150,000,000 ns | 3472d 5h 20m |
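Some of these rows are easy to sanity-check yourself. For example, here is a quick Python sketch that times an uncontended mutex lock/unlock using threading.Lock. Expect a figure well above the table's ~25 ns, since the interpreter adds its own overhead on top of the raw primitive, but the order of magnitude is instructive.

```python
import threading
import time

lock = threading.Lock()
N = 1_000_000

# Acquire and release an uncontended lock N times.
start = time.perf_counter_ns()
for _ in range(N):
    lock.acquire()
    lock.release()
elapsed = time.perf_counter_ns() - start

print(f"~{elapsed / N:.0f} ns per lock/unlock pair (including loop overhead)")
```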

References:

  1. Jeff Dean, Designs, Lessons and Advice from Building Large Distributed Systems
  2. Peter Norvig, Teach Yourself Programming in Ten Years