Figure 1. Simplified example of TCP congestion control, starting with a send rate of 10 packets.

What Does It All Mean?
QUIC is still bound by the laws of physics and the need to be nice to other senders on the Internet. This means that it will not magically download your website resources much more quickly than TCP. However, QUIC’s flexibility means that experimenting with new congestion-control algorithms will become easier, which should improve things in the future for both TCP and QUIC.

0-RTT Connection Set-Up
A second performance aspect is about how many round trips it takes before you can send useful HTTP data (say, page resources) on a new connection. Some claim that QUIC is two to even three round trips faster than TCP + TLS, but we’ll see that it’s really only one.

Did You Know?

As we’ve said in part 1, a connection typically performs one (TCP) or two (TCP + TLS) handshakes before HTTP requests and responses can be exchanged. These handshakes exchange initial parameters that both client and server need to know in order to, for example, encrypt the data.

As you can see in figure 2 below, each individual handshake takes at least one round trip to complete (TCP + TLS 1.3, (b)) and sometimes two (TLS 1.2 and prior (a)). This is inefficient, because we need at least two round trips of handshake waiting time (overhead) before we can send our first HTTP request, which means waiting at least three round trips for the first HTTP response data (the returning red arrow) to come in. On slow networks, this can mean an overhead of 100 to 200 milliseconds.

Figure 2: TCP + TLS versus QUIC connection set-up.
You might be wondering why the TCP + TLS handshake cannot simply be combined, done in the same round trip. While this is conceptually possible (QUIC does exactly that), things were initially not designed like this, because we need to be able to
use TCP with and without TLS on top. Put differently, TCP simply does not support sending non-TCP stuff during the handshake. There have been efforts to add this with the TCP Fast Open extension; however, as discussed in part 1, this has turned out to be difficult to deploy at scale.
Luckily, QUIC was designed with TLS in mind from the start, and as such does combine both the transport and cryptographic handshakes in a single mechanism. This means that the QUIC handshake will take only one round trip in total to complete, which is one round trip less than TCP + TLS 1.3 (see figure 2c above).
You might be confused, because you’ve probably read that QUIC is two or even three round trips faster than TCP, not just one. This is because most articles only consider the worst case (TCP + TLS 1.2, (a)), not mentioning that the modern TCP + TLS 1.3 also “only” takes two round trips ((b) is rarely shown). While a speed boost of one round trip is nice, it’s hardly amazing. Especially on fast networks (say, less than a 50-millisecond RTT), this will be barely noticeable, although slow networks and connections to distant servers would profit a bit more.
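The round-trip arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than a measurement: the handshake counts come from the set-ups in figure 2, the names `HANDSHAKE_ROUND_TRIPS` and `time_to_first_response_ms` are made up for this example, and the model ignores server processing time and packet loss.

```python
# Round trips spent on handshakes before the first HTTP request can be sent.
# Counts follow the set-ups (a), (b) and (c) described for figure 2.
HANDSHAKE_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,  # one for TCP, two for TLS 1.2
    "TCP + TLS 1.3": 1 + 1,  # one for TCP, one for TLS 1.3
    "QUIC": 1,               # combined transport and TLS 1.3 handshake
}


def time_to_first_response_ms(setup: str, rtt_ms: float) -> float:
    """Handshake round trips plus one more for the HTTP request and response."""
    return (HANDSHAKE_ROUND_TRIPS[setup] + 1) * rtt_ms


if __name__ == "__main__":
    for setup in HANDSHAKE_ROUND_TRIPS:
        print(setup, time_to_first_response_ms(setup, rtt_ms=50), "ms")
```

With a 50-millisecond RTT, this simple model gives 150 milliseconds for TCP + TLS 1.3 versus 100 milliseconds for QUIC, which is the single round trip of difference discussed above.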
Next, you might be wondering why we need to wait for the handshake(s) at all. Why can’t we send an HTTP request in the first round trip? This is mainly because, if we did, then that first request would be sent
unencrypted, readable by any eavesdropper on the wire, which is obviously not great for privacy and security. As such, we need to wait for the cryptographic handshake to complete before sending the first HTTP request. Or do we?
This is where a clever trick is used in practice. We know that users often revisit web pages within a short time of their first visit. As such, we can use the
initial encrypted connection to bootstrap a second connection in the future. Simply put, sometime during its lifetime, the first connection is used to safely communicate new cryptographic parameters between the client and server. These parameters can then be used to encrypt the second connection from the very start, without having to wait for the full TLS handshake to complete. This approach is called “session resumption”.
It allows for a powerful optimization: We can now safely send our first HTTP request along with the QUIC/TLS handshake,
saving another round trip! As for TLS 1.3, this effectively removes the TLS handshake’s waiting time. This method is often called 0-RTT (although, of course, it still takes one round trip for the HTTP response data to start arriving).
Both session resumption and 0-RTT are, again, things that I’ve often seen wrongly explained as being QUIC-specific features. In reality, these are actually
TLS features that were already present in some form in TLS 1.2 and are now fully fledged in TLS 1.3.
Put differently, as you can see in figure 3 below, we can get the performance benefits of these features over TCP (and thus also HTTP/2 and even HTTP/1.1) as well! We see that even with 0-RTT, QUIC is still only one round trip faster than an optimally functioning TCP + TLS 1.3 stack. The claim that QUIC is three round trips faster comes from comparing figure 2’s (a) with figure 3’s (f), which, as we’ve seen, is not really fair.

Figure 3: TCP + TLS versus QUIC 0-RTT connection set-up.
The worst part is that when using 0-RTT, QUIC can’t even really use that gained round trip all that well due to security. To understand this, we need to understand one of the reasons why the TCP handshake exists. First, it allows the client to be sure that the server is actually available at the given IP address before sending it any higher-layer data.
Secondly, and crucially here, it allows the server to make sure that the client opening the connection is actually who and where they say they are before sending it data. If you recall how we defined a connection with the 4-tuple in
part 1, you’ll know that the client is mainly identified by its IP address. And this is the problem: IP addresses can be spoofed!
Suppose that an attacker requests a very large file via HTTP over QUIC 0-RTT. However, they spoof their IP address, making it look like the 0-RTT request came from their victim’s computer. This is shown in figure 4 below. The QUIC server has no way of detecting whether the IP was spoofed, because this is the very first packet(s) it is seeing from that client.
Figure 4: Attackers can spoof their IP address when sending a 0-RTT request to a QUIC server, triggering an amplification attack on the victim.
If the server then simply starts sending the large file back to the spoofed IP, it could end up
overloading the victim’s network bandwidth (especially if the attacker were to do many of these fake requests in parallel). Note that the QUIC response would be dropped by the victim, because it doesn’t expect incoming data, but that doesn’t matter: Their network still needs to process the packets!
This is called a reflection, or amplification, attack, and it’s a significant way that hackers execute distributed denial-of-service (DDoS) attacks. Note that this doesn’t happen when 0-RTT over TCP + TLS is being used, precisely because the TCP handshake needs to complete first before the 0-RTT request is even sent along with the TLS handshake.
QUIC has to be conservative in replying to 0-RTT requests, limiting how much data it sends in response until the client has been verified to be a real client and not a victim. For QUIC, this data amount has been set to three times the amount received from the client.
Put differently, QUIC has a maximum “amplification factor” of three, which was determined to be an acceptable trade-off between performance usefulness and security risk (especially compared to some incidents that had an
amplification factor of over 51,000 times). Because the client typically first sends just one to two packets, the QUIC server’s 0-RTT reply will be capped at just 4 to 6 KB (including other QUIC and TLS overhead!), which is somewhat less than impressive.
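The three-times rule described above can be sketched as a small byte counter. This is a simplified illustration and not code from any QUIC implementation: the class name `AmplificationLimiter`, its methods and the 1,200-byte packet size in the usage example are all assumptions made for this sketch, and the exact cap in practice depends on the packet sizes involved.

```python
class AmplificationLimiter:
    """Caps bytes sent to a client whose address has not been verified yet."""

    def __init__(self, factor: int = 3) -> None:
        self.factor = factor          # maximum amplification factor
        self.bytes_received = 0       # bytes seen from the (possibly spoofed) address
        self.bytes_sent = 0           # bytes already sent back to that address
        self.address_validated = False

    def on_receive(self, size: int) -> None:
        self.bytes_received += size

    def mark_validated(self) -> None:
        # Called once the client has proven it really owns its address.
        self.address_validated = True

    def budget(self) -> float:
        """Bytes that may still be sent right now."""
        if self.address_validated:
            return float("inf")
        return self.factor * self.bytes_received - self.bytes_sent

    def try_send(self, size: int) -> bool:
        """Record the send and return True if it fits in the budget."""
        if size > self.budget():
            return False
        self.bytes_sent += size
        return True


if __name__ == "__main__":
    limiter = AmplificationLimiter()
    limiter.on_receive(1200)  # one client packet, assumed to be 1,200 bytes
    print([limiter.try_send(1200) for _ in range(4)])
    limiter.mark_validated()
    print(limiter.try_send(1200))
```

In this sketch, the server may send three packets of the same size in reply to one client packet; the fourth has to wait until the address is validated, after which only the congestion controller limits the send rate.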
In addition, other security problems can lead to, for example, “replay attacks”, which limit the type of HTTP request you can do. For example, Cloudflare only allows
HTTP GET requests without query parameters in 0-RTT. These limit the usefulness of 0-RTT even more.
Luckily, QUIC has options to make this a bit better. For example, the server can check whether the 0-RTT comes from an IP that it has had a valid connection with before. However, that only works if the client stays on the same network (somewhat limiting QUIC’s connection migration feature). And even if it works, QUIC’s response is still limited by the congestion controller’s slow-start logic that we discussed above; so, there is no extra massive speed boost besides the one round trip saved.

Did You Know?

It’s interesting to note that QUIC’s three-times amplification limit also counts for its normal non-0-RTT handshake process in figure 2c. This can be a problem if, for example, the server’s TLS certificate is too large to fit inside 4 to 6 KB. In that case, it would have to be split, with the second chunk having to wait for the second round trip to be sent (after acknowledgements of the first few packets come in, indicating that the client’s IP was not spoofed). In this case, QUIC’s handshake might still end up taking two round trips, equal to TCP + TLS! This is why for QUIC, techniques such as certificate compression will be extra important.

Did You Know?

It could be that certain advanced set-ups are able to mitigate these problems enough to make 0-RTT more useful. For example, the server could remember how much bandwidth a client had available the last time it was seen, making it less limited by the congestion control’s slow start for reconnecting (non-spoofed) clients. This has been investigated in academia, and there’s even a proposed extension in QUIC to do this. Several companies already do this type of thing to speed up TCP as well.

Another option would be to have clients send more than one or two packets (for example, sending 7 more packets with padding), so the three-times limit translates to a more interesting 12- to 14-KB response, even after connection migration. I’ve written about this in one of my papers.

Finally, (misbehaving) QUIC servers could also intentionally increase the three-times limit if they feel it’s somehow safe to do so or if they don’t care about the potential security issues (after all, there’s no protocol police preventing this).

What does it all mean?
QUIC’s faster connection set-up with 0-RTT is really more of a micro-optimization than a revolutionary new feature. Compared to a state-of-the-art TCP + TLS 1.3 set-up, it would save a maximum of one round trip. The amount of data that can actually be sent in the first round trip is additionally limited by a number of security considerations.
As such, this feature will mostly shine either if your users are on networks with
very high latency (say, satellite networks with more than 200-millisecond RTTs) or if you typically don’t send much data. Some examples of the latter are heavily cached websites, as well as single-page apps that periodically fetch small updates via APIs and other protocols such as DNS-over-QUIC. One of the reasons Google saw very good 0-RTT results for QUIC was that it tested it on its already heavily optimized search page, where query responses are quite small.
In other cases, you’ll gain only a few dozen milliseconds at best, even less if you’re already using a CDN (which you should be doing if you care about performance!).

Connection Migration
A third performance feature makes QUIC faster when transferring between networks, by
keeping existing connections intact. While this indeed works, this type of network change doesn’t happen all that often and connections still need to reset their send rates.
As discussed in
part 1, QUIC’s connection IDs (CIDs) allow it to perform connection migration when switching networks. We illustrated this with a client moving from a Wi-Fi network to 4G while doing a large file download. On TCP, that download might have to be aborted, while for QUIC it might continue.
First, however, consider how often that type of scenario actually happens. You might think this also occurs when moving between Wi-Fi access points within a building or between cellular towers while on the road. In those set-ups, however (if they’re done correctly), your device will typically keep its IP intact, because the transition between wireless base stations is done at a lower protocol layer. As such, it occurs only when you
move between completely different networks, which I’d say doesn’t happen all that often.
Secondly, we can ask whether this also works for other use cases besides large file downloads and live video conferencing and streaming. If you’re loading a web page at the exact moment of switching networks, you might have to re-request some of the (later) resources indeed.
However, loading a page typically takes in the order of seconds, so that coinciding with a network switch is also not going to be very common. Additionally, for use cases where this is a pressing concern,
other mitigations are typically already in place. For example, servers offering large file downloads can support HTTP range requests to allow resumable downloads.
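As a rough illustration of the resumable-download mitigation, a client that already holds the first part of a file can ask for only the remainder with an HTTP `Range` header. The helper name `resume_request_headers` is hypothetical; the header syntax itself (`bytes=<offset>-`) is standard HTTP, and a server that supports range requests answers with a 206 Partial Content response.

```python
from __future__ import annotations


def resume_request_headers(bytes_already_downloaded: int) -> dict[str, str]:
    """Build the extra request headers needed to resume an interrupted download."""
    if bytes_already_downloaded < 0:
        raise ValueError("byte offset cannot be negative")
    if bytes_already_downloaded == 0:
        return {}  # nothing downloaded yet, so request the whole resource
    # An open-ended range: everything from this offset to the end of the file.
    return {"Range": f"bytes={bytes_already_downloaded}-"}


if __name__ == "__main__":
    # Resume after the first 1,000,000 bytes arrived over the old network.
    print(resume_request_headers(1_000_000))
```

This works over TCP just as well as over QUIC, which is why connection migration is less of a necessity for this particular use case.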
Because there is typically some
overlap time between network 1 dropping off and network 2 becoming available, video apps can open multiple connections (1 per network), syncing them before the old network goes away completely. The user will still notice the switch, but it won’t drop the video feed entirely.
Thirdly, there is no guarantee that the new network will have as much bandwidth available as the old one. As such, even though the conceptual connection is kept intact, the QUIC server cannot just keep sending data at high speeds. Instead, to avoid overloading the new network, it needs to
reset (or at least lower) the send rate and start again in the congestion controller’s slow-start phase.
Because this initial send rate is typically too low to really support things such as video streaming, you will see some quality loss or hiccups, even on QUIC. In a way, connection migration is more about preventing connection context churn and overhead on the server than about improving performance.

Did You Know?

Note that, as discussed for 0-RTT above, we can devise some advanced techniques to improve connection migration. For example, we can, again, try to remember how much bandwidth was available on a given network last time and attempt to ramp up faster to that level for a new migration. Additionally, we could envision not simply switching between networks, but using both at the same time. This concept is called multipath, and we discuss it in more detail below.
So far, we have mainly talked about active connection migration, where users move between different networks. There are, however, also cases of
passive connection migration, where a certain network itself changes parameters. A good example of this is network address translation (NAT) rebinding. While a full discussion of NAT is out of the scope of this article, it mainly means that the connection’s port numbers can change at any given time, without warning. This also happens much more often for UDP than TCP in most routers.
If this occurs, the QUIC CID will not change, and most implementations will assume that the user is still on the same physical network and will thus not reset the congestion window or other parameters. QUIC also includes some features such as
PINGs and timeout indicators to prevent this from happening, because this typically occurs for long-idle connections.
We discussed in part 1 that QUIC doesn’t just use a single CID for security reasons. Instead, it changes CIDs when performing active migration. In practice, it’s even more complicated, because both client and server have separate lists of CIDs (called source and destination CIDs in the QUIC RFC). This is illustrated in figure 5 below.

Figure 5: QUIC uses separate client and server CIDs.

Stream multiplexing differences can have a large impact on website loading in different browsers.
Luckily, we can explain the basics relatively easily. As you may know, some resources need to be downloaded in full in order to be used (although they can often be incrementally parsed and compiled). As such, these resources need to be loaded as soon as possible, with the highest priority. Let’s contemplate what would happen if A, B, and C were all render-blocking resources.

Figure 6: The stream multiplexing approach affects (render-blocking) resource completion time.
If we use a
round-robin multiplexer (the top row in figure 6), we would actually delay each resource’s total completion time, because they all need to share bandwidth with the others. Since we can only use them after they are fully loaded, this incurs a significant delay. However, if we multiplex them sequentially (the bottom row in figure 6), we would see that A and B complete much earlier (and can be used by the browser), while not actually delaying C’s completion time.
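The difference between the two rows can be made concrete with a small simulation. This is a sketch under stated assumptions: each of the resources A, B and C is taken to be exactly 4 packets long, one packet is sent per time slot, and the function names `round_robin`, `sequential` and `completion_slots` are made up for this example.

```python
from __future__ import annotations


def round_robin(sizes: dict[str, int]) -> list[str]:
    """Send one packet of each unfinished resource in turn."""
    remaining = dict(sizes)
    schedule: list[str] = []
    while any(remaining.values()):
        for name in remaining:
            if remaining[name] > 0:
                schedule.append(name)
                remaining[name] -= 1
    return schedule


def sequential(sizes: dict[str, int]) -> list[str]:
    """Send each resource in full before starting the next one."""
    return [name for name, count in sizes.items() for _ in range(count)]


def completion_slots(schedule: list[str]) -> dict[str, int]:
    """Map each resource to the slot in which its last packet is sent."""
    done: dict[str, int] = {}
    for slot, name in enumerate(schedule, start=1):
        done[name] = slot  # later packets overwrite earlier ones
    return done


if __name__ == "__main__":
    sizes = {"A": 4, "B": 4, "C": 4}
    print("round-robin:", completion_slots(round_robin(sizes)))
    print("sequential: ", completion_slots(sequential(sizes)))
```

Under these assumptions, round-robin finishes A, B and C in slots 10, 11 and 12, while sequential finishes them in slots 4, 8 and 12: A and B become usable much earlier, and C is not delayed at all.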
However, that doesn’t mean that sequential multiplexing is always the best, because some (mostly non-render-blocking) resources (such as HTML and progressive JPEGs) can actually be
processed and used incrementally. In those (and some other) cases, it makes sense to use the first option (or at least something in between).
For most web-page resources, it turns out that sequential multiplexing performs best. This is, for example, what Google Chrome is doing in the video above, while Internet Explorer is using the worst-case round-robin multiplexer.

Packet Loss Resilience
Now that we know that all streams aren’t always active at the same time and that they can be multiplexed in different ways, we can consider what happens if we have packet loss. As explained in
part 1, if one QUIC stream experiences packet loss, then other active streams can still be used (whereas, in TCP, all would be paused).
However, as we’ve just seen, having many concurrent active streams is typically not optimal for web performance, because it can delay some critical (render-blocking) resources, even without packet loss! We’d rather have just one or two active at the same time, using a sequential multiplexer. However, this reduces the impact of QUIC’s HoL blocking removal.
Imagine, for example, that the sender could transmit 12 packets at a given time (see figure 7 below; remember that this is limited by the congestion controller). If we fill all 12 of those packets with data for stream A (because it’s high priority and render-blocking, think main.js), then we would have only one active stream in that 12-packet window.
If one of those packets were to be lost, then QUIC would still end up fully HoL blocked because there would simply be no other streams it could process besides A: All of the data is for A, and so everything would still have to wait (we don’t have B or C data to process), similar to TCP.

Figure 7: Packet loss impact depends on the multiplexer used. (Note that we assume each stream has more data to send than in the previous similar images.)
We see that we have a kind of contradiction: Sequential multiplexing (
AAAABBBBCCCC) is typically better for web performance, but it doesn’t allow us to take much advantage of QUIC’s HoL blocking removal. Round-robin multiplexing (
ABCABCABCABC) would be better against HoL blocking, but worse for web performance. As such,
one best practice or optimization can end up undoing another.
And it gets worse. Up until now, we’ve sort of assumed that individual packets get lost one at a time. However, this isn’t always true, because packet loss on the Internet is
often “bursty”, meaning that multiple packets often get lost at the same time.
As discussed above, an important reason for packet loss is that a network is overloaded with too much data, having to drop excess packets. This is why the congestion controller starts sending slowly. However, it then keeps growing its send rate until… there is packet loss!

Put differently, the mechanism that’s intended to prevent overloading the network actually overloads the network (albeit in a controlled fashion). On most networks, that occurs after quite a while, when the send rate has increased to hundreds of packets per round trip. When those reach the limit of the network, several of them are typically dropped together, leading to the bursty loss patterns.

Did You Know?

This is one of the reasons why we wanted to move to using a single (TCP) connection with HTTP/2, rather than the 6 to 30 connections with HTTP/1.1. Because each individual connection ramps up its send rate in pretty much the same way, HTTP/1.1 could get a good speed-up at the start, but the connections could actually start causing massive packet loss for each other as they caused the network to become overloaded. At the time, Chromium developers speculated that this behaviour caused most of the packet loss seen on the Internet. This is also one of the reasons why BBR has become an often used congestion-control algorithm, because it uses fluctuations in observed RTTs, rather than packet loss, to assess available bandwidth.

Did You Know?

Other causes of packet loss can lead to fewer or individual packets becoming lost (or unusable), especially on wireless networks. There, however, the losses are often detected at lower protocol layers and solved between two local entities (say, the smartphone and the 4G cellular tower), rather than by retransmissions between the client and the server. These usually don’t lead to real end-to-end packet loss, but rather show up as variations in packet latency (or “jitter”) and reordered packet arrivals.
So, let’s say we are using a per-packet round-robin multiplexer (ABCABCABCABCABCABCABCABC…) to get the most out of HoL blocking removal, and we get a bursty loss of just 4 packets. We see that this will always impact all 3 streams (see figure 8, middle row)! In this case, QUIC’s HoL blocking removal provides no benefits, because all streams have to wait for their own retransmissions.

Figure 8: Depending on the multiplexer used and the packet loss pattern, more or fewer streams are affected.
To lower the risk of multiple streams being affected by a lossy burst, we need to concatenate more data for each stream. For example,
AABBCCAABBCCAABBCCAABBCC… is a small improvement, and
AAAABBBBCCCCAAAABBBBCCCC… (see bottom row in figure 8 above) is even better. You can again see that a more sequential approach is better, even though that reduces the chances that we have multiple concurrent active streams.
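The effect of these three schedules can be checked with a short script. It is a toy model under stated assumptions: a loss burst is a run of consecutive packets in the schedule, every possible starting position is tried, and the function names `streams_hit` and `hit_range` are invented for this example.

```python
from __future__ import annotations


def streams_hit(schedule: str, start: int, burst_length: int) -> set[str]:
    """Streams that lose at least one packet in a burst starting at `start`."""
    return set(schedule[start:start + burst_length])


def hit_range(schedule: str, burst_length: int) -> tuple[int, int]:
    """Fewest and most streams affected over all possible burst positions."""
    counts = [
        len(streams_hit(schedule, start, burst_length))
        for start in range(len(schedule) - burst_length + 1)
    ]
    return min(counts), max(counts)


if __name__ == "__main__":
    for pattern in ("ABC" * 8, "AABBCC" * 4, "AAAABBBBCCCC" * 2):
        best, worst = hit_range(pattern, burst_length=4)
        print(f"{pattern}: between {best} and {worst} streams affected")
```

For a 4-packet burst, the per-packet round-robin schedule always affects all 3 streams, the two-packet grouping affects 2 or 3, and the four-packet grouping affects at most 2 and sometimes only 1.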
In the end, predicting the actual impact of QUIC’s HoL blocking removal is difficult, because it depends on the number of streams, the size and frequency of the loss bursts, how the stream data is actually used, etc. However,
most results at this time indicate it will not help much for the use case of web-page loading, because there we typically want fewer concurrent streams.
If you want even more detail on this topic or just some concrete examples, please check out my in-depth article on HTTP HoL blocking.

Did You Know?

As with the previous sections, some advanced techniques can help us here. For example, modern congestion controllers use packet pacing. This means that they don’t send, for example, 100 packets in a single burst, but rather spread them out over an entire RTT. This conceptually lowers the chances of overloading the network, and the QUIC Recovery RFC strongly recommends using it. Complementarily, some congestion-control algorithms such as BBR don’t keep increasing their send rate until they cause packet loss, but rather back off before that (by looking at, for example, RTT fluctuations, because RTTs also rise when a network is becoming overloaded). While these approaches lower the overall chances of packet loss, they don’t necessarily lower its burstiness.

What does it all mean?
While QUIC’s HoL blocking removal means, in theory, that it (and HTTP/3) should perform better on lossy networks, in practice this depends on a lot of factors. Because the use case of web-page loading typically favours a more sequential multiplexing set-up, and because packet loss is unpredictable, this feature would, again,
likely affect mainly the slowest 1% of users. However, this is still a very active area of research, and only time will tell.
Still, there are situations that might see more improvements. These are mostly outside of the typical use case of the first full page load — for example, when resources are not render blocking, when they can be processed incrementally, when streams are completely independent, or when less data is sent at the same time.
Examples include repeat visits on well-cached pages and background downloads and API calls in single-page apps. For example, Facebook has seen some benefits from HoL blocking removal when using HTTP/3 to load data in its native app.

UDP and TLS Performance
A fifth performance aspect of QUIC and HTTP/3 is about how efficiently and performantly they can actually
create and send packets on the network. We will see that QUIC’s usage of UDP and heavy encryption can make it a fair bit slower than TCP (but things are improving).
We’ve already discussed that QUIC’s usage of UDP was more about flexibility and deployability than about performance. This is evidenced even more by the fact that, up until recently, sending QUIC packets over UDP was typically much slower than sending TCP packets. This is partly because of where and how these protocols are typically implemented (see figure 9 below).

Figure 9: Implementation differences between TCP and QUIC.

[…] to stream its videos over TCP + TLS.
Similarly, Facebook has said that QUIC will probably mainly be used between end users and the CDN’s edge, but not between data centers or between edge nodes and origin servers, due to its larger overhead. In general, very high-bandwidth scenarios will probably continue to favour TCP + TLS, especially in the next few years.

Did You Know?

Optimizing network stacks is a deep and technical rabbit hole of which the above merely scratches the surface (and misses a lot of nuance). If you’re brave enough or if you want to know what terms like SO_TXTIME, kernel bypass, and recvmmsg() mean, I can recommend some excellent articles and an in-depth talk on optimizing QUIC.
What does it all mean?
QUIC’s particular usage of the UDP and TLS protocols has historically made it much slower than TCP + TLS. However, over time, several improvements have been made (and will continue to be implemented) that have closed the gap somewhat. You probably won’t notice these discrepancies in typical use cases of web-page loading, but they might give you headaches if you maintain large server farms.
Up until now, we’ve mainly talked about new performance features in QUIC versus TCP. However, what about HTTP/3 versus HTTP/2? As discussed in […] Using this system directly over QUIC would lead to some potentially very wrong tree layouts, because adding each resource to the tree would be a separate control message.
Additionally, this approach turned out to be needlessly complex, leading to […]
QUIC version 1 is just the start. Many advanced performance-oriented features that Google had earlier experimented with did not make it into this first iteration. However, the goal is to quickly evolve the protocol, introducing new extensions and features at a high frequency. As such, over time, QUIC (and HTTP/3) should become clearly faster and more flexible than TCP (and HTTP/2).

Conclusion
In this second part of the series, we have discussed the many different
performance features and aspects of HTTP/3 and especially QUIC. We have seen that while most of these features seem very impactful, in practice they might not do all that much for the average user in the use case of web-page loading that we’ve been considering.
For example, we’ve seen that QUIC’s use of UDP doesn’t mean that it can suddenly use more bandwidth than TCP, nor does it mean that it can download your resources more quickly. The often-lauded 0-RTT feature is really a micro-optimization that saves you one round trip, in which you can send about 5 KB (in the worst case).
HoL blocking removal doesn’t work well if there is
bursty packet loss or when you’re loading render-blocking resources. Connection migration is highly situational, and HTTP/3 doesn’t have any major new features that could make it faster than HTTP/2.
As such, you might expect me to recommend that you just skip HTTP/3 and QUIC. Why bother, right? However, I will most definitely do no such thing! Even though these new protocols might not aid users on fast (urban) networks much, the new features do certainly have the potential to be
highly impactful to highly mobile users and people on slow networks.
Even in Western markets such as my own Belgium, where we generally have fast devices and access to high-speed cellular networks, these situations can affect 1% to even 10% of your user base, depending on your product. An example is someone on a train trying desperately to look up a critical piece of information on your website, but having to wait 45 seconds for it to load. I certainly know I’ve been in that situation, wishing someone had deployed QUIC to get me out of it.
However, there are other countries and regions where things are much worse still. There, the average user might look a lot more like the slowest 10% in Belgium, and the slowest 1% might never get to see a loaded page at all. In
many parts of the world, web performance is an accessibility and inclusivity problem.
This is why we should never just test our pages on our own hardware (but also use a service like
Webpagetest) and also why you should definitely deploy QUIC and HTTP/3. Especially if your users are often on the move or unlikely to have access to fast cellular networks, these new protocols might make a world of difference, even if you don’t notice much on your cabled MacBook Pro. For more details, I highly recommend Fastly’s post on the issue.
If that doesn’t fully convince you, then consider that QUIC and HTTP/3 will
continue to evolve and get faster in the years to come. Getting some early experience with the protocols will pay off down the road, allowing you to reap the benefits of new features as soon as possible. Additionally, QUIC enforces security and privacy best practices in the background, which benefit all users everywhere.
Finally convinced? Then continue to part 3 of the series to read about how you can go about using the new protocols in practice.

Part 1: HTTP/3 History And Core Concepts
This article is targeted at people new to HTTP/3 and protocols in general, and it mainly discusses the basics.

Part 2: HTTP/3 Performance Features
This one is more in depth and technical. People who already know the basics can start here.

Part 3: Practical HTTP/3 Deployment Options
This third article in the series explains the challenges involved in deploying and testing HTTP/3 yourself. It details how and if you should change your web pages and resources as well.