Pluggable Transports and the Censorship Circumvention Landscape
This post is intended for readers with an interest in, and some existing knowledge of, Internet censorship and resistance practices and techniques.
First things first
When talking about Internet censorship and circumvention, it is generally safe to assume that state-level censors have significant resources that can be leveraged to run, block or undermine any tool that can be developed for circumvention. However, censors still have constraints that can be exploited when designing circumvention tools. Internet shutdowns, for example, are very costly for businesses, both those based inside the censored region and those relying on patronage from the region. As Internet connectivity becomes essential for the transactions of day-to-day life, complete Internet shutdowns become more and more costly, both monetarily and in terms of public perception. To mitigate these costs, censors typically rely on automated tools that are programmed to inspect and filter traffic flows based on strategically constructed rules that don’t simultaneously disrupt essential services. When developing circumvention tools, then, it is important to follow Kerckhoffs’s principle: to design tools that remain cryptographically secure even if everything about the system, except the secret key, is known to the censor. This assumes that a censor will trivially be able to reverse-engineer or run their own instance of the system in order to find and exploit any weakness that would help the censor’s blocking efforts. Circumvention tools like pluggable transports (PTs) are designed with this principle in mind, either to strategically target weaknesses in the tools censors use for censorship or to make use of infrastructure that a region relies on for daily life. Using these strategies, PTs make it difficult for a censor to decide whether or not a given traffic flow should be blocked.
Let’s zoom in on a few different areas from our previous blog post to get more familiar with tools that exist for censorship circumvention and how they work.
What’s in a Pluggable Transport (PT)?
Put simply, pluggable transports help to get around Internet filtering and blocking by censors so users can continue to access the uncensored Internet. PTs do this by making the network traffic that is at risk of being blocked resemble traffic that is permitted by a given censor. Generally, PTs fall into three different categories: diverting, which involves proxying through a “safe” endpoint; scrambling, which involves changing the look of the traffic such that it cannot be identified as a particular type of traffic; and shapeshifting, which involves hiding censored traffic inside traffic flows that look like some allowed protocol or traffic flow. There are a number of different PTs that approach this problem from different angles to tackle known censor blocking strategies.
Obfsproxy is used to alter traffic flows in order to bypass censors’ automated flow inspection mechanisms (deep packet inspection devices). Obfsproxy has several modules (obfs2, obfs3, and obfs4), with obfs4 being the most popular, most up to date, and least vulnerable to attacks by censors. The obfs modules are based on a “look-like-nothing” design. This is most effective against the automated deep packet inspection (DPI) machines deployed by many censors, which are able to look into each packet sent in a connection and decide to block the connection based on some pre-determined rule, like its destination, size or the timing between packets. Obfs4 exploits the idea that a censor that has successfully filtered specific protocols based on their unique features would fail to specifically identify obfsproxy and be unable to filter connections through automated rules-based blocking. Obfs4 incorporates several ideas from ScrambleSuit with the addition of the ntor handshake protocol, which includes a full key exchange with obfuscated public keys. Obfs4 requires users to prove knowledge of a secret before connecting, making it difficult for a probing censor to confirm that a user is connecting to a Tor bridge. Because a censor’s active probe does not know the secret, it cannot connect to the bridge, run the Tor protocol and confirm that the connection is being used for circumvention, so the censor can’t be certain that the connection is suspicious.
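To make the probe-resistance idea more concrete, here is a minimal Python sketch. It is not the obfs4 handshake (which relies on ntor and obfuscated public keys); it only illustrates the general pattern of a server that accepts a first message only from clients who can prove knowledge of a secret. The names `client_hello`, `bridge_accepts` and `bridge_secret` are hypothetical, and we assume the secret was shared out of band.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # random bytes chosen by the client
MAC_LEN = 32    # size of a SHA-256 HMAC tag


def client_hello(bridge_secret: bytes) -> bytes:
    """Build a first message that only a holder of the secret can produce."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(bridge_secret, nonce, hashlib.sha256).digest()
    return nonce + tag


def bridge_accepts(bridge_secret: bytes, hello: bytes) -> bool:
    """Return True only if the hello carries a valid MAC over its nonce."""
    if len(hello) != NONCE_LEN + MAC_LEN:
        return False
    nonce, tag = hello[:NONCE_LEN], hello[NONCE_LEN:]
    expected = hmac.new(bridge_secret, nonce, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag, expected)
```

A bridge that simply stays silent whenever `bridge_accepts` returns False gives an active prober nothing to fingerprint. This sketch leaves out replay protection and key exchange, both of which a real design would need.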
Format-transforming encryption (FTE) exploits the white-listing mechanism of modern deep packet inspection (DPI) systems that serve to allow specific protocols such as HTTP and SSH. FTE works through the use of strategically constructed regular expressions generated by the user, producing ciphertexts that are guaranteed to match some allowed protocol so that their traffic is not classified as suspicious.
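The following toy Python sketch shows the "match an allowed format" half of this idea. It is a deliberately simplified stand-in, not the FTE algorithm: real FTE encrypts the data and then maps ciphertexts onto strings of an arbitrary regular language, whereas this sketch assumes the input is already ciphertext and uses a fixed hex encoding. The regular expression, the `example.com` host and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical rule a DPI box might use to recognise "plain HTTP GET" traffic.
ALLOWED_HTTP = re.compile(
    rb"GET /[0-9a-f]* HTTP/1\.1\r\nHost: example\.com\r\n\r\n"
)


def encode(ciphertext: bytes) -> bytes:
    """Wrap ciphertext so the bytes on the wire match ALLOWED_HTTP."""
    path = ciphertext.hex().encode("ascii")
    return b"GET /" + path + b" HTTP/1.1\r\nHost: example.com\r\n\r\n"


def decode(wire: bytes) -> bytes:
    """Recover the ciphertext from a message produced by encode()."""
    if ALLOWED_HTTP.fullmatch(wire) is None:
        raise ValueError("message is not in the allowed format")
    path = wire.split(b" ", 2)[1]  # b"/<hex digits>"
    return bytes.fromhex(path[1:].decode("ascii"))
```

Every output of `encode` matches the allowed pattern by construction, which is the property a regex-based whitelist checks for.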
Meek takes a different approach by relying on domain fronting to relay requests through large cloud providers that would be difficult for a censor to block without incurring significant collateral damage. This strategy has proven to be successful but can be expensive to maintain and requires the continued cooperation of large cloud providers that may be unreliable for censorship circumvention in the long term due to conflicting interests.
Snowflake, which evolved from Flashproxy, allows users to connect to Tor through a transient proxy run on a volunteer’s web browser. If the connection to the Snowflake proxy is broken, a broker provides the user with a new Snowflake proxy from another volunteer automatically. Snowflake proxies hide traffic in WebRTC and are shorter lived and less stable than bridges but can allow a user to bypass a censor by using a replaceable, transient connection to another user outside of the censored region.
Outside of these PTs used (not necessarily exclusively) by Tor, several alternative pluggable transports that aim to mimic existing protocols in order to resist probes have been proposed, such as StegoTorus and SkypeMorph. However, as Houmansadr et al. point out, minor inconsistencies between the genuine protocols and their imitations can be easily detected by censors, making imitation a fundamentally flawed approach.
Censors make rules, censorship resistance tools exploit or avoid them
Another branch of related research works to develop tools using machine learning (ML) that can distinguish pluggable transports and other censorship circumvention systems based on small differences in the handshake, and packet timings and sizes of their traffic flows. While such efforts may seem counterintuitive to the effort of circumvention, ML classifiers can help us to identify how easy it is for a censor to make a blocking rule for a given tool. Tools like the Multimedia Protocol Tunneling Analyzer are useful for comparable traffic flows, where the endpoints are the same and traffic patterns are only differentiated by the use of a particular multimedia tunnelling tool. However, it seems unlikely that a general purpose ML tool could be trained to accurately predict traffic associated with any given circumvention tool, since traffic will be randomized by many different factors (speed of connection, source and destination addresses, etc.). Though censors could plausibly capture traffic flows between several endpoints and label their own use of circumvention tools to train a model, such a model would need constant updating to identify new tools that break the rules.
Currently, there is no evidence that censors are using ML for traffic classification, though it is not impossible that they are. That being said, hard-coded rules are sufficient to block many tools used for evasion, and it is easier and more cost-effective for a censor to develop rules to target large systems that use a set of well-defined tools. Ultimately, the censor must decide whether to risk the collateral damage of blocking a connection they can’t prove is suspicious, or to leave the connection unblocked and risk users circumventing censorship. Provided the system is large enough, and the tools have sufficiently differentiable traffic, censors will be willing to spend time and effort to target that specific system in order to limit both the amount of evasion that can occur and the collateral damage to essential services. This is an important insight for those developing circumvention systems and tools. It suggests that polymorphism, diversity, transience and infrequency can be used as strategies against censors when creating circumvention systems, making imperfect tools temporarily viable, at least until they reach a larger audience.
In that vein, Geneva (https://geneva.cs.umd.edu/) turns the idea of a rule-based censorship tool on its head, using a set of core TCP packet-level manipulations as the “genes” in genetic algorithms. Geneva works by training against a censor’s rules, generating random sequences of packet manipulations and exploiting the patterns that are best able to bypass the censor’s filtering rules to generate similar sequences. The discovered sequences are then used to manipulate the network stream in order to confuse the censor without impacting the client/server communication. Marionette goes a level beyond, allowing users to control encrypted traffic features at a variety of levels (ciphertext formats, protocol semantics, statistical properties), and easily adjust their obfuscation strategy to account for the censor environment. The Operator Foundation similarly provides a variety of approaches in a single tool through ShapeShifter, a dispatcher and Golang library of various traffic shaping protocols that can be swapped in and out as necessary to best counter a specific censor’s blocking rules.
Castle is another mimicking approach, this time built on UDP-based protocols. It takes advantage of the common features (buildings, units and rally points) of many real-time strategy (RTS) games, along with the fact that these protocols are encrypted, to create a plugin that can be adapted to covertly send censored content through RTS game protocols.
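The genetic-algorithm idea behind Geneva can be sketched in a few lines of Python. This is a toy model and does not use Geneva's code, primitives or strategy language. We assume a hypothetical censor that inspects each segment on its own without reassembly, and the only "gene" is a byte offset at which the request is split. All names, the keyword and the request are invented for the example.

```python
import random

KEYWORD = b"forbidden"
REQUEST = b"GET /forbidden HTTP/1.1\r\n\r\n"


def apply_strategy(offsets, payload=REQUEST):
    """Split the payload into segments at the given byte offsets."""
    cuts = sorted({o for o in offsets if 0 < o < len(payload)})
    bounds = [0] + cuts + [len(payload)]
    return [payload[a:b] for a, b in zip(bounds, bounds[1:])]


def censor_blocks(segments):
    """Toy censor: checks each segment separately, with no reassembly."""
    return any(KEYWORD in seg for seg in segments)


def fitness(offsets):
    """Reward evasion and slightly prefer strategies with fewer segments."""
    segments = apply_strategy(offsets)
    evaded = 0.0 if censor_blocks(segments) else 1.0
    return evaded - 0.01 * len(segments)


def mutate(offsets, rng):
    """Randomly remove, change or add one split offset."""
    child = list(offsets)
    choice = rng.random()
    if child and choice < 0.3:
        child.pop(rng.randrange(len(child)))
    elif child and choice < 0.6:
        child[rng.randrange(len(child))] = rng.randrange(1, len(REQUEST))
    else:
        child.append(rng.randrange(1, len(REQUEST)))
    return child


def evolve(generations=50, population_size=20, seed=0):
    """Keep the fitter half each generation and refill it with mutants."""
    rng = random.Random(seed)
    population = [[] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)
```

Calling `evolve()` returns a list of split offsets; joining the resulting segments always reproduces the original request, so the server still receives the same bytes while the toy censor never sees the keyword whole. Real censors and real strategies are far richer than this, but the search loop has the same shape.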
What does all of this mean for our UDP-based PT for LEAP’s VPN?
Ultimately, each of these pluggable transports and supportive tools helps to avoid censorship by making it more difficult and expensive for a censor to decide to block something. The more expensive it is for a censor to make this decision, whether through expending resources or blocking essential services, the better chance circumvention tools have of achieving success.
In our previous blog post, we discussed findings from Xue et al. that OpenVPN has fingerprintability issues, but that using obfs4 or v2ray’s VMess to obfuscate the connection helps to make OpenVPN connections undetectable.
Despite concerns from the original author about the continued use of a suboptimal obfuscation tool, obfs4 is still widely used for evading blocking by censors because it is not easily classified by the automated rules-based blocking tools (DPI machines) employed by most censors. That being said, obfs4 is not a cure-all solution. The obfs4 protocol requires TCP-like reliability, so it doesn’t have an immediately useful way of handling UDP traffic. Reliable UDP-based protocols such as QUIC (the transport underlying HTTP/3) and KCP may be able to help with this since they add reliability to UDP connections. The QUIC spec in particular has many features built in (padding frames and connection migration) that can help with LEAP’s goals of traffic flow obfuscation and endpoint obfuscation. KCP is largely following QUIC’s lead; however, things like connection migration are not yet implemented in the QUIC or KCP Go libraries.
One concern with using these protocols is that they are susceptible to blocking by protocol: while QUIC and HTTP/3 make up a relatively low percentage of overall Internet traffic, blocking them is not especially costly for censors. With Google planning to rely more and more on QUIC, and other large Internet companies following suit, it may eventually become impossible to completely block the QUIC protocol without incurring significant collateral damage, but that is currently not the case, as was seen recently in Russia. While we can investigate and design solutions that use these protocols, we should be mindful not to rely exclusively on solutions that depend on a particular set of assumptions about the future.
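As a rough illustration of why padding helps with traffic flow obfuscation, the Python sketch below pads every payload up to one of a few fixed sizes so that an observer learns only the size bucket and not the true length. This is not QUIC framing and does not use any QUIC or KCP library. The bucket sizes, the two-byte length prefix and the function names are assumptions chosen for the example, and in practice the padding would be applied before encryption so the zero bytes are not visible on the wire.

```python
import struct

# Hypothetical target sizes in bytes; every padded message has one of these.
BUCKETS = (128, 256, 512, 1024, 1200)
PREFIX_LEN = 2  # big-endian length prefix


def pad(payload: bytes) -> bytes:
    """Prefix the payload with its length and zero-pad to the next bucket."""
    needed = len(payload) + PREFIX_LEN
    for size in BUCKETS:
        if needed <= size:
            prefix = struct.pack("!H", len(payload))
            return prefix + payload + b"\x00" * (size - needed)
    raise ValueError("payload too large for the largest bucket")


def unpad(padded: bytes) -> bytes:
    """Strip the length prefix and padding added by pad()."""
    (length,) = struct.unpack("!H", padded[:PREFIX_LEN])
    if length > len(padded) - PREFIX_LEN:
        raise ValueError("corrupt length prefix")
    return padded[PREFIX_LEN:PREFIX_LEN + length]
```

With these buckets, a 10-byte and a 100-byte payload both leave the sender as 128-byte messages, at the cost of some wasted bandwidth. Choosing bucket sizes is a trade-off between overhead and how much the size distribution reveals.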
With that caveat, Psiphon, a free and open-source VPN, has developed a QUIC-based obfuscator module to obfuscate their traffic over a QUIC connection with random padding of packets and a uniformly random payload. However, their implementation is not currently cryptographically secure. MASQUE (Multiplexed Application Substrate over QUIC Encryption) is an IETF protocol that supports a suite of solutions for tunneling over QUIC by extending HTTP CONNECT. In order to disguise our VPN connections as QUIC connections, MASQUE is probably the most sensible direction for LEAP to take our obfuscated UDP-based PT development.
Next Up for LEAP’s obfsVPN
In the meantime, the next release of LEAP’s clients (desktop and Android) will include a KCP flavour for the obfs4 bridges. Additionally, our friends at Riseup will be working on deploying this new bridge flavour into their infrastructure.
Cool, so where can I jump in?
First of all, we could use help with enhancing this blog post! If you are excited about PTs and also able to paint a picture (literally) of how any of the PTs or ideas mentioned above look, so that readers can get a better sense of how these technologies work at a glance, we’d love to include your system artwork! Please send us your digital designs by email (info at leap.se) or ping us on IRC and we’ll work with you to include your attributed artwork in an appropriate place in this blog post!
The Pluggable Transport website is most likely the best resource for all things PT, from learning about how they work and interact with different layers of the network stack, to libraries that can help you implement your own, to connecting with people doing similar work.
In addition, you can follow the development of obfsvpn. Our main provider, Riseup, will be working to add this new flavour to their infrastructure in the coming months. We also always welcome people who are interested in helping out with LEAP’s efforts. Get in touch on IRC at #leap on Libera!