Https

From Wikitech

Revision as of 01:14, 14 July 2011


Design

Service names

For HTTP we use name-based virtual hosts, where the appservers know which service to serve based on the Host header. For HTTPS we use IP-based virtual hosts, as HTTPS requires this unless SNI is used. SNI is only supported in fairly modern browsers, so we must use IP-based virtual hosts. Our current CNAME approach will not work in this scenario.

In our current CNAME approach we use three service names: text.wikimedia.org, bits.wikimedia.org, and upload.wikimedia.org. All project domains (wikipedia, wikimedia, etc.), languages (en.wikipedia, de.wikinews, etc.) and sites (commons.wikimedia, meta.wikimedia, etc.) are CNAME'd to text.wikimedia.org.

text.wikimedia.org is a CNAME itself as well, due to geodns. Depending on the DNS scenario we are in, the CNAME points to either text.esams.wikimedia.org, or text.pmtpa.wikimedia.org (and soon text.eqiad.wikimedia.org).

To support IP-based virtual hosts, we made the following service name CNAMEs:

  • wikimedia-lb.wikimedia.org
  • wikipedia-lb.wikimedia.org
  • wiktionary-lb.wikimedia.org
  • wikiquote-lb.wikimedia.org
  • wikibooks-lb.wikimedia.org
  • wikisource-lb.wikimedia.org
  • wikinews-lb.wikimedia.org
  • wikiversity-lb.wikimedia.org
  • mediawiki-lb.wikimedia.org
  • foundation-lb.wikimedia.org

These CNAMEs, like text.wikimedia.org, point to <servicename>.<datacenter>.wikimedia.org, based on the DNS scenario. The records being pointed to are A records, meaning that each service needs one IP address per datacenter. With the ten services above and three datacenters (esams, pmtpa, and eqiad), this requires 30 IP addresses.
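The address count can be checked with a short sketch. It assumes the three datacenters are esams, pmtpa, and eqiad (the names used in the geodns paragraph) and that every service gets a record in each of them; the function name is illustrative only.

```python
# Service names from the list above; datacenter names are assumed from the
# geodns paragraph (esams, pmtpa, and the upcoming eqiad).
SERVICES = [
    "wikimedia-lb", "wikipedia-lb", "wiktionary-lb", "wikiquote-lb",
    "wikibooks-lb", "wikisource-lb", "wikinews-lb", "wikiversity-lb",
    "mediawiki-lb", "foundation-lb",
]
DATACENTERS = ["esams", "pmtpa", "eqiad"]


def a_record_names(services=SERVICES, datacenters=DATACENTERS):
    """Return every <servicename>.<datacenter>.wikimedia.org name."""
    return [f"{s}.{dc}.wikimedia.org" for s in services for dc in datacenters]


if __name__ == "__main__":
    names = a_record_names()
    # Each name is an A record, so each one needs its own IP address.
    print(len(names), "A records, for example", names[0])
```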

text.wikimedia.org will change to a backend IP, as described in the next section.

Load balancing

We use LVS-DR for load balancing. This means the LVS server will direct incoming traffic for the services to a number of realservers. Each realserver binds the service IP address to the lo device. The realserver answers directly to the client, bypassing the director.

The fact that the realserver binds the IP address to lo is problematic for a couple of reasons:

  1. Since we are simply doing SSL termination, we want to decrypt the connection, and proxy it to the port 80 service. The port 80 service has the same IP. Since the IP is bound to lo, it will end up sending the backend requests back to itself.
  2. pybal does health checks on the realserver to ensure it is alive and can properly serve traffic. Since we are doing IP-based virtual hosts, the health checks would need to check the service IP, and not the realserver IP. This isn't possible from the LVS server.

To bypass problem #1 we will change text.wikimedia.org to text.svc.<datacenter>.wmnet, a private routable address, and use that as the backend. We take the same approach for bits.wikimedia.org and upload.wikimedia.org, which are assigned bits.svc.<datacenter>.wmnet and upload.svc.<datacenter>.wmnet.

To bypass problem #2 we disable content health checks in the normal way, but keep the idle connection health check. To re-enable the content health checks, we use the SSH health check and have it make requests to the service address directly on the host.

SSL termination

To perform SSL termination we are using a cluster of nginx servers. The nginx servers answer requests on IP based virtual hosts and proxy the requests directly to the backends unencrypted. Headers are set for the host requested, the client's real IP, forwarded-for information, and forwarded-protocol information.

SSL termination servers in esams talk to services in esams, and fail over to services in pmtpa. SSL termination servers in pmtpa talk to services only in pmtpa.

Logging

Logging generally occurs at the squid level. When using SSL termination, however, the IP address the squids see is that of the SSL terminator, not the client. It's possible to use the X-Forwarded-For header, but we can only trust this header if the request is coming from the SSL terminators (as they strip and set that header). This is painful in squid.

Normally this wouldn't be terribly problematic: you'd just write the logs in squid format on nginx and combine them. We, however, don't use log files; squid sends the logs as UDP packets to a log collector. To solve this we modified a UDP syslog logging module for nginx to send logs in our format, without the extra syslog information, to servers and ports of our choosing.

geoiplookup support

Our bits cluster has support for providing geographical JSON data based on the client's IP address. Like logging, since the bits cluster is behind the SSL terminators, it sees the IP address of the SSL terminators, not the client, which causes the bits cluster to send back the geographical information of the SSL terminators (which isn't terribly useful).

To solve this we modified the geoip inline C in the varnish VCL to use X-Forwarded-For, if the client IP is one of the SSL terminators.
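The trust rule used here (and for logging) can be sketched as follows. This is an illustration in Python rather than the inline C in the VCL; the terminator addresses are hypothetical documentation-range IPs, and the function name is made up for the example.

```python
import ipaddress

# Hypothetical SSL terminator addresses (documentation range, not real hosts).
SSL_TERMINATORS = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.11"),
}


def effective_client_ip(peer_ip, x_forwarded_for=None, trusted=SSL_TERMINATORS):
    """Return the IP to use for geo lookups or logging.

    X-Forwarded-For is honoured only when the connection comes from a
    trusted SSL terminator; otherwise the header could be forged.
    """
    peer = ipaddress.ip_address(peer_ip)
    if peer in trusted and x_forwarded_for:
        # The terminators strip and set the header, so it should hold a
        # single address. Taking the last entry is the cautious choice if
        # more than one ever appears.
        candidate = x_forwarded_for.split(",")[-1].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # malformed header: fall back to the peer address
    return str(peer)


if __name__ == "__main__":
    print(effective_client_ip("192.0.2.10", "203.0.113.7"))
    print(effective_client_ip("198.51.100.5", "203.0.113.7"))
```

In the first call the peer is a terminator, so the forwarded address is used; in the second the header is ignored because the peer is not trusted.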

Secure cookies

Since we are doing SSL termination, MediaWiki does not see incoming traffic as being HTTPS, because it receives the requests over HTTP. This is problematic when sending cookies. When users log in using HTTPS, we need to protect their cookies in case an attacker forces them to HTTP, they accidentally visit an HTTP link to our sites, or mixed content causes requests to travel to our sites over HTTP.

To solve this we used the X-Forwarded-Proto header. If the header is set, and is https, we mark cookies as secure. Like X-Forwarded-For in geoiplookup we only trust this header if it is coming from the SSL terminators. In squid and varnish we strip this header if the request is not being sent by the SSL terminators.
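The two halves of this rule, stripping at the cache layer and checking in the application, can be sketched like this. The sketch is not the squid, varnish, or MediaWiki code; the terminator addresses and function names are assumptions for illustration.

```python
# Hypothetical SSL terminator addresses (documentation range, not real hosts).
SSL_TERMINATORS = {"192.0.2.10", "192.0.2.11"}


def strip_untrusted_proto(peer_ip, headers, terminators=SSL_TERMINATORS):
    """Cache layer: drop X-Forwarded-Proto unless an SSL terminator sent it."""
    if peer_ip in terminators:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "x-forwarded-proto"}


def cookie_attributes(headers):
    """Application layer: mark cookies Secure when the header says https."""
    proto = next(
        (v for k, v in headers.items() if k.lower() == "x-forwarded-proto"), ""
    )
    return ["Secure"] if proto.lower() == "https" else []


if __name__ == "__main__":
    incoming = {"Host": "en.wikipedia.org", "X-Forwarded-Proto": "https"}
    # From a terminator the header survives and the cookie is marked Secure.
    print(cookie_attributes(strip_untrusted_proto("192.0.2.10", incoming)))
    # From anywhere else the header is stripped before the application sees it.
    print(cookie_attributes(strip_untrusted_proto("198.51.100.5", incoming)))
```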

Protocol-relative URLs

To make http and https coexist happily, we need to use protocol-relative URLs like //en.wikipedia.org/wiki/Main_Page whenever we link off-domain to one of our sites (images, interwiki links and such). This also ensures that we don't split our squid and varnish caches, by caching pages with https and http links.
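As a rough illustration of the rewrite involved, the sketch below drops the scheme from links to our own sites and leaves external links untouched. The domain list is a small illustrative subset and the function name is hypothetical; the real change was made through MediaWiki configuration and core.

```python
from urllib.parse import urlsplit

# Illustrative subset of our domains; the real list is longer.
OWN_DOMAINS = ("wikipedia.org", "wikimedia.org")


def to_protocol_relative(url, own_domains=OWN_DOMAINS):
    """Turn http(s)://host/path into //host/path for our own sites only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    is_ours = any(host == d or host.endswith("." + d) for d in own_domains)
    if parts.scheme in ("http", "https") and is_ours:
        # Drop "<scheme>:" and keep everything from the "//" onwards.
        return url[len(parts.scheme) + 1:]
    return url


if __name__ == "__main__":
    print(to_protocol_relative("http://en.wikipedia.org/wiki/Main_Page"))
    print(to_protocol_relative("http://example.com/page"))
```

Because the same link text is served to both http and https readers, a single cached copy of each page is enough.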

Of course, this also means that our parser, squid, and varnish caches need to be fully purged to properly enable https.

This required a lot of MediaWiki configuration changes, and modification of core as well. See the Server admin log and the commit log for these changes.

Failover

Ignoring our normal geodns-based datacenter failover, the SSL termination cluster needs to fail over from the caching datacenter's backends to the backends in the primary datacenter. The difficulty is that this traffic must travel over the WAN, so we can't terminate SSL locally and send the request unencrypted to the other datacenter.

To solve this, we configured nginx in the caching datacenter with two backends: an in-datacenter backend reached over http, and an out-of-datacenter backend reached over https. Two location directives are used. The first, for /, proxies to the in-datacenter backend. If that proxy_pass fails, "error_page 502 503 504 = @fallback" hands the request to the second, a named @fallback location, which proxies to the out-of-datacenter backend.
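The control flow nginx follows here can be modelled in a few lines. This is a sketch of the logic only, not nginx configuration: the backends are plain callables returning a (status, body) pair, a refused connection is modelled as ConnectionError and treated like a 502, and all names are hypothetical.

```python
# Statuses listed in the error_page directive quoted above.
FALLBACK_STATUSES = {502, 503, 504}


def proxy_with_fallback(request, local_backend, remote_backend):
    """Try the in-datacenter backend first; fall back on 502/503/504.

    Each backend is a callable taking the request and returning
    (status, body). Any other status, including 404 or 500, is passed
    through unchanged from the local backend.
    """
    try:
        status, body = local_backend(request)
    except ConnectionError:
        status, body = 502, ""  # assumption: an unreachable backend acts like a 502
    if status in FALLBACK_STATUSES:
        return remote_backend(request)
    return status, body


if __name__ == "__main__":
    def local_down(request):
        raise ConnectionError("in-datacenter backend unreachable")

    def remote_ok(request):
        return 200, "served by the out-of-datacenter backend"

    print(proxy_with_fallback("/wiki/Main_Page", local_down, remote_ok))
```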

Of course, there's a possible issue here. If the sh LVS scheduler hashes all three caching datacenter's SSL terminator IPs to the same SSL terminator server in the primary datacenter, it will likely overload that server. In this situation we're likely to just fail over to the primary datacenter anyway, though.
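Why so few source addresses can pile onto one server is easier to see with a toy model of source hashing. The sketch below is not the kernel's sh implementation: it uses SHA-256 purely as a stand-in hash, and the IP addresses and server names are hypothetical.

```python
import hashlib
import ipaddress


def pick_realserver(source_ip, realservers):
    """Map a source IP to one realserver, always the same one for that IP."""
    digest = hashlib.sha256(ipaddress.ip_address(source_ip).packed).digest()
    index = int.from_bytes(digest[:4], "big") % len(realservers)
    return realservers[index]


def load_per_server(source_ips, realservers):
    """Count how many source IPs land on each realserver."""
    counts = {server: 0 for server in realservers}
    for ip in source_ips:
        counts[pick_realserver(ip, realservers)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical addresses: three terminators in the caching datacenter
    # talking to four terminators in the primary datacenter.
    sources = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    servers = ["ssl-a", "ssl-b", "ssl-c", "ssl-d"]
    print(load_per_server(sources, servers))
```

With thousands of distinct client IPs the counts even out, but with only a handful of sources nothing prevents several of them from sharing one server.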

Performance settings

  • HTTP keepalive: 65 seconds, 100 requests
    • Lowering the request limit is likely a good idea
  • SSL cache: shared, 50m (roughly 200,000 sessions)
    • should use roughly 1.1GB RAM for all open sessions
  • SSL timeout: default (5 minutes)
  • Limit ssl_ciphers: RC4-SHA:RC4-MD5:DES-CBC3-SHA:AES128-SHA:AES256-SHA
    • Do not set the server preference for cipher use
  • Used a chained certificate
  • Disabled access log
  • Worker connections set to 32768
  • Worker processes set to number of cores
  • esams servers set to hit esams squids, then pmtpa SSL terminators if esams squids are down or failing
  • Proxy buffering is disabled to avoid responses eating all memory
  • sh LVS scheduler used to allow session reuse, and to ensure session cache is maximized
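The session figure quoted for the SSL cache follows from the rule of thumb in the nginx documentation that one megabyte of shared cache holds about 4,000 sessions. A minimal sketch of that arithmetic, with an illustrative function name:

```python
# Approximate figure from the nginx ssl_session_cache documentation.
SESSIONS_PER_MB = 4000


def approx_cached_sessions(cache_mb, sessions_per_mb=SESSIONS_PER_MB):
    """Estimate how many SSL sessions fit in a shared cache of cache_mb."""
    return cache_mb * sessions_per_mb


if __name__ == "__main__":
    print(approx_cached_sessions(50))  # the "shared, 50m" setting above
```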

Initial connection testing

Using ab we were able to get an average of 5,100 requests per second on a single processor, quad core server, with 4GB RAM. We used the following command, which was run three times concurrently:

ab -c2000 -n100000 -H 'Host: upload.wikimedia.org' -H 'User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' https://wikimedia-lb.wikimedia.org/pybaltestfile.txt

This test has 6,000 concurrent clients, making 300,000 requests. Since we are testing the number of requests per second based on initial connections for each request, the resource requested is small and static, to ensure speed isn't heavily affected by the backend. We pull from the backend to ensure that we are opening connections both for the client and for the backend, and to ensure that any backend related issues will also be reflected.
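The totals quoted for the tests come from multiplying the ab flags by the number of parallel runs. A small sketch, using a hypothetical helper name, makes the arithmetic explicit:

```python
def ab_totals(concurrency, requests, parallel_runs=3):
    """Combine the -c and -n values of several identical ab runs.

    Returns (total concurrent clients, total requests).
    """
    return concurrency * parallel_runs, requests * parallel_runs


if __name__ == "__main__":
    # -c2000 -n100000, run three times concurrently
    clients, total_requests = ab_totals(2000, 100000)
    print(clients, "concurrent clients,", total_requests, "requests")
```

The same helper applied to -c500 -n20000 gives the figures for the image transfer test.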

The server's total CPU usage was on average 85%. Memory usage was roughly 1GB.

Image transfer with keepalive testing

Using ab we were able to get an average of 600 requests per second. The hardware was the same as in the initial connection testing. We used the following command, which was run three times concurrently:

ab -k -c500 -n20000 -H 'Host: upload.wikimedia.org' -H 'User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' https://wikimedia-lb.wikimedia.org/wikipedia/commons/thumb/f/ff/Viaduc_Saillard.jpg/691px-Viaduc_Saillard.jpg

This test has 1,500 concurrent clients, using keepalive, making 60,000 requests. We used keepalive since we were testing the number of thumbnail requests per second, allowing us to bypass the overhead of the initial connection. The thumbnail chosen was the size shown on an image page.

The server's total CPU usage was on average 25%, suggesting there is likely a bottleneck in the client when testing. Running the same test against the http backend directly had a similar number of requests per second. Memory usage was negligible.

Text transfer with keepalive testing

Using ab we were able to get an average of 1,400 requests per second. The hardware was the same as in the initial connection testing. We used the following command, which was run three times concurrently:

ab -k -c2000 -n100000 -H 'Host: meta.wikimedia.org' -H 'User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' https://wikimedia-lb.wikimedia.org/wiki/Main_Page

This test has 6,000 concurrent clients, using keepalive, making 300,000 requests. We used keepalive since we were testing the number of text requests per second, allowing us to bypass the overhead of the initial connection.

The server's total CPU usage was on average 20%, suggesting there is likely a bottleneck in the client when testing. Running the same test against the http backend directly had a similar number of requests per second. Memory usage was negligible.

Security settings

  • Limit protocols: SSLv3 TLSv1
  • Limit ssl_ciphers

Comparing the SSL security of secure against the SSL security of the new cluster shows secure with a score of 52 (a C), and the new cluster with a score of 85 (an A).

Config changes needed

Protocol-relative URLs

To make http and https coexist happily, we need to use protocol-relative URLs like //en.wikipedia.org/wiki/Main_Page whenever we link off-domain (images, interwiki links and such).

NOTE: Protocol-relative URLs MUST NOT be enabled on secure. This can be done by overriding the relative var with the old default in secure.php. If a var already sets a custom value in secure.php (e.g. due to a different URL scheme), it's already fine.

  • TONS of references to http://upload.wikimedia.org all over the place. See ack-grep upload.wikimedia.org wmf-config/ . Most importantly:
    • wgUploadPath['default'] (InitialiseSettings)
    • wgMathPath (InitialiseSettings)
    • stdlogo (CommonSettings)
    • a zillion custom logo settings (InitialiseSettings)
    • a zillion favicons (InitialiseSettings)
    • ExtensionDistributor config (CommonSettings)
    • wgForeignFileRepos (CommonSettings)
  • bits.wikimedia.org
    • wgExtensionAssetsPath (CommonSettings)
    • wgStyleSheetPath (CommonSettings)
    • wgLoadScript (CommonSettings)
    • wgCopyrightIcon (CommonSettings)
  • interwiki table
    • need to change all intra-WMF links BUT NOT the external ones
    • where does this stuff even live these days? It's not in the interwiki table and I don't have a damn clue where it is. I guess RobH or Tim would know
      • See $wgInterwikiCache in InitialiseSettings, and dumpInterwiki.php in /maintenance. The script may need some tweaking.