Https
Revision as of 06:51, 2 June 2011
Design
Service names
For HTTP we use name based virtual hosts, where the appservers know which service to serve based on the Host header. For HTTPS we must use IP based virtual hosts: HTTPS requires them unless SNI is used, and SNI is only supported in fairly modern browsers. Our current CNAME approach will not work in this scenario.
In our current CNAME approach we use three service names: text.wikimedia.org, bits.wikimedia.org, and upload.wikimedia.org. All project domains (wikipedia, wikimedia, etc.), languages (en.wikipedia, de.wikinews, etc.) and sites (commons.wikimedia, meta.wikimedia, etc.) are CNAME'd to text.wikimedia.org.
text.wikimedia.org is itself a CNAME, due to geodns. Depending on the DNS scenario we are in, the CNAME points to either text.esams.wikimedia.org or text.pmtpa.wikimedia.org (and soon text.eqiad.wikimedia.org).
To support IP based virtual hosts, we made the following service name CNAMES:
- wikimedia-lb.wikimedia.org
- wikipedia-lb.wikimedia.org
- wiktionary-lb.wikimedia.org
- wikiquote-lb.wikimedia.org
- wikibooks-lb.wikimedia.org
- wikisource-lb.wikimedia.org
- wikinews-lb.wikimedia.org
- wikiversity-lb.wikimedia.org
- mediawiki-lb.wikimedia.org
- foundation-lb.wikimedia.org
These CNAMEs, like text.wikimedia.org, point to <servicename>.<datacenter>.wikimedia.org, depending on the DNS scenario. The records pointed to are A records, meaning each service needs one IP address per datacenter. With the 10 service names above and 3 datacenters, this requires 30 IP addresses.
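The CNAME chain above can be sketched as zone file entries. This is a hedged sketch only: the record names come from this page, but the A record addresses are hypothetical placeholders, and the actual geodns setup serves different CNAME targets per scenario rather than a single static zone.

```
; geodns picks the scenario; each -lb name follows the same pattern as text
wikipedia-lb           IN CNAME  wikipedia-lb.pmtpa.wikimedia.org.
; per-datacenter A records -- one IP per service per datacenter
wikipedia-lb.pmtpa     IN A      192.0.2.25    ; placeholder address
wikipedia-lb.esams     IN A      192.0.2.125   ; placeholder address
wikipedia-lb.eqiad     IN A      192.0.2.225   ; placeholder address
```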
text.wikimedia.org will be kept, but will be used for a different purpose, described in the next section.
Load balancing
We use LVS-DR for load balancing. This means the LVS server will answer incoming requests for the services, and will direct the traffic to one of a cluster of realservers. Each realserver binds the service IP address to the lo device. The realserver answers directly to the client, bypassing the director.
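On a realserver, binding the service IP to lo — and suppressing ARP for it so that only the LVS director receives inbound packets for the service address — might look like the following sketch. The address is a placeholder, not one of our real service IPs.

```shell
# bind the service IP to the loopback device (host route, no ARP broadcast)
ip addr add 192.0.2.25/32 dev lo label lo:0    # placeholder service IP

# keep the realserver from answering ARP queries for the service IP,
# so inbound traffic keeps flowing through the LVS director
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```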
The fact that the realserver binds the IP address to lo is problematic for a couple of reasons:
- Since we are simply doing SSL termination, we want to decrypt the connection and proxy it to the port 80 service. The port 80 service has the same IP; because that IP is bound to lo, the realserver would end up sending the proxied traffic back to itself.
- pybal does health checks on the realserver to ensure it is alive and can properly serve traffic. Since we are doing IP based virtual hosts, the health checks would need to check the service IP, not the realserver IP. This isn't possible.
To bypass problem #1 we use text.wikimedia.org as the backend, and not the service name. We take a similar approach for bits.wikimedia.org and upload.wikimedia.org: bits and upload are assigned a private routable IP address, as are the SSL terminators, and we use the private routable IPs as the backend.
To bypass problem #2 we disable content checks in the normal way, but keep the idle connection check. To re-enable the content checks, we use the ssh connection check and have it pull content from the service address directly on the host.
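The ssh-based content check amounts to logging into the realserver and fetching content from the service IP bound on its lo device, which exercises the IP based virtual host end to end. A command sketch — the hostname and address are placeholders:

```shell
# run from the monitoring side: the curl executes on the realserver,
# hitting the service IP that is bound to its own lo device
ssh realserver1 "curl -sk -o /dev/null -w '%{http_code}' https://192.0.2.25/"
```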
SSL termination
To perform SSL termination we use a cluster of nginx servers. The nginx servers answer requests on IP based virtual hosts and proxy the requests, unencrypted, directly to the backends. Headers are set for the requested host, the client's real IP, forwarded-for, and forwarded-protocol.
SSL termination servers in esams talk to services in esams, and failover to services in pmtpa. SSL termination servers in pmtpa talk to services only in pmtpa.
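A minimal nginx server block in this style might look like the sketch below. The service IP, certificate paths, and backend name are illustrative assumptions, not our actual configuration; only the header-setting behavior is taken from the description above.

```nginx
# one IP based virtual host per service name
server {
    listen 192.0.2.25:443;                # placeholder service IP
    ssl on;
    ssl_certificate     /etc/ssl/wikipedia-chained.pem;  # chained cert
    ssl_certificate_key /etc/ssl/wikipedia.key;

    location / {
        # proxy unencrypted to the backend, preserving request context
        proxy_pass http://text.wikimedia.org;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```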
Performance settings
- HTTP keepalive: 65 seconds, 100 requests
  - Lowering requests is likely a good idea
- SSL cache: shared, 50m (roughly 200,000 sessions)
  - Should use roughly 1.1 GB of RAM for all open sessions
- SSL timeout: default (5 minutes)
- Limit ssl_ciphers: RC4-SHA:RC4-MD5:DES-CBC3-SHA:AES128-SHA:AES256-SHA
- Using a chained certificate
- Access log disabled
- Worker connections set to 32768
- Worker processes set to the number of cores
- esams servers hit esams squids first, falling back to pmtpa squids if the esams squids are down or failing
- Max fails set to 2, to avoid pounding backends while they are flapping
- Proxy buffering disabled, to keep large responses from eating all memory
- sh (source hashing) scheduler used to allow session reuse, and to ensure the session cache is maximized
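Pulled together as nginx directives, the settings above might look roughly like the sketch below. The upstream hostnames and worker count are illustrative assumptions; the tuning values are the ones listed. (The sh scheduler is an LVS setting on the directors in front of the terminators, so it does not appear here.)

```nginx
worker_processes  8;                       # set to the number of cores
events { worker_connections 32768; }

http {
    keepalive_timeout   65;
    keepalive_requests  100;

    ssl_session_cache   shared:SSL:50m;    # roughly 200,000 sessions
    # ssl_session_timeout left at the 5 minute default
    ssl_ciphers RC4-SHA:RC4-MD5:DES-CBC3-SHA:AES128-SHA:AES256-SHA;

    access_log off;
    proxy_buffering off;   # keep large responses from eating all memory

    upstream text_backends {
        # esams terminators hit esams squids first, pmtpa only as backup
        server text.esams.wikimedia.org max_fails=2;
        server text.pmtpa.wikimedia.org max_fails=2 backup;
    }
}
```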
Security settings
- Limit protocols: SSLv3 TLSv1
- Limit ssl_ciphers
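As nginx directives, these two settings correspond to:

```nginx
ssl_protocols SSLv3 TLSv1;    # no SSLv2
ssl_ciphers   RC4-SHA:RC4-MD5:DES-CBC3-SHA:AES128-SHA:AES256-SHA;
```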