MySQL
''Latest revision as of 01:25, 2 November 2012, by Jasper Deng.''

{{outdated}}
== Start & Stop ==
Start:
 /etc/init.d/mysql start || /usr/local/mysql/bin/mysqld_safe &
Stop:
 pkill mysqld

== Source ==

The patched MySQL 4.0 source is available in a special repository on svn.wikimedia.org:

:http://svn.wikimedia.org/svnroot/mysql

Sun has officially stopped maintaining MySQL 4.0, but community development continues to some extent, mainly in the form of patches backported from MySQL 5.0. Domas was responsible for patch selection in Wikimedia's fork of MySQL, and he called the result "version 4.0.40", thinking the number suitably poetic.

The Facebook build lives at https://code.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1 - it may need patching to run off 4.0 masters, though.

== Packaging ==

This source is packaged in the wikimedia-mysql4 package, available from [[apt.wikimedia.org]], with the debian directory checked in to /mediawiki/trunk/debs as usual. The package version number contains the revision number of the mysql repository described above.

The package contains a Wikimedia-specific default configuration file. Note that this configuration file must be edited manually after installation to give the server a unique server ID.

== Installation ==

# Partition the hard drive with a small root partition and the bulk of the space mounted at /a. XFS and JFS are considered appropriate filesystems for /a.
# Install the wikimedia-mysql4 package. This creates a mysql user.
# Create the directory /a/tmp, owned by mysql:mysql.
# Create the directory /a/sqldata.
#* /a/sqldata is generally copied from an existing MySQL server, so that the new server can be added to that replicated cluster.
#* For the rare cases when you need to start a data directory from scratch, use the shell script /usr/mysql-wikimedia/bin/mysql_install_db. This creates the necessary administrative files in /a/sqldata.
# Edit /etc/my.cnf and change:
#* the server-id variable, to a unique ID derived from the IP address
#* innodb_buffer_pool_size, to around 80% of physical memory
#* Note that many settings need to be changed if MyISAM is to be used, e.g. for external storage.
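
The two my.cnf edits above are easy to compute. The page does not specify how the server ID is derived from the IP address, so the octet-packing scheme below is only an illustrative assumption:

```python
# Assumed convention (not documented on this page): pack the four IPv4
# octets into one integer to get a unique server-id, and size the InnoDB
# buffer pool at roughly 80% of physical memory.
def server_id_from_ip(ip):
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def buffer_pool_bytes(physical_memory_bytes, fraction=0.8):
    return int(physical_memory_bytes * fraction)

print(server_id_from_ip("10.0.6.22"))   # 167773718
print(buffer_pool_bytes(16 * 1024**3))  # 80% of 16 GiB, in bytes
```

Any scheme works as long as every server in the replication topology ends up with a distinct ID.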
| + | |||
| + | === Copying the data directory === | ||
| + | |||
* Depool a slave from the same cluster; it will be used as the data source.
* Shut down MySQL on it.
* Find the following settings in /etc/my.cnf on the source server and copy them to the destination server:
** innodb_data_file_path. The names and sizes of the data files must match.
** innodb_log_file_size. Note that if this is small (<100MB) it can cause performance problems. You can change the log file size after a clean shutdown by deleting the ib_logfile* files.[http://mysqldatabaseadministration.blogspot.com/2007/01/increase-innodblogfilesize-proper-way.html]
* Copy the data. Various methods have been tried for this. Rsync is good but has some nasty performance cases that can cause problems. An uncompressed tar pipe over netcat is probably best. First start the receiver:

 nc -l 9999 | tar -xv -C /a

Then start the sender:

 tar -cv -C /a sqldata | nc ''<dest_server>'' 9999

This should work with <tt>netcat-openbsd</tt>, available on Ubuntu; other flavours may require other command-line options to avoid hanging on to the connection after EOF. If the server you are on has <tt>netcat-traditional</tt>, run <tt>apt-get install netcat-openbsd</tt>.
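
The tar pipe gives no end-to-end verification. As a quick sanity check (a sketch, not part of the documented procedure), you can compare file names and sizes on the two sides after the copy:

```python
import os

def tree_sizes(root):
    """Map each file's path relative to root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes

def copy_mismatches(src_root, dest_root):
    """Relative paths that are missing on one side or differ in size."""
    src, dest = tree_sizes(src_root), tree_sizes(dest_root)
    return sorted(p for p in src.keys() | dest.keys()
                  if src.get(p) != dest.get(p))
```

Run tree_sizes over /a/sqldata on both hosts and diff the results; an empty mismatch list is necessary but not sufficient evidence of a good copy.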
| + | |||
| + | == Editing db.php == | ||
| + | |||
| + | <tt>db.php</tt> is MediaWiki's configuration file for database connections. It sets a variable called $wgLBFactoryConf. | ||
| + | |||
=== Common tasks ===

; Depooling : If the server is down, comment it out in sectionLoads and (if necessary) groupLoadsByDB. To remove load from an overloaded but up server, set its load ratio in sectionLoads to zero. A server which is in sectionLoads but has zero load will still delay maintenance scripts: they will wait for it to have low replication lag.

; Load tuning : The load numbers in sectionLoads are ratios. They can be converted to a load fraction by dividing by the sum of loads in that section. For example, with (db1 => 100, db2 => 200), db1 will receive 33% of the load and db2 will receive 67%. Tweak these values to balance the I/O load as seen in [http://ganglia.wikimedia.org Ganglia].
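
The load-fraction arithmetic can be sketched as follows (illustrative only; the actual server selection happens inside MediaWiki's load balancer):

```python
def load_fractions(section_loads):
    """Convert sectionLoads ratios into fractions of total read load."""
    total = sum(section_loads.values())
    return {server: load / total for server, load in section_loads.items()}

fractions = load_fractions({"db1": 100, "db2": 200})
# db1 gets ~33% of the load, db2 gets ~67%
```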
| + | |||
| + | ; Master switch : The first server in a given section in sectionLoads is the master, so you can change the master by changing the order of servers or commenting out the first server. See [[switch master]] for the full procedure. The master should generally have zero read load, because the load from applications requiring a master connection will be substantial. | ||
| + | |||
| + | ; Reclaiming disk space : See [[Reclaim space on full db servers]]. | ||
| + | |||
=== Comprehensive documentation ===

The primary source of documentation is includes/db/LBFactory_Multi.php, copied below:

; sectionsByDB : A map of database names to section names.

; sectionLoads : A 2-d map. For each section, gives a map of server names to load ratios. For example: array( 'section1' => array( 'db1' => 100, 'db2' => 100 ) )

; serverTemplate : A server info associative array as documented for [[mw:Manual:$wgDBservers|$wgDBservers]]. The host, hostName and load entries will be overridden.

; groupLoadsBySection : A 3-d map giving server load ratios for each section and group. For example: array( 'section1' => array( 'group1' => array( 'db1' => 100, 'db2' => 100 ) ) )

; groupLoadsByDB : A 3-d map giving server load ratios by DB name.

; hostsByName : A map of host name to IP address.

; externalLoads : A map of external storage cluster name to server load map.

; externalTemplateOverrides : A set of server info keys overriding serverTemplate for external storage.

; templateOverridesByServer : A 2-d map overriding serverTemplate and externalTemplateOverrides on a server-by-server basis. Applies to both core and external storage.

; templateOverridesByCluster : A 2-d map overriding the server info by external storage cluster.

; masterTemplateOverrides : An override array for all master servers.

; readOnlyBySection : A map of section name to read-only message. Missing or false for read/write.
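
Conceptually, sectionsByDB and sectionLoads combine like this: look up the section for a wiki database, then pick a replica with probability proportional to its load ratio. The sketch below is illustrative only (the database names and layout are hypothetical; the real logic lives in LBFactory_Multi.php):

```python
import random

# Hypothetical layout for illustration.
sections_by_db = {"enwiki": "s1", "frwiki": "s2"}
section_loads = {
    "s1": {"db1": 0, "db2": 100, "db3": 200},  # db1: master, zero read load
    "s2": {"db4": 0, "db5": 100},
}

def pick_replica(db_name):
    """Resolve db_name to its section, then make a weighted replica choice."""
    loads = section_loads[sections_by_db[db_name]]
    replicas = [s for s, w in loads.items() if w > 0]
    return random.choices(replicas, weights=[loads[s] for s in replicas])[0]
```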
| + | |||
| + | == Replication lag == | ||
| + | |||
| + | When a transaction completes on the master database server, the transaction is written to the binlog, and then the slave servers begin executing it. Replication lag is the time from when the binlog entry was written, to the time the slave server finishes executing that entry. | ||
| + | |||
| + | Replication lag is a performance problem, and so the solution is potentially complex, involving software optimisation or hardware expansion. | ||
| + | |||
| + | === Long-running reads === | ||
| + | |||
| + | The most common cause is when a long-running read query executes on the slave, thrashing the disk and causing all other queries (including write queries) to execute slowly. The more severe incarnation of this is when large numbers of heavy read queries are queued up on all slaves, all slaves become lagged, and the master dies due to stale data failover in MediaWiki. | ||
| + | |||
| + | To diagnose this, run the SQL query <tt>SHOW FULL PROCESSLIST</tt> and save the result for later analysis. Look queries that have been running for longer than a few minutes. Identify common features of these queries, such as the function comment. Note that if the server is running very slowly, innocuous queries will become long-running. Generally the longest-running of all queries is the culprit. | ||
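
For offline analysis of a saved processlist, a filter like the following is enough (a sketch; the tuple layout is an assumption, so adjust the indices to your actual column order):

```python
# Rows are assumed to be (id, time_in_seconds, query) tuples; the real
# SHOW FULL PROCESSLIST output has more columns.
def long_running(rows, threshold_seconds=300):
    return [row for row in rows if row[1] > threshold_seconds]

rows = [
    (101, 3, "SELECT /* Article::view */ ..."),
    (102, 1800, "SELECT /* BadClass::overloadServers */ ..."),  # hypothetical
]
suspects = long_running(rows)  # only query 102 qualifies
```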
| + | |||
| + | The best emergency response to this, when general site performance is affected, is to first kill all the long-running queries. You can do this with shellscript: | ||
| + | |||
| + | <pre> | ||
| + | hosts="ixia db8 lomaria thistle" | ||
| + | pattern="BadClass::overloadServers" | ||
| + | for host in $hosts; do | ||
| + | mysql -h $host -e 'show processlist' | grep "$pattern" | awk '{print "kill",$1,";"}' | mysql -h $host | ||
| + | done | ||
| + | </pre> | ||
| + | |||
| + | This typically buys you a few minutes while the users resubmit their queries. Use these few minutes to identify the culprit feature, to patch it out and deploy. You can buy a few minutes more by running the kill script in a loop. | ||
| + | |||
=== General overload ===

If SHOW PROCESSLIST shows large numbers of read queries running, but the read queries are just the usual MediaWiki traffic and nothing new or special stands out, this suggests a general overload. Check Ganglia, iostat, etc. for the server in question. If one server is especially overloaded, reduce its load ratio. If all slaves are overloaded, buy more hardware.

=== High write rate ===

If SHOW PROCESSLIST shows few running read queries, but the writer thread is constantly busy and the replication lag is increasing, this implies a write-driven overload.

(Note that if lag is decreasing in this scenario, it implies that some previous event caused the replication lag and that the server is recovering.)

The solution is to identify the source of the writes and throttle them appropriately. Maintenance scripts, for instance, need to do short-running queries and to call wfWaitForSlaves(5); regularly.

Hardware and configuration problems can also be responsible. A server with write-behind caching disabled (either due to BIOS configuration or a dead RAID controller battery) will have very slow write performance and so will tend to exhibit write-driven lag.

Similarly, servers with [http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit innodb_flush_log_at_trx_commit]=1 will be unacceptably slow for our write load; it should be set to zero instead.
| + | |||
| + | === Broken replication === | ||
| + | |||
| + | Broken replication can be identified by a constantly increasing replication lag with no replicated write queries running. SHOW SLAVE STATUS will typically show an error message which allows you identify the problem in more detail. | ||
| + | |||
| + | Connection errors, when the slave fails to connect to the master, are a common problem for new slaves. These show up in the server's error log. | ||
| + | |||
| + | Unusual errors on established slaves may indicate data corruption. After identifying the source of the problem, the simplest solution is to wipe slave's hard drive and resynchronise it from another slave. | ||
| + | |||
== Maintenance recipes ==
=== Clear out old binlogs to free up space ===
Sometimes binlogs don't get automatically expunged. To safely delete old unneeded binlogs (it is good to keep 5-7 days' worth if you can):
* Look at the timestamps of the binlogs and decide the oldest one you want to keep (e.g. .010).
* Purge all older logs:
 PURGE BINARY LOGS TO 'mysql-bin.010';
This command can be safely run on a master. More information can be found in the [http://dev.mysql.com/doc/refman/5.0/en/purge-binary-logs.html MySQL documentation for PURGE BINARY LOGS].
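
Choosing the cutoff can be scripted. The helper below is a hypothetical sketch (not an existing tool): given binlog names and modification times, it returns the oldest name to keep, i.e. the argument for PURGE BINARY LOGS TO:

```python
import time

def purge_target(binlogs, keep_days=7, now=None):
    """binlogs: dict mapping binlog name -> mtime in epoch seconds.
    Returns the oldest name to KEEP (pass it to PURGE BINARY LOGS TO),
    or None if every binlog is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    keep = sorted(name for name, mtime in binlogs.items() if mtime >= cutoff)
    return keep[0] if keep else None
```

Lexicographic sorting of the names is sufficient here because MySQL zero-pads binlog suffixes to a fixed width.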
| + | |||
| + | == misc. snippets == | ||
| + | === show database / table sizes === | ||
| + | ==== size of one database, tables in detail ==== | ||
| + | <pre> | ||
| + | SELECT TABLE_NAME, table_rows, data_length, index_length, | ||
| + | round(((data_length + index_length) / 1024 / 1024),2) "Size in MB" | ||
| + | FROM information_schema.TABLES WHERE table_schema = "schema_name"; | ||
| + | </pre> | ||
| + | ==== all databases ==== | ||
| + | <pre> | ||
| + | SELECT table_schema "Data Base Name", SUM( data_length + index_length) / 1024 / 1024 | ||
| + | "Data Base Size in MB" FROM information_schema.TABLES GROUP BY table_schema ; | ||
| + | </pre> | ||
| + | [http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/] | ||
| − | |||
| − | |||
| − | |||
[[Category:Software]] | [[Category:Software]] | ||
| + | [[Category:MySQL]] | ||