Setting up a MySQL replica

Latest revision as of 22:07, 1 May 2012

Notice! this document is intended for building new production wiki databases. Following it while trying to build a one-off instance for a random app in your labs instance may result in untold suffering. It is a violation of federal law to use this product in a manner inconsistent with its labeling.

New Server Setup

Hardware Raid

  • Ensure hardware raid is properly configured (raid-10, 256k stripe, writeback cache, no read ahead)
    • Verify with MegaCli
      • root@db59:~# MegaCli -LDInfo -lALL -aALL | egrep "Level|Current Cache|Stripe"
      • RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
      • Stripe Size: 256kB
      • Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
    • If cache policy says 'ReadAdaptive' instead of 'ReadAheadNone', run MegaCli -LDSetProp NORA -Lall -Aall
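The checks above can be scripted. A minimal sketch, assuming the MegaCli output format shown above; the helper name is hypothetical:

```shell
# Hypothetical helper: validate a captured `MegaCli -LDInfo -lALL -aALL`
# dump against the expected settings (256k stripe, writeback cache,
# no read-ahead). Takes the path of a file holding the MegaCli output.
check_raid() {
    local out="$1"
    grep -q "Stripe Size: 256kB" "$out" &&
    grep -q "ReadAheadNone" "$out" &&
    grep -q "WriteBack" "$out"
}

# Example: MegaCli -LDInfo -lALL -aALL > /tmp/ld.txt
#          check_raid /tmp/ld.txt || echo "fix raid settings"
```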

XFS

  • A newly imaged server should have LVM set up correctly, with /dev/mapper/tank-data mounted at /a. However, since lucid doesn't allow us to pass mkfs.xfs options at install time, /a needs to be reformatted with the correct sunit/swidth settings. Mount options in fstab also need to be changed.
    • umount /a
    • mkfs.xfs -f -d sunit=512,swidth=4096 -L data /dev/mapper/tank-data
    • edit /etc/fstab, change the mount options for /a from defaults to noatime,nobarrier
    • mount /a
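The fstab edit can be done non-interactively. A sketch assuming the device/mountpoint layout described above; the helper name is hypothetical and the field spacing on a given host may differ:

```shell
# Hypothetical helper: flip the /a mount options from `defaults` to
# `noatime,nobarrier` in the given fstab file, leaving other fields alone.
fix_fstab() {
    sed -i 's#^\(/dev/mapper/tank-data[[:space:]]\+/a[[:space:]]\+xfs[[:space:]]\+\)defaults#\1noatime,nobarrier#' "$1"
}

# Example: fix_fstab /etc/fstab   # then remount with `mount /a`
```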

Puppet

  • Make sure server is in the correct groups in site.pp

       node /db6[0-9]\.pmtpa\.wmnet/ {
       include db::core,
               mysql::mysqluser,
               mysql::datadirs,
               mysql::conf,
               mysql::packages }
  • Make sure the server is defined in the correct shard section in mysql.pp

       #######################################################################
       ### Cluster Definitions - update if changing / building new dbs
       #######################################################################
       if $hostname =~ /^db(12|32|36|38|42|52|53|59|60|1001|1017|1033|1043|1047)$/ {
               $db_cluster = "s1"
       }

  • If the server is intended to be an lvm snapshot host, add here in mysql.pp
    • Do this last, after replication is configured and caught up.

       #######################################################################
       ### LVM snapshot hosts - currently only puppet managed in eqiad
       #######################################################################
       if $hostname =~ /^db(24|25|26|32|33|44|46|1005|1007|1018|1020|1022|1033|1035)$/ {
               $snapshot_host = true
       }

  • puppetd --test on the new server
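Before running puppet, it's worth confirming the new hostname actually matches the shard regex. A sketch using the s1 pattern from mysql.pp above; the helper name is hypothetical:

```shell
# Check a short hostname against the s1 cluster regex from mysql.pp.
in_s1() {
    echo "$1" | grep -Eq '^db(12|32|36|38|42|52|53|59|60|1001|1017|1033|1043|1047)$'
}

# Example: in_s1 "$(hostname)" || echo "not in s1 -- update mysql.pp first"
```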

Transferring Data

On New/Intended Slave

  • ensure puppet runs cleanly
  • cd /a/sqldata
  • nc -l 9210 | tar xvi
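The `i` flag matters here: the streamed backup contains zeroed padding blocks between members, and without `--ignore-zeros` tar stops at the first end-of-archive marker. A local demonstration with two concatenated archives:

```shell
# Build two tiny tar archives, concatenate them, and extract with -i:
# both members come out, which is what the `nc | tar xvi` pipeline relies on.
mkdir -p /tmp/demo /tmp/out
echo one > /tmp/demo/file1
echo two > /tmp/demo/file2
tar cf /tmp/a.tar -C /tmp/demo file1
tar cf /tmp/b.tar -C /tmp/demo file2
cat /tmp/a.tar /tmp/b.tar | tar xi -f - -C /tmp/out
ls /tmp/out
```

Without `-i`, only file1 would be extracted.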

On Existing DB

This process is only intended for databases using only InnoDB tables!

  • Select a source slave based on the intended use of the new slave.
    • If you are creating a slave in the active datacenter intended for site production, use either the secondary master in the failover datacenter (which can be found via DNS: host sX-secondary) or an active production slave. This way the replication data will be based on the active write master.
    • If creating a slave in the failover datacenter, or for analytics/non-site use in any datacenter, select a slave in the alternate datacenter. Replication in this case needs to come from the secondary, not the live write master.
  • innobackupex-1.5.1 --stream=tar /a/sqldata --user=root --slave-info | nc NEW-SERVER 9210
    • This results in a brief write lock towards the end of the process while the table definition files (but not data) are copied. For enwiki, this is around 200 MB of data. For this reason, this shouldn't be run on a write master if slaves are available.

On New Slave, After xtrabackup

Do this before starting mysql!

  • innobackupex-1.5.1 --apply-log --use-memory=22G --ibbackup=xtrabackup_51 /a/sqldata
  • chown -R mysql: /a
  • /etc/init.d/mysql start
    • tail the error log in /a/sqldata and check for any errors
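The "check for errors" step can be mechanized. A sketch with a hypothetical helper name, assuming mysqld tags failures with `[ERROR]` in the log:

```shell
# Succeeds only if the given mysqld error log contains no [ERROR] lines.
log_clean() {
    ! grep -q '\[ERROR\]' "$1"
}

# Example: log_clean /a/sqldata/HOSTNAME.err || tail -n 50 /a/sqldata/HOSTNAME.err
```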

Starting Replication

  • Check /a/sqldata/xtrabackup_slave_info on the new host; it provides a partial CHANGE MASTER statement based on whichever host the target was replicating from. If this is correct, use it, adding the MASTER_HOST and repl account/password portions that are missing.
  • If building a prod slave and you had to run the hot backup from the master, use the information in /a/sqldata/xtrabackup_binlog_info, as xtrabackup_slave_info won't exist.
  • If you are building a failover slave, or something for analytics and copied from the secondary master, xtrabackup_slave_info will point to the active master; use xtrabackup_binlog_info instead.
    • Important: If a production slave, this should point to the correct binlog position on the actual write master at which to start replication. If creating a slave in the failover datacenter or for analytics, make sure you have replication information for the second level master. Do not point these dbs to the active master!
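xtrabackup_slave_info typically holds a partial statement of the form `CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...`. A sketch of completing it; the helper name is hypothetical and the host/user/password values below are placeholders:

```shell
# Append the missing MASTER_HOST and repl credentials to the partial
# CHANGE MASTER statement from xtrabackup_slave_info, and print the
# finished SQL for review before feeding it to mysql.
build_change_master() {
    local info="$1" host="$2" user="$3" pass="$4"
    local partial
    partial=$(tr -d '\n' < "$info")
    printf "%s, MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s';\n" \
        "${partial%;}" "$host" "$user" "$pass"
}

# Example (placeholder values -- review the output before piping to mysql):
# build_change_master /a/sqldata/xtrabackup_slave_info 10.0.6.12 repl SECRET
```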

Post Replication

  • Once everything is caught up, restart gmond: /etc/init.d/ganglia-monitor restart
  • If part of a cluster that uses heartbeat replication monitoring, restart: /etc/init.d/pt-heartbeat restart

Adding to Production

  • Edit /home/w/common/wmf-config/db.php
    • Make sure the IP address mapping is present in hostsByName[]
    • Add to the appropriate shard in sectionLoads[]
    • Initially add with a low weight to give it time to warm up, and deploy.
    • Ensure the db operates well (check err and slow logs on the db, and db-error.log on fenari) and that replication keeps up under load.
    • Once warm, raise to its full load level, typically the same as other slaves of the same hardware type, or greater depending on capacity.
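For illustration only, a db.php-style fragment with a new slave added at low weight; the host names, IPs, weights, and exact array layout here are placeholders, not the live config:

```php
// Hypothetical excerpt -- the real db.php structure may differ.
'hostsByName' => array(
    'db60' => '10.0.6.60',      // new slave's IP mapping
    // ...
),
'sectionLoads' => array(
    's1' => array(
        'db36' => 200,          // existing slave at full weight
        'db60' => 50,           // new slave, low weight while warming up
    ),
),
```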

Slaves Not In Active Production (db.php)

Make sure they'll receive schema migrations

  • Slaves of the production cluster that don't belong in db.php still need to get schema migrations whenever they are applied in prod in order to keep replicating.
  • Make sure they're added to the correct section in fenari:/home/w/common/wmf-config/db-secondary.php - it is an override for $wgLBFactoryConf['sectionLoads'] pointing only to slaves not used by the live MW site.

Making use-specific tweaks

Some DBs have differences from the normally generated config. These are applied via the puppet-generated my.cnf, and further use-case changes for prod slaves should be made similarly. This outlines what we do for the research slaves.

Research Slave

Note: These tweaks are reflected in puppet within mysql.pp as $research_dbs

  • Hosts: db42, db1047
  • Purpose: slaves of the s1 cluster dedicated to intense queries by the Summer of Research team (aka community intern statisticians).
  • Contact:
  • Tweaks:
    • no binlog
    • innodb-locks-unsafe-for-binlog
    • slave_transaction_retries = very high
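The tweaks above translate to a my.cnf fragment along these lines; option names are per MySQL 5.1, and the retry value is illustrative rather than the production setting:

```ini
[mysqld]
skip-log-bin                             # no binlog
innodb_locks_unsafe_for_binlog = 1
slave_transaction_retries      = 100000  # "very high"
```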