Search
lucene-search is a search extension for MediaWiki based on the Apache Lucene search engine. This page gives some information about the extension and how it is set up in the Wikimedia cluster, and some details about the Lucene search engine.
Overview
Software
The system has two major software components, Extension:MWSearch and lsearchd.
The Lucene version is 2.1 and the JDK is sun-j2sdk1.6_1.6.0+update30.
Extension:MWSearch
Extension:MWSearch is a MediaWiki extension that overrides the default search backend and sends requests to lsearchd.
lsearchd
lsearchd (Extension:Lucene-search) is a versatile Java daemon that can act as a frontend, backend, searcher, indexer, highlighter, spellchecker, and more. We use it for searching, highlighting, and spell-checking, and as an incremental indexer.
Essentials
- configuration files:
- /etc/lsearch.conf - per-host local configuration
- in puppet: pmtpa: puppet/templates/lucene/lsearch.conf, eqiad: puppet/templates/lucene/lsearch.new.conf
- /home/wikipedia/conf/lucene/lsearch-global-2.1.conf - cluster-wide shared configuration.
- in puppet: pmtpa: puppet/templates/lucene/lsearch-global-2.1.conf.pmtpa.erb, eqiad: puppet/templates/lucene/lsearch-global-2.1.conf.eqiad.erb
- started via /etc/init.d/lsearchd in pmtpa and /etc/init.d/lucene-search-2 in eqiad
- search frontend port 8123, index frontend port 8321; backend - RMI (RMI registry port 1099)
- logs in /a/search/logs
- indexes in /a/search/indexes
- jar in /a/search/lucene-search
- test with curl http://localhost:8123/search/enwiki/test
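As a rough health check you can confirm the daemon is listening on the ports listed above before running the curl test (a sketch; search3 is just an example host):

root@search3:~# netstat -tlnp | grep -E ':(8123|8321|1099) '
root@search3:~# curl http://localhost:8123/search/enwiki/test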
Installation
Deployment is now done via puppet and without NFS, by adding the class role::lucene::front-end::(pool[1-4]|prefix).
See http://wikitech.wikimedia.org/view/Search#Cluster_Host_Hardware_Failure for more details on bringing up a host.
Configuration
There is a shared configuration file, /home/wikipedia/conf/lucene/lsearch-global-2.1.conf, that contains information about the roles hosts are assigned in the search cluster. This lets lsearchd daemons communicate with each other to obtain the latest index versions, forward requests if necessary, search over many hosts if the index is split, etc.
The per-host local configuration file is /etc/lsearch.conf. Most importantly, it defines SearcherPool.size, which should be set to the local number of CPUs + 1 if only one index is searched; this prevents CPUs from locking each other out. The other important property is Search.updatedelay, which prevents all searchers from trying to update their working copies of the index at the same time and thus causing noticeable performance degradation.
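For example, a rough way to sanity-check those two properties on a search node (a sketch; search3 is an example host, and the key names are taken from the description above):

root@search3:~# grep -E 'SearcherPool.size|Search.updatedelay' /etc/lsearch.conf
root@search3:~# echo $(( $(nproc) + 1 ))   # suggested SearcherPool.size when only one index is searched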
Indexing
In pmtpa, searchidx2 is the indexer. In eqiad, searchidx1001 is the indexer.
- the search indexer serves as the indexer for the cluster
- the search indexer's lsearchd daemon is configured to act as an indexer, alongside a second process, the incremental updater
- other indexing jobs, like indexing private wikis, spell-check rebuilds, etc., are in rainman's crontab on the search indexer
- the search indexer runs rsyncd to allow cluster members to fetch indexes
- other cluster hosts fetch indexes by rsync every 30 seconds, as defined by Search.updateinterval in lsearch-global-2.1.conf
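To spot-check that fetch path from a search node (a sketch; it assumes the indexer's rsyncd answers module listings and that searchidx1001 resolves from the node):

root@search3:~# rsync searchidx1001::                 # list the rsync modules exported by the indexer
root@search3:~# ls -lt /a/search/indexes | head       # recent timestamps show updates are being fetched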
Search Cluster: Shards, Pools, and Load Balancing Oh My!
This section has been derived from the following configuration:
- /home/wikipedia/common/wmf-config/lucene.php
- /home/wikipedia/conf/lucene/lsearch-global-2.1.conf
- /home/wikipedia/conf/pybal/pmtpa/search_pool[1-3]
Index Sharding
We shard search indexes across hosts in the cluster to accommodate index data footprint, hardware limitations, and utilization. As of Feb 2012, indexes were distributed across the cluster as follows: see SearchShards.
Pools
We use a mixture of single-host and multi-host pools to direct requests to the servers that host the appropriate indexes. Where multi-host pools are employed we use pybal/LVS load balancing (running on lvs3) or in-code load balancing. As of Feb 2012 we have the following pool configuration:
| host | mw(?) pool | lvs pool | indexed data |
|---|---|---|---|
| search1 | enwiki | search_pool1 | enwiki.nspart1.sub1 enwiki.nspart1.sub2 |
| search2 | - | - | enwiki.nspart1.sub1.hl enwiki.spell |
| search3 | enwiki | search_pool1 | enwiki.nspart1.sub1 enwiki.nspart1.sub2 |
| search4 | enwiki | search_pool1 | enwiki.nspart1.sub1 enwiki.nspart1.sub2 |
| search5 | - | - | enwiki.nspart1.sub2.hl enwiki.spell |
| search6 | dewiki frwiki jawiki | search_pool2 | dewiki.nspart1 dewiki.nspart2 frwiki.nspart1 frwiki.nspart2 itwiki.nspart1.hl jawiki.nspart1 jawiki.nspart2 |
| search7 | itwiki nlwiki plwiki ptwiki ruwiki svwiki zhwiki | search_pool3 | itwiki.nspart1 nlwiki.nspart1 plwiki.nspart1 ptwiki.nspart1 ruwiki.nspart1 svwiki.nspart1 zhwiki.nspart1 |
| search8 | enwiki.prefix | - | enwiki.prefix |
| search9 | enwiki | search_pool1 | enwiki.nspart1.sub1 enwiki.nspart1.sub2 |
| search10 | - | - | dewiki.spell eswiki.spell frwiki.spell itwiki.spell nlwiki.spell plwiki.spell ptwiki.spell ruwiki.spell svwiki.spell |
| search11 | catch-all | - | *? commonswiki.nspart1 commonswiki.nspart1.hl commonswiki.nspart2 commonswiki.nspart2.hl |
| search12 | - | - | dewiki.|frwiki.|itwiki.|nlwiki.|ruwiki.|svwiki.|plwiki.|eswiki.|ptwiki.|jawiki.|zhwiki.))*.hl enwiki.spell |
| search13 | - | - | enwiki.nspart2* |
| search14 | eswiki | - | enwiki.nspart1.sub1.hl eswiki |
| search15 | dewiki frwiki jawiki | search_pool2 | dewiki.nspart1 dewiki.nspart2 frwiki.nspart1 frwiki.nspart2 itwiki.nspart1.hl itwiki.nspart2 itwiki.nspart2.hl jawiki.nspart1 jawiki.nspart2 nlwiki.nspart1.hl nlwiki.nspart2 nlwiki.nspart2.hl plwiki.nspart2 ptwiki.nspart1.hl ptwiki.nspart2 ptwiki.nspart2.hl ruwiki.nspart1.hl ruwiki.nspart2 ruwiki.nspart2.hl svwiki.nspart2 zhwiki.nspart2 |
| search16 | - | - | dewiki.nspart1.hl dewiki.nspart2.hl eswiki.hl frwiki.nspart1.hl frwiki.nspart2.hl itwiki.nspart1.hl itwiki.nspart2.hl nlwiki.nspart1.hl nlwiki.nspart2.hl plwiki.nspart1.hl plwiki.nspart2.hl ptwiki.nspart1.hl ptwiki.nspart2.hl ruwiki.nspart1.hl ruwiki.nspart2.hl svwiki.nspart1.hl svwiki.nspart2.hl |
| search17 | - | - | dewiki.nspart1.hl dewiki.nspart2.hl eswiki.hl frwiki.nspart1.hl frwiki.nspart2.hl itwiki.nspart1.hl itwiki.nspart2.hl nlwiki.nspart1.hl nlwiki.nspart2.hl plwiki.nspart1.hl plwiki.nspart2.hl ptwiki.nspart1.hl ptwiki.nspart2.hl ruwiki.nspart1.hl ruwiki.nspart2.hl svwiki.nspart1.hl svwiki.nspart2.hl |
| search18 | *.prefix | - | *.prefix |
| search19 | - | - | dewiki.|frwiki.|itwiki.|nlwiki.|ruwiki.|svwiki.|plwiki.|eswiki.|ptwiki.))*.spell enwiki.nspart1.sub1.hl enwiki.nspart1.sub2.hl |
| search20 | - | - | enwiki.nspart1.sub1.hl enwiki.nspart1.sub2.hl |
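To confirm that a host actually answers for an index the table says it carries, you can reuse the curl test from Essentials against that host; the host, index, and search term below are only examples drawn from the table:

curl http://search6:8123/search/dewiki/Berlin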
Administration
Dependencies
[content needed]
Health/Activity Monitoring
[content needed]
Software Updates
The following script will build the latest version of lucene-search and deploy it to all searchers:
/home/rainman/salsa (sync-all-lucene-search)
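Presumably it is run as rainman like the other maintenance scripts on this page (an assumption; the host it should be run from is not recorded here):

sudo -u rainman /home/rainman/salsa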
Stopping and fall back to MediaWiki's search
To disable lucene and fall back to MediaWiki's search, set $wgUseLuceneSearch = false in CommonSettings.php.
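To find where the flag lives, something like the following should work (a sketch; it assumes CommonSettings.php sits in the wmf-config directory listed earlier on this page):

grep -n 'wgUseLuceneSearch' /home/wikipedia/common/wmf-config/CommonSettings.php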
Adding new wikis
When a new wiki is created, an initial index build needs to be made. First restart the indexer on searchidx2 to make sure it knows about the new wiki, then run the build-new script on the appropriate wiki database name (i.e. replace wikidb with the wiki database name, e.g. wikimania2012wiki).
Run on searchidx2 as user rainman:
root@searchidx2:~# sudo -u rainman nohup /home/rainman/scripts/search-restart-indexer
root@searchidx2:~# sudo -u rainman /home/rainman/scripts/build-new wikidb
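Once the build finishes, a rough way to confirm the new index exists and is served (a sketch; wikidb is the same placeholder as above, and the curl assumes you run it against a search node that actually hosts the new index):

root@searchidx2:~# ls /a/search/indexes | grep wikidb
curl http://NODENAME:8123/search/wikidb/test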
Trouble
Main indexer on searchidx2/searchidx1001 is stuck
The search indexers very occasionally fall over. This shows up as the ganglia load/traffic graphs falling to near zero and CPU idle near 100%.
If indexing is stuck on searchidx2, run this script as user rainman (so he can restart it later if necessary):
root@searchidx2:~# sudo -u rainman /home/rainman/scripts/search-restart-indexer
If indexing is stuck on searchidx1001, do the following:
root@searchidx1001:~# killall -g java
root@searchidx1001:~# /etc/init.d/lucene-search-2 start
root@searchidx1001:~# su -s /bin/bash -c "/a/search/lucene.jobs.sh inc-updater-start" lsearch
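After the restart, a couple of hedged sanity checks (the exact file names under /a/search/logs may differ):

root@searchidx1001:~# pgrep -fl java                  # the daemon and the incremental updater should both show up
root@searchidx1001:~# ls -lt /a/search/logs | head    # logs should be getting fresh writes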
Individual lsearchd processes are crashing or nonresponsive
- Try starting the lsearch process in the foreground so you can watch what it does:
start-stop-daemon --start --user lsearch --chuid lsearch --pidfile /var/run/lsearchd.pid --make-pidfile --exec /usr/bin/java -- -Xmx20000m -Djava.rmi.server.codebase=file:///a/search/lucene-search/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar /a/search/lucene-search/LuceneSearch.jar
- Check the log at /a/search/log/log for indications of obvious issues:
root@search3:~# grep "^Caused by" /a/search/log/log | tail -20
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
(oops, we hit java's memory limit)
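If you see OutOfMemoryError entries like the above, it can help to confirm the heap ceiling the daemon was actually started with (a sketch; it just extracts the -Xmx flag from the running java command line):

root@search3:~# ps -eo args | grep '[L]uceneSearch' | tr ' ' '\n' | grep Xmx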
Space Issues on Cluster Host
- Check /a/search/indexes for unintended indexes, i.e. cruft from previous configurations, as the daemon doesn't know to delete indexes that are no longer in use (see the example below).
- You can also create new shards. This involves making a new LVS pool and adding new entries to the hash structure in manifests/roles/lucene.pp.
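For the first point, a quick way to see which index directories are eating the disk (a sketch; sort -h needs a reasonably recent coreutils):

root@search3:~# df -h /a/search
root@search3:~# du -sh /a/search/indexes/* | sort -h | tail -20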
Cluster Host Hardware Failure
- If a host in LVS fails, LVS should depool it automatically, and at least one other host will pick up the load. If the host is not in LVS, and is instead accessed via RMI, then RMI will take care of the depooling.
- To bring up a new node with the same indexes/role, add it to the hash structure in manifests/roles/lucene.pp and into site.pp with the appropriate role class (i.e. the same as the failed node).
- Bring up the new node with puppet, and make sure that the lucene-search-2 daemon is running and that the rsync of the indexes from the indexer has finished (see the check after this list).
- If the node has main namespace indexes, something of the form ??wiki.nspart[12] or ??wiki.nspart[12].sub[12], you can test that it's giving proper responses with something of the form
curl http://NODENAME:8123/search/??wiki/SomeTerm
- If the failed node has main namespace indexes, something of the form ??wiki.nspart[12] or ??wiki.nspart[12].sub[12], then you will need to adjust the pool's pybal configs accordingly (i.e. out with the old, in with the new).
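One hedged way to confirm the index rsync mentioned above has finished before pooling the node: check that no rsync is still running and that the index directories have stopped growing.

root@NODENAME:~# pgrep -fl rsync      # should print nothing once the transfer is done
root@NODENAME:~# du -sh /a/search/indexes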
Indexer Host Hardware Failure
- [what's the procedure for deploying a replacement indexer?]
Excess Load on a Cluster Host
- [which logs etc. to check for evidence of i.e. abuse, configuration issues, etc]