Parsoid

From Wikitech

Revision as of 23:04, 11 December 2012

Parsoid is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext. VisualEditor fetches the HTML for a given page from Parsoid, edits it, then delivers the modified HTML to Parsoid, which converts it back to wikitext. Parsoid is a stateless HTTP server running on port 8000.

Data flow

Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in extensions/VisualEditor/ApiVisualEditor.php and is invoked with a POST request to /w/api.php?action=ve-parsoid. The API module then sends a request to Parsoid, either GET /$prefix/$pagename to get the HTML for a page, or POST /$prefix/$pagename to submit HTML and get wikitext back. Parsoid itself also issues requests to /w/api.php to get the wikitext of the requested page and to do template expansion.

Once the ve-parsoid API module receives a response from Parsoid, it either relays it back to the client (when requesting HTML), or saves the returned wikitext to the page (when submitting HTML).

                (POST /w/api.php?action=ve-parsoid)              (GET /en/Barack_Obama)                (requests for page content and template expansions)
Client browser ------------------------------------------> API ---------------------------->  Parsoid -----------------------------------------------------> API
    ^                                                      | ^                                 |   ^                                                          |
    |                  (response)                          | |      (HTML)                     |   |                   (responses)                            |
    +------------------------------------------------------+ +---------------------------------+   +----------------------------------------------------------+


                (POST /w/api.php?action=ve-parsoid)              (POST /en/Barack_Obama)
Client browser ------------------------------------------> API ---------------------------->  Parsoid
                                                           | ^                                 |
                                               (save page) | |      (wikitext)                 |
                                                           | +---------------------------------+
                                                           |
                                                        Database
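
The two request shapes described above (GET to fetch HTML, POST to submit HTML and get wikitext back) can be sketched as a small helper. This is only an illustration, not the code in ApiVisualEditor.php: the function name, the base URL parsoid.example and the percent-encoding of the title are assumptions of this sketch, and the format of the POST body is not specified on this page, so it is left out.

```python
from urllib.parse import quote


def build_parsoid_request(base_url, prefix, pagename, html=None):
    """Return (method, url) for a request to Parsoid.

    With html=None this describes a GET that fetches the HTML for a page.
    With html set it describes a POST that submits edited HTML so that
    Parsoid can convert it back to wikitext.
    """
    # Page titles use underscores in place of spaces. Percent-encoding the
    # remaining characters is an assumption of this sketch.
    title = quote(pagename.replace(" ", "_"), safe="")
    url = f"{base_url.rstrip('/')}/{quote(prefix, safe='')}/{title}"
    method = "GET" if html is None else "POST"
    return method, url


if __name__ == "__main__":
    # Hypothetical base URL; the real host is internal to the cluster.
    print(build_parsoid_request("http://parsoid.example:8000", "en", "Barack Obama"))
    print(build_parsoid_request("http://parsoid.example:8000", "en", "Barack Obama",
                                html="<p>edited</p>"))
```

The first call corresponds to the GET /en/Barack_Obama arrow in the first diagram, the second to the POST /en/Barack_Obama arrow in the second diagram.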

Caching and load balancing

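Parsoid is load balanced using LVS. The assigned service IPs are:

* parsoid.svc.pmtpa.wmnet = 10.2.1.28, served by lvs3/lvs4 (list of backends: http://noc.wikimedia.org/pybal/pmtpa/parsoid)
* parsoid.svc.eqiad.wmnet = 10.2.2.28, served by lvs1003/lvs1006 (list of backends: http://noc.wikimedia.org/pybal/eqiad/parsoid)
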
In pmtpa, there is also a Varnish machine (celsus) in front of the LVS group, and MediaWiki is configured to access Parsoid through it. The celsus host also runs Parsoid, but that instance is depooled so that Varnish can use the system's resources. This setup lets us quickly switch celsus from a caching proxy to a Parsoid backend if the pool needs more CPU resources.

         celsus:6081            10.2.1.28:8000                    $selected_backend:8000
MW API  -------------> Varnish ----------------> LVS (lvs3/lvs4) ------------------------> Parsoid
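
As a rough illustration of the request path above, the following sketch picks the address a client would contact in each datacenter. The IPs, ports and the celsus hostname come from this page. The function name, the dictionary layout and the fallback for eqiad (contacting the LVS service IP directly on port 8000 where no Varnish frontend is described) are assumptions of this sketch, not documented configuration.

```python
PARSOID_PORT = 8000  # Parsoid's HTTP port, as stated on this page

# LVS service IPs listed on this page, keyed by datacenter.
LVS_SERVICE_IPS = {
    "pmtpa": "10.2.1.28",
    "eqiad": "10.2.2.28",
}

# Varnish caching proxies in front of LVS. Only pmtpa has one (celsus:6081).
VARNISH_FRONTENDS = {
    "pmtpa": ("celsus", 6081),
}


def parsoid_endpoint(datacenter):
    """Return the base URL a client in this datacenter would contact."""
    if datacenter in VARNISH_FRONTENDS:
        host, port = VARNISH_FRONTENDS[datacenter]
    else:
        # Assumption: with no Varnish frontend, go straight to the LVS
        # service IP on the Parsoid port.
        host, port = LVS_SERVICE_IPS[datacenter], PARSOID_PORT
    return f"http://{host}:{port}"


if __name__ == "__main__":
    for dc in ("pmtpa", "eqiad"):
        print(dc, parsoid_endpoint(dc))
```

Keeping the Varnish frontends in their own table mirrors the point made above: removing celsus from that table is all it takes, in this sketch, to stop treating it as a caching proxy.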