User:Dan Nessett/Technical/Notes on setting up CZ clones

Notes

Creating a CZ clone

Had to modify createAndPromote.php to check password for validity before creating user. Otherwise, if the password is invalid a "ghost" user is created in mwuser table.
Had to create a dummy user with user_id of 0, so XML dump import would work.
Need to set Xdebug variables for both apache2/php and php-cli versions of php.ini
Had to set xdebug.max_nesting_level=200 in /etc/php5/cli/php.ini so dump import wouldn't croak.
Some useful information on MW XML dumps: http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg01712.html, http://www.gossamer-threads.com/lists/wiki/wikitech/180598, http://meta.wikimedia.org/wiki/Data_dumps, http://meta.wikimedia.org/wiki/Xml2sql
Used cloc to count lines of PHP code in CZ:

Directory	Files	Blank Lines	Comments	PHP code statements
CZ phase 3	1005	56590	69544	460125
CZ includes	321	14769	33313	97375
CZ extensions	142	3769	6742	27350
CZ includes+extensions	463	18583	40055	124725

Using importDump.php in /maintenance, I populated a version of CZ as a local development environment. The Statistics special page showed in excess of 129,000 pages. The import reported populating 116,400 pages (looking at the pages table, the exact number is 116,486). This checks out, since the daily dump of CZ does not include histories. There are approximately 12,700 live articles, each of which would have a history page. Since 116,500 + 12,700 = 129,200, it appears all content pages were loaded. However, the import took more than 3 1/2 days (about 80 hours). This suggests looking at more efficient import strategies (e.g., using mwdumper or converting to SQL with xml2sql and importing directly into the database).
Had trouble getting skins to work. I needed to set $wgScriptPath to /mediawiki/CZ_1_13_2/phase3. Originally had it set to $IP. But, that expands to /usr/local/src/mediawiki/CZ_1_13_2/phase3, which is not accessible through the apache2 server. The correct value uses the /mediawiki apache2 alias.
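The page-count check for the dump import noted above can be reproduced with shell arithmetic. This is only a sketch using the figures quoted in these notes (the exact pages-table count, the approximate number of live articles, and the approximate 80-hour duration). The variable names are made up for illustration.

```shell
#!/bin/bash
# Figures quoted in the notes; the last two are approximations.
imported_pages=116486   # rows in the pages table after importDump.php finished
history_pages=12700     # roughly one history page per live article
import_hours=80         # approximate duration of the import

# Expected total on the Statistics special page.
expected_total=$((imported_pages + history_pages))

# Rough import throughput (integer division, so the result is rounded down).
pages_per_hour=$((imported_pages / import_hours))

echo "expected Statistics total: $expected_total"
echo "import rate: about $pages_per_hour pages per hour"
```

The total comes out just over 129,000, in line with what the Statistics page showed, and the rate works out to well under one page per second, which is the motivation for trying mwdumper or xml2sql.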

I now need to run maintenance/runJobs.php. The statistics page shows 272,975 queued jobs, so running all queued jobs is going to take a while. Dan Nessett 22:39, 23 November 2009 (UTC)

Had trouble getting texvc to work:

The message "failed to parse cannot write to or create math temp directory" signals problems with permissions on the images directory in phase3.
Need to ensure images directory has both a /math and /tmp subdirectory with read/write access and the images directory is accessible to the apache2 server (I simply chmod 777 both of them).
Originally had $wgUploadPath set to "$IP/images". This is incorrect. This variable must be set to a URL prefix that is accessible to the apache2 server. Set it to "$wgScriptPath/images" and TeX math worked.
Ran into a strange problem where no matter how I changed the permissions on images/math and images/tmp, the message "failed to parse cannot write to or create math temp directory" appeared. Somehow this message stopped showing up. I don't know exactly why, but perhaps you need to clear the browser cache.
I tried putting the directories/files in images into the www-data subgroup and owned by www-data and then changing permissions on everything below images to 775. However, subversion needs to get to locks in this directory tree (even when images has the svn:ignore property). So, while math rendering worked, when I committed changes to the repository, subversion failed on attempting to create a lock in images/.svn. So, I finally gave up and executed sudo chmod -R 777 images. This seems to fix all math rendering and subversion problems, but it is very insecure.
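The directory setup described above can be sketched as two small shell functions. This is an illustration rather than the exact commands that were run: the function names are hypothetical, the 777 mode mirrors the insecure workaround in these notes (acceptable only on a local development machine), and the check tests writability for the current user, not for the www-data account that apache2 runs under.

```shell
#!/bin/bash
# prepare_math_dirs IMAGES_DIR
# Creates the math and tmp subdirectories that texvc needs and opens up
# their permissions. Mode 777 is the workaround from the notes: insecure.
prepare_math_dirs() {
    local images_dir="$1"
    mkdir -p "$images_dir/math" "$images_dir/tmp" || return 1
    chmod 777 "$images_dir" "$images_dir/math" "$images_dir/tmp"
}

# check_math_dirs IMAGES_DIR
# Reports whether each subdirectory exists and is writable by the user
# running this script. Returns non-zero if either one is not.
check_math_dirs() {
    local images_dir="$1" dir status=0
    for dir in "$images_dir/math" "$images_dir/tmp"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return "$status"
}
```

For the clone described here the call would be something like prepare_math_dirs /usr/local/src/mediawiki/CZ_1_13_2/phase3/images, probably under sudo.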

Had trouble getting email to work. Since the installation is intended for local development, I chose to set up only local email. Therefore, every user must have an email address of <username>@localhost. When Ubuntu is installed, the exim4 MTA/MDA is installed by default. It is only necessary to set up an email client to receive emails. I used GNOME Evolution (which is also installed by default). In order to set up Evolution to receive local email, I had to configure the account as follows:

Account name: Local Email Account
Full Name: Dan Nessett
Email Address: dnessett@localhost
Server Type (receiving): Local delivery
Configuration (path): /var/mail/dnessett
Server Type (sending): Sendmail

When we have a CZ repository set up, need to exclude some directories in phase3 from version control.

In order to exclude all images in phase3/images from version control (other than those preloaded in icons), set property svn:ignore * on that directory.
Svn copy LocalSettings.php into config (after potentially locally deleting any existing version of that file there). Then svn delete LocalSettings.php in phase3. Set properties on phase3 to include "svn:ignore LocalSettings.php". Commit these changes. Then locally (not using svn) copy LocalSettings.php from config to phase3. This effectively removes LocalSettings.php from version control. So, local developers can make modifications to it and commit other changes without saving LocalSettings.php to the repository. If it ever becomes necessary to change the repository version of LocalSettings.php, the developer should merge changes in phase3/LocalSettings.php into config/LocalSettings.php and then commit the changes.
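The final local (non-svn) copy step above could be scripted so that a fresh checkout gets a working LocalSettings.php without overwriting a developer's local edits. This is a minimal sketch. The function name is hypothetical, and it assumes that config is the config subdirectory of phase3, which the notes do not state explicitly.

```shell
#!/bin/bash
# restore_local_settings PHASE3_DIR
# Copies PHASE3_DIR/config/LocalSettings.php (the version-controlled copy)
# to PHASE3_DIR/LocalSettings.php (the ignored working copy), but only when
# no working copy exists yet, so local modifications are never overwritten.
# Assumption: config lives directly under phase3.
restore_local_settings() {
    local phase3="$1"
    local src="$phase3/config/LocalSettings.php"
    local dst="$phase3/LocalSettings.php"

    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -f "$dst" ]; then
        echo "keeping existing $dst"
        return 0
    fi
    cp "$src" "$dst" && echo "copied $src to $dst"
}
```

Merging local changes back into config/LocalSettings.php is deliberately left as a manual step, as the notes describe.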

When ftp transferring a file created by svnadmin dump, make sure the transfer type is set to binary. Otherwise, when you attempt to import it, you will get an error like "svnadmin: Dump stream contains a malformed header (with no ':') at:". Also, when loading the dump, use svnadmin load --ignore-uuid /path/to/repository < dumpfile. This will ensure the UUID specified in the dump file does not clobber the repository's existing UUID (this will happen if the repository being loaded has no revisions in it).
The command used to dump the cz database is: pg_dump cz | gzip > cz_dump.gz. This resulted (on 1-15-2010) in a 154MB file. Restore with gunzip -c cz_dump.gz | psql cz
The daily CZ data dump is located at: http://en.citizendium.org/wiki/CZ:Downloads
The bz2 version is uncompressed using the following command: bunzip2 cz.dump.current.xml.bz2
To import the current data dump, cd to /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance. If the data dump file is in home folder, import the dump using: php importDump.php ~/cz.dump.current.xml.
After importing the dump, in the maintenance directory execute: php refreshLinks.php. This will create a lot of jobs. When refreshLinks completes, in the maintenance directory execute php runJobs.php > ~/runJobs.log 2>&1 (the file redirection must come before 2>&1, otherwise error output still goes to the terminal). Running this utility will take a very long time. To reduce the elapsed time, run several instances of this utility at once. Here is a shell script that starts up 20 instances.

#!/bin/bash
# Start 20 runJobs.php instances in parallel, each with its own log file.
cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance || exit 1
for i in $(seq 1 20); do
    # Send stdout to the log first, then point stderr at the same place.
    php runJobs.php > ~/runJobs.log$i 2>&1 &
done
# Block until every background instance has finished.
wait

After running runJobs, in the maintenance directory run initStats.php --update. This will update the Statistics special page.
Importing compressed DB dump took ~21 minutes on Dual 1.8 GHz processor system with 4 GB of storage.
Loading IE6 under wine. First tried the directions at: http://www.howtoforge.com/how-to-install-internet-explorer-on-ubuntu8.04. These ended in an error. Then tried: http://ubuntumanual.org/posts/171/install-internet-explorer-in-ubuntu-the-easiest-way. These worked. However, when I tried to install ie5.5 or ie5.0, Python seg-faulted.
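For the svnadmin and pg_dump files mentioned in the list above, one way to confirm that a transfer did not alter the file (for example, an ftp transfer left in ASCII mode) is to compare checksums from both machines. The sketch below uses the POSIX cksum utility. The function names are hypothetical and not part of any of the tools discussed here.

```shell
#!/bin/bash
# file_checksum FILE
# Prints the CRC and byte count of FILE. Reading from stdin keeps the
# file name out of the output, so results from two machines compare cleanly.
file_checksum() {
    cksum < "$1"
}

# same_file_contents A B
# Succeeds only when both files have the same CRC and the same size.
same_file_contents() {
    [ "$(file_checksum "$1")" = "$(file_checksum "$2")" ]
}
```

Run file_checksum on the dump before sending it and again on the receiving machine. If the two lines differ, repeat the transfer in binary mode before attempting svnadmin load or a database restore.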

Setting up a local development environment

Netbeans 6.8 plus the Java SE 6 Development Kit (JDK) package is at: NB6.8 + JDK.

Managing subversion source code

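To delete all .svn directories (and their contents) from a source tree, execute: find . -type d -name ".svn" -prune -exec rm -rf {} \; (the -prune keeps find from trying to descend into directories it has just deleted). This is adapted from http://snippets.dzone.com/posts/show/2486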
Nice subversion cheat sheet: http://www.abbeyworkshop.com/howto/misc/svn01/
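Since the delete is not reversible, it may help to list the .svn directories before removing them. The following is a minimal sketch along those lines. The function names are hypothetical, and it uses -prune so that find does not descend into the directories it is about to remove.

```shell
#!/bin/bash
# list_svn_dirs ROOT
# Prints every .svn directory below ROOT without descending into them.
list_svn_dirs() {
    find "$1" -type d -name .svn -prune -print
}

# remove_svn_dirs ROOT
# Deletes every .svn directory (and its contents) below ROOT.
remove_svn_dirs() {
    find "$1" -type d -name .svn -prune -exec rm -rf {} +
}
```

Typical use is list_svn_dirs . to review the list, then remove_svn_dirs . once it looks right. When the goal is simply a clean copy of the tree, svn export produces one without any .svn directories.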

Some useful information

importDump may create some jobs to run. By default, a job is taken from the job queue and run on each page access. However, for a clone this is not going to empty the job queue very quickly, hence the shell script given above, which runs the queued jobs directly. Running that script will not tell you how many jobs remain, though. Another script, showJobs.php, will do this. Run it by executing in a terminal window:

cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
php showJobs.php

An even more useful way to show job queue processing progress is to run showJobs periodically using the watch command:

cd /usr/local/src/mediawiki/CZ_1_13_2/phase3/maintenance
watch --interval=10 php showJobs.php
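When something should happen once the queue is empty (for example, running initStats.php), a polling loop can stand in for watch. The sketch below is an assumption-laden illustration. The function name is hypothetical, and it assumes the command it is given prints nothing but the number of remaining jobs, which is what showJobs.php appears to do in this MediaWiki version.

```shell
#!/bin/bash
# wait_for_empty_queue INTERVAL COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds and returns 0 once it prints 0.
# Assumption: COMMAND prints only a job count (as showJobs.php is believed to).
wait_for_empty_queue() {
    local interval="$1"
    shift
    local remaining
    while true; do
        remaining="$("$@")" || return 1
        remaining="${remaining//[[:space:]]/}"   # drop newlines and blanks
        case "$remaining" in
            ''|*[!0-9]*)
                echo "could not read a job count" >&2
                return 1
                ;;
        esac
        echo "$(date '+%H:%M:%S') jobs remaining: $remaining"
        if [ "$remaining" -eq 0 ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

From the maintenance directory this would be called as wait_for_empty_queue 10 php showJobs.php, optionally followed by && php initStats.php --update.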

There is a utility called wikifind that searches wiki XML dumps. This can be used to find MediaWiki markup. The URL is http://meta.wikimedia.org/wiki/User:Micke/WikiFind.

sudo su -
cd /usr/local/src/wikifind
wget http://wikifind.wikispaces.com/space/showimage/wikifind.cpp

To compile:

yum install boost.x86_64
g++ wikifind.cpp -o wikifind -lboost_regex
cp wikifind /usr/bin

To run:

cd <directory where dumpfile is located>
wikifind

Setting up multiple clones on same machine

First check out the CZ code for each duplicate and put each copy in a separate directory. For example, suppose you want two duplicate clones, one for work on the existing CZ code and one for working on the refactoring branch. We will call these two duplicates CZ_1_13_2 and CZ_RF_1_13_2 (these names are also used as the apache2 aliases below). Create two directories in /usr/local/src/mediawiki/. For the purpose of these instructions call them:

/usr/local/src/mediawiki/CZ_1_13_2/
/usr/local/src/mediawiki/CZ_refactor_1_13_2/

Now check out the appropriate subversion code into each. For example, use the following commands:

cd /usr/local/src/mediawiki/CZ_1_13_2
svn co http://svn.citizendium.org/czrepo/trunk/phase3
cd /usr/local/src/mediawiki/CZ_refactor_1_13_2
svn co http://svn.citizendium.org/czrepo/branches/CZ_refactor_1_13_2/phase3

Add an alias for each duplicate clone. For example, for the two clones CZ_1_13_2 and CZ_RF_1_13_2, add the following lines to the site configuration file in the sites-enabled directory of /etc/apache2/ (Ubuntu).

Alias /CZ_1_13_2 "/usr/local/src/mediawiki/CZ_1_13_2/phase3/index.php"
Alias /CZ_RF_1_13_2 "/usr/local/src/mediawiki/CZ_refactor_1_13_2/phase3/index.php"

These should be contained within the <VirtualHost> tag pair.

Follow the instructions on How_to_create_a_CZ_clone (section Configuring the CZ wiki software) for each duplicate, but before testing each, make the following edits to LocalSettings.php:

$wgScriptPath        = "/mediawiki/<directory_name>/phase3";
$wgScript            = "/<alias_name>/wiki";

For example for the CZ_1_13_2 clone the edits would be:

$wgScriptPath        = "/mediawiki/CZ_1_13_2/phase3";
$wgScript            = "/CZ_1_13_2/wiki";

and for the CZ_RF_1_13_2 clone the edits would be:

$wgScriptPath        = "/mediawiki/CZ_refactor_1_13_2/phase3";
$wgScript            = "/CZ_RF_1_13_2/wiki";

Save these edits and access each duplicate by their aliases:

http://localhost/CZ_1_13_2
http://localhost/CZ_RF_1_13_2
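Because the alias name and the directory name differ for the refactoring clone, it is easy to mix them up when editing by hand. The sketch below prints the Apache Alias line and the two LocalSettings.php edits for a given alias and directory pair. The function names and the src_root variable are hypothetical, and the output still has to be pasted into the respective files manually.

```shell
#!/bin/bash
# Root directory that holds all clone checkouts (taken from these notes).
src_root=/usr/local/src/mediawiki

# clone_alias_line ALIAS DIRECTORY
# Prints the Alias directive for the apache2 site configuration.
clone_alias_line() {
    printf 'Alias /%s "%s/%s/phase3/index.php"\n' "$1" "$src_root" "$2"
}

# clone_settings_lines ALIAS DIRECTORY
# Prints the two LocalSettings.php assignments for the clone.
clone_settings_lines() {
    printf '$wgScriptPath = "/mediawiki/%s/phase3";\n' "$2"
    printf '$wgScript = "/%s/wiki";\n' "$1"
}

# Example: the two clones used in this section.
clone_alias_line CZ_1_13_2 CZ_1_13_2
clone_alias_line CZ_RF_1_13_2 CZ_refactor_1_13_2
clone_settings_lines CZ_RF_1_13_2 CZ_refactor_1_13_2
```

The first argument is what appears in the browser URL and the second is the directory under /usr/local/src/mediawiki/, matching the examples given above.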

Setting up an MW clone is somewhat different. You first install the software and then access the index.php file. You must do this by referencing the directory in which the software is installed. For example, if you check out the MW software into /usr/local/src/mediawiki/MW_1_13_2, then you must start the install by referencing:

http://localhost/mediawiki/MW_1_13_2/phase3/index.php

This takes you to an install web page telling you that you must install MediaWiki. Click on the link and follow the instructions. Then add the alias you want to use for this wiki. For example, if you wish to call it MW_1_13_2, then you must add the following to the site configuration file in /etc/apache2/sites-enabled (Ubuntu):

Alias /MW_1_13_2 "/usr/local/src/mediawiki/MW_1_13_2/phase3/index.php"

In your browser you would reference this wiki as:

http://localhost/MW_1_13_2
