Ejabberd clustering

Some notes about ejabberd clustering (incomplete and probably wrong ;-) )

Set up the first node

Note: If you are already running a single ejabberd node (called ejabberd@localhost), I'd recommend simply dumping mnesia and importing it again: renaming the node is probably more work than it's worth

We will call the first node ejabberd@first.example.com
  • Install erlang and ejabberd
  • Create your ejabberd.cfg file
  • In ejabberdctl.cfg: set INET_DIST_INTERFACE={10,0,0,1} (where 10.0.0.1 is the IP of first.example.com!)
  • Start the node via ejabberdctl --node ejabberd@first.example.com start
  • Check: ejabberdctl --node ejabberd@first.example.com status
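
For reference, the relevant ejabberdctl.cfg lines might look like this (path and IP are just examples; setting ERLANG_NODE saves you the --node flag later):

```shell
# /opt/ejabber/etc/ejabberd/ejabberdctl.cfg (example path)
ERLANG_NODE=ejabberd@first.example.com
INET_DIST_INTERFACE={10,0,0,1}   # IP of first.example.com, as an erlang tuple
```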

Prepare mnesia on the second node

After ejabberd@first.example.com is running, we can start to set up ejabberd on the second node (called ejabberd@second.example.com):

Set up second.example.com

Simply install and configure erlang + ejabberd on second.example.com the same way as on first.example.com. (Note: do not forget to change INET_DIST_INTERFACE on the second node!)
The rest of this example assumes that ejabberd was installed with '--prefix=/opt/ejabber'

Copy the erlang cookie

Erlang uses a special cookie file to 'authorize' cluster nodes, so the same cookie file must be used on the whole cluster. The file is located at /opt/ejabber/var/lib/.erlang.cookie and should be copied from first.example.com to second.example.com
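
In practice that is one scp plus a chmod: erlang refuses to use a cookie file that is readable by anyone but its owner. A small local demonstration of the permission requirement (scratch path, not the real cookie location):

```shell
# On the real systems you would run something like:
#   scp first.example.com:/opt/ejabber/var/lib/.erlang.cookie \
#       second.example.com:/opt/ejabber/var/lib/.erlang.cookie
# The cookie is a single line of text and must be owner-readable only:
mkdir -p /tmp/cookie-demo
printf 'SOMESECRETCOOKIE' > /tmp/cookie-demo/.erlang.cookie
chmod 400 /tmp/cookie-demo/.erlang.cookie
stat -c '%a' /tmp/cookie-demo/.erlang.cookie   # 400 = owner read-only
```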

Set up mnesia replication

Switch to the user used for the ejabberd process and run:
$ cd /opt/ejabber/var/lib/ejabberd
$ export HOME=/opt/ejabber/var/lib/ejabberd
$ erl -name ejabberd@second.example.com -mnesia extra_db_nodes "['ejabberd@first.example.com']" -s mnesia
# you are now in an erlang shell:

Erlang R14B02 (erts-5.8.3) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.3 (abort with ^G)
(ejabberd@second.example.com)1> mnesia:info().
schema : with 29 records occupying 3580 words of mem
opt_disc. Directory "/opt/ejabber/var/lib/ejabberd/Mnesia.ejabberd@second.example.com" is NOT used.
use fallback at restart = false
running db nodes = ['ejabberd@first.example.com','ejabberd@second.example.com']
stopped db nodes = []
master node tables = []
remote = [acl,caps_features,captcha,config,iq_response,
last_activity,local_config,mod_register_ip,motd,
motd_users,muc_online_room,muc_registered,muc_room,
offline_msg,passwd,privacy,private_storage,
reg_users_counter,roster,roster_version,route,s2s,
session,session_counter,sr_group,sr_user,vcard,
vcard_search]
ram_copies = [schema]
disc_copies = []
disc_only_copies = []
[] = [local_config,caps_features,mod_register_ip]
[{'ejabberd@second.example.com',ram_copies},
{'ejabberd@first.example.com',disc_copies}] = [schema]
[{'ejabberd@first.example.com',disc_copies}] = [config,privacy,passwd,roster,
last_activity,sr_user,
roster_version,motd,acl,sr_group,
vcard_search,motd_users,muc_room,
muc_registered]
[{'ejabberd@first.example.com',disc_only_copies}] = [offline_msg,vcard,
private_storage]
[{'ejabberd@first.example.com',ram_copies}] = [reg_users_counter,route,s2s,
captcha,session_counter,session,
iq_response,muc_online_room]
3 transactions committed, 0 aborted, 0 restarted, 0 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote
0 transactions waits for other nodes: []
ok

(ejabberd@second.example.com)4> mnesia:change_table_copy_type(schema, node(), disc_copies).
(ejabberd@second.example.com)5> mnesia:add_table_copy(roster, node(), disc_copies).
(ejabberd@second.example.com)6> mnesia:add_table_copy(passwd, node(), disc_copies).
(ejabberd@second.example.com)7> mnesia:change_table_copy_type(acl, node(), disc_copies).
(ejabberd@second.example.com)8> mnesia:change_table_copy_type(config, node(), disc_copies).
(ejabberd@second.example.com)9> mnesia:add_table_copy(offline_msg, node(), disc_only_copies).
(ejabberd@second.example.com)10> mnesia:add_table_copy(vcard, node(), disc_only_copies).
(ejabberd@second.example.com)11> mnesia:add_table_copy(private_storage, node(), disc_only_copies).
(ejabberd@second.example.com)12> q().
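
The replication step can also be scripted; a sketch (table set taken from the transcript above — each mnesia call returns {atomic, ok} on success, so the pattern matches crash as soon as any call returns {aborted, Reason}):

```erlang
%% Replicate the persistent tables onto the local node.
{atomic, ok} = mnesia:change_table_copy_type(schema, node(), disc_copies),
[{atomic, ok} = mnesia:add_table_copy(T, node(), disc_copies)
    || T <- [roster, passwd, acl, config]],
[{atomic, ok} = mnesia:add_table_copy(T, node(), disc_only_copies)
    || T <- [offline_msg, vcard, private_storage]].
```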

# ...and move the new mnesia replica to the correct place
# (we are still in /opt/ejabber/var/lib/ejabberd -- this wipes any old spool files!):
$ rm *
$ mv Mnesia*/* .
$ rmdir Mnesia*
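
With the spool files in place, the second node can then be started and checked the same way as the first one (a gap in the notes above; same ejabberdctl invocation style as before):

```shell
ejabberdctl --node ejabberd@second.example.com start
ejabberdctl --node ejabberd@second.example.com status
```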

Tricks

Removing a node from the cluster

This seems to be a rather common problem: googling for 'remove ejabberd node' or 'remove mnesia node' leads to no useful result (well, this page may change that once it gets indexed ;-) ). To remove ejabberd@second.example.com from the cluster, make sure ejabberd/mnesia is stopped on that node, then run on first.example.com:
mnesia:del_table_copy(schema, 'ejabberd@second.example.com').
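
A hedged sketch of the whole removal, run in an erlang shell on first.example.com (e.g. via ejabberdctl debug); removing the schema copy drops the node from the cluster, and it only succeeds while mnesia on that node is down:

```erlang
%% Remove ejabberd@second.example.com from the cluster.
%% Precondition: ejabberd (and thus mnesia) is stopped on that node.
Node = 'ejabberd@second.example.com',
{atomic, ok} = mnesia:del_table_copy(schema, Node).
```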

Force load

If both nodes go down at the same time, mnesia on each node will wait at startup for the node that was alive last before loading its tables, so you may have to load (start) all tables forcefully. This can be done via mnesia:force_load_table(tablename).
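
A sketch of forcing every table at once, run in the node's erlang shell (e.g. via ejabberdctl debug); force_load_table/1 returns yes on success:

```erlang
%% Force-load all tables known to this node after a full-cluster outage.
[{T, mnesia:force_load_table(T)}
    || T <- mnesia:system_info(tables), T =/= schema].
```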