Quick Apache Pulsar cluster howto

Apache Pulsar is a distributed messaging system. I was doing a proof of concept, and here are instructions on how to get a cluster going quickly.

The official documentation is pretty decent, and the instructions below are distilled from it.

The setup is a three-node ZooKeeper cluster and a three-node broker cluster. The nodes run CentOS 7 and the Pulsar version is 2.5.1. First, there should be some DNS records:

zoo1.example.net      IN A	10.1.1.1
zoo2.example.net      IN A	10.1.1.2
zoo3.example.net      IN A	10.1.1.3
broker1.example.net   IN A	10.1.1.4
broker2.example.net   IN A	10.1.1.5
broker3.example.net   IN A	10.1.1.6
pulsar-cl.example.net IN A	10.1.1.7
pulsar-cl.example.net IN A	10.1.1.8
pulsar-cl.example.net IN A	10.1.1.9

The following steps are to be performed on all 6 systems.

Install Java:

[root@zoo1 ~]# yum install java-devel

Next, set $JAVA_HOME for the whole system by creating /etc/profile.d/java.sh with the following content:

export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
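The one-liner above follows the symlink chain from javac to the real JDK binary and then strips the trailing bin/javac. A minimal sketch of the same idea as a function, assuming GNU readlink (which CentOS 7 ships) so the whole chain can be followed with -f; the function name java_home_from is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the JDK root for a javac binary.
# $1 is optional; without it, the javac found on PATH is used.
java_home_from() {
    local javac real
    javac=${1:-$(command -v javac)} || return 1
    # -f follows every symlink in the chain, however many there are
    real=$(readlink -f "$javac") || return 1
    # strip "/bin/javac" to get the JDK root
    dirname "$(dirname "$real")"
}
```

With this in /etc/profile.d/java.sh, the export line would become: export JAVA_HOME=$(java_home_from)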

Decompress the Pulsar tarball and create a symlink:

[root@zoo1 ~]# cd /opt
[root@zoo1 opt]# tar zxvf apache-pulsar-2.5.1-bin.tar.gz
[root@zoo1 opt]# ln -s apache-pulsar-2.5.1/ pulsar

The steps below need to be performed on the three ZooKeeper machines. The chown assumes that a pulsar user and group already exist on the system:

[root@zoo1 opt]# mkdir -p pulsardata/zookeeper
[root@zoo1 opt]# chown -R pulsar:pulsar pulsardata/

In /opt/pulsar/conf/zookeeper.conf make the following changes:

server.1=zoo1.example.net:2888:3888
server.2=zoo2.example.net:2888:3888
server.3=zoo3.example.net:2888:3888
dataDir=/opt/pulsardata/zookeeper

Now, each ZooKeeper server needs a unique ID. The IDs do not have to be sequential, so for simplicity I used the number from the hostname. The ID goes into a file named myid inside dataDir, that is /opt/pulsardata/zookeeper/myid:

[root@zoo1 opt]# echo 1 > /opt/pulsardata/zookeeper/myid
[root@zoo1 opt]# chown pulsar:pulsar /opt/pulsardata/zookeeper/myid
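Since the ID simply mirrors the number in the hostname, it can be derived instead of typed by hand on each node. A minimal sketch, assuming hostnames end in a number as zoo1, zoo2 and zoo3 do; the function name myid_from_hostname is hypothetical:

```shell
#!/usr/bin/env bash
# Print the trailing number of a hostname, e.g. zoo1.example.net -> 1.
# Fails if the short hostname has no trailing number or the number is
# outside 1..255, the range ZooKeeper accepts for server IDs.
myid_from_hostname() {
    local short=${1%%.*}          # drop the domain part
    local id=${short##*[!0-9]}    # drop everything up to the last non-digit
    [ -n "$id" ] && [ "$id" -ge 1 ] && [ "$id" -le 255 ] || return 1
    printf '%s\n' "$id"
}
```

On each ZooKeeper node this could be used as: myid_from_hostname "$(hostname)" > /opt/pulsardata/zookeeper/myid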

Similarly, on zoo2 I would echo 2 into the myid file, and so on. Next, start the ZooKeeper service. Note that no systemd units are included in the tarball, so you have to write those yourself.

[root@zoo1 opt]# systemctl enable pulsar.zookeeper
[root@zoo1 opt]# systemctl start pulsar.zookeeper
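The pulsar.zookeeper unit used above is not shipped with Pulsar. A minimal sketch of what such a unit could look like, printed by a small helper so the same template can be reused for the other components. Assumptions: the service runs as the pulsar user, the component is started in the foreground with bin/pulsar, and the helper name pulsar_unit is made up for illustration:

```shell
#!/usr/bin/env bash
# Print a minimal systemd unit for one Pulsar component.
# $1 = unit suffix as used in this howto (zookeeper, bookkeeper, broker)
# $2 = bin/pulsar subcommand that runs it (zookeeper, bookie, broker)
pulsar_unit() {
    local name=$1 subcommand=$2
    cat <<EOF
[Unit]
Description=Apache Pulsar ${name}
After=network.target

[Service]
# Assumption: a dedicated pulsar user owns /opt/pulsardata
User=pulsar
Group=pulsar
# systemd does not source /etc/profile.d; add an Environment=JAVA_HOME=...
# line here if the service cannot find java
ExecStart=/opt/pulsar/bin/pulsar ${subcommand}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example: pulsar_unit zookeeper zookeeper > /etc/systemd/system/pulsar.zookeeper.service, followed by systemctl daemon-reload. The same helper covers the other two services with pulsar_unit bookkeeper bookie and pulsar_unit broker broker.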

Finally, initialize the cluster metadata in ZooKeeper. You only need to do this once, on one machine in the cluster:

[root@zoo1 opt]# /opt/pulsar/bin/pulsar initialize-cluster-metadata \
    --cluster pulsar-cl \
    --zookeeper zoo1.example.net:2181 \
    --configuration-store zoo1.example.net:2181 \
    --web-service-url http://pulsar-cl.example.net:8080 \
    --web-service-url-tls https://pulsar-cl.example.net:8443 \
    --broker-service-url pulsar://pulsar-cl.example.net:6650 \
    --broker-service-url-tls pulsar+ssl://pulsar-cl.example.net:6651

This concludes the basic ZooKeeper setup. Now, on to the remaining three broker nodes.

Make a data directory for BookKeeper:

[root@broker1 opt]# mkdir -p pulsardata/bookkeeper
[root@broker1 opt]# chown -R pulsar:pulsar pulsardata/

In /opt/pulsar/conf/bookkeeper.conf specify the ZooKeeper servers, optionally enable stateful functions, and set custom directories:

zkServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent
journalDirectory=/opt/pulsardata/bookkeeper/journal
ledgerDirectories=/opt/pulsardata/bookkeeper/ledgers

Now you can start the bookies. Again, systemd units are not included in the Pulsar tarball:

[root@broker1 opt]# systemctl enable pulsar.bookkeeper
[root@broker1 opt]# systemctl start pulsar.bookkeeper

Perform a sanity check on each broker node:

[root@broker1 opt]# /opt/pulsar/bin/bookkeeper shell bookiesanity

Then configure the brokers. Set the following parameters in /opt/pulsar/conf/broker.conf:

zookeeperServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
configurationStoreServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
clusterName=pulsar-cl
functionsWorkerEnabled=true
allowAutoTopicCreation=false
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
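The three managedLedger values above are not independent: BookKeeper expects ensemble size >= write quorum >= ack quorum. A minimal sketch of that check, useful before changing the numbers; the function name check_quorum is hypothetical:

```shell
#!/usr/bin/env bash
# Succeed only if ensemble >= write quorum >= ack quorum >= 1.
# $1 = managedLedgerDefaultEnsembleSize
# $2 = managedLedgerDefaultWriteQuorum
# $3 = managedLedgerDefaultAckQuorum
check_quorum() {
    local ensemble=$1 write=$2 ack=$3
    [ "$ack" -ge 1 ] && [ "$write" -ge "$ack" ] && [ "$ensemble" -ge "$write" ]
}
```

For the values used here, check_quorum 2 2 2 succeeds. The ensemble size also should not exceed the number of bookies, which is three in this setup.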

Next, verify that the ports match the ones used during metadata initialization:

brokerServicePort=6650
brokerServicePortTls=6651
webServicePort=8080
webServicePortTls=8443
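Checking these four values by eye on three brokers gets tedious. A minimal sketch that reads keys from a key=value file such as broker.conf and reports any that differ from what is expected; conf_get and check_ports are hypothetical helper names, and the sketch assumes plain key=value lines with no spaces around the equals sign:

```shell
#!/usr/bin/env bash
# Print the value of a key from a key=value file.
# If the key is assigned more than once, the last assignment is printed.
conf_get() {
    local file=$1 key=$2
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Compare key=value pairs against a config file.
# $1 = config file, remaining arguments = expected key=value pairs.
# Prints each mismatch and returns non-zero if there was any.
check_ports() {
    local file=$1 status=0 pair key want got
    shift
    for pair in "$@"; do
        key=${pair%%=*}
        want=${pair#*=}
        got=$(conf_get "$file" "$key")
        if [ "$got" != "$want" ]; then
            echo "$key: expected '$want', found '$got'"
            status=1
        fi
    done
    return $status
}
```

On a broker this could be run as: check_ports /opt/pulsar/conf/broker.conf brokerServicePort=6650 brokerServicePortTls=6651 webServicePort=8080 webServicePortTls=8443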

Enable Pulsar functions in /opt/pulsar/conf/functions_worker.yml:

pulsarFunctionsCluster: pulsar-cl

Finally, start the brokers:

[root@broker1 opt]# systemctl enable pulsar.broker
[root@broker1 opt]# systemctl start pulsar.broker

This should result in a working Pulsar cluster. There is no security or encryption set up. Unfortunately, the official docs are not complete when it comes to securing the individual components with SSL certificates, at least for now.