OCFS2 and DRBD
Intro

The following is a lab scenario with two Linux systems running OCFS2 on DRBD backing stores. I did this on CentOS7, Devuan Beowulf, and Oracle Enterprise Linux 7. The machines were actually VMware guest systems. I used a single IP subnet, with one IP address per node, for the whole thing. I chose to use “local” heartbeat for the OCFS2 clustering since I was only using two nodes. If your number of nodes exceeds your number of CPU threads, then you should switch to “global” heartbeat, which uses a shared quorum disk and lowers the amount of network traffic and load on the cluster members.

Configuring DRBD

I added an extra 1GB “disk” to each VM (/dev/sdb). The nodes were named “alpha” and “beta” and used IP addresses 10.217.202.70 and 10.217.202.71, both on a /16 CIDR block. The VMware ethernet was set up for bridging, so they could talk over the same segment.

DRBD has worked well for me when I needed a shared disk setup but a SAN or shared SCSI was too much hassle or overkill. I realize I could have also used a shared VMware VMDK by hacking the guests' .vmx and .vmdk files manually. However, I have had mixed results with that in the past. It apparently doesn't have the same dynamics as most shared disk systems, and it confused the clustering software I was using when I last tried it. So, at this point, I trust DRBD more than shared VMDKs. Plus, using drbdadm you can check the shared disk for inconsistency and confirm that it's in a dual-primary state.

I dislike the fragmented style of RHEL's /etc/drbd.conf, so I stomped it and wrote my own:

global {
 usage-count no;
}
common {
 net {
  protocol C;
  cram-hmac-alg   sha1;
  csums-alg sha1;
  shared-secret   "1234SecretPassword1234";
  use-rle yes;
  allow-two-primaries yes;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
 }
}
resource drbd0 {
  startup {
    wfc-timeout 300;
    degr-wfc-timeout 0;
    become-primary-on both;
  }
  disk /dev/sdb;
  device /dev/drbd0;
  meta-disk internal;
  on alpha {
          address 10.217.202.70:7789;
  }
  on beta {
          address 10.217.202.71:7789;
  }
}

Last-minute checklist for DRBD setup before we begin

  • Do both machines have a disk/LUN of the same size for DRBD to use?
  • Do you understand you'll be using /dev/drbd0 in this example, NOT /dev/sdb for your actual device-of-use?
  • Do both machines have each other in their /etc/hosts files? Are they pingable back and forth? (See the example just after this list.)
  • If this is over a WAN, you should adjust drbd.conf for bandwidth limiting. You should probably also fork over the license fee for DRBD Proxy so you can do fancy buffered async replication.
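
For the hosts-file item above, here's what that looked like in this lab, using my example names and addresses (substitute your own):

# /etc/hosts entries, present on both nodes
10.217.202.70   alpha
10.217.202.71   beta

# Sanity-check name resolution and reachability from each node
ping -c 3 alpha
ping -c 3 beta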

Final DRBD Steps

First you should make sure you have the drbd kernel module available. Just do:

modprobe drbd
lsmod | grep drbd

If that worked, you should see it listed by the lsmod command. You also need the DRBD CLI utilities, packaged as drbd-utils (for Devuan, or Ubuntu or Debian if you must). On RHEL7, CentOS7, or OEL7 you will probably need the newest version of DRBD, which includes both the kernel module and the utilities package. In my case this was drbd90-utils and kmod-drbd90 from the ELRepo repository. The reason I did not use the default packages was that the Red Hat based distros had a mismatch between the DRBD kernel module version and the utils they shipped with, which basically broke DRBD whenever you ran drbdadm. So, if you get errors trying to start the DRBD device and you are on a late version of RHEL7, CentOS7, or OEL7, you might need to use the ELRepo packages, too. In either case you can use the configuration file from above.
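
If you go the ELRepo route, the install went roughly like this for me; the package names are the drbd90 ones I used, and you should check ELRepo's site for the current release RPM before copying these URLs:

# Enable the ELRepo repository on RHEL7/CentOS7/OEL7, then pull in DRBD 9
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum install kmod-drbd90 drbd90-utils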

Now it's time to initialize the local device and start DRBD:

# Do this on both nodes
drbdadm create-md drbd0

# Make sure that the drbd modules are loaded (RHEL/OEL/CENTOS)
systemctl start drbd
systemctl enable drbd

# If you use Devuan, Ubuntu, or Debian
cd /etc/init.d ; ./drbd start

# Bring up the shared device for the first time and become the primary side
drbdadm up drbd0
drbdadm primary drbd0 --force

# Now on the second node (beta), do the same but you should not need --force
drbdadm up drbd0
drbdadm primary drbd0

When the DRBD device is up and working, it should allow both nodes to be “primary” on the device. This allows writing on both nodes. Keep in mind there are specifics in my configuration file that allow this since it's not the default. Check to make sure both nodes are primary before you begin on OCFS2 setup.

# Let's check the status in two different ways
# cat /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 473968AD625BA317874A57E 
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:40858 nr:40852 dw:81710 dr:90176 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

# drbdadm status
drbd0 role:Primary
  disk:UpToDate
  peer role:Primary
    replication:Established peer-disk:UpToDate

In my example you can see both nodes are primary, and we are all ready to move on to OCFS2. Keep in mind I did things this way because DRBD is an easy way to achieve a shared storage disk/LUN. However, there are several other methods known to work, including:

  1. Fibre Channel disks which have their masking and zoning set up to allow multiple host access. Usually these need to have SCSI-3 locking enabled and ALUA also turned on.
  2. Shared SCSI disks (using an external array with dual SCSI ports).
  3. iSCSI disks which have a shared ACL allowing multiple hosts.
  4. NBD devices (though why do this when we now have DRBD?)
  5. AoE shared ATA LUNs or disks (CORAID users or vblade disks). Much nicer than it sounds.
  6. Shared LUNs or disks in KVM, Qemu, VMware, or VirtualBox. In VMware specifically, these can be problematic. I've had many issues in both setup and long-term configuration with this method. I personally avoid it.

OCFS2 Configuration

Under Devuan or Debian clones, you'll need the apt/dpkg packages ocfs2-tools and ocfs2-tools-dev. If you are using RHEL, OEL, or CentOS, then you'd want three yum/rpm packages: ocfs2, ocfs2-tools, and probably also ocfs2console, which is a GUI version of the o2cb command.
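
Roughly, the installs look like this. Exact availability varies by distro and repository; on the RHEL-alikes, the OCFS2 packages typically come from Oracle's repositories rather than the stock ones:

# Devuan, Ubuntu, or Debian
apt-get install ocfs2-tools ocfs2-tools-dev

# RHEL, OEL, or CentOS (assuming a repo that carries the OCFS2 packages)
yum install ocfs2 ocfs2-tools ocfs2console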

First of all, be aware that o2cb, the cluster management tool for OCFS2, is VERY sensitive to formatting and whitespace in its configuration file. Hand-editing the file is not recommended and can cause some very strange errors that are hard to track down. The right way to go is to let o2cb make the changes itself (it uses the correct formatting), then copy the resulting config to any additional nodes.

First, a note on the o2cb default settings: they live in /etc/default/o2cb (on Devuan, Ubuntu, or Debian) or /etc/sysconfig/o2cb (on RHEL, OEL, or CentOS), and editing them is covered a bit further down. Before that, build up the cluster definition with o2cb:

# Create a cluster "test1" and add both nodes to it
o2cb add-cluster test1
o2cb add-node test1 alpha
o2cb add-node test1 beta

# If on an older RHEL-alike distro do this, too
# it's not needed and won't work in Devuan or Debian-alikes
/etc/init.d/o2cb configure

# Now register the cluster 
o2cb register-cluster test1

At this point you should have your cluster configuration file built up. Its default location on all Linux distros is /etc/ocfs2/cluster.conf. Go ahead and check it out on this first node you are configuring. It should look something like this:

cluster:
        heartbeat_mode = local
        node_count = 2
        name = test1

node:
        number = 0
        cluster = test1
        ip_port = 7777
        ip_address = 10.217.202.70
        name = alpha

node:
        number = 1
        cluster = test1
        ip_port = 7777
        ip_address = 10.217.202.71
        name = beta

Setting the OCFS2 Cluster Defaults and Kernel Modules

Unfortunately, there is still quite a bit to do. First off, your init script or systemd unit should be loading all the OCFS2-related kernel modules. Here is what I needed and found loaded after going through it all: configfs, ocfs2_dlmfs, ocfs2, ocfs2_stack_o2cb, ocfs2_dlm, ocfs2_nodemanager, and ocfs2_stackglue.

I added those modules to /etc/modules to load automatically, though in most cases the startup scripts (systemd or init) will take care of that for you. You can double-check Oracle's Best Practices for OCFS2 document for more detail.
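
If you want the belt-and-suspenders version on Devuan or Debian clones, appending the modules to /etc/modules looks roughly like this (purely optional if your o2cb script already loads them):

# Load the OCFS2 stack at boot; module names are from the list above
cat >> /etc/modules << 'EOF'
configfs
ocfs2_nodemanager
ocfs2_stackglue
ocfs2_stack_o2cb
ocfs2_dlm
ocfs2_dlmfs
ocfs2
EOF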

On RHEL7 or older systems the /etc/init.d/o2cb script will do all the module loading and unloading. On newer systems, your systemd unit for o2cb should have the prerequisite kernel modules specified, but since systemd is such a bum system, I've yet to see a properly set up unit-plus-script combination that works right for OCFS2. I had to do basically everything manually and troubleshoot a lot of breakage when I did this same setup on RHEL8. It was terrible. My advice would be to stay away from those for a few years until they get it enhanced enough to be usable in production. RHEL7 and friends use systemd too, but they still call the old o2cb helper script to do the heavy lifting (so much for systemd not needing to run scripts; that's out the window).

Now you'll need to edit the defaults file for o2cb. On Devuan it's /etc/default/o2cb; on RHEL-like systems it's /etc/sysconfig/o2cb. I only edited the on/off setting and the default cluster name and left the rest alone. Be aware that the O2CB_ENABLED field defaults to “false”, which was killing my efforts on Devuan and Debian clones. Additionally, I set O2CB_BOOTCLUSTER to my cluster name (in this case “test1”). The timeouts I left alone.

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=test1

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
O2CB_RECONNECT_DELAY_MS=2000

OCFS2 Filesystem Setup

Now you should have the prerequisites for your final steps. You've got a shared storage disk/LUN which DRBD has synced into an active-active state (dual “primary” in DRBD terms). You also have the ocfs2-tools and ocfs2-tools-dev packages installed. Now you can lay down the OCFS2 structure on-disk and get to the fun part of starting the filesystem and cluster. Copy over your /etc/ocfs2/cluster.conf to the second node in the same location.

# Sync the cluster configuration to node2 (beta)
scp /etc/ocfs2/cluster.conf root@beta:/etc/ocfs2

# Let the utility compute the right sizes for blocks etc...
# I like using the ocfs2 cluster name as the filesystem label if
# I'm only dealing with one filesystem
mkfs.ocfs2 -L "test1" /dev/drbd0

# Now create a mount point directory
mkdir /testfs

# Add an fstab entry, because mount.ocfs2 wants it
# even for ad-hoc mounting, I've found. 
echo "/dev/sdb /testfs ocfs2 _netdev,defaults 0 0" >> /etc/fstab

# Now the filesystem should mount up. 
mount /testfs
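
Once it mounts on both nodes, a quick sanity check is worthwhile. The mounted.ocfs2 tool ships with ocfs2-tools; the touch test is just my own habit and uses a made-up filename:

# Show OCFS2 devices and which cluster nodes have them mounted
mounted.ocfs2 -f

# Or just prove cross-node visibility: write a file on alpha...
touch /testfs/hello-from-alpha
# ...and make sure it shows up on beta
ls -l /testfs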

The most common reasons for things going wrong are:

  1. You hand-edited your ocfs2 cluster.conf file and put in some whitespace that's confusing o2cb
  2. You didn't synchronize the cluster.conf, or it has bad permissions on the destination side
  3. Some of the kernel modules aren't loaded, probably because your o2cb init script or unit file isn't working as expected.
  4. The configfs kernel module is loaded, but there is no configfs mounted. Mount it up and add it to your local /etc/fstab file, e.g. “mount -t configfs configfs /sys/kernel/config” (see the sketch after this list). Also be aware that some versions of configfs like you to use /config as the mount point rather than /sys/kernel/config. I have no idea why, but it was a difference between Devuan and RHEL.
  5. For whatever reason the Distributed Lock Manager isn't mounting its filesystem (which is supposed to be automatic). Try mounting it and see if it changes your o2cb problems (especially if it's complaining it can't see the cluster), i.e. try: mount -t ocfs2_dlmfs ocfs2_dlmfs /dlm (also shown in the sketch after this list).
  6. You forgot the step to register the cluster on both sides, i.e. “o2cb register-cluster test1”
  7. Not all the required kernel modules for OCFS2 are loaded (listed up above).
  8. You didn't add any entry to /etc/fstab and you are mounting by directory name only.
  9. You forgot to enable your cluster in the defaults/sysconfig file for o2cb
  10. You have the wrong or an old cluster name in the defaults/sysconfig file for o2cb
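
For items 4 and 5, this is the manual fix I ended up using, plus fstab entries so the mounts survive a reboot (the /dlm mount point is just the one I chose):

mkdir -p /dlm
mount -t configfs configfs /sys/kernel/config
mount -t ocfs2_dlmfs ocfs2_dlmfs /dlm

# Make both mounts persistent
cat >> /etc/fstab << 'EOF'
configfs     /sys/kernel/config   configfs     defaults  0 0
ocfs2_dlmfs  /dlm                 ocfs2_dlmfs  defaults  0 0
EOF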

One problem I kept running into on the second node was the error “o2cb: Unable to access cluster service while initializing the cluster”. This seemed to be caused chiefly by not having the configfs and DLM mount points up, which prevented me from ever running the “o2cb register-cluster test1” command successfully. Without a registered cluster, that error is what you see from o2cb for just about everything.

For more troubleshooting check out the OCFS2 FAQ from Oracle.
