What is the default journal size in GFS2?
When you run mkfs.gfs2 without specifying a journal size, a 128MB journal is created by default, which is adequate for most applications.
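If you need a different size, mkfs.gfs2 accepts -J (journal size in megabytes) and -j (number of journals, one per node that will mount the file system). A minimal sketch; the cluster name, file system name, and device below are placeholders:

# mkfs.gfs2 -p lock_dlm -t mycluster:mygfs2 -j 3 -J 128 /dev/vg01/lv01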

Reducing the size of the journal can severely affect performance. If you shrink the journal to 32MB, for example, it does not take much file system activity to fill a 32MB journal, and when the journal is full, performance slows because GFS2 has to wait for writes to the storage.

What is a Quorum Disk?
  • Quorum Disk is a disk-based quorum daemon, qdiskd, that provides supplemental heuristics to determine node fitness.
  • Heuristics let you define checks (for example, reachability of a router) for factors that are important to the operation of the node in the event of a network partition
For a 3-node cluster, quorum is maintained as long as 2 of the 3 nodes are active, i.e. more than half. But what if, for some reason, the 2nd node also stops communicating with the 3rd node? In that case, under a normal architecture, the cluster would dissolve and stop working. For mission-critical environments and such scenarios we use a quorum disk: an additional shared disk, visible to all the nodes running the qdiskd service, to which a vote value is assigned.

So, in the above case, suppose I follow the common rule of thumb of giving the quorum disk one vote fewer than the number of nodes, i.e. 2 votes for a 3-node cluster (5 votes in total, so 3 are needed for quorum). Even after 2 nodes stop communicating with the 3rd node, the surviving partition still holds 3 votes (2 from the qdisk + 1 from the 3rd node) and therefore keeps quorum. The two unreachable nodes are fenced, and the 3rd node remains up and running as part of the cluster.
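In cluster.conf this might look like the fragment below. This is a sketch only: the label, the ping heuristic, and the timing values are placeholder assumptions, and votes="2" follows the nodes-minus-one rule of thumb described above:

<quorumd interval="1" tko="10" votes="2" label="myqdisk">
    <heuristic program="ping -c1 192.168.1.1" score="1" interval="2" tko="3"/>
</quorumd>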


What is rgmanager in Red Hat Cluster?
  • rgmanager is a service termed the Resource Group Manager
  • rgmanager manages and provides failover capabilities for collections of cluster resources called services, resource groups, or resource trees
  • It allows administrators to define, configure, and monitor cluster services. In the event of a node failure, rgmanager relocates the clustered service to another node with minimal service disruption (a sample service definition is sketched below)
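Services live in the <rm> section of cluster.conf. The following is a minimal sketch, assuming a two-node failover domain and a simple httpd service; the node names, IP address, and service name are hypothetical:

<rm>
    <failoverdomains>
        <failoverdomain name="example_fd" ordered="1" restricted="0">
            <failoverdomainnode name="node1.example.com" priority="1"/>
            <failoverdomainnode name="node2.example.com" priority="2"/>
        </failoverdomain>
    </failoverdomains>
    <service autostart="1" domain="example_fd" name="webservice" recovery="relocate">
        <ip address="192.168.1.100" monitor_link="on"/>
        <script file="/etc/init.d/httpd" name="httpd"/>
    </service>
</rm>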

What is luci in Red Hat Cluster?
  • luci is the server component of the Conga administration utility
  • Conga is an integrated set of software components that provides centralized configuration and management of Red Hat clusters and storage
  • luci is a server that runs on one computer and communicates with multiple clusters and computers via ricci

What is ricci in Red Hat Cluster?
  • ricci is the client component of the Conga administration utility
  • ricci is an agent that runs on each computer (either a cluster member or a standalone computer) managed by Conga
  • This service needs to be running on all the client nodes of the cluster.
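To bring a node under Conga's control, ricci is typically started, enabled at boot, and given a password for the ricci user (luci asks for it when adding the node):

# service ricci start
# chkconfig ricci on
# passwd ricci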
What is cman in Red Hat Cluster?
  • CMAN is an abbreviation of Cluster Manager.
  • CMAN is a distributed cluster manager and runs in each cluster node. 
  • It is responsible for monitoring, heartbeat, quorum, voting and communication between cluster nodes.
  • CMAN keeps track of cluster quorum by monitoring the count of cluster nodes.
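To see what CMAN currently knows about membership and quorum, the cman_tool utility can be queried on any node:

# cman_tool status   # reports Nodes, Expected votes, Total votes and Quorum
# cman_tool nodes    # lists each node with its membership state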



What are the different port numbers used in Red Hat Cluster?

IP Port No.     Protocol   Component
5404, 5405      UDP        corosync/cman (Cluster Manager)
11111           TCP        ricci
21064           TCP        dlm (Distributed Lock Manager)
16851           TCP        modclusterd
8084            TCP        luci
41966, 41967    TCP        rgmanager
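On an iptables-based firewall these ports must be opened on every node. A minimal sketch with no source/destination restrictions; tighten the rules for your network:

# iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT   # corosync/cman
# iptables -I INPUT -p tcp --dport 11111 -j ACCEPT       # ricci
# iptables -I INPUT -p tcp --dport 21064 -j ACCEPT       # dlm
# iptables -I INPUT -p tcp --dport 16851 -j ACCEPT       # modclusterd
# iptables -I INPUT -p tcp --dport 8084 -j ACCEPT        # luci
# service iptables save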

How does the NetworkManager service affect Red Hat Cluster?
  • The use of NetworkManager is not supported on cluster nodes. If you have installed NetworkManager on your cluster nodes, you should either remove it or disable it.
  • # service NetworkManager stop
  • # chkconfig NetworkManager off
  • The cman service will not start if NetworkManager is either running or has been configured to run with the chkconfig command
What is the command used to relocate a service to another node?
clusvcadm -r service_name -m node_name
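For example, with a hypothetical service named webservice and a target node node2.example.com; clustat can then confirm where the service is running:

# clusvcadm -r webservice -m node2.example.com
# clustat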


What is split-brain condition in Red Hat Cluster?
  • We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members. For example, in a thirteen-node cluster, quorum is reached only if seven or more nodes are communicating; if the count of communicating nodes drops to six, the cluster loses quorum and can no longer function (the vote arithmetic is spelled out after this list).
  • A cluster must maintain quorum to prevent split-brain issues.
  • If quorum were not enforced, a communication error in that same thirteen-node cluster could cause a situation where six nodes are operating on the shared storage while another six nodes are also operating on it, independently. Because of the communication error, the two partial clusters would overwrite areas of the disk and corrupt the file system.
  • With quorum rules enforced, only one of the partial clusters can use the shared storage, thus protecting data integrity.
  • Quorum doesn't prevent split-brain situations from arising, but it does decide which partition is dominant and allowed to function in the cluster.
  • Quorum can be determined by a combination of messages communicated via Ethernet and through a quorum disk.
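With standard majority voting (one vote per node, no quorum disk), the threshold works out to: quorum = floor(total_votes / 2) + 1. For the thirteen-node cluster above, that is floor(13/2) + 1 = 7, which is why seven communicating nodes are required.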