====== Spectrum Scale (GPFS) with HA SMB ======

==== Introduction. ====

//This article describes how to build fault-tolerant file storage with an HA SMB service based on Spectrum Scale (GPFS) v4.2. The file system was deployed on three Linux %%CentOS 7%% servers with local disks. Each server is connected with two 10 Gbit/s Ethernet ports. All packages were installed manually, as was done for GPFS version 3.5. Manual installation was chosen because we use Spacewalk and Puppet for deployment.//

==== Scheme and Equipment. ====

**Scheme**

{{:ru:jobs:rumgarbage_foratr.jpg?800|500}}

Each gpfs-nodeX is connected to each of the Ethernet/Linux switches. LACP bonding is used for the connections to the Ethernet switches, and the switches are configured with vPC.\\
A %%BackEnd%% LAN was configured to prevent problems in case another device takes over one of the FrontEnd server IPs.

**Equipment**

| Type | Model | Quantity |
| Chassis | Supermicro SC846TQ-R900B | 1 |
| CPU | Intel® Xeon® Processor E5620 | 2 |
| MEM | 64 GB* / 32 GB | 1 |
| Controller | Adaptec | 1 |

*64 GB only on gpfs-node1 and gpfs-node2

==== Preparing ====

**A short summary of the initial %%CentOS 7%% configuration on the gpfs-nodeX nodes**

**1. (gpfs-nodeX) Two disks were used for the system partition.**\\
One RAID1 volume and 22 single-disk volumes were created on the RAID controller.\\
**Be sure to turn off the read/write cache for every volume that will be used by GPFS. With the cache turned on we saw NSDs being lost, together with their data, after a node reboot.**\\
Install the following packages on every node:\\

  yum install bison byacc cscope kernel-devel boost-devel boost-regex \
  ksh rpm-sign ctags cvs diffstat doxygen flex gcc gcc-c++ gcc-gfortran \
  gettext git indent intltool libtool zlib-devel libuuid-devel bc tcl tk \
  tcsh mc ksh libaio ntp ntpdate openssh-clients wget tar net-tools \
  patch patchutils rcs rpm-build subversion swig systemtap dkms -y

**2. All Ethernet ports are in a bond (mode 4).**\\
**3. Passwordless access between nodes for the root user.**\\
The key should be created on one node and then copied to /root/.ssh/ on the others.

  ssh-keygen -t dsa     (creation of a passwordless key)
  cd /root/.ssh
  cat id_dsa.pub >> authorized_keys
  chown root.root authorized_keys
  chmod 600 authorized_keys
  echo "StrictHostKeyChecking no" > config

**4. Don't forget to put all host names in /etc/hosts.**\\

  10.255.254.200 gpfs-node1
  10.255.254.201 gpfs-node2
  10.255.254.202 gpfs-node3
  192.168.5.185 gpfs-node1.example.com
  192.168.5.186 gpfs-node2.example.com
  192.168.5.187 gpfs-node3.example.com
  192.168.5.184 gpfs-share
  192.168.5.184 gpfs-share.example.com

**5. Configure ntpd on all nodes.**\\
**6. Firewall.**\\
Allow the ports for sshd and Spectrum Scale GPFS (see the official IBM documentation).

**7. Configuration of sysctl.conf**

  net.ipv4.tcp_timestamps=0
  net.ipv4.tcp_sack=0
  net.core.netdev_max_backlog=250000
  net.core.rmem_max=16777216
  net.core.wmem_max=16777216
  net.core.rmem_default=16777216
  net.core.wmem_default=16777216
  net.core.optmem_max=16777216
  net.ipv4.tcp_mem=16777216 16777216 16777216
  net.ipv4.tcp_rmem=4096 87380 16777216
  net.ipv4.tcp_wmem=4096 65536 16777216

**8. Add the maximum number of open files to /etc/security/limits.conf (depends on your requirements).**

  * hard nofile 1000000
  * soft nofile 1000000

**9. Disable SELinux.**

Before moving on, it is worth checking that the passwordless SSH and time synchronization from steps 3 and 5 really work between all nodes; a small helper is sketched below.
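The following helper is not part of the original setup; it is only a minimal sketch for checking passwordless SSH and clock skew from the node you run it on, assuming the short host names defined in /etc/hosts above.

  #!/bin/sh
  # Minimal sanity check: passwordless SSH and clock skew to every node.
  # Assumes the short host names from /etc/hosts above.
  for node in gpfs-node1 gpfs-node2 gpfs-node3; do
      # BatchMode makes ssh fail instead of prompting for a password
      if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
          echo "ERROR: passwordless ssh to $node failed"
          continue
      fi
      remote=$(ssh -o BatchMode=yes "$node" date +%s)
      here=$(date +%s)
      echo "$node: ssh OK, clock skew $((remote - here))s"
  done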
==== Spectrum Scale GPFS 4.2 installation ====

The Standard or Advanced protocols edition is required.\\
**All of the following steps must be completed on every node.**\\

**1. Installation**

Extract the software and start the installation.

  ./Spectrum_Scale_Protocols_Advanced-4.2.0.0-x86_64-Linux-install --text-only
  cd /usr/lpp/mmfs/4.2.0.0/gpfs_rpms/
  rpm -ivh gpfs.base*.rpm gpfs.gpl*rpm gpfs.gskit*rpm gpfs.msg*rpm gpfs.ext*rpm gpfs.adv*rpm

**2. Install updates (version 4.2.0.0 has a bug)**

  ./Spectrum_Scale_Protocols_Advanced-4.2.0.2-x86_64-Linux-install --text-only
  cd /usr/lpp/mmfs/4.2.0.2/gpfs_rpms/
  rpm -Uvh gpfs.base*.rpm gpfs.gpl*rpm gpfs.gskit*rpm gpfs.msg*rpm gpfs.ext*rpm gpfs.adv*rpm

**3. Kernel module compilation**

  cd /usr/lpp/mmfs/src && make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig && make World && make rpm
  rpm -ivh ~/rpmbuild/RPMS/x86_64/gpfs.gplbin*

**4. Set up paths**

  echo 'export PATH=$PATH:/usr/lpp/mmfs/bin' > /etc/profile.d/gpfs.sh
  export PATH=$PATH:/usr/lpp/mmfs/bin

GPFS is now installed.

==== Cluster configuration ====

**Configuration must be carried out on one of the nodes.**\\
Node quorum is used for cluster reliability.

**1. Create a file describing the nodes.**

  cat /root/gpfs_create
  gpfs-node1:quorum-manager
  gpfs-node2:quorum-manager
  gpfs-node3:quorum-manager

**2. Create the cluster.**

  mmcrcluster -N /root/gpfs_create -p gpfs-node1 -s gpfs-node2 -r /usr/bin/ssh -R /usr/bin/scp -C gpfsstorage0

Check:

  mmlscluster

Apply licenses:

  mmchlicense server --accept -N gpfs-node1,gpfs-node2,gpfs-node3

Check licenses:

  mmlslicense -L

**3. Create NSDs.**\\
Five pools were created in the cluster: the system pool and four data pools (eight is the maximum number of pools per file system). Data and metadata are stored in the system pool.

Create the NSD description file. Format:

  block device:node name::data type:failure group number:NSD name:pool name

block device - sd*\\
data type - metadataOnly, dataOnly, dataAndMetadata\\
Each node is its own failure group. The full stanza file is listed here; a short script that generates it is sketched after the listing.

  cat /root/nsd_creation
  sdb:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD1::
  sdc:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD2::
  sdd:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD3::
  sde:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD4::
  sdf:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD5::
  sdg:gpfs-node1::dataAndMetadata:7101:node1NSD3THDD6::
  sdh:gpfs-node1::dataOnly:7101:node1NSD3THDD7:pool1
  sdi:gpfs-node1::dataOnly:7101:node1NSD3THDD8:pool1
  sdj:gpfs-node1::dataOnly:7101:node1NSD3THDD9:pool1
  sdk:gpfs-node1::dataOnly:7101:node1NSD3THDD10:pool1
  sdl:gpfs-node1::dataOnly:7101:node1NSD3THDD11:pool2
  sdm:gpfs-node1::dataOnly:7101:node1NSD3THDD12:pool2
  sdn:gpfs-node1::dataOnly:7101:node1NSD3THDD13:pool2
  sdo:gpfs-node1::dataOnly:7101:node1NSD3THDD14:pool2
  sdp:gpfs-node1::dataOnly:7101:node1NSD3THDD15:pool3
  sdq:gpfs-node1::dataOnly:7101:node1NSD3THDD16:pool3
  sdr:gpfs-node1::dataOnly:7101:node1NSD3THDD17:pool3
  sds:gpfs-node1::dataOnly:7101:node1NSD3THDD18:pool3
  sdt:gpfs-node1::dataOnly:7101:node1NSD3THDD19:pool4
  sdu:gpfs-node1::dataOnly:7101:node1NSD3THDD20:pool4
  sdv:gpfs-node1::dataOnly:7101:node1NSD3THDD21:pool4
  sdw:gpfs-node1::dataOnly:7101:node1NSD3THDD22:pool4
  sdb:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD1::
  sdc:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD2::
  sdd:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD3::
  sde:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD4::
  sdf:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD5::
  sdg:gpfs-node2::dataAndMetadata:7201:node2NSD3THDD6::
  sdh:gpfs-node2::dataOnly:7201:node2NSD3THDD7:pool1
  sdi:gpfs-node2::dataOnly:7201:node2NSD3THDD8:pool1
  sdj:gpfs-node2::dataOnly:7201:node2NSD3THDD9:pool1
  sdk:gpfs-node2::dataOnly:7201:node2NSD3THDD10:pool1
  sdl:gpfs-node2::dataOnly:7201:node2NSD3THDD11:pool2
  sdm:gpfs-node2::dataOnly:7201:node2NSD3THDD12:pool2
  sdn:gpfs-node2::dataOnly:7201:node2NSD3THDD13:pool2
  sdo:gpfs-node2::dataOnly:7201:node2NSD3THDD14:pool2
  sdp:gpfs-node2::dataOnly:7201:node2NSD3THDD15:pool3
  sdq:gpfs-node2::dataOnly:7201:node2NSD3THDD16:pool3
  sdr:gpfs-node2::dataOnly:7201:node2NSD3THDD17:pool3
  sds:gpfs-node2::dataOnly:7201:node2NSD3THDD18:pool3
  sdt:gpfs-node2::dataOnly:7201:node2NSD3THDD19:pool4
  sdu:gpfs-node2::dataOnly:7201:node2NSD3THDD20:pool4
  sdv:gpfs-node2::dataOnly:7201:node2NSD3THDD21:pool4
  sdw:gpfs-node2::dataOnly:7201:node2NSD3THDD22:pool4
  sdb:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD1::
  sdc:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD2::
  sdd:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD3::
  sde:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD4::
  sdf:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD5::
  sdg:gpfs-node3::dataAndMetadata:7301:node3NSD3THDD6::
  sdh:gpfs-node3::dataOnly:7301:node3NSD3THDD7:pool1
  sdi:gpfs-node3::dataOnly:7301:node3NSD3THDD8:pool1
  sdj:gpfs-node3::dataOnly:7301:node3NSD3THDD9:pool1
  sdk:gpfs-node3::dataOnly:7301:node3NSD3THDD10:pool1
  sdl:gpfs-node3::dataOnly:7301:node3NSD3THDD11:pool2
  sdm:gpfs-node3::dataOnly:7301:node3NSD3THDD12:pool2
  sdn:gpfs-node3::dataOnly:7301:node3NSD3THDD13:pool2
  sdo:gpfs-node3::dataOnly:7301:node3NSD3THDD14:pool2
  sdp:gpfs-node3::dataOnly:7301:node3NSD3THDD15:pool3
  sdq:gpfs-node3::dataOnly:7301:node3NSD3THDD16:pool3
  sdr:gpfs-node3::dataOnly:7301:node3NSD3THDD17:pool3
  sds:gpfs-node3::dataOnly:7301:node3NSD3THDD18:pool3
  sdt:gpfs-node3::dataOnly:7301:node3NSD3THDD19:pool4
  sdu:gpfs-node3::dataOnly:7301:node3NSD3THDD20:pool4
  sdv:gpfs-node3::dataOnly:7301:node3NSD3THDD21:pool4
  sdw:gpfs-node3::dataOnly:7301:node3NSD3THDD22:pool4
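Typing all 66 stanza lines by hand is error-prone. As an alternative, the same /root/nsd_creation file can be generated with a short loop. This is only a sketch that reproduces the layout used above (devices sdb-sdw on every node, the first six disks as dataAndMetadata, four dataOnly disks per pool, failure groups 7101/7201/7301); adapt it if your device names differ.

  #!/bin/bash
  # Regenerate /root/nsd_creation for the disk layout described above:
  # sdb-sdg -> dataAndMetadata (system pool), sdh-sdw -> dataOnly in pool1-pool4,
  # one failure group per node (7101/7201/7301).
  out=/root/nsd_creation
  : > "$out"
  for n in 1 2 3; do
      fg=$((7000 + 100 * n + 1))        # 7101, 7201, 7301
      i=1
      for dev in sd{b..w}; do           # 22 disks per node
          if [ "$i" -le 6 ]; then
              echo "${dev}:gpfs-node${n}::dataAndMetadata:${fg}:node${n}NSD3THDD${i}::" >> "$out"
          else
              p=$(( (i - 7) / 4 + 1 ))  # four disks per pool
              echo "${dev}:gpfs-node${n}::dataOnly:${fg}:node${n}NSD3THDD${i}:pool${p}" >> "$out"
          fi
          i=$((i + 1))
      done
  done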
**The system pool is larger than the other pools because it also holds metadata. If the system pool is exhausted, the file system stops accepting writes.**

Create the NSDs:\\

  mmcrnsd -F /root/nsd_creation

**4. Configuration and startup.**\\
Start GPFS (from one of the nodes):

  mmstartup -a

Check:\\

  mmgetstate -a

   Node number  Node name        GPFS state
  ------------------------------------------
        1       gpfs-node1       active
        2       gpfs-node2       active
        3       gpfs-node3       active

==== File system creation, file placement policies, NFSv4 ACLs and snapshots ====

**Configuration must be carried out on one of the nodes.**\\

**1. Create the file system**

  mmcrfs -T /gpfsst /dev/gpfsst0 -F /root/nsd_creation -D nfs4 -k nfs4 -A yes -B 128K -Q no -v yes -r 3 -R 3 -m 3 -M 3

**2. Placement policies**

The RAND() function is used to balance placement between the system pool and pools 1-4. Note that RAND() has a non-linear distribution, unlike the C function random().\\
The threshold values below were found experimentally: 1000 files were written to measure the placement distribution, and the values were refined over several iterations (a sketch of such a check is shown at the end of this step).\\
**Attention: the RAND policy does not work with Windows servers/clients; in that case all data is stored in the default pool.**

The policy in use is shown below.\\

  mmlspolicy gpfsst -L
  RULE 'LIST_1' LIST 'allfiles' FROM POOL 'system'
  RULE 'LIST_2' LIST 'allfiles' FROM POOL 'pool1'
  RULE 'LIST_3' LIST 'allfiles' FROM POOL 'pool2'
  RULE 'LIST_4' LIST 'allfiles' FROM POOL 'pool3'
  RULE 'LIST_5' LIST 'allfiles' FROM POOL 'pool4'
  RULE 'to_pool1' SET POOL 'pool1' LIMIT(99) WHERE INTEGER(RAND()*100)<21
  RULE 'to_pool2' SET POOL 'pool2' LIMIT(99) WHERE INTEGER(RAND()*100)<26
  RULE 'to_pool3' SET POOL 'pool3' LIMIT(99) WHERE INTEGER(RAND()*100)<34
  RULE 'to_pool4' SET POOL 'pool4' LIMIT(99) WHERE INTEGER(RAND()*100)<50
  RULE DEFAULT SET POOL 'system'

Test the policy:

  mmchpolicy gpfsst /root/policy -I test

Install the policy:

  mmchpolicy gpfsst /root/policy -I yes

Check the policy:

  mmlspolicy gpfsst -L
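A simple way to check the resulting distribution is to write a batch of test files and count how many land in each pool via mmlsattr. This is only a sketch; the directory /gpfsst/placement-test and the file count are example values, not part of the original setup.

  #!/bin/bash
  # Write test files and count how many were placed in each storage pool.
  # /gpfsst/placement-test is just an example location on the GPFS mount.
  testdir=/gpfsst/placement-test
  mkdir -p "$testdir"
  for i in $(seq 1 1000); do
      dd if=/dev/zero of="$testdir/file$i" bs=1M count=1 2>/dev/null
  done
  # mmlsattr -L prints a "storage pool name:" line for each file
  for f in "$testdir"/file*; do
      /usr/lpp/mmfs/bin/mmlsattr -L "$f" | awk '/storage pool name/ {print $NF}'
  done | sort | uniq -c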
**3. NFSv4 ACLs**

Use mmeditacl to edit an ACL and mmgetacl to view it:

  mmgetacl /gpfsst/eng/
  #NFSv4 ACL
  #owner:root
  #group:root
  #ACL flags:
  #  DACL_PRESENT
  #  DACL_AUTO_INHERITED
  special:owner@:rwxc:allow
   (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
   (-)DELETE    (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

  special:group@:r-x-:allow
   (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
   (-)DELETE    (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

  group:EXAMPLE\dg_engineer:rwxc:allow:FileInherit:DirInherit
   (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
   (X)DELETE    (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

**4. Snapshots**\\
The script below creates a new snapshot and deletes snapshots that are 30 days old or older. It is started every day by cron on gpfs-node1 (an example crontab entry is shown after the script).

  cat /scripts/gpfs_snap_cr
  #!/bin/sh
  # today's date in YYYYMMDD format
  num=`/bin/date "+%Y%m%d"`
  nums=`/bin/date "+%s"`
  # 2592000 seconds = 30 days
  numms=`expr $nums - 2592000`
  numm=`/bin/date --date=@$numms "+%Y%m%d"`
  /usr/lpp/mmfs/bin/mmcrsnapshot gpfsst gpfsst$num
  /usr/lpp/mmfs/bin/mmdelsnapshot gpfsst gpfsst$numm
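Assuming the script is saved as /scripts/gpfs_snap_cr and is executable, the cron entry on gpfs-node1 could be created as sketched below; the 01:00 schedule and the log file path are example values, not from the original setup.

  cat > /etc/cron.d/gpfs-snapshots <<'EOF'
  # Run the GPFS snapshot rotation script daily at 01:00 on gpfs-node1
  0 1 * * * root /scripts/gpfs_snap_cr >> /var/log/gpfs_snap_cr.log 2>&1
  EOF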
==== Installing the HA SMB service. ====

**On the two nodes gpfs-node1 and gpfs-node2.**\\

  yum install nfs-utils
  rpm -ivh /usr/lpp/mmfs/4.2.0.0/ganesha_rpms/*.rpm
  rpm -ivh /usr/lpp/mmfs/4.2.0.0/smb_rpms/*.rpm

Update:

  rpm -Uvh /usr/lpp/mmfs/4.2.0.2/ganesha_rpms/*.rpm
  rpm -Uvh /usr/lpp/mmfs/4.2.0.2/smb_rpms/*.rpm

==== Configuring the HA SMB service. ====

**Configuration must be carried out on one of the nodes.**\\

**1. Create a directory on the GPFS storage**

  mkdir -p /gpfsst/ces

**2. Set up the Cluster Export Services shared root file system**

  mmchconfig cesSharedRoot=/gpfsst/ces

**3. Configure the Cluster Export Services nodes**

  mmchnode --ces-enable -N gpfs-node1,gpfs-node2

After configuring all nodes, verify that the list of CES nodes is complete:\\

  mmces node list

**4. Preparing to perform service actions on the CES shared root directory file system**\\
To create such a node class, named protocol, issue the following command:\\

  mmcrnodeclass protocol -N gpfs-node1,gpfs-node2

Suspend all protocol nodes:\\

  mmces node suspend -N gpfs-node1,gpfs-node2

Stop protocol services on all protocol nodes:\\

  mmces service stop NFS -a
  mmces service stop SMB -a
  mmces service stop OBJ -a

Verify that all protocol services have been stopped:\\

  mmces service list -a

Shut down GPFS on all protocol nodes:\\

  mmshutdown -N protocol

Protocol nodes are now ready for service actions to be performed on the CES shared root directory or on the nodes themselves.

To recover from a service action, start up GPFS on all protocol nodes:\\

  mmstartup -N protocol

Make sure that the CES shared root directory file system is mounted on all protocol nodes:

  mmmount cesSharedRoot -N protocol

Resume all protocol nodes:

  mmces node resume -N gpfs-node1,gpfs-node2

Start protocol services on all protocol nodes:

  mmces service start SMB -a
  mmces service enable SMB

Verify that all protocol services have been started:

  mmces service list -a

**5. Create the shared IP**\\

  mmces address add --ces-node gpfs-node1 --ces-ip 192.168.5.184

Verify:\\

  mmces address list

**6. Add to Active Directory**\\
Create the DNS records beforehand.

  mmuserauth service create --type ad --data-access-method file --netbios-name gpfs-share \
  --user-name admin@example.com --idmap-role master --servers 192.168.4.9 --password Password \
  --idmap-range-size 1000000 --idmap-range 10000000-299999999 --unixmap-domains 'EXAMPLE(5000-20000)'

**7. Export a directory**\\
Create the directory:

  mkdir -p /gpfsst/accounting

Set the NFSv4 ACL:

  export EDITOR=/usr/bin/vi
  mmeditacl /gpfsst/accounting
  #NFSv4 ACL
  #owner:root
  #group:root
  #ACL flags:
  #  DACL_PRESENT
  #  DACL_AUTO_INHERITED
  special:owner@:rwxc:allow
   (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
   (-)DELETE    (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

  group:EXAMPLE\dg_accounting:rwxc:allow:FileInherit:DirInherit
   (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
   (X)DELETE    (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

Create the SMB share:\\

  mmsmb export add accounting "/gpfsst/accounting"

Add options:\\

  mmsmb export change accounting --option "browseable=yes"

Verify:\\

  mmsmb export list
  export       path                 guest ok   smb encrypt
  accounting   /gpfsst/accounting   no         auto
  Information: The following options are not displayed because they do not contain a value: "browseable"

==== Installing the GUI. ====

**1. Install sensors on every node**

  yum install boost-devel boost-regex nc -y
  cd /usr/lpp/mmfs/4.2.0.2/zimon_rpms
  rpm -ivh gpfs.gss.pmsensors-4.2.0*.rpm

**2. Install collectors on one or more nodes**

  cd /usr/lpp/mmfs/4.2.0.2/zimon_rpms
  rpm -ivh gpfs.gss.pmcollector-4.2.0-*.rpm

**3. Install the GUI on one or more nodes**

  yum install postgresql-server
  cd /usr/lpp/mmfs/4.2.0.2/gpfs_rpms
  rpm -ivh gpfs.gui-4.2.0-2.el7.*.rpm

**4. Configure and start the collectors on one or more nodes**

  mmperfmon config generate --collectors gpfs-node1
  systemctl start pmcollector
  systemctl enable pmcollector

**5. Configure and start the sensors from one node (gpfs-node1)**

Enable the sensors on the cluster by using the mmchnode command. Issuing this command configures and starts the performance tool sensors on the nodes.\\

  mmchnode --perfmon -N gpfs-node1,gpfs-node2,gpfs-node3

On every node:

  systemctl start pmsensors
  systemctl enable pmsensors

**6. Start the GUI on one node (gpfs-node1)**

  systemctl start gpfsgui
  systemctl enable gpfsgui

**7. Connecting**

Open https://gpfs-node1.example.com in a browser.\\
The default user name and password for the IBM Spectrum Scale management GUI are admin and admin001, respectively.

==== Conclusion. ====

Enjoy!\\

==== About the author ====

[[https://www.linkedin.com/pub/alexey-vyrodov/59/976/16b|Profile]] of the author