First, this is NOT an officially supported process, so use at your own risk.
Backstory: I had a customer who corrupted the rootvg contents on their standby node after loading some additional packages and application updates; attempting to remove them corrupted some libraries. Unfortunately, they did not clone or create a mksysb prior to performing these actions, so we created a clone from their primary PowerHA node.
Also know that prior to creating the clone, you can either enable the ghostdev parameter or use the “-O” flag when creating the clone to make it forget things like the hostname, IPs, VGs, etc., as shown in the video.
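For reference, here is a minimal sketch of the first option, assuming the ghostdev attribute is exposed on sys0 at your AIX level; run it on the source node before taking the clone:

chdev -l sys0 -a ghostdev=1    # when the clone later boots on different hardware, device/network customizations get reset

Setting ghostdev achieves at boot time roughly what the “-O” flag does at clone time, so either approach avoids carrying prod’s identity over to the standby.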
Note that in this case the standby node was still up and running; we just needed to get back to a known good state prior to the changes. So we added a new LUN to prod, cloned to it, then moved the disk over to the standby and changed the bootlist to point at it. If the standby had not been up, this procedure would also have involved an SMS boot to choose the new cloned boot disk.
Also not shown is the potential cleanup of changes carried over from prod that we don’t want on the standby. For example, cron jobs may need to be disabled (see the sketch below). There may be other miscellaneous cleanup as well, but the primary focus here was getting the cluster functional again.
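As one example of that cleanup, a hedged sketch of parking the prod cron entries that the clone carries over, assuming root’s crontab is the one that matters in your environment:

crontab -l > /tmp/root.crontab.prod    # save the prod entries for later review
crontab -r                             # remove them on the standby until they are vetted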
So our procedure consisted of the following (a consolidated example transcript follows the list):
- Allocate a new LUN to prod and run “cfgmgr”
- Create clone to new lun via “alt_disk_copy -B -d <hdisk#> -O“
- Wake up clone via “alt_rootvg_op -W -d <hdisk#>“
- Remove all non-rootvg entries from the clone’s /etc/filesystems (seen at /alt_inst/etc/filesystems while the clone is awake)
- Put clone back to sleep via “alt_rootvg_op -S -d <hdisk#>”
- Remove cloned vg via “alt_rootvg_op -X altinst_rootvg“
- Remove disk from prod via “rmdev -dl <hdisk#>“
- Unallocate the LUN from prod, allocate it to the standby, and run “cfgmgr” on the standby
- Change bootlist to boot from the newly added disk via “bootlist -m normal <newhdisk#> <currentrootvghdisk#>“
- Reboot standby via “shutdown -Fr“
- Recreate the hostname, IP, and subnet mask via either “ifconfig” or “smitty mktcpip”
- Gather node ID information for both nodes from the primary node via “lsrpnode -i”
- Run clusterconf to import and activate CAA via “clusterconf -r <reposhdisk#>”
- Stop the CAA cluster on local/restored node via “clmgr offline node <nodename> STOP_CAA=yes“
- Note the original node ID of the cloned node and use it in the command “recfgct -i <orig_ct_node_id> -F” to restore the original ct_node_id onto the cloned node
- Import shared vg via “importvg -y <vgname> -V <major#> -n <hdisk#>“
- Sync cluster from prod node via “clmgr sync cluster“
- Assuming no errors, restart local/restored node into cluster and start CAA via “clmgr online node <nodename> START_CAA=yes“
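To make the flow concrete, here is the whole sequence sketched as one transcript. Every value in it is a placeholder for illustration only: hdisk5 as the new LUN on prod, hdisk4 and hdisk0 as the clone and current rootvg disks on the standby, hdisk2 as the CAA repository disk, sharedvg with major number 100 on hdisk3, and standbynode as the node name.

# --- On the prod node ---
cfgmgr                                  # discover the newly allocated LUN (assume it comes in as hdisk5)
alt_disk_copy -B -d hdisk5 -O           # clone rootvg; -B skips the bootlist change, -O drops device specifics
alt_rootvg_op -W -d hdisk5              # wake the clone; its filesystems mount under /alt_inst
vi /alt_inst/etc/filesystems            # delete every stanza other than the rootvg filesystems
alt_rootvg_op -S -d hdisk5              # put the clone back to sleep
alt_rootvg_op -X altinst_rootvg         # remove the altinst_rootvg definition (disk contents remain)
rmdev -dl hdisk5                        # remove the disk definition from prod

# --- Move the LUN over, then on the standby ---
cfgmgr                                  # assume the clone appears as hdisk4; current rootvg is hdisk0
bootlist -m normal hdisk4 hdisk0        # boot from the clone first, fall back to the old rootvg
shutdown -Fr

# --- After the standby reboots from the clone ---
smitty mktcpip                          # recreate the hostname, IP, and subnet mask

# --- On the prod node ---
lsrpnode -i                             # note both node IDs, including the standby's original ct_node_id

# --- Back on the standby ---
clusterconf -r hdisk2                   # import and activate CAA from the repository disk
clmgr offline node standbynode STOP_CAA=yes
/usr/sbin/rsct/install/bin/recfgct -i <orig_ct_node_id> -F    # restore the noted ct_node_id
importvg -y sharedvg -V 100 -n hdisk3   # import the shared VG (flags as in the step above)

# --- On the prod node ---
clmgr sync cluster

# --- Back on the standby, assuming a clean sync ---
clmgr online node standbynode START_CAA=yes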
Check out the following video showing these procedures. (NOTE: Video does not show ALL steps)