
Emergency Handling and Recovery Test for a Cluster Node Failure (Without Rebuilding the Standby)

2025/7/6 20:30:48  Source: https://blog.csdn.net/qq_59083851/article/details/139756798

1. Status of the two servers in the cluster

Instance   Role in normal operation   IP            Port
node1      Primary                    192.168.6.6   9088
node2      Standby                    192.168.6.7   9088

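2. Test steps

  • Take node1 down
  • Observe the state of node2
  • Before node2 switches over automatically, manually change node2 to standalone mode to simulate emergency use
  • Once the situation is no longer urgent, promote node2 to primary and restore node1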
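
The steps above can be laid out as a dry-run plan before touching either server. The sketch below only prints the commands; it does not execute anything. `print_failover_plan` is a hypothetical helper name, while the commands it prints are the ones used later in this article.

```shell
# Dry-run sketch: print the commands for each phase of the test.
# print_failover_plan is a hypothetical helper, not part of the database tooling.
print_failover_plan() {
    primary="$1"
    standby="$2"
    cat <<EOF
# on ${primary}: take the primary down
onclean -ky
# on ${standby}: check the replication state
onstat -g dri
# on ${standby}: emergency use, switch to standalone mode
onmode -d standard
# on ${standby}: later, promote to primary paired with ${primary}
onmode -d primary ${primary}
# on ${primary}: physical recovery, then rejoin as the standby
oninit -PHY
onmode -d secondary ${standby}
EOF
}

print_failover_plan node1 node2
```

Printing the plan first makes it easier to confirm which command belongs to which node, since `onmode -d primary` and `onmode -d secondary` each take the name of the other server.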

3. After the primary goes down, manually operate the standby so that it quickly becomes usable

[gbasedbt@node01 install]$ onstat -g dri
On-Line (Prim) -- Up 00:16:11 -- 1650580 Kbytes
Data Replication at 0x4cf1a028:
Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
primary        on           node2                         9 / 1          NA
DRINTERVAL   0
DRTIMEOUT    30
DRAUTO       0
DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
DRIDXAUTO    0
ENCRYPT_HDR  0
Backlog      0
Last Send    2024/06/17 22:01:20
Last Receive 2024/06/17 22:01:20
Last Ping    2024/06/17 22:01:05
Last log page applied(log id,page): 9,2
[root@node01 GBASE]# onstat -
On-Line (Prim) -- Up 00:14:11 -- 1650580 Kbytes
[root@node01 GBASE]# su - gbasedbt
Last login: Mon Jun 17 21:45:54 CST 2024 on pts/0
[gbasedbt@node01 ~]$ onclean -ky
onclean: Cleaning up processes and resources for 'node1'...
- Looking for the master daemon process: 13760
- Looking for the shmem key: 52934803
- Looking for the shmem key: 52934804
- Looking for semaphore ID: 10
- Looking for the shmem key: 52934801
- Looking for the shmem key: 52934802
[gbasedbt@node01 ~]$
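-- The primary and standby use a health check to decide whether the cluster is normal. The heartbeat check makes several connection attempts with a few seconds between them, so there is a detection window between the primary going down and the standby switching over. During this window the standby still reports the cluster as normal.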
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes
Data Replication at 0x4c13d028:
Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
HDR Secondary  on           node1                         9 / 1          N
DRINTERVAL   0
DRTIMEOUT    30
DRAUTO       0
DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
DRIDXAUTO    0
ENCRYPT_HDR  0
Backlog      0
Last Send    2024/06/17 22:02:04
Last Receive 2024/06/17 22:02:04
Last Ping    2024/06/17 22:01:59
Last log page applied(log id,page): 0,0
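  • This test simulates the case where the primary is down but the standby has not yet detected it, and the standby is put back into service manually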
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes
Data Replication at 0x4c13d028:
Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
HDR Secondary  on           node1                         9 / 1          N
DRINTERVAL   0
DRTIMEOUT    30
DRAUTO       0
DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
DRIDXAUTO    0
ENCRYPT_HDR  0
Backlog      0
Last Send    2024/06/17 22:02:04
Last Receive 2024/06/17 22:02:04
Last Ping    2024/06/17 22:01:59
Last log page applied(log id,page): 0,0
[gbasedbt@node02 ~]$ onstat -
Read-Only (Sec) -- Up 00:01:55 -- 1635008 Kbytes
[gbasedbt@node02 ~]$ onmode -d standard
[gbasedbt@node02 ~]$ onstat -
On-Line -- Up 00:02:21 -- 1635008 Kbytes
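
The status banner printed by `onstat -` is what shows whether the mode change took effect. As a minimal sketch, a small shell function can turn that banner into a short role name for use in scripts. `classify_mode` is a hypothetical helper, and it only recognizes the four banners that appear in this article; any other server state is reported as `unknown`.

```shell
# classify_mode (hypothetical helper): map an `onstat -` status banner
# to a short role name. Only the banners seen in this article are handled.
classify_mode() {
    case "$1" in
        *"On-Line (Prim)"*)  echo primary ;;     # primary of a replication pair
        *"Read-Only (Sec)"*) echo secondary ;;   # standby of a replication pair
        *"Fast Recovery"*)   echo recovering ;;  # started with oninit -PHY
        *"On-Line"*)         echo standard ;;    # standalone, no pairing
        *)                   echo unknown ;;
    esac
}

# Banners copied from the sessions in this article.
classify_mode "Read-Only (Sec) -- Up 00:01:55 -- 1635008 Kbytes"
classify_mode "On-Line -- Up 00:02:21 -- 1635008 Kbytes"
```

The `On-Line (Prim)` pattern has to come before the plain `On-Line` pattern, otherwise a primary would be reported as standalone.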

4. After the standby has become standalone, promote it to primary and restore the cluster

[gbasedbt@node02 ~]$ onmode -d primary node1
[gbasedbt@node02 ~]$ onstat -
On-Line (Prim) -- Up 00:02:38 -- 1635008 Kbytes
-- On node1, run oninit -PHY to perform physical log recovery
[gbasedbt@node01 node1_dbs]$ oninit -PHY
[gbasedbt@node01 node1_dbs]$ onstat -m
Fast Recovery -- Up 00:00:13 -- 1650580 Kbytes
Message Log File: /opt/GBASE/gbase/tmp/online_node1.log
06/17/24 22:49:31  SQL_FEAT_CTRL value set to 0x8008
06/17/24 22:49:31  SQL_DEF_CTRL value set to 0x4b0
06/17/24 22:49:31  GBase Database Server Version 12.10.FC4G1AEE Software Serial Number AAA#B000000
06/17/24 22:49:32  GBase Database Server Initialized -- Shared Memory Initialized.
06/17/24 22:49:32  Started 1 B-tree scanners.
06/17/24 22:49:32  B-tree scanner threshold set at 5000.
06/17/24 22:49:32  B-tree scanner range scan size set to -1.
06/17/24 22:49:32  B-tree scanner ALICE mode set to 6.
06/17/24 22:49:32  B-tree scanner index compression level set to med.
06/17/24 22:49:32  DR: Reservation of the last logical log for log backup turned on
06/17/24 22:49:32  Data replication type and state information reset. To start DR, use the 'onmode -d' command and wait for the pair to be operational, before shutting down the database server
06/17/24 22:49:32  Physical Recovery Started at Page (3:394).
06/17/24 22:49:32  Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
06/17/24 22:49:32  Dataskip is now OFF for all dbspaces
06/17/24 22:49:32  Restartable Restore has been ENABLED
06/17/24 22:49:32  Recovery Mode
-- Check the node: it is in the fast recovery phase
[gbasedbt@node01 node1_dbs]$ onstat -
Fast Recovery -- Up 00:00:21 -- 1650580 Kbytes
-- Join node1 to the cluster as the standby
[gbasedbt@node01 node1_dbs]$ onmode -d secondary node2
[gbasedbt@node01 node1_dbs]$ onstat -
Read-Only (Sec) -- Up 00:02:04 -- 2188180 Kbytes
[gbasedbt@node01 node1_dbs]$ onstat -g dri
Read-Only (Sec) -- Up 00:04:31 -- 2188180 Kbytes
Data Replication at 0x4cf1a028:
Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
HDR Secondary  on           node2                         9 / 5          N
DRINTERVAL   0
DRTIMEOUT    30
DRAUTO       2
DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
DRIDXAUTO    0
ENCRYPT_HDR  0
Backlog      0
Last Send    2024/06/17 22:50:42
Last Receive 2024/06/17 22:50:44
Last Ping    2024/06/17 22:53:35
Last log page applied(log id,page): 0,0
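
To confirm that the pair has been restored, the three values worth checking in the `onstat -g dri` output are the type, the state, and the paired server. The sketch below pulls them out of the row that follows the `Type  State` header. `dri_summary` is a hypothetical helper, and it assumes the column layout shown in this article, with columns separated by at least two spaces.

```shell
# dri_summary (hypothetical helper): read `onstat -g dri` output on stdin
# and print "type|state|paired server" from the row after the header line.
# Assumes columns are separated by two or more spaces, as in this article.
dri_summary() {
    awk '
        /Type +State/ { want = 1; next }      # header row found, data row is next
        want && NF {
            sub(/^[ \t]+/, "")                # drop any leading indentation
            split($0, f, /  +/)               # split on runs of 2+ spaces
            print f[1] "|" f[2] "|" f[3]
            exit
        }'
}

# Sample taken from the final node1 output in this article.
dri_summary <<'EOF'
Data Replication at 0x4cf1a028:
Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
HDR Secondary  on           node2                         9 / 5          N
DRINTERVAL   0
EOF
```

Splitting on runs of two or more spaces keeps `HDR Secondary` together as one value, which splitting on single spaces would not. On a live server the function would be fed from the command directly, for example `onstat -g dri | dri_summary`.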
