Oct 16 01:17:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Oct 16 01:18:27 ascratch-mds01 kernel: [10061483.742495] LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 104s: evicting client at 172.22.17.11@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000aa806c6b/0x36c29fcb7cda5151 lrc: 3/0,0 mode: PW/PW res: [0x20000442c:0xfa6d:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT gid 0 flags: 0x60200400010020 nid: 172.22.17.11@o2ib20 remote: 0x57f1dc50ef258e4b expref: 117 pid: 1858569 timeout: 10061567 lvb_type: 0
Oct 16 01:18:27 ascratch-mds01 kernel: LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 104s: evicting client at 172.22.17.11@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000aa806c6b/0x36c29fcb7cda5151 lrc: 3/0,0 mode: PW/PW res: [0x20000442c:0xfa6d:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT gid 0 flags: 0x60200400010020 nid: 172.22.17.11@o2ib20 remote: 0x57f1dc50ef258e4b expref: 117 pid: 1858569 timeout: 10061567 lvb_type: 0
Oct 16 01:20:10 ascratch-mds01 systemd[1]: Starting system activity accounting tool...
Oct 16 01:20:10 ascratch-mds01 systemd[1]: sysstat-collect.service: Succeeded.
Oct 16 01:20:10 ascratch-mds01 systemd[1]: Started system activity accounting tool.
Oct 16 01:20:26 ascratch-mds01 kernel: [10061602.526511] LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 172.22.18.122@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000525b90a1/0x36c29fcb7cb09e4a lrc: 3/0,0 mode: PW/PW res: [0x200006bf2:0x95f4:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT gid 0 flags: 0x60000480010020 nid: 172.22.18.122@o2ib20 remote: 0x9fbd2aece603461f expref: 211 pid: 1863166 timeout: 10061686 lvb_type: 0
Oct 16 01:20:26 ascratch-mds01 kernel: LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 172.22.18.122@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000525b90a1/0x36c29fcb7cb09e4a lrc: 3/0,0 mode: PW/PW res: [0x200006bf2:0x95f4:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT gid 0 flags: 0x60000480010020 nid: 172.22.18.122@o2ib20 remote: 0x9fbd2aece603461f expref: 211 pid: 1863166 timeout: 10061686 lvb_type: 0
Oct 16 01:20:35 ascratch-mds01 kernel: [10061611.869421] Lustre: 11147:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665875434/real 1665875434]  req@00000000bb7cd87d x1736380908478848/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 23 to 1 dl 1665876035 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-syn-17-0.0'
Oct 16 01:20:35 ascratch-mds01 kernel: [10061611.949177] Lustre: 11147:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Oct 16 01:20:35 ascratch-mds01 kernel: [10061611.975273] Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:20:35 ascratch-mds01 kernel: Lustre: 11147:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665875434/real 1665875434]  req@00000000bb7cd87d x1736380908478848/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 23 to 1 dl 1665876035 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-syn-17-0.0'
Oct 16 01:20:35 ascratch-mds01 kernel: Lustre: 11147:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Oct 16 01:20:35 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:20:35 ascratch-mds01 kernel: [10061612.043961] Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:20:35 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:21:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 53.310001
Oct 16 01:22:30 ascratch-mds01 kernel: [10061726.471275] Lustre: ascratch-MDT0000: haven't heard from client 2c1a2487-1123-41d1-86c0-6da23c204f1e (at 172.22.17.11@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp 00000000e3a554c0, cur 1665876150 expire 1665876000 last 1665875923
Oct 16 01:22:30 ascratch-mds01 kernel: Lustre: ascratch-MDT0000: haven't heard from client 2c1a2487-1123-41d1-86c0-6da23c204f1e (at 172.22.17.11@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp 00000000e3a554c0, cur 1665876150 expire 1665876000 last 1665875923
Oct 16 01:22:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Oct 16 01:22:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: On loss of quorum: Ignore
Oct 16 01:22:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0001 on ascratch-oss01 at Oct  6 08:44:58 2022
Oct 16 01:22:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0002 on ascratch-oss01 at Oct  6 08:45:02 2022
Oct 16 01:22:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: Calculated transition 33620, saving inputs in /var/lib/pacemaker/pengine/pe-input-128.bz2
Oct 16 01:22:48 ascratch-mds01 pacemaker-controld[4907]: notice: Transition 33620 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-128.bz2): Complete
Oct 16 01:22:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Oct 16 01:23:46 ascratch-mds01 kernel: [10061802.485307] Lustre: ascratch-MDT0000: haven't heard from client fd185179-c81f-4a8b-9547-dfa38f03895c (at 172.22.18.122@o2ib20) in 181 seconds. I think it's dead, and I am evicting it. exp 00000000d5e8476e, cur 1665876226 expire 1665876076 last 1665876045
Oct 16 01:23:46 ascratch-mds01 kernel: Lustre: ascratch-MDT0000: haven't heard from client fd185179-c81f-4a8b-9547-dfa38f03895c (at 172.22.18.122@o2ib20) in 181 seconds. I think it's dead, and I am evicting it. exp 00000000d5e8476e, cur 1665876226 expire 1665876076 last 1665876045
Oct 16 01:26:55 ascratch-mds01 kernel: [10061991.642249] LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 172.22.16.162@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000f755675f/0x36c29fcb7cec0d1e lrc: 3/0,0 mode: PW/PW res: [0x200004414:0x1aa5:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT gid 0 flags: 0x60000400000020 nid: 172.22.16.162@o2ib20 remote: 0x8041aeda280eb671 expref: 72 pid: 1867746 timeout: 10062075 lvb_type: 0
Oct 16 01:26:55 ascratch-mds01 kernel: LustreError: 1857607:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 172.22.16.162@o2ib20  ns: mdt-ascratch-MDT0000_UUID lock: 00000000f755675f/0x36c29fcb7cec0d1e lrc: 3/0,0 mode: PW/PW res: [0x200004414:0x1aa5:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT gid 0 flags: 0x60000400000020 nid: 172.22.16.162@o2ib20 remote: 0x8041aeda280eb671 expref: 72 pid: 1867746 timeout: 10062075 lvb_type: 0
Oct 16 01:27:00 ascratch-mds01 kernel: [10061997.274200] Lustre: 11148:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665876035/real 1665876035]  req@000000002a154c0d x1736380911134784/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 2 to 1 dl 1665876420 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:'osp-syn-17-0.0'
Oct 16 01:27:00 ascratch-mds01 kernel: [10061997.354189] Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:27:00 ascratch-mds01 kernel: Lustre: 11148:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665876035/real 1665876035]  req@000000002a154c0d x1736380911134784/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 2 to 1 dl 1665876420 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:'osp-syn-17-0.0'
Oct 16 01:27:00 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:27:01 ascratch-mds01 kernel: [10061997.436472] Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:27:01 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:27:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Oct 16 01:27:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: On loss of quorum: Ignore
Oct 16 01:27:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0001 on ascratch-oss01 at Oct  6 08:44:58 2022
Oct 16 01:27:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0002 on ascratch-oss01 at Oct  6 08:45:02 2022
Oct 16 01:27:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: Calculated transition 33621, saving inputs in /var/lib/pacemaker/pengine/pe-input-128.bz2
Oct 16 01:27:48 ascratch-mds01 pacemaker-controld[4907]: notice: Transition 33621 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-128.bz2): Complete
Oct 16 01:27:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Oct 16 01:27:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 76.629997
Oct 16 01:28:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 190.789993
Oct 16 01:28:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 279.850006
Oct 16 01:29:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 335.070007
Oct 16 01:29:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 369.700012
Oct 16 01:30:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 394.820007
Oct 16 01:30:23 ascratch-mds01 systemd[1]: Starting system activity accounting tool...
Oct 16 01:30:23 ascratch-mds01 systemd[1]: sysstat-collect.service: Succeeded.
Oct 16 01:30:23 ascratch-mds01 systemd[1]: Started system activity accounting tool.
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728420] ptlrpc_watchdog_fire: 85 callbacks suppressed
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728423] Lustre: mdt01_217: service thread pid 1879908 was inactive for 200.157 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728424] Lustre: mdt01_118: service thread pid 1863983 was inactive for 201.516 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728425] Pid: 1880421, comm: mdt01_227 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728427] Lustre: Skipped 2 previous similar messages
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728427] Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728449] [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728473] [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728481] [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728489] [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728496] [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728510] [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728516] [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728542] [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728552] [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728562] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728573] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728643] [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728672] [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728703] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728734] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728764] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728793] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728796] [<0>] kthread+0x116/0x130
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728799] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728801] Pid: 1867772, comm: mdt00_158 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728802] Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728813] [<0>] flush_work+0x11d/0x1c0
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728815] [<0>] __cancel_work_timer+0x105/0x190
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728882] [<0>] ptlrpc_wait_event+0x39/0x590 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728912] [<0>] ptlrpc_main+0xaf4/0x1560 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728914] [<0>] kthread+0x116/0x130
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.728916] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:32 ascratch-mds01 kernel: [10062208.750145] Lustre: Skipped 31 previous similar messages
Oct 16 01:30:32 ascratch-mds01 kernel: ptlrpc_watchdog_fire: 85 callbacks suppressed
Oct 16 01:30:32 ascratch-mds01 kernel: Lustre: mdt01_217: service thread pid 1879908 was inactive for 200.157 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:32 ascratch-mds01 kernel: Lustre: mdt01_118: service thread pid 1863983 was inactive for 201.516 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 16 01:30:32 ascratch-mds01 kernel: Pid: 1880421, comm: mdt01_227 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: Lustre: Skipped 2 previous similar messages
Oct 16 01:30:32 ascratch-mds01 kernel: Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] kthread+0x116/0x130
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:32 ascratch-mds01 kernel: Pid: 1867772, comm: mdt00_158 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] flush_work+0x11d/0x1c0
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] __cancel_work_timer+0x105/0x190
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ptlrpc_wait_event+0x39/0x590 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ptlrpc_main+0xaf4/0x1560 [ptlrpc]
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] kthread+0x116/0x130
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:32 ascratch-mds01 kernel: Lustre: Skipped 31 previous similar messages
Oct 16 01:30:32 ascratch-mds01 kernel: [10062209.408297] Pid: 1863983, comm: mdt01_118 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: [10062209.427396] Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [10062209.439752] [<0>] flush_work+0x11d/0x1c0
Oct 16 01:30:32 ascratch-mds01 kernel: [10062209.452715] [<0>] __cancel_work_timer+0x105/0x190
Oct 16 01:30:33 ascratch-mds01 kernel: [10062209.466307] [<0>] ptlrpc_wait_event+0x39/0x590 [ptlrpc]
Oct 16 01:30:33 ascratch-mds01 kernel: [10062209.480203] [<0>] ptlrpc_main+0xaf4/0x1560 [ptlrpc]
Oct 16 01:30:33 ascratch-mds01 kernel: [10062209.493446] [<0>] kthread+0x116/0x130
Oct 16 01:30:33 ascratch-mds01 kernel: [10062209.505212] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:32 ascratch-mds01 kernel: Pid: 1863983, comm: mdt01_118 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:30:32 ascratch-mds01 kernel: Call Trace TBD:
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] flush_work+0x11d/0x1c0
Oct 16 01:30:32 ascratch-mds01 kernel: [<0>] __cancel_work_timer+0x105/0x190
Oct 16 01:30:33 ascratch-mds01 kernel: [<0>] ptlrpc_wait_event+0x39/0x590 [ptlrpc]
Oct 16 01:30:33 ascratch-mds01 kernel: [<0>] ptlrpc_main+0xaf4/0x1560 [ptlrpc]
Oct 16 01:30:33 ascratch-mds01 kernel: [<0>] kthread+0x116/0x130
Oct 16 01:30:33 ascratch-mds01 kernel: [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:30:36 ascratch-mds01 kernel: [10062212.824386] Lustre: mdt00_052: service thread pid 1863438 was inactive for 200.464 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:36 ascratch-mds01 kernel: [10062212.854744] Lustre: Skipped 31 previous similar messages
Oct 16 01:30:36 ascratch-mds01 kernel: Lustre: mdt00_052: service thread pid 1863438 was inactive for 200.464 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:36 ascratch-mds01 kernel: Lustre: Skipped 31 previous similar messages
Oct 16 01:30:44 ascratch-mds01 kernel: [10062221.016317] Lustre: mdt00_224: service thread pid 489993 was inactive for 200.954 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:44 ascratch-mds01 kernel: [10062221.046197] Lustre: Skipped 48 previous similar messages
Oct 16 01:30:44 ascratch-mds01 kernel: Lustre: mdt00_224: service thread pid 489993 was inactive for 200.954 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:30:44 ascratch-mds01 kernel: Lustre: Skipped 48 previous similar messages
Oct 16 01:30:47 ascratch-mds01 kernel: [10062223.542141] Lustre: ascratch-MDT0000: haven't heard from client 5bfc6223-60f8-44e7-9309-7b684545aba3 (at 172.22.16.162@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp 000000007fd3a23b, cur 1665876647 expire 1665876497 last 1665876420
Oct 16 01:30:47 ascratch-mds01 kernel: Lustre: ascratch-MDT0000: haven't heard from client 5bfc6223-60f8-44e7-9309-7b684545aba3 (at 172.22.16.162@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp 000000007fd3a23b, cur 1665876647 expire 1665876497 last 1665876420
Oct 16 01:30:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 410.140015
Oct 16 01:31:00 ascratch-mds01 kernel: [10062237.400183] Lustre: mdt01_234: service thread pid 1881191 was inactive for 200.524 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:31:00 ascratch-mds01 kernel: [10062237.430252] Lustre: Skipped 49 previous similar messages
Oct 16 01:31:00 ascratch-mds01 kernel: Lustre: mdt01_234: service thread pid 1881191 was inactive for 200.524 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:31:00 ascratch-mds01 kernel: Lustre: Skipped 49 previous similar messages
Oct 16 01:31:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 420.059998
Oct 16 01:31:33 ascratch-mds01 kernel: [10062270.167907] Lustre: mdt00_207: service thread pid 489976 was inactive for 203.551 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:31:33 ascratch-mds01 kernel: [10062270.198615] Lustre: Skipped 180 previous similar messages
Oct 16 01:31:33 ascratch-mds01 kernel: Lustre: mdt00_207: service thread pid 489976 was inactive for 203.551 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:31:33 ascratch-mds01 kernel: Lustre: Skipped 180 previous similar messages
Oct 16 01:31:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 428.010010
Oct 16 01:32:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 433.890015
Oct 16 01:32:39 ascratch-mds01 kernel: [10062335.703358] Lustre: mdt00_096: service thread pid 1863932 was inactive for 236.182 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:32:39 ascratch-mds01 kernel: [10062335.735066] Lustre: Skipped 18 previous similar messages
Oct 16 01:32:39 ascratch-mds01 kernel: Lustre: mdt00_096: service thread pid 1863932 was inactive for 236.182 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:32:39 ascratch-mds01 kernel: Lustre: Skipped 18 previous similar messages
Oct 16 01:32:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Oct 16 01:32:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: On loss of quorum: Ignore
Oct 16 01:32:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0001 on ascratch-oss01 at Oct  6 08:44:58 2022
Oct 16 01:32:48 ascratch-mds01 pacemaker-schedulerd[4906]: warning: Unexpected result (error) was recorded for monitor of zfs-OST0002 on ascratch-oss01 at Oct  6 08:45:02 2022
Oct 16 01:32:48 ascratch-mds01 pacemaker-schedulerd[4906]: notice: Calculated transition 33622, saving inputs in /var/lib/pacemaker/pengine/pe-input-128.bz2
Oct 16 01:32:48 ascratch-mds01 pacemaker-controld[4907]: notice: Transition 33622 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-128.bz2): Complete
Oct 16 01:32:48 ascratch-mds01 pacemaker-controld[4907]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Oct 16 01:32:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 437.700012
Oct 16 01:33:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 442.209991
Oct 16 01:33:26 ascratch-mds01 kernel: [10062382.806967] Lustre: 11148:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665876421/real 1665876421]  req@000000002a154c0d x1736380911134784/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 2 to 1 dl 1665876806 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:'osp-syn-17-0.0'
Oct 16 01:33:26 ascratch-mds01 kernel: [10062382.865334] Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:33:26 ascratch-mds01 kernel: [10062382.902505] Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:33:26 ascratch-mds01 kernel: Lustre: 11148:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1665876421/real 1665876421]  req@000000002a154c0d x1736380911134784/t0(0) o6->ascratch-OST0011-osc-MDT0000@172.22.30.15@o2ib20:28/4 lens 544/432 e 2 to 1 dl 1665876806 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:'osp-syn-17-0.0'
Oct 16 01:33:26 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection to ascratch-OST0011 (at 172.22.30.15@o2ib20) was lost; in progress operations using this service will wait for recovery to complete
Oct 16 01:33:26 ascratch-mds01 kernel: Lustre: ascratch-OST0011-osc-MDT0000: Connection restored to 172.22.30.15@o2ib20 (at 172.22.30.15@o2ib20)
Oct 16 01:33:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 448.350006
Oct 16 01:34:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 452.880005
Oct 16 01:34:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 456.420013
Oct 16 01:35:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 459.959991
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829885] ptlrpc_watchdog_fire: 393 callbacks suppressed
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829887] Lustre: mdt00_074: service thread pid 1863658 was inactive for 356.631 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829888] Lustre: mdt00_153: service thread pid 1867766 was inactive for 345.655 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829889] Lustre: mdt00_024: service thread pid 1863209 was inactive for 345.532 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829890] Lustre: Skipped 18 previous similar messages
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829891] Pid: 1863209, comm: mdt00_024 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829891] Call Trace TBD:
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829910] [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829930] [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829939] [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829946] [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829954] [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829969] [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.829975] [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830000] [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830011] [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830020] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830031] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830095] [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830125] [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830156] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830187] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830221] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830253] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830255] [<0>] kthread+0x116/0x130
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.830260] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:35:35 ascratch-mds01 kernel: [10062511.845773] Lustre: Skipped 18 previous similar messages
Oct 16 01:35:35 ascratch-mds01 kernel: ptlrpc_watchdog_fire: 393 callbacks suppressed
Oct 16 01:35:35 ascratch-mds01 kernel: Lustre: mdt00_074: service thread pid 1863658 was inactive for 356.631 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:35:35 ascratch-mds01 kernel: Lustre: mdt00_153: service thread pid 1867766 was inactive for 345.655 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Oct 16 01:35:35 ascratch-mds01 kernel: Lustre: mdt00_024: service thread pid 1863209 was inactive for 345.532 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 16 01:35:35 ascratch-mds01 kernel: Lustre: Skipped 18 previous similar messages
Oct 16 01:35:35 ascratch-mds01 kernel: Pid: 1863209, comm: mdt00_024 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:35:35 ascratch-mds01 kernel: Call Trace TBD:
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] kthread+0x116/0x130
Oct 16 01:35:35 ascratch-mds01 kernel: [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:35:35 ascratch-mds01 kernel: Lustre: Skipped 18 previous similar messages
Oct 16 01:35:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 463.670013
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.597610] Lustre: mdt01_210: service thread pid 1879734 was inactive for 376.560 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.597612] Pid: 1867708, comm: mdt01_149 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.597613] Call Trace TBD:
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.637297] Lustre: Skipped 1 previous similar message
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.685862] [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.701220] [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.717676] [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.732895] [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.749174] [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.764500] [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.779634] [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.794109] [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.808174] [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.822253] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.835523] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.849100] [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.862429] [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.875930] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.888315] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.901261] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.915077] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.927005] [<0>] kthread+0x116/0x130
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.937463] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.948032] Pid: 1879734, comm: mdt01_210 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.964301] Call Trace TBD:
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.973639] [<0>] rwsem_down_write_slowpath+0x32a/0x5e0
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.985402] [<0>] lod_ost_alloc_qos.constprop.22+0x2e4/0xf60 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062544.998122] [<0>] lod_qos_prep_create+0xa98/0x1310 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.009950] [<0>] lod_declare_instantiate_components+0x97/0x200 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.022958] [<0>] lod_declare_layout_change+0xa0e/0xc00 [lod]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.035240] [<0>] mdd_declare_layout_change+0x49/0x100 [mdd]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.047381] [<0>] mdd_layout_change+0x62c/0x19a0 [mdd]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.059059] [<0>] mdt_layout_change+0x31c/0x4b0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.070594] [<0>] mdt_intent_layout+0x6c8/0x990 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.082275] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.093581] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.105182] [<0>] ldlm_lock_enqueue+0x4e4/0xa80 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.117300] [<0>] ldlm_handle_enqueue0+0x634/0x1530 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.129577] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.140891] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.153025] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.165986] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.177481] [<0>] kthread+0x116/0x130
Oct 16 01:36:08 ascratch-mds01 kernel: [10062545.187704] [<0>] ret_from_fork+0x1f/0x40
Oct 16 01:36:18 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 467.049988
Oct 16 01:36:48 ascratch-mds01 pacemaker-controld[4907]: notice: High CPU load detected: 470.929993
