ar 30 12:26:40 nbp13-srv2 systemd[1]: Started ansible_dbcheckin. Mar 30 12:26:43 nbp13-srv2 ansible_timerrun.sh[1592208]: Updating files: 100% (21945/21945), done.
Mar 30 12:26:44 nbp13-srv2 ansible_timerrun.sh[1592388]: fixing perms in /ansible (takes a while) Mar 30 12:27:09 nbp13-srv2 systemd[1]: systemd-hostnamed.service: Succeeded. Mar 30 12:31:54 nbp13-srv2 systemd[1]: ansible_playbooks.service: Succeeded. Mar 30 12:31:54 nbp13-srv2 systemd[1]: Started ansible_playbooks. Mar 30 12:35:07 nbp13-srv2 sssd_be[2002]: Backend is online Mar 30 12:37:51 nbp13-srv2 chronyd[107590]: Selected source 172.25.0.44 Mar 30 13:12:10 nbp13-srv2 kernel: Lustre: nbp13-OST0014: haven't heard from client 64c58f70-8d90-39a0-aa44-2b863022e495 (at 10.141.15.173@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp 000000005d81e75c, cur 1680207130 expire 1680206980 last 1680206903 Mar 30 13:12:10 nbp13-srv2 kernel: Lustre: Skipped 1 previous similar message Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task kswapd0:349 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:kswapd0 state:D stack: 0 pid: 349 ppid: 2 flags:0x80004000 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? __find_get_block+0xb4/0x2b0 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? __find_get_block+0xb4/0x2b0 Mar 30 13:15:25 nbp13-srv2 kernel: start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? 
kmem_cache_alloc+0x13f/0x280 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? ldiskfs_release_dquot+0x60/0xb0 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: ldiskfs_release_dquot+0x60/0xb0 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: dqput.part.19+0x82/0x1e0 Mar 30 13:15:25 nbp13-srv2 kernel: __dquot_drop+0x69/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: ldiskfs_clear_inode+0x1e/0x80 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: ldiskfs_evict_inode+0x58/0x6b0 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: evict+0xd2/0x1a0 Mar 30 13:15:25 nbp13-srv2 kernel: dispose_list+0x48/0x70 Mar 30 13:15:25 nbp13-srv2 kernel: prune_icache_sb+0x52/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: super_cache_scan+0x123/0x1b0 Mar 30 13:15:25 nbp13-srv2 kernel: do_shrink_slab+0x11d/0x330 Mar 30 13:15:25 nbp13-srv2 kernel: shrink_slab+0xbe/0x2f0 Mar 30 13:15:25 nbp13-srv2 kernel: shrink_node+0x246/0x700 Mar 30 13:15:25 nbp13-srv2 kernel: balance_pgdat+0x2d7/0x550 Mar 30 13:15:25 nbp13-srv2 kernel: kswapd+0x201/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? balance_pgdat+0x550/0x550 Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-22-8:9051 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-22-8 state:D stack: 0 pid: 9051 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? 
finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? __update_load_avg_cfs_rq+0x27a/0x300 Mar 30 13:15:25 nbp13-srv2 kernel: ? update_load_avg+0x7e/0x710 Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x279/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-12-8:9052 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-12-8 state:D stack: 0 pid: 9052 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? load_balance+0x967/0xc70 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x308/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? 
finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-21-8:9053 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-21-8 state:D stack: 0 pid: 9053 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? update_load_avg+0x7e/0x710 Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x279/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? 
set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-14-8:9054 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-14-8 state:D stack: 0 pid: 9054 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? update_load_avg+0x7e/0x710 Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x279/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-15-8:9055 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-15-8 state:D stack: 0 pid: 9055 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? update_load_avg+0x7e/0x710 Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x279/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-20-8:9059 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-20-8 state:D stack: 0 pid: 9059 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? load_balance+0x967/0xc70 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? 
newidle_balance+0x308/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task jbd2/dm-24-8:9060 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:jbd2/dm-24-8 state:D stack: 0 pid: 9060 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? load_balance+0x967/0xc70 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2_journal_commit_transaction+0x259/0x1a00 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? newidle_balance+0x308/0x3c0 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: ? lock_timer_base+0x67/0x90 Mar 30 13:15:25 nbp13-srv2 kernel: kjournald2+0xbd/0x270 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ? 
commit_timeout+0x10/0x10 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task ll_ost_io04_002:9190 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:ll_ost_io04_002 state:D stack: 0 pid: 9190 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? __update_load_avg_se+0x2b9/0x340 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? __switch_to+0x10c/0x450 Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_task_switch+0xaf/0x2e0 Mar 30 13:15:25 nbp13-srv2 kernel: start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? kmem_cache_alloc+0x13f/0x280 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_write_attr_set+0x11d/0x1070 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_commitrw_write+0x226/0x1ad0 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ? 
lprocfs_counter_add+0xd2/0x140 [obdclass] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? flush_work+0x42/0x1d0 Mar 30 13:15:25 nbp13-srv2 kernel: ? internal_add_timer+0x42/0x70 Mar 30 13:15:25 nbp13-srv2 kernel: ? _cond_resched+0x15/0x30 Mar 30 13:15:25 nbp13-srv2 kernel: ? mutex_lock+0xe/0x30 Mar 30 13:15:25 nbp13-srv2 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? __schedule+0x2d9/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:15:25 nbp13-srv2 kernel: INFO: task ll_ost_io05_000:9191 blocked for more than 120 seconds. Mar 30 13:15:25 nbp13-srv2 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1 Mar 30 13:15:25 nbp13-srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 30 13:15:25 nbp13-srv2 kernel: task:ll_ost_io05_000 state:D stack: 0 pid: 9191 ppid: 2 flags:0x80004080 Mar 30 13:15:25 nbp13-srv2 kernel: Call Trace: Mar 30 13:15:25 nbp13-srv2 kernel: __schedule+0x2d1/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: schedule+0x35/0xa0 Mar 30 13:15:25 nbp13-srv2 kernel: wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? 
finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? mlx5_ib_map_mr_sg+0x90/0xc0 [mlx5_ib] Mar 30 13:15:25 nbp13-srv2 kernel: start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? jbd2__journal_start+0x8f/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? kmem_cache_alloc+0x13f/0x280 Mar 30 13:15:25 nbp13-srv2 kernel: jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:15:25 nbp13-srv2 kernel: ? osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_write_attr_set+0x11d/0x1070 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_commitrw_write+0x226/0x1ad0 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ? lprocfs_counter_add+0xd2/0x140 [obdclass] Mar 30 13:15:25 nbp13-srv2 kernel: ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:15:25 nbp13-srv2 kernel: ? obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? internal_add_timer+0x42/0x70 Mar 30 13:15:25 nbp13-srv2 kernel: ? _cond_resched+0x15/0x30 Mar 30 13:15:25 nbp13-srv2 kernel: ? mutex_lock+0xe/0x30 Mar 30 13:15:25 nbp13-srv2 kernel: tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? finish_wait+0x80/0x80 Mar 30 13:15:25 nbp13-srv2 kernel: ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: ? __schedule+0x2d9/0x860 Mar 30 13:15:25 nbp13-srv2 kernel: ? ptlrpc_wait_event+0x590/0x590 [ptlrpc] Mar 30 13:15:25 nbp13-srv2 kernel: kthread+0x10a/0x120 Mar 30 13:15:25 nbp13-srv2 kernel: ? 
set_kthread_struct+0x50/0x50 Mar 30 13:15:25 nbp13-srv2 kernel: ret_from_fork+0x1f/0x40 Mar 30 13:16:18 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1806827450 Mar 30 13:17:57 nbp13-srv2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:17:57 nbp13-srv2 kernel: Lustre: 304648:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=7, svcEst=323, delay=0ms Mar 30 13:19:02 nbp13-srv2 kernel: Lustre: ost_io: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:19:02 nbp13-srv2 kernel: Lustre: 304829:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=4 reqQ=0 recA=20, svcEst=284, delay=0ms Mar 30 13:19:02 nbp13-srv2 kernel: Lustre: 304829:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@000000002d641a95 x1759798724993152/t0(0) o4->406654fd-5c54-201c-130b-e454b2bc5f52@10.151.18.182@o2ib:339/0 lens 2496/448 e 0 to 0 dl 1680207539 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692533.pbspl1.nas.nasa.gov' Mar 30 13:19:04 nbp13-srv2 kernel: Lustre: 305162:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (348/4s); client may timeout req@00000000b2294097 x1760346945236864/t120259585112(0) o4->61e12c2a-8b03-d504-9634-3407ec7e4579@10.151.54.84@o2ib:340/0 lens 488/448 e 0 to 0 dl 1680207540 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:19:05 nbp13-srv2 kernel: Lustre: 305158:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (348/5s); client may timeout req@00000000eb6c9c1c x1759664001720640/t120259607132(0) o4->dafbeb92-8ede-8b09-2dc9-633257cffc11@10.151.54.57@o2ib:340/0 lens 488/448 e 0 to 0 dl 1680207540 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:19:05 nbp13-srv2 kernel: LustreError: 305158:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ 
Dropping timed-out request from 12345-10.151.54.84@o2ib: deadline 348/5s ago req@0000000040eb687e x1760346945237376/t0(0) o4->61e12c2a-8b03-d504-9634-3407ec7e4579@10.151.54.84@o2ib:340/0 lens 488/0 e 0 to 0 dl 1680207540 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:19:11 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1806162594 Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: 304676:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-34s), not sending early reply. Consider increasing at_early_margin (5)? req@00000000b513f8fa x1761728490498944/t0(0) o4->e20ba9b9-7d5a-94c9-fe62-6c0f2aaeed14@10.151.27.24@o2ib:341/0 lens 488/0 e 0 to 0 dl 1680207541 ref 2 fl New:/0/ffffffff rc 0/-1 job:'python.82629339' Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: 304676:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 109 previous similar messages Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: ost_io: This server is not able to keep up with request traffic (cpu-bound). 
Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: Skipped 8 previous similar messages Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: 304728:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=62, svcEst=402, delay=0ms Mar 30 13:19:35 nbp13-srv2 kernel: Lustre: 304728:0:(service.c:1614:ptlrpc_at_check_timed()) Skipped 8 previous similar messages Mar 30 13:19:36 nbp13-srv2 kernel: Lustre: 305176:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (348/36s); client may timeout req@00000000ec362b31 x1760346945235840/t120259585113(0) o4->61e12c2a-8b03-d504-9634-3407ec7e4579@10.151.54.84@o2ib:340/0 lens 488/448 e 0 to 0 dl 1680207540 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:19:36 nbp13-srv2 kernel: Lustre: 305176:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 15 previous similar messages Mar 30 13:19:36 nbp13-srv2 kernel: LustreError: 304722:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.141.5.38@o2ib417: deadline 348/37s ago req@00000000fd6a649c x1759777461065408/t0(0) o4->d2af165d-2e65-5423-d0c3-be69ebabece6@10.141.5.38@o2ib417:339/0 lens 10568/0 e 0 to 0 dl 1680207539 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15694296.pbspl1.nas.nasa.gov' Mar 30 13:19:36 nbp13-srv2 kernel: LustreError: 304722:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 14 previous similar messages Mar 30 13:19:38 nbp13-srv2 kernel: Lustre: 305171:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (348/40s); client may timeout req@000000001784a9c7 x1759832484236416/t120259619421(0) o4->f6f25c0c-982a-336c-3119-1afc44c07404@10.151.54.39@o2ib:338/0 lens 488/448 e 0 to 0 dl 1680207538 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:19:38 nbp13-srv2 kernel: Lustre: 305171:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 170 previous similar messages Mar 30 13:20:24 nbp13-srv2 kernel: 
obd_memory max: 1854284506, obd_memory current: 1812335298 Mar 30 13:22:52 nbp13-srv2 kernel: Lustre: ll_ost_io02_040: service thread pid 304903 was inactive for 578.938 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 30 13:22:52 nbp13-srv2 kernel: Pid: 304903, comm: ll_ost_io02_040 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:22:52 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:22:52 nbp13-srv2 kernel: Lustre: ll_ost_io00_004: service thread pid 206422 was inactive for 581.270 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] rq_qos_wait+0xb2/0x130 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] wbt_wait+0x96/0xc0 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] __rq_qos_throttle+0x23/0x40 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] blk_mq_make_request+0x131/0x5b0 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] generic_make_request_no_check+0xe1/0x330 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] submit_bio+0x3c/0x160 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_do_bio.constprop.51+0xb63/0xc40 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_ldiskfs_map_inode_pages+0x873/0x8f0 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_write_commit+0x5e2/0x990 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw_write+0x77e/0x1ad0 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:22:52 
nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:22:52 nbp13-srv2 kernel: Pid: 304875, comm: ll_ost_io06_028 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:22:52 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw_write+0x6aa/0x1ad0 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:22:52 nbp13-srv2 kernel: Pid: 305094, comm: ll_ost_io05_054 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:22:52 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] rq_qos_wait+0xb2/0x130 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] wbt_wait+0x96/0xc0 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] __rq_qos_throttle+0x23/0x40 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] blk_mq_make_request+0x131/0x5b0 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] generic_make_request_no_check+0xe1/0x330 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] 
submit_bio+0x3c/0x160 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_do_bio.constprop.51+0xb63/0xc40 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_ldiskfs_map_inode_pages+0x873/0x8f0 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] osd_write_commit+0x5e2/0x990 [osd_ldiskfs] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw_write+0x77e/0x1ad0 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:22:52 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:23:10 nbp13-srv2 kernel: Lustre: 304926:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (4/4), not sending early reply req@00000000c9fea21a x1759927126475392/t120259619383(0) o4->cac27975-73c2-296d-7163-2457ec287ce6@10.151.32.78@o2ib:594/0 lens 7672/448 e 10 to 0 dl 1680207794 ref 2 fl Interpret:/0/0 rc 0/0 job:'15694723.pbspl1.nas.nasa.gov' Mar 30 13:23:11 nbp13-srv2 kernel: Lustre: 304676:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@00000000e45c02a5 x1759664408662144/t0(0) o4->c5057bc6-f677-728b-89e8-5deecc316272@10.151.54.69@o2ib:596/0 lens 488/448 e 10 to 0 dl 1680207796 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:11 nbp13-srv2 kernel: Lustre: 304676:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 30 13:23:12 nbp13-srv2 kernel: Lustre: 304836:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending 
early reply req@000000000dbd0359 x1761814347727488/t120259619396(0) o4->6f6ecde9-a247-2545-ddbf-3d990ce8254c@10.141.16.172@o2ib417:597/0 lens 10768/448 e 10 to 0 dl 1680207797 ref 2 fl Interpret:/0/0 rc 0/0 job:'15687238.pbspl1.nas.nasa.gov' Mar 30 13:23:12 nbp13-srv2 kernel: Lustre: 304836:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 30 13:23:15 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@0000000015d988ba x1759660356544576/t0(0) o4->749a166e-39ab-6cdc-d5bb-0babf355f04e@10.151.54.45@o2ib:600/0 lens 488/448 e 10 to 0 dl 1680207800 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:15 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Mar 30 13:23:20 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@000000004db1d489 x1759660356547712/t0(0) o4->749a166e-39ab-6cdc-d5bb-0babf355f04e@10.151.54.45@o2ib:605/0 lens 504/448 e 10 to 0 dl 1680207805 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:20 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Mar 30 13:23:24 nbp13-srv2 kernel: Lustre: ll_ost_io01_056: service thread pid 304934 was inactive for 556.227 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 30 13:23:24 nbp13-srv2 kernel: Lustre: Skipped 18 previous similar messages Mar 30 13:23:28 nbp13-srv2 kernel: Lustre: 304804:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@000000008f289894 x1759832330720448/t0(0) o4->8cec28c3-a323-72ee-79bf-855704fcf379@10.151.54.26@o2ib:613/0 lens 504/448 e 10 to 0 dl 1680207813 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:28 nbp13-srv2 kernel: Lustre: 304804:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 110 previous similar messages Mar 30 13:23:33 nbp13-srv2 kernel: Lustre: nbp13-OST000d: Client 0f6c6ca5-034f-397b-47f6-0aac9c6defef (at 10.151.54.60@o2ib) reconnecting Mar 30 13:23:34 nbp13-srv2 kernel: Lustre: nbp13-OST000d: Client dafbeb92-8ede-8b09-2dc9-633257cffc11 (at 10.151.54.57@o2ib) reconnecting Mar 30 13:23:34 nbp13-srv2 kernel: Lustre: Skipped 2 previous similar messages Mar 30 13:23:34 nbp13-srv2 kernel: Lustre: 304964:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/20s); client may timeout req@00000000c9fea21a x1759927126475392/t120259619430(0) o4->cac27975-73c2-296d-7163-2457ec287ce6@10.151.32.78@o2ib:594/0 lens 7672/448 e 10 to 0 dl 1680207794 ref 1 fl Complete:/0/0 rc 0/0 job:'15694723.pbspl1.nas.nasa.gov' Mar 30 13:23:34 nbp13-srv2 kernel: Lustre: 304964:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Mar 30 13:23:34 nbp13-srv2 kernel: LustreError: 304839:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.42@o2ib: deadline 600/1s ago req@00000000223d68f8 x1759659870972544/t0(0) o4->9e200689-e3b2-ac2f-c551-7aeb2a287699@10.151.54.42@o2ib:613/0 lens 504/0 e 10 to 0 dl 1680207813 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:34 nbp13-srv2 kernel: LustreError: 304839:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 42 previous 
similar messages Mar 30 13:23:35 nbp13-srv2 kernel: Lustre: nbp13-OST0015: Client 72773efd-efa1-0161-2939-5a79dcfc56ea (at 10.151.54.24@o2ib) reconnecting Mar 30 13:23:35 nbp13-srv2 kernel: Lustre: Skipped 8 previous similar messages Mar 30 13:23:47 nbp13-srv2 kernel: Lustre: 9197:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@00000000c3ec3f48 x1759664001744064/t0(0) o4->dafbeb92-8ede-8b09-2dc9-633257cffc11@10.151.54.57@o2ib:632/0 lens 488/448 e 5 to 0 dl 1680207832 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:23:47 nbp13-srv2 kernel: Lustre: 9197:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 148 previous similar messages Mar 30 13:24:09 nbp13-srv2 kernel: Lustre: 305164:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/46s); client may timeout req@000000002fad9645 x1759664001737536/t120259592334(0) o4->dafbeb92-8ede-8b09-2dc9-633257cffc11@10.151.54.57@o2ib:603/0 lens 488/448 e 10 to 0 dl 1680207803 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:24:09 nbp13-srv2 kernel: Lustre: 305164:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 16 previous similar messages Mar 30 13:24:09 nbp13-srv2 kernel: LustreError: 305164:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.84@o2ib: deadline 600/8s ago req@0000000030cbe89b x1760346945256576/t0(0) o4->61e12c2a-8b03-d504-9634-3407ec7e4579@10.151.54.84@o2ib:641/0 lens 504/0 e 5 to 0 dl 1680207841 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:24:09 nbp13-srv2 kernel: LustreError: 305164:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 30 13:24:12 nbp13-srv2 kernel: Lustre: nbp13-OST0016: Client e20ba9b9-7d5a-94c9-fe62-6c0f2aaeed14 (at 10.151.27.24@o2ib) reconnecting Mar 30 13:24:12 nbp13-srv2 kernel: Lustre: Skipped 15 
previous similar messages Mar 30 13:24:21 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@00000000dd8686fb x1759666486652352/t0(0) o4->f65a7b3b-c285-35b4-7d44-6b809423bb84@10.151.54.81@o2ib:666/0 lens 488/448 e 4 to 0 dl 1680207866 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:24:21 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 139 previous similar messages Mar 30 13:24:25 nbp13-srv2 kernel: LustreError: 9177:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.51.152@o2ib: deadline 600/47s ago req@00000000a9c85b12 x1759820183217792/t0(0) o4->0dee3e88-e3cc-b022-3e96-3dc71003999f@10.151.51.152@o2ib:618/0 lens 6328/0 e 8 to 0 dl 1680207818 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692479.pbspl1.nas.nasa.gov' Mar 30 13:24:25 nbp13-srv2 kernel: LustreError: 9177:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 14 previous similar messages Mar 30 13:24:25 nbp13-srv2 kernel: Lustre: 304787:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/49s); client may timeout req@0000000037afe092 x1759831996996864/t120259607252(0) o4->99d02b15-66be-d855-56a7-f6649bb2f9d1@10.151.54.23@o2ib:616/0 lens 504/448 e 10 to 0 dl 1680207816 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:24:25 nbp13-srv2 kernel: Lustre: 304787:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 67 previous similar messages Mar 30 13:24:32 nbp13-srv2 kernel: Lustre: nbp13-OST000f: Client 3521eb3d-9d6e-6b5f-5fc8-edc286e8a574 (at 10.151.54.87@o2ib) reconnecting Mar 30 13:24:54 nbp13-srv2 kernel: LustreError: 304800:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.3.117@o2ib: deadline 600/45s ago req@000000002c5e2c7f x1761239182594240/t0(0) 
o4->38263fb3-2252-ea74-c031-08046dd83a9a@10.151.3.117@o2ib:649/0 lens 3016/0 e 4 to 0 dl 1680207849 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'chem.36968841' Mar 30 13:24:54 nbp13-srv2 kernel: LustreError: 304800:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 50 previous similar messages Mar 30 13:25:01 nbp13-srv2 kernel: Lustre: 305138:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/96s); client may timeout req@00000000d1f85c00 x1759663660835392/t120259619457(0) o4->d1d54c6b-8a92-d77f-ee0b-eef34d9e984b@10.151.54.59@o2ib:605/0 lens 504/448 e 10 to 0 dl 1680207805 ref 1 fl Complete:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:25:01 nbp13-srv2 kernel: Lustre: 305138:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 15 previous similar messages Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: 304610:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=7 reqQ=0 recA=28, svcEst=400, delay=0ms Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: 304610:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? 
req@0000000041d0de34 x1761762403451776/t0(0) o6->nbp13-MDT0000-mdtlov_UUID@10.151.26.183@o2ib:734/0 lens 544/432 e 0 to 0 dl 1680207934 ref 2 fl Interpret:/0/0 rc 0/0 job:'osp-syn-17-0.0' Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: 304610:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 104 previous similar messages Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: 305108:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (2/2), not sending early reply req@0000000085ac41e4 x1759661480923712/t0(0) o4->6a4b4e52-b388-042a-993b-d4f330552c83@10.151.54.46@o2ib:738/0 lens 488/448 e 2 to 0 dl 1680207938 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:25:36 nbp13-srv2 kernel: Lustre: 305108:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Mar 30 13:26:08 nbp13-srv2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:26:08 nbp13-srv2 kernel: Lustre: 206383:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=36 reqQ=0 recA=36, svcEst=275, delay=0ms Mar 30 13:26:08 nbp13-srv2 kernel: Lustre: 206383:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-7s), not sending early reply. Consider increasing at_early_margin (5)? req@00000000c4ca2498 x1759661952469632/t0(0) o9->76686889-4aef-e6ff-268a-2fc48c796ffd@10.141.5.246@o2ib417:6/0 lens 224/224 e 0 to 0 dl 1680207961 ref 2 fl Interpret:/0/0 rc 0/0 job:'kworker/25:1.0' Mar 30 13:26:08 nbp13-srv2 kernel: Lustre: 206383:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 30 13:27:14 nbp13-srv2 kernel: Lustre: ll_ost_io08_024: service thread pid 304939 was inactive for 561.855 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 30 13:27:14 nbp13-srv2 kernel: Lustre: Skipped 12 previous similar messages Mar 30 13:27:47 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@00000000b8c97b38 x1759661320501952/t0(0) o4->0e510a3b-846e-b481-a065-5ec9f007ddc3@10.151.54.18@o2ib:117/0 lens 488/0 e 1 to 0 dl 1680208072 ref 2 fl New:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:27:47 nbp13-srv2 kernel: Lustre: 304873:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 30 13:27:47 nbp13-srv2 kernel: Lustre: ll_ost_io07_049: service thread pid 305182 was inactive for 593.800 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 30 13:27:47 nbp13-srv2 kernel: Lustre: Skipped 4 previous similar messages Mar 30 13:27:51 nbp13-srv2 kernel: Lustre: nbp13-OST000d: Client 0fc5d4b5-c081-a4ab-fac4-1b566c62eed0 (at 10.151.54.76@o2ib) reconnecting Mar 30 13:28:07 nbp13-srv2 kernel: Lustre: nbp13-OST000b: Client 6dd872a2-024e-a649-7a63-637fcd7ff92e (at 10.151.27.23@o2ib) reconnecting Mar 30 13:28:07 nbp13-srv2 kernel: Lustre: Skipped 55 previous similar messages Mar 30 13:28:40 nbp13-srv2 kernel: Lustre: nbp13-OST000b: Client df7a3ce0-fa09-9e8a-6144-81421cc2a001 (at 10.151.54.77@o2ib) reconnecting Mar 30 13:28:40 nbp13-srv2 kernel: Lustre: Skipped 88 previous similar messages Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: Skipped 5 previous similar messages Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: 1102353:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=7 reqQ=0 recA=7, svcEst=296, delay=0ms Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: 1102353:0:(service.c:1614:ptlrpc_at_check_timed()) Skipped 5 previous similar messages Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: 1102353:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-15s), not sending early reply. Consider increasing at_early_margin (5)? req@0000000071aac690 x1761726491226880/t0(0) o103->9c90f193-8145-6e93-acbb-d9eb3e6fc62d@10.151.25.233@o2ib:162/0 lens 432/224 e 0 to 0 dl 1680208117 ref 2 fl Interpret:/0/0 rc 0/0 job:'' Mar 30 13:28:52 nbp13-srv2 kernel: Lustre: 1102353:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 112 previous similar messages Mar 30 13:29:21 nbp13-srv2 kernel: LustreError: 305003:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.18@o2ib: deadline 600/89s ago req@0000000057405a13 x1759661320502016/t0(0) o4->0e510a3b-846e-b481-a065-5ec9f007ddc3@10.151.54.18@o2ib:117/0 lens 504/0 e 1 to 0 dl 1680208072 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:29:21 nbp13-srv2 kernel: LustreError: 305003:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 95 previous similar messages Mar 30 13:29:21 nbp13-srv2 kernel: Lustre: 305003:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/89s); client may timeout req@0000000057405a13 x1759661320502016/t0(0) o4->0e510a3b-846e-b481-a065-5ec9f007ddc3@10.151.54.18@o2ib:117/0 lens 504/0 e 1 to 0 dl 1680208072 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:29:21 nbp13-srv2 kernel: Lustre: 305003:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 84 previous similar messages Mar 30 13:29:25 nbp13-srv2 kernel: ptlrpc_watchdog_fire: 
40 callbacks suppressed Mar 30 13:29:25 nbp13-srv2 kernel: Lustre: ll_ost02_053: service thread pid 1098570 was inactive for 551.732 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 30 13:29:25 nbp13-srv2 kernel: Lustre: ll_ost02_011: service thread pid 305560 was inactive for 551.731 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 30 13:29:25 nbp13-srv2 kernel: Lustre: Skipped 2 previous similar messages Mar 30 13:29:25 nbp13-srv2 kernel: Lustre: Skipped 2 previous similar messages Mar 30 13:29:25 nbp13-srv2 kernel: Pid: 608484, comm: ll_ost02_031 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:29:25 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_server_data_update+0x3db/0x5a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_client_del+0x368/0x710 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ofd_obd_disconnect+0x1f8/0x210 [ofd] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] target_handle_disconnect+0x22f/0x500 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_disconnect+0x4a/0x1a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] 
kthread+0x10a/0x120 Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:29:25 nbp13-srv2 kernel: Pid: 305560, comm: ll_ost02_011 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:29:25 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_server_data_update+0x3db/0x5a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_client_del+0x368/0x710 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ofd_obd_disconnect+0x1f8/0x210 [ofd] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] target_handle_disconnect+0x22f/0x500 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_disconnect+0x4a/0x1a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:29:25 nbp13-srv2 kernel: Pid: 405000, comm: ll_ost03_010 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:29:25 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:29:25 nbp13-srv2 
kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_server_data_update+0x3db/0x5a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_client_del+0x368/0x710 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ofd_obd_disconnect+0x1f8/0x210 [ofd] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] target_handle_disconnect+0x22f/0x500 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_disconnect+0x4a/0x1a0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:29:25 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:29:32 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:29:37 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:29:42 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:29:47 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:29:52 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:29:58 nbp13-srv2 kernel: Lustre: ll_ost07_063: service thread pid 1098587 was inactive for 579.554 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 30 13:29:58 nbp13-srv2 kernel: Lustre: Skipped 98 previous similar messages Mar 30 13:30:02 nbp13-srv2 kernel: Lustre: nbp13-OST000d: Client e20ba9b9-7d5a-94c9-fe62-6c0f2aaeed14 (at 10.151.27.24@o2ib) reconnecting Mar 30 13:30:02 nbp13-srv2 kernel: Lustre: Skipped 36 previous similar messages Mar 30 13:30:04 nbp13-srv2 kernel: Lustre: nbp13-OST0015: deleting orphan objects from 0x0:341539447 to 0x0:341539857 Mar 30 13:30:04 nbp13-srv2 kernel: Lustre: nbp13-OST000e: deleting orphan objects from 0x0:337200165 to 0x0:337200609 Mar 30 13:30:04 nbp13-srv2 kernel: Lustre: nbp13-OST0011: deleting orphan objects from 0x0:342915799 to 0x0:342916465 Mar 30 13:30:05 nbp13-srv2 kernel: Lustre: nbp13-OST000c: Export 00000000b7ede90d already connecting from 10.151.27.22@o2ib Mar 30 13:30:05 nbp13-srv2 kernel: Lustre: Skipped 1 previous similar message Mar 30 13:30:06 nbp13-srv2 kernel: Lustre: nbp13-OST000f: deleting orphan objects from 0x0:335937610 to 0x0:335938001 Mar 30 13:30:21 nbp13-srv2 kernel: Lustre: nbp13-OST0012: Export 00000000cf5e758e already connecting from 10.151.27.19@o2ib Mar 30 13:30:21 nbp13-srv2 kernel: Lustre: Skipped 14 previous similar messages Mar 30 13:30:30 nbp13-srv2 kernel: Lustre: ll_ost05_055: service thread pid 1102417 was inactive for 567.895 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 30 13:30:30 nbp13-srv2 kernel: Lustre: Skipped 11 previous similar messages Mar 30 13:30:55 nbp13-srv2 kernel: Lustre: nbp13-OST000c: Export 00000000b7ede90d already connecting from 10.151.27.22@o2ib Mar 30 13:30:55 nbp13-srv2 kernel: Lustre: Skipped 24 previous similar messages Mar 30 13:31:12 nbp13-srv2 kernel: LustreError: 304767:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.24@o2ib: deadline 600/96s ago req@00000000f1774886 x1759661517954304/t0(0) o4->72773efd-efa1-0161-2939-5a79dcfc56ea@10.151.54.24@o2ib:221/0 lens 488/0 e 1 to 0 dl 1680208176 ref 1 fl Interpret:/0/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:31:12 nbp13-srv2 kernel: LustreError: 304767:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Mar 30 13:31:36 nbp13-srv2 kernel: LustreError: 9124:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 718s: evicting client at 10.151.27.24@o2ib ns: filter-nbp13-OST000d_UUID lock: 00000000a9700145/0x9522fbc66e2727d3 lrc: 3/0,0 mode: PW/PW res: [0x145c901e:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 2147483648->8589934591) gid 0 flags: 0x60000400000020 nid: 10.151.27.24@o2ib remote: 0xd1683a727289c1ea expref: 56 pid: 362253 timeout: 64809 lvb_type: 0 Mar 30 13:32:27 nbp13-srv2 kernel: Lustre: nbp13-OST0015: Client f6f25c0c-982a-336c-3119-1afc44c07404 (at 10.151.54.39@o2ib) reconnecting Mar 30 13:32:27 nbp13-srv2 kernel: Lustre: Skipped 5 previous similar messages Mar 30 13:32:42 nbp13-srv2 kernel: Lustre: ll_ost01_004: service thread pid 9514 was inactive for 551.837 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 30 13:33:03 nbp13-srv2 kernel: Lustre: 362239:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (4/4), not sending early reply req@0000000092c0b054 x1761728617450944/t0(0) o2->e20ba9b9-7d5a-94c9-fe62-6c0f2aaeed14@10.151.27.24@o2ib:432/0 lens 440/432 e 8 to 0 dl 1680208387 ref 2 fl Interpret:/0/0 rc 0/0 job:'cp.920567893' Mar 30 13:33:03 nbp13-srv2 kernel: Lustre: 362239:0:(service.c:1437:ptlrpc_at_send_early_reply()) Skipped 375 previous similar messages Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: ll_ost06_005: service thread pid 9497 was inactive for 813.952 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: Skipped 356 previous similar messages Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: 9107:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=3, svcEst=275, delay=9ms Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: 9107:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-12s), not sending early reply. Consider increasing at_early_margin (5)? 
req@00000000b249a7ce x1761728399052864/t0(0) o103->2efdc390-9827-b9e1-9e8f-6c97cf489f1e@10.151.27.18@o2ib:460/0 lens 440/224 e 0 to 0 dl 1680208415 ref 2 fl Interpret:/0/0 rc 0/0 job:'' Mar 30 13:33:47 nbp13-srv2 kernel: Lustre: 9107:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 30 13:34:11 nbp13-srv2 kernel: LustreError: 304795:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.51@o2ib: deadline 600/37s ago req@00000000ce8e4147 x1759687817045952/t0(0) o4->42e38550-847f-0172-7969-71df3b58f372@10.151.54.51@o2ib:459/0 lens 488/0 e 1 to 0 dl 1680208414 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:34:11 nbp13-srv2 kernel: LustreError: 304795:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 179 previous similar messages Mar 30 13:34:11 nbp13-srv2 kernel: Lustre: 304795:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600/37s); client may timeout req@00000000ce8e4147 x1759687817045952/t0(0) o4->42e38550-847f-0172-7969-71df3b58f372@10.151.54.51@o2ib:459/0 lens 488/0 e 1 to 0 dl 1680208414 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:34:11 nbp13-srv2 kernel: Lustre: 304795:0:(service.c:2329:ptlrpc_server_handle_request()) Skipped 197 previous similar messages Mar 30 13:34:51 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1841389002 Mar 30 13:34:53 nbp13-srv2 kernel: ptlrpc_watchdog_fire: 487 callbacks suppressed Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: ll_ost_io08_055: service thread pid 305089 was inactive for 884.931 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound). 
Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: Skipped 1 previous similar message Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: 305557:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=62, svcEst=275, delay=0ms Mar 30 13:34:53 nbp13-srv2 kernel: Pid: 305089, comm: ll_ost_io08_055 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:34:53 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: 305557:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-18s), not sending early reply. Consider increasing at_early_margin (5)? req@00000000326aa7ea x1761814347807936/t0(0) o9->6f6ecde9-a247-2545-ddbf-3d990ce8254c@10.141.16.172@o2ib417:520/0 lens 224/224 e 0 to 0 dl 1680208475 ref 2 fl Interpret:/0/0 rc 0/0 job:'' Mar 30 13:34:53 nbp13-srv2 kernel: Lustre: 305557:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_write_attr_set+0x11d/0x1070 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_commitrw_write+0x226/0x1ad0 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:34:53 
nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:34:53 nbp13-srv2 kernel: Pid: 305118, comm: ll_ost_io00_041 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:34:53 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_write_attr_set+0x11d/0x1070 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_commitrw_write+0x226/0x1ad0 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_commitrw+0x5b4/0xd20 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] obd_commitrw+0x1b0/0x380 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_brw_write+0x139f/0x1ce0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_request_handle+0xc97/0x1a40 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:34:53 nbp13-srv2 kernel: Pid: 1003395, comm: ll_ost08_054 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 11 23:55:00 UTC 2023 Mar 30 13:34:53 nbp13-srv2 kernel: Call Trace TBD: Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] wait_transaction_locked+0x89/0xd0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] add_transaction_credits+0xd4/0x290 [jbd2] Mar 30 13:34:53 
nbp13-srv2 kernel: [<0>] start_this_handle+0x10a/0x520 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] jbd2__journal_start+0xee/0x1f0 [jbd2] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] __ldiskfs_journal_start_sb+0x6e/0x140 [ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] osd_trans_start+0x13b/0x500 [osd_ldiskfs] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_client_data_update+0x468/0x6c0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_client_new+0x5c2/0x880 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ofd_obd_connect+0x385/0x4f0 [ofd] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] target_handle_connect+0x611/0x29a0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] tgt_request_handle+0x569/0x1a40 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc] Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] kthread+0x10a/0x120 Mar 30 13:34:53 nbp13-srv2 kernel: [<0>] ret_from_fork+0x1f/0x40 Mar 30 13:35:04 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1841848626 Mar 30 13:35:13 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1842010642 Mar 30 13:35:33 nbp13-srv2 kernel: obd_memory max: 1854284506, obd_memory current: 1845310498 Mar 30 13:35:58 nbp13-srv2 kernel: Lustre: ll_ost_io00_014: service thread pid 304768 was inactive for 927.776 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 30 13:35:58 nbp13-srv2 kernel: Lustre: Skipped 23 previous similar messages Mar 30 13:36:31 nbp13-srv2 kernel: obd_memory max: 1854330874, obd_memory current: 1854330874 Mar 30 13:36:31 nbp13-srv2 kernel: Lustre: ost_io: This server is not able to keep up with request traffic (cpu-bound). 
Mar 30 13:36:31 nbp13-srv2 kernel: Lustre: 304817:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=56, svcEst=600, delay=0ms Mar 30 13:36:31 nbp13-srv2 kernel: Lustre: 304817:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@000000000e85a616 x1759661480940672/t0(0) o4->6a4b4e52-b388-042a-993b-d4f330552c83@10.151.54.46@o2ib:632/0 lens 488/448 e 0 to 0 dl 1680208587 ref 2 fl Interpret:/0/0 rc 0/0 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:36:31 nbp13-srv2 kernel: Lustre: 304817:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 30 13:36:31 nbp13-srv2 kernel: LustreError: 305153:0:(service.c:2291:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.54.39@o2ib: deadline 600/176s ago req@000000006c39d2dd x1759832484253120/t0(0) o4->f6f25c0c-982a-336c-3119-1afc44c07404@10.151.54.39@o2ib:460/0 lens 488/0 e 1 to 0 dl 1680208415 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'15692804.pbspl1.nas.nasa.gov' Mar 30 13:36:31 nbp13-srv2 kernel: LustreError: 305153:0:(service.c:2291:ptlrpc_server_handle_request()) Skipped 36 previous similar messages Mar 30 13:36:35 nbp13-srv2 kernel: obd_memory max: 1854330874, obd_memory current: 1854328578 Mar 30 13:36:42 nbp13-srv2 kernel: obd_memory max: 1854661130, obd_memory current: 1854661130 Mar 30 13:36:43 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854856474 Mar 30 13:36:44 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854699818 Mar 30 13:36:45 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854746362 Mar 30 13:36:49 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854684890 Mar 30 13:36:50 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854693034 Mar 30 13:36:51 nbp13-srv2 kernel: obd_memory max: 1854856474, obd_memory current: 1854721834 Mar 30 13:36:54 
nbp13-srv2 kernel: obd_memory max: 1854940938, obd_memory current: 1854940938 Mar 30 13:36:57 nbp13-srv2 kernel: obd_memory max: 1854940938, obd_memory current: 1854862810 Mar 30 13:36:58 nbp13-srv2 kernel: obd_memory max: 1854940938, obd_memory current: 1854918810 Mar 30 13:36:59 nbp13-srv2 kernel: obd_memory max: 1854940938, obd_memory current: 1854927098 Mar 30 13:37:00 nbp13-srv2 kernel: obd_memory max: 1854996506, obd_memory current: 1854996506 Mar 30 13:37:01 nbp13-srv2 kernel: obd_memory max: 1855198698, obd_memory current: 1855198698 Mar 30 13:37:04 nbp13-srv2 kernel: obd_memory max: 1855341002, obd_memory current: 1855341002 Mar 30 13:37:05 nbp13-srv2 kernel: obd_memory max: 1855424346, obd_memory current: 1855424346 Mar 30 13:37:06 nbp13-srv2 kernel: obd_memory max: 1855500026, obd_memory current: 1855500026 Mar 30 13:37:12 nbp13-srv2 kernel: obd_memory max: 1855513258, obd_memory current: 1855513258 Mar 30 13:37:13 nbp13-srv2 kernel: obd_memory max: 1855513258, obd_memory current: 1855261242 Mar 30 13:37:19 nbp13-srv2 kernel: obd_memory max: 1855666954, obd_memory current: 1855666954 Mar 30 13:37:23 nbp13-srv2 kernel: obd_memory max: 1855848474, obd_memory current: 1855848474 Mar 30 13:37:25 nbp13-srv2 kernel: obd_memory max: 1855848474, obd_memory current: 1855760882 Mar 30 13:37:25 nbp13-srv2 kernel: obd_memory max: 1855848474, obd_memory current: 1855764106 Mar 30 13:37:25 nbp13-srv2 kernel: obd_memory max: 1855848474, obd_memory current: 1855805930 Mar 30 13:37:27 nbp13-srv2 kernel: obd_memory max: 1855998434, obd_memory current: 1855998434 Mar 30 13:37:29 nbp13-srv2 kernel: obd_memory max: 1856058034, obd_memory current: 1856058034 Mar 30 13:37:31 nbp13-srv2 kernel: obd_memory max: 1856079690, obd_memory current: 1856079690 Mar 30 13:37:33 nbp13-srv2 kernel: obd_memory max: 1856306322, obd_memory current: 1856306322 Mar 30 13:37:36 nbp13-srv2 kernel: obd_memory max: 1856306322, obd_memory current: 1856242034 Mar 30 13:37:36 nbp13-srv2 kernel: 
Lustre: 1002535:0:(service.c:1379:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? req@000000000bd5d1c2 x1759661118148864/t0(0) o17->fc7b0cde-5d61-9a8a-fc77-5a9defde6191@10.151.54.27@o2ib:696/0 lens 456/0 e 0 to 0 dl 1680208651 ref 2 fl New:/0/ffffffff rc 0/-1 job:'kworker/4:1.0' Mar 30 13:37:36 nbp13-srv2 kernel: Lustre: 1002535:0:(service.c:1379:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 30 13:37:37 nbp13-srv2 kernel: obd_memory max: 1856416914, obd_memory current: 1856416914 Mar 30 13:37:42 nbp13-srv2 kernel: obd_memory max: 1856626634, obd_memory current: 1856626634 Mar 30 13:37:52 nbp13-srv2 kernel: obd_memory max: 1857756082, obd_memory current: 1857756082 Mar 30 13:37:55 nbp13-srv2 kernel: obd_memory max: 1858716554, obd_memory current: 1858716554 Mar 30 13:37:56 nbp13-srv2 kernel: obd_memory max: 1858716554, obd_memory current: 1858560362 Mar 30 13:37:59 nbp13-srv2 kernel: obd_memory max: 1858827098, obd_memory current: 1858827098 Mar 30 13:38:01 nbp13-srv2 kernel: obd_memory max: 1858827098, obd_memory current: 1858751066 Mar 30 13:38:03 nbp13-srv2 kernel: obd_memory max: 1858948810, obd_memory current: 1858948810 Mar 30 13:38:07 nbp13-srv2 kernel: obd_memory max: 1859029866, obd_memory current: 1859029866 Mar 30 13:38:09 nbp13-srv2 kernel: Lustre: nbp13-OST000b: Client d2af165d-2e65-5423-d0c3-be69ebabece6 (at 10.141.5.38@o2ib417) reconnecting Mar 30 13:38:09 nbp13-srv2 kernel: Lustre: Skipped 146 previous similar messages Mar 30 13:38:10 nbp13-srv2 kernel: obd_memory max: 1859029866, obd_memory current: 1859015378 Mar 30 13:38:15 nbp13-srv2 kernel: obd_memory max: 1859132002, obd_memory current: 1859132002 Mar 30 13:38:15 nbp13-srv2 kernel: obd_memory max: 1859222082, obd_memory current: 1859222082 Mar 30 13:38:23 nbp13-srv2 kernel: obd_memory max: 1859702402, obd_memory current: 1859702402 Mar 30 13:38:25 nbp13-srv2 kernel: obd_memory max: 
1859747698, obd_memory current: 1859747698 Mar 30 13:38:25 nbp13-srv2 kernel: obd_memory max: 1859747698, obd_memory current: 1859747698 Mar 30 13:38:31 nbp13-srv2 kernel: obd_memory max: 1860176458, obd_memory current: 1860176458 Mar 30 13:38:38 nbp13-srv2 kernel: obd_memory max: 1860213050, obd_memory current: 1860213050 Mar 30 13:38:42 nbp13-srv2 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Mar 30 13:38:42 nbp13-srv2 kernel: Lustre: Skipped 9 previous similar messages Mar 30 13:38:42 nbp13-srv2 kernel: Lustre: 1004238:0:(service.c:1614:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=3, svcEst=275, delay=0ms Mar 30 13:38:42 nbp13-srv2 kernel: Lustre: 1004238:0:(service.c:1614:ptlrpc_at_check_timed()) Skipped 9 previous similar messages Mar 30 13:38:44 nbp13-srv2 kernel: obd_memory max: 1860637234, obd_memory current: 1860637234 Mar 30 13:38:45 nbp13-srv2 kernel: obd_memory max: 1860752690, obd_memory current: 1860752690 Mar 30 13:38:46 nbp13-srv2 kernel: obd_memory max: 1860797906, obd_memory current: 1860797906 Mar 30 13:38:46 nbp13-srv2 kernel: obd_memory max: 1860797906, obd_memory current: 1860508306 Mar 30 13:38:49 nbp13-srv2 kernel: obd_memory max: 1860807458, obd_memory current: 1860807458 Mar 30 13:38:51 nbp13-srv2 kernel: obd_memory max: 1861101794, obd_memory current: 1861101794