Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.15.2, Lustre 2.15.3
-
None
-
3
-
9223372036854775807
Description
There's a regular crash in conf-sanity test 30a (janitor) or 30b (maloo) that unfolds like this https://testing.whamcloud.com/test_sets/683c15ea-ec59-46d0-89c1-7f10d715f355
[ 5673.591855] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-60vm1.onyx.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [ 5673.592761] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-60vm2.onyx.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [ 5673.882066] Lustre: DEBUG MARKER: onyx-60vm1.onyx.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [ 5673.948710] Lustre: DEBUG MARKER: onyx-60vm2.onyx.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [ 5679.785437] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre-OST0000.failover.node='10.240.25.54@tcp' [ 5684.257444] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param -d lustre-OST0000.failover.node [ 5686.387703] LustreError: 167-0: lustre-OST0000-osc-MDT0002: This client was evicted by lustre-OST0000; in progress operations using this service will fail. [ 5686.390343] LustreError: Skipped 1 previous similar message [ 5686.391768] Lustre: lustre-OST0000-osc-MDT0002: Connection restored to 10.240.25.54@tcp (at 10.240.25.54@tcp) [ 5686.393594] Lustre: Skipped 1 previous similar message [ 5686.576332] LustreError: 15573:0:(osp_precreate.c:684:osp_precreate_send()) ASSERTION( req->rq_transno == 0 ) failed: [ 5686.578284] LustreError: 15573:0:(osp_precreate.c:684:osp_precreate_send()) LBUG [ 5686.579611] Pid: 15573, comm: osp-pre-0-2 3.10.0-1160.76.1.el7_lustre.ddn17.x86_64 #1 SMP Fri Oct 14 20:32:07 UTC 2022 [ 5686.581497] Call Trace: [ 5686.582054] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs] [ 5686.582998] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 5686.583926] [<0>] osp_precreate_send+0x1106/0x11b0 [osp] [ 5686.584908] [<0>] osp_precreate_thread+0x6ba/0x13b0 [osp] [ 5686.585956] [<0>] kthread+0xd1/0xe0 [ 5686.586645] [<0>] ret_from_fork_nospec_begin+0x21/0x21 [ 5686.587621] [<0>] 0xfffffffffffffffe [ 5686.588300] Kernel panic - not syncing: LBUG [ 5686.589082] CPU: 0 PID: 15573 Comm: osp-pre-0-2 Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.76.1.el7_lustre.ddn17.x86_64 #1 [ 5686.591324] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 5686.592332] Call Trace: [ 5686.592803] [<ffffffff875865c9>] dump_stack+0x19/0x1b [ 5686.593723] [<ffffffff875802d1>] panic+0xe8/0x21f [ 5686.594578] [<ffffffffc075ba1b>] lbug_with_loc+0x9b/0xa0 [libcfs] [ 5686.595672] [<ffffffffc12e1f56>] osp_precreate_send+0x1106/0x11b0 [osp] [ 5686.596843] [<ffffffffc12e464a>] osp_precreate_thread+0x6ba/0x13b0 [osp] [ 5686.598027] [<ffffffff86ec7080>] ? wake_up_atomic_t+0x30/0x30 [ 5686.599052] [<ffffffffc12e3f90>] ? osp_init_pre_fid+0x630/0x630 [osp] [ 5686.600244] [<ffffffff86ec5f91>] kthread+0xd1/0xe0 [ 5686.601114] [<ffffffff86ec5ec0>] ? insert_kthread_work+0x40/0x40 [ 5686.602190] [<ffffffff87599df7>] ret_from_fork_nospec_begin+0x21/0x21 [ 5686.603335] [<ffffffff86ec5ec0>] ? insert_kthread_work+0x40/0x40
[ 3227.316101] Lustre: Skipped 1 previous similar message [ 3227.321723] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180 [ 3228.406622] Lustre: DEBUG MARKER: oleg208-server.virtnet: executing set_default_debug -1 all 8 [ 3230.985273] Lustre: DEBUG MARKER: oleg208-client.virtnet: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid [ 3231.624671] Lustre: lustre-OST0000: Connection restored to 50155b09-b14b-6aa1-56c4-bebc671f968e (at 192.168.201.8@tcp) [ 3231.627520] Lustre: Skipped 3 previous similar messages [ 3242.148194] Lustre: MGS: Connection restored to 50155b09-b14b-6aa1-56c4-bebc671f968e (at 192.168.201.8@tcp) [ 3242.149935] Lustre: Skipped 2 previous similar messages [ 3243.345638] LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_statfs to node 0@lo failed: rc = -107 [ 3243.351508] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [ 3243.361084] Lustre: lustre-OST0000: Not available for connect from 0@lo (stopping) [ 3243.370317] LustreError: 167-0: lustre-OST0000-osc-MDT0000: This client was evicted by lustre-OST0000; in progress operations using this service will fail. [ 3243.404559] LustreError: 10962:0:(osp_precreate.c:683:osp_precreate_send()) ASSERTION( req->rq_transno == 0 ) failed: [ 3243.411037] LustreError: 10962:0:(osp_precreate.c:683:osp_precreate_send()) LBUG [ 3243.414422] Pid: 10962, comm: osp-pre-0-1 3.10.0-7.9-debug #1 SMP Sat Mar 26 23:28:42 EDT 2022 [ 3243.418602] Call Trace: [ 3243.419815] [<ffffffffa018d7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [ 3243.422930] [<ffffffffa018d88c>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 3243.425789] [<ffffffffa0e7a8b6>] osp_precreate_send+0x11a6/0x11c0 [osp] [ 3243.428811] [<ffffffffa0e7c218>] osp_precreate_thread+0x818/0x1360 [osp] [ 3243.431581] [<ffffffff810ba114>] kthread+0xe4/0xf0 [ 3243.433857] [<ffffffff817f3e5d>] ret_from_fork_nospec_begin+0x7/0x21 [ 3243.437050] [<ffffffffffffffff>] 0xffffffffffffffff [ 3243.439596] Kernel panic - not syncing: LBUG
Happens on a variety of branches including b2_12, b2_15 and even master at times, and all the derivatives.
Attachments
Issue Links
- duplicates
-
LU-15671 conf-sanity test_30b: MDS assertion in osp_precreate_send req->rq_transno == 0
- Resolved