[LU-9547] LBUG osp_dev.c:755:osp_statfs()) ASSERTION( sfs->os_fprecreated <= OST_MAX_PRECREATE * 2 ) failed Created: 23/May/17 Updated: 07/Jul/23 Resolved: 07/Jul/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
Soak performance test cluster. Test patch for |
||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
OSS failover completed 2017-05-23 02:45:41 10) was lost; in progress operations using this service will wait for recovery to complete May 23 02:45:06 soak-10 kernel: Lustre: Skipped 3 previous similar messages May 23 02:45:11 soak-10 chronyd[1310]: Source 204.11.201.12 replaced with 198.58.110.84 May 23 02:45:19 soak-10 kernel: LustreError: 4833:0:(osp_dev.c:755:osp_statfs()) ASSERTION( sfs->os_fprecreated <= OST_MAX_PRECREATE * 2 ) failed: last_created [0x540000401:0x2e43841:0x0], next_fid [0x540000401:0x2e43841:0x0], reserved 19 May 23 02:45:19 soak-10 kernel: LustreError: 4833:0:(osp_dev.c:755:osp_statfs()) LBUG May 23 02:45:19 soak-10 kernel: Pid: 4833, comm: mdt00_016 May 23 02:45:19 soak-10 kernel: #012Call Trace: May 23 02:45:19 soak-10 kernel: [<ffffffffa0bbe7ee>] libcfs_call_trace+0x4e/0x60 [libcfs] May 23 02:45:19 soak-10 kernel: [<ffffffffa0bbe87c>] lbug_with_loc+0x4c/0xb0 [libcfs] May 23 02:45:19 soak-10 kernel: [<ffffffffa14e7764>] osp_statfs+0x404/0x480 [osp] May 23 02:45:19 soak-10 kernel: [<ffffffffa1443251>] lod_statfs_and_check+0xa1/0x580 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa1445154>] lod_alloc_rr.constprop.18+0x494/0x1000 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa1448979>] lod_qos_prep_create+0x12b9/0x17f0 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa1449428>] lod_prepare_create+0x298/0x3f0 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa143df8e>] lod_declare_striped_create+0x1ee/0x970 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa14404bc>] lod_declare_create+0x1fc/0x570 [lod] May 23 02:45:19 soak-10 kernel: [<ffffffffa14ab97f>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd] May 23 02:45:19 soak-10 kernel: [<ffffffffa149d2f3>] mdd_declare_create+0x53/0xe20 [mdd] May 23 02:45:19 soak-10 kernel: [<ffffffffa0bc6574>] ? libcfs_log_return+0x24/0x30 [libcfs] May 23 02:45:19 soak-10 kernel: [<ffffffffa14a10d9>] mdd_create+0x7c9/0x1330 [mdd] May 23 02:45:19 soak-10 kernel: [<ffffffffa1377302>] mdt_reint_open+0x2192/0x31a0 [mdt] May 23 02:45:19 soak-10 kernel: [<ffffffffa136c500>] mdt_reint_rec+0x80/0x210 [mdt] May 23 02:45:19 soak-10 kernel: [<ffffffffa134e46b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] May 23 02:45:19 soak-10 kernel: [<ffffffffa134e992>] mdt_intent_reint+0x162/0x430 [mdt] May 23 02:45:19 soak-10 kernel: [<ffffffffa135970e>] mdt_intent_policy+0x43e/0xc70 [mdt] May 23 02:45:19 soak-10 kernel: [<ffffffffa0ee0167>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f09ab3>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0bc9d87>] ? libcfs_debug_msg+0x57/0x80 [libcfs] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f8ef62>] tgt_enqueue+0x62/0x210 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f92225>] tgt_request_handle+0x915/0x1360 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3b3e3>] ptlrpc_server_handle_request+0x233/0xa90 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3f3f0>] ptlrpc_main+0xaa0/0x1dd0 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3e950>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc] May 23 02:45:19 soak-10 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0 May 23 02:45:19 soak-10 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0 May 23 02:45:19 soak-10 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90 May 23 02:45:19 soak-10 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0 May 23 02:45:19 soak-10 kernel: May 23 02:45:19 soak-10 kernel: LustreError: dumping log to /tmp/lustre-log.1495507519.4833 May 23 02:45:31 soak-10 kernel: LustreError: 11-0: soaked-OST0015-osc-MDT0002: operation obd_ping to node 192.168.1.104@o2ib10 failed: rc = -107 May 23 02:45:31 soak-10 kernel: Lustre: soaked-OST0015-osc-MDT0002: Connection to soaked-OST0015 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete May 23 02:45:31 soak-10 kernel: Lustre: Skipped 1 previous similar message May 23 02:45:43 soak-10 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [mdt00_006:4716] System is crash dumping atm. Will attach. |
| Comments |
| Comment by Alex Zhuravlev [ 23/May/17 ] |
|
this is fixed in the updated patch yet. |
| Comment by Cliff White (Inactive) [ 23/May/17 ] |
|
Which patch? |
| Comment by Andreas Dilger [ 08/Feb/19 ] |
|
I hit this while testing It looks like the problem is that when the sequence rolls over and osp_precreate_rollover_new_seq() sets opd_pre_last_created_fid == opd_pre_used_fid the opd_pre_reserved value may be non-zero and is not changed. This causes os_fprecreated to go negative in osp_statfs() (which is unsigned and becomes a large positive value) and trip the LASSERT(). |
| Comment by Andreas Dilger [ 27/Oct/21 ] |
|
This may be resolved by patch https://review.whamcloud.com/43248 " |