[LU-9547] LBUG osp_dev.c:755:osp_statfs()) ASSERTION( sfs->os_fprecreated <= OST_MAX_PRECREATE * 2 ) failed Created: 23/May/17  Updated: 07/Jul/23  Resolved: 07/Jul/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Cliff White (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: soak
Environment:

Soak performance test cluster. Test patch for LU-9504. Lustre-reviews build 47608 - version=2.9.57_70_gfcd7bdd


Issue Links:
Related
is related to LU-14598 Too many FIDs to precreate OST replac... Resolved
is related to LU-11758 remove assertion from osp_statfs() Resolved
is related to LU-11912 reduce number of OST objects created ... Resolved
is related to LU-9504 LBUG ptlrpc_handle_rs()) ASSERTION( l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

OSS failover completed 2017-05-23 02:45:41
While OST were remounting/recovering, MDT3 hit an LBUG:

10) was lost; in progress operations using this service will wait for recovery to complete
May 23 02:45:06 soak-10 kernel: Lustre: Skipped 3 previous similar messages
May 23 02:45:11 soak-10 chronyd[1310]: Source 204.11.201.12 replaced with 198.58.110.84
May 23 02:45:19 soak-10 kernel: LustreError: 4833:0:(osp_dev.c:755:osp_statfs()) ASSERTION( sfs->os_fprecreated <= OST_MAX_PRECREATE * 2 ) failed: last_created [0x540000401:0x2e43841:0x0], next_fid [0x540000401:0x2e43841:0x0], reserved 19
May 23 02:45:19 soak-10 kernel: LustreError: 4833:0:(osp_dev.c:755:osp_statfs()) LBUG
May 23 02:45:19 soak-10 kernel: Pid: 4833, comm: mdt00_016
May 23 02:45:19 soak-10 kernel: #012Call Trace:
May 23 02:45:19 soak-10 kernel: [<ffffffffa0bbe7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0bbe87c>] lbug_with_loc+0x4c/0xb0 [libcfs]
May 23 02:45:19 soak-10 kernel: [<ffffffffa14e7764>] osp_statfs+0x404/0x480 [osp]
May 23 02:45:19 soak-10 kernel: [<ffffffffa1443251>] lod_statfs_and_check+0xa1/0x580 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa1445154>] lod_alloc_rr.constprop.18+0x494/0x1000 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa1448979>] lod_qos_prep_create+0x12b9/0x17f0 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa1449428>] lod_prepare_create+0x298/0x3f0 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa143df8e>] lod_declare_striped_create+0x1ee/0x970 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa14404bc>] lod_declare_create+0x1fc/0x570 [lod]
May 23 02:45:19 soak-10 kernel: [<ffffffffa14ab97f>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
May 23 02:45:19 soak-10 kernel: [<ffffffffa149d2f3>] mdd_declare_create+0x53/0xe20 [mdd]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0bc6574>] ? libcfs_log_return+0x24/0x30 [libcfs]
May 23 02:45:19 soak-10 kernel: [<ffffffffa14a10d9>] mdd_create+0x7c9/0x1330 [mdd]
May 23 02:45:19 soak-10 kernel: [<ffffffffa1377302>] mdt_reint_open+0x2192/0x31a0 [mdt]
May 23 02:45:19 soak-10 kernel: [<ffffffffa136c500>] mdt_reint_rec+0x80/0x210 [mdt]
May 23 02:45:19 soak-10 kernel: [<ffffffffa134e46b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
May 23 02:45:19 soak-10 kernel: [<ffffffffa134e992>] mdt_intent_reint+0x162/0x430 [mdt]
May 23 02:45:19 soak-10 kernel: [<ffffffffa135970e>] mdt_intent_policy+0x43e/0xc70 [mdt]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0ee0167>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f09ab3>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0bc9d87>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f8ef62>] tgt_enqueue+0x62/0x210 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f92225>] tgt_request_handle+0x915/0x1360 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3b3e3>] ptlrpc_server_handle_request+0x233/0xa90 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3f3f0>] ptlrpc_main+0xaa0/0x1dd0 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffffa0f3e950>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc]
May 23 02:45:19 soak-10 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
May 23 02:45:19 soak-10 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
May 23 02:45:19 soak-10 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
May 23 02:45:19 soak-10 kernel: [<ffffffff810b0980>] ? kthread+0x0/0xe0
May 23 02:45:19 soak-10 kernel:
May 23 02:45:19 soak-10 kernel: LustreError: dumping log to /tmp/lustre-log.1495507519.4833
May 23 02:45:31 soak-10 kernel: LustreError: 11-0: soaked-OST0015-osc-MDT0002: operation obd_ping to node 192.168.1.104@o2ib10 failed: rc = -107
May 23 02:45:31 soak-10 kernel: Lustre: soaked-OST0015-osc-MDT0002: Connection to soaked-OST0015 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
May 23 02:45:31 soak-10 kernel: Lustre: Skipped 1 previous similar message
May 23 02:45:43 soak-10 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [mdt00_006:4716]

System is crash dumping atm. Will attach.



 Comments   
Comment by Alex Zhuravlev [ 23/May/17 ]

this is fixed in the updated patch yet.

Comment by Cliff White (Inactive) [ 23/May/17 ]

Which patch?

Comment by Andreas Dilger [ 08/Feb/19 ]

I hit this while testing LU-11912 to reduce LUSTRE_DATA_SEQ_MAX_WIDTH = 0x1000 for testing the SEQ overflow handling.

It looks like the problem is that when the sequence rolls over and osp_precreate_rollover_new_seq() sets opd_pre_last_created_fid == opd_pre_used_fid the opd_pre_reserved value may be non-zero and is not changed. This causes os_fprecreated to go negative in osp_statfs() (which is unsigned and becomes a large positive value) and trip the LASSERT().

Comment by Andreas Dilger [ 27/Oct/21 ]

This may be resolved by patch https://review.whamcloud.com/43248 "LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write"

Generated at Sat Feb 10 02:27:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.