[LU-10650] cslco1705 crash: dt_statfs()) ASSERTION( dev ) failed: LBUG, Pid: 3372, comm: lctl Created: 09/Feb/18  Updated: 12/Apr/18  Resolved: 27/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Critical
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-10820 Interop 2.10.3 <->2.11 sanity test_27... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[1534439.553868] LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,errors=panic,journal_checksum,no_mbcache,nodelalloc
[1534439.883377] LustreError: 3372:0:(dt_object.h:2509:dt_statfs()) ASSERTION( dev ) failed: 
[1534439.892393] LustreError: 3372:0:(dt_object.h:2509:dt_statfs()) LBUG
[1534439.899543] Pid: 3372, comm: lctl
[1534439.903791] 
Call Trace:
[1534439.909528]  [<ffffffffa095c7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
[1534439.917285]  [<ffffffffa095c87c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[1534439.924847]  [<ffffffffa0d3221a>] tgt_statfs_internal+0x2ea/0x350 [ptlrpc]
[1534439.932712]  [<ffffffffa133f306>] ofd_statfs+0x66/0x470 [ofd]
[1534439.939464]  [<ffffffffa0a64dd6>] lprocfs_filesfree_seq_show+0xf6/0x520 [obdclass]
[1534439.948022]  [<ffffffff811b80ea>] ? alloc_pages_vma+0x9a/0x150
[1534439.954855]  [<ffffffff811781ae>] ? lru_cache_add+0xe/0x10
[1534439.959871] Lustre: testfs-OST0000: Imperative Recovery enabled, recovery window shrunk from 900-2700 down to 450-2700
[1534439.973161]  [<ffffffff811a2121>] ? page_add_new_anon_rmap+0x91/0x130
[1534439.980700]  [<ffffffff811c2585>] ? __kmalloc+0x55/0x230
[1534439.987099]  [<ffffffff81202687>] ? seq_buf_alloc+0x17/0x40
[1534439.993785]  [<ffffffffa1353212>] ofd_filesfree_seq_show+0x12/0x20 [ofd]
[1534440.001575]  [<ffffffff81202b7a>] seq_read+0xfa/0x3a0
[1534440.007748]  [<ffffffff8124911d>] proc_reg_read+0x3d/0x80
[1534440.014285]  [<ffffffff811decfc>] vfs_read+0x9c/0x170
[1534440.020499]  [<ffffffff811df84f>] SyS_read+0x7f/0xe0
[1534440.026754]  [<ffffffff81645009>] system_call_fastpath+0x16/0x1b
[1534440.033891] 
[1534440.036630] Kernel panic - not syncing: LBUG

There is a race between lctl get_param obdfilter.*.filesfree and the mount operation.
The call path is ofd_statfs() -> tgt_statfs_internal() -> dt_statfs(env, lut->lut_bottom, osfs).
lut_bottom is NULL when dt_statfs() runs, which triggers the assertion, but in the vmcore it is already initialized.

ofd_init0() calls ofd_procfs_init() and only then calls tgt_init(), which initializes lut_bottom. So a race window exists between the point where the procfs entry becomes readable and the point where OFD initialization completes.
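
The window described above can be illustrated with a small user-space sketch. This is not the Lustre code and not the landed patch: the sketch_* types and functions are hypothetical stand-ins, and returning -ENODEV for a target that is not yet set up is an assumption made for illustration. It only shows the general idea of checking the backing device and returning an error instead of asserting; it does not model the locking or memory ordering a real fix would need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real Lustre structure definitions. */
struct sketch_statfs {
	unsigned long long files_free;
};

struct sketch_dt_device {
	unsigned long long files_free;
};

struct sketch_target {
	/* Plays the role of lut_bottom: NULL until setup has completed. */
	struct sketch_dt_device *bottom;
};

/*
 * Statfs path as a procfs reader would reach it.
 * Instead of asserting on a NULL device, report that the target
 * is not ready yet (error code chosen here as an assumption).
 */
static int sketch_target_statfs(const struct sketch_target *tgt,
				struct sketch_statfs *out)
{
	assert(tgt != NULL && out != NULL);

	if (tgt->bottom == NULL)
		return -ENODEV;	/* read arrived inside the race window */

	out->files_free = tgt->bottom->files_free;
	return 0;
}

/* Plays the role of the later init step that sets the bottom device. */
static void sketch_target_finish_init(struct sketch_target *tgt,
				      struct sketch_dt_device *dev)
{
	tgt->bottom = dev;
}
```

With this shape, a read that arrives before sketch_target_finish_init() fails with an error the caller can report, and the same read succeeds once initialization has finished. The other way to close the window is to reorder initialization so the procfs entry is registered only after the device pointer is set.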



 Comments   
Comment by Gerrit Updater [ 09/Feb/18 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/31243
Subject: LU-10650 obd: add check to obd_statfs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e6cc1fb0c0d671b951a5f7410aa9006db4bbf298

Comment by Gerrit Updater [ 27/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31243/
Subject: LU-10650 obd: add check to obd_statfs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4f40429775c49468d9ec1fec34d5e7500ac01116

Comment by Peter Jones [ 27/Feb/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 23/Mar/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31752
Subject: LU-10650 obd: add check to obd_statfs
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 6a86f7f6f11940c5624cbaaa3e152a049089a6c4

Comment by Gerrit Updater [ 12/Apr/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31752/
Subject: LU-10650 obd: add check to obd_statfs
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 5e47249b04ebec318b3589f43830c665fe0d8448
