Details
Bug
Resolution: Fixed
Critical
Lustre 2.12.3
Description
If a system has hit LU-12593 and has a corrupt block in the llog file, it may trigger an LASSERT() because of a bad FID found in the uninitialized part of the block. Applying the patch from that ticket after the corruption has already happened is too late to fix the problem.
LustreError: 17438:0:(osd_handler.c:1077:osd_fid_lookup()) ASSERTION( fid_is_sane(fid) || fid_is_idif(fid) ) failed: [0x0:0x68:0x0]
LustreError: 17438:0:(osd_handler.c:1077:osd_fid_lookup()) LBUG
Pid: 17438, comm: llog_process_th 3.10.0-1062.1.1.el7_lustre.x86_64
Call Trace:
 libcfs_call_trace+0x8c/0xc0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 osd_fid_lookup+0xc8/0x1c60 [osd_ldiskfs]
 osd_object_init+0x61/0x110 [osd_ldiskfs]
 lu_object_start.isra.35+0x8b/0x120 [obdclass]
 lu_object_find_at+0x1e1/0xa60 [obdclass]
 dt_locate_at+0x1d/0xb0 [obdclass]
 llog_osd_open+0x50e/0xf30 [obdclass]
 llog_open+0x15a/0x3e0 [obdclass]
 osp_sync_init+0x44a/0xe20 [osp]
 osp_init0.isra.19+0x1aed/0x1f60 [osp]
 osp_device_alloc+0x86/0x130 [osp]
 obd_setup+0x119/0x280 [obdclass]
 class_setup+0x2a8/0x840 [obdclass]
 class_process_config+0x1726/0x2830 [obdclass]
 class_config_llog_handler+0x819/0x1520 [obdclass]
 llog_process_thread+0x82f/0x18e0 [obdclass]
 llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
Lustre should never trigger an LASSERT() on data read from disk or from the network. In this case, it probably makes sense to add a fid_is_sane() check in llog_osd_open() before the FID is used, and to return an error rather than crash.
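
The idea can be illustrated with a minimal user-space sketch. This is not the actual Lustre patch: struct demo_fid, DEMO_FID_SEQ_START, demo_fid_is_sane() and demo_llog_open() are hypothetical stand-ins, the sequence threshold is an assumed value, and the sanity check is deliberately reduced to a single rule (the real fid_is_sane() accepts more FID classes). It only shows the pattern of validating an on-disk FID and returning an error instead of asserting.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a Lustre FID: [sequence:object ID:version]. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed lower bound for regular FID sequences; for illustration only. */
#define DEMO_FID_SEQ_START 0x200000000ULL

/*
 * Reduced sanity check: accept only FIDs whose sequence is at or above
 * the assumed start value and whose version is zero. The real
 * fid_is_sane() covers additional cases not modelled here.
 */
static int demo_fid_is_sane(const struct demo_fid *fid)
{
	return fid != NULL &&
	       fid->f_seq >= DEMO_FID_SEQ_START &&
	       fid->f_ver == 0;
}

/*
 * Hypothetical open path: the FID came from disk, so it is validated
 * first and a bad value is reported as an error to the caller rather
 * than being passed down to code that would assert on it.
 */
static int demo_llog_open(const struct demo_fid *fid)
{
	if (!demo_fid_is_sane(fid))
		return -EINVAL;

	/* The object lookup would happen here in a real implementation. */
	return 0;
}
```

With this model, the FID from the log above, [0x0:0x68:0x0], is rejected with -EINVAL because its sequence is zero, so the caller can fail the open cleanly instead of hitting an LBUG.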