Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
Single node test system (MDTx2, OSTx3, client), RHEL 2.6.32-431.29.2.el6 kernel, Lustre master v2_6_91_0-49-ge0ece89
3
16885
Description
I was testing some filesystem corruption: I mounted the MDT as type ldiskfs, copied an MDT file and all its xattrs from hosts to hosts.clone, then modified the LMA FID and LOV ostid (f_oid=0x1 to f_oid=0x2) so that the two files would share the same OST object but have different FIDs.
When I remounted the MDT as type lustre and listed the files, the MDT detected OI corruption due to the missing FID and started OI scrub:
Lustre: testfs-MDT0000: trigger OI scrub by RPC for [0x2c00059f0:0x2:0x0], rc = 0 [2]
which appeared to be successful since I could list all the files.
I deleted the hosts.clone file, and then observed (as expected) that ls returned an error because the referenced OST objects no longer existed. However, I was unable to unlink the original file, even when using munlink, which should ignore any errors. This was apparently because I had (accidentally) made the cloned file share the same FID f_oid=0x2 as a third file, hosts2. I figured that the duplicated MDT FID was causing problems, since this FID could no longer be found in the OI.
I tried running lctl lfsck_start -M testfs-MDT0000 -A to rebuild the OI to contain the original f_oid=0x2 inode (which still existed in the hosts2 LMA), but immediately hit the assertions below on two different LFSCK threads:
LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) LBUG
Pid: 20102, comm: lfsck_namespace
Call Trace:
[<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
[<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
[<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
[<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
[<ffffffff8109abf6>] kthread+0x96/0xa0
LustreError: 20097:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
Pid: 20097, comm: lfsck_namespace
Call Trace:
[<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
[<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
[<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
[<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
[<ffffffff8109abf6>] kthread+0x96/0xa0
LustreError: dumping log to /tmp/lustre-log.1419280935.20097
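For anyone decoding the assertion text: on Linux the octal constants 00170000 and 0100000 are the values of S_IFMT and S_IFREG, so the failed condition appears to be the preprocessor-expanded form of S_ISREG(lfsck_object_type(obj)). In other words, the code asserted that the object is a regular file. The standalone sketch below is not Lustre code, and the helper names are made up for illustration. It just checks that the expanded expression and the standard macro agree on a few sample modes:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* The exact expression from the failed assertion, applied to a raw mode. */
int is_regular_expanded(unsigned int mode)
{
	return (mode & 00170000) == 0100000;
}

/* The same test written with the standard macro. */
int is_regular_macro(unsigned int mode)
{
	return S_ISREG(mode) ? 1 : 0;
}

/* Sanity check: both forms agree on a few sample modes. */
void check_forms_agree(void)
{
	const unsigned int samples[] = {
		0100644,	/* regular file */
		0040755,	/* directory */
		0120777,	/* symlink */
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(is_regular_expanded(samples[i]) ==
		       is_regular_macro(samples[i]));
}
```

So the LBUG fires whenever lfsck_namespace_repair_nlink() is handed an object whose type is anything other than a regular file.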
We definitely shouldn't be LASSERTing on data from the filesystem.
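As a rough illustration of the alternative (this is not the actual patch), the check could return an error to the caller when the on-disk type is unexpected instead of asserting. The function names and the choice of -EINVAL below are hypothetical, and the sketch uses a plain mode value rather than any real LFSCK structure:

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not a Lustre function): validate an object type
 * that was read from disk before attempting nlink repair on it.
 * Returns 0 for a regular file, -EINVAL otherwise (the error code is an
 * arbitrary choice for this sketch).
 */
int nlink_repair_precheck(unsigned int mode)
{
	if (!S_ISREG(mode))
		return -EINVAL;	/* report bad on-disk state, do not crash */
	return 0;
}

/* Hypothetical caller: skip the object and pass the error up. */
int repair_nlink_sketch(unsigned int mode)
{
	int rc = nlink_repair_precheck(mode);

	if (rc != 0)
		return rc;
	/* ... the actual repair would go here ... */
	return 0;
}

/* Usage: a directory mode is rejected, a regular file mode is accepted. */
void precheck_demo(void)
{
	assert(repair_nlink_sketch(0040755) == -EINVAL);
	assert(repair_nlink_sketch(0100644) == 0);
}
```

With this shape, a corrupted or unexpected object type would only make LFSCK skip or fail that one object, and the thread would keep running.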