[LU-6066] lfsck_namespace_repair_nlink() ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed
Created: 22/Dec/14  Updated: 10/Jan/15  Resolved: 10/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug
Priority: Minor
Reporter: Andreas Dilger
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: HB
Environment:

Single node test system (MDTx2, OSTx3, client), RHEL 2.6.32-431.29.2.el6 kernel, Lustre master v2_6_91_0-49-ge0ece89


Severity: 3
Rank (Obsolete): 16885

 Description   

I was testing how Lustre handles some filesystem corruption: I mounted the MDT as type ldiskfs, copied an MDT file and all of its xattrs from hosts to hosts.clone, then changed the LMA FID and LOV ostid of the copy from f_oid=0x1 to f_oid=0x2, so that the two files would share the same OST object but have different FIDs.

When I remounted the MDT as type lustre and listed the files, it detected OI corruption due to the missing FID and started an OI scrub:

Lustre: testfs-MDT0000: trigger OI scrub by RPC for [0x2c00059f0:0x2:0x0], rc = 0 [2]

which appeared to be successful since I could list all the files.
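
For readers unfamiliar with the bracketed value in the scrub message, here is a minimal sketch of splitting a FID string of the form [seq:oid:ver] into its three fields. The struct and function names (demo_fid, demo_fid_parse) are hypothetical and exist only for illustration; they are not the Lustre headers or API. The field names simply mirror the f_oid naming used in this ticket.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for a FID; NOT the real Lustre definition. */
struct demo_fid {
    uint64_t f_seq;  /* sequence number */
    uint32_t f_oid;  /* object ID within the sequence */
    uint32_t f_ver;  /* version */
};

/*
 * Parse a string such as "[0x2c00059f0:0x2:0x0]".
 * Returns 0 if all three fields were read, -1 otherwise.
 * The closing bracket is not validated in this sketch.
 */
static int demo_fid_parse(const char *s, struct demo_fid *fid)
{
    if (sscanf(s, "[%" SCNx64 ":%" SCNx32 ":%" SCNx32 "]",
               &fid->f_seq, &fid->f_oid, &fid->f_ver) != 3)
        return -1;
    return 0;
}
```

Applied to the FID in the message above, this yields f_oid = 0x2, which is the object ID that ended up duplicated between two files in this test.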

I deleted the hosts.clone file, and then observed (as expected) that ls returned an error because the referenced OST objects no longer existed. However, I was unable to unlink the original filename, even when using munlink, which should ignore any errors. This was apparently because I had (accidentally) given the cloned file the same FID f_oid=0x2 as a third file, hosts2. I figured that the duplicated MDT FID was causing problems, since this FID could no longer be found in the OI.

I tried running lctl lfsck_start -M testfs-MDT0000 -A to rebuild the OI to contain the original f_oid=0x2 inode (which still existed in the hosts2 LMA), but immediately hit the assertions below on two different LFSCK threads:

LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) LBUG
Pid: 20102, comm: lfsck_namespace
Call Trace:
 [<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
 [<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
 [<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
 [<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0

LustreError: 20097:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
Pid: 20097, comm: lfsck_namespace
Call Trace:
 [<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
 [<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
 [<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
 [<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
LustreError: dumping log to /tmp/lustre-log.1419280935.20097

We definitely shouldn't be LASSERTing on data from the filesystem.
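
To make the point concrete: the octal constants in the failed assertion are the standard file-type mask (00170000, S_IFMT) and the regular-file type (0100000, S_IFREG), so the assertion requires the object to be a regular file. The sketch below shows the general idea of turning such a check on on-disk data into an error return instead of an assertion. It is an illustration only, not the change made in the patch for this ticket, and every demo_* name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_S_IFMT  00170000  /* file-type bits of a mode word */
#define DEMO_S_IFREG 0100000   /* regular file */

/* Same test as the failed assertion, expressed as a predicate. */
static int demo_is_regular(uint32_t mode)
{
    return (mode & DEMO_S_IFMT) == DEMO_S_IFREG;
}

/*
 * Hypothetical repair entry point: the type comes from disk and may be
 * corrupt, so an unexpected value is reported to the caller as an error
 * rather than crashing the thread.
 */
static int demo_repair_nlink(uint32_t mode_from_disk)
{
    if (!demo_is_regular(mode_from_disk))
        return -EINVAL;  /* skip this object instead of asserting */

    /* ... the actual nlink repair would go here ... */
    return 0;
}
```

With this shape, a directory or symlink mode read from a damaged MDT produces -EINVAL for the caller to log and skip, while a regular-file mode proceeds to the repair step.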



 Comments   
Comment by Gerrit Updater [ 24/Dec/14 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13181
Subject: LU-6066 lfsck: handle file's nlink attribute properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 47013f40c4aa2c853de5675e2a632cf1e7613be0

Comment by Gerrit Updater [ 10/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13181/
Subject: LU-6066 lfsck: handle file's nlink attribute properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 85be1fae82b515094b60bb20eb48f88989ccc6e9

Comment by nasf (Inactive) [ 10/Jan/15 ]

The patch has been landed to master.
