[LU-4930] osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed Created: 19/Apr/14  Updated: 04/Jun/14  Resolved: 08/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: racer

Issue Links:
Duplicate
Related
is related to LU-4981 need to remount after sanity 133g Resolved
is related to LU-3531 DNE2: striped directory Resolved
Severity: 3
Rank (Obsolete): 13630

 Description   

I've seen this a few times while running racer with MDSCOUNT=4, migration disabled, and 2.5.57-90-gac5cd8f + http://review.whamcloud.com/#/c/9511/. I don't think it was introduced by 9511.

PID: 23901  TASK: ffff8801438ea680  CPU: 2   COMMAND: "mdt_rdpg01_003"
 #0 [ffff88014d2838f8] machine_kexec at ffffffff81039950
 #1 [ffff88014d283958] crash_kexec at ffffffff810d4372
 #2 [ffff88014d283a28] panic at ffffffff81550d83
 #3 [ffff88014d283aa8] lbug_with_loc at ffffffffa0ec9f1b [libcfs]
 #4 [ffff88014d283ac8] osd_object_destroy at ffffffffa05fca79 [osd_ldiskfs]
 #5 [ffff88014d283b28] lod_object_destroy at ffffffffa0839166 [lod]
 #6 [ffff88014d283b88] mdd_close at ffffffffa06e4ec8 [mdd]
 #7 [ffff88014d283bf8] mdt_mfd_close at ffffffffa0777af9 [mdt]
 #8 [ffff88014d283cb8] mdt_close at ffffffffa077a794 [mdt]
 #9 [ffff88014d283d18] tgt_request_handle at ffffffffa12ace85 [ptlrpc]
#10 [ffff88014d283d78] ptlrpc_main at ffffffffa125dc31 [ptlrpc]
#11 [ffff88014d283eb8] kthread at ffffffff8109eab6
#12 [ffff88014d283f48] kernel_thread at ffffffff8100c30a


 Comments   
Comment by Di Wang [ 19/Apr/14 ]

John, could you please tell me which assertion triggered this LBUG? Thanks.

Comment by John Hammond [ 19/Apr/14 ]
[46132.503041] LustreError: 23901:0:(osd_handler.c:2477:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed: 
[46132.505577] LustreError: 23901:0:(osd_handler.c:2477:osd_object_destroy()) LBUG
[46132.506782] Pid: 23901, comm: mdt_rdpg01_003

There wasn't anything else in the logs from this thread.
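For reference, the failed assertion lets osd_object_destroy() proceed only when the inode is already unlinked or its link count is exactly 1 or 2. Any other link count triggers the LBUG, which in this setup panics the node (see lbug_with_loc and panic in the stack trace). The sketch below is a simplified, hypothetical model of that check. struct model_inode, model_inode_unlinked() and model_destroy_allowed() are not Lustre code, and treating "unlinked" as a zero link count is an assumption. Only the boolean expression is taken from the log line.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check in osd_object_destroy()
 * (osd_handler.c:2477). Only the boolean expression comes from the log;
 * the struct and helper names are illustrative.
 */
struct model_inode {
    unsigned int i_nlink;   /* link count, as in the kernel's struct inode */
};

/* Assumption: an "unlinked" inode is modelled as one with no links left. */
static bool model_inode_unlinked(const struct model_inode *inode)
{
    return inode->i_nlink == 0;
}

/*
 * Mirrors ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 ||
 * inode->i_nlink == 2 ): destroy is allowed only for these link counts;
 * anything else would hit LBUG in the real code.
 */
static bool model_destroy_allowed(const struct model_inode *inode)
{
    return model_inode_unlinked(inode) || inode->i_nlink == 1 ||
           inode->i_nlink == 2;
}
```

In this model, a link count of 3 or more is the case that would have fired the assertion.
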

Comment by Oleg Drokin [ 21/Apr/14 ]

So, have you ever seen this failure in racer without the 9511 patch? That patch is quite big and involved, so I suspect it cannot be ruled out as the cause.

Comment by John Hammond [ 24/Apr/14 ]

I only see it with 9511.

Comment by John Hammond [ 24/Apr/14 ]

Here you go, a reproducer:

t:~# export MDSCOUNT=4
t:~# export MOUNT_2=y
t:~# llmount.sh
...
t:~# lfs mkdir -c4 /mnt/lustre/d0
t:~# exec 3</mnt/lustre/d0
t:~# cd /mnt/lustre/d0
t:d0# rmdir /mnt/lustre2/d0
t:d0# mkdir d1 d2 d3 d4
t:d0# cd ..
t:lustre# exec 3<&-

Message from syslogd@t at Apr 24 14:23:21 ...
 kernel:[   86.589533] LustreError: 3971:0:(osd_handler.c:2477:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:

Message from syslogd@t at Apr 24 14:23:21 ...
 kernel:[   86.594221] LustreError: 3971:0:(osd_handler.c:2477:osd_object_destroy()) LBUG

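The reproducer keeps a descriptor open on the striped directory, removes that directory through the second mount, creates subdirectories inside it, and then closes the descriptor. The LBUG is reported right after that final close, which matches the mdt_close()/mdd_close() path in the original stack trace. The same syscall sequence can be sketched in C. This is a hypothetical port: replay_reproducer() and its parameters are not part of Lustre or its test suite. It assumes the striped directory was already created with lfs mkdir -c4 and that both client mounts are in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical port of the shell reproducer. dir_path names the directory
 * through the first mount (e.g. /mnt/lustre/d0), alias_path names the same
 * directory through the second mount (e.g. /mnt/lustre2/d0).
 * Returns the number of failed mkdir() calls, or -1 on a setup error.
 */
static int replay_reproducer(const char *dir_path, const char *alias_path)
{
    static const char *const names[] = { "d1", "d2", "d3", "d4" };
    int rc = -1;
    int failures = 0;
    int fd = -1;

    /* Remember the starting directory so we can return to it ("cd .."). */
    int home = open(".", O_RDONLY | O_DIRECTORY);
    if (home < 0)
        return -1;

    /* exec 3</mnt/lustre/d0: hold a descriptor open on the directory. */
    fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        goto out;

    /* cd /mnt/lustre/d0 */
    if (chdir(dir_path) != 0)
        goto out;

    /* rmdir /mnt/lustre2/d0: remove it through the other mount. */
    if (rmdir(alias_path) != 0)
        goto out;

    /* mkdir d1 d2 d3 d4 inside the removed directory; report any failures. */
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (mkdir(names[i], 0755) != 0) {
            fprintf(stderr, "mkdir %s: %s\n", names[i], strerror(errno));
            failures++;
        }
    }
    rc = failures;

out:
    if (fchdir(home) != 0)
        rc = -1;
    close(home);
    /* exec 3<&-: in the report, this final close is where the MDS hit LBUG. */
    if (fd >= 0)
        close(fd);
    return rc;
}
```

On a plain local Linux directory (passing the same path twice), the sketch only exercises the syscall sequence: the mkdir() calls fail with ENOENT because the directory has already been removed. It says nothing about how the Lustre MDS handles the same requests.
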
Comment by Di Wang [ 25/Apr/14 ]

http://review.whamcloud.com/#/c/10110/

Comment by John Hammond [ 30/Apr/14 ]

I hit the LBUG from LU-4930 every time I run sanity on a single node with MDSCOUNT=4 (at the end, usually in test 300c). If I add 133g to SANITY_EXCEPT, it goes away.

In short, there is a DNE2 bug in some error-handling path that is exposed by a strange proc bug. My hunch is that it involves something odd around the identity_upcall.

Comment by Di Wang [ 30/Apr/14 ]

http://review.whamcloud.com/10170

Comment by Di Wang [ 08/May/14 ]

Both 9511 and 10170 have landed on master, so this problem should be fixed.

Generated at Sat Feb 10 01:47:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.