[LU-4930] osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed Created: 19/Apr/14 Updated: 04/Jun/14 Resolved: 08/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | racer |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 13630 |
| Description |
|
Running racer with MDSCOUNT=4, migration disabled, and 2.5.57-90-gac5cd8f + http://review.whamcloud.com/#/c/9511/, I've seen this a few times. I don't think it's introduced by 9511.

PID: 23901 TASK: ffff8801438ea680 CPU: 2 COMMAND: "mdt_rdpg01_003"
 #0 [ffff88014d2838f8] machine_kexec at ffffffff81039950
 #1 [ffff88014d283958] crash_kexec at ffffffff810d4372
 #2 [ffff88014d283a28] panic at ffffffff81550d83
 #3 [ffff88014d283aa8] lbug_with_loc at ffffffffa0ec9f1b [libcfs]
 #4 [ffff88014d283ac8] osd_object_destroy at ffffffffa05fca79 [osd_ldiskfs]
 #5 [ffff88014d283b28] lod_object_destroy at ffffffffa0839166 [lod]
 #6 [ffff88014d283b88] mdd_close at ffffffffa06e4ec8 [mdd]
 #7 [ffff88014d283bf8] mdt_mfd_close at ffffffffa0777af9 [mdt]
 #8 [ffff88014d283cb8] mdt_close at ffffffffa077a794 [mdt]
 #9 [ffff88014d283d18] tgt_request_handle at ffffffffa12ace85 [ptlrpc]
#10 [ffff88014d283d78] ptlrpc_main at ffffffffa125dc31 [ptlrpc]
#11 [ffff88014d283eb8] kthread at ffffffff8109eab6
#12 [ffff88014d283f48] kernel_thread at ffffffff8100c30a |
| Comments |
| Comment by Di Wang [ 19/Apr/14 ] |
|
John, could you please tell me which ASSERT the LBUG refers to? Thanks. |
| Comment by John Hammond [ 19/Apr/14 ] |
[46132.503041] LustreError: 23901:0:(osd_handler.c:2477:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
[46132.505577] LustreError: 23901:0:(osd_handler.c:2477:osd_object_destroy()) LBUG
[46132.506782] Pid: 23901, comm: mdt_rdpg01_003

There wasn't anything else in the logs from this thread. |
| Comment by Oleg Drokin [ 21/Apr/14 ] |
|
So did you ever see this failure in racer without the 9511 patch? It's quite big and involved, so I suspect it cannot be excluded. |
| Comment by John Hammond [ 24/Apr/14 ] |
|
I only see it with 9511. |
| Comment by John Hammond [ 24/Apr/14 ] |
|
Here you go:

t:~# export MDSCOUNT=4
t:~# export MOUNT_2=y
t:~# llmount.sh
...
t:~# lfs mkdir -c4 /mnt/lustre/d0
t:~# exec 3</mnt/lustre/d0
t:~# cd /mnt/lustre/d0
t:d0# rmdir /mnt/lustre2/d0
t:d0# mkdir d1 d2 d3 d4
t:d0# cd ..
t:lustre# exec 3<&-

Message from syslogd@t at Apr 24 14:23:21 ...
 kernel:[ 86.589533] LustreError: 3971:0:(osd_handler.c:2477:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:

Message from syslogd@t at Apr 24 14:23:21 ...
 kernel:[ 86.594221] LustreError: 3971:0:(osd_handler.c:2477:osd_object_destroy()) LBUG |
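For readers unfamiliar with the check, the asserted condition can be modeled in user space. This is only a sketch, not the osd_ldiskfs code: struct model_inode and every model_* name are hypothetical, and it assumes that osd_inode_unlinked() is true when the link count is zero, which this ticket does not confirm. It also assumes the conventional directory link count of 2 plus one per subdirectory. Under those assumptions a directory that still holds four subdirectories would fail the predicate, but the ticket does not establish that this is what happened in the reproducer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct inode; only the
 * link count matters for this check. */
struct model_inode {
        unsigned int i_nlink;
};

/* Assumption: an inode counts as unlinked when no links remain. */
static bool model_inode_unlinked(const struct model_inode *inode)
{
        return inode->i_nlink == 0;
}

/* Mirrors the expression quoted in the LBUG message. */
static bool model_destroy_precondition(const struct model_inode *inode)
{
        return model_inode_unlinked(inode) ||
               inode->i_nlink == 1 || inode->i_nlink == 2;
}

/* Assumption: conventional directory link count, i.e. the entry in
 * the parent, the directory's own ".", and one ".." per subdirectory. */
static unsigned int model_dir_nlink(unsigned int subdirs)
{
        return 2 + subdirs;
}

/* Mimics the assertion: aborts when the precondition does not hold. */
static void model_object_destroy(const struct model_inode *inode)
{
        assert(model_destroy_precondition(inode));
}
```

Calling model_object_destroy() on an inode built with model_dir_nlink(0) passes the check, while one built with model_dir_nlink(4) aborts, which is the user-space analogue of the LBUG above.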
| Comment by Di Wang [ 25/Apr/14 ] |
| Comment by John Hammond [ 30/Apr/14 ] |
|
I hit the LBUG from In conclusion, there is a DNE2 bug in some error handling path that is exposed by some weird proc bug. My hunch is that it's some funny business around the identity_upcall. |
| Comment by Di Wang [ 30/Apr/14 ] |
| Comment by Di Wang [ 08/May/14 ] |
|
Both 9511 and 10170 have landed on master, so this problem should be fixed. |