[LU-3022] osd_handler.c:2534:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed Created: 25/Mar/13  Updated: 05/Feb/15  Resolved: 05/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: yuc2

Issue Links:
Related
is related to LU-2311 clean up MDD orphan handling code style Resolved
Severity: 3
Rank (Obsolete): 7350

 Description   

There are multiple hits of this assertion in Jira, but all closed or have landed patches, so this might be something new.

This is in current master with http://review.whamcloud.com/#change,5802 applied.

[101057.524432] LustreError: 22066:0:(fld_handler.c:170:fld_server_lookup()) srv-lustre-MDT0000: Can not found the seq 0x1100000002000004
[101057.525015] LustreError: 22066:0:(osd_handler.c:2016:osd_fld_lookup()) lustre-MDT0000-osd can not find [0x1100000002000004:0x111:0xd3000000]: rc = -5
[101057.525579] LustreError: 22066:0:(osd_handler.c:3196:osd_remote_fid()) lustre-MDT0000-osd: Can not lookup fld for [0x1100000002000004:0x111:0xd3000000]
[101057.599333] LustreError: 9254:0:(lov_object.c:185:lov_init_sub()) header@ffff88003f204ec0[0x0, 2, [0x100000000:0x165b:0x0] hash]{ 
[101057.599926] LustreError: 9254:0:(lov_object.c:185:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101057.600385] LustreError: 9254:0:(lov_object.c:185:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 0 kms 0 rc: 0 force_sync: 0 min_xid: 0 size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
[101057.604547] LustreError: 9254:0:(lov_object.c:185:lov_init_sub()) } header@ffff88003f204ec0
[101057.604979] LustreError: 9254:0:(lov_object.c:185:lov_init_sub()) stripe 0 is already owned.
[101057.605415] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) header@ffff88001940dea8[0x0, 1, [0x200000401:0x10c76:0x0] hash]{ 
[101057.605917] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) ....vvp@ffff88001940df80(- 0 0) inode: ffff88004a3b9b08 144115205272570998/33554436 100644 1 1 ffff88001940df80 [0x200000401:0x10c76:0x0]
[101057.606651] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) ....lov@ffff8800a646eed0stripes: 1, valid, lsm{ffff88003511aa60 0x0BD10BD0 1 1 0}: 
[101057.607166] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) header@ffff88003f204ec0[0x0, 2, [0x100000000:0x165b:0x0] hash]{ 
[101057.607650] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101057.610941] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 0 kms 0 rc: 0 force_sync: 0 min_xid: 0 size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
[101057.611709] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) } header@ffff88003f204ec0
[101057.612160] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) 
[101057.612440] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) } header@ffff88001940dea8
[101057.612916] LustreError: 9254:0:(lov_object.c:186:lov_init_sub()) owned.
[101057.613171] LustreError: 9254:0:(lov_object.c:187:lov_init_sub()) header@ffff880062842ea8[0x0, 1, [0x200000400:0x111d3:0x0]]
[101057.620565] LustreError: 9254:0:(lov_object.c:187:lov_init_sub()) try to own.
[101057.624245] LustreError: 9254:0:(lcommon_cl.c:1201:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x111d3:0x0]: -5
[101057.624794] LustreError: 9254:0:(llite_lib.c:2176:ll_prep_inode()) new_inode -fatal: rc -5
[101058.178146] LustreError: 23316:0:(ldlm_resource.c:1161:ldlm_resource_get()) lvbo_init failed for resource 5722: rc -2
[101058.228406] LustreError: 9294:0:(lov_object.c:185:lov_init_sub()) header@ffff88003f204ec0[0x0, 4, [0x100000000:0x165b:0x0] hash]{ 
[101058.228962] LustreError: 9294:0:(lov_object.c:185:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101058.229465] LustreError: 9294:0:(lov_object.c:185:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 5 rc: 0 force_sync: 0 min_xid: 0 size: 5 mtime: 1364179625 atime: 0 ctime: 1364179625 blocks: 0
[101058.230330] LustreError: 9294:0:(lov_object.c:185:lov_init_sub()) } header@ffff88003f204ec0
[101058.230794] LustreError: 9294:0:(lov_object.c:185:lov_init_sub()) stripe 0 is already owned.
[101058.231272] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) header@ffff88001940dea8[0x0, 3, [0x200000401:0x10c76:0x0] hash]{ 
[101058.231808] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) ....vvp@ffff88001940df80(+ 0 0) inode: ffff88004a3b9b08 144115205272570998/33554436 100644 1 0 ffff88001940df80 [0x200000401:0x10c76:0x0]
[101058.232634] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) ....lov@ffff8800a646eed0stripes: 1, valid, lsm{ffff88003511aa60 0x0BD10BD0 1 1 0}: 
[101058.233199] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) header@ffff88003f204ec0[0x0, 4, [0x100000000:0x165b:0x0] hash]{ 
[101058.233729] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101058.234214] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 5 rc: 0 force_sync: 0 min_xid: 0 size: 5 mtime: 1364179625 atime: 0 ctime: 1364179625 blocks: 0
[101058.235057] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) } header@ffff88003f204ec0
[101058.235524] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) 
[101058.235795] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) } header@ffff88001940dea8
[101058.236277] LustreError: 9294:0:(lov_object.c:186:lov_init_sub()) owned.
[101058.236561] LustreError: 9294:0:(lov_object.c:187:lov_init_sub()) header@ffff88006d993ea8[0x0, 1, [0x200000400:0x111d3:0x0]]
[101058.237083] LustreError: 9294:0:(lov_object.c:187:lov_init_sub()) try to own.
[101058.248962] LustreError: 9294:0:(lcommon_cl.c:1201:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x111d3:0x0]: -5
[101058.249501] LustreError: 9294:0:(llite_lib.c:2176:ll_prep_inode()) new_inode -fatal: rc -5
[101058.300655] LustreError: 9259:0:(lov_object.c:185:lov_init_sub()) header@ffff88003f204ec0[0x0, 4, [0x100000000:0x165b:0x0] hash]{ 
[101058.301600] LustreError: 9259:0:(lov_object.c:185:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101058.302233] LustreError: 9259:0:(lov_object.c:185:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 5 rc: 0 force_sync: 0 min_xid: 0 size: 5 mtime: 1364179625 atime: 0 ctime: 1364179625 blocks: 0
[101058.303091] LustreError: 9259:0:(lov_object.c:185:lov_init_sub()) } header@ffff88003f204ec0
[101058.303543] LustreError: 9259:0:(lov_object.c:185:lov_init_sub()) stripe 0 is already owned.
[101058.303975] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) header@ffff88001940dea8[0x0, 3, [0x200000401:0x10c76:0x0] hash]{ 
[101058.304470] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) ....vvp@ffff88001940df80(+ 0 0) inode: ffff88004a3b9b08 144115205272570998/33554436 100644 1 0 ffff88001940df80 [0x200000401:0x10c76:0x0]
[101058.305222] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) ....lov@ffff8800a646eed0stripes: 1, valid, lsm{ffff88003511aa60 0x0BD10BD0 1 1 0}: 
[101058.305737] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) header@ffff88003f204ec0[0x0, 4, [0x100000000:0x165b:0x0] hash]{ 
[101058.312954] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) ....lovsub@ffff88003f204f98[0]
[101058.313461] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) ....osc@ffff8800a646deb0id: 5723 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 5 rc: 0 force_sync: 0 min_xid: 0 size: 5 mtime: 1364179625 atime: 0 ctime: 1364179625 blocks: 0
[101058.314344] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) } header@ffff88003f204ec0
[101058.314842] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) 
[101058.315132] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) } header@ffff88001940dea8
[101058.315741] LustreError: 9259:0:(lov_object.c:186:lov_init_sub()) owned.
[101058.316179] LustreError: 9259:0:(lov_object.c:187:lov_init_sub()) header@ffff880045dd6ea8[0x0, 1, [0x200000400:0x111d3:0x0]]
[101058.317021] LustreError: 9259:0:(lov_object.c:187:lov_init_sub()) try to own.
[101059.562642] LustreError: 4927:0:(ost_handler.c:1640:ost_blocking_ast()) Error -2 syncing data on lock cancel
[101062.023713] LustreError: 23478:0:(osd_handler.c:2534:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed: 
[101062.024156] LustreError: 23478:0:(osd_handler.c:2534:osd_object_ref_del()) LBUG
[101062.024499] Pid: 23478, comm: mdt01_009
[101062.024696] 
[101062.024697] Call Trace:
[101062.024994]  [<ffffffffa0b69915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[101062.025298]  [<ffffffffa0b69f17>] lbug_with_loc+0x47/0xb0 [libcfs]
[101062.025577]  [<ffffffffa0581089>] osd_object_ref_del+0x239/0x310 [osd_ldiskfs]
[101062.025936]  [<ffffffffa072646b>] lod_ref_del+0x3b/0xd0 [lod]
[101062.026116]  [<ffffffffa0464ac5>] mdo_ref_del+0x35/0xc0 [mdd]
[101062.026368]  [<ffffffffa046c246>] mdd_unlink+0x6d6/0xe30 [mdd]
[101062.026660]  [<ffffffffa066b3a8>] mdo_unlink+0x18/0x50 [mdt]
[101062.026945]  [<ffffffffa066e669>] mdt_reint_unlink+0x739/0xfd0 [mdt]
[101062.027248]  [<ffffffffa066b061>] mdt_reint_rec+0x41/0xe0 [mdt]
[101062.027535]  [<ffffffffa06646e3>] mdt_reint_internal+0x4e3/0x7d0 [mdt]
[101062.027844]  [<ffffffffa0664a14>] mdt_reint+0x44/0xe0 [mdt]
[101062.028177]  [<ffffffffa0652fa8>] mdt_handle_common+0x648/0x1660 [mdt]
[101062.028510]  [<ffffffffa068e855>] mds_regular_handle+0x15/0x20 [mdt]
[101062.028862]  [<ffffffffa1305fd3>] ptlrpc_server_handle_request+0x443/0xe40 [ptlrpc]
[101062.029355]  [<ffffffffa0b6a65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[101062.029547]  [<ffffffffa12fef91>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[101062.029730]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[101062.029918]  [<ffffffffa1308a9d>] ptlrpc_main+0xafd/0x17f0 [ptlrpc]
[101062.030118]  [<ffffffffa1307fa0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
[101062.030301]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[101062.030507]  [<ffffffffa1307fa0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
[101062.030725]  [<ffffffffa1307fa0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
[101062.030918]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[101062.031505] 
[101062.046287] Kernel panic - not syncing: LBUG

Crashdump in /exports/crashdumps/192.168.10.220-2013-03-24-22\:47\:11
branch in my tree is master-20130324



 Comments   
Comment by Andreas Dilger [ 05/Apr/13 ]

I think my patch in http://review.whamcloud.com/4675 may fix the i_nlink assertion problem. Hongchao, could you please update this patch under LU-2311 to fix whatever minor issue remains with the patch.

I don't think this relates to the FLD sequence lookup errors that are included in the log, so I'd prefer to repurpose this bug to understand/fix the FLD/SEQ issues reported by Oleg.

Comment by Hongchao Zhang [ 08/Apr/13 ]

the patch has been updated and tracked at http://review.whamcloud.com/#change,4675

Comment by Jian Yu [ 09/Aug/13 ]

Lustre Branch: b2_4
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/27/
Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)

The racer test hit the following failure on MDS:

18:40:59:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-32vm1,client-32vm2.lab.whamcloud.com DURATION=900 ========== 18:40:49 (1375494049)
18:40:59:Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
18:41:40:LustreError: 11543:0:(osd_handler.c:2581:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed: 
18:41:40:LustreError: 11543:0:(osd_handler.c:2581:osd_object_ref_del()) LBUG
18:41:40:Pid: 11543, comm: mdt00_012
18:41:40:
18:41:40:Call Trace:
18:41:40: [<ffffffffa06e5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
18:41:40: [<ffffffffa06e5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
18:41:40: [<ffffffffa0e7a7e7>] osd_object_ref_del+0x1e7/0x220 [osd_ldiskfs]
18:41:40: [<ffffffffa1038f9b>] lod_ref_del+0x3b/0xd0 [lod]
18:41:40: [<ffffffffa0da9095>] mdo_ref_del+0x35/0xc0 [mdd]
18:41:40: [<ffffffffa0db3858>] mdd_rename+0x2028/0x24f0 [mdd]
18:41:40: [<ffffffffa0dad2fe>] ? __mdd_lookup_locked+0x1ee/0x490 [mdd]
18:41:40: [<ffffffffa0f8bb62>] mdt_reint_rename+0x13d2/0x1b00 [mdt]
18:41:40: [<ffffffffa082ce60>] ? lu_ucred+0x20/0x30 [obdclass]
18:41:40: [<ffffffffa0f868f1>] mdt_reint_rec+0x41/0xe0 [mdt]
18:41:40: [<ffffffffa0f6bb03>] mdt_reint_internal+0x4c3/0x780 [mdt]
18:41:40: [<ffffffffa0f6be04>] mdt_reint+0x44/0xe0 [mdt]
18:41:40: [<ffffffffa0f70ab7>] mdt_handle_common+0x647/0x16d0 [mdt]
18:41:40: [<ffffffffa0faa1a5>] mds_regular_handle+0x15/0x20 [mdt]
18:41:40: [<ffffffffa09a8388>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
18:41:40: [<ffffffffa06e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
18:41:40: [<ffffffffa06f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
18:41:40: [<ffffffffa099f6e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
18:41:40: [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
18:41:40: [<ffffffffa09a971e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
18:41:40: [<ffffffffa09a8c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
18:41:40: [<ffffffff8100c0ca>] child_rip+0xa/0x20
18:41:40: [<ffffffffa09a8c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
18:41:40: [<ffffffffa09a8c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
18:41:40: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
18:41:40:
18:41:40:Kernel panic - not syncing: LBUG

Maloo report: https://maloo.whamcloud.com/test_sets/a5ffcb6a-fcc0-11e2-9fdb-52540035b04c

Comment by Hongchao Zhang [ 04/Sep/13 ]

in "ldiskfs_add_dot_dotdot", the inode->i_nlink is set as "2", which is not under the protection "osd_object->oo_guard", could it cause this problem?

Comment by John Fuchs-Chesney (Inactive) [ 05/Feb/15 ]

Patch has been landed to Master.
~ jfc.

Generated at Sat Feb 10 01:30:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.