[LU-7075] trigger scrub when running racer + migration Created: 01/Sep/15  Updated: 23/Mar/17  Resolved: 23/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Di Wang Assignee: Di Wang
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The console messages below also include some debug information that I added.

Lustre: lustre-MDT0000: trigger OI scrub by RPC for [0x200000405:0xcc9:0x0], rc = 0 [2]
LustreError: 7795:0:(osd_handler.c:460:osd_check_lma()) lma fid [0x200000405:0xed7:0x0] obj fid [0x200000405:0xcc9:0x0]
LustreError: 60792:0:(osd_handler.c:642:osd_fid_lookup()) trigger scrub with show -78
Lustre: lustre-MDT0000-o: trigger OI scrub by RPC for [0x200000405:0xcc9:0x0], rc = 0 [1] incoming 6
LustreError: 60792:0:(osd_handler.c:591:osd_fid_lookup()) LBUG
Pid: 60792, comm: mdt01_022

Call Trace:
 [<ffffffffa05cf875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa05cfe77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0f3c4a6>] osd_object_init+0x1416/0x1420 [osd_ldiskfs]
 [<ffffffffa0717e5e>] ? dt_object_init+0xe/0x10 [obdclass]
 [<ffffffffa0715848>] lu_object_alloc+0xd8/0x320 [obdclass]
 [<ffffffffa0716c31>] lu_object_find_try+0x151/0x260 [obdclass]
 [<ffffffffa0716df1>] lu_object_find_at+0xb1/0xe0 [obdclass]
 [<ffffffffa0b004a5>] ? lod_index_lookup+0x25/0x30 [lod]
 [<ffffffffa101fc0c>] ? __mdd_lookup+0x28c/0x450 [mdd]
 [<ffffffffa0716e36>] lu_object_find+0x16/0x20 [obdclass]
 [<ffffffffa10847f6>] mdt_object_find+0x56/0x170 [mdt]
 [<ffffffffa1098a77>] mdt_getattr_name_lock+0xf87/0x1910 [mdt]
 [<ffffffffa109d009>] ? old_init_ucred+0x1b9/0x390 [mdt]
 [<ffffffffa1099922>] mdt_intent_getattr+0x292/0x470 [mdt]
 [<ffffffffa108b224>] mdt_intent_policy+0x494/0xc40 [mdt]
 [<ffffffffa08f0267>] ldlm_lock_enqueue+0x127/0x8e0 [ptlrpc]
 [<ffffffffa091cfa7>] ldlm_handle_enqueue0+0x807/0x15b0 [ptlrpc]
 [<ffffffffa0994e41>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
 [<ffffffffa09a7b11>] tgt_enqueue+0x61/0x230 [ptlrpc]
 [<ffffffffa09a88ec>] tgt_request_handle+0xa4c/0x1290 [ptlrpc]
 [<ffffffffa09505b1>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
 [<ffffffffa094f770>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20



 Comments   
Comment by Di Wang [ 04/Sep/15 ]

Hmm, after I disable the OIC (OSD ID cache), this problem goes away, so it might be related to the OIC.

Comment by Andreas Dilger [ 15/Sep/15 ]

Di, is there another ticket that is fixing the OSD cache problem?

Comment by Di Wang [ 15/Sep/15 ]

Andreas: No, there is no other bug. I will discuss it with Fan Yong to determine whether this is an OIC bug or a migration problem.

Comment by Peter Jones [ 24/Sep/15 ]

Is this ticket a duplicate of LU-6895?

Comment by Di Wang [ 24/Sep/15 ]

No, it should not be a duplicate of LU-6895, because LFSCK is not running when I run racer. According to my debugging, it is caused by two OI entries that point to the same inode, i.e. two FIDs both map to the same inode. I suspect the OI cache is somehow involved (see my previous comment). I discussed it with Fan Yong, but we do not have a clue yet. By the way, this seems harder to reproduce in my recent runs: it fails maybe 2-3 times out of 10 (racer + migration).

Comment by Di Wang [ 09/Sep/16 ]

This has not happened for a very long time, so let's close it for now.

Generated at Sat Feb 10 02:05:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.