[LU-4829] LBUG: ASSERTION( !fid_is_idif(fid) ) - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.5.0
Affects Version/s: Lustre 2.4.3
Labels:
- mn4

Rank (Obsolete):
13287

Description

We have our TDS system set up in wide-stripe mode: each OSS mounts over 100 OSTs. When we mounted it the other day, we hit an assertion once OI scrub started.

[12319.230157] LustreError: 54554:0:(osd_internal.h:752:osd_fid2oi()) ASSERTION( !fid_is_idif(fid) ) failed: [0x100000000:0x1:0x0]
[12319.242502] LustreError: 54554:0:(osd_internal.h:752:osd_fid2oi()) LBUG
[12319.249538] Pid: 54554, comm: OI_scrub
[12319.253707] 
[12319.253707] Call Trace:
[12319.258529]  [<ffffffffa03dd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[12319.265837]  [<ffffffffa03dde97>] lbug_with_loc+0x47/0xb0 [libcfs]
[12319.272395]  [<ffffffffa0be40f5>] __osd_oi_lookup+0x3a5/0x3b0 [osd_ldiskfs]
[12319.279770]  [<ffffffff8119dfcd>] ? generic_drop_inode+0x1d/0x80
[12319.286133]  [<ffffffffa0be4174>] osd_oi_lookup+0x74/0x140 [osd_ldiskfs]
[12319.293197]  [<ffffffffa0bf8fbf>] osd_scrub_exec+0x1af/0xf30 [osd_ldiskfs]
[12319.300553]  [<ffffffffa0bfa5f2>] ? osd_scrub_next+0x142/0x4b0 [osd_ldiskfs]
[12319.308061]  [<ffffffffa0b71432>] ? ldiskfs_read_inode_bitmap+0x172/0x2c0 [ldiskfs]
[12319.316454]  [<ffffffffa0bf4d4f>] osd_inode_iteration+0x1cf/0x570 [osd_ldiskfs]
[12319.324461]  [<ffffffff810516b9>] ? __wake_up_common+0x59/0x90
[12319.330764]  [<ffffffffa0bf8e10>] ? osd_scrub_exec+0x0/0xf30 [osd_ldiskfs]
[12319.337941]  [<ffffffffa0bfa4b0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
[12319.345300]  [<ffffffffa0bf732a>] osd_scrub_main+0x59a/0xd00 [osd_ldiskfs]
[12319.352591]  [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
[12319.358585]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
[12319.365881]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
[12319.371174]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
[12319.378509]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
[12319.385799]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
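
The assertion that fired is in osd_fid2oi(), which maps a FID to one of the OSD's Object Index (OI) files. The sketch below is a simplified, user-space rendering of that guard, written from my reading of osd_internal.h rather than copied from it. The struct layout follows Lustre's struct lu_fid, but fid2oi_index() is a hypothetical helper that returns -1 where the real code LBUGs, the FID_SEQ_IDIF_MAX bound is quoted from memory, and the "mask the sequence by the OI count" selection is how I recall the index being chosen. As I understand it, IDIF FIDs are expected to be resolved through the OST object map rather than the OI files, which is why an IDIF FID reaching this path is treated as a bug.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for Lustre's struct lu_fid (sequence, object ID, version). */
struct lu_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* IDIF sequence range; the upper bound is quoted from memory (assumption). */
#define FID_SEQ_IDIF     0x100000000ULL
#define FID_SEQ_IDIF_MAX 0x1ffffffffULL

static int fid_is_idif(const struct lu_fid *fid)
{
    return fid->f_seq >= FID_SEQ_IDIF && fid->f_seq < FID_SEQ_IDIF_MAX;
}

/*
 * Hypothetical analogue of osd_fid2oi(): choose one of oi_count OI files
 * from the low bits of the FID sequence. The real function LASSERTs (and
 * LBUGs) on an IDIF FID; this version returns -1 instead so it stays runnable.
 */
static int fid2oi_index(const struct lu_fid *fid, unsigned int oi_count)
{
    /* Masking only works if oi_count is a non-zero power of two. */
    assert(oi_count != 0 && (oi_count & (oi_count - 1)) == 0);

    if (fid_is_idif(fid))
        return -1; /* IDIF FIDs are not expected to be looked up in the OI files */

    return (int)(fid->f_seq & (oi_count - 1));
}
```

With the FID from the log, fid2oi_index() takes the IDIF branch, while a normal-sequence FID such as [0x200000401:0x1:0x0] is simply hashed by the low bits of its sequence.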

We had panic_on_lbug disabled, so there is no crash dump. The system is still running, though, so if there is anything useful to collect, we can try to grab it. I tried to cat /proc/fs/lustre/osd-ldiskfs/atlastds-OST00f3/oi_scrub, but it just hangs. That 'cat' process is stuck here:

# cat /proc/83715/stack
[<ffffffff81281f34>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa0bf630d>] osd_scrub_dump+0x3d/0x320 [osd_ldiskfs]
[<ffffffffa0be6055>] lprocfs_osd_rd_oi_scrub+0x75/0xb0 [osd_ldiskfs]
[<ffffffffa054f563>] lprocfs_fops_read+0xf3/0x1f0 [obdclass]
[<ffffffff811e9fee>] proc_reg_read+0x7e/0xc0
[<ffffffff81181f05>] vfs_read+0xb5/0x1a0
[<ffffffff81182041>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
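
One plausible reading of this stack: osd_scrub_dump() is waiting for read access to a read-write semaphore (call_rwsem_down_read_failed), and, as I understand libcfs, an LBUG with panic_on_lbug off leaves the offending thread sleeping forever instead of unwinding, so any lock it held is never released. That the OI_scrub thread held this semaphore for write at the time is an assumption inferred from the two stacks, not something confirmed here. The user-space sketch below models that situation with a POSIX rwlock standing in for the kernel rw_semaphore; os_rwsem, scrub_thread() and reader_attempt() are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Stand-ins for the scrub's rw_semaphore and a start-up handshake. */
static pthread_rwlock_t os_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t ready;

/* Simulated OI_scrub thread: takes the lock for write, then "LBUGs",
 * i.e. blocks forever without ever releasing the lock. */
static void *scrub_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&os_rwsem);
    pthread_barrier_wait(&ready);
    for (;;)
        pause(); /* never returns normally; the write lock stays held */
    return NULL;
}

/* Simulated reader of the oi_scrub proc file. It uses tryrdlock so the demo
 * reports EBUSY instead of hanging the way the real 'cat' did.
 * Call at most once: the lock is deliberately never released. */
static int reader_attempt(void)
{
    pthread_t tid;
    int err, rc;

    err = pthread_barrier_init(&ready, NULL, 2);
    assert(err == 0);
    err = pthread_create(&tid, NULL, scrub_thread, NULL);
    assert(err == 0);
    pthread_barrier_wait(&ready); /* writer now holds os_rwsem */

    rc = pthread_rwlock_tryrdlock(&os_rwsem); /* writer never leaves: EBUSY */

    pthread_cancel(tid); /* pause() is a cancellation point */
    pthread_join(tid, NULL);
    pthread_barrier_destroy(&ready);
    return rc;
}
```

In the kernel the reader uses a blocking down_read() rather than a try-lock, so instead of getting EBUSY it sleeps indefinitely, which would match the hung cat.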

The FID it is complaining about, [0x100000000:0x1:0x0], looks suspect: the sequence is FID_SEQ_IDIF and the object ID is 1. I know that on ext4, inode 1 holds the bad-blocks list, but I don't think that's what we're seeing here.
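
For reference, an IDIF FID packs a legacy OST object ID into the FID itself. The sketch below shows that packing as I understand it from Lustre's IDIF helpers (fid_idif_seq()/fid_idif_id()): the OST index sits in bits 16..31 of the sequence, bits 32..47 of the object ID in the low 16 bits of the sequence, the low 32 bits in f_oid, and bits 48..63 in f_ver. The functions idif_pack(), idif_ost_idx() and idif_objid() are hypothetical re-implementations for illustration, not the Lustre functions. Decoding the reported FID this way gives OST index 0 and object ID 1.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for Lustre's struct lu_fid. */
struct lu_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

#define FID_SEQ_IDIF 0x100000000ULL

/* Pack a legacy (OST index, object ID) pair into an IDIF FID (assumed layout). */
static struct lu_fid idif_pack(uint32_t ost_idx, uint64_t objid)
{
    struct lu_fid fid;

    assert(ost_idx <= 0xffff); /* only 16 bits are available for the index */
    fid.f_seq = FID_SEQ_IDIF | ((uint64_t)ost_idx << 16) | ((objid >> 32) & 0xffff);
    fid.f_oid = (uint32_t)objid;         /* low 32 bits of the object ID */
    fid.f_ver = (uint32_t)(objid >> 48); /* top 16 bits of the object ID */
    return fid;
}

/* OST index: bits 16..31 of the sequence. */
static uint32_t idif_ost_idx(const struct lu_fid *fid)
{
    return (uint32_t)((fid->f_seq >> 16) & 0xffff);
}

/* Reassemble the object ID from f_ver, the low 16 bits of f_seq, and f_oid. */
static uint64_t idif_objid(const struct lu_fid *fid)
{
    return ((uint64_t)fid->f_ver << 48) | ((fid->f_seq & 0xffff) << 32) | fid->f_oid;
}
```

Packing and then decoding should round-trip, e.g. idif_pack(0xf3, 0x123456789) yields a sequence of 0x100f30001 with f_oid 0x23456789.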

We haven't yet tried to remount to see whether the issue persists, since there may be something on the running system that you want us to collect. We can do that if it's helpful.


Issue Links

is related to

LU-3335 LFSCK II: MDT-OST OST local consistency checking

Resolved


People

Assignee: nasf (Inactive)

Reporter: Matt Ezell

Votes: 0

Watchers: 7

Dates

Created: 28/Mar/14 2:38 PM

Updated: 10/Aug/14 12:32 PM

Resolved: 07/Aug/14 2:22 PM