Lustre / LU-4829

LBUG: ASSERTION( !fid_is_idif(fid) )


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.5.0
    • Lustre 2.4.3
    • 13287

    Description

      We have our TDS system set up in wide-stripe mode, with each OSS mounting over 100 OSTs. When mounting the other day, we hit an assertion as soon as OI scrub started.

      [12319.230157] LustreError: 54554:0:(osd_internal.h:752:osd_fid2oi()) ASSERTION( !fid_is_idif(fid) ) failed: [0x100000000:0x1:0x0]
      [12319.242502] LustreError: 54554:0:(osd_internal.h:752:osd_fid2oi()) LBUG
      [12319.249538] Pid: 54554, comm: OI_scrub
      [12319.253707] 
      [12319.253707] Call Trace:
      [12319.258529]  [<ffffffffa03dd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [12319.265837]  [<ffffffffa03dde97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [12319.272395]  [<ffffffffa0be40f5>] __osd_oi_lookup+0x3a5/0x3b0 [osd_ldiskfs]
      [12319.279770]  [<ffffffff8119dfcd>] ? generic_drop_inode+0x1d/0x80
      [12319.286133]  [<ffffffffa0be4174>] osd_oi_lookup+0x74/0x140 [osd_ldiskfs]
      [12319.293197]  [<ffffffffa0bf8fbf>] osd_scrub_exec+0x1af/0xf30 [osd_ldiskfs]
      [12319.300553]  [<ffffffffa0bfa5f2>] ? osd_scrub_next+0x142/0x4b0 [osd_ldiskfs]
      [12319.308061]  [<ffffffffa0b71432>] ? ldiskfs_read_inode_bitmap+0x172/0x2c0 [ldiskfs]
      [12319.316454]  [<ffffffffa0bf4d4f>] osd_inode_iteration+0x1cf/0x570 [osd_ldiskfs]
      [12319.324461]  [<ffffffff810516b9>] ? __wake_up_common+0x59/0x90
      [12319.330764]  [<ffffffffa0bf8e10>] ? osd_scrub_exec+0x0/0xf30 [osd_ldiskfs]
      [12319.337941]  [<ffffffffa0bfa4b0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
      [12319.345300]  [<ffffffffa0bf732a>] osd_scrub_main+0x59a/0xd00 [osd_ldiskfs]
      [12319.352591]  [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
      [12319.358585]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
      [12319.365881]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [12319.371174]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
      [12319.378509]  [<ffffffffa0bf6d90>] ? osd_scrub_main+0x0/0xd00 [osd_ldiskfs]
      [12319.385799]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      We had panic_on_lbug off, so we don't have a crash dump. However, the system is still running, so if there is anything useful, we can try to collect it. I tried to cat /proc/fs/lustre/osd-ldiskfs/atlastds-OST00f3/oi_scrub, but it just hangs. The 'cat' process is stuck with the following stack:

      # cat /proc/83715/stack
      [<ffffffff81281f34>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffffa0bf630d>] osd_scrub_dump+0x3d/0x320 [osd_ldiskfs]
      [<ffffffffa0be6055>] lprocfs_osd_rd_oi_scrub+0x75/0xb0 [osd_ldiskfs]
      [<ffffffffa054f563>] lprocfs_fops_read+0xf3/0x1f0 [obdclass]
      [<ffffffff811e9fee>] proc_reg_read+0x7e/0xc0
      [<ffffffff81181f05>] vfs_read+0xb5/0x1a0
      [<ffffffff81182041>] sys_read+0x51/0x90
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      The FID it is complaining about, [0x100000000:0x1:0x0], looks suspect: the sequence is FID_SEQ_IDIF and the object ID is 1. I know that on ext4, inode 1 holds the bad-blocks list, but I don't think that is what we are seeing here.
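The FID in the assertion can be decoded by hand. Below is a minimal C sketch, assuming the IDIF layout used by Lustre's FID headers: IDIF sequences occupy 0x100000000 through 0x1ffffffff, bits 16 to 31 of the sequence carry the OST index, and the low 16 bits of the sequence together with f_oid give the low 48 bits of the object ID. The struct mirrors the field layout of struct lu_fid; the sketch_* helpers are hypothetical stand-ins for the real fid_is_idif() family, not Lustre's API, and f_ver is deliberately ignored. The range check in sketch_fid_is_idif() is the condition that osd_fid2oi() asserts is false in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field layout as Lustre's struct lu_fid: [seq:oid:ver]. */
struct lu_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* IDIF sequence range (assumed from Lustre's FID headers). */
#define FID_SEQ_IDIF     0x100000000ULL
#define FID_SEQ_IDIF_MAX 0x1ffffffffULL

/* Hypothetical helper: true if the FID's sequence lies in the IDIF range. */
static bool sketch_fid_is_idif(const struct lu_fid *fid)
{
    return fid->f_seq >= FID_SEQ_IDIF && fid->f_seq <= FID_SEQ_IDIF_MAX;
}

/* Hypothetical helper: OST index stored in bits 16..31 of an IDIF sequence. */
static uint32_t sketch_idif_ost_idx(const struct lu_fid *fid)
{
    assert(sketch_fid_is_idif(fid));
    return (uint32_t)((fid->f_seq >> 16) & 0xffff);
}

/*
 * Hypothetical helper: low 48 bits of the object ID, built from the low
 * 16 bits of the sequence (bits 32..47) and f_oid (bits 0..31).
 * f_ver is not decoded in this sketch.
 */
static uint64_t sketch_idif_objid(const struct lu_fid *fid)
{
    assert(sketch_fid_is_idif(fid));
    return ((fid->f_seq & 0xffff) << 32) | fid->f_oid;
}
```

Under this layout, [0x100000000:0x1:0x0] decodes as an IDIF FID for OST index 0 with object ID 1, so an OI lookup that asserts !fid_is_idif(fid) will fire on it.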

      We haven't yet tried to remount to see whether the issue persists, since there may be something on the running system that you want us to provide. But we can do that if it would help.

            People

              Assignee: nasf (yong.fan, Inactive)
              Reporter: Matt Ezell (ezell)