
Hit LBUG ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed after upgrade OST from 2.5.0 to 2.6

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.2
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
    • before upgrade:
      servers and clients are running 2.5.0

      after upgrade:
      OSS upgraded to lustre-master build #1876
      MDS and clients are still running 2.5.0
    • 3
    • 12727

    Description

      Hit the following LBUG while running rolling upgrade testing. The test steps were:
      1. Set up the system with 2.5.0.
      2. Keep the system running and upgrade the OSS to lustre-master build #1876 (tag-2.5.55).
      3. When the OST is mounted, the MDS reboots.

      Lustre: lustre-MDT0000: Recovery over after 0:01, of 2 clients 2 recovered and 0 were evicted.
      LustreError: 2152:0:(lustre_fid.h:725:lu_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed: fid1:[0x100000000:0x21:0x0], fid2:[0x100000001:0x0:0x0]
      LustreError: 2152:0:(lustre_fid.h:725:lu_fid_diff()) LBUG
      Pid: 2152, comm: osp-pre-0
      
      Call Trace:
       [<ffffffffa01ff895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa01ffe97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0af896d>] osp_precreate_cleanup_orphans+0x10fd/0x1130 [osp]
       [<ffffffffa0498161>] ? import_at_get_index+0xb1/0xf0 [ptlrpc]
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0afac6f>] osp_precreate_thread+0x20f/0x1b00 [osp]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0afaa60>] ? osp_precreate_thread+0x0/0x1b00 [osp]
       [<ffffffff81096a36>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810969a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      
      Kernel panic - not syncing: LBUG
      Pid: 2152, comm: osp-pre-0 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
      Call Trace:
       [<ffffffff8150de58>] ? panic+0xa7/0x16f
       [<ffffffffa01ffeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0af896d>] ? osp_precreate_cleanup_orphans+0x10fd/0x1130 [osp]
       [<ffffffffa0498161>] ? import_at_get_index+0xb1/0xf0 [ptlrpc]
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0afac6f>] ? osp_precreate_thread+0x20f/0x1b00 [osp]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0afaa60>] ? osp_precreate_thread+0x0/0x1b00 [osp]
       [<ffffffff81096a36>] ? kthread+0x96/0xa0
       [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
       [<ffffffff810969a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Initializing cgroup subsys cpuset
      Initializing cgroup subsys cpu
      Linux version 2.6.32-358.18.1.el6_lustre.x86_64 (jenkins@builder-1-sde1-el6-x8664.lab.whamcloud.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Oct 11 16:41:53 PDT 2013
      Command line: ro root=UUID=dec021fd-a287-4254-8c2b-0a004dfdde46 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off  memmap=exactmap memmap=538K@64K memmap=132562K@49690K elfcorehdr=182252K memmap=64K$0K memmap=38K$602K memmap=104K$920K memmap=8K$3668600K memmap=72K#3668608K memmap=184K#3668680K memmap=263296K$3668864K memmap=2048K$4192256K
      KERNEL supported cpus:
        Intel GenuineIntel
        AMD AuthenticAMD
        Centaur CentaurHauls
      BIOS-provided physical RAM map:
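
A minimal sketch of what the assertion in the log above is checking, assuming only the `[seq:oid:ver]` FID print format visible in the message. The helper name `parse_fids` is hypothetical and is not part of Lustre; the snippet just extracts the two FIDs from the logged line and compares their sequence fields.

```python
import re

# Matches a FID printed as [seq:oid:ver], each field in hex.
FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")


def parse_fids(line):
    """Return every FID found in a log line as a (seq, oid, ver) tuple of ints.

    Hypothetical helper for illustration only; not a Lustre API.
    """
    return [tuple(int(field, 16) for field in match.groups())
            for match in FID_RE.finditer(line)]


# The assertion line copied from the console log in the description.
LOG_LINE = ("LustreError: 2152:0:(lustre_fid.h:725:lu_fid_diff()) "
            "ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed: "
            "fid1:[0x100000000:0x21:0x0], fid2:[0x100000001:0x0:0x0]")

fid1, fid2 = parse_fids(LOG_LINE)

# The condition the LBUG reports as violated: both FIDs must share a sequence.
same_seq = fid1[0] == fid2[0]
print(f"fid1 seq={fid1[0]:#x} fid2 seq={fid2[0]:#x} same_seq={same_seq}")
```

The two sequence values differ only in the lowest bit, which is enough for a strict equality check to fail.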
      

          Activity

            [LU-4653] Hit LBUG ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed after upgrade OST from 2.5.0 to 2.6

            adilger Andreas Dilger added a comment -

            Patch landed for 2.5.2.
            sarah Sarah Liu added a comment -

            Alex, ok, will do it now


            bzzz Alex Zhuravlev added a comment -

            Sarah, could you try to reproduce this and grab a full debug log, please? You can attach it to LU-4957.
            di.wang Di Wang (Inactive) added a comment (edited) -

            Alex: 0x100000001 is the same sequence as 0x100000000; we just embed the ost_index here. Or do you mean something else? BTW, it is probably better to discuss this in LU-4957?
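
One way to read the comment above is that both sequence values belong to the same class of sequences (IDIF) even though they are not numerically equal. The sketch below illustrates that reading. The two range constants are an assumption, recalled from Lustre's `lustre_idl.h`, and should be checked against the source tree in use; `seq_is_idif` is a hypothetical stand-in, not the kernel function.

```python
# Assumed constants: the IDIF sequence range as recalled from lustre_idl.h.
# Verify these against the Lustre version you are actually running.
FID_SEQ_IDIF = 0x100000000
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF


def seq_is_idif(seq):
    """Hypothetical stand-in: True when seq lies in the assumed IDIF range."""
    return FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX


# The two sequence values discussed in the comment.
seqs = [0x100000000, 0x100000001]

both_idif = all(seq_is_idif(seq) for seq in seqs)
strictly_equal = seqs[0] == seqs[1]
print(f"both_idif={both_idif} strictly_equal={strictly_equal}")
```

Under these assumptions a strict `fid_seq(fid1) == fid_seq(fid2)` comparison rejects the pair even though both values fall in the same range.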

            bzzz Alex Zhuravlev added a comment -

            Di, look at the details:

            [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]
            LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) LBUG

            So it got a new sequence?
            di.wang Di Wang (Inactive) added a comment (edited) -

            Created a new ticket: LU-4957
            di.wang Di Wang (Inactive) added a comment -

            Lustre: lustre-MDT0000: Recovery over after 0:10, of 2 clients 2 recovered and 0 were evicted.
            LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]
            LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) LBUG
            Pid: 2166, comm: osp-pre-0

            Hmm, this is a different issue. It seems the MDT gets a lower FID (lower than its last precreate-used FID) from the OST during recovery, which is wrong, though this ASSERT might be improper here. In any case, this probably needs a new ticket?
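
The "lower FID" observation can be checked with simple arithmetic on the two FIDs printed in the assertion above. This is a rough sketch, assuming the difference is taken over the object-ID field only; `naive_fid_diff` is a hypothetical stand-in and does not reproduce the real `osp_fid_diff()`.

```python
def naive_fid_diff(reply, used):
    """Hypothetical stand-in: subtract only the object-ID (second) field.

    Each FID is a (seq, oid, ver) tuple; the real osp_fid_diff() may differ.
    """
    return reply[1] - used[1]


# Values copied from the assertion message in the log.
reply_fid = (0x100000001, 0x0, 0x0)
pre_used_fid = (0x100000000, 0x1E8, 0x0)

diff = naive_fid_diff(reply_fid, pre_used_fid)

# The assertion requires diff > 0, i.e. the reply must be ahead of the
# last precreated FID the MDT has already used.
print(f"diff={diff} assertion_holds={diff > 0}")
```

With object ID 0x0 in the reply and 0x1e8 already used, the difference is negative, so a `> 0` check cannot hold under this assumption.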

            jlevi Jodi Levi (Inactive) added a comment -

            Patch landed to Master. Patches for other branches will be tracked outside of this ticket.
            sarah Sarah Liu added a comment -

            I tried the patch for b2_5 and got the following error in the recovery stage after upgrading the OSS to 2.6; the MDS rebooted again.

            fat-amd-1.lab.whamcloud.com login: root
            Password: 
            Lustre: lustre-MDT0000: Recovery over after 0:10, of 2 clients 2 recovered and 0 were evicted.
            LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) ASSERTION( osp_fid_diff(fid, &d->opd_pre_used_fid) > 0 ) failed: reply fid [0x100000001:0x0:0x0] pre used fid [0x100000000:0x1e8:0x0]
            LustreError: 2166:0:(osp_precreate.c:476:osp_precreate_send()) LBUG
            Pid: 2166, comm: osp-pre-0
            
            Call Trace:
             [<ffffffffa0209895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
             [<ffffffffa0209e97>] lbug_with_loc+0x47/0xb0 [libcfs]
             [<ffffffffa0b119d7>] osp_precreate_send+0x1a47/0x1b00 [osp]
             [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
             [<ffffffffa0b11f79>] osp_precreate_thread+0x4e9/0xc50 [osp]
             [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
             [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
             [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
             [<ffffffff8109aee6>] kthread+0x96/0xa0
             [<ffffffff8100c20a>] child_rip+0xa/0x20
            Last login: Wed Apr 23 17:31:38 on ttyS0
             [<ffffffff8109ae50>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20
            
            Kernel panic - not syncing: LBUG
            Pid: 2166, comm: osp-pre-0 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1
            Call Trace:
             [<ffffffff81527983>] ? panic+0xa7/0x16f
             [<ffffffffa0209eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
             [<ffffffffa0b119d7>] ? osp_precreate_send+0x1a47/0x1b00 [osp]
             [<ffffffffa0491304>] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
             [<ffffffffa0b11f79>] ? osp_precreate_thread+0x4e9/0xc50 [osp]
             [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
             [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
             [<ffffffffa0b11a90>] ? osp_precreate_thread+0x0/0xc50 [osp]
             [<ffffffff8109aee6>] ? kthread+0x96/0xa0
             [<ffffffff8100c20a>] ? child_rip+0xa/0x20
             [<ffffffff8109ae50>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20
            Initializing cgroup subsys cpuset
            Initializing cgroup subsys cpu
            
            di.wang Di Wang (Inactive) added a comment -

            http://review.whamcloud.com/#/c/10058 b2_5
            http://review.whamcloud.com/10059 b2_4

            People

              Assignee: di.wang Di Wang (Inactive)
              Reporter: sarah Sarah Liu