[LU-13796] MDT: Crash after failover: ASSERTION( attr->la_valid & LA_TYPE )

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.5
    • Labels: None
    • Environment: CentOS 7, kernel 3.10.0-1127.8.2.el7_lustre.x86_64, ZFS
    • Severity: 3

    Description

      Immediately after a failover from one node to the other (performed manually using pcs resource move), the new node crashed:


      [82973.263903] Lustre: meteo0-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [82973.263907] Lustre: Skipped 1 previous similar message
      [82973.273770] LustreError: 28190:0:(osp_md_object.c:167:osp_md_create()) ASSERTION( attr->la_valid & LA_TYPE ) failed: 
      [82973.275269] LustreError: 28190:0:(osp_md_object.c:167:osp_md_create()) LBUG
      [82973.276678] Pid: 28190, comm: lod0000_rec0001 3.10.0-1127.8.2.el7_lustre.x86_64 #1 SMP Mon Jun 8 13:48:45 UTC 2020
      [82973.276679] Call Trace:
      [82973.276686]  [<ffffffffc10487cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [82973.276698]  [<ffffffffc104887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [82973.276703]  [<ffffffffc1995b7a>] osp_md_create+0x42a/0x470 [osp]
      [82973.276715]  [<ffffffffc1151334>] llog_osd_get_cat_list+0x8d4/0xbd0 [obdclass]
      [82973.276740]  [<ffffffffc18ca359>] lod_sub_prep_llog+0xb9/0x783 [lod]
      [82973.276758]  [<ffffffffc188f82b>] lod_sub_recovery_thread+0x1cb/0xc80 [lod]
      [82973.276764]  [<ffffffff9f2c6691>] kthread+0xd1/0xe0
      [82973.276769]  [<ffffffff9f992d1d>] ret_from_fork_nospec_begin+0x7/0x21
      [82973.276774]  [<ffffffffffffffff>] 0xffffffffffffffff
      [82973.276793] Kernel panic - not syncing: LBUG
      [82973.278196] CPU: 29 PID: 28190 Comm: lod0000_rec0001 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1127.8.2.el7_lustre.x86_64 #1
      [82973.281009] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 3.1 09/14/2018
      [82973.282404] Call Trace:
      [82973.282990] Lustre: meteo0-MDT0000: in recovery but waiting for the first client to connect
      [82973.282992] Lustre: Skipped 1 previous similar message
      [82973.286683]  [<ffffffff9f97ffa5>] dump_stack+0x19/0x1b
      [82973.288065]  [<ffffffff9f979541>] panic+0xe8/0x21f
      [82973.289403]  [<ffffffffc10488cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [82973.290714]  [<ffffffffc1995b7a>] osp_md_create+0x42a/0x470 [osp]
      [82973.292031]  [<ffffffffc1151334>] llog_osd_get_cat_list+0x8d4/0xbd0 [obdclass]
      [82973.293333]  [<ffffffffc18ca359>] lod_sub_prep_llog+0xb9/0x783 [lod]
      [82973.294630]  [<ffffffffc118de6c>] ? keys_fill+0xfc/0x180 [obdclass]
      [82973.295895]  [<ffffffffc188f82b>] lod_sub_recovery_thread+0x1cb/0xc80 [lod]
      [82973.297134]  [<ffffffffc188f660>] ? lod_obd_get_info+0x9d0/0x9d0 [lod]
      [82973.298364]  [<ffffffff9f2c6691>] kthread+0xd1/0xe0
      [82973.299582]  [<ffffffff9f2c65c0>] ? insert_kthread_work+0x40/0x40
      [82973.300792]  [<ffffffff9f992d1d>] ret_from_fork_nospec_begin+0x7/0x21
      [82973.301984]  [<ffffffff9f2c65c0>] ? insert_kthread_work+0x40/0x40
      
      

          Activity

            Andreas Dilger (adilger) added a comment: This was fixed in Lustre 2.14 by patch https://review.whamcloud.com/40655, "LU-14039 obdclass: set LA_TYPE when update_log init".
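
            The failed assertion requires that the attribute block passed to osp_md_create() have the LA_TYPE bit set in its la_valid mask. In the trace, lod_sub_recovery_thread calls lod_sub_prep_llog, which reaches osp_md_create through llog_osd_get_cat_list, and the referenced patch title says the fix is to set LA_TYPE during update log initialization. The C sketch below models that precondition in userspace only. It is not Lustre code: the sketch_attr struct, the SK_LA_* bit values, and all helper functions are hypothetical, chosen to illustrate the bitmask check and the effect of the fix.

```c
#define _XOPEN_SOURCE 700 /* expose S_IFREG / S_ISREG in strict C modes */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical attribute-validity bits, loosely modeled on the la_valid
 * mask named in the LBUG message. The bit positions are illustrative
 * assumptions, not Lustre's actual values. */
enum sketch_la_valid {
	SK_LA_MODE = 1u << 0, /* la_mode permission bits are meaningful */
	SK_LA_TYPE = 1u << 1, /* la_mode file-type bits are meaningful */
};

/* Hypothetical stand-in for an attribute block carrying a validity mask. */
struct sketch_attr {
	uint64_t la_valid; /* which fields below the caller has filled in */
	uint32_t la_mode;  /* file type and permission bits */
};

/* Models the precondition asserted in osp_md_create(): the caller must
 * declare the object type valid. Returns false instead of calling LBUG
 * so the check can be exercised outside the kernel. */
static bool sketch_create_precondition_ok(const struct sketch_attr *attr)
{
	return (attr->la_valid & SK_LA_TYPE) != 0;
}

/* Hypothetical init path before the fix: the mode is filled in, including
 * the file-type bits, but only SK_LA_MODE is marked valid. */
static void sketch_init_attr_buggy(struct sketch_attr *attr)
{
	attr->la_mode = S_IFREG | 0644;
	attr->la_valid = SK_LA_MODE;
}

/* Same hypothetical init path after a fix in the spirit of LU-14039:
 * the type bit is marked valid as well. */
static void sketch_init_attr_fixed(struct sketch_attr *attr)
{
	attr->la_mode = S_IFREG | 0644;
	attr->la_valid = SK_LA_MODE | SK_LA_TYPE;
}
```

            With sketch_init_attr_buggy() the precondition evaluates to false, which in the kernel corresponds to the LBUG in the log above; sketch_init_attr_fixed() satisfies it without changing the mode value itself.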

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Robert Redl (rredl)
              Votes: 0
              Watchers: 4