Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5285

mdt_reconstruct_setattr() calls mdt_attr_get_complex() without checking that object exists

    XMLWordPrintable

Details

    • 3
    • 14748

    Description

      Running racer on 2 nodes with MDSCOUNT=1 and http://review.whamcloud.com/#/c/5936/ (add more operations to racer) I see this frequently.

      [  803.128945] Lustre: lustre-MDT0000: Client 045a894d-d2a7-e744-df1f-a695740583d8 (at 19\
      2.168.122.108@tcp) reconnecting
      [  803.134578] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo\
      _header->loh_attr & LOHA_EXISTS) != 0 ) failed:
      [  803.136583] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) LBUG
      [  803.137789] Pid: 5113, comm: mdt00_005
      [  803.138439]
      [  803.138440] Call Trace:
      [  803.139129]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  803.140296]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  803.141337]  [<ffffffffa0c034ac>] mdt_attr_get_complex+0xaec/0xb80 [mdt]
      [  803.142457]  [<ffffffffa0c20425>] mdt_reconstruct_setattr+0xd5/0x410 [mdt]
      [  803.143631]  [<ffffffffa04737e0>] ? lu_ucred+0x20/0x30 [obdclass]
      [  803.144677]  [<ffffffffa0c1fde5>] mdt_reconstruct+0x45/0x120 [mdt]
      [  803.145722]  [<ffffffffa0bfdbfa>] mdt_reint_internal+0x6fa/0x7c0 [mdt]
      [  803.146820]  [<ffffffffa0bfe24b>] mdt_reint+0x6b/0x120 [mdt]
      [  803.147818]  [<ffffffffa06e9c45>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [  803.148977]  [<ffffffffa069ce31>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
      [  803.150053]  [<ffffffffa069c140>] ? ptlrpc_main+0x0/0x1880 [ptlrpc]
      [  803.151093]  [<ffffffff8109eab6>] kthread+0x96/0xa0
      [  803.151886]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [  803.152714]  [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
      [  803.153677]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [  803.154572]  [<ffffffff8109ea20>] ? kthread+0x0/0xa0
      [  803.155403]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [  803.156261]
      

      When I hit this I see the following sequence of events:

      1. The MDT crashes.
      2. The client node stays up.
      3. I restart the MDT.
      4. The client resends the setattr.
      5. GOTO 1.

      Attachments

        Issue Links

          Activity

            People

              bzzz Alex Zhuravlev
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: