Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5285

mdt_reconstruct_setattr() calls mdt_attr_get_complex() without checking that object exists

Details

    • 3
    • 14748

    Description

      Running racer on 2 nodes with MDSCOUNT=1 and http://review.whamcloud.com/#/c/5936/ (add more operations to racer) I see this frequently.

      [  803.128945] Lustre: lustre-MDT0000: Client 045a894d-d2a7-e744-df1f-a695740583d8 (at 19\
      2.168.122.108@tcp) reconnecting
      [  803.134578] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo\
      _header->loh_attr & LOHA_EXISTS) != 0 ) failed:
      [  803.136583] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) LBUG
      [  803.137789] Pid: 5113, comm: mdt00_005
      [  803.138439]
      [  803.138440] Call Trace:
      [  803.139129]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  803.140296]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  803.141337]  [<ffffffffa0c034ac>] mdt_attr_get_complex+0xaec/0xb80 [mdt]
      [  803.142457]  [<ffffffffa0c20425>] mdt_reconstruct_setattr+0xd5/0x410 [mdt]
      [  803.143631]  [<ffffffffa04737e0>] ? lu_ucred+0x20/0x30 [obdclass]
      [  803.144677]  [<ffffffffa0c1fde5>] mdt_reconstruct+0x45/0x120 [mdt]
      [  803.145722]  [<ffffffffa0bfdbfa>] mdt_reint_internal+0x6fa/0x7c0 [mdt]
      [  803.146820]  [<ffffffffa0bfe24b>] mdt_reint+0x6b/0x120 [mdt]
      [  803.147818]  [<ffffffffa06e9c45>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [  803.148977]  [<ffffffffa069ce31>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
      [  803.150053]  [<ffffffffa069c140>] ? ptlrpc_main+0x0/0x1880 [ptlrpc]
      [  803.151093]  [<ffffffff8109eab6>] kthread+0x96/0xa0
      [  803.151886]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [  803.152714]  [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
      [  803.153677]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [  803.154572]  [<ffffffff8109ea20>] ? kthread+0x0/0xa0
      [  803.155403]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [  803.156261]
      

      When I hit this I see the following sequence of events:

      1. The MDT crashes.
      2. The client node stays up.
      3. I restart the MDT.
      4. The client resends the setattr.
      5. GOTO 1.

      Attachments

        Issue Links

          Activity

            [LU-5285] mdt_reconstruct_setattr() calls mdt_attr_get_complex() without checking that object exists
            pjones Peter Jones added a comment -

            Landed for 2.6

            pjones Peter Jones added a comment - Landed for 2.6
            bzzz Alex Zhuravlev added a comment - http://review.whamcloud.com/11025

            I'm OK with not returning the attributes for setattr if the object is missing. The setattr itself must have been processed, so it shouldn't return an error to the caller (i.e. no -ENOENT), but it is OK if the attributes on the client inode are not updated, and it could even be purged from cache if there is a mechanism to do that (e.g. return only mbo_nlink=0 and mbo_flags=OBD_MD_FLNLINK to the client).

            adilger Andreas Dilger added a comment - I'm OK with not returning the attributes for setattr if the object is missing. The setattr itself must have been processed, so it shouldn't return an error to the caller (i.e. no -ENOENT ), but it is OK if the attributes on the client inode are not updated, and it could even be purged from cache if there is a mechanism to do that (e.g. return only mbo_nlink=0 and mbo_flags=OBD_MD_FLNLINK to the client).

            Andreas, 2.1 doesn't check that either.

            why do we need to return the attributes in the setattr path? we're in the reconstruct path and given the original transaction was committed, rep-ack lock was cancelled and another client was able to destroy the object. so we can't return valid attribues. with no LDLM lock the attributes can't be used either?

            bzzz Alex Zhuravlev added a comment - Andreas, 2.1 doesn't check that either. why do we need to return the attributes in the setattr path? we're in the reconstruct path and given the original transaction was committed, rep-ack lock was cancelled and another client was able to destroy the object. so we can't return valid attribues. with no LDLM lock the attributes can't be used either?
            jhammond John Hammond added a comment -

            From the source mdt_reconstruct_create() has exactly the same issue. I haven't gotten a crash there but it does pass an object to mdt_attr_get_complex() without checking that it exists. mdt_reconstruct_open() looks OK. Probably all uses of mdt_attr_get_complex() should be audited...

            jhammond John Hammond added a comment - From the source mdt_reconstruct_create() has exactly the same issue. I haven't gotten a crash there but it does pass an object to mdt_attr_get_complex() without checking that it exists. mdt_reconstruct_open() looks OK. Probably all uses of mdt_attr_get_complex() should be audited...

            Looks like this was added in http://review.whamcloud.com/3782

            adilger Andreas Dilger added a comment - Looks like this was added in http://review.whamcloud.com/3782

            People

              bzzz Alex Zhuravlev
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: