  1. Lustre
  2. LU-6973

NULL pointer dereference during MDT umount if orph_cleanup_sc has not finished


Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.3
    • None
    • MDT Server crash
    • 3

    Description

      A NULL pointer dereference occurs while unmounting the MDT if the orph_cleanup_sc thread has not finished.

      crash> bt
      PID: 31563  TASK: ffff880879176ab0  CPU: 1   COMMAND: "orph_cleanup_sc"
       #0 [ffff880547d9b810] machine_kexec at ffffffff8103b71b
       #1 [ffff880547d9b870] crash_kexec at ffffffff810c9942
       #2 [ffff880547d9b940] oops_end at ffffffff8152f070
       #3 [ffff880547d9b970] no_context at ffffffff8104c80b
       #4 [ffff880547d9b9c0] __bad_area_nosemaphore at ffffffff8104ca95
       #5 [ffff880547d9ba10] bad_area_nosemaphore at ffffffff8104cb63
       #6 [ffff880547d9ba20] __do_page_fault at ffffffff8104d25c
       #7 [ffff880547d9bb40] do_page_fault at ffffffff81530fbe
       #8 [ffff880547d9bb70] page_fault at ffffffff8152e375
          [exception RIP: fld_server_lookup+97]
          RIP: ffffffffa0a50b31  RSP: ffff880547d9bc20  RFLAGS: 00010286
          RAX: ffff8810515df4c0  RBX: 00000002122fc000  RCX: ffff8810426c5078
          RDX: ffff880e6896b400  RSI: ffffffffa0a56b00  RDI: ffff881040aff840
          RBP: ffff880547d9bc70   R8: 0000000015fbb1dc   R9: 0000000000000000
          R10: 092c8d41cd51a9c5  R11: 0000000000000041  R12: 0000000000000000
          R13: ffff8810515df4c0  R14: ffff8810426c5078  R15: ffff8810426c4000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff880547d9bc78] osd_fld_lookup at ffffffffa0d53b8a [osd_ldiskfs]
      #10 [ffff880547d9bca8] osd_remote_fid at ffffffffa0d54320 [osd_ldiskfs]
      #11 [ffff880547d9bcf8] osd_it_ea_rec at ffffffffa0d6539e [osd_ldiskfs]
      #12 [ffff880547d9be38] lod_it_rec at ffffffffa0eec331 [lod]
      #13 [ffff880547d9be48] __mdd_orphan_cleanup at ffffffffa0f55050 [mdd]
      #14 [ffff880547d9bee8] kthread at ffffffff8109e71e
      #15 [ffff880547d9bf48] kernel_thread at ffffffff8100c20a
      crash>
      

      This crash occurs because ss_server_fld is NULL:

      crash> p *(*((struct osd_device *)0xffff8801f827a000).od_dt_dev.dd_lu_dev.ld_site).ld_seq_site
      $9 = {
        ss_lu = 0xffff8801f827a150,
        ss_node_id = 0,
        ss_server_fld = 0x0,     <---------- HERE
        ss_client_fld = 0x0,
        ss_server_seq = 0x0,
        ss_control_seq = 0x0,
        ss_control_exp = 0x0,
        ss_client_seq = 0x0
      }
      
      lustre/osd-ldiskfs/osd_handler.c, osd_fld_lookup() (source lines 2228-2254):

      int osd_fld_lookup(const struct lu_env *env, struct osd_device *osd,
                         obd_seq seq, struct lu_seq_range *range)
      {
      ....

              LASSERT(ss != NULL);
              fld_range_set_any(range);
              rc = fld_server_lookup(env, ss->ss_server_fld, seq, range);    <--- under some conditions ss->ss_server_fld can be NULL (source line 2251)
              if (rc != 0) {
                      CERROR("%s: cannot find FLD range for "LPX64": rc = %d\n",
                             osd_name(osd), seq, rc);
      
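      To make the failure mode concrete, here is a minimal user-space sketch of a guard on the lookup path. It is not Lustre code and not a proposed patch: fld_stub, seq_site_stub, lookup_stub and guarded_fld_lookup are hypothetical stand-ins, and returning -ENODEV is an assumption. A check like this only narrows the window, since the pointer could still be cleared between the test and the call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Lustre types; the real
 * structures carry many more fields. */
struct fld_stub {
    int unused;
};

struct seq_site_stub {
    struct fld_stub *ss_server_fld;   /* NULL once the FLD has been torn down */
};

/* Stand-in for fld_server_lookup(): it dereferences fld unconditionally,
 * which models the faulting access in the backtrace. */
int lookup_stub(struct fld_stub *fld, unsigned long long seq)
{
    (void)seq;
    return fld->unused;
}

/* Guarded variant of the lookup path: refuse the lookup instead of
 * passing a NULL server FLD down. The -ENODEV value is an assumption. */
int guarded_fld_lookup(struct seq_site_stub *ss, unsigned long long seq)
{
    assert(ss != NULL);               /* mirrors LASSERT(ss != NULL) */
    if (ss->ss_server_fld == NULL)
        return -ENODEV;
    return lookup_stub(ss->ss_server_fld, seq);
}
```

      With a live FLD the call reaches lookup_stub(); with ss_server_fld already cleared it returns an error instead of faulting.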

      Is it possible to stop the orph_cleanup_sc thread at the beginning of the MDT umount process to prevent this issue?
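
      The ordering asked about here can be sketched in user space with POSIX threads: request a stop, wait for the worker to exit, and only then drop the pointer it was using. Everything in this sketch (cleanup_ctx, cleanup_worker, cleanup_start, cleanup_stop_and_teardown) is hypothetical and only models the idea; it is not how the MDD orphan cleanup thread is actually managed. In kernel code the analogous primitives would presumably be kthread_should_stop() and kthread_stop().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state shared by umount and the cleanup thread. */
struct cleanup_ctx {
    pthread_t   thread;
    atomic_bool stop;          /* set by umount to request an early exit */
    atomic_long scanned;       /* entries processed so far */
    int        *server_fld;    /* stands in for ss_server_fld */
};

/* Worker loop: re-check the stop flag before every use of server_fld. */
void *cleanup_worker(void *arg)
{
    struct cleanup_ctx *ctx = arg;

    while (!atomic_load(&ctx->stop)) {
        /* The pointer must stay valid for as long as the worker runs. */
        assert(ctx->server_fld != NULL);
        long step = *ctx->server_fld;      /* the lookup would happen here */
        atomic_fetch_add(&ctx->scanned, step);
    }
    return NULL;
}

int cleanup_start(struct cleanup_ctx *ctx, int *server_fld)
{
    atomic_init(&ctx->stop, false);
    atomic_init(&ctx->scanned, 0);
    ctx->server_fld = server_fld;
    return pthread_create(&ctx->thread, NULL, cleanup_worker, ctx);
}

/* Umount side: request stop, wait for the thread to exit, and only then
 * clear the pointer it was dereferencing. */
int cleanup_stop_and_teardown(struct cleanup_ctx *ctx)
{
    int rc;

    atomic_store(&ctx->stop, true);
    rc = pthread_join(ctx->thread, NULL);
    ctx->server_fld = NULL;                /* safe: no reader is left */
    return rc;
}
```

      The point of the design is the order inside cleanup_stop_and_teardown(): because the join completes before the pointer is cleared, the worker can never observe a NULL server_fld.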


            People

              bfaccini Bruno Faccini (Inactive)
              apercher Antoine Percher