Lustre / LU-18172

server umount and lctl lfsck_stop race

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • None
    • None
    • 3

    Description

      The following race between umount and lctl lfsck_stop has been observed.
      While lctl lfsck_stop is still running, umount starts:

      server_put_super
        class_manual_cleanup
          obd_precleanup
            mdt_device_fini
              mdt_fini
                mdd_iocontrol(OBD_IOC_STOP_LFSCK)
                  lfsck_stop
                    if (thread_is_stopping(thread))
                      GOTO(unlock, rc = -EINPROGRESS);  <<< umount continues because lctl lfsck_stop has already set the SVC_STOPPING flag.
      
                /* various *fini()-s */
                mdt_fld_fini
                  ss->ss_server_fld = NULL;
      
                mdt_stack_fini
                  mdd_process_config
                    lod_process_config
                      osd_process_config(LCFG_CLEANUP)
                        osd_shutdown
                          osd_scrub_cleanup
                            LASSERT(dev->od_otable_it == NULL);   <<< this assertion is hit because lfsck_master_engine() has not yet run osd_otable_it_fini().
      

      Another consequence of the same problem has also been seen:
      lfsck failed at:

         lfsck_find_mdt_idx_by_fid
             rc = fld_server_lookup(env, ss->ss_server_fld...
               BUG: unable to handle kernel NULL pointer dereference
      

      This happens because umount sets ss->ss_server_fld to NULL in mdt_fld_fini(); see the umount call trace above.
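
The pattern behind both failures can be modelled in user space. The sketch below is only an illustration under stated assumptions, not the Lustre code and not necessarily the fix that was landed: `struct engine`, `engine_stop_nowait()`, `engine_finish()`, `engine_stop_and_wait()` and `engine_teardown()` are hypothetical names, and pthread primitives stand in for the kernel ones. A second stopper that sees the stopping state returns -EINPROGRESS without waiting, so teardown is only safe once the engine has reported that it is stopped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the service thread states; not the Lustre definitions. */
enum svc_state { SVC_RUNNING, SVC_STOPPING, SVC_STOPPED };

struct engine {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	enum svc_state  state;
	void           *iterator;   /* models dev->od_otable_it */
};

void engine_init(struct engine *e, void *it)
{
	pthread_mutex_init(&e->lock, NULL);
	pthread_cond_init(&e->cond, NULL);
	e->state = SVC_RUNNING;
	e->iterator = it;
}

/*
 * Models the early return described in the trace: the first caller marks
 * the engine as stopping, a concurrent caller gets -EINPROGRESS and
 * returns immediately, without waiting for the engine to finish.
 */
int engine_stop_nowait(struct engine *e)
{
	int rc = 0;

	pthread_mutex_lock(&e->lock);
	if (e->state == SVC_STOPPING)
		rc = -EINPROGRESS;
	else if (e->state == SVC_RUNNING)
		e->state = SVC_STOPPING;
	pthread_mutex_unlock(&e->lock);
	return rc;
}

/* What the engine thread does on exit: release the iterator, then report STOPPED. */
void engine_finish(struct engine *e)
{
	pthread_mutex_lock(&e->lock);
	e->iterator = NULL;
	e->state = SVC_STOPPED;
	pthread_cond_broadcast(&e->cond);
	pthread_mutex_unlock(&e->lock);
}

/* One possible remedy (an assumption): request the stop, then block until STOPPED. */
int engine_stop_and_wait(struct engine *e)
{
	int rc = engine_stop_nowait(e);

	pthread_mutex_lock(&e->lock);
	while (e->state != SVC_STOPPED)
		pthread_cond_wait(&e->cond, &e->lock);
	pthread_mutex_unlock(&e->lock);
	/* Someone else started the stop; by now it has completed. */
	return rc == -EINPROGRESS ? 0 : rc;
}

/* Models the teardown step that asserts the iterator is already gone. */
void engine_teardown(struct engine *e)
{
	assert(e->iterator == NULL);
	pthread_cond_destroy(&e->cond);
	pthread_mutex_destroy(&e->lock);
}
```

In this model, calling `engine_teardown()` right after `engine_stop_nowait()` returns -EINPROGRESS would trip the assertion, which mirrors the umount path above. Calling it after `engine_stop_and_wait()` would not, because the engine has already released its iterator by then.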

          People

            vsaveliev Vladimir Saveliev