LU-5984

server_mgc_set_fs() can't set_fs -17

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.5.3
    • 2
    • 16699

    Description

      After upgrading Lustre to 2.5.3 (specifically lustre-2.5.3-2chaos), we're no longer able to start the MDS due to the following failure:

      Lustre: Lustre: Build Version: 2.5.3-2chaos-2chaos--PRISTINE-2.6.32-431.29.2.1chaos.ch5.2.x86_64
      LustreError: 13871:0:(obd_mount_server.c:313:server_mgc_set_fs()) can't set_fs -17
      Lustre: fsv-MDT0000: Unable to start target: -17
      LustreError: 13871:0:(obd_mount_server.c:845:lustre_disconnect_lwp()) fsv-MDT0000-lwp-MDT0000: Can't end config log fsv-client.
      LustreError: 13871:0:(obd_mount_server.c:1419:server_put_super()) fsv-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 13871:0:(obd_mount_server.c:1449:server_put_super()) no obd fsv-MDT0000
      LustreError: 13871:0:(obd_mount_server.c:135:server_deregister_mount()) fsv-MDT0000 not registered
      Lustre: server umount fsv-MDT0000 complete
      LustreError: 13871:0:(obd_mount.c:1326:lustre_fill_super()) Unable to mount  (-17)
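
When triaging console output like the above, it can help to split each `LustreError` line into its source file, line number, and function. The following is a minimal sketch, assuming the line layout seen in this report; the regular expression is inferred from these sample lines rather than from any Lustre format specification, and the names `LOG_RE`, `parse_lustre_line`, and `field2` are hypothetical.

```python
import re

# Pattern inferred from the console lines quoted in this report.
# It is an assumption, not an official Lustre log format definition.
LOG_RE = re.compile(
    r"^(?P<level>LustreError|Lustre): "
    r"(?P<pid>\d+):(?P<field2>\d+):"          # meaning of the second number is not assumed here
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)

def parse_lustre_line(text):
    """Return a dict of fields for a matching line, or None otherwise."""
    m = LOG_RE.match(text.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["line"] = int(rec["line"])
    return rec

sample = ("LustreError: 13871:0:(obd_mount_server.c:313:"
          "server_mgc_set_fs()) can't set_fs -17")
record = parse_lustre_line(sample)
print(record["file"], record["line"], record["func"], record["msg"])
```

Lines without the `pid:n:(file:line:func())` prefix, such as `Lustre: fsv-MDT0000: Unable to start target: -17`, are deliberately not matched and return `None`.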
      

      I took a look at the Lustre debug log and the failure is due to a problem creating the local copy of the config logs. This is a ZFS-based MDS being upgraded from 2.4.x, so there was never a local CONFIGS directory.

      I'll attach the full log, but it appears to correctly detect that there is no CONFIGS directory. It then attempts to create the directory, which fails with -17 (EEXIST). From the debug log we have, it's not clear why this fails, since the directory clearly doesn't exist; we mounted the MDT via the ZPL and verified this.
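
The return codes in these messages are negated Linux errno values, so -17 and the rc=-2 from `server_put_super()` can be decoded with the standard library. This is a small illustrative sketch; the helper name `describe_rc` is hypothetical, and the mapping reflects the errno table of the host running the script (on Linux, 17 is `EEXIST` and 2 is `ENOENT`).

```python
import errno
import os

def describe_rc(rc):
    """Map a negative kernel-style return code to (symbolic name, description)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# The two return codes that appear in the console log of this report.
for rc in (-17, -2):
    name, text = describe_rc(rc)
    print(rc, name, text)
```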

      Hoping to work around the issue, we tried manually creating the CONFIGS directory and adding a copy of the llogs from the MGS. We also tried creating just an empty CONFIGS directory through the ZPL. In both cases the MDS hit an LBUG on start, as follows:

      2014-12-04 11:10:50 LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed:
      2014-12-04 11:10:50 LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) LBUG
      2014-12-04 11:10:50 Pid: 16688, comm: mount.lustre
      2014-12-04 11:10:50
      2014-12-04 11:10:50 Call Trace:
      2014-12-04 11:10:50  [<ffffffffa05d18f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa05d1ef7>] lbug_with_loc+0x47/0xb0 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0d623e4>] osd_index_try+0x224/0x470 [osd_zfs]
      2014-12-04 11:10:50  [<ffffffffa0740d41>] dt_try_as_dir+0x41/0x60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa0741351>] dt_lookup_dir+0x31/0x130 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa071f845>] llog_osd_open+0x475/0xbb0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa06f15ba>] llog_open+0xba/0x2c0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa06f5131>] llog_backup+0x61/0x500 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8128f540>] ? sprintf+0x40/0x50
      2014-12-04 11:10:50  [<ffffffffa0d99757>] mgc_process_log+0x1177/0x18f0 [mgc]
      2014-12-04 11:10:50  [<ffffffffa0d93360>] ? mgc_blocking_ast+0x0/0x810 [mgc]
      2014-12-04 11:10:50  [<ffffffffa08991e0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      2014-12-04 11:10:50  [<ffffffffa0d9b4b5>] mgc_process_config+0x645/0x11d0 [mgc]
      2014-12-04 11:10:50  [<ffffffffa07351c6>] lustre_process_log+0x256/0xa60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0766cb7>] server_start_targets+0x9e7/0x1db0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0738876>] ? lustre_start_mgc+0x4b6/0x1e60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0730760>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa076ceb8>] server_fill_super+0xb98/0x19e0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa073a3f8>] lustre_fill_super+0x1d8/0x550 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa073a220>] ? lustre_fill_super+0x0/0x550 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8118d1ef>] get_sb_nodev+0x5f/0xa0
      2014-12-04 11:10:50  [<ffffffffa07320e5>] lustre_get_sb+0x25/0x30 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8118c82b>] vfs_kern_mount+0x7b/0x1b0
      2014-12-04 11:10:50  [<ffffffff8118c9d2>] do_kern_mount+0x52/0x130
      2014-12-04 11:10:50  [<ffffffff811ae21b>] do_mount+0x2fb/0x930
      2014-12-04 11:10:50  [<ffffffff811ae8e0>] sys_mount+0x90/0xe0
      2014-12-04 11:10:50  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      

      At this point we're rolling back to the previous Lustre release in order to make the system available again.

            People

              tappro Mikhail Pershin
              behlendorf Brian Behlendorf