  1. Lustre
  2. LU-1268

Lustre MDS cannot start after ASSERTION

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.1.0
    • None
    • lustre-2.1.0-24chaos (github.com/chaos/lustre)
    • 1
    • 6425

    Description

      To work around LU-1257 without a complete downtime of the filesystem and use of --writeconf, I edited an OST's mountdata file to remove the LDD_F_VIRGIN flag.
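
The flag edit described above amounts to clearing one bit in the flags word stored in the mountdata file. The following is a minimal sketch of that bit arithmetic only. The numeric flag values are assumptions recalled from `lustre_disk.h` and should be checked against the source tree in use; `clear_virgin` is a hypothetical helper, and no on-disk layout or file I/O is shown.

```python
# Assumed flag values (verify against lustre_disk.h for your release).
LDD_F_SV_TYPE_OST = 0x0002  # target is an OST (assumed value)
LDD_F_VIRGIN = 0x0020       # target has never registered with the MGS (assumed value)


def clear_virgin(flags: int) -> int:
    """Return flags with the LDD_F_VIRGIN bit cleared; other bits are untouched."""
    return flags & ~LDD_F_VIRGIN


# Illustrative flags word for an unregistered OST.
flags = LDD_F_SV_TYPE_OST | LDD_F_VIRGIN
cleared = clear_virgin(flags)
print(f"before=0x{flags:04x} after=0x{cleared:04x}")
```

Clearing the bit changes only what the OST reports about itself; as the rest of this report shows, it does not add the matching records to the configuration logs held by the MGS.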

      Unfortunately, that seems to have left the configuration on the MDS in a bad state. The OST was allowed to reconnect, but on the MDS/MGS console we saw the message:

      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
      2012-03-22 15:12:26 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100  
      

      With the MDS already running, that error was non-fatal. But after a crash due to LU-931, the MDS is unable to start because of the same llog problem:

      2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4705:mdt_process_config()) For 1.8 interoperability, skip this mdt.group_upcall. It is obsolete
      2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4711:mdt_process_config()) Found old param mdt.quota_type, changed it to mdd.quota_type.
      2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
      2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
      2012-03-29 03:20:47 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100  
      2012-03-29 03:20:47 LustreError: 15b-f: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
      2012-03-29 03:20:47 LustreError: 15c-8: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1192:server_start_targets()) failed to start server lsc-MDT0000: -22
      2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1719:server_fill_super()) Unable to start targets: -22
      2012-03-29 03:20:47 Lustre: Failing over lsc-MDT0000
      
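
For triage, the three lines that matter in the console output above are the "no device" line, the "Err -22" line, and the `cmd=` line. The sketch below pulls those fields out of an excerpt and translates the error number into its symbolic name. The regular expressions are assumptions based only on the lines quoted in this report, and `summarize` is a hypothetical helper, not part of any Lustre tool.

```python
import errno
import re

# Excerpt copied from the console log quoted in this report.
CONSOLE_LOG = """\
2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
2012-03-29 03:20:47 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100
"""

# Patterns inferred from the quoted lines only (assumption).
NO_DEVICE = re.compile(r"no device for: (\S+)")
CFG_ERR = re.compile(r"Err (-\d+) on cfg command")
CMD = re.compile(r"cmd=(\w+)\s+0:(\S+)\s+1:(\S+)")


def summarize(log: str) -> dict:
    """Collect the missing device, the errno name, and the failing config command."""
    summary = {}
    for line in log.splitlines():
        if (m := NO_DEVICE.search(line)):
            summary["device"] = m.group(1)
        elif (m := CFG_ERR.search(line)):
            code = -int(m.group(1))  # kernel-style negative errno
            summary["errno"] = errno.errorcode.get(code, str(code))
        elif (m := CMD.search(line)):
            summary["cmd"], summary["target"], summary["nid"] = m.groups()
    return summary


summary = summarize(CONSOLE_LOG)
print(summary)
```

On this excerpt the device named in the "no device" line and the target of the failing command are the same, `lsc-OST0174-osc`, and -22 maps to `EINVAL`.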

      Can you suggest any quick fixes? This is a production filesystem that is currently unusable with jobs hung waiting on its return.

      I fear that we may really need to unmount this filesystem everywhere and resort to completely reinitializing the logs with writeconf.

          People

            johann Johann Lombardi (Inactive)
            morrone Christopher Morrone