LU-1268

Lustre MDS cannot start after ASSERTION


    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • None
    • Lustre 2.1.0
    • None
    • lustre-2.1.0-24chaos (github.com/chaos/lustre)
    • 1
    • 6425

      To work around LU-1257 without a complete downtime of the filesystem and use of --writeconf, I edited an OST's mountdata file to remove the LDD_F_VIRGIN flag.

      Unfortunately, that seems to have left the configuration on the MDS in a bad state. The OST was allowed to reconnect, but on the MDS/MGS console we saw the message:

      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
      2012-03-22 15:12:26 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100  
      

      With the MDS already running, that error was non-fatal. But after a crash due to LU-931, the MDS is unable to start because of the same llog problem:

      2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4705:mdt_process_config()) For 1.8 interoperability, skip this mdt.group_upcall. It is obsolete
      2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4711:mdt_process_config()) Found old param mdt.quota_type, changed it to mdd.quota_type.
      2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
      2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
      2012-03-29 03:20:47 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100  
      2012-03-29 03:20:47 LustreError: 15b-f: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
      2012-03-29 03:20:47 LustreError: 15c-8: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1192:server_start_targets()) failed to start server lsc-MDT0000: -22
      2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1719:server_fill_super()) Unable to start targets: -22
      2012-03-29 03:20:47 Lustre: Failing over lsc-MDT0000
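
      When sifting through a longer console log for the record that is being rejected, something like the following Python sketch may help. It is not a Lustre tool: the function name `find_failed_cfg_commands` and both regular expressions are hypothetical and were written only against the message formats quoted in this ticket, so other Lustre versions may word these lines differently. It pairs each "Err N on cfg command" line with the `cmd=` line that follows it and decodes the return code (-22 is EINVAL).

```python
import errno
import re

# Assumption: patterns are derived only from the console lines quoted above.
# "... class_config_llog_handler()) Err -22 on cfg command:"
ERR_RE = re.compile(r"class_config_llog_handler\(\)\) Err (-?\d+) on cfg command")
# "... Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100"
CMD_RE = re.compile(r"cmd=(\w+)\s+0:(\S+)(?:\s+1:(\S+))?")


def find_failed_cfg_commands(lines):
    """Return one dict per 'Err N on cfg command' line that is
    immediately followed by a 'cmd=' line (hypothetical helper)."""
    failures = []
    pending_rc = None  # return code from the previous line, if it was an Err line
    for line in lines:
        err = ERR_RE.search(line)
        if err:
            pending_rc = int(err.group(1))
            continue
        cmd = CMD_RE.search(line)
        if cmd and pending_rc is not None:
            failures.append({
                "rc": pending_rc,
                "errno_name": errno.errorcode.get(abs(pending_rc), "unknown"),
                "cmd": cmd.group(1),
                "device": cmd.group(2),
                "arg1": cmd.group(3),
            })
        # Any line that is not an Err line clears the pending code.
        pending_rc = None
    return failures


if __name__ == "__main__":
    import sys
    # Usage: python find_cfg_errors.py < console.log
    for failure in find_failed_cfg_commands(sys.stdin):
        print(failure)
```

      Run against the startup log above, this should report a single failed record for device lsc-OST0174-osc, which matches the OST whose mountdata was edited.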
      

      Can you suggest any quick fixes? This is a production filesystem that is currently unusable with jobs hung waiting on its return.

      I fear that we may really need to unmount this filesystem everywhere and completely reinitialize the logs with writeconf.

            Assignee: Johann Lombardi (Inactive)
            Reporter: Christopher Morrone (Inactive)