Details
Bug
Resolution: Fixed
Blocker
None
Lustre 2.1.0
None
lustre-2.1.0-24chaos (github.com/chaos/lustre)
1
6425
Description
To work around LU-1257 without a complete downtime of the filesystem and use of --writeconf, I edited an OST's mountdata file to remove the LDD_F_VIRGIN flag.
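For readers unfamiliar with the flag word, the edit amounts to clearing one bit in the target's on-disk flags. The sketch below only illustrates that bit arithmetic on a plain integer; it does not read or write a mountdata file. The numeric values are assumptions taken from memory of lustre_disk.h and should be checked against the headers of the Lustre version in use.

```python
# Assumed flag values (verify against lustre_disk.h for your Lustre version).
LDD_F_SV_TYPE_OST = 0x0002  # target is an OST
LDD_F_VIRGIN = 0x0020       # target has never registered with the MGS


def clear_virgin(flags):
    """Return the flag word with the LDD_F_VIRGIN bit cleared."""
    return flags & ~LDD_F_VIRGIN


# Hypothetical flag word for an OST that still looks unregistered.
before = LDD_F_SV_TYPE_OST | LDD_F_VIRGIN
after = clear_virgin(before)

print(hex(before), "->", hex(after))
```

Clearing the bit changes only what the OST reports about itself; as the rest of this report shows, it does not add the missing records to the configuration logs held by the MGS.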
Unfortunately, that seems to have left the configuration on the MDS in a bad state. The OST was allowed to reconnect, but on the MDS/MGS console we saw the message:
2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
2012-03-22 15:12:26 Lustre: cmd=cf00b 0:lsc-OST0174-osc 1:172.19.1.127@o2ib100
With the MDS already running, that error was non-fatal. But after a crash due to LU-931, the MDS is unable to start because of the same llog problem:
2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4705:mdt_process_config()) For 1.8 interoperability, skip this mdt.group_upcall. It is obsolete
2012-03-29 03:20:45 Lustre: 20272:0:(mdt_handler.c:4711:mdt_process_config()) Found old param mdt.quota_type, changed it to mdd.quota_type.
2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
2012-03-29 03:20:47 LustreError: 20272:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
2012-03-29 03:20:47 Lustre: cmd=cf00b 0:lsc-OST0174-osc 1:172.19.1.127@o2ib100
2012-03-29 03:20:47 LustreError: 15b-f: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
2012-03-29 03:20:47 LustreError: 15c-8: MGC172.19.1.100@o2ib100: The configuration from log 'lsc-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1192:server_start_targets()) failed to start server lsc-MDT0000: -22
2012-03-29 03:20:47 LustreError: 20183:0:(obd_mount.c:1719:server_fill_super()) Unable to start targets: -22
2012-03-29 03:20:47 Lustre: Failing over lsc-MDT0000
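When triaging console output like this, it can help to pull out which device each failed config command refers to and what the error code means. The following is a minimal sketch, assuming the log format shown above (one entry per "YYYY-MM-DD HH:MM:SS" timestamp, a "no device for:" line followed by an "Err N on cfg command" line). The function names are hypothetical and are not part of any Lustre tool.

```python
import errno
import re

# Each console entry is assumed to begin with a "YYYY-MM-DD HH:MM:SS " stamp.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")
NO_DEVICE = re.compile(r"no device for: (\S+)")
CFG_ERROR = re.compile(r"Err (-?\d+) on cfg command")

SAMPLE = (
    "2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1019:class_process_config()) "
    "no device for: lsc-OST0174-osc "
    "2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1363:class_config_llog_handler()) "
    "Err -22 on cfg command: "
    "2012-03-22 15:12:26 Lustre: cmd=cf00b 0:lsc-OST0174-osc 1:172.19.1.127@o2ib100"
)


def split_entries(text):
    """Split run-together console text into one string per timestamped entry."""
    starts = [m.start() for m in STAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def failed_config_commands(text):
    """Return (device, code, errno_name) for each failed config command."""
    failures = []
    device = None
    for entry in split_entries(text):
        match = NO_DEVICE.search(entry)
        if match:
            device = match.group(1)  # remember the device for the next error
            continue
        match = CFG_ERROR.search(entry)
        if match:
            code = int(match.group(1))
            name = errno.errorcode.get(-code, "unknown")
            failures.append((device, code, name))
            device = None
    return failures


for device, code, name in failed_config_commands(SAMPLE):
    print(f"{device}: error {code} ({name})")
```

On the sample taken from the first console excerpt, this reports lsc-OST0174-osc with error -22, which corresponds to EINVAL.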
Can you suggest any quick fixes? This is a production filesystem that is currently unusable, with jobs hung waiting for it to return.
I fear that we may need to unmount this filesystem everywhere and completely reinitialize the configuration logs with writeconf.