[LU-8044] class_process_config() no device for: lustre-MDT0021-mdtlov Created: 19/Apr/16 Updated: 14/Jun/18 Resolved: 15/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
TOSS 2 (RHEL 6.7 based) |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
On startup for the first time after formatting, the MDT fails to process the config provided by the MGS. The MDT then fails to start.

The MDT which fails to start reports:

Lustre: Lustre: Build Version: 2.8.0
LustreError: 11797:0:(obd_config.c:1262:class_process_config()) no device for: lustre-MDT0021-mdtlov
LustreError: 11797:0:(obd_config.c:1666:class_config_llog_handler()) MGC192.168.112.240@o2ib15: cfg command failed: rc = -22
Lustre: cmd=cf014 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
LustreError: 15b-f: MGC192.168.112.240@o2ib15: The configuration from log 'lustre-MDT0021'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 11667:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0021: -22
LustreError: 11667:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -22
LustreError: 11667:0:(obd_mount_server.c:1512:server_put_super()) no obd lustre-MDT0021
Lustre: server umount lustre-MDT0021 complete
LustreError: 11667:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-22)

The config logs CONFIGS/lustre-MDT* do not all have the same number of records. lustre-MDT0021 has 2 more records than the other 29 MDTs.

The suspicious llog records are:

#04 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
#179 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#180 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1 |
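The four suspicious records are two pairs with identical content (#04/#179 and #05/#180). As a rough sketch of how such repeats could be spotted in a text dump of a config log, the following assumes each record sits on its own line in the `#<index> (<size>)<command> <args>` form quoted in this ticket. The helper names `parse_records` and `find_duplicates` are hypothetical and not part of any Lustre tool.

```python
import re
from collections import defaultdict

# Assumed record layout, taken from the excerpts in this ticket:
#   "#<index> (<size>)<command> <args...>"
RECORD_RE = re.compile(r"#(\d+) \((\d+)\)(.*)")


def parse_records(dump_text):
    """Return a list of (index, body) tuples, one per llog record line."""
    records = []
    for line in dump_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:
            records.append((int(match.group(1)), match.group(3).strip()))
    return records


def find_duplicates(records):
    """Map each record body that occurs more than once to its indices."""
    seen = defaultdict(list)
    for index, body in records:
        seen[body].append(index)
    return {body: idxs for body, idxs in seen.items() if len(idxs) > 1}


SAMPLE = """\
#04 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
#179 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#180 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
"""

duplicates = find_duplicates(parse_records(SAMPLE))
for body, indices in duplicates.items():
    print(indices, body)
```

Running the same check over a dump of each CONFIGS/lustre-MDT* log would show which logs carry repeated records, which is one way to account for a log having extra entries.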
| Comments |
| Comment by Olaf Faaland [ 19/Apr/16 ] |
|
My description made it sound like this happens every time. That's not the case; it happens intermittently. |
| Comment by Olaf Faaland [ 19/Apr/16 ] |
|
Attached: |
| Comment by Di Wang [ 19/Apr/16 ] |
#04 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1

The index of the OSP setup record seems too early, which does not look right. Could you please post CONFIGS/lustre-MDT0021 and CONFIGS/lustre-MDT0000 here? Thanks. |
| Comment by Olaf Faaland [ 19/Apr/16 ] |
|
Config log for MDT0000. |
| Comment by Olaf Faaland [ 19/Apr/16 ] |
|
Di, For the next few hours, I can either gather more information for you, or experiment. About 4 hours from now, I'll have to put the nodes back to their production use and the filesystem will be destroyed. thanks, |
| Comment by Di Wang [ 19/Apr/16 ] |
|
According to the config log, it looks like the OSP (lustre-MDT0014-osp-MDT0021) setup record was added before "lov setup", which is clearly wrong.

#01 (224)marker 865 (flags=0x01, v2.8.0.0) lustre-MDT0014 'add osp' Tue Apr 19 08:55:48 2016-
#02 (088)add_uuid nid=192.168.113.6@o2ib15(0x5000fc0a87106) 0: 1:192.168.113.6@o2ib15
#03 (144)attach 0:lustre-MDT0014-osp-MDT0021 1:osp 2:lustre-MDT0021-mdtlov_UUID
#04 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
#06 (224)END marker 865 (flags=0x02, v2.8.0.0) lustre-MDT0014 'add osp' Tue Apr 19 08:55:48 2016-
#07 (224)marker 873 (flags=0x01, v2.8.0.0) lustre-MDT0021 'add mdt' Tue Apr 19 08:55:48 2016-
#08 (120)attach 0:lustre-MDT0021 1:mdt 2:lustre-MDT0021_UUID
#09 (112)mount_option 0: 1:lustre-MDT0021 2:lustre-MDT0021-mdtlov
#10 (160)setup 0:lustre-MDT0021 1:lustre-MDT0021_UUID 2:33 3:lustre-MDT0021-mdtlov 4:f
#11 (224)END marker 873 (flags=0x02, v2.8.0.0) lustre-MDT0021 'add mdt' Tue Apr 19 08:55:48 2016-

I am checking the debug log on the MGS to see why this happens. Olaf, could you please try to reproduce the problem with debug level = -1 on the MGS? It will help me figure out what happens there. Thanks. |
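The ordering problem in this excerpt can be checked mechanically: record #05 names the device lustre-MDT0021-mdtlov in its `0:` field before any record has attached that device, which matches the "no device for" error on the MDT. The sketch below is only an illustration under assumptions. It assumes one record per line in the form quoted here, and it treats an `attach` record as the point where a device starts to exist; `find_unattached_references` is a hypothetical helper, not a Lustre utility.

```python
import re

# Assumed layout: "#<index> (<size>)<command> <args...>"
RECORD_RE = re.compile(r"#(\d+) \((\d+)\)(\S+)(.*)")
# The "0:" argument names the device a record operates on (may be empty).
DEVICE_RE = re.compile(r"\s0:(\S*)")


def find_unattached_references(dump_text):
    """Return (index, command, device) for records that use a device
    before any 'attach' record for that device has been seen."""
    attached = set()
    problems = []
    for line in dump_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if not match:
            continue
        index, command, rest = int(match.group(1)), match.group(3), match.group(4)
        device = DEVICE_RE.search(rest)
        if device is None or not device.group(1):
            continue  # markers and records with an empty "0:" field
        name = device.group(1)
        if command == "attach":
            attached.add(name)
        elif name not in attached:
            problems.append((index, command, name))
    return problems


EXCERPT = """\
#01 (224)marker 865 (flags=0x01, v2.8.0.0) lustre-MDT0014 'add osp' Tue Apr 19 08:55:48 2016-
#02 (088)add_uuid nid=192.168.113.6@o2ib15(0x5000fc0a87106) 0: 1:192.168.113.6@o2ib15
#03 (144)attach 0:lustre-MDT0014-osp-MDT0021 1:osp 2:lustre-MDT0021-mdtlov_UUID
#04 (152)setup 0:lustre-MDT0014-osp-MDT0021 1:lustre-MDT0014_UUID 2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov 1:lustre-MDT0014_UUID 2:20 3:1
#06 (224)END marker 865 (flags=0x02, v2.8.0.0) lustre-MDT0014 'add osp' Tue Apr 19 08:55:48 2016-
#07 (224)marker 873 (flags=0x01, v2.8.0.0) lustre-MDT0021 'add mdt' Tue Apr 19 08:55:48 2016-
#08 (120)attach 0:lustre-MDT0021 1:mdt 2:lustre-MDT0021_UUID
#09 (112)mount_option 0: 1:lustre-MDT0021 2:lustre-MDT0021-mdtlov
#10 (160)setup 0:lustre-MDT0021 1:lustre-MDT0021_UUID 2:33 3:lustre-MDT0021-mdtlov 4:f
#11 (224)END marker 873 (flags=0x02, v2.8.0.0) lustre-MDT0021 'add mdt' Tue Apr 19 08:55:48 2016-
"""

problems = find_unattached_references(EXCERPT)
for index, command, device in problems:
    print(f"record #{index:02d}: {command} uses {device} before it is attached")
```

On this excerpt the check flags only record #05, the same `modify_mdc_tgts` command (cmd=cf014) that the MDT rejects with -22.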
| Comment by Di Wang [ 19/Apr/16 ] |
|
Ah, it looks like a race when the MGS registers 2 MDTs at the same time. I will cook a patch. |
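To make this class of race concrete, here is a toy model in Python. It is not the MGS code and does not describe what patch 19658 changes. It assumes a simplified picture in which registering a target writes the base records of its own log and also appends "add osp" records to the logs of its peers. `ToyConfigLogs`, the `lov_setup` and `add_osp` record strings, and the single lock are all hypothetical.

```python
import threading


class ToyConfigLogs:
    """Toy stand-in for per-target config logs; not the Lustre MGS."""

    def __init__(self):
        self.logs = {}  # target name -> list of record strings
        self.lock = threading.Lock()

    def register(self, target):
        # Holding one lock for the whole registration means a peer can only
        # append to this target's log after its base records are in place.
        with self.lock:
            own_log = [f"lov_setup {target}-mdtlov"]
            for peer, peer_log in self.logs.items():
                own_log.append(f"add_osp {peer} to {target}-mdtlov")
                peer_log.append(f"add_osp {target} to {peer}-mdtlov")
            self.logs[target] = own_log


def lov_setup_comes_first(log):
    """True if the log starts with the (assumed) lov setup record."""
    return log[0].startswith("lov_setup")


# Unserialized interleaving written out by hand: a peer's record lands in
# the log before the target's own base record, as seen in lustre-MDT0021.
unsafe_log = [
    "add_osp lustre-MDT0014 to lustre-MDT0021-mdtlov",
    "lov_setup lustre-MDT0021-mdtlov",
]

# Serialized registration of two targets from two threads.
mgs = ToyConfigLogs()
threads = [
    threading.Thread(target=mgs.register, args=(name,))
    for name in ("lustre-MDT0014", "lustre-MDT0021")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(lov_setup_comes_first(unsafe_log))
print(all(lov_setup_comes_first(log) for log in mgs.logs.values()))
```

Whichever thread takes the lock first, both toy logs begin with their own setup record, while the hand-written interleaving reproduces the bad order reported above.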
| Comment by Olaf Faaland [ 19/Apr/16 ] |
|
Attached debug log from the MGS with debug = -1, taken while the MDTs were coming up for the first time. In this log, MDT0002 (on catalyst243, NID 192.168.112.243@o2ib15) encountered the error.

Lustre: Lustre: Build Version: 2.8.0
LustreError: 11826:0:(obd_config.c:1262:class_process_config()) no device for: lustre-MDT0002-mdtlov
LustreError: 11826:0:(obd_config.c:1666:class_config_llog_handler()) MGC192.168.112.240@o2ib15: cfg command failed: rc = -22
Lustre: cmd=cf014 0:lustre-MDT0002-mdtlov 1:lustre-MDT0023_UUID 2:35 3:1
LustreError: 15b-f: MGC192.168.112.240@o2ib15: The configuration from log 'lustre-MDT0002'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 11696:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0002: -22
LustreError: 11696:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -22
LustreError: 11696:0:(obd_mount_server.c:1512:server_put_super()) no obd lustre-MDT0002
Lustre: server umount lustre-MDT0002 complete
LustreError: 11696:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-22) |
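When many MDTs are started at once, it can help to pull the failing device and return code out of each node's console output. The sketch below assumes the two message formats quoted in this ticket and one message per line; `summarize` is a hypothetical helper. The return code is decoded with Python's standard `errno` table, where 22 is EINVAL.

```python
import errno
import re

# Patterns copied from the console messages quoted in this ticket.
NO_DEVICE_RE = re.compile(r"class_process_config\(\)\) no device for: (\S+)")
RC_RE = re.compile(r"cfg command failed: rc = (-?\d+)")


def summarize(console_text):
    """Return (missing devices, symbolic errno names) found in the text."""
    devices = NO_DEVICE_RE.findall(console_text)
    codes = [int(code) for code in RC_RE.findall(console_text)]
    names = [errno.errorcode.get(-code, "unknown") for code in codes]
    return devices, names


CONSOLE = """\
LustreError: 11826:0:(obd_config.c:1262:class_process_config()) no device for: lustre-MDT0002-mdtlov
LustreError: 11826:0:(obd_config.c:1666:class_config_llog_handler()) MGC192.168.112.240@o2ib15: cfg command failed: rc = -22
Lustre: cmd=cf014 0:lustre-MDT0002-mdtlov 1:lustre-MDT0023_UUID 2:35 3:1
"""

devices, names = summarize(CONSOLE)
print(devices, names)
```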
| Comment by Gerrit Updater [ 19/Apr/16 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/19658 |
| Comment by Di Wang [ 19/Apr/16 ] |
|
Olaf: the new debug log does not seem to have caught the failure. It was probably taken too late, or debug = -1 made the dk log too big to capture all of the information. In any case, patch 19658 should help here. Please try it when you have another chance. Thanks. |
| Comment by Joseph Gmitter (Inactive) [ 20/Apr/16 ] |
|
Hi Di, |
| Comment by Gerrit Updater [ 14/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19658/ |
| Comment by Joseph Gmitter (Inactive) [ 15/Jun/16 ] |
|
Patch has landed on master for 2.9.0. |