Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major
- None
- Lustre 2.4.0
- lustre-orion-2.3.54-6chaos
- 3
- 5562
Description
We created a new filesystem with 768 zfs-osd OSTs. The OSTs were started in parallel and out of index order. The initial registration failed for about 35 of the OSTs. The OSS log for one of them, lsfull-OST0062:
2012-11-14 12:30:07 Lustre: Lustre: Build Version: 2.3.54-6chaos-6chaos--PRISTINE-2.6.32-220.23.1.2chaos.ch5.x86_64
2012-11-14 12:30:19 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff881011707000 x1418644692140034/t0(0) o253->MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:19 Lustre: Error -5 communicating with the MGS, is the MGS running?
2012-11-14 12:30:25 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff881011707000 x1418644692140035/t0(0) o101->MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:31 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff881011707000 x1418644692140036/t0(0) o101->MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:31 LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log 'lsfull-OST0062' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -5
2012-11-14 12:30:31 Lustre: lsfull-OST0062: Unable to start target: -5
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can't end config log lsfull
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
2012-11-14 12:30:58 Lustre: server umount lsfull-OST0062 complete
2012-11-14 12:30:58 LustreError: 41374:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount (-5)
2012-11-14 12:32:54 LustreError: 42061:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
2012-11-14 12:32:54 Lustre: lsfull-OST0062: Initializing new disk
2012-11-14 12:34:10 LustreError: 166-1: MGC172.20.5.1@o2ib500: Connection to MGS (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will fail
2012-11-14 12:50:50 Lustre: Evicted from MGS (at MGC172.20.5.1@o2ib500_0) after server handle changed from 0x2ca8c28d7be253b7 to 0xa10ed509d011108e
2012-11-14 12:50:50 Lustre: MGC172.20.5.1@o2ib500: Connection restored to MGS (at 172.20.5.1@o2ib500)
2012-11-14 12:56:27 LustreError: 137-5: UUID 'ls1-OST0062_UUID' is not available for connect (no target)
2012-11-14 12:56:27 LustreError: Skipped 21 previous similar messages
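With 768 OSTs started at once, picking the roughly 35 failed registrations out of each OSS syslog by hand is tedious. A minimal Python sketch, assuming the syslog keeps the `server_start_targets()` message format quoted above (the helper name `failed_targets` is hypothetical, not a Lustre tool), that collects the failing target names and their return codes:

```python
from __future__ import annotations

import re

# Matches the server_start_targets() failure message in the OSS syslog
# format quoted above, e.g.
#   ...(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -5
# Assumption: the message text is unchanged in the Lustre build being logged.
FAILED_START = re.compile(
    r"server_start_targets\(\)\) failed to start server ([^\s:]+): (-\d+)"
)


def failed_targets(log_text: str) -> dict[str, list[int]]:
    """Map each target name to the return codes of its failed start attempts.

    The pattern does not depend on line boundaries, so it works both on a
    normal one-message-per-line syslog and on a flattened run of messages.
    """
    failures: dict[str, list[int]] = {}
    for name, rc in FAILED_START.findall(log_text):
        failures.setdefault(name, []).append(int(rc))
    return failures
```

Fed the concatenated syslogs of all OSS nodes, `len(failed_targets(text))` would count each failing target once, even if it failed on several mount attempts.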
df shows the OST mounted, and the lustre:svname property value no longer contains the ":", but the OST has no exports:
# grove98 /root > df -t lustre
Filesystem 1K-blocks Used Available Use% Mounted on
grove98/lsfull-ost0 67554518656 1152 67554515456 1% /mnt/lustre/local/lsfull-OST0062
# grove98 /root > zfs get lustre:svname grove98/lsfull-ost0
NAME PROPERTY VALUE SOURCE
grove98/lsfull-ost0 lustre:svname lsfull-OST0062 local
# grove98 /root > ls /proc/fs/lustre/obdfilter/lsfull-OST0062/exports/
clear
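The recovery attempts in this report hinge on the `lustre:svname` dataset property: before registration it reads `lsfull:OST0062`, afterwards `lsfull-OST0062`, and the four hex digits are the target index (0x62 is 98, the index the MGS later reports). A small Python sketch for checking that property across many datasets; treating the separator as a registration flag is inferred from the behaviour described here rather than from a documented interface, and `parse_svname`/`looks_registered` are hypothetical helpers:

```python
import re
from typing import NamedTuple


class SvName(NamedTuple):
    fsname: str      # e.g. "lsfull"
    kind: str        # "OST" or "MDT"
    index: int       # decimal index, as the MGS prints it
    separator: str   # ":" before registration, "-" after (as observed here)


# Assumed layout: <fsname><sep><OST|MDT><4 hex digits>
SVNAME = re.compile(r"^([A-Za-z0-9_]+)([:-])(OST|MDT)([0-9a-fA-F]{4})$")


def parse_svname(value: str) -> SvName:
    m = SVNAME.match(value)
    if m is None:
        raise ValueError(f"unrecognized lustre:svname value: {value!r}")
    fsname, sep, kind, hex_index = m.groups()
    return SvName(fsname, kind, int(hex_index, 16), sep)


def looks_registered(value: str) -> bool:
    # Heuristic from this report: the ":" is replaced by "-" once the MGS
    # has accepted the target's registration.
    return parse_svname(value).separator == "-"
```

The value to feed it can be read per dataset with `zfs get -H -o value lustre:svname grove98/lsfull-ost0`, which prints only the property value.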
If I try to restart the OST, the MGS complains it isn't registered:
MGS:
LustreError: 142-7: The target lsfull-OST0062 has not registered yet. It must be started before failnids can be added.
LustreError: 527:0:(mgs_llog.c:2956:mgs_write_log_param()) err -2 on param 'failover.node=172.20.1.97@o2ib500'
LustreError: 527:0:(mgs_handler.c:393:mgs_handle_target_reg()) Failed to write lsfull-OST0062 log (-2)
OSS:
2012-11-14 15:52:56 Lustre: Error -2 communicating with the MGS, is the MGS running?
2012-11-14 15:52:56 LustreError: 45074:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
2012-11-14 15:52:56 LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log 'lsfull-OST0062' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -2
2012-11-14 15:52:56 Lustre: lsfull-OST0062: Unable to start target: -2
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can't end config log lsfull
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
2012-11-14 15:52:57 Lustre: server umount lsfull-OST0062 complete
2012-11-14 15:52:57 LustreError: 45074:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount (-2)
Finally, if I put the ":" back into the lustre:svname property to try to force re-registration, the MGS complains that the index is already in use:
OSS:
# grove98 /root > zfs set lustre:svname=lsfull:OST0062 grove98/lsfull-ost0
# grove98 /root > zfs get lustre:svname grove98/lsfull-ost0
NAME PROPERTY VALUE SOURCE
grove98/lsfull-ost0 lustre:svname lsfull:OST0062 local
# grove98 /root > /etc/init.d/lustre start
Mounting grove98/lsfull-ost0 on /mnt/lustre/local/lsfull-OST0062
mount.lustre: mount grove98/lsfull-ost0 at /mnt/lustre/local/lsfull-OST0062 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# grove98 /root > dmesg | tail
Lustre: Error -98 communicating with the MGS, is the MGS running?
LustreError: 45333:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log 'lsfull-OST0062' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 45333:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -2
Lustre: lsfull-OST0062: Unable to start target: -2
LustreError: 45333:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can't end config log lsfull
LustreError: 45333:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
LustreError: 45333:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
LustreError: 45333:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
LustreError: 45333:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
Lustre: server umount lsfull-OST0062 complete
LustreError: 45333:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount (-2)
MGS:
LustreError: 140-5: Server lsfull-OST0062 requested index 98, but that index is already in use. Use --writeconf to force
LustreError: 534:0:(mgs_llog.c:3005:mgs_write_log_target()) Can't get index (-98)
LustreError: 534:0:(mgs_handler.c:393:mgs_handle_target_reg()) Failed to write lsfull-OST0062 log (-98)
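Each stage of this failure ends in a different return code: -5 on the first parallel start, -2 on the plain restart, and -98 once the ":" is restored. These are negated Linux errno values (mount.lustre itself prints -2 as "No such file or directory"), so they read as -EIO, -ENOENT and -EADDRINUSE, the last matching the MGS "index is already in use" message. A small lookup sketch that hardcodes the Linux values; that is an assumption worth checking if logs come from another platform, since errno numbers such as EADDRINUSE differ outside Linux:

```python
# Linux errno values for the negative return codes seen in these logs.
# Assumption: Linux numbering (EADDRINUSE is 98 on Linux but 48 on BSD/macOS).
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    5: ("EIO", "Input/output error"),
    98: ("EADDRINUSE", "Address already in use"),
}


def describe_rc(rc: int) -> str:
    """Turn a kernel-style negative return code such as -98 into readable text."""
    entry = LINUX_ERRNO.get(-rc)
    if entry is None:
        return f"{rc} (not in this table)"
    name, text = entry
    return f"{rc} = -{name} ({text})"
```

For example, `describe_rc(-98)` gives `-98 = -EADDRINUSE (Address already in use)`.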