[LU-4916] mount failure when adding failover node to mkfs.lustre Created: 16/Apr/14  Updated: 14/Aug/14  Resolved: 13/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: dne

Issue Links:
Related
is related to LU-3951 OST-object inconsistency self detect ... Resolved
is related to LU-4190 LustreError: 18166:0:(genops.c:1570:o... Closed
Severity: 3
Rank (Obsolete): 13580

 Description   

I tried this test on current master.
MDT1

[root@client-2 ~]# mkfs.lustre --reformat --mgs --mdt --index=0 --fsname lustre --failnode=10.10.4.3@tcp /dev/disk/by-id/scsi-1IET_00040001

MDT2

[root@client-3 ~]#  mkfs.lustre --reformat --mgsnode=10.10.4.2@tcp --mgsnode=10.10.4.3@tcp --mdt --index=1 --fsname lustre  --failnode=10.10.4.2@tcp /dev/disk/by-id/scsi-1IET_00020001

Unfortunately, the mount failed when I tried to mount MDT2:

[root@client-3 ~]# mount -t lustre /dev/disk/by-id/scsi-1IET_00020001 /mnt/mds2/
mount.lustre: mount /dev/sdj at /mnt/mds2 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
[root@client-3 ~]# 
...
LDISKFS-fs (sdj): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: srv-lustre-MDT0001: No data found on store. Initialize space
Lustre: lustre-MDT0001: new disk, initializing
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: Communicating with 10.10.4.2@tcp, operation mds_connect failed with -11.
LustreError: 13a-8: Failed to get MGS log params and no local copy.
LustreError: 2354:0:(obd_mount_server.c:699:lustre_lwp_add_conn()) lustre-MDT0001: can't find lwp device.
LustreError: 15c-8: MGC10.10.4.2@tcp: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 2242:0:(obd_mount_server.c:1321:server_start_targets()) lustre-MDT0001: failed to start LWP: -2
LustreError: 2242:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start targets: -2
Lustre: Failing over lustre-MDT0001
Lustre: server umount lustre-MDT0001 complete
LustreError: 2242:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount  (-2)
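
The return codes in the console log above are negated Linux errno values. As a reading aid, here is a minimal Python sketch that pulls the reporting function and the trailing code out of each line. The `LINUX_ERRNO` table, the regular expressions, and the `decode` helper are illustrative assumptions, not part of any Lustre tool, and the table covers only the two codes that appear in this log.

```python
import re

# Linux errno values for the two codes seen in the log (assumed Linux numbering).
LINUX_ERRNO = {2: "ENOENT", 11: "EAGAIN"}

# Lines copied from the console log in this ticket.
LOG = """\
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: Communicating with 10.10.4.2@tcp, operation mds_connect failed with -11.
LustreError: 2354:0:(obd_mount_server.c:699:lustre_lwp_add_conn()) lustre-MDT0001: can't find lwp device.
LustreError: 2242:0:(obd_mount_server.c:1321:server_start_targets()) lustre-MDT0001: failed to start LWP: -2
LustreError: 2242:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start targets: -2
LustreError: 2242:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount  (-2)
"""

# "(file.c:line:function())" source location printed in the message prefix.
LOCATION = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\)")
# A negative return code at the end of the message: "-2", "(-2)" or "-11."
CODE = re.compile(r"-(\d+)\)?\.?\s*$")


def decode(log):
    """Return (function, errno name) per line; None where a part is absent."""
    rows = []
    for line in log.splitlines():
        loc = LOCATION.search(line)
        code = CODE.search(line)
        func = loc.group(3) if loc else None
        name = LINUX_ERRNO.get(int(code.group(1)), "?") if code else None
        rows.append((func, name))
    return rows


for func, name in decode(LOG):
    print(func, name)
```

Read this way, the `-2` (`ENOENT`) reported by `server_start_targets()`, `server_fill_super()`, and `lustre_fill_super()` matches the "No such file or directory" message printed by `mount.lustre`, and the line before them shows `lustre_lwp_add_conn()` failing to find an lwp device.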

config log

[root@client-2 ~]# llog_reader /mnt/mds1/CONFIGS/lustre-client 
Header size : 8192
Time : Fri Apr  4 20:36:36 2014
Number of records: 30
Target uuid : config_uuid 
-----------------------
#01 (224)marker   4 (flags=0x01, v2.5.57.0) lustre-clilov   'lov setup' Fri Apr  4 20:36:36 2014-
#02 (120)attach    0:lustre-clilov  1:lov  2:lustre-clilov_UUID  
#03 (168)lov_setup 0:lustre-clilov  1:(struct lov_desc)
		uuid=lustre-clilov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker   4 (flags=0x02, v2.5.57.0) lustre-clilov   'lov setup' Fri Apr  4 20:36:36 2014-
#05 (224)marker   5 (flags=0x01, v2.5.57.0) lustre-clilmv   'lmv setup' Fri Apr  4 20:36:36 2014-
#06 (120)attach    0:lustre-clilmv  1:lmv  2:lustre-clilmv_UUID  
#07 (168)lov_setup 0:lustre-clilmv  1:(struct lov_desc)
		uuid=lustre-clilmv_UUID  stripe:cnt=0 size=0 offset=0 pattern=0
#08 (224)marker   5 (flags=0x02, v2.5.57.0) lustre-clilmv   'lmv setup' Fri Apr  4 20:36:36 2014-
#09 (224)marker   6 (flags=0x01, v2.5.57.0) lustre-MDT0000  'add mdc' Fri Apr  4 20:36:36 2014-
#10 (080)add_uuid  nid=10.10.4.2@tcp(0x200000a0a0402)  0:  1:10.10.4.2@tcp  
#11 (128)attach    0:lustre-MDT0000-mdc  1:mdc  2:lustre-clilmv_UUID  
#12 (136)setup     0:lustre-MDT0000-mdc  1:lustre-MDT0000_UUID  2:10.10.4.2@tcp  
#13 (080)add_uuid  nid=10.10.4.3@tcp(0x200000a0a0403)  0:  1:10.10.4.3@tcp  
#14 (104)add_conn  0:lustre-MDT0000-mdc  1:10.10.4.3@tcp  
#15 (160)modify_mdc_tgts add 0:lustre-clilmv  1:lustre-MDT0000_UUID  2:0  3:1  4:lustre-MDT0000-mdc_UUID  
#16 (224)marker   6 (flags=0x02, v2.5.57.0) lustre-MDT0000  'add mdc' Fri Apr  4 20:36:36 2014-
#17 (224)marker   7 (flags=0x01, v2.5.57.0) lustre-client   'mount opts' Fri Apr  4 20:36:36 2014-
#18 (120)mount_option 0:  1:lustre-client  2:lustre-clilov  3:lustre-clilmv  
#19 (224)marker   7 (flags=0x02, v2.5.57.0) lustre-client   'mount opts' Fri Apr  4 20:36:36 2014-
#20 (224)marker  11 (flags=0x01, v2.5.57.0) lustre-MDT0001  'add mdc' Fri Apr  4 20:50:05 2014-
#21 (080)add_uuid  nid=10.10.4.3@tcp(0x200000a0a0403)  0:  1:10.10.4.3@tcp  
#22 (128)attach    0:lustre-MDT0001-mdc  1:mdc  2:lustre-clilmv_UUID  
#23 (136)setup     0:lustre-MDT0001-mdc  1:lustre-MDT0001_UUID  2:10.10.4.3@tcp  
#24 (080)add_uuid  nid=10.10.4.2@tcp(0x200000a0a0402)  0:  1:10.10.4.2@tcp  
#25 (104)add_conn  0:lustre-MDT0001-mdc  1:10.10.4.2@tcp  
#26 (160)modify_mdc_tgts add 0:lustre-clilmv  1:lustre-MDT0001_UUID  2:1  3:1  4:lustre-MDT0001-mdc_UUID  
#27 (224)marker  11 (flags=0x02, v2.5.57.0) lustre-MDT0001  'add mdc' Fri Apr  4 20:50:05 2014-
#28 (224)marker  12 (flags=0x01, v2.5.57.0) lustre-client   'mount opts' Fri Apr  4 20:50:05 2014-
#29 (120)mount_option 0:  1:lustre-client  2:lustre-clilov  3:lustre-clilmv  
#30 (224)marker  12 (flags=0x02, v2.5.57.0) lustre-client   'mount opts' Fri Apr  4 20:50:05 2014-
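
To check which NIDs the client log records for each MDT, the `setup` and `add_conn` records above can be grouped per mdc device. The following is a small Python sketch under the assumption that `setup` carries the first NID in field 2 and `add_conn` carries an additional NID in field 1, as the records above suggest. The helper names are hypothetical and the parser handles only these two record types.

```python
import re
from collections import defaultdict

# setup/add_conn records copied from the llog_reader output in this ticket.
RECORDS = """\
#12 (136)setup     0:lustre-MDT0000-mdc  1:lustre-MDT0000_UUID  2:10.10.4.2@tcp
#14 (104)add_conn  0:lustre-MDT0000-mdc  1:10.10.4.3@tcp
#23 (136)setup     0:lustre-MDT0001-mdc  1:lustre-MDT0001_UUID  2:10.10.4.3@tcp
#25 (104)add_conn  0:lustre-MDT0001-mdc  1:10.10.4.2@tcp
"""

# "#NN (size)opcode  fields..."
RECORD = re.compile(r"^#(\d+) \((\d+)\)(\w+)\s+(.*)$")


def parse_fields(text):
    """Turn '0:dev  1:uuid  2:nid' into {0: 'dev', 1: 'uuid', 2: 'nid'}."""
    fields = {}
    for token in text.split():
        index, _, value = token.partition(":")
        fields[int(index)] = value
    return fields


def nids_by_device(log):
    """Map each mdc device to its NIDs, in the order the log adds them."""
    nids = defaultdict(list)
    for line in log.splitlines():
        match = RECORD.match(line.rstrip())
        if not match:
            continue
        op, rest = match.group(3), match.group(4)
        if op == "setup":
            fields = parse_fields(rest)
            nids[fields[0]].append(fields[2])
        elif op == "add_conn":
            fields = parse_fields(rest)
            nids[fields[0]].append(fields[1])
    return dict(nids)


for device, nid_list in nids_by_device(RECORDS).items():
    print(device, nid_list)
```

For the records shown, each mdc device ends up with both 10.10.4.2@tcp and 10.10.4.3@tcp, in opposite order for the two MDTs, which is consistent with the `--failnode` options passed to `mkfs.lustre`.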

It might be related to the change http://review.whamcloud.com/7666. Fan Yong, could you please comment here? Thanks!



 Comments   
Comment by Jodi Levi (Inactive) [ 17/Apr/14 ]

Mike,
Can you please take this one?
Thank you!

Comment by Mikhail Pershin [ 23/Apr/14 ]

OK, I am looking at this

Comment by Andreas Dilger [ 02/May/14 ]

Mike, have you had a chance to try the above steps?

Comment by Mikhail Pershin [ 07/May/14 ]

Andreas, I can reproduce that with the same error message about LWP. I am now checking commit http://review.whamcloud.com/7666, as Di proposed.

Comment by Mikhail Pershin [ 08/May/14 ]

This issue was introduced in http://review.whamcloud.com/7666, as Di suspected. The lwp device is not set up for every MDT-MDT pair, but an lwp connection is still added even when the lwp device is missing.

Patch is here: http://review.whamcloud.com/10272
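
The failure mode described in this comment can be illustrated with a toy Python model. This is only a sketch of the reasoning: it is not the content of patch 10272 and not Lustre code. `add_lwp_conn`, `start_unguarded`, and `start_guarded` are hypothetical names, and the device table is an assumed stand-in for the set of lwp devices that were actually set up.

```python
import errno


def add_lwp_conn(lwp_devices, target, nid):
    """Toy stand-in for adding an lwp connection.

    Returns -ENOENT when no lwp device exists for the target, mirroring the
    "can't find lwp device" error and the -2 seen in the console log.
    """
    if target not in lwp_devices:
        return -errno.ENOENT
    lwp_devices[target].append(nid)
    return 0


def start_unguarded(lwp_devices, conns):
    """Add every connection; a missing device aborts the whole start-up."""
    for target, nid in conns:
        rc = add_lwp_conn(lwp_devices, target, nid)
        if rc:
            return rc
    return 0


def start_guarded(lwp_devices, conns):
    """Skip connections whose lwp device was never set up (assumed remedy)."""
    for target, nid in conns:
        if target not in lwp_devices:
            continue
        rc = add_lwp_conn(lwp_devices, target, nid)
        if rc:
            return rc
    return 0


# No lwp device was set up, but the config still asks for a failover connection.
conns = [("lustre-MDT0000", "10.10.4.3@tcp")]
print(start_unguarded({}, conns))
print(start_guarded({}, conns))
```

In this model the unguarded path returns the same `-ENOENT` that ends up failing the mount, while the guarded path ignores the connection for a device that does not exist. Whether the landed patch skips the connection or creates the missing device is not stated in this ticket.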

Comment by Jodi Levi (Inactive) [ 13/May/14 ]

Patch landed to Master.
