|
Hi,
Starting from a completely cold system running Lustre 2.1, I reran the test by performing the following steps in order:
1. format MGS
At this point, 'tunefs.lustre --print' on the MGT gives:
[root@perou2 ~]# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0025
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS: mgs
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.3.1.2@o2ib
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS: mgs
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.3.1.2@o2ib
exiting before disk write.
2. start MGS
3. format MDT
At this point, 'tunefs.lustre --print' on the MDT gives:
[root@perou3 ~]# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac002d
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: fs_mdt-MDT0000
Index: 0
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.5.1.3@o2ib lov.stripesize=1048576 failover.node=10.5.1.3@o2ib network=o2ib0
Permanent disk data:
Target: fs_mdt-MDT0000
Index: 0
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.5.1.3@o2ib lov.stripesize=1048576 failover.node=10.5.1.3@o2ib network=o2ib0
exiting before disk write.
4. format OSTs
At this point, 'tunefs.lustre --print' on the OSTs gives:
[root@perou6 ~]# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0037
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: fs_mdt-OST0000
Index: 0
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0
Permanent disk data:
Target: fs_mdt-OST0000
Index: 0
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0
exiting before disk write.
[root@perou6 ~]# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0035
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: fs_mdt-OST0001
Index: 1
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0
Permanent disk data:
Target: fs_mdt-OST0001
Index: 1
Lustre FS: fs_mdt
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0
exiting before disk write.
5. start OSTs
At that point, here is the content of the MGS:
[root@perou2 ~]# ls toto/CONFIGS/
fs_mdt-client fs_mdt-OST0000 fs_mdt-OST0001 fs_mdt-sptlrpc _mgs-sptlrpc mountdata
[root@perou2 ~]# llog_reader toto/CONFIGS/fs_mdt-OST0000
Header size : 8192
Time : Fri Apr 13 15:07:03 2012
Number of records: 4
Target uuid : config_uuid
-----------------------
#01 (224)marker 5 (flags=0x01, v2.1.0.0) fs_mdt-OST0000 'add ost' Fri Apr 13 15:07:03 2012-
#02 (128)attach 0:fs_mdt-OST0000 1:obdfilter 2:fs_mdt-OST0000_UUID
#03 (112)setup 0:fs_mdt-OST0000 1:dev 2:type 3:f
#04 (224)marker 5 (flags=0x02, v2.1.0.0) fs_mdt-OST0000 'add ost' Fri Apr 13 15:07:03 2012-
[root@perou2 ~]#
[root@perou2 ~]# llog_reader toto/CONFIGS/fs_mdt-OST0001
Header size : 8192
Time : Fri Apr 13 15:07:03 2012
Number of records: 4
Target uuid : config_uuid
-----------------------
#01 (224)marker 1 (flags=0x01, v2.1.0.0) fs_mdt-OST0001 'add ost' Fri Apr 13 15:07:03 2012-
#02 (128)attach 0:fs_mdt-OST0001 1:obdfilter 2:fs_mdt-OST0001_UUID
#03 (112)setup 0:fs_mdt-OST0001 1:dev 2:type 3:f
#04 (224)marker 1 (flags=0x02, v2.1.0.0) fs_mdt-OST0001 'add ost' Fri Apr 13 15:07:03 2012-
[root@perou2 ~]#
6. start MDT
At that point, the content of the MGS is:
[root@perou2 ~]# ls toto/CONFIGS/
fs_mdt-client fs_mdt-MDT0000 fs_mdt-OST0000 fs_mdt-OST0001 fs_mdt-sptlrpc _mgs-sptlrpc mountdata
[root@perou2 ~]#
[root@perou2 ~]# llog_reader toto/CONFIGS/fs_mdt-OST0000
Header size : 8192
Time : Fri Apr 13 15:07:03 2012
Number of records: 4
Target uuid : config_uuid
-----------------------
#01 (224)marker 5 (flags=0x01, v2.1.0.0) fs_mdt-OST0000 'add ost' Fri Apr 13 15:07:03 2012-
#02 (128)attach 0:fs_mdt-OST0000 1:obdfilter 2:fs_mdt-OST0000_UUID
#03 (112)setup 0:fs_mdt-OST0000 1:dev 2:type 3:f
#04 (224)marker 5 (flags=0x02, v2.1.0.0) fs_mdt-OST0000 'add ost' Fri Apr 13 15:07:03 2012-
[root@perou2 ~]#
[root@perou2 ~]# llog_reader toto/CONFIGS/fs_mdt-OST0001
Header size : 8192
Time : Fri Apr 13 15:07:03 2012
Number of records: 4
Target uuid : config_uuid
-----------------------
#01 (224)marker 1 (flags=0x01, v2.1.0.0) fs_mdt-OST0001 'add ost' Fri Apr 13 15:07:03 2012-
#02 (128)attach 0:fs_mdt-OST0001 1:obdfilter 2:fs_mdt-OST0001_UUID
#03 (112)setup 0:fs_mdt-OST0001 1:dev 2:type 3:f
#04 (224)marker 1 (flags=0x02, v2.1.0.0) fs_mdt-OST0001 'add ost' Fri Apr 13 15:07:03 2012-
And in the syslog of the MDS node we have:
1334322067 2012 Apr 13 15:01:07 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
1334322068 2012 Apr 13 15:01:08 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled
1334322068 2012 Apr 13 15:01:08 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7750:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC10.5.1.3@o2ib->MGC10.5.1.3@o2ib_0 netid 50000: select flavor null
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel Lustre: Enabling ACL
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel Lustre: Enabling user_xattr
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: new disk, initializing
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7783:0:(mds_lov.c:1004:mds_notify()) MDS mdd_obd-fs_mdt-MDT0000: add target fs_mdt-OST0001_UUID
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7783:0:(mds_lov.c:1004:mds_notify()) Skipped 1 previous similar message
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800069 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322545] [real_sent 1334322545] [current 1334322545] [deadline 5s] [delay -5s] req@ffff88030a0dc000 x1398844479800069/t0(0) o-1->fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322550 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800070 sent from fs_mdt-OST0000-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322545] [real_sent 1334322545] [current 1334322545] [deadline 5s] [delay -5s] req@ffff88030a0f0000 x1398844479800070/t0(0) o-1->fs_mdt-OST0000_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322550 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322570 2012 Apr 13 15:09:30 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 5s
1334322570 2012 Apr 13 15:09:30 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800072 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322570] [real_sent 1334322570] [current 1334322570] [deadline 10s] [delay -10s] req@ffff8803313d3400 x1398844479800072/t0(0) o-1->fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322580 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322577 2012 Apr 13 15:09:37 perou3 kern warning kernel Lustre: 619:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800071 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: [sent 1334322570] [real_sent 1334322570] [current 1334322577] [deadline 7s] [delay 0s] req@ffff88031fdf4000 x1398844479800071/t0(0) o-1->MGS@MGC10.5.1.3@o2ib_0:26/25 lens 192/192 e 0 to 1 dl 1334322577 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322577 2012 Apr 13 15:09:37 perou3 kern warning kernel Lustre: 619:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message
1334322577 2012 Apr 13 15:09:37 perou3 kern err kernel LustreError: 166-1: MGC10.5.1.3@o2ib: Connection to service MGS via nid 10.5.1.3@o2ib was lost; in progress operations using this service will fail.
1334322583 2012 Apr 13 15:09:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800074 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: [sent 1334322577] [real_sent 1334322577] [current 1334322583] [deadline 6s] [delay 0s] req@ffff88031fdf4000 x1398844479800074/t0(0) o-1->MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322583 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 6s
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800076 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322602] [real_sent 1334322602] [current 1334322602] [deadline 15s] [delay -15s] req@ffff88017cafa800 x1398844479800076/t0(0) o-1->fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322617 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322613 2012 Apr 13 15:10:13 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800075 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: [sent 1334322602] [real_sent 1334322602] [current 1334322613] [deadline 11s] [delay 0s] req@ffff88030a074800 x1398844479800075/t0(0) o-1->MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322613 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322613 2012 Apr 13 15:10:13 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message
1334322627 2012 Apr 13 15:10:27 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 11s
1334322627 2012 Apr 13 15:10:27 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages
1334322643 2012 Apr 13 15:10:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800078 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: [sent 1334322627] [real_sent 1334322627] [current 1334322643] [deadline 16s] [delay 0s] req@ffff88032908c000 x1398844479800078/t0(0) o-1->MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322643 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322643 2012 Apr 13 15:10:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message
1334322652 2012 Apr 13 15:10:52 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 16s
1334322652 2012 Apr 13 15:10:52 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 21s
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800084 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322677] [real_sent 1334322677] [current 1334322677] [deadline 30s] [delay -30s] req@ffff8801c35a6c00 x1398844479800084/t0(0) o-1->fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322707 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(import.c:852:ptlrpc_connect_interpret()) MGS@MGC10.5.1.3@o2ib_0 changed server handle from 0x555b88f8bfb49318 to 0x555b88f8bfb49373
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import
1334322677 2012 Apr 13 15:11:17 perou3 kern info kernel Lustre: MGC10.5.1.3@o2ib: Connection restored to service MGS using nid 10.5.1.3@o2ib.
1334322702 2012 Apr 13 15:11:42 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 30s
1334322702 2012 Apr 13 15:11:42 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 35s
1334322727 2012 Apr 13 15:12:07 perou3 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.3@o2ib. The obd_ping operation failed with -107
1334322727 2012 Apr 13 15:12:07 perou3 kern err kernel LustreError: 166-1: MGC10.5.1.3@o2ib: Connection to service MGS via nid 10.5.1.3@o2ib was lost; in progress operations using this service will fail.
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: 620:0:(import.c:852:ptlrpc_connect_interpret()) MGS@MGC10.5.1.3@o2ib_0 changed server handle from 0x555b88f8bfb49373 to 0x555b88f8bfb493dc
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import
1334322727 2012 Apr 13 15:12:07 perou3 kern info kernel Lustre: MGC10.5.1.3@o2ib: Connection restored to service MGS using nid 10.5.1.3@o2ib.
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 40s
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800103 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: [sent 1334322752] [real_sent 1334322752] [current 1334322752] [deadline 45s] [delay -45s] req@ffff88030a0dc800 x1398844479800103/t0(0) o-1->fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322797 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
1334322802 2012 Apr 13 15:13:22 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 50s
1334322802 2012 Apr 13 15:13:22 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages
1334322877 2012 Apr 13 15:14:37 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 50s
1334322877 2012 Apr 13 15:14:37 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 5 previous similar messages
On the MDS, we can see:
[root@perou3 ~]# lctl dl
0 UP mgc MGC10.5.1.3@o2ib f2d4e47f-96b1-e539-f0e1-e125d27e617f 5
1 UP lov fs_mdt-MDT0000-mdtlov fs_mdt-MDT0000-mdtlov_UUID 4
2 UP mdt fs_mdt-MDT0000 fs_mdt-MDT0000_UUID 3
3 UP mds mdd_obd-fs_mdt-MDT0000 mdd_obd_uuid-fs_mdt-MDT0000 3
4 UP osc fs_mdt-OST0001-osc-MDT0000 fs_mdt-MDT0000-mdtlov_UUID 5
5 UP osc fs_mdt-OST0000-osc-MDT0000 fs_mdt-MDT0000-mdtlov_UUID 5
7. Mount client
In the client syslog, we have:
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: OBD class driver, http://wiki.whamcloud.com/
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Lustre Version: 2.1.0
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Build Version: B-2_1_0_0-lustrebull-20120404161806-CHANGED-2.6.32-71.24.1.bl6.Bull.23.x86_64
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Lustre LU module (ffffffffa053c2c0).
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Added LNI 10.5.1.6@o2ib [8/64/0/180]
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Lustre OSC module (ffffffffa09780c0).
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Lustre LOV module (ffffffffa09e3e40).
1336046968 2012 May 3 14:09:28 perou7 kern info kernel Lustre: Lustre client module (ffffffffa0d392e0).
1336046968 2012 May 3 14:09:28 perou7 user info logger lustre-tune: 0 devices have been tuned.
1336046968 2012 May 3 14:09:28 perou7 kern warning kernel Lustre: 21809:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC10.5.1.3@o2ib->MGC10.5.1.3@o2ib_0 netid 50000: select flavor null
1336046968 2012 May 3 14:09:28 perou7 kern err kernel LustreError: 152-6: Ignoring deprecated mount option 'acl'.
1336046968 2012 May 3 14:09:28 perou7 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import
1336046968 2012 May 3 14:09:28 perou7 kern warning kernel Lustre: 21809:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import fs_mdt-MDT0000-mdc-ffff8802d12c9000->10.5.1.4@o2ib netid 50000: select flavor null
1336046968 2012 May 3 14:09:28 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11
1336046993 2012 May 3 14:09:53 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11
1336047018 2012 May 3 14:10:18 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11
At the same time, in the MDS log we have:
1336046968 2012 May 3 14:09:28 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: temporarily refusing client connection from 10.5.1.6@o2ib
1336046968 2012 May 3 14:09:28 perou3 kern err kernel LustreError: 26684:0:(ldlm_lib.c:2137:target_send_reply_msg()) @@@ processing error (11) req@ffff88032beafc00 x1400946785517577/t0(0) o-1><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1336047068 ref 1 fl Interpret:/ffffffff/ffffffff rc -11/-1
1336046993 2012 May 3 14:09:53 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: temporarily refusing client connection from 10.5.1.6@o2ib
1336046993 2012 May 3 14:09:53 perou3 kern err kernel LustreError: 26684:0:(ldlm_lib.c:2137:target_send_reply_msg()) @@@ processing error (11) req@ffff88032bee3000 x1400946785517580/t0(0) o-1><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1336047093 ref 1 fl Interpret:/ffffffff/ffffffff rc -11/-1
This issue looks like LU-350, but the fix from that ticket has already landed in Lustre 2.1.
The weird thing here is that when the MDT is started, it tries to reach the failover node of the OSTs (NID 10.5.1.6@o2ib) and apparently not their primary node.
Of course, when the MDT is started before the OSTs, it connects directly to the OSTs with the right NID, i.e. the primary one.
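For anyone who wants to repeat this check on their own logs, here is a minimal Python sketch of the comparison described above: it takes the 'Parameters:' line printed by 'tunefs.lustre --print' and the "has failed due to network error" syslog messages, and reports whether the NID the MDT tried matches the configured failover.node. The helper names (parse_parameters, failed_nids, targets_failover) are hypothetical and not part of any Lustre tool, and the sample strings are shortened copies of the output quoted in this report.

```python
import re

# Hypothetical helpers for illustration only; not part of Lustre.
PARAM_RE = re.compile(r"(\S+?)=(\S+)")
LOG_RE = re.compile(r"sent from (\S+) to NID (\S+) has failed")


def parse_parameters(line):
    """Turn a 'Parameters: k=v k=v' line into a dict of value lists."""
    _, _, rest = line.partition("Parameters:")
    params = {}
    for key, value in PARAM_RE.findall(rest):
        params.setdefault(key, []).append(value)
    return params


def failed_nids(log_lines):
    """Map each import name to the set of NIDs it failed to reach."""
    result = {}
    for line in log_lines:
        match = LOG_RE.search(line)
        if match:
            result.setdefault(match.group(1), set()).add(match.group(2))
    return result


def targets_failover(params, nid):
    """True if the NID is listed as failover.node in the parameters."""
    return nid in params.get("failover.node", [])


# Shortened samples copied from the OST parameters and the MDS syslog above.
ost_params = parse_parameters(
    "Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0"
)
mds_log = [
    "Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request "
    "x1398844479800070 sent from fs_mdt-OST0000-osc-MDT0000 to NID "
    "10.5.1.6@o2ib has failed due to network error",
]

for name, nids in failed_nids(mds_log).items():
    for nid in sorted(nids):
        kind = "failover" if targets_failover(ost_params, nid) else "other"
        print(name, nid, kind)
```

The sketch only matches strings; it does not say why the MDT picked that NID, which is the open question in this report.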
Regards,
Sebastien.
|