[LU-3038] OST nid is being messed up in the config log 2.4 Created: 26/Mar/13  Updated: 25/May/13  Resolved: 28/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.5
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: MB, mq213

Issue Links:
Duplicate
is duplicated by LU-3004 MGS llog choke on to much parameter data Resolved
Severity: 3
Rank (Obsolete): 7410

 Description   

We found this problem during Hyperion DNE testing, and I can reproduce it locally with separate MGS/MDS and 2 OSSes (one OST per OSS).

1. Mount the MGS, then the MDT, then the OSTs: the config log looks fine.

[root@mds tests]# ../utils/llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000 
Header size : 8192
Time : Mon Mar 25 23:32:55 2013
Number of records: 21
Target uuid : config_uuid 
-----------------------
#01 (224)marker   2 (flags=0x01, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 23:32:55 2013-
#02 (136)attach    0:lustre-MDT0000-mdtlov  1:lov  2:lustre-MDT0000-mdtlov_UUID  
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov  1:(struct lov_desc)
		uuid=lustre-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker   2 (flags=0x02, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 23:32:55 2013-
#05 (224)marker   3 (flags=0x01, v2.3.63.0) lustre-MDT0000  'add mdt' Mon Mar 25 23:32:55 2013-
#06 (120)attach    0:lustre-MDT0000  1:mdt  2:lustre-MDT0000_UUID  
#07 (112)mount_option 0:  1:lustre-MDT0000  2:lustre-MDT0000-mdtlov  
#08 (160)setup     0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
#09 (224)marker   3 (flags=0x02, v2.3.63.0) lustre-MDT0000  'add mdt' Mon Mar 25 23:32:55 2013-
#10 (224)marker   9 (flags=0x01, v2.3.63.0) lustre-OST0000  'add osc' Mon Mar 25 23:33:44 2013-
#11 (088)add_uuid  nid=172.16.151.130@tcp(0x20000ac109782)  0:  1:172.16.151.130@tcp  
#12 (144)attach    0:lustre-OST0000-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#13 (152)setup     0:lustre-OST0000-osc-MDT0000  1:lustre-OST0000_UUID  2:172.16.151.130@tcp  
#14 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0000_UUID  2:0  3:1  
#15 (224)marker   9 (flags=0x02, v2.3.63.0) lustre-OST0000  'add osc' Mon Mar 25 23:33:44 2013-
#16 (224)marker  12 (flags=0x01, v2.3.63.0) lustre-OST0002  'add osc' Mon Mar 25 23:34:00 2013-
#17 (088)add_uuid  nid=172.16.151.131@tcp(0x20000ac109783)  0:  1:172.16.151.131@tcp  
#18 (144)attach    0:lustre-OST0002-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#19 (152)setup     0:lustre-OST0002-osc-MDT0000  1:lustre-OST0002_UUID  2:172.16.151.131@tcp  
#20 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0002_UUID  2:2  3:1  
#21 (224)marker  12 (flags=0x02, v2.3.63.0) lustre-OST0002  'add osc' Mon Mar 25 23:34:00 2013-
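For readers comparing the add_uuid records, the hex value in parentheses appears to be the packed form of the NID printed next to it: the low 32 bits hold the IPv4 address and the upper bits hold the network type and number. The following is a minimal sketch of that decoding, not the Lustre implementation. The function name decode_nid is hypothetical, and the only type mapping used (2 means tcp) is inferred from the records in this ticket.

```python
import ipaddress

# Assumption: network type 2 corresponds to "tcp", as inferred from the
# add_uuid records in this ticket. Other types are printed as "lnd<N>".
LND_NAMES = {2: "tcp"}


def decode_nid(nid: int) -> str:
    """Decode a packed 64-bit NID into the 'a.b.c.d@net' text form."""
    addr = nid & 0xFFFFFFFF      # low 32 bits: IPv4 address
    net = nid >> 32              # high 32 bits: network identifier
    lnd_type = net >> 16         # upper half of the network: type
    net_num = net & 0xFFFF       # lower half of the network: number
    name = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    suffix = str(net_num) if net_num else ""
    return f"{ipaddress.IPv4Address(addr)}@{name}{suffix}"


if __name__ == "__main__":
    # Values taken from records #11 and #17 above.
    print(decode_nid(0x20000AC109782))
    print(decode_nid(0x20000AC109783))
```

Decoding both values confirms that in the good case each OST is registered with a distinct NID.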

But if we mount the MGS, then the OSTs, then the MDT, the OST NIDs get messed up, i.e. both OSTs are set up with the same NID.

[root@mds tests]# ../utils/llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000 
Header size : 8192
Time : Mon Mar 25 22:50:27 2013
Number of records: 27
Target uuid : config_uuid 
-----------------------
#01 (224)marker   8 (flags=0x01, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 22:50:27 2013-
#02 (136)attach    0:lustre-MDT0000-mdtlov  1:lov  2:lustre-MDT0000-mdtlov_UUID  
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov  1:(struct lov_desc)
		uuid=lustre-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker   8 (flags=0x02, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 22:50:27 2013-
#05 (224)marker   9 (flags=0x01, v2.3.63.0) lustre-MDT0000  'add mdt' Mon Mar 25 22:50:27 2013-
#06 (120)attach    0:lustre-MDT0000  1:mdt  2:lustre-MDT0000_UUID  
#07 (112)mount_option 0:  1:lustre-MDT0000  2:lustre-MDT0000-mdtlov  
#08 (160)setup     0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
#09 (224)marker   9 (flags=0x02, v2.3.63.0) lustre-MDT0000  'add mdt' Mon Mar 25 22:50:27 2013-
#10 (224)marker  10 (flags=0x01, v2.3.63.0) lustre-MDT0000  'add osc(copied)' Mon Mar 25 22:50:27 2013-
#11 (224)marker  11 (flags=0x01, v2.3.63.0) lustre-OST0001  'add osc' Mon Mar 25 22:50:27 2013-
#12 (088)add_uuid  nid=172.16.151.130@tcp(0x20000ac109782)  0:  1:172.16.151.130@tcp  
#13 (144)attach    0:lustre-OST0001-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#14 (152)setup     0:lustre-OST0001-osc-MDT0000  1:lustre-OST0001_UUID  2:172.16.151.130@tcp  
#15 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0001_UUID  2:1  3:1  
#16 (224)marker  11 (flags=0x02, v2.3.63.0) lustre-OST0001  'add osc' Mon Mar 25 22:50:27 2013-
#17 (224)marker  11 (flags=0x02, v2.3.63.0) lustre-MDT0000  'add osc(copied)' Mon Mar 25 22:50:27 2013-
#18 (224)marker  12 (flags=0x01, v2.3.63.0) lustre-MDT0000  'add osc(copied)' Mon Mar 25 22:50:27 2013-
#19 (224)marker  13 (flags=0x01, v2.3.63.0) lustre-OST0002  'add osc' Mon Mar 25 22:50:27 2013-
#20 (088)add_uuid  nid=172.16.151.130@tcp(0x20000ac109782)  0:  1:172.16.151.130@tcp  
#21 (144)attach    0:lustre-OST0002-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#22 (152)setup     0:lustre-OST0002-osc-MDT0000  1:lustre-OST0002_UUID  2:172.16.151.130@tcp  
#23 (088)add_uuid  nid=172.16.151.131@tcp(0x20000ac109783)  0:  1:172.16.151.131@tcp  
#24 (120)add_conn  0:lustre-OST0002-osc-MDT0000  1:172.16.151.131@tcp  
#25 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0002_UUID  2:2  3:1  
#26 (224)marker  13 (flags=0x02, v2.3.63.0) lustre-OST0002  'add osc' Mon Mar 25 22:50:27 2013-
#27 (224)marker  13 (flags=0x02, v2.3.63.0) lustre-MDT0000  'add osc(copied)' Mon Mar 25 22:50:27 2013-
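The symptom in the log above (records #14 and #22 both set up with 172.16.151.130@tcp) can be checked mechanically. The following is a minimal sketch, assuming the llog_reader text format shown in this ticket. The function name shared_osc_nids and the regular expression are illustrative, not part of any Lustre tool. Note that several OSTs on one OSS legitimately share a NID, so a hit only indicates this bug in a one-OST-per-OSS setup like the one used here.

```python
import re
from collections import defaultdict

# Matches OSC "setup" records such as:
#   #14 (152)setup  0:lustre-OST0001-osc-MDT0000  1:lustre-OST0001_UUID  2:<nid>
# The leading ")" keeps "lov_setup" records from matching.
SETUP_RE = re.compile(r"\)setup\s+0:(\S+-osc-\S+)\s+1:(\S+)\s+2:(\S+)")


def shared_osc_nids(llog_text: str) -> dict:
    """Return {nid: [ost_uuid, ...]} for NIDs used by more than one OSC setup."""
    by_nid = defaultdict(set)
    for line in llog_text.splitlines():
        match = SETUP_RE.search(line)
        if match:
            _device, ost_uuid, nid = match.groups()
            by_nid[nid].add(ost_uuid)
    return {nid: sorted(uuids) for nid, uuids in by_nid.items() if len(uuids) > 1}


if __name__ == "__main__":
    # Two setup records copied from the bad log above.
    sample = (
        "#14 (152)setup     0:lustre-OST0001-osc-MDT0000  1:lustre-OST0001_UUID  2:172.16.151.130@tcp  \n"
        "#22 (152)setup     0:lustre-OST0002-osc-MDT0000  1:lustre-OST0002_UUID  2:172.16.151.130@tcp  \n"
    )
    print(shared_osc_nids(sample))
```

Run against the first (good) log, the same function would return an empty dict, since each OSC is set up with its own NID.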



 Comments   
Comment by Di Wang [ 27/Mar/13 ]

http://review.whamcloud.com/5851

Comment by Peter Jones [ 28/Mar/13 ]

Landed for 2.4

Comment by Jian Yu [ 29/Mar/13 ]

The same issue exists on Lustre b2_1 branch.

Separate MGS/MDS and 2 OSSes (one OST per OSS):
1) mount MGS, then MDT, then OSTs, the config log looks fine:

[root@client-12vm3 ~]# llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000
Header size : 8192
Time : Thu Mar 28 20:24:16 2013
Number of records: 39  
Target uuid : config_uuid
-----------------------
#01 (224)marker   1 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:24:16 2013-
#02 (136)attach    0:lustre-MDT0000-mdtlov  1:lov  2:lustre-MDT0000-mdtlov_UUID
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov  1:(struct lov_desc)
                uuid=lustre-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker   1 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:24:16 2013-
#05 (224)marker   2 (flags=0x01, v2.1.5.0) lustre-MDT0000  'add mdt' Thu Mar 28 20:24:16 2013-
#06 (120)attach    0:lustre-MDT0000  1:mdt  2:lustre-MDT0000_UUID
#07 (112)mount_option 0:  1:lustre-MDT0000  2:lustre-MDT0000-mdtlov
#08 (160)setup     0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f
#09 (224)marker   2 (flags=0x02, v2.1.5.0) lustre-MDT0000  'add mdt' Thu Mar 28 20:24:16 2013-
#10 (224)SKIP START marker   8 (flags=0x05, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:16 2013-Thu Mar 28 20:24:17 2013
#11 (080)SKIP set_timeout=20
#12 (224)SKIP END   marker   8 (flags=0x06, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:16 2013-Thu Mar 28 20:24:17 2013
#13 (224)marker  10 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:24:16 2013-
#14 (112)param 0:lustre-MDT0000-mdtlov  1:lov.stripesize=1048576
#15 (224)marker  10 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:24:16 2013-
#16 (224)marker  12 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:24:16 2013-
#17 (112)param 0:lustre-MDT0000-mdtlov  1:lov.stripecount=0
#18 (224)marker  12 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:24:16 2013-
#19 (224)marker  14 (flags=0x01, v2.1.5.0) lustre-MDT0000  'mdt.identity_upcall' Thu Mar 28 20:24:16 2013-
#20 (128)param 0:lustre-MDT0000  1:mdt.identity_upcall=/usr/sbin/l_getidentity
#21 (224)marker  14 (flags=0x02, v2.1.5.0) lustre-MDT0000  'mdt.identity_upcall' Thu Mar 28 20:24:16 2013-
#22 (224)marker  16 (flags=0x01, v2.1.5.0) lustre-OST0000  'add osc' Thu Mar 28 20:24:17 2013-
#23 (080)add_uuid  nid=10.10.4.208@tcp(0x200000a0a04d0)  0:  1:10.10.4.208@tcp
#24 (144)attach    0:lustre-OST0000-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID
#25 (144)setup     0:lustre-OST0000-osc-MDT0000  1:lustre-OST0000_UUID  2:10.10.4.208@tcp
#26 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0000_UUID  2:0  3:1
#27 (224)marker  16 (flags=0x02, v2.1.5.0) lustre-OST0000  'add osc' Thu Mar 28 20:24:17 2013-
#28 (224)SKIP START marker  20 (flags=0x05, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:17 2013-Thu Mar 28 20:24:19 2013
#29 (080)SKIP set_timeout=20
#30 (224)SKIP END   marker  20 (flags=0x06, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:17 2013-Thu Mar 28 20:24:19 2013
#31 (224)marker  23 (flags=0x01, v2.1.5.0) lustre-OST0001  'add osc' Thu Mar 28 20:24:19 2013-
#32 (080)add_uuid  nid=10.10.4.209@tcp(0x200000a0a04d1)  0:  1:10.10.4.209@tcp
#33 (144)attach    0:lustre-OST0001-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID
#34 (144)setup     0:lustre-OST0001-osc-MDT0000  1:lustre-OST0001_UUID  2:10.10.4.209@tcp
#35 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0001_UUID  2:1  3:1
#36 (224)marker  23 (flags=0x02, v2.1.5.0) lustre-OST0001  'add osc' Thu Mar 28 20:24:19 2013-
#37 (224)marker  28 (flags=0x01, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:19 2013-
#38 (080)set_timeout=20
#39 (224)marker  28 (flags=0x02, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:24:19 2013-

2) mount MGS, then OSTs, then MDT, the OST NIDs are messed up, i.e. both OSTs have the same NID:

# llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000 
Header size : 8192
Time : Thu Mar 28 20:34:47 2013
Number of records: 38
Target uuid : config_uuid 
-----------------------
#01 (224)marker  14 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:34:47 2013-
#02 (136)attach    0:lustre-MDT0000-mdtlov  1:lov  2:lustre-MDT0000-mdtlov_UUID  
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov  1:(struct lov_desc)
                uuid=lustre-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker  14 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:34:47 2013-
#05 (224)marker  15 (flags=0x01, v2.1.5.0) lustre-MDT0000  'add mdt' Thu Mar 28 20:34:47 2013-
#06 (120)attach    0:lustre-MDT0000  1:mdt  2:lustre-MDT0000_UUID  
#07 (112)mount_option 0:  1:lustre-MDT0000  2:lustre-MDT0000-mdtlov  
#08 (160)setup     0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
#09 (224)marker  15 (flags=0x02, v2.1.5.0) lustre-MDT0000  'add mdt' Thu Mar 28 20:34:47 2013-
#10 (224)marker  16 (flags=0x01, v2.1.5.0) lustre-MDT0000  'add osc(copied)' Thu Mar 28 20:34:47 2013-
#11 (224)marker  17 (flags=0x01, v2.1.5.0) lustre-OST0000  'add osc' Thu Mar 28 20:34:47 2013-
#12 (080)add_uuid  nid=10.10.4.208@tcp(0x200000a0a04d0)  0:  1:10.10.4.208@tcp  
#13 (144)attach    0:lustre-OST0000-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#14 (144)setup     0:lustre-OST0000-osc-MDT0000  1:lustre-OST0000_UUID  2:10.10.4.208@tcp  
#15 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0000_UUID  2:0  3:1  
#16 (224)marker  17 (flags=0x02, v2.1.5.0) lustre-OST0000  'add osc' Thu Mar 28 20:34:47 2013-
#17 (224)marker  17 (flags=0x02, v2.1.5.0) lustre-MDT0000  'add osc(copied)' Thu Mar 28 20:34:47 2013-
#18 (224)marker  18 (flags=0x01, v2.1.5.0) lustre-MDT0000  'add osc(copied)' Thu Mar 28 20:34:47 2013-
#19 (224)marker  19 (flags=0x01, v2.1.5.0) lustre-OST0001  'add osc' Thu Mar 28 20:34:47 2013-
#20 (080)add_uuid  nid=10.10.4.208@tcp(0x200000a0a04d0)  0:  1:10.10.4.208@tcp  
#21 (080)add_uuid  nid=10.10.4.209@tcp(0x200000a0a04d1)  0:  1:10.10.4.208@tcp  
#22 (144)attach    0:lustre-OST0001-osc-MDT0000  1:osc  2:lustre-MDT0000-mdtlov_UUID  
#23 (144)setup     0:lustre-OST0001-osc-MDT0000  1:lustre-OST0001_UUID  2:10.10.4.208@tcp  
#24 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov  1:lustre-OST0001_UUID  2:1  3:1  
#25 (224)marker  19 (flags=0x02, v2.1.5.0) lustre-OST0001  'add osc' Thu Mar 28 20:34:47 2013-
#26 (224)marker  19 (flags=0x02, v2.1.5.0) lustre-MDT0000  'add osc(copied)' Thu Mar 28 20:34:47 2013-
#27 (224)marker  23 (flags=0x01, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:34:47 2013-
#28 (080)set_timeout=20 
#29 (224)marker  23 (flags=0x02, v2.1.5.0) lustre          'sys.timeout' Thu Mar 28 20:34:47 2013-
#30 (224)marker  27 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:34:47 2013-
#31 (112)param 0:lustre-MDT0000-mdtlov  1:lov.stripesize=1048576  
#32 (224)marker  27 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:34:47 2013-
#33 (224)marker  29 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:34:47 2013-
#34 (112)param 0:lustre-MDT0000-mdtlov  1:lov.stripecount=0  
#35 (224)marker  29 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:34:47 2013-
#36 (224)marker  31 (flags=0x01, v2.1.5.0) lustre-MDT0000  'mdt.identity_upcall' Thu Mar 28 20:34:47 2013-
#37 (128)param 0:lustre-MDT0000  1:mdt.identity_upcall=/usr/sbin/l_getidentity  
#38 (224)marker  31 (flags=0x02, v2.1.5.0) lustre-MDT0000  'mdt.identity_upcall' Thu Mar 28 20:34:47 2013-

Comment by Jian Yu [ 25/May/13 ]

Since the issue also exists on Lustre b2_1 branch, should we back-port the patch http://review.whamcloud.com/5851 to Lustre b2_1 branch?
