[LU-3038] OST nid is being messed up in the config log 2.4 Created: 26/Mar/13 Updated: 25/May/13 Resolved: 28/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.5 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Di Wang | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB, mq213 |
| Severity: | 3 |
| Rank (Obsolete): | 7410 |
| Description |
|
We found this problem during Hyperion DNE testing, and I can reproduce it locally with a separate MGS/MDS and 2 OSSes (one OST per OSS).
1. Mount the MGS, then the MDT, then the OSTs: the config log looks fine.
[root@mds tests]# ../utils/llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000
Header size : 8192
Time : Mon Mar 25 23:32:55 2013
Number of records: 21
Target uuid : config_uuid
-----------------------
#01 (224)marker 2 (flags=0x01, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 23:32:55 2013-
#02 (136)attach 0:lustre-MDT0000-mdtlov 1:lov 2:lustre-MDT0000-mdtlov_UUID
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov 1:(struct lov_desc)
uuid=lustre-MDT0000-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 2 (flags=0x02, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 23:32:55 2013-
#05 (224)marker 3 (flags=0x01, v2.3.63.0) lustre-MDT0000 'add mdt' Mon Mar 25 23:32:55 2013-
#06 (120)attach 0:lustre-MDT0000 1:mdt 2:lustre-MDT0000_UUID
#07 (112)mount_option 0: 1:lustre-MDT0000 2:lustre-MDT0000-mdtlov
#08 (160)setup 0:lustre-MDT0000 1:lustre-MDT0000_UUID 2:0 3:lustre-MDT0000-mdtlov 4:f
#09 (224)marker 3 (flags=0x02, v2.3.63.0) lustre-MDT0000 'add mdt' Mon Mar 25 23:32:55 2013-
#10 (224)marker 9 (flags=0x01, v2.3.63.0) lustre-OST0000 'add osc' Mon Mar 25 23:33:44 2013-
#11 (088)add_uuid nid=172.16.151.130@tcp(0x20000ac109782) 0: 1:172.16.151.130@tcp
#12 (144)attach 0:lustre-OST0000-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#13 (152)setup 0:lustre-OST0000-osc-MDT0000 1:lustre-OST0000_UUID 2:172.16.151.130@tcp
#14 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0000_UUID 2:0 3:1
#15 (224)marker 9 (flags=0x02, v2.3.63.0) lustre-OST0000 'add osc' Mon Mar 25 23:33:44 2013-
#16 (224)marker 12 (flags=0x01, v2.3.63.0) lustre-OST0002 'add osc' Mon Mar 25 23:34:00 2013-
#17 (088)add_uuid nid=172.16.151.131@tcp(0x20000ac109783) 0: 1:172.16.151.131@tcp
#18 (144)attach 0:lustre-OST0002-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#19 (152)setup 0:lustre-OST0002-osc-MDT0000 1:lustre-OST0002_UUID 2:172.16.151.131@tcp
#20 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0002_UUID 2:2 3:1
#21 (224)marker 12 (flags=0x02, v2.3.63.0) lustre-OST0002 'add osc' Mon Mar 25 23:34:00 2013-
2. If we instead mount the MGS, then the OSTs, then the MDT, the OST NIDs get messed up, i.e. both OSTs are set up with the same NID.
[root@mds tests]# ../utils/llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000
Header size : 8192
Time : Mon Mar 25 22:50:27 2013
Number of records: 27
Target uuid : config_uuid
-----------------------
#01 (224)marker 8 (flags=0x01, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 22:50:27 2013-
#02 (136)attach 0:lustre-MDT0000-mdtlov 1:lov 2:lustre-MDT0000-mdtlov_UUID
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov 1:(struct lov_desc)
uuid=lustre-MDT0000-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 8 (flags=0x02, v2.3.63.0) lustre-MDT0000-mdtlov 'lov setup' Mon Mar 25 22:50:27 2013-
#05 (224)marker 9 (flags=0x01, v2.3.63.0) lustre-MDT0000 'add mdt' Mon Mar 25 22:50:27 2013-
#06 (120)attach 0:lustre-MDT0000 1:mdt 2:lustre-MDT0000_UUID
#07 (112)mount_option 0: 1:lustre-MDT0000 2:lustre-MDT0000-mdtlov
#08 (160)setup 0:lustre-MDT0000 1:lustre-MDT0000_UUID 2:0 3:lustre-MDT0000-mdtlov 4:f
#09 (224)marker 9 (flags=0x02, v2.3.63.0) lustre-MDT0000 'add mdt' Mon Mar 25 22:50:27 2013-
#10 (224)marker 10 (flags=0x01, v2.3.63.0) lustre-MDT0000 'add osc(copied)' Mon Mar 25 22:50:27 2013-
#11 (224)marker 11 (flags=0x01, v2.3.63.0) lustre-OST0001 'add osc' Mon Mar 25 22:50:27 2013-
#12 (088)add_uuid nid=172.16.151.130@tcp(0x20000ac109782) 0: 1:172.16.151.130@tcp
#13 (144)attach 0:lustre-OST0001-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#14 (152)setup 0:lustre-OST0001-osc-MDT0000 1:lustre-OST0001_UUID 2:172.16.151.130@tcp
#15 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0001_UUID 2:1 3:1
#16 (224)marker 11 (flags=0x02, v2.3.63.0) lustre-OST0001 'add osc' Mon Mar 25 22:50:27 2013-
#17 (224)marker 11 (flags=0x02, v2.3.63.0) lustre-MDT0000 'add osc(copied)' Mon Mar 25 22:50:27 2013-
#18 (224)marker 12 (flags=0x01, v2.3.63.0) lustre-MDT0000 'add osc(copied)' Mon Mar 25 22:50:27 2013-
#19 (224)marker 13 (flags=0x01, v2.3.63.0) lustre-OST0002 'add osc' Mon Mar 25 22:50:27 2013-
#20 (088)add_uuid nid=172.16.151.130@tcp(0x20000ac109782) 0: 1:172.16.151.130@tcp
#21 (144)attach 0:lustre-OST0002-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#22 (152)setup 0:lustre-OST0002-osc-MDT0000 1:lustre-OST0002_UUID 2:172.16.151.130@tcp
#23 (088)add_uuid nid=172.16.151.131@tcp(0x20000ac109783) 0: 1:172.16.151.131@tcp
#24 (120)add_conn 0:lustre-OST0002-osc-MDT0000 1:172.16.151.131@tcp
#25 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0002_UUID 2:2 3:1
#26 (224)marker 13 (flags=0x02, v2.3.63.0) lustre-OST0002 'add osc' Mon Mar 25 22:50:27 2013-
#27 (224)marker 13 (flags=0x02, v2.3.63.0) lustre-MDT0000 'add osc(copied)' Mon Mar 25 22:50:27 2013-
|
| Comments |
| Comment by Di Wang [ 27/Mar/13 ] |
| Comment by Peter Jones [ 28/Mar/13 ] |
|
Landed for 2.4 |
| Comment by Jian Yu [ 29/Mar/13 ] |
|
The same issue exists on the Lustre b2_1 branch, with a separate MGS/MDS and 2 OSSes (one OST per OSS).
1) Mount MGS, then MDT, then OSTs: the config log is fine.
[root@client-12vm3 ~]# llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000
Header size : 8192
Time : Thu Mar 28 20:24:16 2013
Number of records: 39
Target uuid : config_uuid
-----------------------
#01 (224)marker 1 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:24:16 2013-
#02 (136)attach 0:lustre-MDT0000-mdtlov 1:lov 2:lustre-MDT0000-mdtlov_UUID
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov 1:(struct lov_desc)
uuid=lustre-MDT0000-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 1 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:24:16 2013-
#05 (224)marker 2 (flags=0x01, v2.1.5.0) lustre-MDT0000 'add mdt' Thu Mar 28 20:24:16 2013-
#06 (120)attach 0:lustre-MDT0000 1:mdt 2:lustre-MDT0000_UUID
#07 (112)mount_option 0: 1:lustre-MDT0000 2:lustre-MDT0000-mdtlov
#08 (160)setup 0:lustre-MDT0000 1:lustre-MDT0000_UUID 2:0 3:lustre-MDT0000-mdtlov 4:f
#09 (224)marker 2 (flags=0x02, v2.1.5.0) lustre-MDT0000 'add mdt' Thu Mar 28 20:24:16 2013-
#10 (224)SKIP START marker 8 (flags=0x05, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:16 2013-Thu Mar 28 20:24:17 2013
#11 (080)SKIP set_timeout=20
#12 (224)SKIP END marker 8 (flags=0x06, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:16 2013-Thu Mar 28 20:24:17 2013
#13 (224)marker 10 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:24:16 2013-
#14 (112)param 0:lustre-MDT0000-mdtlov 1:lov.stripesize=1048576
#15 (224)marker 10 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:24:16 2013-
#16 (224)marker 12 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:24:16 2013-
#17 (112)param 0:lustre-MDT0000-mdtlov 1:lov.stripecount=0
#18 (224)marker 12 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:24:16 2013-
#19 (224)marker 14 (flags=0x01, v2.1.5.0) lustre-MDT0000 'mdt.identity_upcall' Thu Mar 28 20:24:16 2013-
#20 (128)param 0:lustre-MDT0000 1:mdt.identity_upcall=/usr/sbin/l_getidentity
#21 (224)marker 14 (flags=0x02, v2.1.5.0) lustre-MDT0000 'mdt.identity_upcall' Thu Mar 28 20:24:16 2013-
#22 (224)marker 16 (flags=0x01, v2.1.5.0) lustre-OST0000 'add osc' Thu Mar 28 20:24:17 2013-
#23 (080)add_uuid nid=10.10.4.208@tcp(0x200000a0a04d0) 0: 1:10.10.4.208@tcp
#24 (144)attach 0:lustre-OST0000-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#25 (144)setup 0:lustre-OST0000-osc-MDT0000 1:lustre-OST0000_UUID 2:10.10.4.208@tcp
#26 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0000_UUID 2:0 3:1
#27 (224)marker 16 (flags=0x02, v2.1.5.0) lustre-OST0000 'add osc' Thu Mar 28 20:24:17 2013-
#28 (224)SKIP START marker 20 (flags=0x05, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:17 2013-Thu Mar 28 20:24:19 2013
#29 (080)SKIP set_timeout=20
#30 (224)SKIP END marker 20 (flags=0x06, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:17 2013-Thu Mar 28 20:24:19 2013
#31 (224)marker 23 (flags=0x01, v2.1.5.0) lustre-OST0001 'add osc' Thu Mar 28 20:24:19 2013-
#32 (080)add_uuid nid=10.10.4.209@tcp(0x200000a0a04d1) 0: 1:10.10.4.209@tcp
#33 (144)attach 0:lustre-OST0001-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#34 (144)setup 0:lustre-OST0001-osc-MDT0000 1:lustre-OST0001_UUID 2:10.10.4.209@tcp
#35 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0001_UUID 2:1 3:1
#36 (224)marker 23 (flags=0x02, v2.1.5.0) lustre-OST0001 'add osc' Thu Mar 28 20:24:19 2013-
#37 (224)marker 28 (flags=0x01, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:19 2013-
#38 (080)set_timeout=20
#39 (224)marker 28 (flags=0x02, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:24:19 2013-
2) Mount MGS, then OSTs, then MDT: the OST NIDs are messed up, i.e. both OSTs have the same NID.
# llog_reader /mnt/mgs/CONFIGS/lustre-MDT0000
Header size : 8192
Time : Thu Mar 28 20:34:47 2013
Number of records: 38
Target uuid : config_uuid
-----------------------
#01 (224)marker 14 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:34:47 2013-
#02 (136)attach 0:lustre-MDT0000-mdtlov 1:lov 2:lustre-MDT0000-mdtlov_UUID
#03 (176)lov_setup 0:lustre-MDT0000-mdtlov 1:(struct lov_desc)
uuid=lustre-MDT0000-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 14 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov setup' Thu Mar 28 20:34:47 2013-
#05 (224)marker 15 (flags=0x01, v2.1.5.0) lustre-MDT0000 'add mdt' Thu Mar 28 20:34:47 2013-
#06 (120)attach 0:lustre-MDT0000 1:mdt 2:lustre-MDT0000_UUID
#07 (112)mount_option 0: 1:lustre-MDT0000 2:lustre-MDT0000-mdtlov
#08 (160)setup 0:lustre-MDT0000 1:lustre-MDT0000_UUID 2:0 3:lustre-MDT0000-mdtlov 4:f
#09 (224)marker 15 (flags=0x02, v2.1.5.0) lustre-MDT0000 'add mdt' Thu Mar 28 20:34:47 2013-
#10 (224)marker 16 (flags=0x01, v2.1.5.0) lustre-MDT0000 'add osc(copied)' Thu Mar 28 20:34:47 2013-
#11 (224)marker 17 (flags=0x01, v2.1.5.0) lustre-OST0000 'add osc' Thu Mar 28 20:34:47 2013-
#12 (080)add_uuid nid=10.10.4.208@tcp(0x200000a0a04d0) 0: 1:10.10.4.208@tcp
#13 (144)attach 0:lustre-OST0000-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#14 (144)setup 0:lustre-OST0000-osc-MDT0000 1:lustre-OST0000_UUID 2:10.10.4.208@tcp
#15 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0000_UUID 2:0 3:1
#16 (224)marker 17 (flags=0x02, v2.1.5.0) lustre-OST0000 'add osc' Thu Mar 28 20:34:47 2013-
#17 (224)marker 17 (flags=0x02, v2.1.5.0) lustre-MDT0000 'add osc(copied)' Thu Mar 28 20:34:47 2013-
#18 (224)marker 18 (flags=0x01, v2.1.5.0) lustre-MDT0000 'add osc(copied)' Thu Mar 28 20:34:47 2013-
#19 (224)marker 19 (flags=0x01, v2.1.5.0) lustre-OST0001 'add osc' Thu Mar 28 20:34:47 2013-
#20 (080)add_uuid nid=10.10.4.208@tcp(0x200000a0a04d0) 0: 1:10.10.4.208@tcp
#21 (080)add_uuid nid=10.10.4.209@tcp(0x200000a0a04d1) 0: 1:10.10.4.208@tcp
#22 (144)attach 0:lustre-OST0001-osc-MDT0000 1:osc 2:lustre-MDT0000-mdtlov_UUID
#23 (144)setup 0:lustre-OST0001-osc-MDT0000 1:lustre-OST0001_UUID 2:10.10.4.208@tcp
#24 (136)lov_modify_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-OST0001_UUID 2:1 3:1
#25 (224)marker 19 (flags=0x02, v2.1.5.0) lustre-OST0001 'add osc' Thu Mar 28 20:34:47 2013-
#26 (224)marker 19 (flags=0x02, v2.1.5.0) lustre-MDT0000 'add osc(copied)' Thu Mar 28 20:34:47 2013-
#27 (224)marker 23 (flags=0x01, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:34:47 2013-
#28 (080)set_timeout=20
#29 (224)marker 23 (flags=0x02, v2.1.5.0) lustre 'sys.timeout' Thu Mar 28 20:34:47 2013-
#30 (224)marker 27 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:34:47 2013-
#31 (112)param 0:lustre-MDT0000-mdtlov 1:lov.stripesize=1048576
#32 (224)marker 27 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripesize' Thu Mar 28 20:34:47 2013-
#33 (224)marker 29 (flags=0x01, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:34:47 2013-
#34 (112)param 0:lustre-MDT0000-mdtlov 1:lov.stripecount=0
#35 (224)marker 29 (flags=0x02, v2.1.5.0) lustre-MDT0000-mdtlov 'lov.stripecount' Thu Mar 28 20:34:47 2013-
#36 (224)marker 31 (flags=0x01, v2.1.5.0) lustre-MDT0000 'mdt.identity_upcall' Thu Mar 28 20:34:47 2013-
#37 (128)param 0:lustre-MDT0000 1:mdt.identity_upcall=/usr/sbin/l_getidentity
#38 (224)marker 31 (flags=0x02, v2.1.5.0) lustre-MDT0000 'mdt.identity_upcall' Thu Mar 28 20:34:47 2013-
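The b2_1 dump shows a related inconsistency: record #21 registers NID 10.10.4.209@tcp under the UUID 10.10.4.208@tcp, so both OSC setup records end up pointing at 10.10.4.208@tcp. In every other `add_uuid` record in this ticket the UUID string equals the NID, so a quick check is to flag records where the two differ. This is a hypothetical sketch (`mismatched_add_uuid` is not a Lustre tool), and the "UUID equals NID" assumption comes only from the logs in this ticket, not from a documented invariant.

```python
import re

# An add_uuid record as printed by llog_reader, e.g.
#   #21 (080)add_uuid nid=10.10.4.209@tcp(0x200000a0a04d1) 0: 1:10.10.4.208@tcp
# Groups: record number, NID string, UUID registered for that NID.
ADD_UUID_RE = re.compile(
    r"#(\d+)\s+\(\d+\)add_uuid\s+nid=([^(\s]+)\(0x[0-9a-fA-F]+\)\s+0:\s+1:(\S+)"
)


def mismatched_add_uuid(dump):
    """Return (record, nid, uuid) for add_uuid records whose UUID is not the NID string.

    Assumes, as in every correct record in this ticket, that the UUID passed to
    add_uuid is simply the NID written out; that is an observation, not a rule.
    """
    mismatches = []
    for line in dump.splitlines():
        m = ADD_UUID_RE.search(line)
        if m and m.group(2) != m.group(3):
            mismatches.append((int(m.group(1)), m.group(2), m.group(3)))
    return mismatches


# Records #20, #21 and #23 copied from the second b2_1 dump; setup is ignored.
b2_1_log = """\
#20 (080)add_uuid nid=10.10.4.208@tcp(0x200000a0a04d0) 0: 1:10.10.4.208@tcp
#21 (080)add_uuid nid=10.10.4.209@tcp(0x200000a0a04d1) 0: 1:10.10.4.208@tcp
#23 (144)setup 0:lustre-OST0001-osc-MDT0000 1:lustre-OST0001_UUID 2:10.10.4.208@tcp
"""
print(mismatched_add_uuid(b2_1_log))  # [(21, '10.10.4.209@tcp', '10.10.4.208@tcp')]
```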
|
| Comment by Jian Yu [ 25/May/13 ] |
|
Since the issue also exists on the Lustre b2_1 branch, should we back-port the patch http://review.whamcloud.com/5851 to b2_1? |