[LU-4190] LustreError: 18166:0:(genops.c:1570:obd_exports_barrier()) ASSERTION( list_empty(&obd->obd_exports) ) failed: Created: 30/Oct/13 Updated: 15/Dec/19 Resolved: 15/Dec/19 |
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | yueyuling | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: |
Lustre 2.4.0, with 2 servers and 1 client; kernel version: 2.6.32-358.6.2.l2.08 |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 11330 |
| Description |
Two servers work normally in active-active status, then the following occurs:

LustreError: 18166:0:(genops.c:320:class_newdev()) Device MGC192.168.22.50@tcp already exists at 2, won't add
Call Trace: ...
Message from ...
Kernel panic - not syncing: LBUG
| Comments |
| Comment by Andreas Dilger [ 30/Oct/13 ] |
Are you mounting the same MDT device (lustre-MDT0000) on both nodes? That is bad and will lead to filesystem corruption. You should only mount it on one MDS node at a time. I suggest you enable "MMP" on your devices with "tune2fs -O mmp /dev/<mdt_or_ost_device>" (this happens automatically if you format the filesystem with --failnode).
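For illustration, a minimal sketch of that suggestion (the hostname and device path /dev/sdb are hypothetical; the device must be unmounted before tune2fs will enable MMP):

[root@mds1 ~]# umount /mnt/mdt                       # MMP can only be enabled on an unmounted device
[root@mds1 ~]# tune2fs -O mmp /dev/sdb               # turn on the multi-mount protection feature
[root@mds1 ~]# dumpe2fs -h /dev/sdb | grep -i mmp    # verify the feature flag and MMP interval are set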
| Comment by yueyuling [ 31/Oct/13 ] |
Thank you for your response! But I didn't mount the same MDT device on both nodes. There are two MDT devices in my Lustre filesystem, and I mount one of them on each node.
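(For illustration, a sketch of that layout with hypothetical device paths and mount points, one MDT mounted per MDS node:)

[root@mds1 ~]# mount -t lustre /dev/sdb /mnt/mdt0    # MDT0000 on the first node
[root@mds2 ~]# mount -t lustre /dev/sdc /mnt/mdt1    # MDT0001 on the second node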
| Comment by Di Wang [ 05/Apr/14 ] |
I tried this test on current master.

MDT1:
[root@client-2 ~]# mkfs.lustre --reformat --mgs --mdt --index=0 --fsname lustre --failnode=10.10.4.3@tcp /dev/disk/by-id/scsi-1IET_00040001

MDT2:
[root@client-3 ~]# mkfs.lustre --reformat --mgsnode=10.10.4.2@tcp --mgsnode=10.10.4.3@tcp --mdt --index=1 --fsname lustre --failnode=10.10.4.2@tcp /dev/disk/by-id/scsi-1IET_00020001

But unfortunately it failed when I tried to mount MDT2:

[root@client-3 ~]# mount -t lustre /dev/disk/by-id/scsi-1IET_00020001 /mnt/mds2/
mount.lustre: mount /dev/sdj at /mnt/mds2 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

[root@client-3 ~]# ...
LDISKFS-fs (sdj): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: srv-lustre-MDT0001: No data found on store. Initialize space
Lustre: lustre-MDT0001: new disk, initializing
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: Communicating with 10.10.4.2@tcp, operation mds_connect failed with -11.
LustreError: 13a-8: Failed to get MGS log params and no local copy.
LustreError: 2354:0:(obd_mount_server.c:699:lustre_lwp_add_conn()) lustre-MDT0001: can't find lwp device.
LustreError: 15c-8: MGC10.10.4.2@tcp: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 2242:0:(obd_mount_server.c:1321:server_start_targets()) lustre-MDT0001: failed to start LWP: -2
LustreError: 2242:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start targets: -2
Lustre: Failing over lustre-MDT0001
Lustre: server umount lustre-MDT0001 complete
LustreError: 2242:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount (-2)

Config log:
[root@client-2 ~]# llog_reader /mnt/mds1/CONFIGS/lustre-client
Header size : 8192
Time : Fri Apr 4 20:36:36 2014
Number of records: 30
Target uuid : config_uuid
-----------------------
#01 (224)marker 4 (flags=0x01, v2.5.57.0) lustre-clilov 'lov setup' Fri Apr 4 20:36:36 2014-
#02 (120)attach 0:lustre-clilov 1:lov 2:lustre-clilov_UUID
#03 (168)lov_setup 0:lustre-clilov 1:(struct lov_desc) uuid=lustre-clilov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker 4 (flags=0x02, v2.5.57.0) lustre-clilov 'lov setup' Fri Apr 4 20:36:36 2014-
#05 (224)marker 5 (flags=0x01, v2.5.57.0) lustre-clilmv 'lmv setup' Fri Apr 4 20:36:36 2014-
#06 (120)attach 0:lustre-clilmv 1:lmv 2:lustre-clilmv_UUID
#07 (168)lov_setup 0:lustre-clilmv 1:(struct lov_desc) uuid=lustre-clilmv_UUID stripe:cnt=0 size=0 offset=0 pattern=0
#08 (224)marker 5 (flags=0x02, v2.5.57.0) lustre-clilmv 'lmv setup' Fri Apr 4 20:36:36 2014-
#09 (224)marker 6 (flags=0x01, v2.5.57.0) lustre-MDT0000 'add mdc' Fri Apr 4 20:36:36 2014-
#10 (080)add_uuid nid=10.10.4.2@tcp(0x200000a0a0402) 0: 1:10.10.4.2@tcp
#11 (128)attach 0:lustre-MDT0000-mdc 1:mdc 2:lustre-clilmv_UUID
#12 (136)setup 0:lustre-MDT0000-mdc 1:lustre-MDT0000_UUID 2:10.10.4.2@tcp
#13 (080)add_uuid nid=10.10.4.3@tcp(0x200000a0a0403) 0: 1:10.10.4.3@tcp
#14 (104)add_conn 0:lustre-MDT0000-mdc 1:10.10.4.3@tcp
#15 (160)modify_mdc_tgts add 0:lustre-clilmv 1:lustre-MDT0000_UUID 2:0 3:1 4:lustre-MDT0000-mdc_UUID
#16 (224)marker 6 (flags=0x02, v2.5.57.0) lustre-MDT0000 'add mdc' Fri Apr 4 20:36:36 2014-
#17 (224)marker 7 (flags=0x01, v2.5.57.0) lustre-client 'mount opts' Fri Apr 4 20:36:36 2014-
#18 (120)mount_option 0: 1:lustre-client 2:lustre-clilov 3:lustre-clilmv
#19 (224)marker 7 (flags=0x02, v2.5.57.0) lustre-client 'mount opts' Fri Apr 4 20:36:36 2014-
#20 (224)marker 11 (flags=0x01, v2.5.57.0) lustre-MDT0001 'add mdc' Fri Apr 4 20:50:05 2014-
#21 (080)add_uuid nid=10.10.4.3@tcp(0x200000a0a0403) 0: 1:10.10.4.3@tcp
#22 (128)attach 0:lustre-MDT0001-mdc 1:mdc 2:lustre-clilmv_UUID
#23 (136)setup 0:lustre-MDT0001-mdc 1:lustre-MDT0001_UUID 2:10.10.4.3@tcp
#24 (080)add_uuid nid=10.10.4.2@tcp(0x200000a0a0402) 0: 1:10.10.4.2@tcp
#25 (104)add_conn 0:lustre-MDT0001-mdc 1:10.10.4.2@tcp
#26 (160)modify_mdc_tgts add 0:lustre-clilmv 1:lustre-MDT0001_UUID 2:1 3:1 4:lustre-MDT0001-mdc_UUID
#27 (224)marker 11 (flags=0x02, v2.5.57.0) lustre-MDT0001 'add mdc' Fri Apr 4 20:50:05 2014-
#28 (224)marker 12 (flags=0x01, v2.5.57.0) lustre-client 'mount opts' Fri Apr 4 20:50:05 2014-
#29 (120)mount_option 0: 1:lustre-client 2:lustre-clilov 3:lustre-clilmv
#30 (224)marker 12 (flags=0x02, v2.5.57.0) lustre-client 'mount opts' Fri Apr 4 20:50:05 2014-

It might be related to the change http://review.whamcloud.com/7666. Fan Yong, could you please comment here? Thanks!
| Comment by nasf (Inactive) [ 10/Apr/14 ] |
The original issue happened on Lustre 2.4, but the patch http://review.whamcloud.com/#/c/7666/ has only been applied to Lustre 2.6. So even if that patch has some issues, it should not affect Lustre 2.4, right?
| Comment by Di Wang [ 16/Apr/14 ] |
Oh, I am not asking about the original issue shown in this ticket, but about the failure I met in my test, which stops me from continuing the test on 2.6. Hmm, I will create a new ticket then.
| Comment by yueyuling [ 17/Apr/14 ] |
In addition, I've created the MGS, MDT0000, and MDT0001 separately, each on a different device. So the MGS and MDT0000 are on different devices.
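(A sketch of such a layout for illustration; the device paths and hostnames are hypothetical, and the MGS NID is taken from the description. The MGS gets its own device, and each MDT points at it via --mgsnode:)

[root@mds1 ~]# mkfs.lustre --mgs /dev/sda                                                          # standalone MGS device
[root@mds1 ~]# mkfs.lustre --fsname=lustre --mdt --index=0 --mgsnode=192.168.22.50@tcp /dev/sdb    # MDT0000
[root@mds2 ~]# mkfs.lustre --fsname=lustre --mdt --index=1 --mgsnode=192.168.22.50@tcp /dev/sdc    # MDT0001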
| Comment by Andreas Dilger [ 02/May/14 ] |
Mike, could you please try configuring a test system as described here to see if a similar problem still exists in master? This seems similar to the failure in
| Comment by Mikhail Pershin [ 10/May/14 ] |
I've tried to repeat those steps after
| Comment by Jodi Levi (Inactive) [ 12/May/14 ] |
Duplicate of
| Comment by Mikhail Pershin [ 13/May/14 ] |
Jodi, this is not a duplicate of
| Comment by Doug Oucharek (Inactive) [ 13/May/14 ] |
This is not a duplicate of
| Comment by yueyuling [ 14/May/14 ] |
The steps to write/read data:
| Comment by Mikhail Pershin [ 15/Dec/19 ] |
Outdated