[LU-8089] MGT/MDT mount fails on secondary HA node Created: 02/May/16  Updated: 14/Jun/18  Resolved: 25/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Frank Heckes (Inactive) Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: soak
Environment:

lola
build: master commit 71d2ea0fde17ecde0bf237f486d4bafb5d54fe3f + patches


Attachments: File console-lola-9.log-20160501.bz2     File console-lola-9.log.20170720.bz2     File console-lola-9.log.bz2     File lustre-log-lola-9-20160720_0842-mount-mgs-hangs.bz2     File lustre-log-lola-9-20160721_0146-mount-mgs-hangs.bz2     File lustre-log.1461964594.5065.bz2     File lustre-log.1461964707.7204.bz2     File lustre-log.2016-05-02-0040.bz2     File lustre-log.2016-05-02-0049.bz2     File lustre-log.2016-05-02-0105.bz2     File lustre-log.2016-05-02-0109.bz2     File lustre-log.2016-05-02.0044.bz2     File messages-lola-9.log-20160501.bz2     File messages-lola-9.log.20170720.bz2     File messages-lola-9.log.bz2    
Issue Links:
Related
is related to LU-4214 Hyperion - OST never recovers on fail... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The error happens during soak testing of build '20160427' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427). DNE is enabled. The OSTs had been formatted with ZFS and the MDTs with ldiskfs as the storage backend. There is 1 MDT per MDS and 4 OSTs per OSS. The OSS and MDS nodes are configured in an HA active-active failover configuration.

The configuration, especially the mapping of node to role can be found here: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration.
For this ticket, the MDS nodes lola-[8,9], which form an HA cluster, are of interest.
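
For context, a combined MGT/MDT used in this kind of active-active HA pair is normally formatted with both server NIDs registered as service nodes, so that either peer may mount it. The following is a minimal sketch, reusing the fsname, NIDs and device path that appear later in this ticket; the exact mkfs.lustre options used on lola are not recorded here:

# Sketch only: formatting the combined MGT/MDT0000 for the lola-8/lola-9 HA pair
# (options are illustrative, not the actual lola provisioning commands).
mkfs.lustre --fsname=soaked --mgs --mdt --index=0 \
    --backfstype=ldiskfs \
    --servicenode=192.168.1.108@o2ib10 \
    --servicenode=192.168.1.109@o2ib10 \
    /dev/mapper/360080e50002ff4f00000026952013088p1

# Either HA peer can then mount the target:
mount -t lustre -o rw,user_xattr \
    /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0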

After a hard failover of the primary node (lola-8) had been initiated using pm -c, mounting the MGT/MDT (a single combined device) on the secondary node (lola-9) failed with the following error messages:

Apr 29 14:00:36 lola-9 kernel: LDISKFS-fs warning (device dm-9): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
...
...
Apr 29 14:19:06 lola-9 kernel: Lustre: Skipped 2 previous similar messages
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(genops.c:334:class_newdev()) Device MGC192.168.1.109@o2ib10 already exists at 2, won't add
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_config.c:370:class_attach()) Cannot create device MGC192.168.1.109@o2ib10 of type mgc : -17
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount.c:198:lustre_start_simple()) MGC192.168.1.109@o2ib10 attach error -17
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount_server.c:1512:server_put_super()) no obd soaked-MDT0000
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount_server.c:140:server_deregister_mount()) soaked-MDT0000 not registered
Apr 29 14:19:06 lola-9 kernel: Lustre: 7204:0:(service.c:2096:ptlrpc_server_handle_request()) Skipped 9 previous similar messages
Apr 29 14:19:06 lola-9 kernel: Lustre: server umount soaked-MDT0000 complete
Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount.c:1450:lustre_fill_super()) Unable to mount  (-17)
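
The failover step itself amounts to roughly the following commands; this is a sketch only, since the soak framework drives the failover automatically (pm is the powerman client used to power-cycle the node, and running the mount over ssh from an admin node is an assumption):

# Hard power-cycle of the primary MDS, then mount its MGT/MDT on the HA peer.
pm -c lola-8
ssh lola-9 'mount -t lustre -o rw,user_xattr \
    /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0'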

The exact sequence of events was as follows:

  • 2016-04-29 13:53:59,419:fsmgmt.fsmgmt:INFO triggering fault mds_failover for lola-8
  • 2016-04-29 14:00:25,257:fsmgmt.fsmgmt:INFO lola-8 is up!!!
  • 2016-04-29 14:00:36,270:fsmgmt.fsmgmt:INFO Mounting soaked-MDT0000 on lola-9 ...
  • 2016-04-29 14:19:06 - mount failed with the message above

Attached files: lola-9 messages, console files, and the debug log files lustre-log.1461964594.5065 and lustre-log.1461964707.7204

The effect can be reproduced manually.
The device could not be mounted until mdt-1 (the primary resource of lola-9) had been unmounted and the Lustre modules had been reloaded.

Try manually

[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0/
mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0/
mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
[root@lola-9 ~]# date
Mon May  2 00:39:25 PDT 2016
[root@lola-9 ~]# date
Mon May  2 00:39:52 PDT 2016
[root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0040
Debug log: 2321 lines, 2321 kept, 0 dropped, 0 bad.

---> file lustre-log.2016-05-02-0040 attached

umount + reload Lustre modules on lola-9

##### Umount and remove lustre mods
[root@lola-9 ~]# umount  /mnt/soaked-mdt1
[root@lola-9 ~]# lustre_rmmod
[root@lola-9 ~]# echo $?
0
[root@lola-9 ~]# lustre_rmmod
[root@lola-9 ~]# lctl dl
[root@lola-9 ~]# echo $?
2
[root@lola-9 ~]# date
Mon May  2 00:41:10 PDT 2016
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
[root@lola-9 ~]# ^Ctl debug_kernel 
[root@lola-9 ~]# date
Mon May  2 00:43:52 PDT 2016
[root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02.0044
Debug log: 177060 lines, 177060 kept, 0 dropped, 0 bad.
[root@lola-9 ~]# mount
/dev/sdk1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
/dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)

### one more time
[root@lola-9 ~]# umount /mnt/soaked-mdt1
[root@lola-9 ~]# umount /mnt/soaked-mdt0
[root@lola-9 ~]# lustre_rmmod
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
[root@lola-9 ~]# echo $?
0
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
[root@lola-9 ~]# echo $?
0
[root@lola-9 ~]# mount
/dev/sdk1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
/dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)
[root@lola-9 ~]# date
Mon May  2 00:49:25 PDT 2016
[root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0049
Debug log: 78514 lines, 78514 kept, 0 dropped, 0 bad.

--> attached lustre-log.2016-05-02.0044, lustre-log.2016-05-02-0049

mount mdt-0 on lola-8 and do failover again

# Do failover back to lola-8
[root@lola-8 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
mount.lustre: increased /sys/block/dm-8/queue/max_sectors_kb from 1024 to 16383
mount.lustre: increased /sys/block/dm-4/queue/max_sectors_kb from 1024 to 16383
mount.lustre: increased /sys/block/sdh/queue/max_sectors_kb from 1024 to 16383
mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 16383
[root@lola-8 ~]# echo $?
0
[root@lola-8 ~]# lctl dl
  0 UP osd-ldiskfs soaked-MDT0000-osd soaked-MDT0000-osd_UUID 35
  1 UP mgs MGS MGS 7
  2 UP mgc MGC192.168.1.108@o2ib10 31b8074d-c86c-18f5-40f6-bbc2e7cf6f72 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod soaked-MDT0000-mdtlov soaked-MDT0000-mdtlov_UUID 4
  5 UP mdt soaked-MDT0000 soaked-MDT0000_UUID 45
  6 UP mdd soaked-MDD0000 soaked-MDD0000_UUID 4
  7 UP qmt soaked-QMT0000 soaked-QMT0000_UUID 4
  8 UP osp soaked-MDT0001-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
  9 UP osp soaked-MDT0002-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 10 UP osp soaked-MDT0003-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 11 UP osp soaked-OST0000-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 12 UP osp soaked-OST0001-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 13 UP osp soaked-OST0002-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 14 UP osp soaked-OST0003-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 15 UP osp soaked-OST0004-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 16 UP osp soaked-OST0005-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 17 UP osp soaked-OST0006-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 18 UP osp soaked-OST0007-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 19 UP osp soaked-OST0008-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 20 UP osp soaked-OST0009-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 21 UP osp soaked-OST000a-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 22 UP osp soaked-OST000b-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 23 UP osp soaked-OST000c-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 24 UP osp soaked-OST000d-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 25 UP osp soaked-OST000e-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 26 UP osp soaked-OST000f-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 27 UP osp soaked-OST0010-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 28 UP osp soaked-OST0011-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 29 UP osp soaked-OST0012-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 30 UP osp soaked-OST0013-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 31 UP osp soaked-OST0014-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 32 UP osp soaked-OST0015-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 33 UP osp soaked-OST0016-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 34 UP osp soaked-OST0017-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
 35 UP lwp soaked-MDT0000-lwp-MDT0000 soaked-MDT0000-lwp-MDT0000_UUID 5


--->  now do manual hard failover again
[root@lola-9 ~]#  mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
[root@lola-9 ~]# date
Mon May  2 01:05:13 PDT 2016
[root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0105
Debug log: 560 lines, 560 kept, 0 dropped, 0 bad.

--> attached file lustre-log.2016-05-02-0105

umount mdt-1 and restart Lustre again

# ---> now restart lustre again on secondary node:
[root@lola-9 ~]# umount /mnt/soaked-mdt1
[root@lola-9 ~]# lustre_rmmod
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
[root@lola-9 ~]# echo $?
0
[root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
[root@lola-9 ~]# echo $?
0
[root@lola-9 ~]# mount
/dev/sdk1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)
/dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
[root@lola-9 ~]# date
Mon May  2 01:09:20 PDT 2016
[root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0109
Debug log: 119979 lines, 119979 kept, 0 dropped, 0 bad.

--> attached file lustre-log.2016-05-02-0109



 Comments   
Comment by Frank Heckes (Inactive) [ 02/May/16 ]

This seems to affect the MGT only. Failover for the other MDSes (MDTs) runs smoothly.
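
One way to confirm which target carries the embedded MGS is to inspect the device label; a sketch, using the device path from the transcripts above (run against an unmounted device):

# Print the target parameters without changing anything; the combined MGT/MDT
# should report both MDT and MGS in its flags.
tunefs.lustre --dryrun /dev/mapper/360080e50002ff4f00000026952013088p1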

Comment by Di Wang [ 02/May/16 ]

In lustre-log.2016-05-02-0105, it fails to restart the new MGC on lola-9 because an MGC with the same name already exists. But the original MGC should be MGC192.168.1.108@o2ib10, because the original MGS is on 108. Frank, could you please list the devices on lola-9 before doing the failover on lola-8? Thanks. (Or is it a failover MGC?)

00000020:00000080:16.0:1462176165.502813:0:219876:0:(obd_config.c:362:class_attach()) attach type mgc name: MGC192.168.1.109@o2ib10 uuid: 523ef026-efaa-85c3-2ecd-2c5b9b575258
00000020:00020000:16.0:1462176165.502820:0:219876:0:(genops.c:334:class_newdev()) Device MGC192.168.1.109@o2ib10 already exists at 2, won't add
00000020:00020000:16.0:1462176165.514006:0:219876:0:(obd_config.c:370:class_attach()) Cannot create device MGC192.168.1.109@o2ib10 of type mgc : -17
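
For reference, the device list requested here could be captured before the failover with something along these lines (a sketch; the output file name is hypothetical):

# On lola-9, before power-cycling lola-8: record the obd device list so any
# pre-existing MGC device is visible.
ssh lola-9 'date; lctl dl' | tee /tmp/lola-9-devices-before-failover.txt
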
Comment by Joseph Gmitter (Inactive) [ 02/May/16 ]

Hi Mike,

We are moving patch http://review.whamcloud.com/#/c/13726/ to this ticket since it was added to LU-4214 after that ticket was already closed, and it is directly related to the issue seen here. Can you please have a look?

Thanks.
Joe

Comment by Andreas Dilger [ 02/May/16 ]

Moved patch here from LU-4214 since that ticket was closed but the patch was never landed to master:

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13726
Subject: LU-8089 lwp: fix LWP client connect logic
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8abaa93afd61f6c28e15e035a1a06ecf7f6d748e

Comment by Frank Heckes (Inactive) [ 20/Jul/16 ]

The error did not occur during the soak test of build https://build.hpdd.intel.com/job/lustre-master/3406 (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160713) in a test session that is still ongoing and has already lasted 7 days.

Comment by Frank Heckes (Inactive) [ 21/Jul/16 ]

During the night the error (or at least the same symptom) occurred again for the build mentioned in the previous comment (lustre-master #3406).
Timing of events:

  • 2016-07-20 08:08:17 - failover started for MDS node lola-8 (MDT0000 == mgs)
  • 2016-07-20 08:14:57 - MDT0000 mount command started on secondary node (lola-9)
  • 2016-07-21 01:46 - mount command still not finished on lola-9; node was rebooted after capturing a stack trace via sysrq-trigger (see the sketch below)
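
The stack trace mentioned in the last item was presumably captured via the magic sysrq interface; a minimal sketch, assuming sysrq is enabled and using an arbitrary output file name:

# Dump all task stack traces to the kernel log, then save them.
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
dmesg > /tmp/lola-9-task-stacks.txt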

The messages inside the debug log files look different from those of the previous event, so this may be a new bug.

Attached files:

  • messages, console, debug log files of MDS lola-9
Comment by Gerrit Updater [ 25/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13726/
Subject: LU-8089 lwp: change lwp export only at first connect
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 00ffad7af3bc7295797efd21eb16e7deaa715f45

Comment by Peter Jones [ 25/Oct/16 ]

Landed for 2.9
