[LU-9577] mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists
Created: 31/May/17  Updated: 19/Oct/17  Resolved: 19/Oct/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Brian Murrell (Inactive)
Assignee: Hongchao Zhang
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

Build Version: 2.9.58_22_gdb59ecb


Issue Links:
Duplicate
duplicates LU-5020 OST can be all mounted successfully i... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During a DNE test we got an error:

# mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk11 /mnt/testfs-MDT0001

mount.lustre: increased /sys/block/sde/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists
# echo $?
17
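
For reference, the exit status 17 corresponds to the Linux errno EEXIST, whose message text is the "File exists" printed by mount.lustre; the kernel log lines for the same failure carry the value negated (-17). A minimal Python sketch for decoding such codes follows. The helper name `describe_rc` is illustrative only and is not part of Lustre or its tooling.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Return 'NAME (message)' for an errno-style return code.

    Kernel logs report errors as negative errno values (e.g. -17),
    while a shell exit status is positive, so the sign is ignored.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    print(describe_rc(17))   # exit status shown by `echo $?`
    print(describe_rc(-17))  # rc as it appears in LustreError lines
```

The same helper can be applied to other return codes in the log, such as the -107 reported for mds_disconnect; the name it prints depends on the platform's errno table.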

The messages log reported:

May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: No data found on store. Initialize space
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: new disk, initializing
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: Imperative Recovery not enabled, recovery window 300-900
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: cli-ctl-testfs-MDT0002: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:2:mdt]
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Failing over testfs-MDT0002
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 2 previous similar messages
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 11-0: testfs-MDT0000-osp-MDT0002: operation mds_disconnect to node 0@lo failed: rc = -107
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: Skipped 3 previous similar messages
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: server umount testfs-MDT0002 complete
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(genops.c:334:class_newdev()) Device MDS already exists at 5, won't add
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_config.c:366:class_attach()) Cannot create device MDS of type mds : -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:194:lustre_start_simple()) MDS attach error -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1297:server_start_targets()) failed to start MDS: -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1554:server_put_super()) no obd testfs-MDT0001
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-MDT0001 not registered
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:1501:lustre_fill_super()) Unable to mount (-17)
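
The LustreError lines above share a common prefix of the form `ID:N:(file.c:line:function())`, which makes it possible to follow the -17 as it propagates from class_newdev() up to lustre_fill_super(). The following is a minimal sketch of a parser for that prefix, based only on the lines shown in this ticket. The record type, the function name, and the reading of the first number as a process ID are assumptions, not anything defined by Lustre.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "LustreError: 28862:0:(genops.c:334:class_newdev()) message".
# The first number is assumed to be a process ID; the second is skipped.
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)


class LustreErrorRecord(NamedTuple):
    pid: int
    source_file: str
    source_line: int
    function: str
    message: str


def parse_lustre_error(line: str) -> Optional[LustreErrorRecord]:
    """Parse one syslog line; return None if it lacks the source prefix."""
    m = LUSTRE_ERROR.search(line)
    if m is None:
        return None
    return LustreErrorRecord(
        pid=int(m.group("pid")),
        source_file=m.group("file"),
        source_line=int(m.group("line")),
        function=m.group("func"),
        message=m.group("msg").strip(),
    )


if __name__ == "__main__":
    sample = (
        "May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: "
        "LustreError: 28862:0:(genops.c:334:class_newdev()) "
        "Device MDS already exists at 5, won't add"
    )
    print(parse_lustre_error(sample))
```

Lines without the source-location prefix, such as the "Skipped ... previous similar messages" summaries and the "11-0:" console message, are deliberately returned as None.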

Unfortunately, this test harness is not really intended to test Lustre itself, so it is not well equipped to gather any more useful information than what I have here (i.e. no Lustre debug logs were collected).



 Comments   
Comment by Joe Grund [ 03/Oct/17 ]

We are seeing this in RC1, and ran into it today.

Could this please be triaged and a remediation suggested (e.g. should we simply try again)?

Comment by Brian Murrell (Inactive) [ 03/Oct/17 ]

The latest occurrence was during a registration mount (i.e. the initial mount for the target), where we got the following error:

# mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk13 /mnt/testfs-OST0001
mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdc at /mnt/testfs-OST0001 failed: File exists
# echo $rc
17


The following was reported in syslog:

Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(genops.c:334:class_newdev()) Device OSS already exists at 11, won't add
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_config.c:400:class_attach()) Cannot create device OSS of type ost : -17
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:198:lustre_start_simple()) OSS attach error -17
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1338:server_start_targets()) failed to start OSS: -17
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -17
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1576:server_put_super()) no obd testfs-OST0001
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-OST0001 not registered
Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-17)


We don't have any Lustre debug logs for this particular instance of the failure, though.

Comment by Oleg Drokin [ 03/Oct/17 ]

I wonder if the original report with MDS problems was due to LU-9034 or similar. I imagine your code already has this patch, though.
Perhaps it makes sense for Hongchao to take a look?

Comment by Hongchao Zhang [ 18/Oct/17 ]

This issue is not the same as LU-9034, which is caused by the unreleased MGC instance.
It should be the same issue as LU-5020. The patch for that issue has landed on IEEL2_0 but has not yet landed on master.
The updated patch is tracked at https://review.whamcloud.com/#/c/10229/

Comment by Brian Murrell (Inactive) [ 18/Oct/17 ]

What action should a user take if he runs into this issue?  How does he rectify this to get to a working filesystem state?

Comment by Brad Hoagland (Inactive) [ 18/Oct/17 ]

Fixed by LU-5020, which will be in the next Lustre LTS release (LU 2.10.2).

The issue is unlikely to be seen in the field with Enterprise Edition / IML, but it can be worked around with a retry or a reboot, or the patch can be provided.
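
For a test harness that hits this, the retry workaround could be scripted along the following lines. This is only a sketch under stated assumptions: the ticket does not say how many retries or how long a delay is appropriate, so `attempts` and `delay_s` are placeholder values, and the function name and the injectable `run` and `sleep` parameters are hypothetical conveniences for testing. It retries only when mount exits with 17, the status seen in this ticket.

```python
import subprocess
import time
from typing import Callable, Sequence

EEXIST = 17  # exit status reported in this ticket ("File exists")


def _run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def mount_with_retry(
    device: str,
    mountpoint: str,
    attempts: int = 3,       # assumed value, not from the ticket
    delay_s: float = 5.0,    # assumed value, not from the ticket
    run: Callable[[Sequence[str]], int] = _run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `mount -t lustre`, retrying only while it exits with 17.

    Returns the exit status of the last attempt.
    """
    argv = ["mount", "-t", "lustre", device, mountpoint]
    rc = run(argv)
    for _ in range(attempts - 1):
        if rc != EEXIST:
            break  # success, or a different failure that a retry won't fix
        sleep(delay_s)
        rc = run(argv)
    return rc
```

A caller would check the returned status as usual, for example `rc = mount_with_retry("/dev/sde", "/mnt/testfs-MDT0001")`, and fall back to a reboot or the LU-5020 patch if it is still 17.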
