mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Lustre 2.10.0
    • Build Version: 2.9.58_22_gdb59ecb

    Description

      During a DNE test we got an error:

      # mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk11 /mnt/testfs-MDT0001
      
      mount.lustre: increased /sys/block/sde/queue/max_sectors_kb from 512 to 16384
      mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists
      # echo $?
      17
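      The exit status 17 lines up with the "File exists" text printed by mount.lustre: on Linux, errno 17 is EEXIST. A minimal way to confirm the mapping, assuming a Linux host (errno numbers are platform-specific):

```python
import errno
import os

status = 17  # exit status echoed after the failed mount above

# Look up the symbolic errno name and the libc message for that number.
name = errno.errorcode.get(status, "unknown")
print(name, "-", os.strerror(status))
```

      The same value shows up negated (-17) in the kernel messages, which suggests the mount helper is passing the kernel's error code straight through.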
      

      The messages log reported:

      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: No data found on store. Initialize space
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: new disk, initializing
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: Imperative Recovery not enabled, recovery window 300-900
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
      May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: cli-ctl-testfs-MDT0002: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:2:mdt]
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Failing over testfs-MDT0002
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 2 previous similar messages
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 11-0: testfs-MDT0000-osp-MDT0002: operation mds_disconnect to node 0@lo failed: rc = -107
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: Skipped 3 previous similar messages
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: server umount testfs-MDT0002 complete
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(genops.c:334:class_newdev()) Device MDS already exists at 5, won't add
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_config.c:366:class_attach()) Cannot create device MDS of type mds : -17
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:194:lustre_start_simple()) MDS attach error -17
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1297:server_start_targets()) failed to start MDS: -17
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -17
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1554:server_put_super()) no obd testfs-MDT0001
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-MDT0001 not registered
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
      May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:1501:lustre_fill_super()) Unable to mount (-17)
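      The line that matters in the log above is the class_newdev() message: the "MDS" device is still registered (at index 5) when the new mount tries to attach it. A rough sketch for spotting that signature in a captured messages log follows; the function name and the sample list are illustrative only, and the pattern is based solely on the message text quoted in this ticket:

```python
import re

# Pattern for the class_newdev() message quoted in the log above.
DEVICE_EXISTS = re.compile(
    r"class_newdev\(\)\) Device (?P<name>\S+) already exists at (?P<index>\d+)"
)

def find_device_collisions(lines):
    """Return (device_name, index) for each 'already exists' message."""
    hits = []
    for line in lines:
        m = DEVICE_EXISTS.search(line)
        if m:
            hits.append((m.group("name"), int(m.group("index"))))
    return hits

# A few lines copied from the log above (timestamp and host trimmed).
sample = [
    "kernel: Lustre: server umount testfs-MDT0002 complete",
    "kernel: LustreError: 28862:0:(genops.c:334:class_newdev()) Device MDS already exists at 5, won't add",
    "kernel: LustreError: 28862:0:(obd_config.c:366:class_attach()) Cannot create device MDS of type mds : -17",
]
print(find_device_collisions(sample))
```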
      

      Unfortunately, this test harness is not really intended to test Lustre itself, so it is not tooled to gather anything more useful (i.e. no Lustre debug logs) than what is shown here.

          Activity

            [LU-9577] mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists

            bhoagland Brad Hoagland (Inactive) added a comment -

            Fixed by LU-5020, which will be in the next Lustre LTS release (LU 2.10.2).

            The issue is unlikely to be seen in the field with Enterprise Edition / IML, but it can be worked around with a retry or reboot, or the patch can be provided.
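            For a test harness, the "retry" workaround could be sketched roughly as below. This is only an illustration: mount_with_retry is a hypothetical helper, the attempt count and delay are arbitrary assumptions (the comment above gives no numbers), and whether a retry succeeds depends on the stale device having been cleaned up in the meantime.

```python
import subprocess
import time

EEXIST = 17  # exit status reported by the failed mount in this ticket

def mount_with_retry(device, mountpoint, attempts=3, delay=5.0,
                     run=subprocess.call, sleep=time.sleep):
    """Try the mount, retrying only when it exits with status 17."""
    cmd = ["mount", "-t", "lustre", device, mountpoint]
    rc = None
    for attempt in range(1, attempts + 1):
        rc = run(cmd)
        if rc != EEXIST:
            return rc          # success, or a different failure: stop here
        if attempt < attempts:
            sleep(delay)       # give the old device time to go away
    return rc                  # still 17 after the last attempt

# Dry run with a stand-in for the real command: fails once with 17, then succeeds.
results = iter([17, 0])
rc = mount_with_retry("/dev/sde", "/mnt/testfs-MDT0001",
                      run=lambda cmd: next(results), sleep=lambda s: None)
print(rc)
```

            The run and sleep parameters exist only so the loop can be exercised without touching a real device; in a harness they would be left at their defaults.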

            brian Brian Murrell (Inactive) added a comment -

            What action should a user take if he runs into this issue?  How does he rectify this to get to a working filesystem state?

            hongchao.zhang Hongchao Zhang added a comment -

            This issue is not the same as LU-9034, which is caused by the unreleased MGC instance.
            It should be the same issue as LU-5020. That patch has landed on IEEL2_0 but has not landed on master.
            The updated patch is tracked at https://review.whamcloud.com/#/c/10229/
            green Oleg Drokin added a comment -

            I wonder if the original report with MDS problems was due to LU-9034 or similar. I imagine your code already has this patch though.
            Perhaps it makes sense for Hongchao to take a look?

            brian Brian Murrell (Inactive) added a comment - - edited

            The latest occurrence was during a registration mount (i.e. the initial mount for the target), where we got the following error:

            # mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk13 /mnt/testfs-OST0001
            mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 512 to 16384
            mount.lustre: mount /dev/sdc at /mnt/testfs-OST0001 failed: File exists
            # echo $rc
            17
            
            
            

            The following was reported in syslog:

            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(genops.c:334:class_newdev()) Device OSS already exists at 11, won't add
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_config.c:400:class_attach()) Cannot create device OSS of type ost : -17
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:198:lustre_start_simple()) OSS attach error -17
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1338:server_start_targets()) failed to start OSS: -17
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -17
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1576:server_put_super()) no obd testfs-OST0001
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-OST0001 not registered
            Oct  2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-17)
            
            
            

            We don't have any Lustre debug logs for this particular instance of the failure, though.
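            Even without Lustre debug logs, the syslog lines above show how -17 propagates: each LustreError carries a "(file.c:line:function())" tag, and reading the functions in order gives the path from class_newdev() up to lustre_fill_super(). A small sketch that pulls out that sequence, with the pattern inferred only from the lines quoted here and error_path as an illustrative name:

```python
import re

# Matches the "(file.c:line:function())" location tag in LustreError lines.
LOCATION = re.compile(r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)")

def error_path(lines):
    """Return the reporting function of each LustreError line, in log order."""
    funcs = []
    for line in lines:
        if "LustreError" not in line:
            continue
        m = LOCATION.search(line)
        if m:
            funcs.append(m.group("func"))
    return funcs

# First three lines of the syslog excerpt above (timestamp and host trimmed).
oss_sample = [
    "kernel: LustreError: 31962:0:(genops.c:334:class_newdev()) Device OSS already exists at 11, won't add",
    "kernel: LustreError: 31962:0:(obd_config.c:400:class_attach()) Cannot create device OSS of type ost : -17",
    "kernel: LustreError: 31962:0:(obd_mount.c:198:lustre_start_simple()) OSS attach error -17",
]
print(" -> ".join(error_path(oss_sample)))
```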

            joe.grund Joe Grund added a comment -

            We are seeing this in RC1, and ran into it today.

            Could this be triaged, please, and a remediation suggested (e.g. try again)?


            People

              hongchao.zhang Hongchao Zhang
              brian Brian Murrell (Inactive)