[LU-9577] mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists Created: 31/May/17 Updated: 19/Oct/17 Resolved: 19/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Brian Murrell (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Build Version: 2.9.58_22_gdb59ecb |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
During a DNE test we got an error:

# mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk11 /mnt/testfs-MDT0001
mount.lustre: increased /sys/block/sde/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sde at /mnt/testfs-MDT0001 failed: File exists
# echo $?
17

The messages log reported:

May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: No data found on store. Initialize space
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: new disk, initializing
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: testfs-MDT0000: Imperative Recovery not enabled, recovery window 300-900
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: ctl-testfs-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
May 31 10:51:38 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: cli-ctl-testfs-MDT0002: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:2:mdt]
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Failing over testfs-MDT0002
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 2 previous similar messages
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 11-0: testfs-MDT0000-osp-MDT0002: operation mds_disconnect to node 0@lo failed: rc = -107
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: Skipped 3 previous similar messages
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: server umount testfs-MDT0002 complete
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(genops.c:334:class_newdev()) Device MDS already exists at 5, won't add
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_config.c:366:class_attach()) Cannot create device MDS of type mds : -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:194:lustre_start_simple()) MDS attach error -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1297:server_start_targets()) failed to start MDS: -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -17
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:1554:server_put_super()) no obd testfs-MDT0001
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-MDT0001 not registered
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
May 31 10:51:39 lotus-45vm15.lotus.hpdd.lab.intel.com kernel: LustreError: 28862:0:(obd_mount.c:1501:lustre_fill_super()) Unable to mount (-17)

Unfortunately this test harness is not really intended to test Lustre itself and therefore not very well tooled to have gathered any more useful information (i.e. no Lustre debugging) than what I have here. |
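Both reports in this ticket share the same kernel log signature: a `class_newdev()` message saying a device "already exists", followed by a chain of -17 errors. A test harness could check for that signature with something like the sketch below. This is only an illustration: the function name `find_duplicate_devices` and the regular expression are hypothetical and derived from the log lines quoted in this ticket, not from any Lustre tool.

```python
import re

# Hypothetical pattern, built from the log lines quoted in this ticket:
#   "...(genops.c:334:class_newdev()) Device MDS already exists at 5, won't add"
ALREADY_EXISTS = re.compile(
    r"class_newdev\(\)\) Device (?P<name>\S+) already exists at (?P<index>\d+), won't add"
)


def find_duplicate_devices(log_lines):
    """Return (device_name, index) for each 'already exists' message found."""
    hits = []
    for line in log_lines:
        match = ALREADY_EXISTS.search(line)
        if match:
            hits.append((match.group("name"), int(match.group("index"))))
    return hits
```

Fed the syslog excerpts from this ticket, the function would report the `MDS` device at index 5 for the original failure and the `OSS` device at index 11 for the later one.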
| Comments |
| Comment by Joe Grund [ 03/Oct/17 ] |
|
We are seeing this in RC1, and ran into it today. Could this be triaged please and a remediation suggested (try again?). |
| Comment by Brian Murrell (Inactive) [ 03/Oct/17 ] |
|
Latest was during a registration (i.e. initial for the target) mount where we got the following error:

# mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk13 /mnt/testfs-OST0001
mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdc at /mnt/testfs-OST0001 failed: File exists
# echo $rc
17

The following was reported in syslog:

Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(genops.c:334:class_newdev()) Device OSS already exists at 11, won't add
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_config.c:400:class_attach()) Cannot create device OSS of type ost : -17
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:198:lustre_start_simple()) OSS attach error -17
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1338:server_start_targets()) failed to start OSS: -17
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -17
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:1576:server_put_super()) no obd testfs-OST0001
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount_server.c:135:server_deregister_mount()) testfs-OST0001 not registered
Oct 2 19:46:23 localhost kernel: LustreError: 31962:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount (-17)

We don't have any Lustre debug for this particular instance of the failure though. |
| Comment by Oleg Drokin [ 03/Oct/17 ] |
|
I wonder if the original report with MDS problems was due to |
| Comment by Hongchao Zhang [ 18/Oct/17 ] |
|
This issue is not the same as |
| Comment by Brian Murrell (Inactive) [ 18/Oct/17 ] |
|
What action should a user take if he runs into this issue? How does he rectify this to get to a working filesystem state? |
| Comment by Brad Hoagland (Inactive) [ 18/Oct/17 ] |
|
Fixed by The issue is unlikely to be seen in the field with Enterprise Edition / IML, but it can be worked around with a retry or a reboot, or the patch can be provided. |
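For harnesses that want to automate the retry workaround mentioned in this thread, a minimal sketch follows. It assumes the failure surfaces as exit status 17 (EEXIST, "File exists"), as seen in both reports above. The function name `mount_with_retry` and the attempt count and delay defaults are hypothetical choices, not values recommended anywhere in this ticket, and a retry may not succeed in every case (a reboot was the other suggested workaround).

```python
import errno
import subprocess
import time


def mount_with_retry(cmd, attempts=3, delay=5.0,
                     run=subprocess.call, sleep=time.sleep):
    """Run cmd and retry only while it exits with EEXIST (17).

    attempts and delay are illustrative defaults (assumptions).
    Returns the exit status of the last run.
    """
    status = None
    for attempt in range(1, attempts + 1):
        status = run(cmd)
        if status != errno.EEXIST:
            # Success, or a different failure that a retry is not known to help.
            return status
        if attempt < attempts:
            sleep(delay)
    return status
```

A call might look like `mount_with_retry(["mount", "-t", "lustre", device, mountpoint])`, where `device` and `mountpoint` are supplied by the caller. The `run` and `sleep` parameters exist only so the logic can be exercised without touching real devices.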