[LU-9764] recovery-double-scale_pairwise_fail test failed: mount.lustre: mount /dev/vdb at /mnt/mds3 failed: Bad file descriptor
Created: 12/Jul/17  Updated: 14/Jun/18  Resolved: 14/Jun/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Minor
Reporter: nasf (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-10027 Unable to finish mount on MDS while ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It is reported that:

Starting mds3: -o rw,user_xattr  /dev/vdb /mnt/mds3
lm0413: mount.lustre: increased /sys/block/vdb/queue/max_sectors_kb from 1024 to 2147483647
lm0413: mount.lustre: mount /dev/vdb at /mnt/mds3 failed: Bad file descriptor
pdsh@lm0417: lm0413: ssh exited with exit code 9
Start of /dev/vdb on mds3 failed 9
 recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of mds3 failed!

  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4976:error()
  = /usr/lib64/lustre/tests/test-framework.sh:1232:mount_facets()
  = /usr/lib64/lustre/tests/test-framework.sh:2583:facet_failover()
  = /usr/lib64/lustre/tests/recovery-double-scale.sh:61:reboot_recover_node()
  = /usr/lib64/lustre/tests/recovery-double-scale.sh:145:failover_pair()
  = /usr/lib64/lustre/tests/recovery-double-scale.sh:247:test_pairwise_fail()
  = /usr/lib64/lustre/tests/test-framework.sh:5236:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5274:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5078:run_test()
  = /usr/lib64/lustre/tests/recovery-double-scale.sh:303:main()
Dumping lctl log to /tmp/test_logs/1496261766/recovery-double-scale.test_pairwise_fail.*.1496262136.log
lm0411: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0412: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0415: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0418: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0420: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0413: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.
lm0419: Warning: Permanently added 'lm0417,192.168.4.17' (ECDSA) to the list of known hosts.

The log shows that:

[ 653.209594] LustreError: 11853:0:(lfsck_namespace.c:6525:lfsck_namespace_setup()) lustre-MDT0002-osd: fail to init namespace LFSCK component: rc = -9
[ 653.215672] LustreError: 11853:0:(mdd_device.c:1065:mdd_prepare()) lustre-MDD0002: failed to initialize lfsck: rc = -9
[ 653.219548] LustreError: 11853:0:(obd_mount_server.c:1834:server_fill_super()) Unable to start targets: -9
[ 653.224335] Lustre: Failing over lustre-MDT0002
[ 657.283581] Lustre: server umount lustre-MDT0002 complete
[ 657.286706] LustreError: 11853:0:(obd_mount.c:1445:lustre_fill_super()) Unable to mount (-9)
[ 657.507398] Lustre: DEBUG MARKER: recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of mds3 failed!
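The return code -9 in the log above can be tied back to the "Bad file descriptor" message printed by mount.lustre. The following is a minimal Python sketch, not part of the Lustre test framework, that extracts the function name and return code from each LustreError line and decodes the code with the standard errno table. The helper name decode_failures and the regular expression are assumptions made for illustration only.

```python
import errno
import os
import re

# Kernel log lines copied from the report above.
LOG = """\
[ 653.209594] LustreError: 11853:0:(lfsck_namespace.c:6525:lfsck_namespace_setup()) lustre-MDT0002-osd: fail to init namespace LFSCK component: rc = -9
[ 653.215672] LustreError: 11853:0:(mdd_device.c:1065:mdd_prepare()) lustre-MDD0002: failed to initialize lfsck: rc = -9
[ 653.219548] LustreError: 11853:0:(obd_mount_server.c:1834:server_fill_super()) Unable to start targets: -9
[ 657.286706] LustreError: 11853:0:(obd_mount.c:1445:lustre_fill_super()) Unable to mount (-9)
"""

# Hypothetical pattern: "(file.c:line:function())" followed, at the end of
# the line, by a negative return code (optionally closed by a parenthesis).
PATTERN = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\).*?(-\d+)\)?\s*$")


def decode_failures(log):
    """Return (function, errno name, errno text) for each failing call."""
    results = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue
        _source, _lineno, func, rc = match.groups()
        code = -int(rc)  # kernel code returns negative errno values
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((func, name, os.strerror(code)))
    return results


if __name__ == "__main__":
    for func, name, text in decode_failures(LOG):
        print(f"{func}: {name} ({text})")
```

Errno 9 is EBADF, whose standard message on Linux is "Bad file descriptor", so the error reported by mount.lustre is the same code that lfsck_namespace_setup() returned and that was passed up through mdd_prepare(), server_fill_super() and lustre_fill_super().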


 Comments   
Comment by Gerrit Updater [ 12/Jul/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/27997
Subject: LU-9764 lfsck: reset LFSCK trace file if fail to load it
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 93d2eeedf6a4c7f9fbff4649d39bb3f614fe8103

Comment by Gerrit Updater [ 14/Jun/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27997/
Subject: LU-9764 lfsck: reset LFSCK trace file if fail to load it
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 29c8a763fd7de5fd7b4bf154581f08488e8ce50e

Comment by nasf (Inactive) [ 14/Jun/18 ]

The patch has landed on master.
