[LU-12925] interop: conf-sanity test 62 fails with “Restart of mds1 failed!”
Created: 01/Nov/19  Updated: 21/Nov/19  Resolved: 21/Nov/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.12.4

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: Yang Sheng
Resolution: Fixed
Votes: 0
Labels: interop
Environment:

master (2.13) servers with 2.12.3 clients


Severity: 3

 Description   

conf-sanity test_62 fails in interop testing with master servers and b2_12 clients. This test and several others started failing on 24 Oct 2019 with master 2.12.58.171 servers and 2.12.3 clients. The last time this test passed was on 17 Oct 2019, with 2.12.58.155 (build #3964) servers and 2.12.2 (build #18) clients.

The suite_log for the failure at https://testing.whamcloud.com/test_sets/2d201dd4-f9cc-11e9-be86-52540065bddc shows:

CMD: trevis-6vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   /dev/mapper/mds1_flakey /mnt/lustre-mds1
trevis-6vm7: mount.lustre: mount /dev/mapper/mds1_flakey at /mnt/lustre-mds1 failed: Invalid argument
trevis-6vm7: This may have multiple causes.
trevis-6vm7: Are the mount options correct?
trevis-6vm7: Check the syslog for more info.
Start of /dev/mapper/mds1_flakey on mds1 failed 22
 conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed! 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5864:error()
  = /usr/lib64/lustre/tests/test-framework.sh:1586:mount_facets()
  = /usr/lib64/lustre/tests/test-framework.sh:3361:facet_failover()
  = /usr/lib64/lustre/tests/test-framework.sh:3455:fail()
  = /usr/lib64/lustre/tests/test-framework.sh:4182:stopall()
  = /usr/lib64/lustre/tests/test-framework.sh:4455:formatall()
  = /usr/lib64/lustre/tests/conf-sanity.sh:108:reformat()
  = /usr/lib64/lustre/tests/conf-sanity.sh:90:reformat_and_config()
  = /usr/lib64/lustre/tests/conf-sanity.sh:4603:test_62()

The MDS (trevis-6vm7) console log shows the following errors:

[38024.851590] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
[38025.161866] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
[38025.494661] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre   /dev/mapper/mds1_flakey /mnt/lustre-mds1
[38025.715937] LDISKFS-fs (dm-3): mounted filesystem without journal. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[38025.717858] LustreError: 19847:0:(osd_handler.c:7696:osd_mount()) lustre-MDT0000-osd: device /dev/mapper/mds1_flakey is mounted w/o journal
[38025.719942] LustreError: 19847:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-22)
[38025.721511] LustreError: 19847:0:(obd_mount.c:205:lustre_start_simple()) lustre-MDT0000-osd setup error -22
[38025.723385] LustreError: 19847:0:(obd_mount_server.c:1977:server_fill_super()) Unable to start osd on /dev/mapper/mds1_flakey: -22
[38025.725326] LustreError: 19847:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-22)
[38025.955972] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed! 
[38026.141053] Lustre: DEBUG MARKER: conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed!
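
The decisive line is the osd_mount() error: the ldiskfs OSD rejects a backing device that is mounted without a journal, and that rejection surfaces as the EINVAL (-22) reported by mount.lustre. One way to confirm this on the MDS is to check whether the has_journal feature is set in the ldiskfs superblock. The sketch below is a hypothetical helper (has_journal_feature is not part of test-framework.sh); it assumes dumpe2fs from e2fsprogs is installed and that its -h output contains the usual "Filesystem features:" line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lustre test suite).
# Reads `dumpe2fs -h` output on stdin and succeeds only if the
# "Filesystem features:" line lists has_journal as a whole word.
has_journal_feature() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Example usage on the MDS, with the device path taken from the logs above:
#   dumpe2fs -h /dev/mapper/mds1_flakey 2>/dev/null | has_journal_feature \
#       || echo "mds1 has no journal; the ldiskfs OSD will refuse to mount it"
```

A failing check would indicate that mds1 was left without a journal before test_62 tried to restart it.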

When conf-sanity test_62 fails, tests 64, 65, 66, 68, and 69 also fail; tests 63 and 67 do not fail.

We have seen these tests fail only once before:
https://testing.whamcloud.com/test_sets/ee1e3636-f75d-11e9-a197-52540065bddc



 Comments   
Comment by Peter Jones [ 01/Nov/19 ]

Yang Sheng

Could you please investigate?

Thanks

Peter

Comment by Andreas Dilger [ 01/Nov/19 ]

This may be an incompatibility between the b2_12 and master test-framework.sh or conf-sanity.sh scripts (e.g. how the mount options are passed, or some option that is no longer parsed correctly). The debug kernel logs may show what the problem is.

Comment by Gerrit Updater [ 04/Nov/19 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36660
Subject: LU-12925 test: assign right initial value for test_61
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 00a274c7af2d6b7c7da377b0968919f7fbef23f6

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36660/
Subject: LU-12925 test: assign right initial value for test_61
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6ab32eedff99715b9fcbc3d3f750906a658bbd7a

Generated at Sat Feb 10 02:56:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.