[LU-9837] configuration from log 'lustre-MDT0000' failed Created: 04/Aug/17  Updated: 04/Nov/17  Resolved: 04/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL7.3 running latest 2.10.51 on both servers and clients


Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With the most recent lustre code I see the following errors:

[ 1213.769569] Lustre: DEBUG MARKER: == sanity test 17m: run e2fsck against MDT which contains short/long symlink ========================= 18:37:18 (1501886238)
[ 1222.234510] Lustre: Failing over lustre-MDT0000
[ 1223.000176] format at watchdog.c:329:lcw_dispatch_stop doesn't end in newline
[ 1223.010525] Lustre: server umount lustre-MDT0000 complete
[ 1225.446028] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 1225.754332] LustreError: 6373:0:(obd_config.c:563:class_setup()) setup lustre-MDT0000 failed (-12)
[ 1225.765916] LustreError: 6373:0:(obd_config.c:1691:class_config_llog_handler()) MGC10.37.248.196@o2ib1: cfg command failed: rc = -12
[ 1225.782893] Lustre: cmd=cf003 0:lustre-MDT0000 1:lustre-MDT0000_UUID 2:0 3:lustre-MDT0000-mdtlov 4:f

[ 1225.799392] LustreError: 15c-8: MGC10.37.248.196@o2ib1: The configuration from log 'lustre-MDT0000' failed (-12). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 1225.830574] LustreError: 6322:0:(obd_mount_server.c:1370:server_start_targets()) failed to start server lustre-MDT0000: -12
[ 1225.844315] LustreError: 6322:0:(obd_mount_server.c:1863:server_fill_super()) Unable to start targets: -12
[ 1225.856498] LustreError: 6322:0:(obd_config.c:614:class_cleanup()) Device 4 not setup
[ 1226.155642] format at watchdog.c:329:lcw_dispatch_stop doesn't end in newline
[ 1226.165770] Lustre: server umount lustre-MDT0000 complete
[ 1226.173688] LustreError: 6322:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount (-12)
[ 1226.733763] Lustre: DEBUG MARKER: sanity test_17m: @@@@@@ FAIL: start failed

This happens for any MDS failover test.



 Comments   
Comment by James A Simmons [ 05/Aug/17 ]

Now that the patch for LU-9725 lets the debugfs code get further, this bug shows up.

Comment by Joseph Gmitter (Inactive) [ 07/Aug/17 ]

Hi Hongchao,

Would you be able to try and reproduce this locally?

Thanks.
Joe

Comment by James A Simmons [ 07/Aug/17 ]

It's a recovery bug brought to light by the debugfs port. You can find a complete log at:

https://testing.hpdd.intel.com/test_sets/ac7907ee-7972-11e7-bc50-5254006e85c2

Comment by Hongchao Zhang [ 10/Aug/17 ]

Hi James,
Which patches should be applied to reproduce this issue? I can't reproduce it locally with patch #28357 from LU-9725.
Thanks!

Comment by James A Simmons [ 13/Aug/17 ]

Previously, a few patches had to land before this bug would appear. With the latest landings to master, you only need to apply patch https://review.whamcloud.com/#/c/26651 to reproduce this problem.

Comment by James A Simmons [ 07/Sep/17 ]

This will be fixed by patch https://review.whamcloud.com/#/c/28818

Comment by Minh Diep [ 08/Sep/17 ]

simmonsja, I wonder whether this should be a blocker, since it hasn't actually been hit yet; it only appears because of your unlanded patch.

Comment by James A Simmons [ 04/Nov/17 ]

Patch https://review.whamcloud.com/#/c/28818, which resolves this issue, has been merged to master.
