[LU-8398] conf-sanity test_32a: test_32a failed with 1 Created: 14/Jul/16  Updated: 28/Apr/20  Resolved: 28/Apr/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-9760 Conf-sanity test 32a and 32d failed w... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4f228b3e-4975-11e6-9f8e-5254006e85c2.

The sub-test test_32a failed with the following error:

test_32a failed with 1

Although this failure is in the same test as LU-7035, the errors are different.
Several errors are seen in the test log:

persistent mount opts: 
Parameters: lov.stripecount=0 lov.stripesize=1048576 mdt.identity_upcall=/usr/sbin/l_getidentity sys.timeout=20

exiting before disk write.
IOC_LIBCFS_GET_NI error 22: Invalid argument

and

CMD: onyx-64 mount -t lustre -o exclude=t32fs-OST0000 t32fs-mdt1/mdt1 /tmp/t32/mnt/mdt
onyx-64: mount.lustre: mount t32fs-mdt1/mdt1 at /tmp/t32/mnt/mdt failed: No such file or directory
onyx-64: Is the MGS specification correct?
onyx-64: Is the filesystem name correct?
onyx-64: If upgrading, is the copied client log valid? (see upgrade docs)
CMD: onyx-64 losetup -a
 conf-sanity test_32a: @@@@@@ FAIL: Mounting the MDT 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4713:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:1730:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2066:test_32a()
  = /usr/lib64/lustre/tests/test-framework.sh:4991:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5028:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4893:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2070:main()

and

CMD: onyx-64 zpool destroy t32fs-ost1
 conf-sanity test_32a: @@@@@@ FAIL: test_32a failed with 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4713:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4744:error()
  = /usr/lib64/lustre/tests/test-framework.sh:4991:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5028:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4893:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2070:main()

Info required for matching: conf-sanity 32a



 Comments   
Comment by Jian Yu [ 19/Aug/16 ]

More failure instances on master branch:
https://testing.hpdd.intel.com/sub_tests/01cfc912-6565-11e6-b5b1-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/f07ec770-6074-11e6-906c-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/21005b46-5712-11e6-b2e2-5254006e85c2

Comment by Jian Yu [ 19/Aug/16 ]

Console log on MDS:

Lustre: DEBUG MARKER: mount -t lustre -o exclude=t32fs-OST0000 t32fs-mdt1/mdt1 /tmp/t32/mnt/mdt
Lustre: MGS: Connection restored to MGC192.168.5.144@o2ib_0 (at 0@lo)
Lustre: Skipped 32 previous similar messages
LustreError: 126628:0:(ldlm_lib.c:459:client_obd_setup()) can't add initial connection
LustreError: 126628:0:(osp_dev.c:1150:osp_init0()) t32fs-MDT0001-osp-MDT0000: can't setup obd: rc = -2
LustreError: 126628:0:(obd_config.c:578:class_setup()) setup t32fs-MDT0001-osp-MDT0000 failed (-2)
LustreError: 126628:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.5.144@o2ib: cfg command failed: rc = -2 
Lustre:    cmd=cf003 0:t32fs-MDT0001-osp-MDT0000  1:t32fs-MDT0001_UUID  2:10.100.4.87@tcp
LustreError: 15c-8: MGC192.168.5.144@o2ib: The configuration from log 't32fs-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 126533:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server t32fs-MDT0000: -2
LustreError: 126533:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -2
Lustre: Failing over t32fs-MDT0000
LustreError: 126533:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount  (-2)
Lustre: DEBUG MARKER: losetup -a
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32a: @@@@@@ FAIL: Mounting the MDT

All of the failures occurred on onyx-[64-67] test nodes with IB network.
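
The numeric codes in the logs above are standard Linux errno values: rc = -2 corresponds to ENOENT, which matches the "No such file or directory" message from mount.lustre, and error 22 from IOC_LIBCFS_GET_NI is EINVAL ("Invalid argument"). A minimal sketch for decoding such codes follows; the helper name decode_rc is hypothetical and is not part of any Lustre tool.

```python
import errno
import os
from typing import Tuple


def decode_rc(rc: int) -> Tuple[str, str]:
    """Map a return code (positive or negative errno) to (symbol, message).

    Hypothetical helper for reading log lines such as "rc = -2";
    kernel code conventionally returns errno values negated.
    """
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names, e.g. 2 -> "ENOENT".
    symbol = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return symbol, os.strerror(code)


if __name__ == "__main__":
    # -2 appears in the MDS console log; 22 appears in the test log.
    for rc in (-2, 22):
        symbol, message = decode_rc(rc)
        print(f"rc={rc}: {symbol} ({message})")
```

The symbolic names are stable on Linux; the message text comes from the local C library and may differ between platforms.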

Comment by Jian Yu [ 19/Aug/16 ]

The same failure occurred previously in LU-2200, and Nathaniel created patch http://review.whamcloud.com/6197 to fix it. The failure is now occurring again.

Comment by nasf (Inactive) [ 13/Sep/16 ]

Another failure instance on master:
https://testing.hpdd.intel.com/test_sets/51d0cee2-73b7-11e6-8afd-5254006e85c2

Comment by Gu Zheng (Inactive) [ 10/Oct/16 ]

Similar instance on master, but with a different error number:
https://testing.hpdd.intel.com/test_sets/b192bcbe-8eba-11e6-a9b0-5254006e85c2

Comment by Niu Yawei (Inactive) [ 26/Oct/16 ]

The network interface on onyx-[64-67] is IB, but I'm not sure why http://review.whamcloud.com/#/c/6197/ didn't make it work. Perhaps there are multiple types of interfaces on onyx-[64-67], and the test script can't handle that case well?
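
One way to look at the interface-type question is to group the NIDs that appear in the MDS console log by LNet network type. The sketch below does this for three lines copied from the console log in this ticket. The function name nids_by_network and the regular expression are assumptions made for illustration (the pattern only handles IPv4-style NIDs such as 192.168.5.144@o2ib), and the output only shows what the log contains; it does not establish a root cause.

```python
import re
from typing import Dict, Iterable, Set

# Assumed NID shape: an IPv4 address, "@", then a network name such as tcp or o2ib.
# Suffixes like "_0" and non-IP NIDs like "0@lo" are intentionally not captured.
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})@([a-z0-9]+)")

# Lines copied from the MDS console log in this ticket.
LOG_LINES = [
    "Lustre: MGS: Connection restored to MGC192.168.5.144@o2ib_0 (at 0@lo)",
    "LustreError: 126628:0:(obd_config.c:1671:class_config_llog_handler()) "
    "MGC192.168.5.144@o2ib: cfg command failed: rc = -2",
    "Lustre:    cmd=cf003 0:t32fs-MDT0001-osp-MDT0000  "
    "1:t32fs-MDT0001_UUID  2:10.100.4.87@tcp",
]


def nids_by_network(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the addresses seen in the log, keyed by LNet network type."""
    found: Dict[str, Set[str]] = {}
    for line in lines:
        for address, network in NID_RE.findall(line):
            found.setdefault(network, set()).add(address)
    return found


if __name__ == "__main__":
    for network, addresses in sorted(nids_by_network(LOG_LINES).items()):
        print(f"{network}: {', '.join(sorted(addresses))}")
```

For these lines, the MGC address is on o2ib while the failed cf003 command refers to a tcp NID, so two network types do appear in the same configuration log.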

Comment by Bob Glossman (Inactive) [ 01/Dec/16 ]

More failures on master:
https://testing.hpdd.intel.com/test_sets/cb2a99e6-b776-11e6-be4d-5254006e85c2
https://testing.hpdd.intel.com/test_sets/8585bbcc-b82b-11e6-847d-5254006e85c2

Comment by James Casper [ 02/Feb/17 ]

In master branch, v2.9.52, b3499, the conf-sanity test_32a failure also caused 11 subsequent subtest failures (after 32a).

Comment by ZhangWei [ 13/Jul/17 ]

I found an issue (https://jira.hpdd.intel.com/browse/LU-9760) that seems very similar to this one. Can someone help with it?

Comment by Minh Diep [ 01/Feb/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/ebcce414-06e5-11e8-a10a-52540065bddc

Comment by Andreas Dilger [ 28/Apr/20 ]

Closing this old issue, which has not been reported in a long time.
