[LU-10723] Interop 2.10.3<->2.11 sanity test_232b: OSS hung Created: 26/Feb/18  Updated: 08/Mar/18  Resolved: 08/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Quentin Bouget
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10302 hsm: obscure bug with multi-mountpoin... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_232b - Timeout occurred after 168 mins, last suite running was sanity, restarting cluster to continue tests
^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/4c514328-12aa-11e8-a6ad-52540065bddc
test_232b failed with the following error:

Timeout occurred after 168 mins, last suite running was sanity, restarting cluster to continue tests

client: lustre-master tag-2.10.58
server: 2.10.3

OSS console

[ 6415.456311] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 232b: failed data version lock should not block umount ================================ 22:43:59 \(1518648239\)
[ 6415.640059] Lustre: DEBUG MARKER: == sanity test 232b: failed data version lock should not block umount ================================ 22:43:59 (1518648239)
[ 6416.022119] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x31c
[ 6416.188210] Lustre: *** cfs_fail_loc=31c, val=0***
[ 6416.188807] LustreError: 5316:0:(ldlm_request.c:469:ldlm_cli_enqueue_local()) ### delayed lvb init failed (rc -12) ns: filter-lustre-OST0000_UUID lock: ffff88005e9f6400/0xf4fc5b448cfd123c lrc: 2/0,0 mode: --/PR res: [0x99c2:0x0:0x0].0x0 rrc: 2 type: EXT [0->0] (req 0->0) flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 5316 timeout: 0 lvb_type: 0
[ 6416.345912] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
[ 6416.882703] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
[ 6417.199191] Lustre: DEBUG MARKER: umount -d /mnt/lustre-ost1
[ 6417.366042] Lustre: Failing over lustre-OST0000
[ 6417.427264] LustreError: 24697:0:(ldlm_resource.c:1100:ldlm_resource_complain()) filter-lustre-OST0000_UUID: namespace resource [0x99c2:0x0:0x0].0x0 (ffff88005d0809c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 6417.429304] LustreError: 24697:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x99c2:0x0:0x0].0x0 (ffff88005d0809c0) refcount = 2
[ 6418.378000] Lustre: lustre-OST0000: Not available for connect from 10.2.8.127@tcp (stopping)
[ 6421.517980] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6422.430852] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6426.515485] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6426.516597] Lustre: Skipped 1 previous similar message
[ 6427.431867] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6431.515551] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6431.516800] Lustre: Skipped 2 previous similar messages
[ 6432.432858] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6436.515436] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6436.516663] Lustre: Skipped 2 previous similar messages
[ 6437.433853] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6442.434854] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6446.515570] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6446.516867] Lustre: Skipped 5 previous similar messages
[ 6452.435856] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6452.437151] LustreError: Skipped 1 previous similar message
[ 6466.515613] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6466.516623] Lustre: Skipped 11 previous similar messages
[ 6472.436880] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6472.438283] LustreError: Skipped 3 previous similar messages
[ 6501.515401] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6501.516599] Lustre: Skipped 20 previous similar messages
[ 6507.438850] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)
[ 6507.440165] LustreError: Skipped 6 previous similar messages
[ 6530.986984] LustreError: 24702:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880007849200 x1592411468623712/t0(0) o101->lustre-MDT0000-lwp-OST0000@10.2.8.127@tcp:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
[ 6530.989429] LustreError: 24702:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-OST0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x20000:0x0], rc:-5
[ 6530.990942] LustreError: 24702:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message
[ 6566.515348] Lustre: lustre-OST0000: Not available for connect from 10.2.8.125@tcp (stopping)
[ 6566.516397] Lustre: Skipped 38 previous similar messages
[ 6572.439881] LustreError: 0-0: Forced cleanup waiting for filter-lustre-OST0000_UUID namespace with 1 resources in use, (rc=-110)


 Comments   
Comment by James Nunez (Inactive) [ 28/Feb/18 ]

sanity test 232b was added by patch https://review.whamcloud.com/30477 for LU-10302 and landed on master at 2.10.57~70. We should skip this test for servers with a version number lower than 2.10.58.
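
A minimal sketch of the version gate proposed in this comment, for illustration only. The helpers ver_code and server_too_old are hypothetical names written for this example and are not the Lustre test-framework functions; the version strings are the ones quoted in this ticket. It assumes plain "major.minor.patch" versions without leading zeros or suffixes.

```shell
#!/bin/sh
# Hypothetical helper (not the Lustre test-framework implementation):
# pack "major.minor.patch" into a single integer so that versions can be
# compared numerically. Missing fields default to 0.
ver_code() {
    old_ifs=$IFS
    IFS=.
    # Split the version string on dots into $1 $2 $3.
    set -- $1
    IFS=$old_ifs
    echo $(( ${1:-0} * 65536 + ${2:-0} * 256 + ${3:-0} ))
}

# Hypothetical helper: succeeds (returns 0) when version $1 is older than $2.
server_too_old() {
    [ "$(ver_code "$1")" -lt "$(ver_code "$2")" ]
}

server_version="2.10.3"   # server version reported in this ticket
min_version="2.10.58"     # minimum server version suggested in the comment

if server_too_old "$server_version" "$min_version"; then
    echo "SKIP: test_232b needs server >= $min_version (have $server_version)"
else
    echo "RUN: test_232b"
fi
```

With the versions from this report the sketch takes the SKIP branch. Packing the fields into one integer avoids the string-comparison trap where 2.9 would sort after 2.10.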

Comment by Gerrit Updater [ 02/Mar/18 ]

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/31487
Subject: LU-10723 tests: disable sanity 232b before 2.10.58
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0c3be07197ccd9ca87fd28276c8896701c35c12c

Comment by Gerrit Updater [ 08/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31487/
Subject: LU-10723 tests: disable sanity 232b before 2.10.58
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2bf0f9873af6524b974851abf680866e06d26505

Comment by Peter Jones [ 08/Mar/18 ]

Landed for 2.11
