[LU-7631] conf-sanity test_82a: getstripe -c wrong: found 2, expected 3 Created: 05/Jan/16  Updated: 19/Mar/19  Resolved: 16/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.5
Fix Version/s: Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1

Type: Bug Priority: Critical
Reporter: Maloo Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-5887 conf-sanity test_82a failed: '/usr/b... Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9313c1e0-b06f-11e5-bf32-5254006e85c2.

The sub-test test_82a failed with the following error:

/usr/bin/lfs getstripe -c /mnt/lustre/d82a.conf-sanity/f82a.conf-sanity-1 wrong: found 2, expected 3

This looks like it might be related to one of the OSTs running short of precreated objects, so that OST is skipped rather than blocking the create. The MDS should allow at most 1/4 of the requested stripes to be skipped if they have no objects, rather than blocking the create indefinitely. However, it appears that this functionality was broken with the change from LOV to LOD. In this case all 3 OST objects are required anyway, since 3 * 1/4 < 1 and so not even one whole stripe could be skipped.
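
The arithmetic behind the 1/4 rule can be sketched as follows. This is a minimal illustration, not the Lustre source: the function names are hypothetical, and rounding the skippable count down with integer division is an assumption drawn from the "(3 * 1/4 < 1)" remark above.

```python
def skippable_stripes(requested: int) -> int:
    """Stripes that may be skipped: at most 1/4 of the request.

    Assumption: the fraction is rounded down (integer division).
    """
    return requested // 4


def min_required_stripes(requested: int) -> int:
    """Stripes that must be allocated for the create to proceed."""
    return requested - skippable_stripes(requested)


# Print requested / skippable / minimum required for a few stripe counts.
for requested in (1, 3, 4, 8):
    print(requested, skippable_stripes(requested), min_required_stripes(requested))
```

With 3 requested stripes nothing may be skipped, so a 2-stripe file would be wrong even if the 1/4 allowance were working as intended.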

In lod_qos_prep_create() it does not set the flags = LOV_USES_DEFAULT_STRIPE for the cases when a filesystem-wide default striping is used, as was done in the original qos_prep_create(), and as such lod_alloc_qos() requires all requested stripes to be allocated. The lod_alloc_qos() code will fall back to lod_alloc_rr() with -EAGAIN if these cannot be allocated. lod_alloc_rr() will return success if at least one OST object was allocated, which doesn't seem correct if a large number of stripes was requested, though it isn't clear why lod_alloc_rr() doesn't wait for the OSTs to come online and allocate the requested number of objects.
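
The fallback described above can be modelled with a toy sketch. It only mirrors the behaviour as stated in this ticket and is not the LOD code. The Python function names, the "available" count of OSTs with precreated objects, and the use of -ENOSPC for the "nothing allocated" case are all assumptions made for illustration.

```python
import errno


def alloc_qos(requested, available, uses_default_striping):
    """Toy model of the QOS allocator's stripe-count requirement."""
    if uses_default_striping:
        # With the default-striping flag, up to 1/4 of stripes may be skipped.
        required = requested - requested // 4
    else:
        # Without the flag, every requested stripe must be allocated.
        required = requested
    if available < required:
        return -errno.EAGAIN
    return min(requested, available)


def alloc_rr(requested, available):
    """Toy model of round-robin: success if at least one object is allocated."""
    if available < 1:
        return -errno.ENOSPC  # placeholder error code, an assumption
    return min(requested, available)


def prep_create(requested, available, uses_default_striping=False):
    """Try the QOS path first, then fall back to round-robin on -EAGAIN."""
    rc = alloc_qos(requested, available, uses_default_striping)
    if rc == -errno.EAGAIN:
        rc = alloc_rr(requested, available)
    return rc


print(prep_create(3, 2))  # stripes allocated when only 2 of 3 OSTs are ready
print(prep_create(8, 1))  # stripes allocated when only 1 of 8 OSTs is ready
```

Under these assumptions, a request for 3 stripes with only 2 usable OSTs ends up with a 2-stripe file, which matches the "found 2, expected 3" failure.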

Also, it looks like the check for lod_qos_is_usable() could be moved to the start of lod_alloc_qos() instead of after the pools are checked, since it doesn't use any of the pool information anyway.

Info required for matching: conf-sanity 82a



 Comments   
Comment by Richard Henwood (Inactive) [ 18/Mar/16 ]

Another failure on Master with review-dne-part-1:

https://testing.hpdd.intel.com/test_sets/421fd0d8-ebd6-11e5-93cc-5254006e85c2

Comment by Richard Henwood (Inactive) [ 20/Apr/16 ]

Another recent failure on Master with review-dne-part-1:

https://testing.hpdd.intel.com/test_sets/a9d74cae-057d-11e6-b5f1-5254006e85c2

Comment by Jian Yu [ 07/Nov/16 ]

One more failure on master branch in review-dne-part-1 test session:
https://testing.hpdd.intel.com/test_sets/01a4a8fc-a446-11e6-a980-5254006e85c2

Comment by nasf (Inactive) [ 10/Dec/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/2bde2290-be65-11e6-9f18-5254006e85c2

Comment by nasf (Inactive) [ 04/Feb/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/bac95502-ead8-11e6-af25-5254006e85c2

Comment by Minh Diep [ 20/Apr/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/d91b538c-2568-11e7-9de9-5254006e85c2

Comment by Andreas Dilger [ 25/Apr/17 ]

I can't see why this test is formatting a new filesystem. It should be able to run with any existing filesystem, and this should also avoid the failure, since there will not be a startup issue with the OSTs not being ready.

Comment by nasf (Inactive) [ 26/May/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/cf4a46ea-4252-11e7-bc6c-5254006e85c2

Comment by Gerrit Updater [ 06/Jun/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/27441
Subject: LU-7631 tests: wait_osts_up waits for MDS precreates
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cb7c2473cdbc2e375182e3d7de1b0fbfa6b0865a

Comment by Sebastien Buisson (Inactive) [ 22/Jun/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/48700424-56f8-11e7-8a1b-5254006e85c2

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27441/
Subject: LU-7631 tests: wait_osts_up waits for MDS precreates
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: edb0fb241bb5e0cc95c240ed977abf7f234ee045

Comment by Peter Jones [ 19/Jul/17 ]

Landed for 2.11

Comment by Bob Glossman (Inactive) [ 14/Sep/17 ]

seen again on b2_10:
https://testing.hpdd.intel.com/test_sets/000cf930-9901-11e7-b778-5254006e85c2

I think the fix https://review.whamcloud.com/27441 has only landed on master, not b2_10.

Comment by Minh Diep [ 27/Oct/17 ]

hit again on master https://testing.hpdd.intel.com/test_sets/515a8c78-babf-11e7-9abd-52540065bddc

Comment by Andreas Dilger [ 01/Dec/17 ]

This test would benefit from printing out the "$ost_indices" value and the actual file layout from "lfs getstripe" in case of error. Also, getting "lctl get_param osc.*.prealloc_*_id" on the MDS before and after the test is run would tell us if the OSTs have precreated objects, or if wait_osts_up is not enough. It would also be good to make it more clear which of the "wrong:" messages is being printed. I suspect that it is just a matter of waiting for the MDS-OSS connections to have preallocated objects.
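
As a rough sketch of how the suggested before/after "lctl get_param" output could be compared, the helper below parses "name=value" lines and reports the difference between the last and next preallocated ids for each target. The parameter names prealloc_last_id and prealloc_next_id, the device names, and all values in SAMPLE are assumptions for illustration, not output captured from a real test run. The exact relationship between the difference and the number of usable precreated objects is also not specified in this ticket.

```python
def parse_get_param(text):
    """Parse 'name=value' lines into a dict, ignoring anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            values[name] = value
    return values


def prealloc_gap(text):
    """Return {target: last_id - next_id} for targets reporting both ids."""
    last, nxt = {}, {}
    for name, value in parse_get_param(text).items():
        target, _, field = name.rpartition(".")
        if field == "prealloc_last_id":
            last[target] = int(value)
        elif field == "prealloc_next_id":
            nxt[target] = int(value)
    return {t: last[t] - nxt[t] for t in last if t in nxt}


# Hypothetical sample input; the names and numbers are made up.
SAMPLE = """\
osc.lustre-OST0000-osc-MDT0000.prealloc_last_id=65
osc.lustre-OST0000-osc-MDT0000.prealloc_next_id=34
osc.lustre-OST0001-osc-MDT0000.prealloc_last_id=1
osc.lustre-OST0001-osc-MDT0000.prealloc_next_id=2
"""

print(prealloc_gap(SAMPLE))
```

A target whose difference is small or negative would be a candidate for the OST that was skipped during the create.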

Comment by Minh Diep [ 05/Mar/18 ]

+1 on b2_10
https://testing.hpdd.intel.com/test_sets/b685aeb4-2033-11e8-b046-52540065bddc

Comment by James Nunez (Inactive) [ 20/Nov/18 ]

I took a look at all the conf-sanity test 82a failures from almost the past five months, July 1 to November 19, and this test is only failing for 2.10.5 and 2.10.6 testing. I will upload a patch with Andreas' suggestions to help with debugging in case we see this test fail again.

Comment by Gerrit Updater [ 20/Nov/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33689
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5a48d247a6e1851595df0203f597a6aee52c38e4

Comment by Gerrit Updater [ 20/Nov/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33690
Subject: LU-7631 tests: wait_osts_up waits for MDS precreates
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 64ed3b90af4f2b53b4f78f27355148ab42a9ef19

Comment by Gerrit Updater [ 05/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33690/
Subject: LU-7631 tests: wait_osts_up waits for MDS precreates
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 17065139073a070d987235db7794805d264af2b3

Comment by Gerrit Updater [ 16/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33689/
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e76683a5bd540cacd2271a969aa9acd9bf790ccf

Comment by Gerrit Updater [ 28/Jan/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34121
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 32b27b2ba4feb7e4a064345e89ddfd4f07b4a381

Comment by Gerrit Updater [ 23/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34121/
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 4ae14186ce1958373c506e3abb12b891d46e70dc

Comment by Gerrit Updater [ 25/Feb/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34294
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 973f05acfb0986eb6f152df0130ac0d670e4ae0e

Comment by Gerrit Updater [ 19/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34294/
Subject: LU-7631 tests: add debug info to conf-sanity 82a
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 3fc307a9e2453c4fa13cf329bd129e24f98548c7
