[LU-4600] Test failure on test suite conf-sanity, subtest test_50h "some OSC imports are still not connected"
Created: 07/Feb/14  Updated: 16/Apr/20  Resolved: 16/Apr/20

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.3, Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Alex Zhuravlev
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-3529 create striped directory Resolved
is related to LU-9225 conf-sanity test_50g: /usr/bin/lfs se... Open
Severity: 3
Rank (Obsolete): 12585

Description

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/994ad81c-8fbc-11e3-92cc-52540035b04c.

The sub-test test_50h failed with the following error:

some OSC imports are still not connected

Info required for matching: conf-sanity 50h



Comments
Comment by Andreas Dilger [ 10/Feb/14 ]

It looks like this problem has only been hit once, with patch 7196 applied. Until it is seen elsewhere, we should assume that the patch causes this problem.

Comment by Bob Glossman (Inactive) [ 07/Apr/14 ]

seen in b2_4
https://maloo.whamcloud.com/test_sets/450f675e-b8b6-11e3-bc82-52540035b04c

Comment by Andreas Dilger [ 29/May/14 ]

Seen on master (2.6-pre): https://maloo.whamcloud.com/test_sets/3cd0ba26-e70e-11e3-badc-52540035b04c

Comment by James Nunez (Inactive) [ 08/Aug/14 ]

I hit this on master (2.7): https://testing.hpdd.intel.com/test_sets/2046baf4-11e3-11e4-90ac-5254006e85c2

In the client log, I see

open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: File too large

Maybe related to LU-4340?

Comment by Jian Yu [ 24/Aug/14 ]

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/84/
Distro/Arch: RHEL6.5/x86_64
Network: o2ib

https://testing.hpdd.intel.com/test_sets/9c8fcf9a-2b62-11e4-8687-5254006e85c2

Dmesg on MDS:

LustreError: 14985:0:(osp_precreate.c:719:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre-OST0000.osc.active='1'
Lustre: Permanently reactivating lustre-OST0000
LustreError: 14718:0:(lod_qos.c:946:lod_alloc_specific()) can't lstripe objid [0x200000bd0:0x5:0x0]: have 0 want 1
LustreError: 167-0: lustre-OST0000-osc-MDT0000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_50h: @@@@@@ FAIL: some OSC imports are still not connected 

Comment by Doug Oucharek (Inactive) [ 19/Sep/14 ]

Seen in master (2.7): https://testing.hpdd.intel.com/test_sets/218a95fe-3e0f-11e4-b06a-5254006e85c2

Comment by Bob Glossman (Inactive) [ 14/Mar/15 ]

another seen on b2_5:
https://testing.hpdd.intel.com/test_sets/2cdb6ca2-ca74-11e4-9330-5254006e85c2

Comment by Niu Yawei (Inactive) [ 24/Sep/15 ]

https://testing.hpdd.intel.com/test_sets/786f02ae-6255-11e5-8cee-5254006e85c2

Comment by James Nunez (Inactive) [ 06/Jan/16 ]

Two different error messages appear in the test_logs for the failures listed in this ticket.

Some of the logs listed here have the following error in the test_log:

open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: No space left on device

which may be related to LU-7309.

Logs with this failure are at:
2015-11-22 10:46:30 - https://testing.hpdd.intel.com/test_sets/7a1ec004-9134-11e5-b507-5254006e85c2
2016-01-05 07:30:40 - https://testing.hpdd.intel.com/test_sets/33dac2da-b3aa-11e5-8114-5254006e85c2
2016-02-01 12:07:07 - https://testing.hpdd.intel.com/test_sets/9a834e80-c908-11e5-aaa9-5254006e85c2

Other test_logs mentioned in this ticket have the following error:

open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: File too large
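
The two failure signatures above can be told apart mechanically when triaging many test_logs. The following is a minimal sketch, assuming each test_log is available as a list of text lines; the function name `count_open_errors` and the regular expression are illustrative only and are not part of the Lustre test framework.

```python
import re
from collections import Counter

# Matches test_log lines of the form quoted in this ticket, e.g.
#   open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: File too large
OPEN_ERROR = re.compile(r"^open\((?P<path>[^)]*)\) error: (?P<reason>.+)$")


def count_open_errors(lines):
    """Count open() failures in a test_log, grouped by error text.

    Hypothetical helper for triage; lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = OPEN_ERROR.match(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: File too large",
        "open(/mnt/lustre/d50h.conf-sanity/2/f50h.conf-sanity-0) error: No space left on device",
    ]
    for reason, count in count_open_errors(sample).items():
        print(f"{count} x {reason}")
```

Grouping by the error text keeps the "No space left on device" runs separate from the "File too large" runs, so each group can be checked against the ticket it may relate to.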

Comment by Bob Glossman (Inactive) [ 21/Apr/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/9fdfff5e-0751-11e6-9e5d-5254006e85c2

Comment by Bob Glossman (Inactive) [ 19/Jan/17 ]

another on master:
https://testing.hpdd.intel.com/test_sets/245c2112-de9e-11e6-a08b-5254006e85c2

Comment by Bob Glossman (Inactive) [ 25/Jan/17 ]

another on master:
https://testing.hpdd.intel.com/test_sets/d599a2c8-e2f0-11e6-84f4-5254006e85c2

Comment by nasf (Inactive) [ 04/Feb/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/40074146-ea7c-11e6-be3b-5254006e85c2

Comment by Andreas Dilger [ 09/Feb/17 ]

+1 on master: https://testing.hpdd.intel.com/test_sets/657e0ec0-ee71-11e6-b34d-5254006e85c2

Comment by James Nunez (Inactive) [ 16/Mar/17 ]

I’ve looked at the past 17 failures, and they all took place on review-zfs-part-2. The MDS and OSS logs all have the same errors.

In a recent failure, https://testing.hpdd.intel.com/test_sets/1a98b6a8-0686-11e7-98e7-5254006e85c2, the MDS console log shows:

15:36:09:[10168.048147] Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
15:36:09:[10168.051350] LustreError: 29278:0:(osp_precreate.c:616:osp_precreate_send()) lustre-OST0000-osc-MDT0000: can't precreate: rc = -5
15:36:09:[10168.055582] LustreError: 29278:0:(osp_precreate.c:1264:osp_precreate_thread()) lustre-OST0000-osc-MDT0000: cannot precreate objects: rc = -5
15:36:09:[10176.020524] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre-OST0000.osc.active='1'
15:36:09:[10176.166752] Lustre: Permanently reactivating lustre-OST0000
15:36:09:[10177.906960] LustreError: 167-0: lustre-OST0000-osc-MDT0000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
15:36:09:[10177.917400] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
15:36:09:[10178.926677] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
15:36:09:[10179.936774] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
15:36:09:[10181.949675] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
15:36:09:[10181.954467] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) Skipped 1 previous similar message
15:37:05:[10185.969586] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
15:37:06:[10185.976987] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) Skipped 3 previous similar messages
15:37:06:[10186.327376] LustreError: 29258:0:(lod_qos.c:1273:lod_alloc_specific()) can't lstripe objid [0x200000bd0:0x3:0x0]: have 1 want 2
15:37:06:[10186.535832] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_50h: @@@@@@ FAIL: some OSC imports are still not connected 

On the OSS, we see:

15:36:09:[10171.051192] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) lustre-OST0000: invalid precreate request for 0x0:33, last_id 65. Likely MDS last_id corruption
15:36:09:[10172.060525] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) lustre-OST0000: invalid precreate request for 0x0:33, last_id 65. Likely MDS last_id corruption
15:36:09:[10173.070515] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) lustre-OST0000: invalid precreate request for 0x0:33, last_id 65. Likely MDS last_id corruption
15:36:09:[10175.080586] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) lustre-OST0000: invalid precreate request for 0x0:33, last_id 65. Likely MDS last_id corruption
15:36:09:[10175.085663] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) Skipped 1 previous similar message
15:36:09:[10179.096640] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) lustre-OST0000: invalid precreate request for 0x0:33, last_id 65. Likely MDS last_id corruption
15:36:09:[10179.104152] LustreError: 18033:0:(ofd_dev.c:1688:ofd_create_hdl()) Skipped 3 previous similar messages
15:36:09:[10179.697594] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_50h: @@@@@@ FAIL: some OSC imports are still not connected 
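
To compare console logs across failures, the LustreError lines can be reduced to a tally of function name and return code. The following is a minimal sketch, assuming the console log is available as text lines in the format shown above; the helper name `tally_return_codes` and both regular expressions are illustrative assumptions, not an existing tool. The negated return codes are looked up as standard errno values, so rc = -5 is reported as EIO and rc = -22 as EINVAL.

```python
import errno
import re
from collections import Counter

# Matches the "pid:0:(file.c:line:function())" form of LustreError lines
# quoted in this ticket. Lines such as "LustreError: 167-0: ..." use a
# different layout and are deliberately not matched.
LUSTRE_ERROR = re.compile(
    r"LustreError: \d+:\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
RETURN_CODE = re.compile(r"rc = (-?\d+)\s*$")


def tally_return_codes(lines):
    """Count (function, errno name) pairs in console log lines.

    Hypothetical helper. "Skipped N previous similar messages" lines carry
    no rc and are ignored, so the tally undercounts rate-limited errors.
    """
    tally = Counter()
    for raw in lines:
        match = LUSTRE_ERROR.search(raw)
        if match is None:
            continue
        rc = RETURN_CODE.search(match.group("msg"))
        if rc is None:
            continue
        code = abs(int(rc.group(1)))
        name = errno.errorcode.get(code, str(code))
        tally[(match.group("func"), name)] += 1
    return tally


if __name__ == "__main__":
    sample = [
        "15:36:09:[10168.051350] LustreError: 29278:0:(osp_precreate.c:616:osp_precreate_send()) "
        "lustre-OST0000-osc-MDT0000: can't precreate: rc = -5",
        "15:36:09:[10177.917400] LustreError: 29278:0:(osp_precreate.c:914:osp_precreate_cleanup_orphans()) "
        "lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22",
    ]
    for (func, name), count in tally_return_codes(sample).items():
        print(f"{func}: {name} x {count}")
```

A tally like this makes it easier to see that the 2014 MDS log in this ticket reports the orphan cleanup failure with rc = -5, while the 2017 log reports it with rc = -22.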

Comment by Shuichi Ihara (Inactive) [ 22/Mar/17 ]

+1 on master https://testing.hpdd.intel.com/sub_tests/b7ede5e6-0ef5-11e7-9053-5254006e85c2

Comment by Shuichi Ihara (Inactive) [ 26/Mar/17 ]

+1 on master
https://testing.hpdd.intel.com/test_sessions/5f184773-253a-47ab-8546-f4d4361b7ad1

Comment by Steve Guminski (Inactive) [ 18/Apr/17 ]

Another on master:

https://testing.hpdd.intel.com/test_sessions/079dc3b3-ee4c-46f6-8d64-1d7fc536c744

Comment by John Hammond [ 21/Apr/17 ]

+1 on master

Comment by Mikhail Pershin [ 26/Apr/17 ]

https://testing.hpdd.intel.com/test_sets/c0c038c2-29df-11e7-9073-5254006e85c2 on master

Comment by Andreas Dilger [ 15/Dec/17 ]

Haven't seen this in months.

Comment by Minh Diep [ 04/Jan/18 ]

+1 on 2.10.x

https://testing.hpdd.intel.com/test_sets/d3cd7fc8-f01e-11e7-8c23-52540065bddc
