[LU-1912] Test failure on test suite lfsck Created: 12/Sep/12  Updated: 20/Sep/12  Resolved: 20/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4431

 Description   

This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0e62c5d6-fa97-11e1-887d-52540035b04c.

survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh check_logdir /home/autotest/.autotest/shared_dir/2012-09-08/054638-7fbcd5d87e28
lfsck : @@@@@@ FAIL: /home/autotest/.autotest/shared_dir/2012-09-08/054638-7fbcd5d87e28 isn't a shared directory
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:3642:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:3664:error()
= /usr/lib64/lustre/tests/test-framework.sh:3219:generate_db()
= /usr/lib64/lustre/tests/lfsck.sh:260:main()

It's strange to me that a Lustre version difference between server and client (b2_2 and b2_3) would cause the above issue. I tried on a cluster running b2_3 everywhere and it worked for the exact same dir (i.e. /scratch/mdiep/tmp).



 Comments   
Comment by Minh Diep [ 12/Sep/12 ]

on b2_3

check_write_access() {
    local dir=$1
    local node
    local file

    for node in $(nodes_list); do
        file=$dir/check_file.$(short_hostname $node)
        if [[ ! -f "$file" ]]; then
            # Logdir not accessible/writable from this node.
            return 1
        fi
        rm -f $file || return 1
    done
    return 0
}

on b2_2

check_write_access() {
    local dir=$1
    for node in $(nodes_list); do
        if [ ! -f "$dir/node.$(short_hostname ${node}).yml" ]; then
            # Logdir not accessible/writable from this node.
            return 1
        fi
    done
    rm -f $dir/node.*.yml
    return 0
}

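This means running b2_3 <-> b2_2 will cause this issue, because the file that one side creates differs from the file the other side expects to exist: b2_3 looks for check_file.<hostname>, while b2_2 looks for node.<hostname>.yml. I don't have a good solution for this. Let's check with Yujian.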
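The mismatch can be reproduced outside a cluster with a small sketch. The two checks below are trimmed copies of the functions quoted above; nodes_list and short_hostname are stand-in stubs (not the real test-framework.sh helpers), the host names are made up, and the assumption that a b2_2 node leaves behind node.<hostname>.yml is inferred from the b2_2 check rather than taken from the b2_2 source.

```shell
# Stand-in stubs for the test-framework.sh helpers (assumptions for this demo).
nodes_list() { echo "client1.example.com oss1.example.com"; }
short_hostname() { echo "${1%%.*}"; }

# b2_3-style check: expects check_file.<hostname> for every node.
check_write_access_b2_3() {
    local dir=$1 node file
    for node in $(nodes_list); do
        file=$dir/check_file.$(short_hostname $node)
        [ -f "$file" ] || return 1
        rm -f "$file" || return 1
    done
    return 0
}

# b2_2-style check: expects node.<hostname>.yml for every node.
check_write_access_b2_2() {
    local dir=$1 node
    for node in $(nodes_list); do
        [ -f "$dir/node.$(short_hostname ${node}).yml" ] || return 1
    done
    rm -f "$dir"/node.*.yml
    return 0
}

# Assumption: nodes running the b2_2 scripts create node.<hostname>.yml.
demo_dir=$(mktemp -d)
for node in $(nodes_list); do
    touch "$demo_dir/node.$(short_hostname $node).yml"
done

# Run the b2_3 check first, since the b2_2 check removes the marker files.
rc_b2_3=0; check_write_access_b2_3 "$demo_dir" || rc_b2_3=$?
rc_b2_2=0; check_write_access_b2_2 "$demo_dir" || rc_b2_2=$?
echo "b2_3 check: $rc_b2_3, b2_2 check: $rc_b2_2"
rm -rf "$demo_dir"
```

With only the b2_2-style marker files present, the b2_3-style check returns 1 and the b2_2-style check returns 0, which matches the "isn't a shared directory" failure in the trace.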

Comment by Peter Jones [ 14/Sep/12 ]

Yujian

Could you please look into this one?

Thanks

Peter

Comment by Jian Yu [ 17/Sep/12 ]

The patch for LU-1255 in http://review.whamcloud.com/#change,2376 needs to be landed on Lustre b2_2 branch.

Comment by Peter Jones [ 17/Sep/12 ]

Yujian

We don't have any plans to land anything to b2_2 at this time. Is there a way to just skip this test when interoperating with 2.2? If not, then this test failure will disappear when we switch to the 2.4 interop matrix.

Peter

Comment by Minh Diep [ 17/Sep/12 ]

If approved, I think we can check the versions and skip the test when the server or client is running b2_2 or b2_3.

Comment by Jian Yu [ 17/Sep/12 ]

Hi Peter,
For 2.3 client and 2.2 server, we can add Lustre version check code in 2.3 test suite to skip the test.
For 2.2 client and 2.3 server, I think there is no proper way.
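A version check of the kind described for the 2.3 client / 2.2 server case might look roughly like the sketch below. It is not the landed patch: version_to_code and should_skip_lfsck are hypothetical names, the 2.3.0 threshold is an assumption, and the server version is passed in as a plain string instead of being queried from a live server.

```shell
# Hypothetical helper: turn "major.minor.patch" into a comparable integer.
version_to_code() {
    local major minor patch rest
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    echo $(( (major << 16) | (minor << 8) | patch ))
}

# Hypothetical policy: skip lfsck when the server is older than 2.3.0,
# because such a server would use the older marker-file naming.
should_skip_lfsck() {
    local server_version=$1
    [ "$(version_to_code "$server_version")" -lt "$(version_to_code 2.3.0)" ]
}

for v in 2.2.0 2.3.0; do
    if should_skip_lfsck "$v"; then
        echo "server $v: skip lfsck"
    else
        echo "server $v: run lfsck"
    fi
done
```

A check like this only helps when the newer test suite is the one driving the run, which is consistent with the remark that the 2.2 client / 2.3 server direction has no proper fix.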

Comment by Jian Yu [ 17/Sep/12 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b2_2/17
Lustre server build: http://build.whamcloud.com/job/lustre-b2_3/19
Distro/Arch: RHEL6.3/x86_64
The same issue occurred: https://maloo.whamcloud.com/test_sets/d84f0c0e-ff5d-11e1-bce0-52540035b04c

Comment by Jian Yu [ 18/Sep/12 ]

Patch for master branch: http://review.whamcloud.com/4021
It also needs to be cherry-picked to b2_3 branch.

Comment by Peter Jones [ 20/Sep/12 ]

Landed for 2.3 and 2.4
