[LU-4016] metadata-updates failure: attributes check failed Created: 26/Sep/13 Updated: 31/Dec/13 Resolved: 20/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Build # 1687 |
||
| Severity: | 3 |
| Rank (Obsolete): | 10782 |
| Description |
|
Test results at https://maloo.whamcloud.com/test_sessions/eb9d697c-2658-11e3-8d26-52540035b04c metadata-updates fails with "attributes check failed". Node logs don't exist, but the suite_log shows: Checking file(s) attributes ... c08: /lustre/scratch/d0.metadata-updates/c07/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c08: Wrong file attributes c09: /lustre/scratch/d0.metadata-updates/c07/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c09: Wrong file attributes pdsh@c10: c08: ssh exited with exit code 56 c07: /lustre/scratch/d0.metadata-updates/c07/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c07: Wrong file attributes pdsh@c10: c09: ssh exited with exit code 56 pdsh@c10: c07: ssh exited with exit code 56 c10: /lustre/scratch/d0.metadata-updates/c07/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c10: Wrong file attributes pdsh@c10: c10: ssh exited with exit code 56 metadata-updates : @@@@@@ FAIL: attributes check failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit() = /usr/lib64/lustre/tests/metadata-updates.sh:227:main() Although this was run during HSM testing, the copytool was not started and HSM was not enabled on the MDS, but all HSM attributes were in the test configuration files. |
| Comments |
| Comment by James Nunez (Inactive) [ 27/Sep/13 ] |
|
Another similar failure while HSM was enabled: https://maloo.whamcloud.com/test_sessions/277b4f38-2779-11e3-8a10-52540035b04c There are additional failures, after truncate and chmod: Checking file(s) attributes ... c10: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c10: Wrong file attributes c08: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c08: Wrong file attributes pdsh@c10: c10: ssh exited with exit code 56 c09: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0644 ; got : root 0 1024 0644 c09: Wrong file attributes pdsh@c10: c08: ssh exited with exit code 56 pdsh@c10: c09: ssh exited with exit code 56 metadata-updates : @@@@@@ FAIL: attributes check failed Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit() = /usr/lib64/lustre/tests/metadata-updates.sh:227:main() Dumping lctl log to /tmp/test_logs/2013-09-26/124150/metadata-updates..*.1380231630.log Part 2. file(s) attributes modification. Performing chmod 00222 ... Checking file(s) attributes ... c10: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0222 ; got : root 0 1024 0222 c10: Wrong file attributes pdsh@c10: c10: ssh exited with exit code 56 c09: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0222 ; got : root 0 1024 0222 c09: Wrong file attributes c08: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 1024 0222 ; got : root 0 1024 0222 c08: Wrong file attributes pdsh@c10: c08: ssh exited with exit code 56 pdsh@c10: c09: ssh exited with exit code 56 metadata-updates : @@@@@@ FAIL: wrong attributes after chmod Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit() = /usr/lib64/lustre/tests/metadata-updates.sh:232:main() Dumping lctl log to /tmp/test_logs/2013-09-26/124150/metadata-updates..*.1380231632.log Changing atime, mtime ... Checking atime, mtime ... Part 3. truncate file(s) to 0 size, check new file size. Truncating file(s) ... Checking file(s) attributes ... c10: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 0 0222 ; got : root 0 0 0222 c10: Wrong file attributes pdsh@c10: c10: ssh exited with exit code 56 c08: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 0 0222 ; got : root 0 0 0222 c08: Wrong file attributes pdsh@c10: c08: ssh exited with exit code 56 c09: /lustre/scratch/d0.metadata-updates/c08/testfile [ uid gid size mode ] expected : 0 0 0 0222 ; got : root 0 0 0222 c09: Wrong file attributes pdsh@c10: c09: ssh exited with exit code 56 metadata-updates : @@@@@@ FAIL: wrong attributes after truncate Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit() = /usr/lib64/lustre/tests/metadata-updates.sh:240:main() Dumping lctl log to /tmp/test_logs/2013-09-26/124150/metadata-updates..*.1380231634.log |
| Comment by James Nunez (Inactive) [ 15/Oct/13 ] |
|
Another failure at https://maloo.whamcloud.com/test_sessions/4574cfe8-35e1-11e3-b051-52540035b04c |
| Comment by Stephen Champion [ 22/Oct/13 ] |
|
I've been seeing this on rhel and sles - I think since I started testing with 1.8.5. I observed it on 2.4.1 today. Infiniband cluster, two clients, one MDS, two OSS with one OST each. Upcall is default, nsswitch point to files nis. I've never looked at it much - I just excluded the failing tests. It looks like stat is reporting %U, instead of %u. Can't say why. I looked for a bug in coreutils, but didn't see anything obvious. I've never observed this behavior outside of running acceptance. |
| Comment by Stephen Champion [ 22/Oct/13 ] |
|
for HOST in ${HOSTS//,/ } ; do n006: stat -c root %g %s 0%a /mnt/accfs/d0.metadata-updates/n006/testfile So our %u was eaten by pdsh:
Module: rcmd/exec Module: misc/netgroup Module: rcmd/ssh Less clear is what we can do about it. I do not see an option to pdsh to prohibit this processing. Escaping it as %%u works for me - but would presumably break configurations it currently works for. |
| Comment by James A Simmons [ 22/Oct/13 ] |
|
This is a old bug that has been around since the 1.6 days. I thought I was the only one that saw this problem. The way I work around it is with the below patch. diff -urp metadata-updates.sh.orig metadata-updates.sh -test_UID=$(id -u) SUMFILE=$TESTDIR/mdsum
|
| Comment by Stephen Champion [ 23/Oct/13 ] |
|
That works. I assumed that test_UID needed to be numerical, but it's not actually used anywhere else. I think we should rename it (because it's not a UID anymore) and change GID, too, for consistency. Mind if I submit a patch with those changes, based on your idea? |
| Comment by James A Simmons [ 23/Oct/13 ] |
|
Sure go right ahead |
| Comment by Stephen Champion [ 23/Oct/13 ] |
|
http://review.whamcloud.com/8052 Against master, applies cleanly to 2.4.1 and works in my environment. |
| Comment by James Nunez (Inactive) [ 05/Nov/13 ] |
|
Landed to master. |
| Comment by James A Simmons [ 18/Nov/13 ] |
|
Can we cherry pick it against 2.4.1. We will later need it cherry picked against 2.5.1 |
| Comment by James Nunez (Inactive) [ 18/Nov/13 ] |
|
The patch should apply to 2.4 and 2.5 with no problems. I'll request that it be cherry-picked against 2.4 and 2.5. |
| Comment by James Nunez (Inactive) [ 19/Nov/13 ] |
|
Reopened to apply patch to 2.5 and 2.4. 2.4 patch: http://review.whamcloud.com/#/c/8333/ |
| Comment by Jian Yu [ 23/Nov/13 ] |
|
Patch http://review.whamcloud.com/8052 was cherry-picked to Lustre b2_4 branch for 2.4.2. |
| Comment by Peter Jones [ 20/Dec/13 ] |
|
Landed for 2.4.2 and 2.6. Will land for 2.5.1 shortly |