[LU-5544] Interop 2.5.1<->2.7 failure on test suite sanity-scrub test_11: error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/1408415864/mds1' (3): Unknown error 524 Created: 26/Aug/14 Updated: 20/Apr/15 Resolved: 20/Apr/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.1, Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 15446 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d5923d3a-2821-11e4-8135-5254006e85c2. The sub-test test_11 failed with the following error:
CMD: onyx-58vm5,onyx-58vm6.onyx.hpdd.intel.com mount | grep /mnt/lustre' '
CMD: onyx-58vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/sbin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super\" \"all -lnet -lnd -pinger\" 4
CMD: onyx-58vm3 lctl get_param -n timeout
CMD: onyx-58vm3 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: onyx-58vm6.onyx.hpdd.intel.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: onyx-58vm3 /usr/sbin/lctl get_param -n version
CMD: onyx-58vm3 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
CMD: onyx-58vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: onyx-58vm3 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
CMD: onyx-58vm3 /usr/sbin/lctl conf_param lustre.quota.ost=ug3
error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/1408415864/mds1' (3): Unknown error 524
error: mkdir: create stripe dir '/mnt/lustre/1408415864/mds1' failed
sanity-scrub test_11: @@@@@@ FAIL: (1) Fail to mkdir /mnt/lustre/1408415864/mds1 |
| Comments |
| Comment by Jodi Levi (Inactive) [ 29/Aug/14 ] |
|
Fan Yong, |
| Comment by nasf (Inactive) [ 01/Sep/14 ] |
|
The bash environment variable $tdir for sanity-scrub test 11 should be "d11.sanity-scrub", but according to the log its name became "1408415864". That is totally unexpected; it seems the client-side dcache is broken. In fact, before test 11 there were already some abnormal filenames in the logs that indicate a confused dcache:
00000080:00200000:0.0:1408415641.266775:0:16918:0:(file.c:3091:__ll_inode_revalidate_it()) VFS Op:inode=144115188193296385/33554432(ffff88007a12bb38),name=/
00000002:00010000:0.0:1408415641.266781:0:16918:0:(mdc_locks.c:1173:mdc_intent_lock()) (name: ,[0x200000007:0x1:0x0]) in obj [0x200000007:0x1:0x0], intent: lookup flags 00
00010000:00010000:0.0:1408415641.266786:0:16918:0:(ldlm_lock.c:795:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: lustre-MDT0000-mdc-ffff88007a58d800 lock: ffff88007a9dbd40/0xf204e34aa6520700 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x11 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x61a08be236e96d6d expref: -99 pid: 16916 timeout: 0 lvb_type: 0
00010000:00010000:0.0:1408415641.266791:0:16918:0:(ldlm_lock.c:1417:ldlm_lock_match()) ### matched (0 0) ns: lustre-MDT0000-mdc-ffff88007a58d800 lock: ffff88007a9dbd40/0xf204e34aa6520700 lrc: 2/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x11 rrc: 1 type: IBT flags: 0x0 nid: local remote: 0x61a08be236e96d6d expref: -99 pid: 16916 timeout: 0 lvb_type: 0
00000080:00010000:0.0:1408415641.266798:0:16918:0:(dcache.c:351:ll_lookup_finish_locks()) setting l_data to inode ffff88007a12bb38 (144115188193296385/33554432)
00000080:00010000:0.0:1408415641.266800:0:16918:0:(llite_internal.h:1573:ll_set_lock_data()) setting l_data to inode ffff88007a12bb38 (144115188193296385/33554432) for lock 0xf204e34aa6520700
00000080:00010000:0.0:1408415641.266803:0:16918:0:(dcache.c:252:ll_intent_drop_lock()) releasing lock with cookie 0xf204e34aa6520700 from it ffff88007b0bbb88
00010000:00010000:0.0:1408415641.266805:0:16918:0:(ldlm_lock.c:848:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-MDT0000-mdc-ffff88007a58d800 lock: ffff88007a9dbd40/0xf204e34aa6520700 lrc: 3/1,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x61a08be236e96d6d expref: -99 pid: 16916 timeout: 0 lvb_type: 0
00010000:00010000:0.0:1408415641.266810:0:16918:0:(ldlm_lock.c:916:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff88007a58d800 lock: ffff88007a9dbd40/0xf204e34aa6520700 lrc: 2/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x11 rrc: 1 type: IBT flags: 0x10000000000000 nid: local remote: 0x61a08be236e96d6d expref: -99 pid: 16916 timeout: 0 lvb_type: 0
00000080:00200000:0.0:1408415641.266816:0:16918:0:(file.c:3364:ll_inode_permission()) VFS Op:inode=144115188193296385/33554432(ffff88007a12bb38), inode mode 41ed mask 1
00000080:00200000:0.0:1408415641.266819:0:16918:0:(dcache.c:385:ll_revalidate_it()) VFS Op:name=d9.sanity-scrub,intent=0
00000080:00200000:0.0:1408415641.266821:0:16918:0:(file.c:3364:ll_inode_permission()) VFS Op:inode=144115205255725057/33554436(ffff88007b7f3b78), inode mode 41ed mask 1
00000080:00200000:0.0:1408415641.266823:0:16918:0:(dcache.c:385:ll_revalidate_it()) VFS Op:name=mds1,intent=0
00000080:00200000:0.0:1408415641.266824:0:16918:0:(file.c:3364:ll_inode_permission()) VFS Op:inode=144115205255725058/33554436(ffff88007a12b638), inode mode 41ed mask 1
00000080:00200000:0.0:1408415641.266828:0:16918:0:(file.c:3364:ll_inode_permission()) VFS Op:inode=144115205255725058/33554436(ffff88007a12b638), inode mode 41ed mask 1
00000080:00200000:0.0:1408415641.266832:0:16918:0:(namei.c:527:ll_lookup_it()) VFS Op:name=f9.sanity-scrub7933,dir=144115205255725058/33554436(ffff88007a12b638),intent=open|creat
00000002:00010000:0.0:1408415641.266837:0:16918:0:(mdc_locks.c:1173:mdc_intent_lock()) (name: f9.sanity-scrub7933,[0x200000400:0x1f4a:0x0]) in obj [0x200000400:0x2:0x0], intent: open|creat flags 0100103
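For scanning longer traces of this kind, the name fields can be pulled out with a short script. The following is only a sketch under stated assumptions: it relies on the "VFS Op:name=..." field and the colon-separated line header as they appear in the lines above, and the helper names (vfs_names, looks_unexpected) and the rule for what counts as "unexpected" are hypothetical, derived from this comment rather than from any Lustre tool.

```python
import re

# Matches the name field of llite VFS trace lines as seen in this ticket,
# e.g. "VFS Op:name=d9.sanity-scrub,intent=0". Lines that start with
# "VFS Op:inode=" are deliberately not matched.
NAME_RE = re.compile(r"VFS Op:name=([^,\s]*)")


def vfs_names(lines):
    """Yield (timestamp, name) for every line carrying a VFS Op name field."""
    for line in lines:
        match = NAME_RE.search(line)
        if match is None:
            continue
        # Header layout assumed from the log above:
        # subsystem:mask:cpu:timestamp:...
        fields = line.split(":", 4)
        timestamp = fields[3] if len(fields) > 4 else ""
        yield timestamp, match.group(1)


def looks_unexpected(name, suite="sanity-scrub"):
    """Hypothetical heuristic based on this comment, not on Lustre itself.

    Flags names that are purely numeric (like "1408415864") and names that
    contain the suite name followed by extra characters
    (like "f9.sanity-scrub7933").
    """
    if name.isdigit():
        return True
    return suite in name and not name.endswith(suite)
```

Feeding the lines of a client debug log to vfs_names() and filtering the results with looks_unexpected() gives a quick list of timestamps to inspect by hand; the heuristic will need adjusting for tests that legitimately create other names.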
In sanity-scrub test 9, the filename in "name: f9.sanity-scrub7933" is abnormal. I am not yet sure what caused the bad filename; it may be related to some special client-side patch(es) on Lustre-2.5.2. I will investigate more. |
| Comment by nasf (Inactive) [ 24/Dec/14 ] |
|
Sarah, have you reproduced the same failure recently with more debug logs? Thanks! |
| Comment by nasf (Inactive) [ 20/Apr/15 ] |
|
Will reopen it if we hit it again. |