[LU-1993] racer test: ASSERTION( it_disposition(it, 0x00800000) ) failed Created: 20/Sep/12 Updated: 03/Mar/14 Resolved: 03/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server/client b2_3-tag-2.2.96 RHEL6 external OFED build |
||
| Severity: | 3 |
| Rank (Obsolete): | 8101 |
| Description |
|
Hit this LBUG when running racer on the OFED build Lustre: DEBUG MARKER: -----============= acceptance-small: racer ============----- Wed Sep 19 20:26:36 PDT 2012 Lustre: DEBUG MARKER: excepting tests: LustreError: 152-6: Ignoring deprecated mount option 'acl'. Lustre: Increasing default stripe size to min 1048576 Lustre: Mounted lustre-client LNet: 27947:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release. LNet: 27947:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message Lustre: DEBUG MARKER: Using TIMEOUT=20 LNet: 29272:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release. LNet: 29272:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-5-ib,client-17-ib DURATION=900 == 20:27:22 (1348111642) LustreError: 18092:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115406632653105 LustreError: 8396:0:(file.c:626:ll_file_open()) ASSERTION( it_disposition(it, 0x00800000) ) failed: LustreError: 8396:0:(file.c:626:ll_file_open()) LBUG Pid: 8396, comm: ls Call Trace: [<ffffffffa0c19905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] Message from [<ffffffffa0c19f17>] lbug_with_loc+0x47/0xb0 [libcfs] syslogd@client- [<ffffffffa0af04e0>] ll_lov_recreate.clone.0+0x0/0xdd0 [lustre] 5 at Sep 19 20:3 [<ffffffffa0c25048>] ? libcfs_log_return+0x28/0x40 [libcfs] 0:46 ... kern [<ffffffffa0ad75d0>] ? ll_dir_open+0x0/0xf0 [lustre] el:LustreError: [<ffffffffa0ad7681>] ll_dir_open+0xb1/0xf0 [lustre] 8396:0:(file.c:6 [<ffffffff811789ba>] __dentry_open+0x10a/0x360 26:ll_file_open( [<ffffffff8121439f>] ? security_inode_permission+0x1f/0x30 )) ASSERTION( it [<ffffffff81178d24>] nameidata_to_filp+0x54/0x70 _disposition(it, [<ffffffff8118c230>] do_filp_open+0x6c0/0xd60 0x00800000) ) f [<ffffffff8119a460>] ? mntput_no_expire+0x30/0x110 ailed: Me [<ffffffff811982b2>] ? alloc_fd+0x92/0x160 ssage from syslo [<ffffffff81178769>] do_sys_open+0x69/0x140 gd@client-5 at S [<ffffffff81178880>] sys_open+0x20/0x30 ep 19 20:30:46 . [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b .. kernel:Lus treError: 8396:0:(file.c:626:ll_file_open()) LBUG Kernel panic - not syncing: LBUG Pid: 8396, comm: ls Not tainted 2.6.32-279.5.1.el6.x86_64 #1 Call Trace: [<ffffffff814fd24a>] ? panic+0xa0/0x168 [<ffffffffa0c19f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs] [<ffffffffa0af04e0>] ? ll_lov_recreate.clone.0+0x0/0xdd0 [lustre] [<ffffffffa0c25048>] ? libcfs_log_return+0x28/0x40 [libcfs] [<ffffffffa0ad75d0>] ? ll_dir_open+0x0/0xf0 [lustre] [<ffffffffa0ad7681>] ? ll_dir_open+0xb1/0xf0 [lustre] [<ffffffff811789ba>] ? __dentry_open+0x10a/0x360 [<ffffffff8121439f>] ? security_inode_permission+0x1f/0x30 [<ffffffff81178d24>] ? nameidata_to_filp+0x54/0x70 [<ffffffff8118c230>] ? do_filp_open+0x6c0/0xd60 [<ffffffff8119a460>] ? mntput_no_expire+0x30/0x110 [<ffffffff811982b2>] ? alloc_fd+0x92/0x160 [<ffffffff81178769>] ? do_sys_open+0x69/0x140 [<ffffffff81178880>] ? sys_open+0x20/0x30 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b Initializing cgroup subsys cpuset Initializing cgroup subsys cpu |
| Comments |
| Comment by Peter Jones [ 20/Sep/12 ] |
|
Bob will look into this one |
| Comment by Sarah Liu [ 22/Sep/12 ] |
|
I tried running racer alone and the whole sequence, cannot reprode it. I think it's ok to drop the priority since it's not reproducible Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-5-ib,client-17-ib DURATION=900 == 20:30:06 (1348284606) |
| Comment by Peter Jones [ 22/Sep/12 ] |
|
Agreed |
| Comment by Jian Yu [ 22/Oct/12 ] |
|
Lustre Tag: v2_3_0_RC6 The same issue occurred again: Syslog on Client 2 (client-28vm2) showed that: Oct 20 19:00:43 client-28vm2 kernel: [20814.731833] Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer Oct 20 19:00:43 client-28vm2 kernel: [20814.738668] Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer Oct 20 19:00:43 client-28vm2 in.rshd[7495]: connect from 10.10.4.173 (10.10.4.173) Oct 20 19:00:48 client-28vm2 rshd[7495]: pam_rhosts(rsh:auth): allowed access to root@client-28vm2.lab.whamcloud.com as root Oct 20 19:00:48 client-28vm2 in.rshd[7500]: connect from 10.10.4.173 (10.10.4.173) Oct 20 19:00:48 client-28vm2 rshd[7501]: root@client-28vm2.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" USE_OFD= MGSFSTYPE=ldiskfs MDSFSTYPE=ldiskfs OSTFSTYPE=ldiskfs FSTYPE=ldiskfs sh -c "DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer ");echo XXRETCODE:$?' Oct 20 19:00:48 client-28vm2 rshd[7500]: pam_rhosts(rsh:auth): allowed access to root@client-28vm2.lab.whamcloud.com as root Oct 20 19:00:48 client-28vm2 rshd[7508]: root@client-28vm2.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" USE_OFD= MGSFSTYPE=ldiskfs MDSFSTYPE=ldiskfs OSTFSTYPE=ldiskfs FSTYPE=ldiskfs sh -c "DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer ");echo XXRETCODE:$?' Oct 20 19:06:53 client-28vm2 kernel: [21184.731271] LustreError: 19933:0:(file.c:626:ll_file_open()) ASSERTION( it_disposition(it, 0x00800000) ) failed: The back trace was not gathered somehow. |
| Comment by Oleg Drokin [ 05/Nov/12 ] |
|
This is likely related to |
| Comment by Sarah Liu [ 04/Dec/12 ] |
|
another failure seen in 2.4 testing |
| Comment by Andreas Dilger [ 03/May/13 ] |
|
Hit this on my local test system after about 12 minutes of racer testing on v2_3_64_0-93-g441ae91: LustreError: 22208:0:(file.c:640:ll_file_open()) ASSERTION( it_disposition(it, 0x00800000) ) failed: Pid: 22208, comm: ls Call Trace: [<ffffffffa0ea8905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [<ffffffffa0ea8f07>] lbug_with_loc+0x47/0xb0 [libcfs] [<ffffffffa0f946e3>] ll_file_open+0xc23/0xc30 [lustre] [<ffffffffa0f75d40>] ? ll_dir_open+0x0/0xf0 [lustre] [<ffffffffa0f75e1b>] ll_dir_open+0xdb/0xf0 [lustre] [<ffffffff811789ba>] __dentry_open+0x10a/0x360 [<ffffffff8121443f>] ? security_inode_permission+0x1f/0x30 [<ffffffff81178d24>] nameidata_to_filp+0x54/0x70 [<ffffffff8118c230>] do_filp_open+0x6c0/0xd60 [<ffffffff81178769>] do_sys_open+0x69/0x140 [<ffffffff81178880>] sys_open+0x20/0x30 I pushed http://review.whamcloud.com/6250 with an improvement to the LASSERT() message, and enabling full debug during racer test runs. |
| Comment by Jodi Levi (Inactive) [ 03/Mar/14 ] |
|
Patch landed to Master. Please reopen ticket or create a new ticket if more work is needed. |