[LU-3035] Failure on racer: ASSERTION( io->u.ci_rw.crw_count == count ) failed: 785408 != 4194304
Created: 26/Mar/13  Updated: 09/Apr/13  Resolved: 30/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Sarah Liu
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: LB
Environment:

client and server: lustre-master build# 1340; configured as one MDS with two MDTs


Severity: 3
Rank (Obsolete): 7405

 Description   

Hit an LBUG when running racer under DNE with one MDS and two MDTs.

client console:

Lustre: DEBUG MARKER: -----============= acceptance-small: racer ============----- Tue Mar 26 11:44:28 PDT 2013
Lustre: DEBUG MARKER: excepting tests:
LustreError: 152-6: Ignoring deprecated mount option 'acl'.
Lustre: Increasing default stripe size to min 1048576
Lustre: Layout lock feature supported.
Lustre: Mounted lustre-client
LNet: 30388:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
LNet: 30388:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: Using TIMEOUT=20
LNet: 31765:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
LNet: 31765:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-5,client-15 DURATION=900 == 11:44:37 (1364323477)
LustreError: 495:0:(file.c:930:ll_file_io_generic()) ASSERTION( io->u.ci_rw.crw_count == count ) failed: 785408 != 4194304
LustreError: 495:0:(file.c:930:ll_file_io_generic()) LBUG
Pid: 495, comm: cat

Call Trace:
 [<ffffffffa0366895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0366e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0a26882>] ll_file_io_generic+0x542/0x600 [lustre]
 [<ffffffffa0a27baf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
 [<ffffffffa0a27e9c>] ll_file_read+0x16c/0x2a0 [lustre]
 [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
 [<ffffffff8100bd6e>] ? reschedule_interrupt+0xe/0x20
 [<ffffffff81176df1>] sys_read+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Message from syslogd@client-5 at Mar 26 11:44:39 ...
 kernel:LustreError: 495:0:(file.c:930:ll_file_io_generic()) ASSERTION( io->u.ci_rw.crw_count == count ) failed: 785408 != 4194304

Kernel panic - not syncing: LBUG
Pid: 495, comm: cat Not tainted 2.6.32-279.19.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814e9541>] ? panic+0xa0/0x168
 [<ffffffffa0366eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0a26882>] ? ll_file_io_generic+0x542/0x600 [lustre]
 [<ffffffffa0a27baf>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
 [<ffffffffa0a27e9c>] ? ll_file_read+0x16c/0x2a0 [lustre]
 [<ffffffff81176cb5>] ? vfs_read+0xb5/0x1a0
 [<ffffffff8100bd6e>] ? reschedule_interrupt+0xe/0x20
 [<ffffffff81176df1>] ? sys_read+0x51/0x90
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

Message from syslogd@client-5 at Mar 26 11:44:39 ...
 kernel:LustreError: 495:0:(file.c:930:ll_file_io_generic()) LBUG

Message from sys
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu


 Comments   
Comment by Peter Jones [ 27/Mar/13 ]

Jinshan

Could you please comment on this one?

Peter

Comment by Jinshan Xiong (Inactive) [ 27/Mar/13 ]

This should be related to:

commit ae76dd2f1866c9350df8cb4e772c12cc0d3c4314
Author: Niu Yawei <niu@whamcloud.com>
Date: Thu Mar 7 23:58:11 2013 -0500

LU-2910 clio: restore iov when restart io

so I am reassigning it to Niu.

Comment by Niu Yawei (Inactive) [ 28/Mar/13 ]

This assertion is not valid, because crw_count can be changed in lov_io_rw_iter_init(). I'm going to change it to LASSERTF(io->ci_nob == 0, "%zd", io->ci_nob).
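
As an illustration only, here is a minimal user-space C sketch of how the two numbers in the console log could arise. It is not Lustre code. The struct and function names are hypothetical, and it assumes that the adjustment made in lov_io_rw_iter_init() amounts to clamping the I/O to the end of the stripe that contains the current position. The 1048576-byte stripe size and the 263168-byte starting offset are also assumptions, chosen so that the result matches the logged values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the read/write part of a cl_io.
 * Only the fields discussed in this ticket are modelled. */
struct sketch_io {
    uint64_t pos;   /* models io->u.ci_rw.crw_pos */
    size_t   count; /* models io->u.ci_rw.crw_count */
    size_t   nob;   /* models io->ci_nob (bytes transferred so far) */
};

/* ASSUMPTION: iteration init clamps the I/O so that it does not cross
 * the end of the stripe containing io->pos. */
static void sketch_iter_init(struct sketch_io *io, uint64_t stripe_size)
{
    uint64_t stripe_end = (io->pos / stripe_size + 1) * stripe_size;
    uint64_t io_end = io->pos + io->count;

    if (io_end > stripe_end)
        io->count = (size_t)(stripe_end - io->pos);
}

/* Old check: the I/O count must still equal the caller's requested count.
 * This no longer holds once iteration init has shrunk the count. */
static int old_check_holds(const struct sketch_io *io, size_t requested)
{
    return io->count == requested;
}

/* Proposed check: only require that no bytes have been transferred yet. */
static int new_check_holds(const struct sketch_io *io)
{
    return io->nob == 0;
}
```

Under these assumptions, a 4194304-byte read starting at offset 263168 is clamped to 1048576 - 263168 = 785408 bytes, so old_check_holds() fails against the requested 4194304 while new_check_holds() still passes.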

Comment by Niu Yawei (Inactive) [ 28/Mar/13 ]

http://review.whamcloud.com/5864

Comment by Peter Jones [ 30/Mar/13 ]

Landed for 2.4

Comment by Minh Diep [ 08/Apr/13 ]

Also hit this in FC18 client testing using tag 2.3.63.

client-1 login: [ 911.413689] LustreError: 152-6: Ignoring deprecated mount option 'acl'.
[ 931.554379] LustreError: 152-6: Ignoring deprecated mount option 'acl'.
[ 962.209496] LustreError: 9285:0:(file.c:2610:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000402:0x33:0x0] error: rc = -116
[ 962.546186] LustreError: 9281:0:(file.c:2610:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000402:0x33:0x0] error: rc = -116
[ 1803.114221] LustreError: 11503:0:(file.c:930:ll_file_io_generic()) ASSERTION( io->u.ci_rw.crw_count == count ) failed: 808960 != 4194304
[ 1803.127256] LustreError: 11503:0:(file.c:930:ll_file_io_generic()) LBUG
[ 1803.138194] Kernel panic - not syncing: LBUG
[ 1803.142714] Pid: 11503, comm: cat Tainted: GF O 3.6.10-4.fc18.x86_64 #1
[ 1803.150525] Call Trace:
[ 1803.153126] [<ffffffff816198db>] panic+0xc1/0x1d0
[ 1803.158204] [<ffffffffa0297e5b>] lbug_with_loc+0xab/0xc0 [libcfs]
[ 1803.164765] [<ffffffffa07fc2e0>] ll_file_io_generic+0x600/0x670 [lustre]
[ 1803.171958] [<ffffffffa07fca10>] ll_file_aio_read+0xf0/0x200 [lustre]
[ 1803.178884] [<ffffffffa07fcc35>] ll_file_read+0x115/0x220 [lustre]
[ 1803.185508] [<ffffffff81190a99>] vfs_read+0xa9/0x180
[ 1803.190853] [<ffffffff81190bba>] sys_read+0x4a/0x90
[ 1803.196099] [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Comment by Niu Yawei (Inactive) [ 09/Apr/13 ]

Minh, 2.3.63 doesn't have the above fix.
