[LU-2170] osc_extent_merge()) ASSERTION( cur->oe_osclock == victim->oe_osclock) while running racer Created: 13/Oct/12 Updated: 17/Jul/17 Resolved: 13/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.1.4 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 5202 | ||||
| Description |
|
Hit this assertion running racer in a loop in a single node setup: [ 8788.630974] LustreError: 4352:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 833 previous similar messages [ 8834.780249] LustreError: 22605:0:(osc_cache.c:506:osc_extent_merge()) ASSERTION( cur->oe_osclock == victim->oe_osclock ) failed: [ 8834.780830] LustreError: 22605:0:(osc_cache.c:506:osc_extent_merge()) LBUG [ 8834.781139] Pid: 22605, comm: cat [ 8834.781364] [ 8834.781364] Call Trace: [ 8834.781773] [<ffffffffa0c9e915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 8834.782075] [<ffffffffa0c9ef27>] lbug_with_loc+0x47/0xb0 [libcfs] [ 8834.782366] [<ffffffffa04ce847>] osc_extent_merge+0x697/0x6a0 [osc] [ 8834.782657] [<ffffffff81112df1>] ? generic_file_buffered_write+0x1f1/0x300 [ 8834.782947] [<ffffffff8106fea7>] ? current_fs_time+0x27/0x30 [ 8834.783209] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [ 8834.783474] [<ffffffffa04d601c>] osc_extent_release+0xfc/0x3e0 [osc] [ 8834.783782] [<ffffffffa059a610>] ? lov_io_end_wrapper+0x0/0x100 [lov] [ 8834.784070] [<ffffffffa04c7ddf>] osc_io_end+0x1f/0x30 [osc] [ 8834.784371] [<ffffffffa11dd7a0>] cl_io_end+0x60/0x150 [obdclass] [ 8834.784661] [<ffffffffa059a701>] lov_io_end_wrapper+0xf1/0x100 [lov] [ 8834.785159] [<ffffffffa059a1ae>] lov_io_call+0x8e/0x130 [lov] [ 8834.785465] [<ffffffffa059bdec>] lov_io_end+0x4c/0x110 [lov] [ 8834.785758] [<ffffffffa11dd7a0>] cl_io_end+0x60/0x150 [obdclass] [ 8834.786013] [<ffffffffa11e28d2>] cl_io_loop+0xc2/0x1b0 [obdclass] [ 8834.786353] [<ffffffffa0a7822b>] ll_file_io_generic+0x42b/0x550 [lustre] [ 8834.786644] [<ffffffffa0a7911c>] ll_file_aio_write+0x13c/0x2c0 [lustre] [ 8834.811295] Call Trace: [ 8834.811521] [<ffffffff814f75e4>] ? panic+0xa0/0x168 [ 8834.811788] [<ffffffffa0c9ef7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs] [ 8834.812079] [<ffffffffa04ce847>] ? osc_extent_merge+0x697/0x6a0 [osc] [ 8834.812364] [<ffffffff81112df1>] ? generic_file_buffered_write+0x1f1/0x300 [ 8834.812605] [<ffffffff8106fea7>] ? current_fs_time+0x27/0x30 [ 8834.812928] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [ 8834.813301] [<ffffffffa04d601c>] ? osc_extent_release+0xfc/0x3e0 [osc] [ 8834.813615] [<ffffffffa059a610>] ? lov_io_end_wrapper+0x0/0x100 [lov] [ 8834.813903] [<ffffffffa04c7ddf>] ? osc_io_end+0x1f/0x30 [osc] [ 8834.814164] [<ffffffffa11dd7a0>] ? cl_io_end+0x60/0x150 [obdclass] [ 8834.816218] [<ffffffffa059a701>] ? lov_io_end_wrapper+0xf1/0x100 [lov] [ 8834.816513] [<ffffffffa059a1ae>] ? lov_io_call+0x8e/0x130 [lov] [ 8834.816814] [<ffffffffa059bdec>] ? lov_io_end+0x4c/0x110 [lov] [ 8834.817184] [<ffffffffa11dd7a0>] ? cl_io_end+0x60/0x150 [obdclass] [ 8834.817462] [<ffffffffa11e28d2>] ? cl_io_loop+0xc2/0x1b0 [obdclass] [ 8834.817769] [<ffffffffa0a7822b>] ? ll_file_io_generic+0x42b/0x550 [lustre] [ 8834.818078] [<ffffffffa0a7911c>] ? ll_file_aio_write+0x13c/0x2c0 [lustre] [ 8834.818385] [<ffffffffa0a79409>] ? ll_file_write+0x169/0x2a0 [lustre] [ 8834.818676] [<ffffffff8117b2e8>] ? vfs_write+0xb8/0x1a0 [ 8834.818903] [<ffffffff810d5192>] ? audit_syscall_entry+0x272/0x2a0 [ 8834.819185] [<ffffffff8117bbb1>] ? sys_write+0x51/0x90 [ 8834.819425] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b I have a crashdump for this one |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 20/Oct/12 ] |
|
patch is at: http://review.whamcloud.com/4316 |
| Comment by Cory Spitz [ 01/Nov/12 ] |
|
The patch has been effective at eliminating our reproducer. BTW, here are the details of the reproducer from our test engineer: fsx_mpi - fsx(file system exerciser) This test is open sourced and part of LTP. iozone - file system benchmark, freely available growfiles_mpi - growfiles extends and truncates a file or files. doio_mpi - doio_mpi uses the following system calls to do the I/O: The program locks the regions of the file that are being mmstress - This is a test program that performs general stress with For the commands I was using, 'ubrun' is a tool that can do a little I used a tool, FSLOAD, that randomly picks a command from the pool The commands I was using: ubrun -t -e APPS_CL -s "iozone test complete" aptrun -n 1 -M 1 APPS_CL=iozone/src/current/RUN/iozone -A2 -f ./ioz1.$$ ubrun -t -e OSTEST_CL -s "Hit time value" aptrun -n 4 OSTEST_CL=bin/growfiles_mpi -b -e 1 -i 0 -u -L 300 -d . ubrun -t -e LTPROOT_CL -o aptrun -n 4 LTPROOT_CL=testcases/bin/doio_mpi -i 50000 10000b:./do1 ubrun -s "Hit time value of" -x -e OSTEST_CL -o -T gf1 -t -D aptrun -n 8 OSTEST_CL=bin/growfiles_mpi -b -i 0 -L 300 -g 4090 -T 100 -t 408990 -C 10 -c 1000 -d dir1 -S 200 ubrun ubrun -t -D -o -T CL_LTPmtest05 -e LTPROOT_CL -p. -B "cp $LTPROOT_CL/testcases/bin/dummy ." aptrun -n 1 LTPROOT_CL=testcases/bin/mmstress ubrun -t -D -o -T CL_LTPmtest05 -e LTPROOT_CL -p. -B "cp $LTPROOT_CL/testcases/bin/dummy ." aptrun -n 1 LTPROOT_CL=testcases/bin/mmstress |
| Comment by Jinshan Xiong (Inactive) [ 05/Nov/12 ] |
|
Cory, thank you so much about the reproduce info. |
| Comment by Jinshan Xiong (Inactive) [ 13/Nov/12 ] |
|
patch has been landed to master |