[LU-2179] parallel-scale test_write_append_truncate: APPEND-after-trunc bad file size 1048576 != 1215563 Created: 15/Oct/12 Updated: 01/May/13 Resolved: 01/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 5217 |
| Description |
|
This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b5ed6690-167a-11e2-80d0-52540035b04c.

Lustre Tag: v2_3_0_RC3

The sub-test test_write_append_truncate failed with the following error:

r= 0: create /mnt/lustre/d0.write_append_truncate/f0.wat, max size: 3703701, seed 1350245712: No such file or directory
r= 0 l=0000: WR A 645675/0x09da2b, AP a 1219262/0x129abe, TR@ 709840/0x0ad4d0
r= 0 l=1000: WR M 954949/0x0e9245, AP m 194499/0x02f7c3, TR@ 1136109/0x1155ed
r= 0 l=1926: APPEND-after-trunc bad file size 1048576 != 1215563
r= 0 l=1926: append-after-TRUNC bad [880684-1047299]/[0xd702c-0xffb03] != 0
r= 0 l=1926: WR C 880684/0x0d702c, AP c 168263/0x029147, TR@ 1047300/0x0ffb04
000000 C C C C C C C C C C C C C C C C
*
0d7020 C C C C C C C C C C C C c c c c
0d7030 c c c c c c c c c c c c c c c c
*
100000

Info required for matching: parallel-scale write_append_truncate |
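The expected size in the failure message can be checked by hand: with the l=1926 values, truncating at 1047300 and then appending 168263 bytes should leave a 1215563-byte file. The sketch below replays that single iteration on a local file. It is a single-process illustration only, not the MPI test itself; the function names are hypothetical, and reading WR/AP/TR@ as write length, append length, and truncate offset is an assumption based on the log line.

```python
import os
import tempfile


def write_append_truncate_once(path, write_len, append_len, trunc_off):
    """Run one write/append/truncate/append pass and return the final size."""
    # WR: create the file and write write_len bytes of the upper-case fill.
    with open(path, "wb") as f:
        f.write(b"C" * write_len)
    # AP: append append_len bytes of the lower-case fill.
    with open(path, "ab") as f:
        f.write(b"c" * append_len)
    # TR@: truncate the file to trunc_off bytes.
    os.truncate(path, trunc_off)
    # Append again; the data must land at the new end of file (trunc_off).
    with open(path, "ab") as f:
        f.write(b"c" * append_len)
    return os.path.getsize(path)


def run_logged_iteration():
    """Replay the l=1926 values from the description (assumed meaning of fields)."""
    write_len, append_len, trunc_off = 880684, 168263, 1047300
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "f0.wat")
        size = write_append_truncate_once(path, write_len, append_len, trunc_off)
    # Expected size after append-after-truncate: truncate offset + append length.
    return size, trunc_off + append_len


if __name__ == "__main__":
    size, expected = run_logged_iteration()
    print(f"size={size} expected={expected}")
```

On a local filesystem the two numbers should agree at 1215563. The failing run instead reported 1048576, which is 0x100000, the final offset shown in the dump above.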
| Comments |
| Comment by Peter Jones [ 15/Oct/12 ] |
|
Oleg will be looking into this |
| Comment by Peter Jones [ 16/Oct/12 ] |
|
Jinshan is working on a fix for this issue |
| Comment by Jinshan Xiong (Inactive) [ 16/Oct/12 ] |
|
patch is at: http://review.whamcloud.com/4281 |
| Comment by Jian Yu [ 17/Oct/12 ] |
After applying patch set 2 on b2_3 (based on commit e5d5cd2) and building FC15 client packages manually, I ran the write_append_truncate test with the following parameters on the FC15 clients with RHEL6.3/x86_64 2.3.0 RC3 (build #36) servers:

== parallel-scale test write_append_truncate: write_append_truncate ================================== 03:08:12 (1350468492)
OPTIONS:
clients=client-18,client-5
write_REP=10000
write_THREADS=8
MACHINEFILE=/tmp/parallel-scale.machines
client-18
client-5
+ write_append_truncate -v -s 1350245712 -n 10000 /mnt/lustre/d0.write_append_truncate/f0.wat
+ chmod 0777 /mnt/lustre
drwxrwxrwx 4 root root 4096 Oct 17 03:08 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca orte_rsh_agent rsh:ssh -np 16 -machinefile /tmp/parallel-scale.machines write_append_truncate -v -s 1350245712 -n 10000 /mnt/lustre/d0.write_append_truncate/f0.wat "

So far, the test has completed 5 runs successfully. It is still running, with 10 runs planned in total. |
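For anyone re-running this by hand, the sketch below shows how the listed options appear to map onto the mpirun command line. The helper name is hypothetical, and deriving -np as number of clients times write_THREADS is an assumption that happens to match this run (2 clients, 8 threads, -np 16); it is not taken from the test framework source.

```python
def build_mpirun_cmd(clients, threads, seed, reps, target, machinefile):
    """Assemble the mpirun argument list seen in the log (hypothetical helper)."""
    # Assumption: process count = clients * threads per client.
    nprocs = len(clients) * threads
    return [
        "/usr/lib64/openmpi/bin/mpirun",
        "-mca", "orte_rsh_agent", "rsh:ssh",
        "-np", str(nprocs),
        "-machinefile", machinefile,
        "write_append_truncate", "-v",
        "-s", str(seed),   # fixed seed, reused from the failing run
        "-n", str(reps),   # write_REP
        target,
    ]


if __name__ == "__main__":
    cmd = build_mpirun_cmd(
        clients=["client-18", "client-5"],
        threads=8,
        seed=1350245712,
        reps=10000,
        target="/mnt/lustre/d0.write_append_truncate/f0.wat",
        machinefile="/tmp/parallel-scale.machines",
    )
    print(" ".join(cmd))
```

Reusing the seed from the original failure (1350245712) keeps the sequence of write, append, and truncate sizes the same as in the failing run.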
| Comment by Jian Yu [ 17/Oct/12 ] |
The sixth run hung somehow:

r= 0 l=6790: WR E 84295/0x014947, AP e 372625/0x05af91, TR@ 136326/0x021486
r= 0 l=6791: WR F 596548/0x091a44, AP f 686234/0x0a789a, TR@ 1005524/0x0f57d4
r= 0 l=6792: WR G 812779/0x0c66eb, AP g 926155/0x0e21cb, TR@ 1305652/0x13ec34
r= 0 l=6793: WR H 841164/0x0cd5cc, AP h 1080566/0x107cf6, TR@ 1399403/0x155a6b

Stack trace on the Client node showed that:

[206222.460480] write_append_tr S 0000000000000000 0 15826 15801 0x00000080
[206222.467627] ffff8802f3635938 0000000000000082 0000000000000000 ffff880269654560
[206222.475152] ffff8802f3635fd8 ffff8802f3635fd8 0000000000013840 0000000000013840
[206222.482681] ffff8803272f1720 ffff880269654560 0000000000000000 0000000100000000
[206222.490208] Call Trace:
[206222.492737] [<ffffffff81474d9d>] schedule_hrtimeout_range_clock+0x50/0x111
[206222.499770] [<ffffffff81080b33>] ? arch_local_irq_save+0x15/0x1b
[206222.505931] [<ffffffff8147588c>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[206222.512792] [<ffffffff8106f1d0>] ? add_wait_queue+0x3d/0x45
[206222.518520] [<ffffffff81474e71>] schedule_hrtimeout_range+0x13/0x15
[206222.524940] [<ffffffff8112fd7f>] poll_schedule_timeout+0x48/0x64
[206222.531101] [<ffffffff81130585>] do_select+0x4b1/0x4f5
[206222.536397] [<ffffffff8112fe45>] ? __pollwait+0x0/0xcc
[206222.541699] [<ffffffff8112ff11>] ? pollwake+0x0/0x54
[206222.546822] [<ffffffff8112ff11>] ? pollwake+0x0/0x54
[206222.551946] [<ffffffff8122c9d4>] ? radix_tree_lookup_slot+0xe/0x10
[206222.558287] [<ffffffff8104127e>] ? should_resched+0xe/0x2d
[206222.563928] [<ffffffff814742d0>] ? _cond_resched+0xe/0x22
[206222.569482] [<ffffffff810d9d04>] ? filemap_fault+0x20d/0x36c
[206222.575298] [<ffffffff810d8118>] ? unlock_page+0x27/0x2b
[206222.580774] [<ffffffff81059b0b>] ? current_fs_time+0x37/0x3e
[206222.586590] [<ffffffff8113485b>] ? touch_atime+0x116/0x131
[206222.592239] [<ffffffff8104127e>] ? should_resched+0xe/0x2d
[206222.597898] [<ffffffff814742d0>] ? _cond_resched+0xe/0x22
[206222.603452] [<ffffffff8112fb60>] ? might_fault+0x21/0x23
[206222.608921] [<ffffffff8113072c>] core_sys_select+0x163/0x202
[206222.614737] [<ffffffff811212b2>] ? do_sync_read+0xbf/0xff
[206222.620299] [<ffffffff8113085c>] sys_select+0x91/0xb9
[206222.625508] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[206222.631582] write_append_tr S 0000000000000000 0 15827 15801 0x00000080
[206222.638729] ffff8802f3637a58 0000000000000082 0000000000000000 ffff880269c9ae40
[206222.646255] ffff8802f3637fd8 ffff8802f3637fd8 0000000000013840 0000000000013840
[206222.653782] ffff8803272d9720 ffff880269c9ae40 ffff8802f3637681 0000000000000000
[206222.661311] Call Trace:
[206222.663840] [<ffffffff81474d9d>] schedule_hrtimeout_range_clock+0x50/0x111
[206222.670872] [<ffffffff81080b33>] ? arch_local_irq_save+0x15/0x1b
[206222.677033] [<ffffffff8147588c>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[206222.683894] [<ffffffff8106f1d0>] ? add_wait_queue+0x3d/0x45
[206222.689622] [<ffffffff81474e71>] schedule_hrtimeout_range+0x13/0x15
[206222.696042] [<ffffffff8112fd7f>] poll_schedule_timeout+0x48/0x64
[206222.702203] [<ffffffff81130d13>] do_sys_poll+0x2f4/0x386
[206222.707671] [<ffffffff8112fe45>] ? __pollwait+0x0/0xcc
[206222.712967] [<ffffffff8112ff11>] ? pollwake+0x0/0x54
[206222.718097] [<ffffffff8112ff11>] ? pollwake+0x0/0x54
[206222.723220] [<ffffffff8106f23d>] ? autoremove_wake_function+0x2b/0x3d
[206222.729813] [<ffffffff81059b0b>] ? current_fs_time+0x37/0x3e
[206222.735637] [<ffffffff8113472b>] ? file_update_time+0xf9/0x113
[206222.741633] [<ffffffff81128c90>] ? pipe_write+0x448/0x45a
[206222.747198] [<ffffffff81121099>] ? fsnotify_modify+0x5f/0x67
[206222.753019] [<ffffffff81130e48>] sys_poll+0x51/0xbb
[206222.758063] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b

Maloo report: https://maloo.whamcloud.com/test_sets/8782dbba-18c7-11e2-a6a7-52540035b04c |
| Comment by Jian Yu [ 18/Oct/12 ] |
|
Hi Xiong, |
| Comment by Jodi Levi (Inactive) [ 18/Oct/12 ] |
| Comment by Jinshan Xiong (Inactive) [ 20/Oct/12 ] |
|
patch for master is at: http://review.whamcloud.com/4317 |
| Comment by Jodi Levi (Inactive) [ 19/Apr/13 ] |
|
With Change 4295 and Change 4317 landed, can this ticket be closed? |