[LU-8879] tests: speed up copytool_cleanup() in sanity-hsm Created: 30/Nov/16 Updated: 05/Aug/20 Resolved: 05/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | CEA | Assignee: | Quentin Bouget |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
The sanity-hsm.sh test suite spends a fair amount of time in copytool_cleanup(), which almost always waits 10 seconds for the coordinator to stop. |
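A common way to avoid a fixed wait like this is to poll for the expected state and return as soon as it is reached, keeping the old delay only as an upper bound. The sketch below is a hypothetical POSIX shell helper, `wait_for_output`. It is not taken from the patch or from the Lustre test framework. It reruns a command once per second until the command's output matches the expected value or the timeout expires.

```shell
# Hypothetical helper (not from sanity-hsm.sh): poll CMD until its output
# equals EXPECTED, giving up after TIMEOUT seconds.
# Returns 0 as soon as the output matches, 1 if the timeout expires first.
wait_for_output() {
    local cmd="$1" expected="$2" timeout="$3"
    local elapsed=0 current

    while [ "$elapsed" -lt "$timeout" ]; do
        current=$(eval "$cmd" 2>/dev/null)
        if [ "$current" = "$expected" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Final check so a state reached during the last sleep is not missed.
    current=$(eval "$cmd" 2>/dev/null)
    [ "$current" = "$expected" ]
}
```

For example, a caller could run `wait_for_output "lctl get_param -n mdt.lustre-MDT0000.hsm_control" stopped 10`. This assumes the coordinator state reads back as `stopped` once it has shut down. The call then returns within about a second of the coordinator stopping instead of always spending the full 10 seconds.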
| Comments |
| Comment by Gerrit Updater [ 30/Nov/16 ] |
|
Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: http://review.whamcloud.com/24025 |
| Comment by Quentin Bouget [ 02/Dec/16 ] |
|
On my setup (1 node), the sanity-hsm run time went from 40 min to 24 min. |
| Comment by Quentin Bouget [ 16/Dec/16 ] |
|
While testing this patch, I ran into the following error in sanity-hsm test_27a:
<4>Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-hsm ============----- Fri Dec 16 12:47:55 UTC 2016
<3>LustreError: 26613:0:(mdt_coordinator.c:857:mdt_hsm_cdt_start()) lustre-MDT0000: Coordinator already started
<3>LustreError: 26613:0:(mdt_coordinator.c:857:mdt_hsm_cdt_start()) Skipped 2 previous similar messages
<6>Lustre: lustre-MDT0000: Connection restored to 10.0.0.1@tcp (at 0@lo)
<6>Lustre: Skipped 4 previous similar messages
<4>Lustre: Mounted lustre-client
<4>Lustre: DEBUG MARKER: Using TIMEOUT=20
<4>Lustre: DEBUG MARKER: excepting tests:
<6>Lustre: Modifying parameter general.mdt.lustre-MDT0000.hsm_control in log params
<4>Lustre: DEBUG MARKER: == sanity-hsm test 27a: Remove the archive of an imported file (Operation not permitted) == 12:47:56 (1481892476)
<4>Lustre: DEBUG MARKER: sanity-hsm test_27a: @@@@@@ FAIL: import of d27a.sanity-hsm/f27a.sanity-hsm to /mnt/lustre/d27a.sanity-hsm/f27a.sanity-hsm failed
<4>Lustre: DEBUG MARKER: == sanity-hsm test complete, duration 6 sec == 12:48:01 (1481892481)
<4>Lustre: Unmounted lustre-client
<1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
<1>IP: [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4>PGD 4eb41d067 PUD 5071e1067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 1
<4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) libcfs(U) exportfs sha512_generic crc32c_intel nls_utf8 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport virtio_rng virtio_console virtio_net i2c_piix4 i2c_core sg ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>
<4>Pid: 27725, comm: lctl Not tainted 2.6.32.573.18.1.el6_lustre #1 QEMU Standard PC
<4>RIP: 0010:[<ffffffffa1155f2b>] [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4>RSP: 0018:ffff880e81803dc8 EFLAGS: 00010246
<4>RAX: 0000000000000017 RBX: ffff880e9f8dd000 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: ffffffffa183d540 RDI: ffff880e9f8dd8e0
<4>RBP: ffff880e81803dc8 R08: ffffffffa1829620 R09: 0000000000000001
<4>R10: 00007fff29d49040 R11: 0000000000000246 R12: ffff880e81803e48
<4>R13: 0000000000000006 R14: 00007fff29d4bfe1 R15: 0000000000000007
<4>FS: 00007f6d4887a700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00000000000000b8 CR3: 00000004b08c2000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lctl (pid: 27725, threadinfo ffff880e81800000, task ffff880e79cb7520)
<4>Stack:
<4> ffff880e81803e98 ffffffffa18297b5 00007f6d484587a0 ffff880505b87340
<4><d> ffff880504f205f8 ffff880e79cb7520 ffff880e81803f18 ffffffff8104f204
<4><d> ffff8809d9201e80 ffff8804e2856d80 ffff880e81803f58 0000000000000014
<4>Call Trace:
<4> [<ffffffffa18297b5>] mdt_hsm_cdt_control_seq_write+0x195/0xef0 [mdt]
<4> [<ffffffff8104f204>] ? __do_page_fault+0x1f4/0x500
<4> [<ffffffff81197cd4>] ? cp_new_stat+0xe4/0x100
<4> [<ffffffff811fdd7e>] proc_reg_write+0x7e/0xc0
<4> [<ffffffff81192238>] vfs_write+0xb8/0x1a0
<4> [<ffffffff81193726>] ? fget_light_pos+0x16/0x50
<4> [<ffffffff81192d71>] sys_write+0x51/0xb0
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>Code: 5c 41 5d c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 48 63 46 20 48 3b 34 c5 e0 60 1d a1 75 0a 48 8b 57 10 <48> 8b 04 c2 c9 c3 48 c7 c7 20 f6 1a a1 48 c7 c2 98 ad 18 a1 48
<1>RIP [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4> RSP <ffff880e81803dc8>
<4>CR2: 00000000000000b8
I don't believe this is related to my patch, although it is rather close to the code I changed. I'm posting it here in case someone sees something I'm missing. |
| Comment by Gerrit Updater [ 05/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24025/ |
| Comment by Peter Jones [ 05/May/17 ] |
|
Landed for 2.10 |