[LU-8879] tests: speed up copytool_cleanup() in sanity-hsm Created: 30/Nov/16  Updated: 05/Aug/20  Resolved: 05/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Improvement Priority: Minor
Reporter: CEA Assignee: Quentin Bouget
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

The test suite sanity-hsm.sh spends a significant amount of time in copytool_cleanup(), which almost systematically waits 10 seconds for the coordinator to stop.
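To illustrate why a fixed wait is costly, here is a minimal sketch of the alternative: poll the coordinator state at a short interval and return as soon as the expected value appears, so the full timeout is only paid when something is actually wrong. This is not the code from the patch below. The helper name `wait_for_value`, its arguments, and the 0.5 s default interval are hypothetical choices made for this example.

```shell
#!/bin/bash
# Hypothetical helper, not taken from review 24025: poll a command until its
# output equals an expected value, or give up after a timeout.
# Usage: wait_for_value "<command>" <expected> [timeout_s] [interval_s]
wait_for_value() {
	local cmd=$1
	local expected=$2
	local timeout=${3:-10}
	local interval=${4:-0.5}
	local start=$SECONDS
	local value

	while true; do
		# eval lets the command string contain pipes or redirections
		value=$(eval "$cmd" 2>/dev/null)
		# Return as soon as the expected state is reached instead of
		# sleeping for the whole timeout unconditionally.
		if [ "$value" = "$expected" ]; then
			return 0
		fi
		if (( SECONDS - start >= timeout )); then
			echo "timed out waiting for '$expected' (last value: '$value')" >&2
			return 1
		fi
		sleep "$interval"
	done
}
```

On a Lustre node this might be called as `wait_for_value "lctl get_param -n mdt.lustre-MDT0000.hsm_control" stopped 10`, assuming `stopped` is the value the coordinator reports once it has shut down. In the common case the call then returns after a fraction of a second rather than after 10 seconds.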



 Comments   
Comment by Gerrit Updater [ 30/Nov/16 ]

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: http://review.whamcloud.com/24025
Subject: LU-8879 tests: speed up copytool_cleanup() in sanity-hsm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8f50cab5d1b9013d811660e7917bb4bd04aa767b

Comment by Quentin Bouget [ 02/Dec/16 ]

On my setup (1 node): 40min --> 24min
On maloo: 1h6min --> 53min

Comment by Quentin Bouget [ 16/Dec/16 ]

While testing this patch, I ran into the following error in sanity-hsm test_27a:

<4>Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-hsm ============----- Fri Dec 16 12:47:55 UTC 2016
<3>LustreError: 26613:0:(mdt_coordinator.c:857:mdt_hsm_cdt_start()) lustre-MDT0000: Coordinator already started
<3>LustreError: 26613:0:(mdt_coordinator.c:857:mdt_hsm_cdt_start()) Skipped 2 previous similar messages
<6>Lustre: lustre-MDT0000: Connection restored to 10.0.0.1@tcp (at 0@lo)
<6>Lustre: Skipped 4 previous similar messages
<4>Lustre: Mounted lustre-client
<4>Lustre: DEBUG MARKER: Using TIMEOUT=20
<4>Lustre: DEBUG MARKER: excepting tests:
<6>Lustre: Modifying parameter general.mdt.lustre-MDT0000.hsm_control in log params
<4>Lustre: DEBUG MARKER: == sanity-hsm test 27a: Remove the archive of an imported file (Operation not permitted) == 12:47:56 (1481892476)
<4>Lustre: DEBUG MARKER: sanity-hsm test_27a: @@@@@@ FAIL: import of d27a.sanity-hsm/f27a.sanity-hsm to /mnt/lustre/d27a.sanity-hsm/f27a.sanity-hsm failed
<4>Lustre: DEBUG MARKER: == sanity-hsm test complete, duration 6 sec == 12:48:01 (1481892481)
<4>Lustre: Unmounted lustre-client
<1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
<1>IP: [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4>PGD 4eb41d067 PUD 5071e1067 PMD 0 
<4>Oops: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 1 
<4>Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) libcfs(U) exportfs sha512_generic crc32c_intel nls_utf8 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport virtio_rng virtio_console virtio_net i2c_piix4 i2c_core sg ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>
<4>Pid: 27725, comm: lctl Not tainted 2.6.32.573.18.1.el6_lustre #1 QEMU Standard PC
<4>RIP: 0010:[<ffffffffa1155f2b>]  [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4>RSP: 0018:ffff880e81803dc8  EFLAGS: 00010246
<4>RAX: 0000000000000017 RBX: ffff880e9f8dd000 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: ffffffffa183d540 RDI: ffff880e9f8dd8e0
<4>RBP: ffff880e81803dc8 R08: ffffffffa1829620 R09: 0000000000000001
<4>R10: 00007fff29d49040 R11: 0000000000000246 R12: ffff880e81803e48
<4>R13: 0000000000000006 R14: 00007fff29d4bfe1 R15: 0000000000000007
<4>FS:  00007f6d4887a700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00000000000000b8 CR3: 00000004b08c2000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lctl (pid: 27725, threadinfo ffff880e81800000, task ffff880e79cb7520)
<4>Stack:
<4> ffff880e81803e98 ffffffffa18297b5 00007f6d484587a0 ffff880505b87340
<4><d> ffff880504f205f8 ffff880e79cb7520 ffff880e81803f18 ffffffff8104f204
<4><d> ffff8809d9201e80 ffff8804e2856d80 ffff880e81803f58 0000000000000014
<4>Call Trace:
<4> [<ffffffffa18297b5>] mdt_hsm_cdt_control_seq_write+0x195/0xef0 [mdt]
<4> [<ffffffff8104f204>] ? __do_page_fault+0x1f4/0x500
<4> [<ffffffff81197cd4>] ? cp_new_stat+0xe4/0x100
<4> [<ffffffff811fdd7e>] proc_reg_write+0x7e/0xc0
<4> [<ffffffff81192238>] vfs_write+0xb8/0x1a0
<4> [<ffffffff81193726>] ? fget_light_pos+0x16/0x50
<4> [<ffffffff81192d71>] sys_write+0x51/0xb0
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>Code: 5c 41 5d c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 48 63 46 20 48 3b 34 c5 e0 60 1d a1 75 0a 48 8b 57 10 <48> 8b 04 c2 c9 c3 48 c7 c7 20 f6 1a a1 48 c7 c2 98 ad 18 a1 48 
<1>RIP  [<ffffffffa1155f2b>] lu_context_key_get+0x1b/0x60 [obdclass]
<4> RSP <ffff880e81803dc8>
<4>CR2: 00000000000000b8

I don't believe this is related to my patch, although it is rather close to the changes I made. I'm posting it here in case someone sees something I'm missing.

Comment by Gerrit Updater [ 05/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24025/
Subject: LU-8879 tests: speed up copytool_cleanup() in sanity-hsm
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a729f66cf1917b80975937568552625f1cc45271

Comment by Peter Jones [ 05/May/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:21:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.