[LU-4839] Test failure sanity-hsm test_60: Timed out waiting for progress update Created: 31/Mar/14 Updated: 10/Aug/15 Resolved: 23/Apr/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 22pl, HB, mq115 | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 13335 | ||||
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com> This issue relates to the following test suite run: The sub-test test_60 failed with the following error:
Info required for matching: sanity-hsm 60 |
| Comments |
| Comment by Jian Yu [ 20/Aug/14 ] |
|
While testing patch http://review.whamcloud.com/11097 on Lustre b2_5 branch with FSTYPE=zfs, sanity-hsm test 60 hit the same failure: |
| Comment by Jian Yu [ 22/Aug/14 ] |
|
While testing patch http://review.whamcloud.com/11539 on Lustre b2_5 branch with FSTYPE=zfs, sanity-hsm test 60 hit the same failure: |
| Comment by Jian Yu [ 28/Aug/14 ] |
|
While testing patch http://review.whamcloud.com/11574 on Lustre b2_5 branch with FSTYPE=zfs, sanity-hsm test 60 hit the same failure: |
| Comment by Bruno Faccini (Inactive) [ 14/Sep/14 ] |
|
+1 at https://testing.hpdd.intel.com/test_sets/03614ef8-3b88-11e4-ad5c-5254006e85c2, during auto-tests of patch http://review.whamcloud.com/11895 (for It is noteworthy that all these failures occurred with ZFS, and that each time the copytool seems to have handled the request and been in the process of archiving the file, but triggered a "bandwith control" event. |
| Comment by Peter Jones [ 23/Sep/14 ] |
|
Nathaniel, does Bruno's comment give you some insight into how to avoid this failure? Thanks, Peter |
| Comment by Nathaniel Clark [ 23/Sep/14 ] |
|
Re: "bandwith control" events: the passing tests also all seem to have them in the copytool log. Re: ZFS: this is NOT ZFS-only; it also happens during DNE testing: |
| Comment by Nathaniel Clark [ 27/Sep/14 ] |
|
This seems like it might be a timing issue where the action completes before it can be picked up by the while loop. |
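The suspected race can be illustrated with a minimal simulation. This is only a sketch: the state names, the timelines, and the `poll_for` helper are hypothetical and are not taken from sanity-hsm or from the copytool. It shows how a loop that samples state at a fixed interval can miss a short-lived state entirely.

```python
from bisect import bisect_right

def state_at(timeline, t):
    """Return the state in effect at time t; timeline is sorted (start_time, state) pairs."""
    starts = [start for start, _ in timeline]
    idx = bisect_right(starts, t) - 1
    return timeline[idx][1] if idx >= 0 else None

def poll_for(timeline, wanted, poll_interval, timeout):
    """Sample the state every poll_interval seconds.

    Returns the time at which 'wanted' was first observed, or None if the
    timeout expired without the poller ever seeing that state.
    """
    t = 0.0
    while t <= timeout:
        if state_at(timeline, t) == wanted:
            return t
        t += poll_interval
    return None

# Hypothetical timelines: (seconds since the request, state).
slow_action = [(0.0, "WAITING"), (0.5, "STARTED"), (6.0, "SUCCEED")]
fast_action = [(0.0, "WAITING"), (0.2, "STARTED"), (0.8, "SUCCEED")]

# The slow action is still STARTED at the first sample after it begins.
print(poll_for(slow_action, "STARTED", 1.0, 10.0))
# The fast action starts and finishes between two samples, so it is never seen.
print(poll_for(fast_action, "STARTED", 1.0, 10.0))
```

If this is what happens in the test, the loop would keep waiting for a state that has already come and gone, and the wait would end in a timeout even though the action succeeded.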
| Comment by Nathaniel Clark [ 27/Sep/14 ] |
|
Fix bandwidth control in lhsmtool. The active request was failing too quickly. |
| Comment by Jian Yu [ 29/Sep/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Jodi Levi (Inactive) [ 15/Oct/14 ] |
|
Patch landed to Master. |
| Comment by Jian Yu [ 25/Oct/14 ] |
|
One more instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/ea696b02-5c6a-11e4-b364-5254006e85c2 |
| Comment by Jian Yu [ 25/Oct/14 ] |
|
Just found James had back-ported it to Lustre b2_5 branch: http://review.whamcloud.com/12405 |
| Comment by Dmitry Eremin (Inactive) [ 06/Nov/14 ] |
|
Failed again in master https://testing.hpdd.intel.com/test_sets/d8ddbeb4-65bc-11e4-9c16-5254006e85c2 |
| Comment by nasf (Inactive) [ 07/Nov/14 ] |
|
Another failure instance on b2_5: |
| Comment by Andreas Dilger [ 08/Nov/14 ] |
|
Dmitry, was your master test failure based on a tree that had this fix applied? Nasf, the b2_5 patch hasn't landed yet v |
| Comment by Dmitry Eremin (Inactive) [ 10/Nov/14 ] |
|
Sure, it was on the latest master at that time. My patch is on top, and the patch for this bug is last in this list.
3 days ago Dmitry Eremin LU-5577 libcfs: fix warnings in libcfs/curproc.h 79/12379/3
4 days ago John L. Hammond LU-5814 lov: remove unused {get,set}_info handlers 45/12445/4
5 days ago Frank Zago LU-5691 hsm: remove a request from the index if not... 42/12142/2
5 days ago Bob Glossman LU-5853 build: fix el7 build regression 46/12546/2
5 days ago Dmitry Eremin LU-5383 utils: fix array index out of bounds 24/12524/2
5 days ago Dmitry Eremin LU-5577 changelog: fix comparison between signed and... 74/12474/2
5 days ago John L. Hammond LU-5814 echo: remove userspace LSM handling 46/12446/4
5 days ago John L. Hammond LU-5814 lov: remove LL_IOC_RECREATE_{FID,OBJ} 42/12442/4
5 days ago John L. Hammond LU-2675 utils: remove loadgen 95/12395/2
5 days ago Dmitry Eremin LU-5577 obdclass: change uuid_unpack arg to size_t 89/12389/2
5 days ago Dmitry Eremin LU-5577 obdclass: change lu_site->ls_purge_start to... 84/12384/2
5 days ago Dmitry Eremin LU-5577 mdd: lu_dirent_calc_size() return type to size_t 83/12383/2
5 days ago Dmitry Eremin LU-5577 ldlm: count of pools is unsigned long 04/12304/3
5 days ago Wei Liu LU-5387 test: Skip sanity test_239 if MDS version older... 41/12241/2
5 days ago Johann Lombardi LU-5668 test: enable ior data consistency check 58/12058/9
5 days ago Jian Yu LU-5443 lustre: replace direct HZ access with kernel... 52/12052/8
5 days ago Liang Zhen LU-5545 ptlrpc: false alarm in AT network latency measuring 18/12018/5
5 days ago Andriy Skulysh LU-5651: ptlrpc: fix import state during replay 15/12015/4
5 days ago Dmitry Eremin LU-5591 lod: fix Null pointer dereference in lod_ah_init() 70/11770/8
5 days ago Dmitry Eremin LU-5589 obdclass: fix NULL pointer dereference 69/11769/5
5 days ago John L. Hammond LU-2675 obd: cleanup struct md_op_data and uses 34/11734/4
5 days ago Emoly Liu LU-4167 tests: correct version check to enable ff_convert 56/11556/6
5 days ago Dmitry Eremin LU-5577 mdc: fix comparison between signed and unsigned 79/11379/17
5 days ago Bruno Faccini LU-4176 tests: re-enable sanity-hsm/test_31a 77/9577/5
5 days ago Niu Yawei LU-5807 qos: enable QOS_DEBUG() 34/12434/3
5 days ago Jian Yu LU-4856 obdclass: check val in proc_max_dirty_pages_in_mb() 69/12269/4
5 days ago Alexander.Boyko LU-5380 at: net AT after connect 55/11155/2
5 days ago Niu Yawei LU-4810 utils: print messages when set tunables 65/9865/5
5 days ago Jinshan Xiong LU-3259 clio: cl_lock simplification 58/10858/15
6 days ago Amir Shehata LU-4181 tests: cleanup lustre before starting lnet... 69/12469/3
6 days ago Jian Yu LU-5079 tests: decrease at_max value in replay-vbr... 90/12490/2
6 days ago Bob Glossman LU-5825 kernel: kernel update [RHEL7 3.10.0-123.9.2... 78/12478/3
6 days ago Bob Glossman LU-5795 kernel: kernel update [SLES11 SP3 3.0.101-0.40] 01/12401/2
6 days ago Kit Westneat LU-5842 tests: reduce time to run sanity-sec tests... 32/12532/2
7 days ago Lai Siyao LU-3270 statahead: race in start/stop statahead 66/9666/8
7 days ago Lai Siyao LU-2272 statahead: ll_intent_drop_lock() called in... 65/9665/9
7 days ago Lai Siyao LU-3270 statahead: use dcache-like interface for sa... 64/9664/11
7 days ago Joshua Walgenbach LU-4647 nodemap: add mapping functionality 99/9299/44
9 days ago Liang Zhen LU-5435 lnet: lustre network latency simulation 09/11409/14
9 days ago Jinshan Xiong LU-4665 utils: lfs setstripe to specify OSTs 83/9383/29
9 days ago Liang Zhen LU-5435 lnet: LNet drop rule implementation 14/11314/10
9 days ago Jinshan Xiong LU-4198 clio: generalize cl_sync_io 56/8656/18
9 days ago Liang Zhen LU-5435 libcfs: copy out ioctl inline buffer 13/11313/14
9 days ago Fan Yong LU-5519 lfsck: repair slave LMV for striped directory 48/11848/15
9 days ago Henri Doreau LU-3613 llite: Add ioctl to get parent fids from link EA. 69/7069/17
10 days ago Fan Yong LU-5519 lfsck: repair master LMV for striped directory 47/11847/12
10 days ago Fan Yong LU-5519 lfsck: repair bad name hash for striped directory 46/11846/13
10 days ago Yang Sheng LU-5584 llite: ensure all data flush out when umount 03/12103/10
10 days ago Oleg Drokin Revert "LU-5568 lnet: fix kernel crash when network... 02/12502/2
11 days ago Wang Shilong LU-5568 lnet: fix kernel crash when network failed... 18/11718/11
11 days ago Frank Zago LU-5756 hsm: add missing return code in llapi_hsm_copyt... 14/12314/6
11 days ago Nathaniel Clark LU-5743 build: Update to zfs/spl 0.6.3-1.1 73/12273/3
11 days ago Bob Glossman LU-5641 tests: ensure user daemon is in group bin 44/12044/4
11 days ago Niu Yawei LU-5287 export: hold exp_lock when modify exp_flags 71/11871/3
11 days ago Minh Diep LU-5674 test: print spl debug info 80/11580/18
11 days ago Vitaly Fertman LU-4942 at: per-export lock callback timeout 36/9336/9
11 days ago Patrick Farrell LU-5626 ldiskfs: update non-htree dotdot in rename 39/11939/11
11 days ago Johann Lombardi LU-5675 quota: correctly set II_FL_NONUNQ in dt_index_r... 74/12074/3
11 days ago Fan Yong LU-5519 lfsck: LFSCK code framework adjustment (2) 45/11845/13
11 days ago Fan Yong LU-5518 lfsck: recover orphans from backend lost+found 36/11536/25
11 days ago Fan Yong LU-5517 lfsck: repair invalid nlink count 16/11516/29
11 days ago Niu Yawei LU-5727 ldlm: revert changes to ldlm_cancel_aged_policy() 48/12448/3
11 days ago Niu Yawei LU-5777 quota: reserve enough credits for setattr 61/12361/3
13 days ago Jian Yu LU-5606 tests: add version check codes to conf-sanity... 76/12376/2
13 days ago Henri Doreau LU-1996 lustre: Flexible changelog format. 60/4060/25
2014-10-25 Fan Yong LU-5624 tests: ignore bad lfsck performance for ZFS... 22/12322/2
2014-10-25 John L. Hammond LU-2675 llog: remove obd_llog_init() and obd_llod_finish() 81/11781/2
2014-10-25 John L. Hammond LU-2675 osc: remove obsolete llog handling 74/11774/4
2014-10-25 John L. Hammond LU-2675 lustre: remove linux/obd_support.h 31/11931/3
2014-10-25 John L. Hammond LU-4075 osd: handle getxattr for trusted.version 49/11649/2
2014-10-25 Li Xi LU-5054 llite: enforce pool name length limit 06/10306/11
2014-10-25 John L. Hammond LU-5352 dt: correct if condition in dt_index_read() 21/11121/6
2014-10-24 John L. Hammond LU-2675 mgc: remove libmgc.c 72/11772/4
2014-10-24 John L. Hammond LU-2675 libcfs: add libcfs/byteorder.h 86/11986/2
2014-10-24 John L. Hammond LU-2675 libcfs: remove LUSTRE_{,SRV_}LNET_PID 85/11985/2
2014-10-24 John L. Hammond LU-5779 test: wait for CT registration in sanity-hsm... 67/12367/2
2014-10-22 Nathaniel Clark LU-5706 tests: Ensure preconditions in conf-sanity/57 36/12236/6
2014-10-22 John L. Hammond LU-2675 libcfs: ignore CDEBUG_ENTRY_EXIT for userspace 81/12281/2
2014-10-22 Amir Shehata LU-2456 lnet: lnetctl utility man page 59/11859/10
2014-10-22 Amir Shehata LU-2456 lnet: configure lnet on startup 98/11798/11
2014-10-22 Amir Shehata LU-2456 lnet: DLC user space Configuration utility 26/8026/65
2014-10-22 Amir Shehata LU-2456 lnet: DLC user space Configuration library 25/8025/63
2014-10-22 Fan Yong LU-5506 lfsck: skip orphan MDT-object handling for... 44/11444/23
2014-10-22 Fan Yong LU-5516 lfsck: repair orphan parent MDT-object 91/11391/29
2014-10-21 Alexander.Boyko LU-5079 ptlrpc: fix early reply timeout for recovery 13/11213/11
2014-10-20 Henri Doreau LU-5752 doc: Added missing manpages to Makefile.am 08/12308/2
2014-10-20 Fan Yong LU-4976 osp: add doxygen comments for osp_object.c... 99/10799/12
2014-10-17 James Nunez LU-4298 utils: do not create file with no striping... 75/8375/8
2014-10-16 Alex Zhuravlev LU-4974 lod: documentation for lod_object.c 22/11022/10
2014-10-16 Fan Yong LU-5516 lfsck: repair the lost name entry 49/12249/3
2014-10-16 Fan Yong LU-5515 lfsck: repair bad file type in name entry 48/12248/3
2014-10-16 Fan Yong LU-5513 lfsck: repair multiple referenced name entry 47/12247/5
2014-10-15 John L. Hammond LU-2675 libcfs: remove {ENTRY,EXIT}_NESTING macros 84/11984/5
2014-10-15 Yang Sheng LU-951 test: re-enable replay-single test_73a 27/12227/3
2014-10-11 Oleg Drokin New tag 2.6.54 2.6.54 v2_6_54 v2_6_54_0
2014-10-11 Bruno Faccini LU-5573 obdclass: strengthen against concurrent server... 14/12114/4
2014-10-11 Bobi Jam LU-4943 obdclass: detach MGC dev on error 29/10129/14
2014-10-11 Nathaniel Clark LU-4839 utils: fix bandwidth ctl in lhsmtool 93/12093/7
|
| Comment by nasf (Inactive) [ 10/Nov/14 ] |
|
We need the b2_5 patch, another failure on b2_5: |
| Comment by Nathaniel Clark [ 10/Nov/14 ] |
|
Patch for b2_5 |
| Comment by Jian Yu [ 10/Nov/14 ] |
|
Hi Nathaniel Clark,
The patch for Lustre b2_5 branch is ready to land. |
| Comment by Bob Glossman (Inactive) [ 10/Nov/14 ] |
|
still seen in master: |
| Comment by Nathaniel Clark [ 11/Nov/14 ] |
|
Current failures have a delay during copytool startup:
1415239123.290216 lhsmtool_posix[22116]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre
1415239123.678180 lhsmtool_posix[22117]: waiting for message from kernel
1415239133.981741 lhsmtool_posix[22117]: copytool fs=lustre archive#=2 item_count=1
1415239133.982096 lhsmtool_posix[22117]: waiting for message from kernel
1415239133.982184 lhsmtool_posix[22118]: '[0x200000401:0x1c2:0x0]' action ARCHIVE reclen 72, cookie=0x545ad59d
1415239133.984387 lhsmtool_posix[22118]: processing file 'd60.sanity-hsm/f60.sanity-hsm'
1415239134.028282 lhsmtool_posix[22118]: archiving '/mnt/lustre/.lustre/fid/0x200000401:0x1c2:0x0' to '/home/autotest2/.autotest/shared_dir/2014-11-05/074239-70147036187460/arc1/01c2/0000/0401/0000/0002/0000/0x200000401:0x1c2:0x0_tmp'
1415239149.689299 lhsmtool_posix[22118]: saving stripe info of '/mnt/lustre/.lustre/fid/0x200000401:0x1c2:0x0' in /home/autotest2/.autotest/shared_dir/2014-11-05/074239-70147036187460/arc1/01c2/0000/0401/0000/0002/0000/0x200000401:0x1c2:0x0_tmp.lov
1415239151.495965 lhsmtool_posix[22118]: start copy of 39000000 bytes from '/mnt/lustre/.lustre/fid/0x200000401:0x1c2:0x0' to '/home/autotest2/.autotest/shared_dir/2014-11-05/074239-70147036187460/arc1/01c2/0000/0401/0000/0002/0000/0x200000401:0x1c2:0x0_tmp'
1415239156.616170 lhsmtool_posix[22118]: %13
1415239156.625435 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1415239161.652771 lhsmtool_posix[22118]: %26
1415239161.661059 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1415239166.690009 lhsmtool_posix[22118]: %40
1415239166.699557 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1415239171.725522 lhsmtool_posix[22118]: %53
1415239171.729102 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1415239176.737525 lhsmtool_posix[22118]: %67
1415239176.740985 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
exiting: Interrupt
Notice the amount of time from the first log message to the first bandwidth control message (about 33 sec). This would let the 30 sec timeout occur before any copying had actually occurred. Some delays are even as long as a minute. |
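The "about 33 sec" figure can be checked directly from the timestamps in the log above. The snippet below is a small sketch, not part of the test suite: the `first_timestamp` helper and the regular expression are illustrative assumptions about the log layout, and only four of the log lines are reproduced.

```python
import re
from decimal import Decimal

# Assumed layout: epoch timestamp, process name with PID, then the message text.
LOG_LINE = re.compile(r"^(\d+\.\d+) lhsmtool_posix\[\d+\]: (.*)$")

def first_timestamp(lines, needle=""):
    """Return the timestamp of the first log line whose message contains needle."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group(2):
            return Decimal(match.group(1))
    return None

# A subset of the copytool log quoted in this comment.
log = [
    "1415239123.290216 lhsmtool_posix[22116]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre",
    "1415239133.984387 lhsmtool_posix[22118]: processing file 'd60.sanity-hsm/f60.sanity-hsm'",
    "1415239156.616170 lhsmtool_posix[22118]: %13",
    "1415239156.625435 lhsmtool_posix[22118]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s",
]

TIMEOUT = Decimal(30)  # seconds, the timeout mentioned in the comment

startup = first_timestamp(log)
first_throttle = first_timestamp(log, "bandwith control")
delay = first_throttle - startup
print(f"delay={delay}s exceeds_timeout={delay > TIMEOUT}")
```

`Decimal` is used instead of `float` so that the microsecond digits of the epoch timestamps survive the subtraction exactly.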
| Comment by Nathaniel Clark [ 12/Nov/14 ] |
| Comment by nasf (Inactive) [ 16/Nov/14 ] |
|
another failure instance on b2_5: |
| Comment by Gerrit Updater [ 17/Nov/14 ] |
|
Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/12682 |
| Comment by Gerrit Updater [ 23/Nov/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12682/ |
| Comment by Gerrit Updater [ 23/Nov/14 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/12823 |
| Comment by Jian Yu [ 23/Nov/14 ] |
|
Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/12823 |
| Comment by Gerrit Updater [ 01/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12405/ |
| Comment by Jodi Levi (Inactive) [ 01/Dec/14 ] |
|
Patches landed to Master. |
| Comment by Jian Yu [ 02/Dec/14 ] |
|
The failure still occurred on Lustre b2_5 branch after the patches were landed: |
| Comment by Jian Yu [ 03/Dec/14 ] |
I just found the patch http://review.whamcloud.com/12823 has not been landed on Lustre b2_5 branch. |
| Comment by Gerrit Updater [ 04/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12823/ |
| Comment by Jian Yu [ 08/Dec/14 ] |
|
Lustre b2_5 build: https://build.hpdd.intel.com/job/lustre-b2_5/105/ (which contains the patch http://review.whamcloud.com/12823) The same failure still occurred: https://testing.hpdd.intel.com/test_sets/c0eea1ea-7dba-11e4-a179-5254006e85c2 |
| Comment by Li Wei (Inactive) [ 09/Dec/14 ] |
|
Indeed. b2_5: https://testing.hpdd.intel.com/test_sets/4aecf848-7cc0-11e4-b42a-5254006e85c2 |
| Comment by Jian Yu [ 12/Dec/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Andreas Dilger [ 12/Dec/14 ] |
|
Still seeing this test fail on master. 7x in the past week: |
| Comment by Andreas Dilger [ 15/Dec/14 ] |
|
Nathaniel, is it possible the test still isn't giving enough time for this to pass on review-zfs? This seems like one of the more common failures in review-zfs, so if we increase the wait time only for ZFS backed filesystems it will hopefully allow more passes (assuming there isn't some other real failure here, I haven't looked into the logs). |
| Comment by Nathaniel Clark [ 29/Dec/14 ] |
|
Unfortunately the wait time can't be increased much more without compromising the test, since the test is trying to ensure that updates happen every 5 seconds instead of the default 30. I've already pushed the wait time up to 20 seconds. There seems to be a significant delay between "archiving" and "saving stripe info":
1417489810.840712 lhsmtool_posix[8894]: processing file 'd60.sanity-hsm/f60.sanity-hsm'
1417489810.895598 lhsmtool_posix[8894]: archiving '/mnt/lustre/.lustre/fid/0x400000401:0x1c2:0x0' to '/home/autotest2/.autotest/shared_dir/2014-12-01/143223-70364285431720/arc1/01c2/0000/0401/0000/0004/0000/0x400000401:0x1c2:0x0_tmp'
1417489841.241595 lhsmtool_posix[8894]: saving stripe info of '/mnt/lustre/.lustre/fid/0x400000401:0x1c2:0x0' in /home/autotest2/.autotest/shared_dir/2014-12-01/143223-70364285431720/arc1/01c2/0000/0401/0000/0004/0000/0x400000401:0x1c2:0x0_tmp.lov
1417489845.934025 lhsmtool_posix[8894]: start copy of 39000000 bytes from '/mnt/lustre/.lustre/fid/0x400000401:0x1c2:0x0' to '/home/autotest2/.autotest/shared_dir/2014-12-01/143223-70364285431720/arc1/01c2/0000/0401/0000/0004/0000/0x400000401:0x1c2:0x0_tmp'
1417489850.089915 lhsmtool_posix[8894]: %13
This step is all about creating the destination directories and opening the destination file for write (except for an open_by_fid). My current theory is that the issue resides with NFS being the shared directory that the copytool wants to write to. |
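To see which step consumes the time, the gaps between consecutive log entries can be computed from the timestamps quoted above. This is an illustrative sketch only: the short labels are abbreviations of the log messages, and `step_gaps` is a hypothetical helper, not something from lhsmtool or the test framework.

```python
from decimal import Decimal

# (timestamp, short label) pairs copied from the log excerpt in this comment.
events = [
    (Decimal("1417489810.840712"), "processing file"),
    (Decimal("1417489810.895598"), "archiving"),
    (Decimal("1417489841.241595"), "saving stripe info"),
    (Decimal("1417489845.934025"), "start copy"),
    (Decimal("1417489850.089915"), "%13"),
]

def step_gaps(events):
    """Return (from_label, to_label, seconds) for each pair of consecutive events."""
    return [
        (prev_label, label, ts - prev_ts)
        for (prev_ts, prev_label), (ts, label) in zip(events, events[1:])
    ]

WAIT_TIME = Decimal(20)  # seconds, the wait time mentioned in this comment

gaps = step_gaps(events)
for src, dst, seconds in gaps:
    flag = "  <-- longer than the wait time" if seconds > WAIT_TIME else ""
    print(f"{src} -> {dst}: {seconds}s{flag}")

slowest = max(gaps, key=lambda gap: gap[2])
```

On this excerpt, the only gap longer than the 20 second wait is the one between "archiving" and "saving stripe info", which matches the observation above; the remaining steps each take under 5 seconds.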
| Comment by Gerrit Updater [ 30/Dec/14 ] |
|
Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/13214 |
| Comment by Gerrit Updater [ 11/Feb/15 ] |
|
Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/13731 |
| Comment by nasf (Inactive) [ 26/Feb/15 ] |
|
Another failure instance on b2_5: |
| Comment by Bruno Faccini (Inactive) [ 26/Feb/15 ] |
|
Nasf, Nathaniel, BTW, I don't know what causes such a delay; it looks like it only occurs with ZFS and could also be related to some VM/disk issue... |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13731/ |
| Comment by Gerrit Updater [ 04/Mar/15 ] |
|
Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/13962 |
| Comment by Peter Jones [ 23/Apr/15 ] |
|
Landed for 2.8 |