HSM _not only_ small fixes and to do list goes here (LU-3647)

[LU-3971] CLONE - Posix copytool cleanup Created: 18/Sep/13  Updated: 03/Jun/14  Resolved: 14/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Technical task Priority: Major
Reporter: Henri Doreau (Inactive) Assignee: Jodi Levi (Inactive)
Resolution: Fixed Votes: 0
Labels: HSM

Issue Links:
Related
is related to LU-3694 Posix copytool cleanup Resolved
Rank (Obsolete): 10582

 Description   

Several minor issues have been identified during the review of the initial version of the HSM posix copytool, such as calling select() on regular files.



 Comments   
Comment by Jodi Levi (Inactive) [ 18/Sep/13 ]

http://review.whamcloud.com/7568 landed in 2.5
http://review.whamcloud.com/#/c/7583/ also needed, but not critical for 2.5

Comment by Bruno Faccini (Inactive) [ 25/Sep/13 ]

Change #7583 hit a number of auto-test failures, starting with one in conf-sanity/test_61, which is fixed by LU-3938, whose patch (http://review.whamcloud.com/7671) has just landed.

Could the patch be rebased and resubmitted?

Thanks!

Comment by Henri Doreau (Inactive) [ 25/Sep/13 ]

Thanks Bruno,

Patch rebased and re-pushed.

Comment by Bruno Faccini (Inactive) [ 01/Oct/13 ]

Patch set #6 failed sanity-hsm/test_104 (LU-4022 has been created to address this failure) with the following log:

== sanity-hsm test 104: Copy tool data field == 07:05:05 (1380377105)
CMD: wtm-11vm1 pkill -CONT -x lhsmtool_posix
Purging archive on wtm-11vm1
CMD: wtm-11vm1 rm -rf /home/cgearing/.autotest/shared_dir/2013-09-27/185645-70126231448960/arc1/*
Starting copytool agt1 on wtm-11vm1
CMD: wtm-11vm1 mkdir -p /home/cgearing/.autotest/shared_dir/2013-09-27/185645-70126231448960/arc1
CMD: wtm-11vm1 lhsmtool_posix  --daemon --hsm-root /home/cgearing/.autotest/shared_dir/2013-09-27/185645-70126231448960/arc1 --bandwidth 1 /mnt/lustre < /dev/null > /logdir/test_logs/2013-09-27/lustre-reviews-el6-x86_64--review--1_2_1__18438__-70126231448960-185644/sanity-hsm.test_104.copytool_log.wtm-11vm1.log 2>&1
/usr/lib64/lustre/tests/sanity-hsm.sh: line 394: [: /mnt/lustre: integer expression expected
39+0 records in
39+0 records out
39000000 bytes (39 MB) copied, 5.79256 s, 6.7 MB/s
CMD: wtm-11vm7 /usr/sbin/lctl set_param mdt.lustre-MDT0000.hsm_control=disabled
mdt.lustre-MDT0000.hsm_control=disabled
CMD: wtm-11vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
CMD: wtm-11vm7 /usr/sbin/lctl get_param -n			mdt.lustre-MDT0000.hsm.agent_actions |			grep 0x200002341:0x77:0x0 | cut -f16 -d=
CMD: wtm-11vm7 /usr/sbin/lctl set_param mdt.lustre-MDT0000.hsm_control=enabled
mdt.lustre-MDT0000.hsm_control=enabled
CMD: wtm-11vm7 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
 sanity-hsm test_104: @@@@@@ FAIL: Data field in records is () and not ([434541]) 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4266:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4293:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2297:test_104()
  = /usr/lib64/lustre/tests/test-framework.sh:4547:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4580:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4435:run_test()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2301:main()
Dumping lctl log to /logdir/test_logs/2013-09-27/lustre-reviews-el6-x86_64--review--1_2_1__18438__-70126231448960-185644/sanity-hsm.test_104.*.1380377113.log
CMD: wtm-11vm1,wtm-11vm2.rosso.whamcloud.com,wtm-11vm7,wtm-11vm8 /usr/sbin/lctl dk > /logdir/test_logs/2013-09-27/lustre-reviews-el6-x86_64--review--1_2_1__18438__-70126231448960-185644/sanity-hsm.test_104.debug_log.\$(hostname -s).1380377113.log;
         dmesg > /logdir/test_logs/2013-09-27/lustre-reviews-el6-x86_64--review--1_2_1__18438__-70126231448960-185644/sanity-hsm.test_104.dmesg.\$(hostname -s).1380377113.log
CMD: wtm-11vm1 pkill -INT -x lhsmtool_posix
Copytool is stopped on wtm-11vm1

I wonder whether the "/usr/lib64/lustre/tests/sanity-hsm.sh: line 394: [: /mnt/lustre: integer expression expected" error (which seems to come from cleanup_large_files() or make_large_for_progress()) could be the root cause of this failure.

Comment by Aurelien Degremont (Inactive) [ 01/Oct/13 ]

> I wonder if the "/usr/lib64/lustre/tests/sanity-hsm.sh: line 394: [: /mnt/lustre: integer expression expected" (from cleanup_large_files()/make_large_for_progress() ??...) error could be the root cause of this ...

I think this error comes from cleanup_large_files(), which was introduced by another patch in September.

Comment by John Hammond [ 01/Oct/13 ]

See LU-3973 for a bug in cleanup_large_files().

Comment by Henri Doreau (Inactive) [ 14/Nov/13 ]

Patch http://review.whamcloud.com/#/c/7583/ has been merged. This ticket can be closed.

Comment by Jodi Levi (Inactive) [ 14/Nov/13 ]

Per the last comment in this ticket, the patch has landed and the ticket can be closed.

Generated at Sat Feb 10 01:38:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.