[LU-14060] sanity test_426 (splice-test) issues Created: 21/Oct/20  Updated: 22/Oct/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13745 tasks hang with copy_file_range: ll_f... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_426 has a few issues:

  1. The trivial test parameter should not be used when adding tests.
  2. The commit message specifies "Test-Parameters: trivial envdefinitions=ONLY=425 testlist=sanity" but the test added was test_426.
  3. test_426() itself does not check the exit status of the first three invocations of splice-test. The first and third invocations fail:
    == sanity test 426: splice test on Lustre ============================================================ 04:48:51 (1601959731)
    splice-test: splice: Bad address
    concurrent reader with O_DIRECT
    read: /mnt/lustre/f426.sanity: unexpected EOF
    concurrent reader with O_DIRECT
    concurrent reader without O_DIRECT
    concurrent reader without O_DIRECT
    splice-test: splice: Bad address
    sequential reader with O_DIRECT
    sequential reader without O_DIRECT
    Resetting fail_loc on all nodes...CMD: trevis-57vm1
    

    (From https://testing.whamcloud.com/sub_tests/32fb3881-204a-462a-92cf-561557edb59a).

  4. splice-test.c mixes fork() without exec(), printf(), and fprintf(stderr, ...), which causes the duplicated and out-of-order messages.
  5. read_from_pipe() should use posix_memalign() to allocate its buffer for O_DIRECT.
  6. read_from_pipe() decrements size by sz instead of ret.
  7. do_splice1() is vulnerable to deadlock since read_from_pipe() is not called until after splice() completes, so splice() can block once the pipe is full with nothing draining it.
  8. do_splice2() does not check for errors from fork().
  9. do_splice2() does not check the exit status of the child process.
  10. Generally we should check for failure from system calls as rc < 0 rather than rc == -1.
  11. Error handling uses the non-standard BSD extension err().
  12. Allocation should use the standard function posix_memalign() rather than aligned_alloc().
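
A minimal sketch of how items 4 through 10 could be addressed together. This is not the code from splice-test.c: read_all(), pipe_roundtrip(), BUF_ALIGN and CHUNK are hypothetical names and values chosen for illustration, and the child uses plain write() rather than splice() to keep the example portable. It shows an aligned buffer from posix_memalign(), decrementing by the bytes actually read, rc < 0 checks, flushing stdio before fork(), draining the pipe while the writer is still running, and checking the child's exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_ALIGN 4096	/* assumed alignment, not taken from splice-test.c */
#define CHUNK 65536	/* assumed read size, also hypothetical */

/* Read exactly size bytes from fd. Returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, size_t size)
{
	void *buf;
	int rc = posix_memalign(&buf, BUF_ALIGN, CHUNK);

	if (rc != 0) {	/* returns an errno value, does not set errno */
		fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
		return -1;
	}
	rc = 0;
	while (size > 0) {
		size_t want = size < CHUNK ? size : CHUNK;
		ssize_t ret = read(fd, buf, want);

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0) {
			fprintf(stderr, "read: %s\n",
				ret < 0 ? strerror(errno) : "unexpected EOF");
			rc = -1;
			break;
		}
		size -= (size_t)ret;	/* bytes actually read, not bytes requested */
	}
	free(buf);
	return rc;
}

/* Child writes size bytes into a pipe while the parent drains it. */
int pipe_roundtrip(size_t size)
{
	int pfd[2], status, rc;
	pid_t pid;

	if (pipe(pfd) < 0)
		return -1;
	fflush(NULL);	/* empty stdio buffers so the child does not replay them */
	pid = fork();
	if (pid < 0) {	/* fork() itself can fail */
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}
	if (pid == 0) {	/* child: writer, never touches stdio */
		char block[4096];

		close(pfd[0]);
		memset(block, 'x', sizeof(block));
		while (size > 0) {
			size_t want = size < sizeof(block) ? size : sizeof(block);
			ssize_t ret = write(pfd[1], block, want);

			if (ret < 0 && errno == EINTR)
				continue;
			if (ret < 0)
				_exit(1);
			size -= (size_t)ret;
		}
		_exit(0);
	}
	close(pfd[1]);	/* parent: read concurrently so the pipe cannot fill up */
	rc = read_all(pfd[0], size);
	close(pfd[0]);
	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;	/* child failed or was killed by a signal */
	return rc;
}
```

Calling pipe_roundtrip(1 << 20) moves more data than a default Linux pipe buffer holds, which is the case where a write-then-read ordering in a single process would stall.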


 Comments   
Comment by James A Simmons [ 21/Oct/20 ]

That explains some of the failures I see in another patch I did.  Note this was taken from the xfstest suite, so we can push these fixes back to them.

Comment by Wang Shilong (Inactive) [ 22/Oct/20 ]

Thanks for this.

Originally the patch was added as test_425, but it conflicted when rebasing, and I forgot to update the commit message as well.

Yup, this is copied directly from xfstest. The original motivation was to make sure Lustre did not hang with such a splice test.

Generated at Sat Feb 10 03:06:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.