[LU-11958] sanity test_244: FAIL: sendfile+grouplock failed: sendfile_copy: assertion 'sret > 0' failed: sendfile failed: Input/output error
Created: 12/Feb/19  Updated: 13/Feb/19  Resolved: 13/Feb/19

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Patrick Farrell (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-11951 sanity: test_231a failure, idle disco... Resolved
Related
is related to LU-9793 sanity test 244 fail Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for elena <c17455@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3cd6d080-2d6f-11e9-a886-52540065bddc

test_244 failed with the following error:

sendfile+grouplock failed
== sanity test 244: sendfile with group lock tests =================================================== 19:31:23 (1549827083)
35+0 records in
35+0 records out
36700160 bytes (37 MB) copied, 0.646688 s, 56.8 MB/s
Starting test test10 at 1549827084
sendfile_grouplock: sendfile_grouplock.c:259: sendfile_copy: assertion 'sret > 0' failed: sendfile failed: Input/output error
 sanity test_244: @@@@@@ FAIL: sendfile+grouplock failed

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_244 - sendfile+grouplock failed



 Comments   
Comment by Patrick Farrell (Inactive) [ 13/Feb/19 ]

Sorry, this took me a bit longer to get to than intended.
This is a duplicate of https://jira.whamcloud.com/browse/LU-11951.

Here's the log snippet that shows it:

00000100:00080000:0.0:1549827085.531760:0:10251:0:(import.c:1138:ptlrpc_connect_interpret()) ffff9772c1163000 lustre-OST0004_UUID: changing import state from CONNECTING to FULL
00000080:00000004:0.0:1549827085.531764:0:10251:0:(lcommon_misc.c:102:cl_ocd_update()) Changing connect_flags: 0xa0425af2e3440478 -> 0xa0425af2e3440478
00000080:00080000:0.0:1549827085.531765:0:10251:0:(lcommon_misc.c:75:cl_init_ea_size()) updating def/max_easize: 72/216
00000100:00080000:0.0:1549827085.531768:0:10251:0:(recover.c:223:ptlrpc_wake_delayed()) @@@ waking (set ffff9772d49d7a00): req@ffff9772d77e0480 x1625110094957376/t0(0) o101->lustre-OST0004-osc-ffff9772fac82800@10.2.4.95@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
00000100:00100000:0.0:1549827085.531774:0:10251:0:(client.c:2061:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_rcv:df4ae693-0f0b-93c1-a51d-0dbdcc8e83e4:10251:1625110094957392:10.2.4.95@tcp:8
00000100:00020000:0.0:1549827085.531783:0:7370:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff9772d77e0480 x1625110094957376/t0(0) o101->lustre-OST0004-osc-ffff9772fac82800@10.2.4.95@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
00000100:00100000:0.0:1549827085.535956:0:7370:0:(client.c:2061:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc sendfile_groupl:df4ae693-0f0b-93c1-a51d-0dbdcc8e83e4:7370:1625110094957376:10.2.4.95@tcp:101
00010000:00010000:0.0:1549827085.535963:0:7370:0:(ldlm_request.c:592:ldlm_cli_enqueue_fini()) ### client-side enqueue END (FAILED) ns: lustre-OST0004-osc-ffff9772fac82800 lock: ffff9772e5e16900/0xc841a84d1dd2ac52 lrc: 4/0,1 mode: --/PW res: [0x7ec9:0x0:0x0].0x0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 7370 timeout: 0 lvb_type: 1

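The key item is "req wrong generation": the delayed enqueue request (o101, xid 1625110094957376) is woken once the import for lustre-OST0004 returns to FULL, ptlrpc_import_delay_req() then rejects it, and the client-side enqueue fails. That failure surfaces in the test as the sendfile "Input/output error".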
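
For anyone triaging similar failures, here is a minimal sketch of how a log like the one above could be scanned for "req wrong generation" entries. It is not part of the Lustre tooling. The field layout in LINE_RE is an assumption inferred from the lines quoted in this ticket, and the names parse, find, Entry, and SAMPLE are hypothetical helpers made up for this example.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional

# Assumed layout of one debug-log line, inferred from the snippet in this ticket:
#   hex:hex:cpu.type:sec.usec:num:pid:num:(file:line:function()) message
LINE_RE = re.compile(
    r"^\s*(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):"
    r"(?P<cpu>\d+\.\d+F?):(?P<timestamp>\d+\.\d+):"
    r"\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s(?P<msg>.*)$"
)

# Request dumps in the snippet look like "x<xid>/t0(0) o<opcode>->target".
REQ_RE = re.compile(r"\bx(?P<xid>\d+)/t\d+\(\d+\) o(?P<opc>\d+)->")


class Entry(NamedTuple):
    timestamp: str          # kept as text to avoid float rounding
    pid: int
    func: str
    msg: str
    xid: Optional[str]      # None when the message carries no request dump
    opc: Optional[int]


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout."""
    for raw in lines:
        m = LINE_RE.match(raw)
        if m is None:
            continue  # skip blank lines and anything in another format
        req = REQ_RE.search(m["msg"])
        yield Entry(
            timestamp=m["timestamp"],
            pid=int(m["pid"]),
            func=m["func"],
            msg=m["msg"],
            xid=req["xid"] if req else None,
            opc=int(req["opc"]) if req else None,
        )


def find(lines: Iterable[str], needle: str = "req wrong generation") -> List[Entry]:
    """Return the parsed entries whose message contains `needle`."""
    return [e for e in parse(lines) if needle in e.msg]


# Two lines copied verbatim from the log snippet in this ticket.
SAMPLE = """
00000100:00080000:0.0:1549827085.531760:0:10251:0:(import.c:1138:ptlrpc_connect_interpret()) ffff9772c1163000 lustre-OST0004_UUID: changing import state from CONNECTING to FULL
00000100:00020000:0.0:1549827085.531783:0:7370:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff9772d77e0480 x1625110094957376/t0(0) o101->lustre-OST0004-osc-ffff9772fac82800@10.2.4.95@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
""".splitlines()

if __name__ == "__main__":
    for hit in find(SAMPLE):
        print(hit.timestamp, hit.pid, hit.func, hit.xid, hit.opc)
```

The xid returned for a hit can then be searched for in the rest of the log to locate the matching "Completed RPC" and enqueue lines for the same request.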

Comment by Patrick Farrell (Inactive) [ 13/Feb/19 ]

Dupe of LU-11951
