[LU-11559] sanity-sec test_21: req wrong generation: Created: 23/Oct/18  Updated: 19/Dec/23  Resolved: 31/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5, Lustre 2.12.8
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11951 sanity: test_231a failure, idle disco... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/245407e4-d644-11e8-ad90-52540065bddc

test_21 failed with the following error in the test logs, not sure if it is related:

:
CMD: onyx-30vm6.onyx.whamcloud.com runas -u60004 -g60004 -G60004 lfs quota -q /mnt/lustre
running as uid/gid/euid/egid 60004/60004/60004/60004, groups: 60004
 [lfs] [quota] [-q] [/mnt/lustre]
/usr/lib64/lustre/tests/sanity-sec.sh: line 1273: [0]: syntax error: operand expected (error token is "[0]")
CMD: onyx-30vm9 /usr/sbin/lctl nodemap_del c0
CMD: onyx-30vm9 /usr/sbin/lctl nodemap_del c1
:
test_21 returned 1

In the console logs for the client I found:
https://testing.whamcloud.com/test_logs/248f4b92-d644-11e8-ad90-52540065bddc/show_text

[ 9492.526296] LustreError: 16852:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff989925543000 x1615054606938112/t0(0) o19->lustre-OST0001-osc-ffff9899209db800@10.2.4.100@tcp:28/4 lens 336/336 e 0 to 0 dl 0 ref 2 fl Rpc:NW/0/ffffffff rc 0/-1
[ 9492.528661] LustreError: 16852:0:(osc_quota.c:293:osc_quotactl()) ptlrpc_queue_wait failed, rc: -5

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-sec test_21 - test_21 returned 1



 Comments   
Comment by Sebastien Buisson [ 23/Oct/18 ]

Hit again:
https://testing.whamcloud.com/test_sets/7905294a-d698-11e8-b589-52540065bddc

But this time in test 22, and without "req wrong generation" error message in console (maybe skipped?).

Looks like the command 'lfs quota' as run by user 60005 returns nothing, hence the script error at sanity-sec.sh::1273.

Comment by James Nunez (Inactive) [ 22/Feb/19 ]

I'm seeing something similar in sanityn test 77j:
Looking at the suite_log for https://testing.whamcloud.com/test_sets/190f746e-280f-11e9-a318-52540065bddc , we see the first call to dd fails

jobid_var=procname_uid
ost.OSS.ost_io.nrs_policies=tbf opcode
ost.OSS.ost_io.nrs_tbf_rule=start ost_r opcode={ost_read} rate=5
ost.OSS.ost_io.nrs_tbf_rule=start ost_w opcode={ost_write} rate=20
CMD: trevis-24vm1.trevis.whamcloud.com,trevis-24vm2 dd if=/dev/zero of=/mnt/lustre/d77j.sanityn/nrs_r_trevis-24vm1.trevis.whamcloud.com bs=1M count=16
trevis-24vm2: dd: error writing ‘/mnt/lustre/d77j.sanityn/nrs_r_trevis-24vm1.trevis.whamcloud.com’: Input/output error
trevis-24vm2: 3+0 records in
trevis-24vm2: 2+0 records out
trevis-24vm2: 2097152 bytes (2.1 MB) copied, 0.0333487 s, 62.9 MB/s
trevis-24vm1: 16+0 records in
trevis-24vm1: 16+0 records out
trevis-24vm1: 16777216 bytes (17 MB) copied, 0.0697378 s, 241 MB/s
 sanityn test_77j: @@@@@@ FAIL: dd at 0 on client failed (1) 

We do see an error in the client (vm2) console log

[38020.602925] Lustre: DEBUG MARKER: == sanityn test 77j: check TBF-OPCode NRS policy ===================================================== 10:28:04 (1549189684)
[38021.702293] Lustre: DEBUG MARKER: dd if=/dev/zero of=/mnt/lustre/d77j.sanityn/nrs_r_trevis-24vm1.trevis.whamcloud.com bs=1M count=16
[38021.901674] LustreError: 15575:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff9b73c0197600 x1624403585735904/t0(0) o101->lustre-OST0003-osc-ffff9b742883f800@10.9.5.44@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
[38022.511219] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanityn test_77j: @@@@@@ FAIL: dd at 0 on client failed \(1\) 
Comment by Andreas Dilger [ 22/Mar/19 ]

I suspect this is the same as LU-11951, which was fixed on master with patch https://review.whamcloud.com/34221 "LU-11951 ptlrpc: reset generation for old requests" on 2018-02-27.

Comment by Sarah Liu [ 02/Jun/20 ]

seeing similar error in b2_12 testing
https://testing.whamcloud.com/test_sets/8b96945c-7d48-43cc-8d05-a5895867af8c

CMD: trevis-33vm11.trevis.whamcloud.com chmod 200 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 300 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 400 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 500 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 600 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 700 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 303 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm11.trevis.whamcloud.com chmod 777 /mnt/lustre/d21.sanity-sec
CMD: trevis-33vm12 runas -u60006 -g60006 -G60006 lfs quota -q /mnt/lustre
trevis-33vm12: running as uid/gid/euid/egid 60006/60006/60006/60006, groups: 60006
trevis-33vm12:  [lfs] [quota] [-q] [/mnt/lustre]
pdsh@trevis-33vm11: trevis-33vm12: ssh exited with exit code 2
/usr/lib64/lustre/tests/sanity-sec.sh: line 1267: [8]: syntax error: operand expected (error token is "[8]")
CMD: trevis-47vm10 /usr/sbin/lctl nodemap_del c0
CMD: trevis-47vm10 /usr/sbin/lctl nodemap_del c1
CMD: trevis-47vm10 /usr/sbin/lctl nodemap_modify --name default --property admin --value 0
CMD: trevis-47vm10 /usr/sbin/lctl get_param -n nodemap.active
CMD: trevis-47vm10 /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d @
CMD: trevis-47vm10 /usr/sbin/lctl get_param nodemap.default.admin_nodemap
On MGS 10.9.3.130, default.admin_nodemap = nodemap.default.admin_nodemap=0
Comment by Alena Nikitenko [ 23/Nov/21 ]

Might have encountered something similar in sanity-quota test_37 (2.12.8) https://testing.whamcloud.com/test_sets/96c30097-9119-47df-a3ab-58781f980762

[64651.546456] LustreError: 3888:0:(client.c:1228:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffffa023604d2880 x1716956121891456/t0(0) o19->lustre-OST0004-osc-ffffa0237b2b5000@10.240.24.13@tcp:28/4 lens 336/336 e 0 to 0 dl 0 ref 2 fl Rpc:NW/0/ffffffff rc 0/-1
[64651.551120] LustreError: 3888:0:(osc_quota.c:308:osc_quotactl()) ptlrpc_queue_wait failed, rc: -5
Generated at Sat Feb 10 02:44:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.