[LU-186] recovery-mds-scale (FLAVOR=OSS): (filter.c:151:filter_finish_transno()) LBUG Created: 01/Apr/11  Updated: 06/Apr/11  Resolved: 06/Apr/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 20394
Rank (Obsolete): 10096

 Description   

last_rcvd can be equal to lcd->lcd_last_transno in some special cases, so change the "<=" check into
"<", and add some more debug info for now.



 Comments   
Comment by Mikhail Pershin [ 01/Apr/11 ]

Are you sure that the bug exists in master? There is no '<=' check in filter_finish_transno(), only the assertion LASSERT(last_rcvd >= le64_to_cpu(lcd->lcd_last_transno)); which is correct.

Comment by Hongchao Zhang [ 02/Apr/11 ]

Yes, there is no such issue in master.
As for the cause of the bug, I am a little confused by comment #5 of bug 20394 in Bugzilla:

...
if (last_rcvd <= le64_to_cpu(lcd->lcd_last_transno)) {
        spin_unlock(&filter->fo_translock);
        LBUG();
}

last_rcvd likely equals lcd->lcd_last_transno because the transaction of setattr might already
have been committed, but the server doesn't have a chance to send the reply to the client side, which
then causes the request to be handled immediately. In this case, at least the assertion for last_rcvd
== lcd->lcd_last_transno might be (wrongly) hit.
...

This issue should be caused by a resent replay request, which causes the transno in the replay
request to equal lcd->lcd_last_transno. Is that correct?
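
To make the difference between the two comparisons concrete, here is a minimal standalone sketch. It is not the actual filter.c code: struct client_data, transno_ok_old() and transno_ok_new() are hypothetical names used only for illustration, and the le64_to_cpu() conversion and the fo_translock handling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-client data; only the field
 * discussed in this ticket is modeled. */
struct client_data {
        uint64_t last_transno;  /* last transno recorded for this client */
};

/* Old check ("<="): an equal transno is treated as fatal.
 * Returns false where the original code would call LBUG(). */
bool transno_ok_old(uint64_t last_rcvd, const struct client_data *lcd)
{
        return !(last_rcvd <= lcd->last_transno);
}

/* Proposed check ("<"): only a transno that goes backwards is fatal.
 * This accepts the same values as an assertion of the form
 * last_rcvd >= last_transno. */
bool transno_ok_new(uint64_t last_rcvd, const struct client_data *lcd)
{
        return !(last_rcvd < lcd->last_transno);
}
```

With last_transno set to 100, a request carrying transno 100 (the resent-replay case asked about above) is rejected by transno_ok_old() but accepted by transno_ok_new(). Both helpers still reject transno 99.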
