[LU-921] generate warnings in case of discarding dirty pages Created: 13/Dec/11  Updated: 22/Dec/12  Resolved: 22/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
Fix Version/s: Lustre 2.4.0, Lustre 2.1.4

Type: Bug
Priority: Minor
Reporter: Jay Lan (Inactive)
Assignee: Hongchao Zhang
Resolution: Fixed
Votes: 0
Labels: None
Environment:

sles10sp3, sles11sp1, centos5.6, centos6


Attachments: File patch.LU921.b2_1    
Issue Links:
Related
is related to LU-2505 lfsck: BUG: soft lockup - CPU#0 stuck... Resolved
Severity: 3
Bugzilla ID: 21812
Rank (Obsolete): 5412

 Description   

"NASA/AMES is asking for the ability to be able to tell users which files may be suspect after an
adverse cluster event; panics, hangs, client evictions, etc.

One example that may be easier than others is when a client is evicted and the client is forced to
toss dirty pages of open files. The site has been experimenting with a way to list the inode
numbers on the mds with the associated pages tossed on the clients. I'll leave it to them to
discuss here." – from Oracle BZ 21812.

Oracle has released a patch (attachment 33114). It affects the client side. We would like Whamcloud to include this patch in 1.8.x and 2.1.x.



 Comments   
Comment by Peter Jones [ 14/Dec/11 ]

Hongchao

Could you please look at what would be required to provide this capability in a Whamcloud release? The current approach needs several adjustments. Please talk to Johann for details.

Peter

Comment by Jay Lan (Inactive) [ 14/Dec/11 ]

I cherry-picked Oracle's patch into our nas-1.8.6 branch without a problem. However, the patch applies to code that does not exist in 2.1.

Our Lustre system admins consider this a very useful feature, and we need your expertise to port it to 2.1. Thanks!

Comment by Hongchao Zhang [ 28/Dec/11 ]

status update:
the patch is being created and tested.

Comment by Hongchao Zhang [ 30/Dec/11 ]

the patch is tracked at http://review.whamcloud.com/#change,1908

Comment by Jay Lan (Inactive) [ 18/Jun/12 ]

There has been no activity on this LU for almost 6 months. This fix is very important to us. Could you raise the priority? Thanks!

Comment by Peter Jones [ 18/Jun/12 ]

Jay

I will take this up with you offline

Peter

Comment by Hongchao Zhang [ 16/Oct/12 ]

the updated patch has been pushed to Gerrit

Comment by Peter Jones [ 30/Oct/12 ]

Jay

Can you please confirm whether the patch meets your needs?

Thanks

Peter

Comment by Jay Lan (Inactive) [ 30/Oct/12 ]

Hi Peter, certainly will do. Thanks~

Comment by Jay Lan (Inactive) [ 30/Oct/12 ]

I cherry-picked the patch into the b2_1 branch. It applied cleanly. However, compilation failed:
/usr/src/packages/BUILD/lustre-2.1.3/lustre/llite/llite_lib.c:2256: error: 'OBD_IOC_GETDTNAME' undeclared (first use in this function)

Comment by Hongchao Zhang [ 31/Oct/12 ]

OBD_IOC_GETDTNAME was introduced by "http://review.whamcloud.com/#change,1646" in LU-819. Did you apply that patch to b2_1?

This patch needs ll_get_fsname as redefined in http://review.whamcloud.com/2025 and http://review.whamcloud.com/#change,3704

I have created an updated patch for b2_1 that contains these missing pieces; please refer to the attachment.

Comment by Jay Lan (Inactive) [ 31/Oct/12 ]

I provided our Lustre group with the new RPMs and asked for feedback.

Comment by Jay Lan (Inactive) [ 01/Nov/12 ]

Why is recovery-small test 24 in ALWAYS_EXCEPT?

I think we can remove it from ALWAYS_EXCEPT.

Comment by Jay Lan (Inactive) [ 01/Nov/12 ]

The test result looks good!

Can you make changes to remove test "24" from ALWAYS_EXCEPT as part of the patch? Thanks!

Comment by Peter Jones [ 01/Nov/12 ]

Jay, that is really a separate issue and should be tracked as such. Could you please open a separate ticket to track the issue about test 24? Thanks!

Comment by Jay Lan (Inactive) [ 01/Nov/12 ]

Peter, a new test case recovery-small 24b was created to test this issue as part of the patch. That was how I discovered that test_24[ab] were skipped.

Let me know whether you want to fix that in this case, since recovery-small.sh is modified anyway, or whether you want me to open a new case. I am fine either way.

Comment by Peter Jones [ 01/Nov/12 ]

Ah sorry - I did not realize that. In that case I will defer to Hongchao to comment.

Comment by Hongchao Zhang [ 02/Nov/12 ]

Yes, this test was put into ALWAYS_EXCEPT in bz5494. I tested it under master and b2_1, and it passed locally on both.
I'll try to create a patch to remove it from ALWAYS_EXCEPT and check whether it passes on Toro.

Comment by Hongchao Zhang [ 02/Nov/12 ]

the patch is tracked at http://review.whamcloud.com/#change,4443

Comment by Emoly Liu [ 02/Dec/12 ]

The port for b2_1 is here: http://review.whamcloud.com/#change,4716. I combined the two related patches into a single one.

Comment by Peter Jones [ 22/Dec/12 ]

Landed for 2.1.4 and 2.4

Generated at Sat Feb 10 01:11:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.