[LU-4981] need to remount after sanity 133g Created: 29/Apr/14  Updated: 08/May/14  Resolved: 08/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Critical
Reporter: John Hammond
Assignee: Bobbie Lind (Inactive)
Resolution: Fixed
Votes: 0
Labels: procfs, tests

Issue Links:
Related
is related to LU-4930 osd_object_destroy()) ASSERTION( osd_... Resolved
Severity: 3
Rank (Obsolete): 13801

 Description   

Sanity 133g tries to write garbage to every file in /proc/fs/lustre,... As you can imagine, this may affect the other tests in subtle ways.

[18:14:13] John Hammond: I'm a bit concerned that sanity 133g may be having/exposing some subtle effects on sanity.
[18:14:53] John Hammond: I see weird messages that the current identity upcall is '^E'.
[18:15:33] John Hammond: Just FYI. I'll probably create an issue here soon.
[18:16:41] Oleg Drokin: ah… I very well can believe that. Luckily, we do not use identity-upcall in our testing I think
[18:16:56] John Hammond: We do use it.
[18:18:07] Oleg Drokin: do we? interesting
[18:18:37] Oleg Drokin: sigh, need to make it the very last test then?
[18:18:40] John Hammond: If the mdt.*.identity_upcall is set to something other than NONE then we use that.
[18:19:43] John Hammond: I see mdt_identity_do_upcall() being executed several times in a sanity run.
[18:20:13] Oleg Drokin: hopefully mostly before test 133
[18:20:29] Oleg Drokin: but there were some other side effects like disabling changelog masks and such
[18:20:37] Oleg Drokin: so it does make sense to move the test to be the very last one
[18:22:13] John Hammond: Or unmount and remount after it.



 Comments   
Comment by John Hammond [ 30/Apr/14 ]

[18:40:05] Oleg Drokin: does that actually reset all proc vars if we don't do full module unload, though?
[18:42:09] John Hammond: No not all.
[18:44:00] Oleg Drokin: so we need to go through the entire unmount and remount. just easier to move it to the end?
[18:44:56] John Hammond: Sure, except for when we have two tests that each need to be the last one...
[18:45:27] Oleg Drokin: true, I guess. so full unmount and reload of modules test 65j style then?
[18:50:20] John Hammond: Indeed.

Comment by John Hammond [ 30/Apr/14 ]

Maybe we should kill this part:

        rc = write(fd, &fd, 1);
        if (rc != 1)
                perror("write one byte");

This explains why the identity_upcall was being set to "^E" == "\005".
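
To illustrate the point above, here is a minimal sketch of what a one-byte write from the address of an int actually emits. The helper name first_byte_written() is hypothetical and not part of the Lustre test code; it uses a pipe instead of a /proc file so the byte can be read back. The assumption is that the descriptor in the test happened to be 5 on a little-endian machine, which would make the first byte 0x05, shown by terminals as ^E.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Lustre tree): write the first byte of
 * an int through a pipe, in the same way write(fd, &fd, 1) does, and
 * return the byte that arrives on the read end, or -1 on error.
 */
int first_byte_written(int value)
{
        int pipefd[2];
        unsigned char byte = 0;
        int result = -1;

        if (pipe(pipefd) != 0) {
                perror("pipe");
                return -1;
        }

        /* Same shape as the test: one byte taken from the address of an int. */
        if (write(pipefd[1], &value, 1) == 1 &&
            read(pipefd[0], &byte, 1) == 1)
                result = byte;

        close(pipefd[0]);
        close(pipefd[1]);
        return result;
}
```

On a little-endian host, first_byte_written(5) returns 5, that is "\005". On a big-endian host the first byte of the same int would be 0 instead, so the value that lands in the proc file depends on both the descriptor number and the byte order.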

Comment by Oleg Drokin [ 30/Apr/14 ]

I think the 1-byte write is important, since there are certain pieces of the code that might not handle it properly.

Comment by Jodi Levi (Inactive) [ 30/Apr/14 ]

Bobbie,
Could you please take this one?
Thank you!

Comment by Di Wang [ 30/Apr/14 ]

http://review.whamcloud.com/10174 Since this will block me landing 9511, I will just push a patch here.

Comment by Jodi Levi (Inactive) [ 08/May/14 ]

Patch landed to master. Please reopen the ticket if more work is needed.
