[LU-6920] sanity test_205 failed with old jobstats not expired Created: 28/Jul/15 Updated: 24/Aug/16 Resolved: 28/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Aditya Pandit (Inactive) | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Test 205 was executed for 5 iterations
Using JobID environment variable FAKE_JOBID=id.205.mkdir.24449
fre0105: Warning: Permanently added 'fre0107,192.168.101.7' (RSA) to the list of known hosts.
fre0108: Warning: Permanently added 'fre0107,192.168.101.7' (RSA) to the list of known hosts.
mdt.lustre-MDT0000.job_cleanup_interval=600 |
| Comments |
| Comment by Ashish Purkar [ 08/Sep/15 ] |
|
It's easily reproducible on master:
Lustre: DEBUG MARKER: Test: mkdir /mnt/lustre/d205.sanity.expire
Lustre: DEBUG MARKER: Using JobID environment variable FAKE_JOBID=id.205.mkdir.18733
Lustre: DEBUG MARKER: sanity test_205: @@@@@@ FAIL: old jobstats not expired
Lustre: lustre-MDD0000: changelog off
Lustre: DEBUG MARKER: == sanity test complete, duration 34 sec == 19:52:57 (1441191177) |
| Comment by Bob Glossman (Inactive) [ 04/Oct/15 ] |
|
another one seen with sles11sp4 on master: |
| Comment by Bob Glossman (Inactive) [ 07/Oct/15 ] |
|
another sles11sp4 on master: |
| Comment by Andreas Dilger [ 07/Oct/15 ] |
|
It looks like this test is only failing on SLES11 SP3/SP4 and not others. |
| Comment by Andreas Dilger [ 07/Oct/15 ] |
|
Bob, assigning this to you for further investigation. It seems to be failing regularly for SLES11 SP3/SP4 tests, and even if we could get those patches to pass once, this is failing for more test runs than it is passing and we would just be introducing a regression by landing those patches. |
| Comment by Andreas Dilger [ 07/Oct/15 ] |
|
It may be that increasing the timeout for jobid expiry by a second or two would be enough. The most efficient way might be to use wait_update() with a maximum of (left + 5) seconds so that it doesn't wait longer than needed. |
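The bounded-polling idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the signature of wait_update() is not shown in this ticket, so the sketch uses a hypothetical stand-in named poll_until, and a temporary marker file stands in for the stale jobstats entry. The point is that the check returns as soon as the entry is gone and never waits longer than (left + 5) seconds.

```shell
#!/bin/bash
# Hypothetical helper (NOT Lustre's wait_update): poll a command once per
# second until its output equals the expected value, giving up after
# max_wait seconds.
poll_until() {
    local cmd="$1"        # command evaluated on each poll
    local expected="$2"   # output that signals success
    local max_wait="$3"   # upper bound on the total wait, in seconds
    local waited=0
    local result

    while true; do
        result=$(eval "$cmd" 2>/dev/null)
        if [ "$result" = "$expected" ]; then
            echo "matched after ${waited}s"
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            echo "gave up after ${waited}s, last result: '$result'"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Demo (assumption: a marker file plays the role of the old jobstats entry).
# The file is removed in the background after "left" seconds, and the poll
# is allowed at most (left + 5) seconds.
marker=$(mktemp)
left=2
( sleep "$left"; rm -f "$marker" ) &
poll_until "[ -e '$marker' ] && echo present || echo expired" "expired" $((left + 5))
wait
```

Compared with a fixed sleep of the full expiry interval, this style tolerates a second or two of slack on slow nodes without lengthening the test on fast ones.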
| Comment by Gerrit Updater [ 07/Oct/15 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/16753 |
| Comment by Gerrit Updater [ 28/Oct/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16753/ |
| Comment by Joseph Gmitter (Inactive) [ 28/Oct/15 ] |
|
Landed for 2.8 |
| Comment by Ashish Purkar (Inactive) [ 24/Aug/16 ] |
|
| Comment by Peter Jones [ 24/Aug/16 ] |
|
Ashish,
Given that the fix that did land was in the already released 2.8 release, it would probably be clearer to open a new ticket linked to this one.
Peter |
| Comment by Ashish Purkar (Inactive) [ 24/Aug/16 ] |
|
> Given that the fix that did land was in the already released 2.8 release it would probably be clearer to open a new ticket linked to this one |