[LU-6144] Rolling downgrade: sanity test_160 keeps running and cannot finish Created: 20/Jan/15 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | before downgrade: server and client lustre-master build #2810 |
| Severity: | 3 |
| Rank (Obsolete): | 17126 |
| Description |
|
sanity test_160 just keeps running for several hours and cannot finish. The client log shows:

== sanity test 160a: changelog sanity == 12:53:31 (1421787211)
Registered as changelog user cl6
945564 01CREAT 20:53:32.152975358 2015.01.20 0x0 t=[0x20000bf81:0x48:0x0] j=cp.00 p=[0x20000bf81:0x46:0x0] pic1.jpg
945565 08RENME 20:53:32.162975688 2015.01.20 0x0 t=[0:0x0:0x0] j=mv.0 p=[0x20000bf81:0x41:0x0] zach s=[0x20000bf81:0x46:0x0] sp=[0x20000bf81:0x42:0x0] zachy
945566 03HLINK 20:53:32.166975820 2015.01.20 0x0 t=[0x20000bf81:0x48:0x0] j=ln.00 p=[0x20000bf81:0x42:0x0] portland.jpg
945567 04SLINK 20:53:32.169975919 2015.01.20 0x0 t=[0x20000bf81:0x49:0x0] j=ln.00 p=[0x20000bf81:0x41:0x0] desktop.jpg
945568 06UNLNK 20:53:32.172976018 2015.01.20 0x1 t=[0x20000bf81:0x49:0x0] j=rm.00 p=[0x20000bf81:0x41:0x0] desktop.jpg
verifying changelog mask
mdd.lustre-MDT0000.changelog_mask=-MKDIR
mdd.lustre-MDT0000.changelog_mask=+CLOSE
mdd.lustre-MDT0000.changelog_mask=+MKDIR
mdd.lustre-MDT0000.changelog_mask=-CLOSE
35 02MKDIR 05:32:13.88218195 2015.01.20 0x0 t=[0x2000090a1:0x20e1:0x0] j=test_id.205.29582 p=[0x200000007:0x1:0x0] f205.sanity
36 07RMDIR 05:32:14.57249262 2015.01.20 0x0 t=[0x2000090a1:0x20e1:0x0] j=test_id.205.11894 p=[0x200000007:0x1:0x0] f205.sanity
37 05MKNOD 05:32:15.51282063 2015.01.20 0x0 t=[0x2000090a1:0x20e2:0x0] j=test_id.205.26781 p=[0x200000007:0x1:0x0] f205.sanity
38 06UNLNK 05:32:16.46314899 2015.01.20 0x1 t=[0x2000090a1:0x20e2:0x0] j=test_id.205.14763 p=[0x200000007:0x1:0x0] f205.sanity
39 01CREAT 05:32:17.41347732 2015.01.20 0x0 t=[0x2000090a1:0x20e3:0x0] j=test_id.205.8582 p=[0x200000007:0x1:0x0] f205.sanity
40 12LYOUT 05:32:17.43347798 2015.01.20 0x0 t=[0x2000090a1:0x20e3:0x0] j=test_id.205.8582
41 13TRUNC 05:32:19.204418125 2015.01.20 0xe t=[0x2000090a1:0x20e3:0x0] j=test_id.205.9105
42 13TRUNC 05:32:21.790503464 2015.01.20 0xe t=[0x2000090a1:0x20e3:0x0] j=test_id.205.30748
43 08RENME 05:32:22.974542541 2015.01.20 0x0 t=[0:0x0:0x0] j=test_id.205.11518 p=[0x200000007:0x1:0x0] jobstats_test_rename s=[0x2000090a1:0x20e3:0x0] sp=[0x200000007:0x1:0x0] f205.sanity
44 01CREAT 05:32:33.660892782 2015.01.20 0x0 t=[0x2000090a1:0x20e6:0x0] p=[0x200000007:0x1:0x0] f205.sanity
45 06UNLNK 05:32:33.684894002 2015.01.20 0x1 t=[0x2000090a1:0x20e3:0x0] p=[0x200000007:0x1:0x0] jobstats_test_rename
46 02MKDIR 05:32:44.995265190 2015.01.20 0x0 t=[0x2000090a1:0x20e7:0x0] j=mkdir.0 p=[0x200000007:0x1:0x0] d206.sanity |
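For reference, a minimal sketch of the changelog flow this test exercises (the MDT name lustre-MDT0000 and consumer ID cl6 come from the log above; the mount point /mnt/lustre and the file names are illustrative assumptions):

    # Register a changelog consumer on the MDT (run on the MDS);
    # with -n, lctl prints only the assigned consumer ID, e.g. cl6.
    CL_USER=$(lctl --device lustre-MDT0000 changelog_register -n)

    # Generate some metadata activity on a client so records are produced.
    touch /mnt/lustre/pic1.jpg
    mv /mnt/lustre/pic1.jpg /mnt/lustre/portland.jpg

    # Dump the accumulated records (01CREAT, 08RENME, ... as in the log).
    lfs changelog lustre-MDT0000

    # Toggle the record mask, as the "verifying changelog mask" step does.
    lctl set_param mdd.lustre-MDT0000.changelog_mask=-MKDIR
    lctl set_param mdd.lustre-MDT0000.changelog_mask=+MKDIR

    # Acknowledge consumed records (endrec 0 = all) and drop the consumer
    # so old records can be purged.
    lfs changelog_clear lustre-MDT0000 $CL_USER 0
    lctl --device lustre-MDT0000 changelog_deregister $CL_USER

If any of these steps blocked on the downgraded MDS, that would be consistent with the hang seen here.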
| Comments |
| Comment by Andreas Dilger [ 21/Jan/15 ] |
|
Is this failure reproducible? Are there any logs available (console on client/agent and MDS)? I'm assuming that "rolling downgrade" actually means "upgrade then downgrade"? We do not support formatting at a newer version and then downgrading to an older version. It is never allowed to downgrade beyond the version used for formatting, and downgrading by multiple versions is also not allowed. |
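A quick way to confirm what each node is actually running at any point during such a rolling test is to query the running version (the hostnames below are illustrative assumptions, not from this ticket):

    # Print the running Lustre version on a server and a client.
    ssh mds1 'lctl get_param -n version'
    ssh client1 'lctl get_param -n version'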
| Comment by Sarah Liu [ 21/Jan/15 ] |
|
Hi Andreas, yes, as you said: before the rolling downgrade I first did a rolling upgrade from 2.6.0 to master (it passed), so the system was formatted under 2.6.0, then downgraded back to 2.6.0. I also hit this issue after downgrading the MDS from master to 2.6.0. I will try to get more logs. |
| Comment by Sarah Liu [ 18/Feb/15 ] |
|
I can still hit this error in tag 2.6.94 testing, with the same symptom: test_160a keeps running with no other errors, and I can only see the messages shown in the description. |
| Comment by Jodi Levi (Inactive) [ 18/Feb/15 ] |
|
Mike, |
| Comment by Isaac Huang (Inactive) [ 04/Mar/15 ] |
|
Two instances of sanity test_160a timed out (logs are all there). It seemed like the MDS rebooted during the test; I'm not sure whether that was intended. |
| Comment by Sarah Liu [ 09/Mar/15 ] |
|
No, that was not intended. The test should finish in 30 seconds. |
| Comment by Mikhail Pershin [ 16/Jan/22 ] |
|
Outdated |
| Comment by Andreas Dilger [ 16/Jan/22 ] |
|
Mike, for future reference, we typically mark Jira tickets Resolved instead of Closed, since Resolved still allows tags to be added/removed (e.g. always_except or similar) and permits a few other useful maintenance actions that Closed does not. |
| Comment by Mikhail Pershin [ 16/Jan/22 ] |
|
I see, I will use 'Resolved'. As far as I can see, 'Closed' tickets could be reopened for maintenance, though that is inconvenient compared with 'Resolved'.