[LU-7822] import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0001_UUID went back in time Created: 26/Feb/16 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Di Wang |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Error happens during soak testing of build '20160224' (b2_8 RC2) (see: Sequence of events:
All Lustre clients show the following error message: lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0000_UUID went back in time (transno 120267673197 was previously committed, server now claims 120267671307)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(mdc_request.c:554:mdc_replay_open()) @@@ Can't properly replay without open data. req@ffff8806b1593980 x1527085335890828/t85900680419(85900680419) o101->soaked-MDT0000-mdc-ffff88082f749000@192.168.1.108@o2ib10:12/10 lens 896/544 e 0 to 0 dl 1456506319 ref 2 fl Interpret:RP/4/0 rc 301/301 The corresponding error message on MGS/MDT node lola-8 is: lola-8.log:Feb 26 09:04:37 lola-8 kernel: LustreError: 4561:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8808231ac680 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:333/0 lens 896/3512 e 0 to 0 dl 1456506318 ref 1 fl Interpret:/4/0 rc 0/0 After recovery completed the errors appear in messages file of lola-8 lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806786, ql: 7, comp: 9, conn: 16, next: 120260806788, next_update 120260806887 last_committed: 120260806785) lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806789, ql: 7, comp: 9, conn: 16, next: 120260806887, next_update 120260806887 last_committed: 120260806785) lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806888, ql: 7, comp: 9, conn: 16, next: 120260806892, next_update 120260806903 last_committed: 120260806785) lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806893, ql: 7, comp: 9, conn: 16, next: 120260806903, next_update 120260806903 last_committed: 120260806785) lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806904, ql: 7, comp: 9, conn: 16, next: 120260806946, next_update 120260806960 last_committed: 120260806785) lola-8.log:Feb 26 09:15:42 lola-8 kernel: LustreError: 4664:0:(ldlm_lib.c:1900:check_for_next_transno()) soaked-MDT0001: waking for gap in transno, VBR is OFF (skip: 120260806947, ql: 7, comp: 9, conn: 16, next: 120260806958, next_update 120260806960 last_committed: 120260806785) ... ... etc etc lola-8.log:Feb 26 09:15:51 lola-8 kernel: LustreError: 4583:0:(ldlm_lockd.c:858:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-soaked-MDT0000_UUID lock: ffff8803ecf60dc0/0x73ce4df547baebbb lrc: 3/0,0 mode: EX/EX res: [0x200010da0:0x74:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x40000001000020 nid: 0@lo remote: 0x73ce4df547baebb4 expref: 95 pid: 4836 timeout: 0 lvb_type: 0 lola-8.log:Feb 26 09:15:51 lola-8 kernel: LustreError: 4583:0:(ldlm_lockd.c:858:ldlm_server_blocking_ast()) Skipped 3 previous similar messages ... ... Attached console, message files of lola-8,9,27 (27--> as example client) |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 26/Feb/16 ] |
|
Hi Di, |
| Comment by Di Wang [ 26/Feb/16 ] |
|
hmm, it looks like the object is missing for this replay(open_no_create) request. lola-27.log:Feb 26 09:04:37 lola-27 kernel: LustreError: 3684:0:(import.c:1249:ptlrpc_connect_interpret()) soaked-MDT0000_UUID went back in time (transno 120267673197 was previously committed, server now claims 120267671307)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 This is interesting. lola-27 seems keeps the same replay request to lola-8 in every recovery. messages-lola-8.log:Feb 25 16:53:17 lola-8 kernel: LustreError: 4536:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8804346976c0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:209/0 lens 896/3512 e 0 to 0 dl 1456448059 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 25 17:15:44 lola-8 kernel: LustreError: 4451:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff88082b319cc0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:738/0 lens 896/3512 e 0 to 0 dl 1456450098 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 25 23:10:46 lola-8 kernel: LustreError: 4401:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff880816d100c0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:191/0 lens 896/3512 e 0 to 0 dl 1456470691 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 25 23:54:27 lola-8 kernel: LustreError: 4450:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8807b679d0c0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:512/0 lens 896/3512 e 0 to 0 dl 1456473277 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 26 00:30:11 lola-8 kernel: LustreError: 4389:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff880821640080 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:391/0 lens 896/3512 e 0 to 0 dl 1456475421 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 26 00:56:32 lola-8 kernel: LustreError: 4543:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff880824a71980 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:498/0 lens 896/3512 e 0 to 0 dl 1456477038 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 26 02:40:43 lola-8 kernel: LustreError: 4449:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff88081e04dcc0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:668/0 lens 896/3512 e 0 to 0 dl 1456483248 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 26 08:00:34 lola-8 kernel: LustreError: 4408:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8808119ea9c0 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:229/0 lens 896/3512 e 0 to 0 dl 1456502439 ref 1 fl Interpret:/4/0 rc 0/0 messages-lola-8.log:Feb 26 09:04:37 lola-8 kernel: LustreError: 4561:0:(mdt_open.c:1251:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8808231ac680 x1527085335890828/t0(85900680419) o101->713987fa-97c6-9c64-e2ea-b436903a0b12@192.168.1.127@o2ib100:333/0 lens 896/3512 e 0 to 0 dl 1456506318 ref 1 fl Interpret:/4/0 rc 0/0 I think client should throw such replay request once it is failed, since it is only open, no create. |
| Comment by Di Wang [ 26/Feb/16 ] |
|
looks like the open replay data is already deleted from the replay request. So I do not know why this request still exist in the replay list. I wonder if this related with the recent xid change? Niu, please have a look? thanks. LustreError: 3684:0:(mdc_request.c:554:mdc_replay_open()) @@@ Can't properly replay without open data. req@ffff8806b1593980 x1527085335890828/t85900680419(85900680419) o101->soaked-MDT0000-mdc-ffff88082f749000@192.168.1.108@o2ib10:12/10 lens 896/544 e 0 to 0 dl 1456470692 ref 2 fl Interpret:RP/4/0 rc 301/301 And also this is probably not a critical ticket here, because I did not see this broke the consistency of the namespace. Only some noise message caused by this mysterious replay request? |