[LU-6989] ldlm_lib.c:1831:check_for_next_transno system restarts unexpectedly Created: 12/Aug/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Alex Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

https://build.hpdd.intel.com/job/lustre-master/3133/arch=x86_64,build_type=server,distro=el6.6,ib_stack=inkernel/


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During an active file deletion, an error occurs, after which the system restarts unexpectedly.
Logs:
Aug 12 16:14:42 hard kernel: LustreError: 2954:0:(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap in transno, VBR is OFF (skip: 115964304784, ql: 1, comp: 2, conn: 3, next: 115964304785, next_update 0 last_committed: 115964304265)
Aug 12 16:14:42 hard kernel: LustreError: 2954:0:(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap in transno, VBR is OFF (skip: 115964304793, ql: 1, comp: 2, conn: 3, next: 115964304794, next_update 0 last_committed: 115964304265)
Aug 12 16:14:42 hard kernel: LustreError: 2954:0:(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap in transno, VBR is OFF (skip: 115964304795, ql: 1, comp: 2, conn: 3, next: 115964304796, next_update 0 last_committed: 115964304265)
Aug 12 16:14:42 hard kernel: LustreError: 2954:0:(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap in transno, VBR is OFF (skip: 115964304807, ql: 1, comp: 2, conn: 3, next: 115964304808, next_update 0 last_committed: 115964304265)
Aug 12 16:14:42 hard kernel: LustreError: 2954:0:(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap in transno, VBR is OFF (skip: 115964304809, ql: 1, comp: 2, conn: 3, next: 115964304810, next_update 0 last_committed: 115964304265)
Aug 12 16:14:42 hard kernel: Lustre: FS-MDT0000: disconnecting 1 stale clients
Aug 12 16:14:43 hard kernel: Lustre: FS-MDT0000: Recovery over after 0:36, of 3 clients 2 recovered and 1 was evicted.
Aug 12 16:14:43 hard kernel: Lustre: FS-OST0000: deleting orphan objects from 0x0:1292643 to 0x0:1292823
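The five check_for_next_transno() messages above differ only in their transaction numbers. A small parser makes them easier to compare side by side. This is a minimal Python sketch that assumes the exact message layout quoted in this report, where most fields print as "name: value" but next_update prints without a colon. parse_gap_message is a hypothetical helper, not part of Lustre.

```python
import re

# Captures the "(skip: ... last_committed: N)" tail of a
# check_for_next_transno() message, assuming the layout quoted above.
GAP_RE = re.compile(r"\((skip: [^)]*)\)")

# Matches "name: value" and "name value" pairs (next_update has no colon).
FIELD_RE = re.compile(r"([a-z_]+):? (\d+)")


def parse_gap_message(line):
    """Return the numeric fields of one 'waking for gap' message as a dict,
    or None if the line is not such a message."""
    match = GAP_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in FIELD_RE.findall(match.group(1))}


line = ("Aug 12 16:14:42 hard kernel: LustreError: 2954:0:"
        "(ldlm_lib.c:1831:check_for_next_transno()) FS-MDT0000: waking for gap "
        "in transno, VBR is OFF (skip: 115964304784, ql: 1, comp: 2, conn: 3, "
        "next: 115964304785, next_update 0 last_committed: 115964304265)")

fields = parse_gap_message(line)
print(fields["next"] - fields["skip"])            # distance from skip to next
print(fields["skip"] - fields["last_committed"])  # distance above last_committed
```

For the first message, next is exactly one above skip (the same holds for all five lines), and skip is 519 above last_committed.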



 Comments   
Comment by Andreas Dilger [ 13/Aug/15 ]

Which system is restarting (client, MDS, OSS)? Are you running the MDS and OSS on the same node? Please provide the console logs from the failing node.

Comment by Alex [ 14/Aug/15 ]

Yes, the MGS, MDT, and OST all run on the same node; the client runs on a separate node.
How do I enable full logging to capture the required output?

Generated at Sat Feb 10 02:05:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.