Details
-
Bug
-
Resolution: Not a Bug
-
Major
-
None
-
Lustre 2.4.1
-
None
-
mds1 (mgs + mds for home)
mds2 (mds for scratch)
6 OSS servers serving 6 volumes for home and 24 volumes for scratch
mds1.ibb@o2ib0:mds2.ibb@o2ib0:/home
86T 42T 43T 50% /global/home
mds1.ibb@o2ib0:mds2.ibb@o2ib0:/scratch
342T 89T 250T 27% /global/scratch
All clients mount:
# Lustre home and scratch FS from ahab1:ahab2
mds1.ibb@o2ib0:mds2.ibb@o2ib0:/home /global/home lustre rw,user_xattr,localflock,_netdev 0 0
mds1.ibb@o2ib0:mds2.ibb@o2ib0:/scratch /global/scratch lustre rw,user_xattr,localflock,_netdev 0 0
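The device field in the mount entries above follows the usual Lustre form of one or more colon-separated MGS NIDs followed by `:/` and the filesystem name. A minimal sketch of splitting it, assuming no subdirectory component after the filesystem name (the helper name `split_lustre_device` is hypothetical, not part of any Lustre tool):

```python
def split_lustre_device(device):
    """Split 'nid1:nid2:/fsname' into (list of MGS NIDs, fsname)."""
    # The last ':/' separates the NID list from the filesystem name.
    nids, sep, fsname = device.rpartition(":/")
    if not sep or not nids or not fsname:
        raise ValueError("not a Lustre device string: %r" % device)
    return nids.split(":"), fsname


nids, fsname = split_lustre_device("mds1.ibb@o2ib0:mds2.ibb@o2ib0:/scratch")
print(nids, fsname)
```

For the entries in this report, both filesystems list mds1 and mds2 as MGS NIDs; only the filesystem name differs.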
-
3
-
12604
Description
These messages appear every few hours on the oss nodes:
oss6 kernel: : LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 117s: evicting client at 192.168.224.14@o2ib ns: filter-scratch-OST000b_UUID lock: ffff8804a321f000/0xaa2e9b983dbd2233 lrc: 3/0,0 mode: PW/PW res: [0x4a3a76:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 nid: 192.168.224.14@o2ib remote: 0x55c758a593d4fc6 expref: 24 pid: 12551 timeout: 5161946610 lvb_type: 0
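To see which clients and OSTs are involved when these messages recur, the eviction line can be picked apart with a regular expression. The following is only a sketch written against the single sample above; the pattern and the helper name `parse_eviction` are assumptions, and other Lustre versions may format the message differently.

```python
import re

# The OSS message from this report, joined into a single line.
OSS_LINE = (
    "oss6 kernel: : LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) "
    "### lock callback timer expired after 117s: evicting client at "
    "192.168.224.14@o2ib ns: filter-scratch-OST000b_UUID lock: "
    "ffff8804a321f000/0xaa2e9b983dbd2233 lrc: 3/0,0 mode: PW/PW res: "
    "[0x4a3a76:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] "
    "(req 0->4095) flags: 0x20 nid: 192.168.224.14@o2ib remote: "
    "0x55c758a593d4fc6 expref: 24 pid: 12551 timeout: 5161946610 lvb_type: 0"
)

EVICT_RE = re.compile(
    r"lock callback timer expired after (?P<secs>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+"
    r"ns: (?P<ns>\S+).*?"
    r"mode: (?P<mode>\S+).*?"
    r"type: (?P<type>\S+) \[(?P<start>\d+)->(?P<end>\d+)\]"
)


def parse_eviction(line):
    """Return a dict of fields from an eviction message, or None if no match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("secs", "start", "end"):
        info[key] = int(info[key])
    # An extent ending at 2**64 - 1 covers the object to end of file.
    info["to_eof"] = info["end"] == 2**64 - 1
    return info


info = parse_eviction(OSS_LINE)
print(info)
```

In the sample, the evicted client held a PW (protected write) extent lock covering the whole object on scratch-OST000b and did not respond to the callback within 117 seconds.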
On the client:
pod24b14 kernel: : LustreError: 11-0: scratch-OST000b-osc-ffff880312dff800: Communicating with 192.168.254.36@o2ib, operation obd_ping failed with -107.
pod24b14 kernel: : Lustre: scratch-OST000b-osc-ffff880312dff800: Connection to scratch-OST000b (at 192.168.254.36@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Feb 3 12:21:45 pod24b14 kernel: : Lustre: Skipped 1 previous
pod24b14 kernel: : LustreError: 167-0: scratch-OST000b-osc-ffff880312dff800: This client was evicted by scratch-OST000b; in progress operations using this service will fail.
pod24b14 kernel: : Lustre: 6039:0:(llite_lib.c:2506:ll_dirty_page_discard_warn()) scratch: dirty page discard: 192.168.254.41@o2ib:192.168.254.42@o2ib:/scratch/fid: [0x2000020a6:0x130dc:0x0]/ may get corrupted (rc -108)
pod24b14 kernel: : LustreError: 16480:0:(vvp_io.c:1088:vvp_io_commit_write()) Write page 0 of inode ffff880476e1a638 failed -108
pod24b14 kernel: : LustreError: 16516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) lock@ffff8806063297b8[2 3 0 1 1 00000000] W(2):[0, 18446744073709551615]@[0x1000b0000:0x4a3a76:0x0]
lock@ffff8806063297b8
pod24b14 kernel: : LustreError: 16516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) dlmlock returned -5
pod24b14 kernel: : LustreError: 16480:0:(cl_lock.c:1420:cl_unuse_try()) result = -5, this is unlikely!
pod24b14 kernel: : LustreError: 16480:0:(cl_lock.c:1435:cl_unuse_locked()) lock@ffff880606329978[1 0 0 1 0 00000000] W(2):[0, 18446744073709551615]@[0x2000020a6:0x14f4a:0x0]
lock@ffff880606329978
pod24b14 kernel: : LustreError: 16480:0:(cl_lock.c:1435:cl_unuse_locked()) unuse return -5
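The numeric return codes in the client messages above (-107, -108, -5) are negated Linux errno values. A small sketch for decoding them follows. The table is hard-coded with the standard Linux values so it behaves the same on any platform, and `describe_rc` is a hypothetical helper name.

```python
# Linux errno values for the codes seen in the client log.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def describe_rc(rc):
    """Map a kernel-style negative return code to its errno name and text."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return "%d: %s (%s)" % (rc, name, text)


for rc in (-107, -108, -5):
    print(describe_rc(rc))
```

Read in order, the codes match the sequence in the log: the ping fails because the connection is gone (-107), writes against the evicted import fail (-108), and the lock operations then return an I/O error (-5).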