Details
-
Bug
-
Resolution: Won't Fix
-
Critical
-
None
-
Lustre 2.8.0
-
lola
build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild '20151104.1'
-
3
-
9223372036854775807
Description
The error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are configured in an HA failover configuration.
Sequence of events:
- 2015-11-04 18:47:30 – mds_restart lola-9 completed
- 2015-11-04 18:50:30 – OSS (lola-5) evicts client
lola-5.log:Nov 4 18:50:30 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.131@o2ib100 ns: filter-soaked-OST0007_UUID lock: ffff880313f841c0/0x15cebc1506e2a9b5 lrc: 3/0,0 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 192.168.1.131@o2ib100 remote: 0x175f680569401922 expref: 5 pid: 10807 timeout: 4306845817 lvb_type: 0
Similar messages exist on all OSS nodes.
- 2015-11-04 18:50:45 – client (lola-31) is evicted from OST0007
Nov 4 18:50:45 lola-31 kernel: LustreError: 167-0: soaked-OST0007-osc-ffff881071e62400: This client was evicted by soaked-OST0007; in progress operations using this service will fail.
Nov 4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-OST0007-osc-ffff881071e62400: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Nov 4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount = 2
Nov 4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
Nov 4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-OST0007-osc-ffff881071e62400 lock: ffff880850f12a80/0x175f680569401922 lrc: 3/0,1 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x526480000000 nid: local remote: 0x15cebc1506e2a9b5 expref: -99 pid: 15402 timeout: 0 lvb_type: 1
- until 2015-11-04 18:59:05 – the client is also evicted from OSTs {8, a, b, c}, with the same error messages on the client and the OSSes
- 2015-11-05 – the client's (lola-31) OSCs stay in state DISCONN, EVICTED for the affected OSTs (see file 'evicted-client.txt.bz2'); the client node is unusable and all jobs crashed
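To rebuild this kind of timeline from the attached OSS message files, the server-side eviction lines can be picked out with a small script. The following is a minimal sketch, not part of the original report: the function name `find_evictions` and the regular expression are illustrative assumptions, written only against the format of the single OSS message quoted above ("lock callback timer expired after ...s: evicting client at ..."). Other Lustre versions may word the message differently.

```python
import re

# Assumption: the pattern is derived only from the one OSS log line quoted in
# this report; it is not an official description of the Lustre log format.
EVICT_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"  # syslog timestamp
    r"(?P<host>\S+)\s+kernel:.*?"                               # reporting OSS node
    r"lock callback timer expired after (?P<secs>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+"                       # evicted client NID
    r"ns: filter-(?P<target>\S+?)_UUID"                         # OST namespace
)


def find_evictions(lines):
    """Return (timestamp, oss_host, target, client_nid, timeout_s) per eviction line."""
    events = []
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            events.append(
                (m["ts"], m["host"], m["target"], m["nid"], int(m["secs"]))
            )
    return events
```

Calling `find_evictions(open(path))` on each OSS messages file and sorting the combined result by timestamp should give one row per eviction, which makes it easier to see which OSTs evicted the client and when.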
The problem might be related to LU-2067.
Attached files:
- OSSes (lola-[2-5]): messages, console log files
- client lola-31: messages, console log files, 'lctl ..state*' output