Details
- Bug
- Resolution: Duplicate
- Blocker
- None
- lola
  build: master branch, 2.7.65-38-g607f691 ; 607f6919ea67b101796630d4b55649a12ea0e859
- 3
- 9223372036854775807
Description
The error happened during soak testing of build '20160126' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160126). DNE is enabled.
MDTs had been formatted with ldiskfs, OSTs with zfs.
No faults were injected during the soak test. Only application load and execution of lfsck were imposed on the test cluster.
Sequence of events:
- Jan 27 05:44:56 - Started the lfsck command on the primary MDS (lola-8):
lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A
- Jan 27 05:49 - OSS node lola-5 hit LBUG (see LU-7720)
- Jan 27 08:46 - Rebooted lola-5, remounted OSTs, enabled debug for lfsck and increased the debug buffer (512MB); increasing number of blocked ost_* threads
A huge number of debug logs were printed before the oom-killer started:
Call Trace:
[<ffffffff8106cc43>] ? dequeue_entity+0x113/0x2e0
[<ffffffff8152bd26>] __mutex_lock_slowpath+0x96/0x210
[<ffffffffa0fcbe7b>] ? ofd_seq_load+0xbb/0xa90 [ofd]
[<ffffffff8152b84b>] mutex_lock+0x2b/0x50
[<ffffffffa0fbff18>] ofd_create_hdl+0xc28/0x2640 [ofd]
[<ffffffffa093a66b>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
[<ffffffffa093a7a6>] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
[<ffffffffa093a8f1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[<ffffffffa09a4f9c>] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
[<ffffffffa094c201>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
[<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
[<ffffffffa094b3c0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
[<ffffffff8109e78e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff8109e6f0>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1453949036.15397
Pid: 15443, comm: ll_ost00_065
--> attached this debug log file (/tmp/lustre-log.1453949036.15397)
- Jan 27 18:45 - oom-killer started on OSS node lola-5; the node crashed 3 minutes later
- Memory exhausted by slab 'size-1048576' with ~27GB (see archive: lola-5-oom-killer-2.tar.bz2)
- Jan 28 03:59 - lfsck command still not finished (see mds-lfsck-status-nslayout.log.bz2, mds-lfsck-status-oi.log.bz2, oss-lfsck-status.log.bz2)
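For anyone reproducing the slab accounting in the timeline above, here is a minimal Python sketch that ranks slab caches by memory footprint from text in the /proc/slabinfo format. It approximates each cache's footprint as num_objs * objsize, which ignores per-slab overhead. The sample numbers are invented for illustration (chosen so that 'size-1048576' comes out at 27 GiB) and are not taken from the attached archive; the function names are hypothetical.

```python
# Sketch: rank slab caches by approximate memory footprint.
# Assumes the /proc/slabinfo column layout:
#   name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : ...

# Invented sample data for illustration only (not from lola-5).
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables
size-1048576      27648 27648 1048576 1 256 : tunables 1 1 0
size-4096         1200 1300 4096 1 1 : tunables 24 12 8
dentry            50000 52000 192 20 1 : tunables 120 60 8
"""


def slab_usage(slabinfo_text):
    """Return {cache_name: bytes}, estimated as num_objs * objsize."""
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version line and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_slabs(slabinfo_text, n=5):
    """Return the n largest caches as (name, bytes), largest first."""
    usage = slab_usage(slabinfo_text)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, nbytes in top_slabs(SAMPLE, n=3):
        print(f"{name:20s} {nbytes / 2**30:8.2f} GiB")
```

On a live node the same functions could be fed the contents of /proc/slabinfo (reading it normally requires root). A cache of 1 MiB objects holding roughly 27 GB corresponds to on the order of 27,000 allocated objects.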
Attachments
Issue Links
- duplicates
- LU-6923 writing process hung at txg_wait_open
- Open