Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
Lustre 2.7.54
SPL/ZFS 0.6.4.1-1
TOSS kernel 2.6.32-504.8.1.2chaos.ch5.3.x86_64
3
Description
Running mds-survey on a newly created file system triggers a crash and reboot.
The MDS and OSS nodes are up and Lustre is running. Whether the filesystem is mounted on any clients has no effect on the problem; it occurs either way. The backend is ZFS. mds-survey was run with all defaults; no environment variables were set to control it.
The shell shows almost nothing leading up to the crash:
[root@zwicky-lcrash-mds1:2015-06-24.3]# mds-survey
Wed Jun 24 12:35:05 PDT 2015 /usr/bin/mds-survey from zwicky-lcrash-mds1
mdt 1 file 100000 dir 4 thr 4 create
Console output is:
Lustre: Echo OBD driver; http://www.lustre.org/
LustreError: 68263:0:(echo_client.c:1676:echo_md_lookup()) lookup MDT0000-tests: rc = -2
LustreError: 68263:0:(echo_client.c:1875:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
Lustre: ctl-lcrash-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1106
in_atomic(): 0, irqs_disabled(): 1, pid: 68300, name: lctl
Pid: 68300, comm: lctl Tainted: P --------------- 2.6.32-504.16.2.1chaos.ch5.3.x86_64 #1
Call Trace:
[<ffffffff8105e6aa>] ? __might_sleep+0xda/0x100
[<ffffffff8104e05b>] ? __do_page_fault+0x10b/0x510
[<ffffffffa07c0683>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
[<ffffffff8153421e>] ? do_page_fault+0x3e/0xa0
[<ffffffff815315d5>] ? page_fault+0x25/0x30
[<ffffffff8105d0e2>] ? task_rq_lock+0x42/0xa0
[<ffffffff81065a3c>] ? try_to_wake_up+0x3c/0x3e0
[<ffffffffa12dd263>] ? echo_object_free+0x2b3/0x460 [obdecho]
[<ffffffff81065e35>] ? wake_up_process+0x15/0x20
[<ffffffff8152efb2>] ? __mutex_unlock_slowpath+0x42/0x60
[<ffffffff8152ef2b>] ? mutex_unlock+0x1b/0x20
[<ffffffffa0968051>] ? lu_site_purge+0x411/0x500 [obdclass]
[<ffffffffa0968581>] ? lu_object_limit+0x71/0x80 [obdclass]
[<ffffffffa09686c0>] ? lu_object_find_try+0x130/0x260 [obdclass]
[<ffffffffa09688a1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
[<ffffffffa07bd2b8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa12292f1>] ? mdd_lookup+0x111/0x180 [mdd]
[<ffffffffa12dea33>] ? echo_md_create_internal+0x153/0x640 [obdecho]
[<ffffffffa12e8bb2>] ? echo_md_handler+0x1302/0x1860 [obdecho]
[<ffffffffa12ea98c>] ? echo_client_iocontrol+0x187c/0x29e0 [obdecho]
[<ffffffff8113ca91>] ? lru_cache_add_lru+0x21/0x40
[<ffffffff8115b2fd>] ? page_add_new_anon_rmap+0x9d/0xf0
[<ffffffff81176e8c>] ? __kmalloc+0x22c/0x240
[<ffffffffa093131c>] ? class_handle_ioctl+0x165c/0x21e0 [obdclass]
[<ffffffffa09182ab>] ? obd_class_ioctl+0x4b/0x190 [obdclass]
[<ffffffff811a5882>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811a5ea4>] ? do_vfs_ioctl+0x84/0x5e0
[<ffffffff811a6481>] ? sys_ioctl+0x81/0xa0
[<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
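For triage, it can help to see which modules appear in the call trace above. The following is a minimal sketch, not part of any Lustre tooling: `FRAME_RE` and `parse_frames` are hypothetical names introduced here, and the regular expression assumes the 2.6.32-style frame layout shown in the console output (`[<addr>] ? symbol+0xoff/0xsize [module]`). Frames without a bracketed module name are assumed to belong to the core kernel.

```python
import re
from collections import Counter

# Hypothetical helper: matches one frame such as
#   [<ffffffffa12dd263>] ? echo_object_free+0x2b3/0x460 [obdecho]
# The trailing "[module]" part is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return a list of (symbol, module) tuples found in a console log."""
    frames = []
    for match in FRAME_RE.finditer(text):
        # Assumption: no bracketed module name means a core kernel symbol.
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few frames copied from the trace in this report.
sample = """\
[<ffffffff8105e6aa>] ? __might_sleep+0xda/0x100
[<ffffffffa07c0683>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
[<ffffffffa12dd263>] ? echo_object_free+0x2b3/0x460 [obdecho]
[<ffffffffa0968051>] ? lu_site_purge+0x411/0x500 [obdclass]
"""

frames = parse_frames(sample)
by_module = Counter(module for _, module in frames)

for symbol, module in frames:
    print(f"{module:10s} {symbol}")
```

Run against the full trace, a grouping like this makes it easier to see that the path from `echo_md_create_internal` through `lu_site_purge` to `echo_object_free` sits entirely in the obdecho and obdclass modules.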