Details
Bug
Resolution: Fixed
Critical
Lustre 2.1.0
3
4682
Description
We had three occurrences of this crash on our classified 2.1 Lustre cluster, all on OSS nodes.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffffa0a8e061>] filter_export_stats_init+0x1f1/0x500 [obdfilter]
machine_kexec
crash_kexec
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
page_fault
[exception RIP: filter_export_stats_init+497]
filter_reconnect
target_handle_connect
ost_handle
ptlrpc_main
kernel_thread
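The faulting address in the oops, 0000000000000038, is a small offset rather than a plausible kernel pointer. A common reading of such an address is that the code read a structure member through a NULL pointer. The CPU adds the member's offset to the base address, so with a NULL base the faulting address is just the offset. The sketch below illustrates that arithmetic using Python's ctypes, which lays out structures like C. The struct `HypotheticalExport` and the helper `fault_address` are invented for illustration. They are not the real obdfilter or export structures, which this report does not show. The sketch also assumes a 64-bit (LP64) layout, consistent with the x86_64 addresses in the trace.

```python
import ctypes


# Hypothetical structure for illustration only; this is NOT the real
# Lustre structure accessed by filter_export_stats_init().
class HypotheticalExport(ctypes.Structure):
    _fields_ = [
        ("uuid", ctypes.c_char * 40),  # offsets 0x00..0x27
        ("flags", ctypes.c_uint64),    # offset 0x28
        ("owner", ctypes.c_void_p),    # offset 0x30 on 64-bit
        ("stats", ctypes.c_void_p),    # offset 0x38 on 64-bit
    ]


def fault_address(base: int, field: str) -> int:
    """Address the CPU would touch when reading `field` through `base`.

    With base == 0 (a NULL pointer), the result is simply the field's
    offset, which is why NULL dereferences fault at small addresses.
    """
    return base + getattr(HypotheticalExport, field).offset


if __name__ == "__main__":
    for name, _ in HypotheticalExport._fields_:
        print(f"{name:6s} -> 0x{fault_address(0, name):02x}")
```

On a 64-bit build, `fault_address(0, "stats")` evaluates to 0x38, which matches the reported address for this made-up layout. Identifying the real field would require disassembling `filter_export_stats_init+0x1f1` against the matching obdfilter debuginfo.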
The timeframe coincided with the ASSERT reported in LU-1085. As with the other bugs we hit during that window, this crash was preceded by hundreds of messages like this:
LustreError: 14210:0:(genops.c:1270:class_disconnect_stale_exports()) ls5-OST0349: disconnect stale client [UUID]@<unknown>
Oleg has suggested that the patch for LU-106 may help here, and we have pulled it into our branch but haven't pushed it out yet.