Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Affects Version/s: Lustre 2.10.3
- Fix Version/s: None
- Severity: 3
Description
Hi,
We tested the warble1 hardware all we could for about a week and found no hardware issues. We also replaced the SAS cards and cables just to be safe.
warble1 is now running kernel 3.10.0-693.21.1.el7.x86_64 with ZFS 0.7.8, and has these patches applied:
usr/src/lustre-2.10.3/lu10212-estale.patch
usr/src/lustre-2.10.3/lu10707-ksocklnd-revert-jiffies.patch
usr/src/lustre-2.10.3/lu10707-lnet-route-jiffies.patch
usr/src/lustre-2.10.3/lu10887-lfsck.patch
usr/src/lustre-2.10.3/lu8990-put-root.patch
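(For reference, a minimal sketch of how patches like these are typically applied to the source tree before rebuilding; the working directory and -p1 patch level are assumptions, not details from this report:

    cd /usr/src/lustre-2.10.3
    for p in lu10212-estale.patch \
             lu10707-ksocklnd-revert-jiffies.patch \
             lu10707-lnet-route-jiffies.patch \
             lu10887-lfsck.patch \
             lu8990-put-root.patch; do
        patch -p1 < "$p"    # -p1 assumes git-style a/ and b/ path prefixes
    done
)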
When the dagg MDTs were mounted on warble1 they COMPLETED recovery ok, and then about 5 seconds later lfsck hit an LBUG:
...
2018-05-02 22:06:06 [ 2919.828067] Lustre: dagg-MDT0000: Client 22c84389-af1f-9970-0e9b-70c3a4861afd (at 10.8.49.155@tcp201) reconnecting
2018-05-02 22:06:06 [ 2919.828113] Lustre: dagg-MDT0002: Recovery already passed deadline 0:31. If you do not want to wait more, please abort the recovery by force.
2018-05-02 22:06:38 [ 2951.686211] Lustre: dagg-MDT0002: recovery is timed out, evict stale exports
2018-05-02 22:06:38 [ 2951.694197] Lustre: dagg-MDT0002: disconnecting 1 stale clients
2018-05-02 22:06:38 [ 2951.736799] Lustre: 24680:0:(ldlm_lib.c:2544:target_recovery_thread()) too long recovery - read logs
2018-05-02 22:06:38 [ 2951.746774] Lustre: dagg-MDT0002: Recovery over after 6:24, of 125 clients 124 recovered and 1 was evicted.
2018-05-02 22:06:38 [ 2951.746775] LustreError: dumping log to /tmp/lustre-log.1525262798.24680
2018-05-02 22:06:44 [ 2957.910031] LustreError: 33236:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
2018-05-02 22:06:44 [ 2957.917615] Pid: 33236, comm: lfsck_namespace
2018-05-02 22:06:44 [ 2957.922760]
2018-05-02 22:06:44 [ 2957.922760] Call Trace:
2018-05-02 22:06:44 [ 2957.928142] [<ffffffffc06457ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-05-02 22:06:44 [ 2957.935374] [<ffffffffc064583c>] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2957.942270] [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2957.949398] [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:44 [ 2957.957911] [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:44 [ 2957.965289] [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:44 [ 2957.974129] [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:44 [ 2957.983217] [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2957.989375] [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2957.997154] [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:44 [ 2958.003538] [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2958.009648] [<ffffffff810c7c70>] ? default_wake_function+0x0/0x20
2018-05-02 22:06:44 [ 2958.016449] [<ffffffffc11405c0>] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2958.024186] [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-05-02 22:06:44 [ 2958.029662] [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.035220] [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:44 [ 2958.041197] [<ffffffff810b3f60>] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.046723]
2018-05-02 22:06:44 [ 2958.048771] Kernel panic - not syncing: LBUG
2018-05-02 22:06:44 [ 2958.053576] CPU: 2 PID: 33236 Comm: lfsck_namespace Tainted: P OE ------------ 3.10.0-693.21.1.el7.x86_64 #1
2018-05-02 22:06:44 [ 2958.065051] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
2018-05-02 22:06:44 [ 2958.073066] Call Trace:
2018-05-02 22:06:44 [ 2958.076060] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
2018-05-02 22:06:44 [ 2958.081738] [<ffffffff816a8634>] panic+0xe8/0x21f
2018-05-02 22:06:44 [ 2958.087058] [<ffffffffc0645854>] lbug_with_loc+0x64/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2958.093781] [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2958.100741] [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:45 [ 2958.109091] [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:45 [ 2958.116294] [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:45 [ 2958.124963] [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:45 [ 2958.133889] [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.139866] [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:45 [ 2958.147492] [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:45 [ 2958.153724] [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.159689] [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
2018-05-02 22:06:45 [ 2958.165742] [<ffffffffc11405c0>] ? lfsck_master_engine+0x1310/0x1310 [lfsck]
2018-05-02 22:06:45 [ 2958.173343] [<ffffffff810b4031>] kthread+0xd1/0xe0
2018-05-02 22:06:45 [ 2958.178685] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.185227] [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:45 [ 2958.191065] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.197613] Kernel Offset: disabled
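(As the "dumping log" line above notes, a binary Lustre debug log was written to /tmp/lustre-log.1525262798.24680. A sketch of how it can be converted to readable text for analysis, assuming the file survived the panic and was collected from the node:

    # decode the binary Lustre debug dump into plain text
    lctl debug_file /tmp/lustre-log.1525262798.24680 /tmp/lustre-log.1525262798.txt
    less /tmp/lustre-log.1525262798.txt
)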
I've failed the MDTs back to warble2 and mounted them by hand with -o skip_lfsck.
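(For anyone needing the same workaround, a minimal sketch of such a manual mount; the pool/dataset name and mount point here are placeholders, not taken from this report:

    # skip_lfsck prevents a paused or crashing lfsck from auto-resuming at mount time
    mount -t lustre -o skip_lfsck dagg-mdt0/mdt0 /mnt/lustre/dagg-MDT0000

Once the target is up, an in-progress namespace lfsck can also be halted explicitly, e.g. lctl lfsck_stop -M dagg-MDT0000.)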
cheers,
robin
Issue Links
- is related to: LU-10887 "2 MDTs stuck in WAITING" (Resolved)