Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.7.0
Labels: None
Environment: lustre2.7.3 fe
Severity: 3
Rank: 9223372036854775807
Description
An OST disk issue required a reboot of the OSS. This caused MDT threads to hang in lod_qos_prep_create. The MDT required a reboot about 6 hours after the OST recovered.
OST Disk ERRORS
Jun 18 09:56:37 nbp2-oss5 kernel: sd 16:0:0:7: [sdcu] Sense Key : Recovered Error [current]
Jun 18 09:56:37 nbp2-oss5 kernel: sd 16:0:0:7: [sdcu] <<vendor>> ASC=0x95 ASCQ=0x1
OSS Rebooted at Jun 18 14:30:00
MDT Errors at OSS reboot time
Jun 18 12:31:12 nbp2-mds kernel: Call Trace:
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff811cb40c>] ? __getblk+0x2c/0x2a0
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81584435>] rwsem_down_failed_common+0x95/0x1d0
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81584593>] rwsem_down_write_failed+0x23/0x30
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff812c7fe3>] call_rwsem_down_write_failed+0x13/0x20
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11f07c0>] ? lod_declare_object_create+0x0/0x450 [lod]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81583a92>] ? down_write+0x32/0x40
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11f7065>] lod_qos_prep_create+0xc25/0x1aa0 [lod]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa0f41459>] ? osd_declare_qid+0x289/0x480 [osd_ldiskfs]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11e8c02>] lod_declare_striped_object+0x162/0x980 [lod]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa0f1b735>] ? osd_declare_object_create+0x1c5/0x340 [osd_ldiskfs]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11f0a7f>] lod_declare_object_create+0x2bf/0x450 [lod]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa125ad76>] mdd_declare_object_create_internal+0x116/0x340 [mdd]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa125670e>] mdd_create+0x69e/0x1740 [mdd]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa1118348>] mdo_create+0x18/0x50 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11224ff>] mdt_reint_open+0x1f8f/0x2c70 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa05d491c>] ? upcall_cache_get_entry+0x29c/0x880 [libcfs]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa110928d>] mdt_reint_rec+0x5d/0x200 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa10ece7b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa10ed346>] mdt_intent_reint+0x1f6/0x440 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa10eb92e>] mdt_intent_policy+0x4be/0xd10 [mdt]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa09047a7>] ldlm_lock_enqueue+0x127/0xa50 [ptlrpc]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa093055b>] ldlm_handle_enqueue0+0x51b/0x14d0 [ptlrpc]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa09b9eb1>] tgt_enqueue+0x61/0x230 [ptlrpc]
Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa09baece>] tgt_request_handle+0x8be/0x1020 [ptlrpc]
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffffa0964ca1>] ptlrpc_main+0xf41/0x1a80 [ptlrpc]
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffffa0963d60>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffff810a379e>] kthread+0x9e/0xc0
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffff810a3700>] ? kthread+0x0/0xc0
Jun 18 12:31:13 nbp2-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
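For triage, it can help to reduce a trace like the one above to its chain of reliable frames. The following is a minimal sketch, not part of the original report: the helper names parse_frames and reliable_chain are hypothetical, and the regular expression assumes the "[<address>] [?] function+0xOFF/0xLEN [module]" frame layout seen in this log. Frames prefixed with "?" are treated as unreliable unwinder guesses and filtered out.

```python
import re

# One stack frame, e.g. "[<ffffffffa11f7065>] lod_qos_prep_create+0xc25/0x1aa0 [lod]".
# An optional "?" before the symbol marks a frame the unwinder is unsure about.
# Assumption: this layout matches the RHEL6-era kernel output quoted in the ticket.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return every frame found in log_text, in order of appearance."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append({
            "func": match.group("func"),
            # Frames without a "[module]" suffix belong to the core kernel.
            "module": match.group("module") or "kernel",
            "reliable": match.group("unreliable") is None,
        })
    return frames


def reliable_chain(frames):
    """Return only the function names of frames not marked with '?'."""
    return [frame["func"] for frame in frames if frame["reliable"]]


# The first seven frames of the MDS trace, copied from the log in this ticket.
SAMPLE = (
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff811cb40c>] ? __getblk+0x2c/0x2a0 "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81584435>] rwsem_down_failed_common+0x95/0x1d0 "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81584593>] rwsem_down_write_failed+0x23/0x30 "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff812c7fe3>] call_rwsem_down_write_failed+0x13/0x20 "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11f07c0>] ? lod_declare_object_create+0x0/0x450 [lod] "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffff81583a92>] ? down_write+0x32/0x40 "
    "Jun 18 12:31:12 nbp2-mds kernel: [<ffffffffa11f7065>] lod_qos_prep_create+0xc25/0x1aa0 [lod]"
)

if __name__ == "__main__":
    for name in reliable_chain(parse_frames(SAMPLE)):
        print(name)
```

Read top to bottom, the reliable frames in this excerpt show lod_qos_prep_create blocked in the rwsem write-lock slow path, which matches the hang described in the Description.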
MDS rebooted at Jun 18 17:58:59
Backtrace at time of MDS crash is attached.