Lustre / LU-6893

recovery-double-scale test_pairwise_fail: mount failed

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: server: lustre-master build # 3093 RHEL7.1
      client: SLES11 SP3
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/816ceeca-2623-11e5-92e6-5254006e85c2.

      The sub-test test_pairwise_fail failed with the following error:

      mount /mnt/lustre on onyx-34vm6 failed
      

      OST dmesg

      [ 2212.173927] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK
      [ 2212.998458] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Done checking client loads. Failing type1=clients item1=onyx-34vm5,onyx-34vm6 ... 
      [ 2213.336608] Lustre: DEBUG MARKER: Done checking client loads. Failing type1=clients item1=onyx-34vm5,onyx-34vm6 ...
      [ 2253.919179] LNet: Service thread pid 4319 was inactive for 40.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 2253.926365] Pid: 4319, comm: ll_ost_io00_067
      [ 2253.928076] 
      Call Trace:
      [ 2253.931101]  [<ffffffff8160a409>] schedule+0x29/0x70
      [ 2253.933951]  [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
      [ 2253.936095]  [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
      [ 2253.938049]  [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
      [ 2253.940092]  [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
      [ 2253.941956]  [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
      [ 2253.944070]  [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
      [ 2253.945900]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
      [ 2253.947956]  [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [ 2253.949879]  [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
      [ 2253.952132]  [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [ 2253.954241]  [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [ 2253.956362]  [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
      [ 2253.958180]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
      [ 2253.960260]  [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
      [ 2253.962104]  [<ffffffff8109739f>] kthread+0xcf/0xe0
      [ 2253.965627]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      [ 2253.967444]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      [ 2253.969266]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      
      [ 2253.972575] LustreError: dumping log to /tmp/lustre-log.1436386922.4319
      [ 2254.275273] Pid: 4320, comm: ll_ost_io00_068
      [ 2254.279470] 
      Call Trace:
      [ 2254.285875]  [<ffffffff8160a409>] schedule+0x29/0x70
      [ 2254.287722]  [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
      [ 2254.289620]  [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
      [ 2254.291530]  [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
      [ 2254.293318]  [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
      [ 2254.295041]  [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
      [ 2254.296794]  [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
      [ 2254.298482]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
      [ 2254.300158]  [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [ 2254.301808]  [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
      [ 2254.303486]  [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [ 2254.305306]  [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [ 2254.306949]  [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
      [ 2254.308571]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
      [ 2254.310115]  [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
      [ 2254.311663]  [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
      [ 2254.313179]  [<ffffffff8109739f>] kthread+0xcf/0xe0
      [ 2254.314588]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      [ 2254.315973]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      [ 2254.317357]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      
      [ 2254.319694] Pid: 4330, comm: ll_ost_io00_071
      [ 2254.320938] 
      Call Trace:
      [ 2254.323020]  [<ffffffff8160a409>] schedule+0x29/0x70
      [ 2254.324327]  [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
      [ 2254.325717]  [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
      [ 2254.327192]  [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
      [ 2254.328564]  [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
      [ 2254.329942]  [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
      [ 2254.331377]  [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
      [ 2254.332768]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
      [ 2254.334173]  [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [ 2254.335613]  [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
      [ 2254.337071]  [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [ 2254.338680]  [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [ 2254.340149]  [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
      [ 2254.341516]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
      [ 2254.342882]  [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
      [ 2254.344241]  [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
      [ 2254.345619]  [<ffffffff8109739f>] kthread+0xcf/0xe0
      [ 2254.346921]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      [ 2254.348223]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      [ 2254.349523]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      
      [ 2254.351850] Pid: 4257, comm: ll_ost_io00_039
      [ 2254.353101] 
      Call Trace:
      [ 2254.355139]  [<ffffffff8160a409>] schedule+0x29/0x70
      [ 2254.356389]  [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
      [ 2254.357738]  [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
      [ 2254.359179]  [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
      [ 2254.360681]  [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
      [ 2254.362039]  [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
      [ 2254.363468]  [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
      [ 2254.364833]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
      [ 2254.366180]  [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [ 2254.367586]  [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
      [ 2254.369024]  [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [ 2254.370488]  [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [ 2254.371916]  [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
      [ 2254.373266]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
      [ 2254.374639]  [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
      [ 2254.375995]  [<ffffffff8109739f>] kthread+0xcf/0xe0
      [ 2254.377272]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      [ 2254.378521]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      [ 2254.379843]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
      
      [ 2254.382140] LNet: Service thread pid 2840 was inactive for 40.17s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [ 2255.071164] LNet: Service thread pid 4267 was inactive for 40.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [ 2255.076671] LNet: Skipped 7 previous similar messages
      [ 2255.078304] LustreError: dumping log to /tmp/lustre-log.1436386923.4267
      [ 2264.610805] Lustre: lustre-OST0001: haven't heard from client 2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp) in 49 seconds. I think it's dead, and I am evicting it. exp ffff88004084b000, cur 1436386933 expire 1436386903 last 1436386884
      [ 2264.617306] Lustre: Skipped 6 previous similar messages
      [ 2266.890151] LustreError: 4269:0:(ldlm_lib.c:3077:target_bulk_io()) @@@ Eviction on bulk WRITE  req@ffff880051723000 x1506160196432764/t0(0) o4->2d4be017-9a1c-7408-76c8-bc0239710d98@10.2.4.133@tcp:199/0 lens 608/448 e 2 to 0 dl 1436386944 ref 1 fl Interpret:/0/0 rc 0/0
      [ 2266.905189] Lustre: lustre-OST0003: Bulk IO write error with 2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp), client will retry: rc -107
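
The eviction and bulk-write messages at the end of the log carry a few numbers worth cross-checking by hand. The sketch below is an illustration only, not part of the test suite or of Lustre itself. It parses those two lines (copied from the excerpt with the dmesg timestamp dropped) and checks that `cur - last` matches the reported 49 seconds of silence. The helper names `parse_eviction` and `parse_rc`, the regular expressions, and the small errno table are assumptions introduced here; the only errno mapping relied on is the Linux value 107 = ENOTCONN.

```python
import re

# Two lines copied from the OST dmesg excerpt, without the leading
# "[ seconds ]" dmesg timestamp.
EVICTION_LINE = (
    "Lustre: lustre-OST0001: haven't heard from client "
    "2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp) in 49 seconds. "
    "I think it's dead, and I am evicting it. exp ffff88004084b000, "
    "cur 1436386933 expire 1436386903 last 1436386884"
)
BULK_ERROR_LINE = (
    "Lustre: lustre-OST0003: Bulk IO write error with "
    "2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp), "
    "client will retry: rc -107"
)

# Hypothetical patterns written for these two messages only.
EVICTION_RE = re.compile(
    r"haven't heard from client (?P<uuid>[0-9a-f-]+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<silent>\d+) seconds\."
    r".*cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)
RC_RE = re.compile(r"rc (?P<rc>-?\d+)$")

# Linux errno value, hard-coded because Python's errno module reports the
# numbering of the host platform, which may differ from the server's.
LINUX_ERRNO_NAMES = {107: "ENOTCONN"}


def parse_eviction(line):
    """Return client UUID, NID and the integer time fields of an eviction message."""
    match = EVICTION_RE.search(line)
    if match is None:
        raise ValueError("not an eviction message")
    fields = match.groupdict()
    for key in ("silent", "cur", "expire", "last"):
        fields[key] = int(fields[key])
    return fields


def parse_rc(line):
    """Return (rc, errno name) for a message ending in 'rc <number>'."""
    match = RC_RE.search(line)
    if match is None:
        raise ValueError("no trailing rc field")
    rc = int(match.group("rc"))
    return rc, LINUX_ERRNO_NAMES.get(-rc, "unknown")


if __name__ == "__main__":
    ev = parse_eviction(EVICTION_LINE)
    # Seconds since the client was last heard from, computed two ways.
    print(ev["nid"], ev["cur"] - ev["last"], ev["silent"])
    print(parse_rc(BULK_ERROR_LINE))
```

Both messages name the same client UUID and NID (10.2.4.133@tcp), so the bulk write error with rc -107 concerns the client that the OST had just evicted.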
      

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: maloo Maloo