  1. Lustre
  2. LU-7490

out_tx_write_exec() LBUG


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Environment: lola
      build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      The error occurred during soak testing of master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled. The MDSes are configured in an active-active failover configuration.

      Sequence of events:

      • 2015-11-26 10:32 Failover of resources (mdt-0,1) lola-8 --> lola-9 started
      • 2015-11-26 11:40 Failback of resources (mdt-0,1) lola-9 --> lola-8 completed successfully
      • 2015-11-26 11:44 LBUG on lola-8. See the following messages:
      Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:692:out_tx_write_exec()) read record [0x240089779:0x1:0x0] tail_pos 173122472 rc -53 index 50635 size 172659608
      Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:693:out_tx_write_exec()) LBUG
      Nov 26 11:44:54 lola-8 kernel: Pid: 8491, comm: mdt_out03_004
      Nov 26 11:44:54 lola-8 kernel: 
      Nov 26 11:44:54 lola-8 kernel: Call Trace:
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fb875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fbe77>] lbug_with_loc+0x47/0xb0 [libcfs]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb60a0>] out_tx_write_exec+0x500/0x7a0 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb934b>] ? out_tx_xattr_set_exec+0xeb/0x680 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bae13a>] out_tx_end+0xda/0x5d0 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb3726>] out_handle+0xbd6/0x1890 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0afa4e0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0baae1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b52711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
      Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b518d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
      Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
      Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
      Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Nov 26 11:44:54 lola-8 kernel: 
      Nov 26 11:44:54 lola-8 kernel: LustreError: dumping log to /tmp/lustre-log.1448567093.8491
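
      For triage, the fields of the first LustreError line above can be pulled apart mechanically. The following is a minimal sketch, not part of the original report: the regular expression is written only against the one line quoted in this ticket, the helper name parse_lbug_line is hypothetical, and the errno name assumes Linux numbering (where 53 is EBADR).

```python
import re

# Assumption: the MDS runs Linux, where errno 53 is EBADR.
# Other platforms number errno values differently.
LINUX_ERRNO = {53: "EBADR"}

# The message exactly as quoted in this ticket.
LINE = ("Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:"
        "(out_lib.c:692:out_tx_write_exec()) read record "
        "[0x240089779:0x1:0x0] tail_pos 173122472 rc -53 "
        "index 50635 size 172659608")

# Pattern tailored to this one message format (hypothetical helper, not a Lustre tool).
PATTERN = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"read record \[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+):(?P<ver>0x[0-9a-f]+)\] "
    r"tail_pos (?P<tail_pos>\d+) rc (?P<rc>-?\d+) "
    r"index (?P<index>\d+) size (?P<size>\d+)"
)


def parse_lbug_line(line):
    """Return the fields of the 'read record' message, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    d = m.groupdict()
    rec = {
        "pid": int(d["pid"]),
        "source": f'{d["file"]}:{d["line"]}',
        "function": d["func"],
        # FID as (sequence, object id, version), parsed from hex.
        "fid": tuple(int(d[k], 16) for k in ("seq", "oid", "ver")),
        "tail_pos": int(d["tail_pos"]),
        "rc": int(d["rc"]),
        "index": int(d["index"]),
        "size": int(d["size"]),
    }
    rec["errno_name"] = LINUX_ERRNO.get(-rec["rc"], "unknown")
    # Plain difference between the two reported numbers; no interpretation implied.
    rec["tail_minus_size"] = rec["tail_pos"] - rec["size"]
    return rec


record = parse_lbug_line(LINE)
print(record)
```

      On the quoted line this gives pid 8491, rc -53 (EBADR under the Linux assumption), and a tail_pos that exceeds size by 462864.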
      

      Most likely this event is related to LU-7488, which happened at almost the same time on the HA failover partner (lola-9).

      Attached are the console and messages log files of the MDS (lola-8), the kernel debug log file mentioned in the LBUG error message, and the error messages extracted from the Lustre client nodes' messages files that appeared at the same time.
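
      To match the attached debug log to the LBUG, the dump file name can be decoded. This is a small sketch that assumes the name has the form lustre-log.<epoch seconds>.<pid>, which fits the path /tmp/lustre-log.1448567093.8491 and the PID 8491 in the messages above; the helper name parse_dump_name is hypothetical.

```python
import re
from datetime import datetime, timezone


def parse_dump_name(path):
    """Split a dump name of the assumed form lustre-log.<epoch>.<pid>."""
    m = re.search(r"lustre-log\.(\d+)\.(\d+)", path)
    if m is None:
        raise ValueError(f"not a lustre-log dump name: {path!r}")
    epoch, pid = int(m.group(1)), int(m.group(2))
    # Interpret the first number as seconds since the Unix epoch, in UTC.
    return datetime.fromtimestamp(epoch, tz=timezone.utc), pid


when, pid = parse_dump_name("/tmp/lustre-log.1448567093.8491")
print(when.isoformat(), pid)  # 2015-11-26T19:44:53+00:00 8491
```

      The decoded time is 19:44:53 UTC, which would line up with the 11:44:54 syslog timestamps if lola-8 logs in a UTC-8 local time zone (an assumption; the ticket does not state the time zone). The same function also accepts the attachment name lustre-log.1448567093.8491.bz2.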

      Attachments

        1. messages-lola-8.log.bz2
          503 kB
          Frank Heckes
        2. lustre-log.1448567093.8491.bz2
          53 kB
          Frank Heckes
        3. lola-8-lbug-client-messages.txt.bz2
          3 kB
          Frank Heckes
        4. console-lola-8.log.bz2
          290 kB
          Frank Heckes


            People

              di.wang Di Wang
              heckes Frank Heckes (Inactive)