Lustre / LU-4183

MDS crash when running fsx with NFS export

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Labels: None
    • Environment: Lustre 2.5.0-RC1, RHEL 6, OpenSFS cluster with one MDS/MGS, two OSSs with two OSTs each, one node that mounts Lustre and exports it as the NFS server, and three Lustre client nodes that NFS-mount the file system
    • Severity: 3
    • Rank (Obsolete): 11320
    • 11320

    Description

      When three instances of fsx write to an NFS-mounted Lustre file system exported by a single NFS server, the MDS crashes and reboots. From the MDS console:

      Message from syslogd@c03 at Oct 29 13:04:27 ...
       kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) ASSERTION( rb < OSD_OT_MAX ) failed: rb = 12
      
      Message from syslogd@c03 at Oct 29 13:04:27 ...
       kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) LBUG
      

      From the crash dump:

      crash> bt
      PID: 16448  TASK: ffff880738300040  CPU: 4   COMMAND: "mdt00_005"
       #0 [ffff8806565c1628] machine_kexec at ffffffff81035d6b
       #1 [ffff8806565c1688] crash_kexec at ffffffff810c0e22
       #2 [ffff8806565c1758] panic at ffffffff8150de5f
       #3 [ffff8806565c17d8] lbug_with_loc at ffffffffa04f0eeb [libcfs]
       #4 [ffff8806565c17f8] osd_punch at ffffffffa0d12243 [osd_ldiskfs]
       #5 [ffff8806565c1848] llog_osd_write_blob at ffffffffa0639050 [obdclass]
       #6 [ffff8806565c18b8] llog_osd_write_rec at ffffffffa063d04e [obdclass]
       #7 [ffff8806565c1968] llog_write_rec at ffffffffa0610428 [obdclass]
       #8 [ffff8806565c19c8] llog_cat_add_rec at ffffffffa0619299 [obdclass]
       #9 [ffff8806565c1a38] llog_add at ffffffffa0610221 [obdclass]
      #10 [ffff8806565c1a88] mdd_changelog_store at ffffffffa0f54463 [mdd]
      #11 [ffff8806565c1af8] mdd_changelog_data_store at ffffffffa0f465c6 [mdd]
      #12 [ffff8806565c1b48] mdd_attr_set at ffffffffa0f4c9b1 [mdd]
      #13 [ffff8806565c1bc8] mdt_attr_set at ffffffffa0e1cc28 [mdt]
      #14 [ffff8806565c1c18] mdt_reint_setattr at ffffffffa0e1d4dd [mdt]
      #15 [ffff8806565c1c88] mdt_reint_rec at ffffffffa0e16eb1 [mdt]
      #16 [ffff8806565c1ca8] mdt_reint_internal at ffffffffa0dfec93 [mdt]
      #17 [ffff8806565c1ce8] mdt_reint at ffffffffa0dfef94 [mdt]
      #18 [ffff8806565c1d08] mdt_handle_common at ffffffffa0e01a8a [mdt]
      #19 [ffff8806565c1d58] mds_regular_handle at ffffffffa0e3bc55 [mdt]
      #20 [ffff8806565c1d68] ptlrpc_server_handle_request at ffffffffa07f2e25 [ptlrpc]
      #21 [ffff8806565c1e48] ptlrpc_main at ffffffffa07f418d [ptlrpc]
      #22 [ffff8806565c1ee8] kthread at ffffffff81096a36
      #23 [ffff8806565c1f48] kernel_thread at ffffffff8100c0ca
      
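      The failed check compares an operation-type index (rb) against OSD_OT_MAX, so rb = 12 lies outside the range of operation types the OSD tracks for a transaction. Because frame #3 (lbug_with_loc) is called directly from osd_punch, the osd_trans_exec_op() helper from osd_internal.h appears to have been inlined into osd_punch, which was reached from a changelog llog write during a setattr. The C sketch below is a minimal, hypothetical model of declare-then-execute accounting with that kind of bounds check; the demo_* names, the enum members, and the error-return convention are assumptions for illustration, not Lustre's definitions (the real check LBUGs instead of returning).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical operation types used only for this sketch; the real
 * enum in osd_internal.h has different members and values. */
enum demo_op_type {
	DEMO_OT_ATTR_SET,
	DEMO_OT_PUNCH,
	DEMO_OT_WRITE,
	DEMO_OT_MAX		/* one past the last valid index */
};

/* Per-transaction count of declared-but-not-yet-executed operations,
 * indexed by operation type (hypothetical structure). */
struct demo_trans {
	unsigned int declared[DEMO_OT_MAX];
};

/* Declare one operation of type op before the transaction runs.
 * Returns 0 on success, -1 for an out-of-range index. */
static int demo_trans_declare_op(struct demo_trans *th, unsigned int op)
{
	assert(th != NULL);
	if (op >= DEMO_OT_MAX)
		return -1;
	th->declared[op]++;
	return 0;
}

/* Consume one declaration for an executed operation.  Returns 0 on
 * success, -1 for an out-of-range index (the analogous check in the
 * real code fails with an LBUG, as in the console log above), and -2
 * for an operation that was never declared. */
static int demo_trans_exec_op(struct demo_trans *th, unsigned int op)
{
	assert(th != NULL);
	if (op >= DEMO_OT_MAX) {
		fprintf(stderr, "ASSERTION( op < DEMO_OT_MAX ) failed: op = %u\n", op);
		return -1;
	}
	if (th->declared[op] == 0)
		return -2;
	th->declared[op]--;
	return 0;
}
```

      In the OSD, declarations are made before the transaction starts so that the journal credits it needs can be reserved; the sketch models only the per-type counting, not the credits.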

      On the OSTs, osd-ldiskfs.track_declares_assert was set to 1.
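      The report notes this tunable on the OSTs, while the assertion above fired on the MDS. As a hedged illustration only, assuming (as the name suggests) that track_declares_assert turns an undeclared-operation check from a warning into an assertion, the gating might look like the sketch below. demo_track_declares_assert and demo_undeclared_op are hypothetical names; the real tunable is set with lctl set_param rather than through a C global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for osd-ldiskfs.track_declares_assert; its
 * semantics here are an assumption, not taken from the Lustre source. */
static bool demo_track_declares_assert;

/* Outcome when an executed operation was not declared first. */
enum demo_action { DEMO_WARN, DEMO_ABORT };

/* Decide how to react to an undeclared operation named opname:
 * abort when the tunable is set, otherwise warn and continue. */
static enum demo_action demo_undeclared_op(const char *opname)
{
	assert(opname != NULL);
	if (demo_track_declares_assert)
		return DEMO_ABORT;	/* presumably an LBUG in the real OSD */
	fprintf(stderr, "warning: undeclared %s operation\n", opname);
	return DEMO_WARN;
}
```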

      From the crash dump, here is the system information:

      crash> sys
            KERNEL: /home/build/kernel/rpmbuild/BUILD/kernel-2.6.32.358.18.1.el6_lustre/vmlinux
          DUMPFILE: /var/crash/127.0.0.1-2013-10-29-13:05:16/vmcore  [PARTIAL DUMP]
              CPUS: 24
              DATE: Tue Oct 29 13:04:30 2013
            UPTIME: 3 days, 22:31:58
      LOAD AVERAGE: 0.17, 0.06, 0.02
             TASKS: 362
          NODENAME: c03
           RELEASE: 2.6.32-358.18.1.el6_lustre.x86_64
           VERSION: #1 SMP Fri Oct 11 16:41:53 PDT 2013
           MACHINE: x86_64  (2399 Mhz)
            MEMORY: 32 GB
             PANIC: "Kernel panic - not syncing: LBUG"
      

      Two of the three fsx commands were:
      ./fsx -c 5 /misc/export/nfs_x_1
      ./fsx /misc/export/nfs_x_2
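      fsx drives a single file through a random sequence of operations and checks the result against an in-memory copy; with -c P it also closes and reopens the file with a 1-in-P chance per operation, so -c 5 reopens on average once every five operations. On the MDS side, the backtrace shows client activity arriving as a setattr (mdt_reint_setattr) that is then recorded in the changelog. The sketch below is a much-simplified, hypothetical fsx-style loop in C (demo_fsx and DEMO_MAXLEN are invented names, and it only performs writes and truncates) meant to show the model-checking idea, not to reproduce fsx.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical upper bound on the file size for this sketch. */
#define DEMO_MAXLEN 4096

/*
 * Hypothetical fsx-style checker: apply nops random writes and truncates
 * to path, mirror each in an in-memory model, and close/reopen the file
 * with a 1-in-p chance per operation (p == 0 disables reopening, like
 * running fsx without -c). Returns 0 if the file matches the model at
 * the end, -1 on any I/O error or mismatch.
 */
static int demo_fsx(const char *path, int nops, int p, unsigned int seed)
{
	static char model[DEMO_MAXLEN], buf[DEMO_MAXLEN];
	size_t size = 0;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	srand(seed);
	memset(model, 0, sizeof(model));
	for (int i = 0; i < nops; i++) {
		size_t off = (size_t)rand() % DEMO_MAXLEN;

		if (rand() % 2) {
			/* Write a run of one byte value at a random offset. */
			size_t len = (size_t)rand() % (DEMO_MAXLEN - off) + 1;

			assert(off + len <= DEMO_MAXLEN);
			memset(buf, 'a' + i % 26, len);
			if (pwrite(fd, buf, len, (off_t)off) != (ssize_t)len)
				goto fail;
			memcpy(model + off, buf, len);
			if (off + len > size)
				size = off + len;
		} else {
			/* Truncate (shrink or extend) to a random size. */
			if (ftruncate(fd, (off_t)off) != 0)
				goto fail;
			if (off < size)
				memset(model + off, 0, size - off);
			size = off;
		}
		if (p > 0 && rand() % p == 0) {
			/* Close and reopen, as fsx -c does. */
			if (close(fd) != 0)
				return -1;
			fd = open(path, O_RDWR);
			if (fd < 0)
				return -1;
		}
	}
	/* Verify the final size and contents against the model. */
	if (lseek(fd, 0, SEEK_END) != (off_t)size)
		goto fail;
	if (pread(fd, buf, size, 0) != (ssize_t)size ||
	    memcmp(buf, model, size) != 0)
		goto fail;
	close(fd);
	return 0;
fail:
	close(fd);
	return -1;
}
```

      Zeroing the model's tail on a shrinking truncate keeps it consistent with POSIX semantics, where bytes exposed by a later extension or a write past end-of-file read back as zeros.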

            People

              Assignee: WC Triage
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 8