  1. Lustre
  2. LU-8611

OST mount crashed: lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 )

Details


    Description

      The error occurred during performance testing of lustre-reviews build #41416 on the Spirit cluster.
      Configuration:
      1 MDS with a single MDT formatted with ZFS
      2 OSSs with 2 and 3 OSTs respectively, formatted with ZFS
      16 Lustre clients
      Besides running the performance test, the purpose is to verify the patch for LU-8573.

      The issue might be related to LU-8508, which also reports an OST mount problem.

      • The MDT/MGT mount completes successfully
      • Mounting the first OST fails (reproducibly) with the following error messages:
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16660:0:(mgc_request.c:253:do_config_log_add()) MGC192.168.1.3@o2ib: failed processing log, type 4: rc = -22
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(nodemap_storage.c:1324:nodemap_fs_init()) zfstest-OST0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88080be2b540[0x0, 1, [0x1:0x0:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88080be2b590
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff880802e12378osd-zfs-object@ffff880802e12378
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88080be2b540
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88080be2ed80[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88080be2edd0
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff880035ce6128osd-zfs-object@ffff880035ce6128
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88080be2ed80
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88080be2b480[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88080be2b4d0
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff880802e124a0osd-zfs-object@ffff880802e124a0
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88080be2b480
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880822f44840[0x0, 1, [0x200000003:0x3:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880822f44890
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff88081fdc5720osd-zfs-object@ffff88081fdc5720
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880822f44840
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88080be2bd80[0x0, 1, [0xa:0x0:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88080be2bdd0
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff880802e12cb8osd-zfs-object@ffff880802e12cb8
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88080be2bd80
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8808378be3c0[0x0, 1, [0xa:0x8:0x0] hash exist]{
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8808378be410
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-zfs@ffff88080be70818osd-zfs-object@ffff88080be70818
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8808378be3c0
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(obd_config.c:578:class_setup()) setup zfstest-OST0000 failed (-22)
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16788:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.1.3@o2ib: cfg command failed: rc = -22
        Sep 14 08:07:17 spirit-aeon-1 kernel: Lustre:    cmd=cf003 0:zfstest-OST0000  1:dev  2:0  3:f
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 15b-f: MGC192.168.1.3@o2ib: The configuration from log 'zfstest-OST0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16660:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server zfstest-OST0000: -22
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16660:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
        Sep 14 08:07:17 spirit-aeon-1 kernel: LustreError: 16660:0:(lu_object.c:1243:lu_device_fini()) LBUG
        Sep 14 08:07:17 spirit-aeon-1 kernel: Pid: 16660, comm: mount.lustre
        Sep 14 08:07:17 spirit-aeon-1 kernel: #012Call Trace:
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0bcb7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0bcbd75>] lbug_with_loc+0x45/0xc0 [libcfs]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0d1ec78>] lu_device_fini+0xb8/0xc0 [obdclass]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0d03d72>] ls_device_put+0x82/0x2a0 [obdclass]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0d0406d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0fab281>] mgc_set_info_async+0x951/0x1630 [mgc]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0d181c9>] ? lustre_process_log+0x9e9/0xc00 [obdclass]
        Sep 14 08:07:17 spirit-aeon-1 kernel: [<ffffffffa0bd6957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d42bf4>] server_start_targets+0x794/0x2d20 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d1b32d>] ? lustre_start_mgc+0x20d/0x2490 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d14030>] ? class_config_llog_handler+0x0/0x1b60 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d4620d>] server_fill_super+0x108d/0x184c [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d1e058>] lustre_fill_super+0x328/0x950 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d1dd30>] ? lustre_fill_super+0x0/0x950 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff811e235d>] mount_nodev+0x4d/0xb0
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffffa0d15f88>] lustre_mount+0x38/0x60 [obdclass]
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff811e2d09>] mount_fs+0x39/0x1b0
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff811fe5df>] vfs_kern_mount+0x5f/0xf0
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff81200b2e>] do_mount+0x24e/0xa40
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff8116e30e>] ? __get_free_pages+0xe/0x50
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff812013b6>] SyS_mount+0x96/0xf0
        Sep 14 08:07:18 spirit-aeon-1 kernel: [<ffffffff81646d89>] system_call_fastpath+0x16/0x1b
        Sep 14 08:07:18 spirit-aeon-1 kernel:
        

        Correlated error messages on the MDS:

        Sep 14 08:07:17 spirit-3 kernel: LustreError: 140-5: Server zfstest-OST0000 requested index 0, but that index is already in use. Use --writeconf to force
        Sep 14 08:07:17 spirit-3 kernel: LustreError: 95443:0:(mgs_handler.c:531:mgs_target_reg()) Failed to write zfstest-OST0000 log (-98)
        
      • The file system was formatted with the help of the script framework. No errors occurred during the reformat, and formatting worked fine
        with the EE-3.1 version three days ago (framework configuration unchanged).
      • One error message on the OSS suggests that the Lustre versions might differ. I double-checked that all nodes are installed with the build specified above.
      • Incidents were recorded at 'Sep 13 12:53:17' and 'Sep 14 08:07:17'.
        No crash dump was written, although panic on LBUG was set.
        Attached files: console and messages logs of the affected node (spirit-aeon-1) and the MDS (spirit-3)
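
      The numeric return codes in the log excerpts above are negated Linux errno values: -22 is EINVAL and -98 is EADDRINUSE, the latter matching the MDS message that the requested index is already in use. As a reading aid, the following is a minimal sketch that pulls those codes out of a log line and names them. The names LINUX_ERRNO, RC_PATTERN and decode_rc are hypothetical helpers for illustration, not part of Lustre, and the pattern only covers the three spellings seen in this report ("rc = -N", "(-N)" and ": -N").

```python
import re

# Linux errno values that appear in the logs of this report.
# Hard-coded so the mapping does not depend on the platform running the script.
LINUX_ERRNO = {
    22: "EINVAL",       # Invalid argument
    98: "EADDRINUSE",   # Address already in use
}

# Hypothetical pattern: matches "rc = -22", "(-22)" and ": -22".
RC_PATTERN = re.compile(r"(?:rc = |\(|: )-(\d+)")


def decode_rc(line):
    """Return a list of (return_code, errno_name) pairs found in a log line."""
    results = []
    for digits in RC_PATTERN.findall(line):
        value = int(digits)
        results.append((-value, LINUX_ERRNO.get(value, "UNKNOWN")))
    return results


if __name__ == "__main__":
    samples = [
        "LustreError: 16788:0:(obd_config.c:578:class_setup()) setup zfstest-OST0000 failed (-22)",
        "LustreError: 95443:0:(mgs_handler.c:531:mgs_target_reg()) Failed to write zfstest-OST0000 log (-98)",
        "LustreError: 16660:0:(mgc_request.c:253:do_config_log_add()) MGC192.168.1.3@o2ib: failed processing log, type 4: rc = -22",
    ]
    for sample in samples:
        print(decode_rc(sample))
```

      Read this way, the OSS-side -22 failures appear to follow the MGS refusing the registration with -98, although the logs alone do not establish that ordering as the root cause of the LBUG.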

            People

              wc-triage WC Triage
              heckes Frank Heckes (Inactive)