[LU-3316] ASSERTION(list_empty(&ls->ls_los_list)) failure on test suite sanity-quota / test_7c

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.5.0, Lustre 2.4.2
    • Affects Version/s: Lustre 2.4.1
    • None
    • 3
    • 8206

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/570b8f14-b713-11e2-bd0f-52540035b04c.

      The sub-test test_7c failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-quota 7c

      Console log from mds:

      02:42:32:Lustre: DEBUG MARKER: == sanity-quota test 7c: Quota reintegration (restart mds during reintegration) == 02:41:58 (1367919718)
      02:42:32:Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
      02:42:32:Lustre: DEBUG MARKER: lctl set_param fail_val=0
      02:42:32:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
      02:42:32:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
      02:42:32:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
      02:42:32:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      02:42:32:Lustre: DEBUG MARKER: umount -d /mnt/mds1
      02:42:32:Lustre: Failing over lustre-MDT0000
      02:42:32:Lustre: Skipped 1 previous similar message
      02:42:32:LustreError: 21170:0:(local_storage.c:184:ls_device_put()) ASSERTION( list_empty(&ls->ls_los_list) ) failed: 
      02:42:32:LustreError: 21170:0:(local_storage.c:184:ls_device_put()) LBUG
      02:42:32:Pid: 21170, comm: umount
      02:42:32:
      02:42:32:Call Trace:
      02:42:32: [<ffffffffa0590895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      02:42:32: [<ffffffffa0590e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      02:42:32: [<ffffffffa06e4859>] ls_device_put+0x1a9/0x1e0 [obdclass]
      02:42:32: [<ffffffffa06dc6a5>] llog_osd_cleanup+0xc5/0x140 [obdclass]
      02:42:32: [<ffffffffa06b772a>] __llog_ctxt_put+0xca/0x140 [obdclass]
      02:42:32: [<ffffffffa06b7854>] llog_cleanup+0xb4/0x440 [obdclass]
      02:42:32: [<ffffffffa06d0f31>] ? lprocfs_remove+0x31/0x40 [obdclass]
      02:42:32: [<ffffffffa06d13ed>] ? lprocfs_obd_cleanup+0x5d/0xb0 [obdclass]
      02:42:32: [<ffffffffa0cd7ad5>] mgs_device_fini+0x1c5/0x5a0 [mgs]
      02:42:32: [<ffffffffa06f1907>] class_cleanup+0x577/0xda0 [obdclass]
      02:42:32: [<ffffffffa06c6ac6>] ? class_name2dev+0x56/0xe0 [obdclass]
      02:42:32: [<ffffffffa06f31ec>] class_process_config+0x10bc/0x1c80 [obdclass]
      02:42:32: [<ffffffffa06eca13>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
      02:42:32: [<ffffffffa06f3f29>] class_manual_cleanup+0x179/0x6f0 [obdclass]
      02:42:32: [<ffffffffa06c6ac6>] ? class_name2dev+0x56/0xe0 [obdclass]
      02:42:32: [<ffffffffa072961d>] server_put_super+0x46d/0xf00 [obdclass]
      02:42:32: [<ffffffff8118334b>] generic_shutdown_super+0x5b/0xe0
      02:42:32: [<ffffffff81183436>] kill_anon_super+0x16/0x60
      02:42:32: [<ffffffffa06f5d86>] lustre_kill_super+0x36/0x60 [obdclass]
      02:42:32: [<ffffffff81183bd7>] deactivate_super+0x57/0x80
      02:42:32: [<ffffffff811a1c4f>] mntput_no_expire+0xbf/0x110
      02:42:32: [<ffffffff811a26bb>] sys_umount+0x7b/0x3a0
      02:42:32: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

          Activity

            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/59/
            Distro/Arch: RHEL6.4/x86_64

            sanity-scrub test 0 also hit this failure:
            https://maloo.whamcloud.com/test_sets/28af10f4-5aed-11e3-85e2-52540035b04c

            Just back-ported the patch to Lustre b2_4 branch: http://review.whamcloud.com/8461


            tappro Mikhail Pershin added a comment -

            The patch has landed.

            bogl Bob Glossman (Inactive) added a comment -

            Another: https://maloo.whamcloud.com/test_sets/c52a6b34-04e6-11e3-b035-52540035b04c

            This test set was on o2ib, not tcp. I wonder if that is significant.
            bfaccini Bruno Faccini (Inactive) added a comment - +1 at https://maloo.whamcloud.com/test_sets/715bb308-fc05-11e2-9222-52540035b04c
            adilger Andreas Dilger added a comment - Recent failure: https://maloo.whamcloud.com/test_sets/e2388b22-e6d0-11e2-8d9a-52540035b04c

            tappro Mikhail Pershin added a comment -

            http://review.whamcloud.com/#change,6334

            ls_device_put() might be called wrongly if the local_oid_storage struct is not removed due to a race.

            As for the second call trace in comment #1, it doesn't look related.
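To make the failure mode described in the comment above easier to picture, here is a minimal user-space sketch in C. It is not the Lustre code and not the patch under review: every name in it (struct device, struct storage, storage_put_buggy, storage_put_careful) is hypothetical, and it replays the end state of the race in a single thread rather than reproducing the race itself. It only assumes what the comment states, namely that the device reference is dropped while a storage entry is still on the device's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a device keeps a list of attached storages,
 * and each attached storage holds one reference on the device. */
struct storage;

struct device {
	int refs;              /* references held on the device */
	struct storage *head;  /* storages still attached */
};

struct storage {
	int refs;              /* references held on this storage */
	struct device *dev;
	struct storage *next;
};

static void device_init(struct device *d)
{
	d->refs = 0;
	d->head = NULL;
}

/* Attach a storage to the device; the storage pins the device. */
static void storage_attach(struct device *d, struct storage *s)
{
	s->refs = 1;
	s->dev = d;
	s->next = d->head;
	d->head = s;
	d->refs++;
}

/* A second user (the "racing" holder) takes a storage reference. */
static void storage_get(struct storage *s)
{
	s->refs++;
}

/* Remove a storage from its device's list. */
static void storage_unlink(struct storage *s)
{
	struct storage **pp = &s->dev->head;

	while (*pp != NULL && *pp != s)
		pp = &(*pp)->next;
	if (*pp == s)
		*pp = s->next;
	s->next = NULL;
}

/* Drop a device reference. Returns false in the situation where the
 * real code would hit the assertion: last reference gone, list not empty. */
static bool device_put(struct device *d)
{
	assert(d->refs > 0);
	if (--d->refs > 0)
		return true;
	return d->head == NULL;
}

/* Buggy teardown: the device reference is dropped even when the
 * storage is still referenced and therefore still on the list. */
static bool storage_put_buggy(struct storage *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		storage_unlink(s);
	return device_put(s->dev);
}

/* Careful teardown: only the final storage reference unlinks the
 * storage and releases the device. */
static bool storage_put_careful(struct storage *s)
{
	assert(s->refs > 0);
	if (--s->refs > 0)
		return true;
	storage_unlink(s);
	return device_put(s->dev);
}
```

With one extra holder on the storage, storage_put_buggy() drops the device's last reference while the list is still populated, which is the condition the list_empty() assertion guards against. storage_put_careful() defers the device put until the storage has actually been unlinked.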

            jlevi Jodi Levi (Inactive) added a comment -

            Mike,
            Could you please comment on this one?
            Thank you!

            utopiabound Nathaniel Clark added a comment -

            I can't find another failure quite like this one, but there are others on the same test with a different ASSERTION crash:

            16:28:34:Lustre: DEBUG MARKER: == sanity-quota test 7c: Quota reintegration (restart mds during reintegration) == 16:27:54 (1364858874)
            16:28:34:Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
            16:28:34:Lustre: DEBUG MARKER: lctl set_param fail_val=0
            16:28:34:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
            16:28:34:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
            16:28:34:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
            16:28:34:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
            16:28:34:Lustre: DEBUG MARKER: umount -d /mnt/mds1
            16:28:34:Lustre: Failing over lustre-MDT0000
            16:28:34:LustreError: 3036:0:(lod_dev.c:813:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: 
            16:28:34:LustreError: 3036:0:(lod_dev.c:813:lod_device_free()) LBUG
            16:28:34:Pid: 3036, comm: obd_zombid
            16:28:34:
            16:28:34:Call Trace:
            16:28:34: [<ffffffffa05bd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            16:28:34: [<ffffffffa05bde97>] lbug_with_loc+0x47/0xb0 [libcfs]
            16:28:34: [<ffffffffa0e434bb>] lod_device_free+0x1eb/0x220 [lod]
            16:28:34: [<ffffffffa0725e4d>] class_decref+0x46d/0x580 [obdclass]
            16:28:34: [<ffffffffa0703399>] obd_zombie_impexp_cull+0x309/0x5d0 [obdclass]
            16:28:34: [<ffffffffa0703725>] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
            16:28:34: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
            16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
            16:28:34: [<ffffffff8100c0ca>] child_rip+0xa/0x20
            16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
            16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
            16:28:34: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

            https://maloo.whamcloud.com/test_sets/4518d960-9d2d-11e2-a280-52540035b04c
            https://maloo.whamcloud.com/test_sets/acbc4838-841c-11e2-b461-52540035b04c
            https://maloo.whamcloud.com/test_sets/f8326724-8189-11e2-9f6b-52540035b04c
            https://maloo.whamcloud.com/test_sets/327878b4-5eb3-11e2-ba27-52540035b04c

            People

              Assignee: tappro Mikhail Pershin
              Reporter: maloo Maloo