[LU-10818] mds-survey test 2 hangs with “ASSERTION( ma->ma_need & (MA_LOV | MA_LMV) ) failed”

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      mds-survey test_2 hangs. The last lines of the test output are:

      == mds-survey test 2: Metadata survey with stripe_count = 1 ========================================== 14:28:15 (1517437695)
      CMD: trevis-12vm12 /usr/sbin/lctl dl
      + file_count=94322 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=1 rslt_loc=/tmp targets="trevis-12vm12:lustre-MDT0000" /usr/bin/mds-survey
      Wed Jan 31 14:28:17 PST 2018 /usr/bin/mds-survey from trevis-12vm9
      mdt 1 file   94322 dir    4 thr    4 create 11863.49 [    0.00, 23978.87] lookup 372653.14 [ 372653.14, 372653.14] md_getattr 
      

       

      There are several examples of this hang in Maloo, but many of the call traces appear incomplete. The MDS console log for the test session at https://testing.hpdd.intel.com/test_sets/00a4d694-2252-11e8-a4b1-52540065bddc shows:

      [55686.709160] Lustre: DEBUG MARKER: == mds-survey test 2: Metadata survey with stripe_count = 1 ========================================== 14:28:15 (1517437695)
      [55686.770937] Lustre: DEBUG MARKER: /usr/sbin/lctl dl
      [55687.328593] Lustre: Echo OBD driver; http://www.lustre.org/
      [55688.487749] LustreError: 23896:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0000-tests: rc = -2
      [55688.487759] LustreError: 23896:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
      [55689.008253] LustreError: 24007:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0000-tests3: rc = -2
      [55689.008264] LustreError: 24007:0:(echo_client.c:1795:echo_md_lookup()) Skipped 2 previous similar messages
      [55689.008267] LustreError: 24007:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0000-tests3: rc = -2
      [55689.008268] LustreError: 24007:0:(echo_client.c:2027:echo_md_destroy_internal()) Skipped 2 previous similar messages
      [55699.028489] LustreError: 24353:0:(echo_client.c:1397:echo_big_lmm_get()) ASSERTION( ma->ma_need & (MA_LOV | MA_LMV) ) failed: 
      [55699.032133] LustreError: 24355:0:(echo_client.c:1397:echo_big_lmm_get()) ASSERTION( ma->ma_need & (MA_LOV | MA_LMV) ) failed: 
      [55699.032134] LustreError: 24355:0:(echo_client.c:1397:echo_big_lmm_get()) LBUG
      [55699.032135] Pid: 24355, comm: lctl
      [55699.032135] 
      [55699.032135] Call Trace:
      [55699.043010] LustreError: 24353:0:(echo_client.c:1397:echo_big_lmm_get()) LBUG
      [55699.044901]  [<ffffffff81019b19>] dump_trace+0x59/0x310
      [55699.046838] Pid: 24353, comm: lctl
      [55699.048565] 
      [55699.048565] Call Trace:
      [55699.051619]  [<ffffffff81019b19>] dump_trace+0x59/0x310
      [55699.082944]  [<ffffffffa08616ca>] libcfs_call_trace+0x4a/0x60 [libcfs]
      [55699.082947]  [<ffffffffa08616ca>] libcfs_call_trace+0x4a/0x60 [libcfs]
      [55699.091042]  [<ffffffffa0861741>] lbug_with_loc+0x41/0xa0 [libcfs]
      [55699.091043]  [<ffffffffa0861741>] lbug_with_loc+0x41/0xa0 [libcfs]
      [55699.091056]  [<ffffffffa0833397>] echo_big_lmm_get+0x637/0x7a0 [obdecho]
      [55699.091076]  [<ffffffffa0834a68>] echo_attr_get_complex+0x518/0x6b0 [obdecho]
      [55699.091085]  [<ffffffffa0838dee>] echo_md_handler.isra.45+0x1a3e/0x2b30 [obdecho]
      [55699.091092]  [<ffffffffa083aec9>] echo_client_iocontrol+0xfe9/0x1ab0 [obdecho]
      [55699.101840]  [<ffffffffa0833397>] echo_big_lmm_get+0x637/0x7a0 [obdecho]
      [55699.103625]  [<ffffffffa0834a68>] echo_attr_get_complex+0x518/0x6b0 [obdecho]
      [55699.105445]  [<ffffffffa0838dee>] echo_md_handler.isra.45+0x1a3e/0x2b30 [obdecho]
      [55699.107243]  [<ffffffffa083aec9>] echo_client_iocontrol+0xfe9/0x1ab0 [obdecho]
      [55699.135047]  [<ffffffffa0bc9ed2>] class_handle_ioctl+0x1822/0x1d20 [obdclass]
      [55699.135049]  [<ffffffffa0bc9ed2>] class_handle_ioctl+0x1822/0x1d20 [obdclass]
      [55699.135114]  [<ffffffffa0bb054a>] obd_class_ioctl+0xba/0x150 [obdclass]
      [55699.140337]  [<ffffffffa0bb054a>] obd_class_ioctl+0xba/0x150 [obdclass]
      [55699.146489]  [<ffffffff81219f7d>] do_vfs_ioctl+0x2cd/0x4a0
      [55699.146489]  [<ffffffff81219f7d>] do_vfs_ioctl+0x2cd/0x4a0
      [55699.169646]  [<ffffffff8121a1c4>] SyS_ioctl+0x74/0x80
      [55699.169647]  [<ffffffff8121a1c4>] SyS_ioctl+0x74/0x80
      [55699.169684]  [<ffffffff8160d1be>] entry_SYSCALL_64_fastpath+0x12/0xaa
      [55699.174486]  [<ffffffff8160d1be>] entry_SYSCALL_64_fastpath+0x12/0xaa
      [55699.196143] (null)
      [55699.196146] (null)DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0xaa
      [55699.196147] 
      [55699.196147] (null)Leftover inexact backtrace:
      [55699.196147] 
      [55699.196156] 
      [55699.196164] Kernel panic - not syncing: LBUG
      [55699.196172] CPU: 1 PID: 24355 Comm: lctl Tainted: G           OE   N  4.4.103-6.38_lustre-default #1
      [55699.196173] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
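
The failed assertion in the trace above is a bitmask test on ma->ma_need: it holds only when at least one of the MA_LOV or MA_LMV bits is set. The following is a rough illustration of that check, not Lustre code. The names MA_LOV and MA_LMV are taken from the assertion text; MA_OTHER and all numeric values are placeholders chosen for this sketch and should not be read as Lustre's definitions.

```python
from enum import IntFlag


class MaNeed(IntFlag):
    # MA_LOV and MA_LMV are named in the assertion message.
    # MA_OTHER and every numeric value here are hypothetical placeholders.
    MA_OTHER = 1 << 0
    MA_LOV = 1 << 1
    MA_LMV = 1 << 2


def passes_assertion(ma_need):
    """Mirror the C expression: ma->ma_need & (MA_LOV | MA_LMV)."""
    return bool(ma_need & (MaNeed.MA_LOV | MaNeed.MA_LMV))


if __name__ == "__main__":
    for value in (MaNeed.MA_LOV, MaNeed.MA_LMV | MaNeed.MA_OTHER, MaNeed.MA_OTHER):
        print(repr(value), passes_assertion(value))
```

In this sketch, a value with only MA_OTHER set fails the check, which corresponds to the condition under which echo_big_lmm_get() would hit the LBUG.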
      

       

      This test started failing on 2018-01-31. 

      Here are a few of the logs for failures:

      https://testing.hpdd.intel.com/test_sets/42ff8372-071d-11e8-a6ad-52540065bddc

      https://testing.hpdd.intel.com/test_sets/110bab20-18a2-11e8-bd00-52540065bddc

      https://testing.hpdd.intel.com/test_sets/ff52db72-2762-11e8-b74b-52540065bddc 
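
When comparing console logs such as these, note that the traces of two lctl threads (PIDs 24353 and 24355 in the log quoted earlier) are interleaved, which makes them hard to read. A small helper along the following lines could group the ASSERTION and LBUG entries by PID. This is only a sketch: the regular expression is inferred from the LustreError lines shown in this ticket, and the function name fatal_events_by_pid is made up for illustration.

```python
import re
from collections import defaultdict

# Inferred from the LustreError lines quoted in this ticket:
# [timestamp] LustreError: PID:0:(file:line:func()) message
LUSTRE_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)


def fatal_events_by_pid(log_text):
    """Return {pid: [(function, message), ...]} for ASSERTION and LBUG lines."""
    events = defaultdict(list)
    for raw in log_text.splitlines():
        match = LUSTRE_ERROR.search(raw)
        if not match:
            continue
        msg = match.group("msg").strip()
        if msg.startswith("ASSERTION") or msg == "LBUG":
            events[int(match.group("pid"))].append((match.group("func"), msg))
    return dict(events)


# A few lines copied from the console log in the description.
SAMPLE = """\
[55688.487749] LustreError: 23896:0:(echo_client.c:1795:echo_md_lookup()) lookup MDT0000-tests: rc = -2
[55699.028489] LustreError: 24353:0:(echo_client.c:1397:echo_big_lmm_get()) ASSERTION( ma->ma_need & (MA_LOV | MA_LMV) ) failed:
[55699.032133] LustreError: 24355:0:(echo_client.c:1397:echo_big_lmm_get()) ASSERTION( ma->ma_need & (MA_LOV | MA_LMV) ) failed:
[55699.032134] LustreError: 24355:0:(echo_client.c:1397:echo_big_lmm_get()) LBUG
[55699.043010] LustreError: 24353:0:(echo_client.c:1397:echo_big_lmm_get()) LBUG
"""

if __name__ == "__main__":
    for pid, entries in sorted(fatal_events_by_pid(SAMPLE).items()):
        for func, msg in entries:
            print(pid, func, msg)
```

Lines that do not match the pattern, including the stack frames themselves, are skipped, so the helper only answers which threads asserted and where.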

          Activity


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33976/
            Subject: LU-10818 obdecho: don't set ma_need in echo_attr_get_complex()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 0920f54a0866f77b49afd2308b798d4db3b69802


            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33976
            Subject: LU-10818 obdecho: don't set ma_need in echo_attr_get_complex()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 4d2a41a71ceb306b09609be41ac4d96e39f07b9d

            pjones Peter Jones added a comment -

            Landed for 2.12


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33097/
            Subject: LU-10818 obdecho: don't set ma_need in echo_attr_get_complex()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 40f70cd4cb1bb33c754607862dece7c6c1c30d38


            nangelinas Nikitas Angelinas added a comment -

            I had been meaning to upload the patch we had for this issue, but I have been away from work lately. We may not want to land it, since Lai has already posted a patch, but I am uploading it because it contains some additional code rework that we might want to merge into Lai's patch.

            Nikitas Angelinas (nangelinas@cray.com) uploaded a new patch: https://review.whamcloud.com/33097
            Subject: LU-10818 obdecho: don't set ma_need in echo_attr_get_complex()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2b29f356a3b2696874eeb83a886fc693166cb7a4


            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33092
            Subject: LU-10818 obdecho: ma_need is mistakenly set
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: db71e8411535635cadd9818ccbc87f8794b22835

            pjones Peter Jones added a comment -

            Lai

            Could you please investigate?

            Thanks

            Peter

            sarah Sarah Liu added a comment -

            This failure blocks master testing of tag-2.11.51:

            on EL7 server/client: https://testing.hpdd.intel.com/test_sets/d79b14a4-471b-11e8-95c0-52540065bddc

            on EL7 server/SLES12sp3 client: https://testing.hpdd.intel.com/test_sets/7be267e6-4718-11e8-960d-52540065bddc

            People

              laisiyao Lai Siyao
              jamesanunez James Nunez (Inactive)