Lustre / LU-10350

ost-pools test 1n fails with 'failed to write to /mnt/lustre/d1n.ost-pools/file: 1'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.7, Lustre 2.15.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.6, Lustre 2.12.1, Lustre 2.12.6

    Description

      ost-pools tests 1n, 11, 15, 16, 19 and 22 all fail when trying to create, open, or write files, with the following error message:

      File too large
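
'File too large' is the standard C library text for errno EFBIG, which is what dd reports when open() or write() fails with that code. A minimal Python sketch, assuming a Linux client, that confirms the mapping from the errno name to the message; the helper name describe_errno is hypothetical and not part of the Lustre test framework:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (number): message' for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # EFBIG is the errno behind the "File too large" text printed by dd.
    print(describe_errno(errno.EFBIG))
```

Searching client and server debug logs for EFBIG rather than for the message text may be more reliable, since the text comes from the C library and not from Lustre itself.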
      

      For example, from the test_log of test_1n:

      == ost-pools test 1n: Pool with a 15 char pool name works well ======================================= 10:03:28 (1512554608)
      CMD: trevis-8vm4 lctl pool_new lustre.testpool1234567
      trevis-8vm4: Pool lustre.testpool1234567 created
      CMD: trevis-8vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 				2>/dev/null || echo foo
      CMD: trevis-8vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 				2>/dev/null || echo foo
      CMD: trevis-8vm1.trevis.hpdd.intel.com lctl get_param -n lov.lustre-*.pools.testpool1234567 		2>/dev/null || echo foo
      CMD: trevis-8vm1.trevis.hpdd.intel.com lctl get_param -n lov.lustre-*.pools.testpool1234567 		2>/dev/null || echo foo
      CMD: trevis-8vm4 lctl pool_add lustre.testpool1234567 OST0000
      trevis-8vm4: OST lustre-OST0000_UUID added to pool lustre.testpool1234567
      CMD: trevis-8vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 |
      				sort -u | tr '\n' ' ' 
      CMD: trevis-8vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 |
      				sort -u | tr '\n' ' ' 
      CMD: trevis-8vm1.trevis.hpdd.intel.com lctl get_param -n lov.lustre-*.pools.testpool1234567 |
      		sort -u | tr '\n' ' ' 
      CMD: trevis-8vm1.trevis.hpdd.intel.com lctl get_param -n lov.lustre-*.pools.testpool1234567 |
      		sort -u | tr '\n' ' ' 
      dd: failed to open '/mnt/lustre/d1n.ost-pools/file': File too large
       ost-pools test_1n: @@@@@@ FAIL: failed to write to /mnt/lustre/d1n.ost-pools/file: 1 
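
When several tests in a session fail the same way, the '@@@@@@ FAIL:' markers can be collected from a test log in one pass. The following is a rough sketch that assumes every failure line follows the '<suite> test_<id>: @@@@@@ FAIL: <message>' shape seen above; FAIL_RE and find_failures are illustrative names, not part of the test framework:

```python
import re

# Failure marker shape assumed from the log excerpt:
#   "<suite> test_<id>: @@@@@@ FAIL: <message>"
FAIL_RE = re.compile(r"(?P<suite>\S+) test_(?P<test>\S+): @+ FAIL: (?P<msg>.*)")


def find_failures(log_text):
    """Return a (suite, test, message) tuple for every FAIL marker found."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            failures.append(
                (match["suite"], match["test"], match["msg"].strip())
            )
    return failures


if __name__ == "__main__":
    sample = (
        "dd: failed to open '/mnt/lustre/d1n.ost-pools/file': File too large\n"
        " ost-pools test_1n: @@@@@@ FAIL: failed to write to "
        "/mnt/lustre/d1n.ost-pools/file: 1 \n"
    )
    for suite, test, msg in find_failures(sample):
        print(suite, test, msg)
```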
      

      In the dmesg log for the MDS (vm4), lod_alloc_specific() reports the failure:

      [18753.542095] Lustre: DEBUG MARKER: == ost-pools test 1n: Pool with a 15 char pool name works well ======================================= 13:37:10 (1512567430)
      [18753.714379] Lustre: DEBUG MARKER: lctl pool_new lustre.testpool1234567
      [18758.015205] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 				2>/dev/null || echo foo
      [18758.331296] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 				2>/dev/null || echo foo
      [18760.686719] Lustre: DEBUG MARKER: lctl pool_add lustre.testpool1234567 OST0000
      [18766.993199] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 |
      				sort -u | tr '\n' ' ' 
      [18767.303867] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool1234567 |
      				sort -u | tr '\n' ' ' 
      [18768.515291] LustreError: 3750:0:(lod_qos.c:1350:lod_alloc_specific()) can't lstripe objid [0x200029443:0xdaad:0x0]: have 1 want 7
      [18768.704524] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  ost-pools test_1n: @@@@@@ FAIL: failed to write to \/mnt\/lustre\/d1n.ost-pools\/file: 1 
      [18768.896290] Lustre: DEBUG MARKER: ost-pools test_1n: @@@@@@ FAIL: failed to write to /mnt/lustre/d1n.ost-pools/file: 1
      [18769.103049] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /home/autotest/autotest/logs/test_logs/2017-12-05/lustre-master-el7-x86_64--full--1_1_1__3676___6c155f47-820d-447d-893f-15b24418827f/ost-pools.test_1n.debug_log.$(hostname -s).1512567446.log;
               dmesg > /home/autotest/autotest/lo
      

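      The other tests fail in a similar way. Note: the following test session has 7 OSTs and 1 MDS, so 'have 1 want 7' in the lod_alloc_specific() error appears to mean that the single-OST pool supplied one stripe where seven were requested: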
      https://testing.hpdd.intel.com/test_sets/fdd54642-dae4-11e7-8027-52540065bddc
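
To compare the 'have N want M' counts across the MDS logs of several sessions, the lod_alloc_specific() message can be parsed mechanically. This is a sketch only: it assumes the message keeps exactly the wording shown in the dmesg excerpt above, and ALLOC_RE and parse_alloc_error are hypothetical names:

```python
import re

# Matches the stripe allocation error text as it appears in the MDS dmesg
# excerpt; the wording is assumed to be stable across the affected builds.
ALLOC_RE = re.compile(
    r"lod_alloc_specific\(\)\) can't lstripe objid (?P<fid>\[[^\]]+\]): "
    r"have (?P<have>\d+) want (?P<want>\d+)"
)


def parse_alloc_error(line):
    """Return the FID and the have/want stripe counts, or None if no match."""
    match = ALLOC_RE.search(line)
    if match is None:
        return None
    return {
        "fid": match["fid"],
        "have": int(match["have"]),
        "want": int(match["want"]),
    }


if __name__ == "__main__":
    line = (
        "[18768.515291] LustreError: 3750:0:(lod_qos.c:1350:"
        "lod_alloc_specific()) can't lstripe objid "
        "[0x200029443:0xdaad:0x0]: have 1 want 7"
    )
    print(parse_alloc_error(line))
```

A 'want' value equal to the total OST count while 'have' equals the pool size would point at the requested stripe count not being limited to the pool, though that reading is an inference from this one log line.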

      These ost-pools tests started failing with the ‘File too large’ error on September 27, 2017 with 2.10.52.113.

      Note: So far we are only seeing these failures during 'full' test sessions and not in review-* test sessions.

      Logs for some of the other instances of this failure are at:
      https://testing.hpdd.intel.com/test_sets/da2df238-db44-11e7-9c63-52540065bddc
      https://testing.hpdd.intel.com/test_sets/4fc12420-daa0-11e7-9c63-52540065bddc
      https://testing.hpdd.intel.com/test_sets/307880b4-da7c-11e7-9c63-52540065bddc
      https://testing.hpdd.intel.com/test_sets/0e1cd21c-da73-11e7-8027-52540065bddc
      https://testing.hpdd.intel.com/test_sets/c1f5d0c8-dadb-11e7-9c63-52540065bddc

            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: James Nunez (jamesanunez, inactive)