[LU-2245] failure on sanity.sh test_101c: Small 4k read IO 1!

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.4.0
    • Environment: server/client are both RHEL6
    • Severity: 3

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/34cdc80e-21c2-11e2-b552-52540035b04c.

      The sub-test test_101c failed with the following error:

      Small 4k read IO 1!

      == sanity test 101c: check stripe_size aligned read-ahead =================== 00:07:32 (1351494452)
      CMD: client-21-ib /usr/sbin/lctl set_param -n obdfilter.*.read_cache_enable=0
      client-21-ib: error: set_param: /proc/{fs,sys}/{lnet,lustre}/obdfilter/*/read_cache_enable: Found no match
      CMD: client-21-ib /usr/sbin/lctl set_param -n obdfilter.*.writethrough_cache_enable=0
      client-21-ib: error: set_param: /proc/{fs,sys}/{lnet,lustre}/obdfilter/*/writethrough_cache_enable: Found no match
      osc.lustre-OST0000-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0001-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0002-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0003-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0004-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0005-osc-ffff8803348f2000.rpc_stats=0
      osc.lustre-OST0006-osc-ffff8803348f2000.rpc_stats=0
      
      6.364788s, 102.967MB/s
      osc.lustre-OST0000-osc-ffff8803348f2000 rpc check passed!
      osc.lustre-OST0001-osc-ffff8803348f2000 rpc check passed!
       sanity test_101c: @@@@@@ FAIL: Small 4k read IO 1! 
        Trace dump:
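
      For reference, the "Small 4k read IO 1!" message comes from the rpc_stats check
      the test runs after its reads: every read RPC reported by the OSCs is expected to
      cover a full stripe_size worth of pages, and any smaller RPC is counted and
      reported. A rough sketch of that kind of check (not the actual sanity.sh code;
      the helper name, the rpc_stats column layout, and the 4 KB page size are
      assumptions):

      # Hypothetical helper, not the real sanity.sh implementation.  Assumes the
      # "pages per rpc" table in rpc_stats carries the read RPC count in the
      # second column and that pages are 4 KB.
      check_read_rpcs() {
              local osc=$1           # e.g. osc.lustre-OST0000-osc-*
              local expect_pages=$2  # stripe_size / page_size, e.g. 16 for 64k
              lctl get_param -n ${osc}.rpc_stats | awk -v want=$expect_pages '
                      /pages per rpc/ { in_table = 1; next }
                      in_table && $1 ~ /^[0-9]+:$/ {
                              pages = $1; sub(/:/, "", pages)
                              if (pages + 0 < want && $2 + 0 > 0) {
                                      printf "Small %dk read IO %d!\n", pages * 4, $2
                                      bad = 1
                              }
                              next
                      }
                      in_table { in_table = 0 }
                      END { exit bad }'
      }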
      


          Activity

            adilger Andreas Dilger added a comment - Closing this issue since we haven't seen this failure recently, outside of the previously reported and unrelated issue.

            adilger Andreas Dilger added a comment - This recent burst of failures was caused by patch 7803 landing.

            jamesanunez James Nunez (Inactive) added a comment - I'm hitting this error with different values for the RPCs at https://maloo.whamcloud.com/test_sets/ecfe391a-c41c-11e3-a793-52540035b04c

            Small 32k read IO 240 !

            adilger Andreas Dilger added a comment - Found only two failures of test_101c() in the past 4 weeks:
            https://maloo.whamcloud.com/sub_tests/313362c4-6bac-11e2-a7b9-52540035b04c
            https://maloo.whamcloud.com/sub_tests/9e3f4c7a-5d83-11e2-8199-52540035b04c

            Lowered priority, since there are more important bugs to fix for now.

            di.wang Di Wang (Inactive) added a comment - I just checked these failures; sorry for the delay. The test failed because there were single-page RPCs during read-ahead. One possible reason: since we do random 64k reads here, if some pages are evicted under memory pressure and those 64k areas are read again, we can see single-page RPCs. Probably we should add a no-duplicate option to the reads, i.e. no area should be read twice even for random reads.

            But I am not sure that is the real reason here, since there is not enough debug log. Is this reproducible?
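
            To illustrate the no-duplicate idea, a minimal sketch of a read loop that never
            reads the same 64k area twice (the file name, file size, and block count below
            are placeholders, not the values the test actually uses):

            file=$DIR/f101c.no_dup          # $DIR is the test suite's mount point
            iosize=$((64 * 1024))
            nblocks=256                     # file covers nblocks * iosize bytes
            # Shuffle the distinct 64k-aligned block indexes so every area is read
            # exactly once while the order of the reads stays random.
            for idx in $(seq 0 $((nblocks - 1)) | shuf); do
                    dd if=$file of=/dev/null bs=$iosize count=1 skip=$idx 2>/dev/null
            done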

            yujian Jian Yu added a comment - More instances:
            https://maloo.whamcloud.com/test_sets/313ec4a8-2fe2-11e2-866f-52540035b04c
            https://maloo.whamcloud.com/test_sets/669d8a72-2fb4-11e2-b30e-52540035b04c

            liwei Li Wei (Inactive) added a comment - Wang Di, could you take a look at this?

            sarah Sarah Liu added a comment - SLES11 SP2 client also has this issue: https://maloo.whamcloud.com/test_sets/d0072b9c-2534-11e2-9e7c-52540035b04c

            sarah Sarah Liu added a comment - edited - {comment removed, unrelated to this bug}

            liwei Li Wei (Inactive) added a comment - The procfs access problems should be fixed by updating the test to use set_obdfilter_param(), as the parameters in question are now exported by OSDs.
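
            A sketch of what that change could look like in the test script; the exact
            argument order of set_obdfilter_param() is assumed here rather than taken
            from test-framework.sh:

            # Per the log above, the test currently runs the following on the servers,
            # which fails with "Found no match" once the parameter is exported by the
            # OSD rather than by obdfilter:
            #   lctl set_param -n obdfilter.*.read_cache_enable=0
            #
            # Going through the helper (argument order is an assumption) lets the test
            # work regardless of which namespace the server exports:
            set_obdfilter_param read_cache_enable 0
            set_obdfilter_param writethrough_cache_enable 0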

            People

              Assignee: liwei Li Wei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 8
