
sanity test 116b: error: set_param: /proc/{fs,sys}/{lnet,lustre}/17%: Found no match

Details

    • 3
    • 13069

    Description

      While running sanity test 116b with MDSCOUNT=4, it failed as follows:

      == sanity test 116b: QoS shouldn't LBUG if not enough OSTs found on the 2nd pass == 07:33:42 (1394462022)
      CMD: shadow-26vm12 lctl get_param -n lov.*mdtlov*.qos_threshold_rr
      CMD: shadow-26vm12 lctl set_param lov.*mdtlov*.qos_threshold_rr 0
      lov.lustre-MDT0000-mdtlov.qos_threshold_rr=0
      lov.lustre-MDT0001-mdtlov.qos_threshold_rr=0
      lov.lustre-MDT0002-mdtlov.qos_threshold_rr=0
      lov.lustre-MDT0003-mdtlov.qos_threshold_rr=0
      CMD: shadow-26vm12 lctl set_param fail_loc=0x147
      fail_loc=0x147
      total: 20 creates in 0.09 seconds: 228.53 creates/second
      CMD: shadow-26vm12 lctl set_param fail_loc=0
      fail_loc=0
      CMD: shadow-26vm12 lctl set_param lov.*mdtlov*.qos_threshold_rr 17% 17% 17% 17%
      shadow-26vm12: error: set_param: /proc/{fs,sys}/{lnet,lustre}/17%: Found no match
      lov.lustre-MDT0000-mdtlov.qos_threshold_rr=17%
      lov.lustre-MDT0001-mdtlov.qos_threshold_rr=17%
      lov.lustre-MDT0002-mdtlov.qos_threshold_rr=17%
      lov.lustre-MDT0003-mdtlov.qos_threshold_rr=17%
       sanity test_116b: @@@@@@ FAIL: test_116b failed with 3
      

      Maloo report: https://maloo.whamcloud.com/test_sets/9d5c1624-a861-11e3-a16f-52540035b04c

      The same test passed with MDSCOUNT=2.

          Activity

            [LU-4748] sanity test 116b: error: set_param: /proc/{fs,sys}/{lnet,lustre}/17%: Found no match

            adilger Andreas Dilger added a comment - Patch landed to b2_5 for 2.5.2.

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Additional patch will be back ported soon.
            emoly.liu Emoly Liu added a comment -

            BTW, I tried MDSCOUNT=2 and it really did pass. I think there is also something wrong with how "lctl set_param" parses the parameter format.

            I created another ticket, LU-4762, to address the parameter format issue in "lctl set_param".

            emoly.liu Emoly Liu added a comment - backport to b2_5: http://review.whamcloud.com/9636
            emoly.liu Emoly Liu added a comment -

            The patch to fix $old_rr is here: http://review.whamcloud.com/9580

            emoly.liu Emoly Liu added a comment -

            There is a problem in the test script when getting $old_rr.

            old_rr=$(do_facet $SINGLEMDS lctl get_param -n lov.*mdtlov*.qos_threshold_rr)
            [root@centos6-1 tests]# ../utils/lctl get_param -n lov.*mdtlov*.qos_threshold_rr
            17%
            17%
            17%
            17%
            [root@centos6-1 tests]# ../utils/lctl get_param lov.*mdtlov*.qos_threshold_rr
            lov.lustre-MDT0000-mdtlov.qos_threshold_rr=17%
            lov.lustre-MDT0001-mdtlov.qos_threshold_rr=17%
            lov.lustre-MDT0002-mdtlov.qos_threshold_rr=17%
            lov.lustre-MDT0003-mdtlov.qos_threshold_rr=17%
            

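            Because get_param matches all four MDTs, $old_rr holds four values ("17% 17% 17% 17%") instead of one, so the set_param call that restores qos_threshold_rr fails.

            BTW, I tried MDSCOUNT=2 and it really did pass. I think there is also something wrong with how "lctl set_param" parses the parameter format.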

            I will push a patch to fix $old_rr first, and then work on the parameters format.
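
The multi-value capture described above can be reproduced without a Lustre installation. The following is a minimal sketch, not the landed patch: fake_get_param is a hypothetical stand-in for "lctl get_param -n lov.*mdtlov*.qos_threshold_rr" on a four-MDT setup, and keeping only the first line with head is one assumed way to obtain a single value.

```shell
#!/bin/sh
# Hypothetical stand-in for "lctl get_param -n lov.*mdtlov*.qos_threshold_rr"
# with MDSCOUNT=4: one value is printed per matching MDT.
fake_get_param() {
    printf '17%%\n17%%\n17%%\n17%%\n'
}

# Helper that reports how many arguments it received.
count_args() { echo $#; }

# Capture as the test did: the variable now holds four lines.
old_rr_all=$(fake_get_param)

# Unquoted expansion splits on newlines, so a later set_param call
# would receive four separate arguments instead of one value.
nargs=$(count_args $old_rr_all)

# Assumed fix for illustration: keep only the first value.
old_rr=$(fake_get_param | head -n 1)

echo "unquoted expansion yields $nargs arguments"
echo "single value: $old_rr"
```

With four arguments after the parameter name, the extra "17%" words match the failing command line shown in the description; with the single value, only one argument follows the parameter name.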

            pjones Peter Jones added a comment -

            Guidance from Di

            "If the failure is caused by resetting the original threshold_rr,
            do_facet $SINGLEMDS lctl set_param lov.*mdtlov*.qos_threshold_rr $old_rr
            we probably need to add true at the end of the test. Though I did not check why "reset threshold_rr" failed."
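
The effect of ending the test with true can be sketched as follows. This is an illustration only: fake_set_param is a hypothetical stand-in for a failing "lctl set_param" call, and its return code of 3 is chosen to mirror the "failed with 3" message rather than taken from lctl itself.

```shell
#!/bin/sh
# Hypothetical stand-in for an "lctl set_param" call that fails.
fake_set_param() {
    echo "error: set_param: $1: Found no match" >&2
    return 3   # illustrative return code, not taken from lctl
}

# Restore step as the last command: its failure becomes the function's status.
restore_strict() { fake_set_param "17%"; }

# Restore step followed by true: the failure is ignored.
restore_lenient() { fake_set_param "17%" || true; }

strict_rc=0
restore_strict 2>/dev/null || strict_rc=$?

lenient_rc=0
restore_lenient 2>/dev/null || lenient_rc=$?

echo "strict=$strict_rc lenient=$lenient_rc"
```

Note that this only keeps the cleanup step from failing the test; it does not explain why the reset itself fails.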


            People

              emoly.liu Emoly Liu
              yujian Jian Yu