
replay-single 90: lfs find does not report the affected lustre-OST000d_UUID for fd

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.4.0
    • None
    • 3
    • 7315

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/a27c80dc-9050-11e2-8311-52540035b04c.

      The sub-test test_90 failed with the following error:

      lfs getsripe does not report the affected lustre-OST000d_UUID for fd

      Info required for matching: replay-single 90

      == replay-single test 90: lfs find identifies the missing striped file segments == 18:07:06 (1363655226)
      Create the files
      CMD: c01 lctl get_param -n obdfilter.lustre-OST000d.uuid
      Fail ost14 lustre-OST000d_UUID, display the list of affected files
      CMD: c01 grep -c /mnt/ost14' ' /proc/mounts
      Stopping /mnt/ost14 (opts:) on c01
      CMD: c01 umount -d /mnt/ost14
      CMD: c01 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      General Query: lfs find /mnt/lustre/d0.replay-single/d90
      /mnt/lustre/d0.replay-single/d90
      /mnt/lustre/d0.replay-single/d90/f7
      /mnt/lustre/d0.replay-single/d90/f13
      /mnt/lustre/d0.replay-single/d90/f8
      /mnt/lustre/d0.replay-single/d90/all
      /mnt/lustre/d0.replay-single/d90/f9
      /mnt/lustre/d0.replay-single/d90/f11
      /mnt/lustre/d0.replay-single/d90/f15
      /mnt/lustre/d0.replay-single/d90/f12
      /mnt/lustre/d0.replay-single/d90/f6
      /mnt/lustre/d0.replay-single/d90/f2
      /mnt/lustre/d0.replay-single/d90/f4
      /mnt/lustre/d0.replay-single/d90/f14
      /mnt/lustre/d0.replay-single/d90/f3
      /mnt/lustre/d0.replay-single/d90/f10
      /mnt/lustre/d0.replay-single/d90/f5
      /mnt/lustre/d0.replay-single/d90/f0
      /mnt/lustre/d0.replay-single/d90/f1
      Querying files on shutdown ost14: lfs find --obd lustre-OST000d_UUID
      /mnt/lustre/d0.replay-single/d90/f13
      /mnt/lustre/d0.replay-single/d90/all
       replay-single test_90: @@@@@@ FAIL: lfs find does not report the affected lustre-OST000d_UUID for fd 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3977:error_noexit()
        = /usr/lib64/lustre/tests/replay-single.sh:2776:test_90()
        = /usr/lib64/lustre/tests/test-framework.sh:4255:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4288:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4143:run_test()
        = /usr/lib64/lustre/tests/replay-single.sh:2792:main()
      Dumping lctl log to /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.*.1363655230.log
      CMD: c01,c02,c03,c04,c05,c06,c08,c09 /usr/sbin/lctl dk > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.debug_log.\$(hostname -s).1363655230.log;
               dmesg > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.dmesg.\$(hostname -s).1363655230.log
      Check getstripe: /usr/bin/lfs getstripe -r --obd lustre-OST000d_UUID
      
      /mnt/lustre/d0.replay-single/d90/f13
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_layout_gen:     0
      lmm_stripe_offset:  13
      	obdidx		 objid		 objid		 group
      	    13	          3138	        0xc42	             0 *
      
      
      /mnt/lustre/d0.replay-single/d90/all
      lmm_stripe_count:   16
      lmm_stripe_size:    1048576
      lmm_layout_gen:     0
      lmm_stripe_offset:  15
      	obdidx		 objid		 objid		 group
      	    13	          3137	        0xc41	             0 *
      /mnt/lustre/d0.replay-single/d90/all
       replay-single test_90: @@@@@@ FAIL: lfs getsripe does not report the affected lustre-OST000d_UUID for fd 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3977:error_noexit()
        = /usr/lib64/lustre/tests/replay-single.sh:2787:test_90()
        = /usr/lib64/lustre/tests/test-framework.sh:4255:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4288:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4143:run_test()
        = /usr/lib64/lustre/tests/replay-single.sh:2792:main()
      Dumping lctl log to /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.*.1363655235.log
      CMD: c01,c02,c03,c04,c05,c06,c08,c09 /usr/sbin/lctl dk > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.debug_log.\$(hostname -s).1363655235.log;
               dmesg > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/replay-single.test_90.dmesg.\$(hostname -s).1363655235.log
      Failover ost14 to c01
      18:07:26 (1363655246) waiting for c01 network 900 secs ...
      18:07:26 (1363655246) network interface is UP
      CMD: c01 hostname
      CMD: c01 test -b /dev/lvm-OSS/P6
      Starting ost14:   /dev/lvm-OSS/P6 /mnt/ost14
      CMD: c01 mkdir -p /mnt/ost14; mount -t lustre   		                   /dev/lvm-OSS/P6 /mnt/ost14
      CMD: c01 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"0x33f0404\" \" 0xffb7e3ff\" 32 
      CMD: c01 e2label /dev/lvm-OSS/P6 2>/dev/null
      Started lustre-OST000d
      Resetting fail_loc on all nodes...CMD: c01,c02,c03,c04,c05,c06,c08,c09 lctl set_param -n fail_loc=0 2>/dev/null || true
      done.
      CMD: c01,c02,c03,c04,c05,c06,c08 rc=\$([ -f /proc/sys/lnet/catastrophe ] &&
      		echo \$(< /proc/sys/lnet/catastrophe) || echo 0);
      		if [ \$rc -ne 0 ]; then echo \$(hostname): \$rc; fi
      		exit \$rc
      

        Activity

          [LU-3002] replay-single 90: lfs find does not report the affected lustre-OST000d_UUID for fd

          liwei Li Wei (Inactive) added a comment - The patch has landed to master.
          liwei Li Wei (Inactive) added a comment - http://review.whamcloud.com/5796

          liwei Li Wei (Inactive) added a comment - This is not due to DNE, but a result of OST 13 being selected by the test. The "d" in "fd" should be converted to "13". I'll submit a patch shortly.
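          The mismatch described in the comment above can be illustrated with a minimal sketch: the OST UUID carries the index in hexadecimal (000d), while lfs getstripe prints obdidx in decimal (13). The helper name ost_uuid_to_index is hypothetical and is not taken from replay-single.sh or from the landed patch; it only shows one way the hex-to-decimal conversion could be done.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not the actual fix):
# derive the decimal OST index from a UUID such as "lustre-OST000d_UUID".
ost_uuid_to_index() {
    hex=${1##*-OST}      # drop everything through "-OST" -> "000d_UUID"
    hex=${hex%_UUID}     # drop the "_UUID" suffix        -> "000d"
    printf '%d\n' "0x$hex"   # interpret as hexadecimal, print as decimal
}

ost_uuid_to_index lustre-OST000d_UUID   # prints 13
```

          With the index in decimal, a check against the obdidx column of the getstripe output would compare like with like.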

          People

            liwei Li Wei (Inactive)
            maloo Maloo