[LU-2828] conf-sanity test_64 test_59: MDS dt_object.h dt_declare_record_write() ASSERTION( dt != NULL )

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0, Lustre 2.8.0
    • Lustre 2.4.0
    • A patch pushed via git.
    • 3
    • 6849

    Description

      From this test run:
      https://maloo.whamcloud.com/test_sessions/f62dc660-7943-11e2-9cb9-52540035b04c

      The patch being tested is not involved in this area of the code.

      conf-sanity test_64

      Error: 'test failed to respond and timed out'
      Failure Rate: 4.00% of last 100 executions [all branches]

      On the MDS, the following is seen:

      09:37:51:Lustre: DEBUG MARKER: == conf-sanity test 64: check lfs df --lazy == 09:37:45 (1361122665)
      09:37:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
      09:37:51:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
      09:37:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
      09:37:51:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      09:37:51:Lustre: lustre-MDT0000: used disk, loading
      09:37:51:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
      09:37:51:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
      09:38:02:Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.10.17.34@tcp) was lost; in progress operations using this service will wait for recovery to complete
      09:38:14:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      09:38:14:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
      09:38:14:LustreError: 7883:0:(client.c:1048:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88006c4a2c00 x1427239946161272/t0(0) o13->lustre-OST0000-osc-MDT0000@10.10.17.34@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      09:38:14:LustreError: 24649:0:(dt_object.h:979:dt_declare_record_write()) ASSERTION( dt != NULL ) failed: dt is NULL when we want to write record
      09:38:14:LustreError: 24649:0:(dt_object.h:979:dt_declare_record_write()) LBUG
      09:38:14:Pid: 24649, comm: osp-pre-1
      09:38:14:
      09:38:14:Call Trace:
      09:38:14: [<ffffffffa0ee7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      09:38:14: [<ffffffffa0ee7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      09:38:14: [<ffffffffa0704ca5>] osp_write_last_oid_seq_files+0x595/0x6a0 [osp]
      09:38:14: [<ffffffffa070918d>] osp_precreate_thread+0x80d/0x1460 [osp]
      09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
      09:38:14: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
      09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
      09:38:14: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

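      It looks like the MDS panicked on unmount: the LBUG was hit in the osp-pre-1 precreate thread (osp_precreate_thread() -> osp_write_last_oid_seq_files()) while /mnt/mds1 was being unmounted.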


          Activity

            jamesanunez James Nunez (Inactive) added a comment -

            Patch removing tests 59 and 64 from the ALWAYS_EXCEPT list landed to master (pre-2.8).

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13757/
            Subject: LU-2828 test: Remove tests from ALWAYS_EXCEPT list
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 26de2a56803ce20f6ae21dac650d598e4335f247

            jamesanunez James Nunez (Inactive) added a comment -

            Patch to remove tests 59 and 64 from the ALWAYS_EXCEPT list at http://review.whamcloud.com/13757

            gerrit Gerrit Updater added a comment -

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13757
            Subject: LU-2828 test: Remove tests from ALWAYS_EXCEPT list
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 60b129f7f8716937752c5e89e4ad301f29252792

            adilger Andreas Dilger added a comment -

            conf-sanity.sh test_59 and test_64 are still being skipped due to this bug.
            pjones Peter Jones added a comment -

            Landed for 2.4

            bobijam Zhenyu Xu added a comment -

            patch tracking at http://review.whamcloud.com/5528

            commit message
            LU-2828 osp: correct osp device finialize order
                
                Should stop osp precreate thread before releasing its last used
                oid/seq files.
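The fix in change 5528 is an ordering rule: stop the precreate thread before releasing the last-used oid/seq file objects that the thread writes. Below is a minimal userspace sketch of that rule in C with POSIX threads, under simplifying assumptions: struct osp_dev_sketch, struct last_id_file, osp_sketch_start() and osp_sketch_finalize() are hypothetical stand-ins, not the real osp code, and the assert() only plays the role of the ASSERTION( dt != NULL ) check seen in the trace (it is compiled out if NDEBUG is defined).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the last-used oid/seq file object. */
struct last_id_file {
	unsigned long long last_oid;
};

/* Hypothetical stand-in for an OSP device and its precreate thread. */
struct osp_dev_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool stopping;                  /* set by finalize, read by the thread */
	struct last_id_file *last_file; /* NULL once released */
	pthread_t precreate_thread;
};

/* Precreate loop: bumps the last-used oid each time it is woken. */
static void *precreate_thread(void *arg)
{
	struct osp_dev_sketch *d = arg;

	pthread_mutex_lock(&d->lock);
	while (!d->stopping) {
		/* Plays the role of ASSERTION( dt != NULL ): the thread
		 * assumes the file object is still alive while it runs. */
		assert(d->last_file != NULL);
		d->last_file->last_oid++;
		/* Sleep until woken for more work or asked to stop. */
		pthread_cond_wait(&d->cond, &d->lock);
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

static int osp_sketch_start(struct osp_dev_sketch *d)
{
	d->last_file = calloc(1, sizeof(*d->last_file));
	if (d->last_file == NULL)
		return -1;
	d->stopping = false;
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->cond, NULL);
	if (pthread_create(&d->precreate_thread, NULL,
			   precreate_thread, d) != 0) {
		pthread_cond_destroy(&d->cond);
		pthread_mutex_destroy(&d->lock);
		free(d->last_file);
		d->last_file = NULL;
		return -1;
	}
	return 0;
}

/* Finalize in the corrected order: stop the thread, then release. */
static void osp_sketch_finalize(struct osp_dev_sketch *d)
{
	/* Step 1: tell the precreate thread to stop and wait for it. */
	pthread_mutex_lock(&d->lock);
	d->stopping = true;
	pthread_cond_broadcast(&d->cond);
	pthread_mutex_unlock(&d->lock);
	pthread_join(d->precreate_thread, NULL);

	/* Step 2: no thread can touch last_file any more, so free it. */
	free(d->last_file);
	d->last_file = NULL;

	pthread_cond_destroy(&d->cond);
	pthread_mutex_destroy(&d->lock);
}
```

Swapping the two steps in osp_sketch_finalize() (freeing last_file before stopping and joining the thread) would leave a window in which a still-running thread could hit the assert or touch freed memory, which is the same shape as the LBUG in the trace above.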
            
            sarah Sarah Liu added a comment -

            Another instance seen in ldiskfs: https://maloo.whamcloud.com/test_sets/5368ca28-7e58-11e2-8f4f-52540035b04c

            utopiabound Nathaniel Clark added a comment - - edited

            I've seen it several times in ZFS testing.

            maloo says:

            Failure Rate: 36.00% of last 100 executions [all branches]

            for failures in test_59

            jlevi Jodi Levi (Inactive) added a comment -

            Alex,
            This is coming up regularly in Review runs. Do you have any ideas?

            People

              bobijam Zhenyu Xu
              keith Keith Mannthey (Inactive)
              Votes:
              0
              Watchers:
              13
