Lustre / LU-2720 osc_page_delete() ASSERTION(0) failed / LU-2788

Sanity test_132 (lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed:


Details

    • Type: Technical task
    • Resolution: Cannot Reproduce
    • Priority: Major
    • None
    • None
    • None
    • A patch pushed via git.
    • 6754

    Description

      While http://review.whamcloud.com/5222 was being tested, an assertion was triggered. So far this error has not been seen outside of that patch.

      The failing test is sanity test_132.

      The following is seen in the MDS logs on every occurrence:

      14:06:06:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
      14:06:06:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
      14:06:06:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
      14:06:06:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      14:06:06:LustreError: 17859:0:(fld_index.c:201:fld_index_create()) srv-lustre-MDT0000: insert range [0x000000000000000c-0x0000000100000000):0:mdt failed: rc = -17
      14:06:06:Lustre: lustre-MDT0000: used disk, loading
      14:06:06:Lustre: 17859:0:(mdt_lproc.c:383:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
      14:06:06:Lustre: Increasing default stripe size to min 1048576
      14:06:06:Lustre: Enabling SOM
      14:06:06:LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
      14:06:06:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      14:06:06:LNet: 18036:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      14:06:06:LNet: 18036:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message
      14:06:06:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
      14:06:06:Lustre: 3644:0:(client.c:1845:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1360015559/real 1360015559]  req@ffff88002f949800 x1426077333348683/t0(0) o8->lustre-OST0006-osc-MDT0000@10.10.4.195@tcp:28/4 lens 400/544 e 0 to 1 dl 1360015564 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      14:06:17:LustreError: 11-0: lustre-OST0004-osc-MDT0000: Communicating with 10.10.4.195@tcp, operation ost_connect failed with -19.
      14:06:18:Lustre: DEBUG MARKER: lctl get_param -n timeout
      14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
      14:06:19:Lustre: DEBUG MARKER: Using TIMEOUT=20
      14:06:19:Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
      14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.sys.jobid_var=procname_uid
      14:07:12:Lustre: MGS: haven't heard from client 2b7f8516-fc0a-afb9-790c-1965aaaa46c2 (at 10.10.4.197@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff880078f2e800, cur 1360015629 expire 1360015599 last 1360015579
      14:07:23:Lustre: lustre-MDT0000: haven't heard from client 5fcc94dc-d9c0-7c5c-7665-6b8afe791bb0 (at 10.10.4.197@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff88007832ec00, cur 1360015634 expire 1360015604 last 1360015584
      14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed: 
      14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) LBUG
      14:07:23:Pid: 17820, comm: ll_evictor
      14:07:23:
      14:07:23:Call Trace:
      14:07:23: [<ffffffffa04d7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      14:07:23: [<ffffffffa04d7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      14:07:23: [<ffffffffa0664755>] lu_ucred_assert+0x45/0x50 [obdclass]
      14:07:23: [<ffffffffa0c52c66>] mdd_xattr_sanity_check+0x36/0x1f0 [mdd]
      14:07:23: [<ffffffffa0c58221>] mdd_xattr_del+0xf1/0x540 [mdd]
      14:07:23: [<ffffffffa0e3fe0a>] mdt_som_attr_set+0xfa/0x390 [mdt]
      14:07:23: [<ffffffffa0e401ec>] mdt_ioepoch_close_on_eviction+0x14c/0x170 [mdt]
      14:07:23: [<ffffffffa0f100c9>] ? osp_key_init+0x59/0x1a0 [osp]
      14:07:23: [<ffffffffa0e40c4b>] mdt_ioepoch_close+0x2ab/0x3b0 [mdt]
      14:07:23: [<ffffffffa0e411fe>] mdt_mfd_close+0x4ae/0x6e0 [mdt]
      14:07:23: [<ffffffffa0e1297e>] mdt_obd_disconnect+0x3ae/0x4d0 [mdt]
      14:07:23: [<ffffffffa061cd78>] class_fail_export+0x248/0x580 [obdclass]
      14:07:23: [<ffffffffa07f9079>] ping_evictor_main+0x249/0x640 [ptlrpc]
      14:07:23: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
      14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
      14:07:23: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
      14:07:23: [<ffffffffa07f8e30>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
      14:07:23: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      The root cause looks to be the client locking up, but even if the client was in a death spin it should not bring the MDS down.
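
      To make the failure mode concrete, here is a minimal user-space sketch of what the stack trace suggests: the MDD xattr path asserts that the per-thread environment carries a user credential (lu_ucred), and the ping evictor thread (ll_evictor) reaches mdd_xattr_del() via mdt_som_attr_set() without ever having one set up. All names below (env_sketch, ucred_sketch, xattr_del, ...) are illustrative stand-ins under that assumption, not the actual Lustre sources.

      /* Hypothetical sketch of the assertion pattern: handlers expect the
       * per-thread environment to carry the caller's credential and assert
       * that it is present.  A client RPC fills it in; the evictor path in
       * the trace above reaches the same code without one, so the
       * assertion fires and the MDS LBUGs. */
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      struct ucred_sketch {                   /* stand-in for struct lu_ucred */
              unsigned int uc_uid;
      };

      struct env_sketch {                     /* stand-in for struct lu_env */
              struct ucred_sketch *le_ucred;  /* NULL if never initialized */
      };

      /* Analogue of lu_ucred_assert(): the credential must already exist. */
      static struct ucred_sketch *ucred_assert(const struct env_sketch *env)
      {
              assert(env->le_ucred != NULL);  /* ASSERTION( uc != NULL ) */
              return env->le_ucred;
      }

      /* Analogue of the mdd_xattr_del()/mdd_xattr_sanity_check() path: it
       * consumes the credential without checking who set it up. */
      static void xattr_del(const struct env_sketch *env, const char *name)
      {
              struct ucred_sketch *uc = ucred_assert(env);

              printf("removing xattr %s on behalf of uid %u\n", name, uc->uc_uid);
      }

      int main(void)
      {
              struct ucred_sketch cred = { .uc_uid = 500 };

              /* Client RPC path: a credential is set up before the call. */
              struct env_sketch rpc_env = { .le_ucred = &cred };
              xattr_del(&rpc_env, "trusted.som");

              /* Eviction path (ll_evictor): there is no client request, so
               * no credential was ever set up -- this trips the assertion. */
              struct env_sketch evict_env = { .le_ucred = NULL };
              xattr_del(&evict_env, "trusted.som");

              return 0;
      }

      Compiled and run, the sketch prints the first line and then aborts on the second call, mirroring the LBUG hit in the ll_evictor thread.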

      Attachments

        Activity

          People

            Assignee: Keith Mannthey (Inactive)
            Reporter: Keith Mannthey (Inactive)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: