Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • Upstream
    • None
    • 3
    • 9223372036854775807

    Description

      Lustre: DEBUG MARKER: == replay-single test 111b: DNE: unlink striped dir, fail MDT2 ======================================= 01:31:46 (1573417906)
      ...
      LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      Lustre: DEBUG MARKER: tmp.kWfRXHK7nj: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid,mdc.lustre-MDT0001-mdc-*.mds_server_uuid
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      PGD 0 P4D 0 
      Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
      CPU: 0 PID: 11132 Comm: mdt00_000 Tainted: G        W  O     --------- ---  4.18.0 #31
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      RIP: 0010:          (null)
      Code: Bad RIP value.
      RSP: 0018:ffff88017a7dbcb8 EFLAGS: 00010202
      RAX: ffffffff8216dd00 RBX: ffff880172b72a80 RCX: 0000000000000000
      RDX: 0000000000000098 RSI: 0000000000000000 RDI: ffff88012e41cd48
      RBP: ffff880172364208 R08: 0000000000000028 R09: ffff880172b72f30
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801723643a8
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff880198e00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 000000014e73c000 CR4: 00000000000006b0
      Call Trace:
       ? target_handle_connect+0x1ca9/0x2eb0 [ptlrpc]
       ? tgt_request_handle+0x148/0x17b0 [ptlrpc]
       ? ptlrpc_main+0x1182/0x3180 [ptlrpc]
       ? __kthread_parkme+0x2a/0x80
       ? kthread+0x100/0x140
       ? ptlrpc_register_service+0x15a0/0x15a0 [ptlrpc]
       ? kthread_flush_work_fn+0x10/0x10
       ? ret_from_fork+0x24/0x30
      

      Attachments

        Activity

          [LU-12956] oops in target_handle_connect()
          pjones Peter Jones added a comment -

          Landed for 2.14

          pjones Peter Jones added a comment - Landed for 2.14

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40513/
          Subject: LU-12956 ldlm: fix hrtimer using
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 01a70a56540f095b3dc30656b7135636b4b3abef

          gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40513/ Subject: LU-12956 ldlm: fix hrtimer using Project: fs/lustre-release Branch: master Current Patch Set: Commit: 01a70a56540f095b3dc30656b7135636b4b3abef

          Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40513
          Subject: LU-12956 ldlm: fix hrtimer using
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f787eddc0a1f6f41aae87ceda94dbf7fcd92f646

          gerrit Gerrit Updater added a comment - Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40513 Subject: LU-12956 ldlm: fix hrtimer using Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: f787eddc0a1f6f41aae87ceda94dbf7fcd92f646
          aboyko Alexander Boyko added a comment - - edited

          Looks like hrtimer_expires_remaining(timer) should be changed to hrtimer_get_remaining(timer). With hrtimer_expires_remaining() race is possible with hrtimer_start()>hrtimer_start_range_ns()>switch_hrtimer_base()

          aboyko Alexander Boyko added a comment - - edited Looks like hrtimer_expires_remaining(timer) should be changed to hrtimer_get_remaining(timer). With hrtimer_expires_remaining() race is possible with hrtimer_start() >hrtimer_start_range_ns() >switch_hrtimer_base()

          Lustre: DEBUG MARKER: == replay-single test 80a: DNE: create remote dir, drop update rep from MDT0, fail MDT0 ============== 14:07:34 (1573808854)
          LustreError: 5753:0:(ldlm_lib.c:3080:target_send_reply_msg()) @@@ dropping reply req@00000000432bd521 x1650253602473600/t365072220167(0) o1000->lustre-MDT0001-mdtlov_UUID@192.168.122.230@tcp:36/0 lens 1056/4320 e 0 to 0 dl 1573808861 ref 1 fl Interpret:/0/0 rc 0/0 job:'osp_up0-1.0'
          Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
          Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
          LustreError: 8083:0:(ldlm_lib.c:1331:target_handle_connect()) ASSERTION( timer->base->get_time ) failed:
          LustreError: 8083:0:(ldlm_lib.c:1331:target_handle_connect()) LBUG
          Call Trace:
          ? __schedule+0x2ad/0xb00
          schedule+0x34/0x80
          lbug_with_loc+0x79/0x80 [libcfs]
          ? target_handle_connect+0x2c62/0x2f20 [ptlrpc]
          ? tgt_request_handle+0x148/0x17b0 [ptlrpc]
          ? ptlrpc_main+0x1182/0x3180 [ptlrpc]
          ? kthread+0x100/0x140

          bzzz Alex Zhuravlev added a comment - Lustre: DEBUG MARKER: == replay-single test 80a: DNE: create remote dir, drop update rep from MDT0, fail MDT0 ============== 14:07:34 (1573808854) LustreError: 5753:0:(ldlm_lib.c:3080:target_send_reply_msg()) @@@ dropping reply req@00000000432bd521 x1650253602473600/t365072220167(0) o1000->lustre-MDT0001-mdtlov_UUID@192.168.122.230@tcp:36/0 lens 1056/4320 e 0 to 0 dl 1573808861 ref 1 fl Interpret:/0/0 rc 0/0 job:'osp_up0-1.0' Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 LustreError: 8083:0:(ldlm_lib.c:1331:target_handle_connect()) ASSERTION( timer->base->get_time ) failed: LustreError: 8083:0:(ldlm_lib.c:1331:target_handle_connect()) LBUG Call Trace: ? __schedule+0x2ad/0xb00 schedule+0x34/0x80 lbug_with_loc+0x79/0x80 [libcfs] ? target_handle_connect+0x2c62/0x2f20 [ptlrpc] ? tgt_request_handle+0x148/0x17b0 [ptlrpc] ? ptlrpc_main+0x1182/0x3180 [ptlrpc] ? kthread+0x100/0x140

          sure

          (gdb) list *(target_handle_connect+0x1ca9)
          0x15979 is in target_handle_connect (include/linux/hrtimer.h:285).
          280		return ktime_to_ns(timer->node.expires);
          281	}
          282	
          283	static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer)
          284	{
          285		return ktime_sub(timer->node.expires, timer->base->get_time());
          286	}
          287	
          288	static inline ktime_t hrtimer_cb_get_time(struct hrtimer *timer)
          289	{
          

          and objdump:

                                  stale = target->obd_stale_clients;
          000000000001591a <target_handle_connect+0x1c7a> mov    0x5b8(%rbp),%r13d
          0000000000015921 <target_handle_connect+0x1c81> mov    0x6dc(%rbp),%r14d
          0000000000015928 <target_handle_connect+0x1c88> mov    0x5b0(%rbp),%r15d
          000000000001592f <target_handle_connect+0x1c8f> mov    %eax,0x10(%rsp)
          0000000000015933 <target_handle_connect+0x1c93> mov    0x688(%rbp),%rax
          000000000001593a <target_handle_connect+0x1c9a> mov    %rax,0x8(%rsp)
          000000000001593f <target_handle_connect+0x1c9f> mov    0x6a0(%rbp),%rax
          0000000000015946 <target_handle_connect+0x1ca6> callq  *0x60(%rax)
          0000000000015949 <target_handle_connect+0x1ca9> mov    0x8(%rsp),%rcx
          000000000001594e <target_handle_connect+0x1cae> mov    $0x3b9aca00,%esi
                                          msg = "to recover in";
          0000000000015953 <target_handle_connect+0x1cb3> mov    $0x0,%r10
                                  15956: R_X86_64_32S     .rodata.str1.1+0xfaa
          000000000001595a <target_handle_connect+0x1cba> sub    %rax,%rcx
          000000000001595d <target_handle_connect+0x1cbd> mov    %rcx,%rax
          0000000000015960 <target_handle_connect+0x1cc0> cqto   
          0000000000015962 <target_handle_connect+0x1cc2> idiv   %rsi
          

          looks like obd_recovery_timer wasn't initialized

          bzzz Alex Zhuravlev added a comment - sure (gdb) list *(target_handle_connect+0x1ca9) 0x15979 is in target_handle_connect (include/linux/hrtimer.h:285). 280 return ktime_to_ns(timer->node.expires); 281 } 282 283 static inline ktime_t hrtimer_expires_remaining( const struct hrtimer *timer) 284 { 285 return ktime_sub(timer->node.expires, timer->base->get_time()); 286 } 287 288 static inline ktime_t hrtimer_cb_get_time(struct hrtimer *timer) 289 { and objdump: stale = target->obd_stale_clients; 000000000001591a <target_handle_connect+0x1c7a> mov 0x5b8(%rbp),%r13d 0000000000015921 <target_handle_connect+0x1c81> mov 0x6dc(%rbp),%r14d 0000000000015928 <target_handle_connect+0x1c88> mov 0x5b0(%rbp),%r15d 000000000001592f <target_handle_connect+0x1c8f> mov %eax,0x10(%rsp) 0000000000015933 <target_handle_connect+0x1c93> mov 0x688(%rbp),%rax 000000000001593a <target_handle_connect+0x1c9a> mov %rax,0x8(%rsp) 000000000001593f <target_handle_connect+0x1c9f> mov 0x6a0(%rbp),%rax 0000000000015946 <target_handle_connect+0x1ca6> callq *0x60(%rax) 0000000000015949 <target_handle_connect+0x1ca9> mov 0x8(%rsp),%rcx 000000000001594e <target_handle_connect+0x1cae> mov $0x3b9aca00,%esi msg = "to recover in" ; 0000000000015953 <target_handle_connect+0x1cb3> mov $0x0,%r10 15956: R_X86_64_32S .rodata.str1.1+0xfaa 000000000001595a <target_handle_connect+0x1cba> sub %rax,%rcx 000000000001595d <target_handle_connect+0x1cbd> mov %rcx,%rax 0000000000015960 <target_handle_connect+0x1cc0> cqto 0000000000015962 <target_handle_connect+0x1cc2> idiv %rsi looks like obd_recovery_timer wasn't initialized

          Can you please run "list *(target_handle_connect+0x1ca9)" with gdb on the ptlrpc.ko module to see where that is failing.

          adilger Andreas Dilger added a comment - Can you please run " list *(target_handle_connect+0x1ca9) " with gdb on the ptlrpc.ko module to see where that is failing.

          People

            aboyko Alexander Boyko
            bzzz Alex Zhuravlev
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: