Lustre / LU-12928

recovery-small test_136: crash in sec2target_str() with review-dne-selinux-ssk

    Details

    • Severity:
      3

      Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following review-dne-selinux-ssk run: https://testing.whamcloud.com/test_sets/c4cb7246-fc38-11e9-9487-52540065bddc

      The test failed when both onyx-66vm1 and onyx-66vm2 crashed during recovery-small test_136 with the same stack trace. It looks like the clients were trying to refresh their GSS context key after losing the connection to the server, and the keyring upcall timer (ctx_upcall_timeout_kr() calling cli_ctx_expire()) then accessed invalid memory in sec2target_str().
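
As a sanity check on the timeline, the numbers in the first cli_ctx_expire() message in the console log can be compared against the DEBUG MARKER timestamp. This is a minimal sketch that assumes the two values in "get expired: 1573158292(+604552s)" are the absolute expiry time and the seconds remaining until expiry; the variable names are illustrative only.

```python
# Values copied from the console log in this ticket.
marker_uptime = 7122.083618    # "[ 7122.083618] ... DEBUG MARKER"
marker_epoch = 1572553632      # wall-clock seconds printed in the marker
expire_uptime = 7229.216008    # "[ 7229.216008] ... cli_ctx_expire()"
ctx_expire_epoch = 1573158292  # first number in "get expired: ..."
ctx_remaining = 604552         # "(+604552s)"

# Assumption: the second number is (expiry time - current wall-clock time).
now_from_ctx = ctx_expire_epoch - ctx_remaining

# Independent estimate: marker wall-clock time plus elapsed kernel uptime.
now_from_marker = marker_epoch + (expire_uptime - marker_uptime)

skew = abs(now_from_ctx - now_from_marker)
remaining_days = ctx_remaining / 86400

print(now_from_ctx, round(now_from_marker), round(remaining_days, 3))
```

Under that assumption the two estimates of "now" agree to within about a second, and the context still had roughly 6.997 days of lifetime left. That would be consistent with the context being expired because of the NO_CONTEXT reply from the server rather than because its lifetime ran out.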

      [ 7122.083618] Lustre: DEBUG MARKER: == recovery-small test 136: changelog_deregister leaving pending records ============================= 20:27:12 (1572553632)
      [ 7165.453719] LNetError: 13554:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.2.5.166@tcp added to recovery queue. Health = 900
      [ 7170.455640] LNetError: 13554:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.2.5.163@tcp added to recovery queue. Health = 900
      [ 7193.457642] LNetError: 13554:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.2.5.163@tcp added to recovery queue. Health = 0
      :
      :
      [ 7229.210628] Lustre: 13560:0:(sec_gss.c:688:gss_cli_ctx_handle_err_notify()) req x1648934820665600/t0, ctx ffff984b21bb9c00 idx 0xec1e62374fed889(0->lustre-MDT0002_UUID): server respond (00080000/00000000)
      [ 7229.213867] Lustre: 13560:0:(sec_gss.c:720:gss_cli_ctx_handle_err_notify()) NO_CONTEXT: server might lost the context, retrying
      [ 7229.216008] Lustre: 13560:0:(sec_gss.c:315:cli_ctx_expire()) ctx ffff984b21bb9c00(0->lustre-MDT0002_UUID) get expired: 1573158292(+604552s)
      :
      :
      [ 7253.102691] Lustre: 30140:0:(sec_gss.c:315:cli_ctx_expire()) Skipped 1 previous similar message
      [ 7253.205949] Lustre: DEBUG MARKER: keyctl show | grep lustre | cut -c1-11 |
      				sed -e 's/ //g;' |
      				xargs -IX keyctl setperm X 0x3f3f3f3f
      [ 7269.340618] Lustre: 0:0:(gss_keyring.c:150:ctx_upcall_timeout_kr()) ctx ffff984b3fd03da0, key ffffffff91aaaac7
      [ 7269.342435] general protection fault: 0000 [#1] SMP 
      [ 7269.358981] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G   3.10.0-957.27.2.el7.x86_64 #1
      [ 7269.363007] RIP: 0010:[<ffffffffc0ba0605>]  [<ffffffffc0ba0605>] sec2target_str+0x15/0xb0 [ptlrpc]
      [ 7269.375062] Call Trace:
      [ 7269.375510]  <IRQ> 
      [ 7269.375906]  [<ffffffffc0ce3a96>] cli_ctx_expire+0x96/0x120 [ptlrpc_gss]
      [ 7269.377128]  [<ffffffff91aaaac7>] ? __internal_add_timer+0xc7/0x130
      [ 7269.378174]  [<ffffffff91aaaac7>] ? __internal_add_timer+0xc7/0x130
      [ 7269.379245]  [<ffffffffc0cfe8d0>] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
      [ 7269.380365]  [<ffffffffc0cfe955>] ctx_upcall_timeout_kr+0x85/0xd0 [ptlrpc_gss]
      [ 7269.381580]  [<ffffffff91aa91a8>] call_timer_fn+0x38/0x110
      [ 7269.382504]  [<ffffffffc0cfe8d0>] ? ctx_unlist_kr+0xc0/0xc0 [ptlrpc_gss]
      [ 7269.383615]  [<ffffffff91aab60d>] run_timer_softirq+0x24d/0x300
      [ 7269.384609]  [<ffffffff91aa2155>] __do_softirq+0xf5/0x280
      [ 7269.385550]  [<ffffffff9217a32c>] call_softirq+0x1c/0x30
      [ 7269.386458]  [<ffffffff91a2e675>] do_softirq+0x65/0xa0
      [ 7269.387319]  [<ffffffff91aa24d5>] irq_exit+0x105/0x110
      [ 7269.388183]  [<ffffffff9217b6e8>] smp_apic_timer_interrupt+0x48/0x60
      [ 7269.389246]  [<ffffffff92177df2>] apic_timer_interrupt+0x162/0x170
      [ 7269.390273]  <EOI> 
      [ 7269.390626]  [<ffffffff9216bd70>] ? __cpuidle_text_start+0x8/0x8
      [ 7269.391688]  [<ffffffff9216bf9b>] ? native_safe_halt+0xb/0x20
      [ 7269.392652]  [<ffffffff9216bd8e>] default_idle+0x1e/0xc0
      [ 7269.393550]  [<ffffffff91a366f0>] arch_cpu_idle+0x20/0xc0
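
One detail in the trace above may be worth a closer look: the `key` value printed by ctx_upcall_timeout_kr() is identical to a return address that appears in the call trace. The sketch below is a hypothetical helper, not part of Lustre or Maloo, that cross-checks the printed pointer against the stack-frame addresses using a few lines copied from the log.

```python
import re

# Lines copied verbatim from the console log in this ticket.
LOG = """\
[ 7269.340618] Lustre: 0:0:(gss_keyring.c:150:ctx_upcall_timeout_kr()) ctx ffff984b3fd03da0, key ffffffff91aaaac7
[ 7269.375906]  [<ffffffffc0ce3a96>] cli_ctx_expire+0x96/0x120 [ptlrpc_gss]
[ 7269.377128]  [<ffffffff91aaaac7>] ? __internal_add_timer+0xc7/0x130
[ 7269.380365]  [<ffffffffc0cfe955>] ctx_upcall_timeout_kr+0x85/0xd0 [ptlrpc_gss]
"""

# "[<address>] [?] symbol+offset/size" as printed in the call trace.
FRAME = re.compile(r"\[<([0-9a-f]{16})>\]\s+(\??)\s*(\S+)")
# "ctx <pointer>, key <pointer>" as printed by ctx_upcall_timeout_kr().
KEY = re.compile(r"ctx ([0-9a-f]{16}), key ([0-9a-f]{16})")


def frames(log):
    """Map each stack-frame address to its symbol+offset string."""
    return {m.group(1): m.group(3) for m in FRAME.finditer(log)}


def key_pointer(log):
    """Return the 'key' pointer printed in the log, or None if absent."""
    m = KEY.search(log)
    return m.group(2) if m else None


key = key_pointer(LOG)
symbol = frames(LOG).get(key)
print(key, "->", symbol)
```

For this log the lookup resolves the `key` value to `__internal_add_timer+0xc7/0x130`, which is kernel text rather than an allocated object. That hints, but does not prove, that the context passed to the timer callback was already stale or overwritten when the timer fired.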
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      recovery-small test_136 - onyx-66vm1, onyx-66vm2 crashed during recovery-small test_136

            People

            • Assignee:
              Yang Sheng (ys)
            • Reporter:
              Maloo (maloo)