LU-11768: sanity-quota test_6 fails with ‘LNet: Service thread pid <pid> was inactive for …’


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.12.8
    • Affects Versions: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.6, Lustre 2.12.7
    • Labels: None
    • Severity: 3

    Description

      sanity-quota test_6 started failing on November 13, 2018 (Lustre tag 2.11.56.140) with the error

      [ 1733.308968] LNet: Service thread pid 18400 was inactive for 40.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: 
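
      To try to reproduce, the subtest can be run on its own using the standard ONLY= convention of the Lustre test framework (a sketch; it assumes a built tree and an already configured test environment):

      # Run only subtest 6 of sanity-quota from a built Lustre tree.
      cd lustre/tests
      ONLY=6 ./sanity-quota.sh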
      

      In sanity-quota test_6, we scan the OST1 dmesg log to see whether the watchdog was triggered. Looking at the logs for https://testing.whamcloud.com/test_sets/9f3095ea-fdc2-11e8-b837-52540065bddc , the dmesg log from OST1 (vm3) contains the LNet error and the stack trace

      [18752.909319] Lustre: DEBUG MARKER: lctl set_param -n osd*.*OS*.force_sync=1
      [18795.136287] LNet: Service thread pid 14192 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [18795.137958] Pid: 14192, comm: ll_ost_io00_002 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Sat Dec 8 05:52:11 UTC 2018
      [18795.138944] Call Trace:
      [18795.139235]  [<ffffffffc0f2a880>] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
      [18795.140051]  [<ffffffffc0f2acd3>] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
      [18795.140837]  [<ffffffffc1115308>] qsd_send_dqacq+0x2e8/0x340 [lquota]
      [18795.141528]  [<ffffffffc1123383>] qsd_acquire+0x8e3/0xcb0 [lquota]
      [18795.142183]  [<ffffffffc11238d4>] qsd_op_begin0+0x184/0x960 [lquota]
      [18795.142838]  [<ffffffffc1124312>] qsd_op_begin+0x262/0x4b0 [lquota]
      [18795.143571]  [<ffffffffc116eac7>] osd_declare_quota+0xd7/0x360 [osd_zfs]
      [18795.144322]  [<ffffffffc1177ff0>] osd_declare_write_commit+0x3d0/0x7f0 [osd_zfs]
      [18795.145083]  [<ffffffffc12958d9>] ofd_commitrw_write+0x939/0x1d40 [ofd]
      [18795.145833]  [<ffffffffc1299de2>] ofd_commitrw+0x4b2/0xa10 [ofd]
      [18795.146465]  [<ffffffffc0f98d6c>] obd_commitrw+0x9c/0x370 [ptlrpc]
      [18795.147178]  [<ffffffffc0f9b9dd>] tgt_brw_write+0x100d/0x1a90 [ptlrpc]
      [18795.147927]  [<ffffffffc0f9f29a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [18795.148649]  [<ffffffffc0f4391b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [18795.149488]  [<ffffffffc0f4724c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
      [18795.150201]  [<ffffffff86abdf21>] kthread+0xd1/0xe0
      [18795.150788]  [<ffffffff871255f7>] ret_from_fork_nospec_end+0x0/0x39
      [18795.151437]  [<ffffffffffffffff>] 0xffffffffffffffff
      [18795.152089] LustreError: dumping log to /tmp/lustre-log.1544552141.14192
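
      For reference, the check test_6 performs amounts to scanning the OST's dmesg for the watchdog message. The sketch below is a minimal standalone approximation; the host name, the grep pattern, and the use of ssh are illustrative, not the exact test-framework code:

      # Hedged sketch: scan OST1's dmesg for the service-thread watchdog
      # message. OST_HOST and the pattern are assumptions for illustration.
      ost_host=${OST_HOST:-vm3}         # OST1 node (vm3) in the logs above
      pattern='Service thread pid [0-9]+ was inactive'

      if ssh "$ost_host" dmesg | grep -E "$pattern"; then
              echo "FAIL: watchdog triggered on $ost_host" >&2
              exit 1
      fi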
       

      There is no other indication of a problem in the console or dmesg logs; the trace shows the ll_ost_io thread blocked in ptlrpc_set_wait() under qsd_send_dqacq(), i.e. waiting for a quota acquire RPC to complete. We see this issue in both zfs and ldiskfs environments.
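
      The LustreError line above names a binary Lustre debug dump on OST1. If that file was collected, it can be decoded to text with lctl debug_file; the input path is the one from the console log, and the output path is arbitrary:

      # Decode the binary debug dump named in the console log above.
      lctl debug_file /tmp/lustre-log.1544552141.14192 /tmp/lustre-log.14192.txt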

      Some of these failures are attributed to LU-11644, but the stack traces do not look the same.

      Logs for this failure are at
      https://testing.whamcloud.com/test_sets/e2bf61ea-e78f-11e8-b67f-52540065bddc
      https://testing.whamcloud.com/test_sets/bca63f5a-f60e-11e8-bfe1-52540065bddc
      https://testing.whamcloud.com/test_sets/613c72d6-f5d9-11e8-bfe1-52540065bddc


            People

              Assignee: Hongchao Zhang
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 11
