Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major

    Description

      According to git bisect, commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 ("LU-9480 lnet: implement Peer Discovery") appears to have broken lolnd, the LNet loopback network driver.

      This shows up, for example, as sanity test 101b hanging, with the following in the logs:

      [  215.914245] Lustre: DEBUG MARKER: == sanity test 101b: check stride-io mode read-ahead ================================================= 01:32:15 (1504675935)
      [  215.985320] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x5:0x0], use llapi_layout_get_by_path()
      [  256.717500] LNet: Service thread pid 4032 was inactive for 40.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [  256.720328] Pid: 4032, comm: ll_ost_io00_002
      [  256.721561] 
      Call Trace:
      [  256.723391]  [<ffffffff81704339>] schedule+0x29/0x70
      [  256.724533]  [<ffffffff81700972>] schedule_timeout+0x162/0x2a0
      [  256.725651]  [<ffffffff810879f0>] ? process_timeout+0x0/0x10
      [  256.726859]  [<ffffffffa0534e3e>] target_bulk_io+0x4ee/0xb20 [ptlrpc]
      [  256.729276]  [<ffffffff810b7ce0>] ? default_wake_function+0x0/0x20
      [  256.730431]  [<ffffffffa05ddf08>] tgt_brw_read+0xf38/0x1870 [ptlrpc]
      [  256.731359]  [<ffffffffa01ba4a4>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [  256.732387]  [<ffffffffa0579f90>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [  256.733578]  [<ffffffffa0532800>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [  256.734845]  [<ffffffffa057a102>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
      [  256.736719]  [<ffffffffa057a291>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  256.737931]  [<ffffffffa05dad2b>] tgt_request_handle+0x93b/0x1390 [ptlrpc]
      [  256.738981]  [<ffffffffa05853b1>] ptlrpc_server_handle_request+0x251/0xae0 [ptlrpc]
      [  256.740764]  [<ffffffffa0589168>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [  256.741800]  [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50
      [  256.742938]  [<ffffffffa0588710>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [  256.743943]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [  256.744963]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      [  256.745913]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [  256.746933]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      
      [  256.748798] LustreError: dumping log to /tmp/lustre-log.1504675975.4032
      [  269.494952] LustreError: 2624:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff8800720b3e00
      

      This is easy to reproduce; just run the following on a single node:

        ONLY=101 REFORMAT=yes sh sanity.sh
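      Because the failure is a hang rather than a clean test failure, bisecting it by hand means watching each run. The following is a minimal sketch of a helper for `git bisect run` that bounds the reproducer above with GNU timeout(1) and converts a timeout into a "bad" verdict. It assumes a Lustre source checkout where `sh autogen.sh && ./configure && make` builds in-tree modules and where sanity.sh can be run from lustre/tests on a single node, as in the command above. The script name, the LIMIT variable and its 900 s default are hypothetical, and the sketch does not unload Lustre modules left over from a previous iteration.

```shell
#!/bin/sh
# bisect-101b.sh: hypothetical helper for `git bisect run` (a sketch, not part
# of the Lustre tree). Usage from the top of a Lustre checkout:
#   git bisect run sh /path/to/bisect-101b.sh run
# git bisect run convention: exit 0 = good, 1 = bad, 125 = skip this commit.

# verdict STATUS: map the reproducer's exit status to a bisect exit code.
verdict() {
  case "$1" in
    0)
      echo "good"
      return 0 ;;
    124|137)
      # GNU timeout(1) exits 124 when the limit expires (137 if it had to
      # escalate to SIGKILL via -k); treat either as the 101b hang.
      echo "bad: reproducer timed out"
      return 1 ;;
    *)
      echo "bad: reproducer exited with status $1"
      return 1 ;;
  esac
}

main() {
  # Assumed build steps; adjust to the local build environment.
  if ! { sh autogen.sh && ./configure && make; } >build.log 2>&1; then
    echo "skip: build failed"
    return 125
  fi
  cd lustre/tests || return 125
  # Reproducer from this ticket, bounded so a hang becomes a "bad" verdict.
  # LIMIT (seconds) is a hypothetical knob with an arbitrary default.
  timeout -k 60 "${LIMIT:-900}" env ONLY=101 REFORMAT=yes sh sanity.sh
  verdict $?
}

# Only build and run when invoked as "... run", so the file can be sourced.
if [ "${1:-}" = run ]; then
  main
  exit $?
fi
```

      Keeping the exit-status mapping in its own function means it can be checked without a Lustre build; a build failure returns 125 so that `git bisect run` skips that commit instead of blaming it.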

          Activity

            [LU-9949] lolnd broken
            pfarrell Patrick Farrell (Inactive) made changes -
            Link Original: This issue is related to LU-9920 [ LU-9920 ]
            pjones Peter Jones made changes -
            Link New: This issue duplicates LU-9992 [ LU-9992 ]
            pjones Peter Jones made changes -
            Fix Version/s Original: Lustre 2.12.0 [ 13495 ]
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Priority Original: Blocker [ 1 ] New: Major [ 3 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Fix Version/s New: Lustre 2.12.0 [ 13495 ]
            jhammond John Hammond made changes -
            Link New: This issue is related to LU-9992 [ LU-9992 ]
            jhammond John Hammond made changes -
            Link New: This issue is related to LU-9920 [ LU-9920 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Amir Shehata [ ashehata ]
            green Oleg Drokin made changes -
            Priority Original: Minor [ 4 ] New: Blocker [ 1 ]
            green Oleg Drokin made changes -
            Key Original: DDN-458 New: LU-9949
            Workflow Original: classic default workflow [ 55754 ] New: Sub-task Blocking [ 55755 ]
            Project Original: DDN [ 10086 ] New: Lustre [ 10000 ]
            green Oleg Drokin created issue -

            People

              Assignee: Amir Shehata (ashehata) (Inactive)
              Reporter: Oleg Drokin (green)
              Votes: 0
              Watchers: 8
