  Lustre / LU-9949

lolnd broken


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major

    Description

      It looks like the "LU-9480 lnet: implement Peer Discovery" commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 broke lolnd (as suggested by git bisect).
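      For context, a minimal sketch of the bisect workflow that implicates this commit (the good/bad revisions and the rebuild step are placeholders, not taken from this ticket):

        git bisect start
        git bisect bad HEAD                     # lolnd broken here
        git bisect good <last-known-good-rev>   # placeholder revision
        # at each step: rebuild, reinstall/reload the Lustre and LNet
        # modules, run the reproducer described below, and mark the
        # result with "git bisect good" or "git bisect bad"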

      This manifests in, e.g., sanity test 101b hanging with the following in the logs:

      [  215.914245] Lustre: DEBUG MARKER: == sanity test 101b: check stride-io mode read-ahead ================================================= 01:32:15 (1504675935)
      [  215.985320] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x5:0x0], use llapi_layout_get_by_path()
      [  256.717500] LNet: Service thread pid 4032 was inactive for 40.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [  256.720328] Pid: 4032, comm: ll_ost_io00_002
      [  256.721561] 
      Call Trace:
      [  256.723391]  [<ffffffff81704339>] schedule+0x29/0x70
      [  256.724533]  [<ffffffff81700972>] schedule_timeout+0x162/0x2a0
      [  256.725651]  [<ffffffff810879f0>] ? process_timeout+0x0/0x10
      [  256.726859]  [<ffffffffa0534e3e>] target_bulk_io+0x4ee/0xb20 [ptlrpc]
      [  256.729276]  [<ffffffff810b7ce0>] ? default_wake_function+0x0/0x20
      [  256.730431]  [<ffffffffa05ddf08>] tgt_brw_read+0xf38/0x1870 [ptlrpc]
      [  256.731359]  [<ffffffffa01ba4a4>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [  256.732387]  [<ffffffffa0579f90>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [  256.733578]  [<ffffffffa0532800>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [  256.734845]  [<ffffffffa057a102>] ? lustre_pack_reply_flags+0x72/0x1f0 [ptlrpc]
      [  256.736719]  [<ffffffffa057a291>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  256.737931]  [<ffffffffa05dad2b>] tgt_request_handle+0x93b/0x1390 [ptlrpc]
      [  256.738981]  [<ffffffffa05853b1>] ptlrpc_server_handle_request+0x251/0xae0 [ptlrpc]
      [  256.740764]  [<ffffffffa0589168>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [  256.741800]  [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50
      [  256.742938]  [<ffffffffa0588710>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [  256.743943]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [  256.744963]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      [  256.745913]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [  256.746933]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      
      [  256.748798] LustreError: dumping log to /tmp/lustre-log.1504675975.4032
      [  269.494952] LustreError: 2624:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff8800720b3e00
      

      This is easy to reproduce; just run the following on a single node: ONLY=101 REFORMAT=yes sh sanity.sh
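      To confirm the attribution on a given tree, a hedged sketch (the revert and rebuild steps are assumptions, not something recorded in this ticket):

        # revert the suspect change, then rebuild and reload the modules
        git revert 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2
        # re-run the reproducer; test_101b should no longer hang
        ONLY=101 REFORMAT=yes sh sanity.sh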

            People

              Assignee: Amir Shehata (ashehata)
              Reporter: Oleg Drokin (green)
              Votes: 0
              Watchers: 8
