Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2109

__llog_process_thread() GPF

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • Orion
    • 3037

    Description

      Observed after rebooting the grove-mds2 on to our latest kernel and lustre-orion tag. The server was rebooted while under a fairly heavy test load. No detailed investigation has been done yet, the MDS was able to successfully restart after rebooting the node again and claims to have successfully completed recovery this time.

      — First boot —

      Mounting grove-mds2/mgs on /mnt/lustre/local/lstest-MGS0000
      Lustre: Lustre: Build Version: orion-2_2_49_57_3-49chaos-49chaos--PRISTINE-2.6.32-220.17.1.3chaos.ch5.x86_64
      Lustre: MGS: Mounted grove-mds2/mgs
      Mounting grove-mds2/mdt0 on /mnt/lustre/local/lstest-MDT0000
      LustreError: 11-0: MGC172.20.5.2@o2ib500: Communicating with 0@lo, operation llog_origin_handle_create failed with -2
      LustreError: 4503:0:(mgc_request.c:250:do_config_log_add()) failed processing sptlrpc log: -2
      Lustre: 4508:0:(fld_index.c:356:fld_index_init()) srv-lstest-MDT0000: File "fld" doesn't support range lookup, using stub. DNE and FIDs on OST will not work with this backend
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.48@o2ib500
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.2.199@o2ib500
      LNet: 26244:0:(o2iblnd_cb.c:2340:kiblnd_passive_connect()) Conn race 172.20.4.51@o2ib500
      LNet: 26244:0:(o2iblnd_cb.c:2340:kiblnd_passive_connect()) Conn race 172.20.4.44@o2ib500
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.53@o2ib500
      Lustre: Skipped 186 previous similar messages
      LustreError: 4567:0:(llog_cat.c:186:llog_cat_id2handle()) error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc -2
      LustreError: 4567:0:(llog_cat.c:503:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
      LustreError: 4567:0:(llog_cat.c:533:llog_cat_cancel_records()) Cancel 0 of 1 llog-records failed: -2
      LustreError: 4567:0:(osp_sync.c:705:osp_sync_process_committed()) lstest-OST0281-osc-MDT0000: can't cancel record: -2
      LustreError: 4567:0:(llog_cat.c:186:llog_cat_id2handle()) error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc -2
      LustreError: 4567:0:(llog_cat.c:503:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
      LustreError: 4567:0:(llog_cat.c:533:llog_cat_cancel_records()) Cancel 0 of 1 llog-records failed: -2
      LustreError: 4567:0:(osp_sync.c:705:osp_sync_process_committed()) lstest-OST0281-osc-MDT0000: can't cancel record: -2
      general protection fault: 0000 1 SMP
      last sysfs file: /sys/module/sg/initstate
      CPU 7

      Pid: 4567, comm: osp-syn-641
      Tainted: P W ---------------- 2.6.32-220.17.1.3chaos.ch5.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
      RIP: 0010:[<ffffffffa0694252>] [<ffffffffa0694252>] __llog_process_thread+0x2a2/0xc80 [obdclass]
      RSP: 0018:ffff882f66e9db60 EFLAGS: 00010206
      RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000008701 RCX: 0000000000000000
      RDX: 000000000021e000 RSI: 0000000000000001 RDI: ffff882f63cfe000
      RBP: ffff882f66e9dc10 R08: ffff88179d8f1900 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88179caf4058
      R13: 000000000000fcff R14: ffff882f63cfc000 R15: ffff882f63cfe000
      FS: 00007ffff7fdc700(0000) GS:ffff881894820000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007ffff7ff9000 CR3: 0000000001a85000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process osp-syn-641
      (pid: 4567, threadinfo ffff882f66e9c000, task ffff882ff9988ae0)
      Stack:
      0000000000002000 0000000000000050 0000000000000010 0000000000000000
      <0> ffff882f63cfc001 000086fe00000000 0000000000000000 ffff882f640124c0
      <0> ffff882f66e9de80 0000000000000000 000000000021e000 0000fd009aee4b80
      Call Trace:
      [<ffffffffa0dd4570>] ? osp_sync_process_queues+0x0/0xf60 [osp]
      [<ffffffffa0694d33>] __llog_process+0x103/0x4d0 [obdclass]
      [<ffffffffa06961bb>] llog_cat_process_cb+0x21b/0x290 [obdclass]
      [<ffffffffa06946fe>] __llog_process_thread+0x74e/0xc80 [obdclass]
      [<ffffffff810618d4>] ? enqueue_task_fair+0x64/0x100
      [<ffffffffa0695fa0>] ? llog_cat_process_cb+0x0/0x290 [obdclass]
      [<ffffffffa0694d33>] __llog_process+0x103/0x4d0 [obdclass]
      [<ffffffffa06953d8>] __llog_cat_process+0x98/0x260 [obdclass]
      [<ffffffffa0dd4570>] ? osp_sync_process_queues+0x0/0xf60 [osp]
      [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
      [<ffffffffa0dd64f2>] osp_sync_thread+0x1c2/0x620 [osp]
      [<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
      [<ffffffff8100c14a>] child_rip+0xa/0x20
      [<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
      [<ffffffffa0dd6330>] ? osp_sync_thread+0x0/0x620 [osp]
      [<ffffffff8100c140>] ? child_rip+0x0/0x20

      — Second boot —

      Mounting grove-mds2/mgs on /mnt/lustre/local/lstest-MGS0000
      Lustre: Lustre: Build Version: orion-2_2_49_57_3-49chaos-49chaos--PRISTINE-2.6.32-220.17.1.3chaos.ch5.x86_64
      Lustre: MGS: Mounted grove-mds2/mgs
      Mounting grove-mds2/mdt0 on /mnt/lustre/local/lstest-MDT0000
      LustreError: 11-0: MGC172.20.5.2@o2ib500: Communicating with 0@lo, operation llog_origin_handle_create failed with -2
      LustreError: 4525:0:(mgc_request.c:250:do_config_log_add()) failed processing sptlrpc log: -2
      Lustre: 4530:0:(fld_index.c:356:fld_index_init()) srv-lstest-MDT0000: File "fld" doesn't support range lookup, using stub. DNE and FIDs on OST will not work with this backend
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.183@o2ib500
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.22@o2ib500
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.24@o2ib500
      Lustre: Skipped 3 previous similar messages
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.3.36@o2ib500
      Lustre: Skipped 2 previous similar messages
      Lustre: lstest-MDT0000: Mounted grove-mds2/mdt0
      Lustre: lstest-MDT0000: Will be in recovery for at least 5:00, or until 255 clients reconnect.
      LustreError: 11-0: lstest-OST0282-osc-MDT0000: Communicating with 172.20.4.42@o2ib500, operation ost_connect failed with -16
      Lustre: lstest-MDT0000: Recovery over after 1:10, of 255 clients 255 recovered and 0 were evicted.
      LustreError: 11-0: lstest-OST0282-osc-MDT0000: Communicating with 172.20.4.42@o2ib500, operation ost_connect failed with -16

      Attachments

        Issue Links

          Activity

            People

              tappro Mikhail Pershin
              behlendorf Brian Behlendorf
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: