Lustre / LU-2332

"Unable to handle kernel paging request" in osc_queue_sync_pages


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Environment: Sequoia; Lustre 2.3.54-2chaos on the clients, Lustre 2.3.54-6chaos on the servers (github.com/chaos/lustre)
    • Severity: 3
    • Rank: 5564

    Description

      We hit the following kernel paging-request fault and Oops on a Lustre client (a Sequoia I/O node) while running ior. It happened during a read phase.

      2012-11-14 16:25:51.438302 {DefaultControlEventListener} [mmcs]{753}.3.0: Unable to handle kernel paging request for data at address 0x00000188
      2012-11-14 16:25:51.478050 {DefaultControlEventListener} [mmcs]{753}.3.0: Unable to handle kernel paging request for data at address 0x00000188
      2012-11-14 16:25:51.518060 {DefaultControlEventListener} [mmcs]{753}.3.0: Faulting instruction address: 0x8000000004766018
      2012-11-14 16:25:51.557887 {DefaultControlEventListener} [mmcs]{753}.3.0: Oops: Kernel access of bad area, sig: 11 [#1]
      2012-11-14 16:25:51.598089 {DefaultControlEventListener} [mmcs]{753}.3.0: SMP NR_CPUS=68 Blue Gene/Q
      2012-11-14 16:25:51.637946 {DefaultControlEventListener} [mmcs]{753}.3.0: Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
      2012-11-14 16:25:51.678349 {DefaultControlEventListener} [mmcs]{753}.3.0: NIP: 8000000004766018 LR: 8000000004765f98 CTR: c00000000042dd78
      2012-11-14 16:25:51.718547 {DefaultControlEventListener} [mmcs]{753}.3.0: REGS: c0000003e04bacd0 TRAP: 0300   Not tainted  (2.6.32-220.23.3.bgq.13llnl.V1R1M2.bgq62_16.ppc64)
      2012-11-14 16:25:51.758561 {DefaultControlEventListener} [mmcs]{753}.3.0: MSR: 0000000080029000 <EE,ME,CE>  CR: 24028488  XER: 20000000
      2012-11-14 16:25:51.798747 {DefaultControlEventListener} [mmcs]{753}.3.0: DEAR: 0000000000000188, ESR: 0000000000000000
      2012-11-14 16:25:51.838717 {DefaultControlEventListener} [mmcs]{753}.3.0: TASK = c0000003c0706f60[4778] 'sysiod' THREAD: c0000003e04b8000 CPU: 57
      2012-11-14 16:25:51.878716 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR00: 8000000004791308 c0000003e04baf50 8000000004795e00 0000000000000000 
      2012-11-14 16:25:51.918673 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR04: c000000319261de0 c0000003c0706f60 0000000000000000 0000000000000000 
      2012-11-14 16:25:51.958294 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR08: c0000002de5bf840 c0000003c07072c0 0000000100117cf6 c00000000042dd78 
      2012-11-14 16:25:51.998721 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR12: 8000000004772710 c000000000770a00 0000000000000062 0000000000000060 
      2012-11-14 16:25:52.038665 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR16: 0000000000000000 8000000000c2f384 80000000047770f0 800000000477d7d0 
      2012-11-14 16:25:52.078590 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR20: 0000000002000400 00000000000010b0 0000000000000008 c0000003c37906c0 
      2012-11-14 16:25:52.118609 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR24: c000000000710380 8000000004791088 c000000319261de0 0000000100117b01 
      2012-11-14 16:25:52.158704 {DefaultControlEventListener} [mmcs]{753}.3.0: GPR28: c000000319261ec0 c000000000710380 80000000047945f8 c0000003e04baf50 
      2012-11-14 16:25:52.199096 {DefaultControlEventListener} [mmcs]{753}.3.0: NIP [8000000004766018] .osc_io_unplug0+0x138/0x6f0 [osc]
      2012-11-14 16:25:52.239389 {DefaultControlEventListener} [mmcs]{753}.3.0: LR [8000000004765f98] .osc_io_unplug0+0xb8/0x6f0 [osc]
      2012-11-14 16:25:52.278920 {DefaultControlEventListener} [mmcs]{753}.3.0: Call Trace:
      2012-11-14 16:25:52.318772 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04baf50] [c0000003c37906c0] 0xc0000003c37906c0 (unreliable)
      2012-11-14 16:25:52.358541 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb080] [800000000476713c] .osc_queue_sync_pages+0x21c/0x460 [osc]
      2012-11-14 16:25:52.398292 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb160] [8000000004754198] .osc_io_submit+0x228/0x6b0 [osc]
      2012-11-14 16:25:52.438255 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb290] [80000000025fa7d8] .cl_io_submit_rw+0xd8/0x270 [obdclass]
      2012-11-14 16:25:52.478260 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb350] [80000000052d7160] .lov_io_submit+0x3b0/0x10b0 [lov]
      2012-11-14 16:25:52.518467 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb450] [80000000025fa7d8] .cl_io_submit_rw+0xd8/0x270 [obdclass]
      2012-11-14 16:25:52.558405 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb510] [80000000025fea24] .cl_io_read_page+0x124/0x280 [obdclass]
      2012-11-14 16:25:52.598323 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb5d0] [8000000006b1d3fc] .ll_readpage+0xdc/0x2c0 [lustre]
      2012-11-14 16:25:52.638412 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb680] [c000000000096924] .generic_file_aio_read+0x4d8/0x6ec
      2012-11-14 16:25:52.678479 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb7c0] [8000000006b610f4] .vvp_io_read_start+0x274/0x640 [lustre]
      2012-11-14 16:25:52.718337 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb8e0] [80000000025faa3c] .cl_io_start+0xcc/0x220 [obdclass]
      2012-11-14 16:25:52.758244 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb980] [8000000002602854] .cl_io_loop+0x194/0x2c0 [obdclass]
      2012-11-14 16:25:52.798116 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bba30] [8000000006ada390] .ll_file_io_generic+0x410/0x670 [lustre]
      2012-11-14 16:25:52.838735 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbb30] [8000000006adb134] .ll_file_aio_read+0x1d4/0x3a0 [lustre]
      2012-11-14 16:25:52.878194 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbc00] [8000000006adb450] .ll_file_read+0x150/0x320 [lustre]
      2012-11-14 16:25:52.918073 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbce0] [c0000000000d21a0] .vfs_read+0xd0/0x1c4
      2012-11-14 16:25:52.958634 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbd80] [c0000000000d2390] .SyS_read+0x54/0x98
      2012-11-14 16:25:52.998257 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbe30] [c000000000000580] syscall_exit+0x0/0x2c
      2012-11-14 16:25:53.038087 {DefaultControlEventListener} [mmcs]{753}.3.0: Instruction dump:
      2012-11-14 16:25:53.078379 {DefaultControlEventListener} [mmcs]{753}.3.0: 393902a8 92d90290 91f90294 92990298 f93902a0 fa790280 fa590288 e95d0000 
      2012-11-14 16:25:53.118273 {DefaultControlEventListener} [mmcs]{753}.3.0: e8da00f0 e92d0c68 e8ad0c68 39290360 <e8e6018a> e89e8128 7c030378 399902e0 
      2012-11-14 16:25:53.158440 {DefaultControlEventListener} [mmcs]{753}.14.1: Kernel panic - not syncing: Fatal exception
      2012-11-14 16:25:53.198523 {DefaultControlEventListener} [mmcs]{753}.14.1: Faulting instruction address: 0x800000000473aae8
      2012-11-14 16:25:53.238587 {DefaultControlEventListener} [mmcs]{753}.14.1: Oops: Kernel access of bad area, sig: 11 [#2]
      2012-11-14 16:25:53.278251 {DefaultControlEventListener} [mmcs]{753}.14.1: SMP NR_CPUS=68 Blue Gene/Q
      2012-11-14 16:25:53.318109 {DefaultControlEventListener} [mmcs]{753}.14.1: Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
      2012-11-14 16:25:53.358097 {DefaultControlEventListener} [mmcs]{753}.14.1: NIP: 800000000473aae8 LR: 800000000473aa58 CTR: c00000000042dd78
      2012-11-14 16:25:53.398124 {DefaultControlEventListener} [mmcs]{753}.14.1: REGS: c000000313faf7c0 TRAP: 0300   Tainted: G      D    ----------------    (2.6.32-220.23.3.bgq.13llnl.V1R1M2.bgq62_16.ppc64)
      2012-11-14 16:25:53.438188 {DefaultControlEventListener} [mmcs]{753}.14.1: MSR: 0000000080029000 <EE,ME,CE>  CR: 24282448  XER: 00000000
      2012-11-14 16:25:53.478266 {DefaultControlEventListener} [mmcs]{753}.14.1: DEAR: 0000000000000188, ESR: 0000000000000000
      2012-11-14 16:25:53.518102 {DefaultControlEventListener} [mmcs]{753}.14.1: TASK = c0000003e5052fc0[3692] 'ptlrpcd_28' THREAD: c000000313fac000 CPU: 12
      2012-11-14 16:25:53.558305 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR00: 800000000478e120 c000000313fafa40 8000000004795e00 800000000478e0c0 
      2012-11-14 16:25:53.598106 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR04: 0000000000000000 c0000003e5053320 0000000000000000 c0000003c782e560 
      2012-11-14 16:25:53.638262 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR08: 8000000004775f80 800000000478e0e8 0000000100117cf6 0000000100117b01 
      2012-11-14 16:25:53.677971 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR12: c0000003e5052fc0 c00000000074c100 8000000004775a20 8000000004778168 
      2012-11-14 16:25:53.718276 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR16: 0000000002000400 0000000000000008 00000000000005c0 c000000000710380 
      2012-11-14 16:25:53.757954 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR20: 8000000000c2f380 8000000000c2f384 c0000002dc5cf800 800000000478e070 
      2012-11-14 16:25:53.798174 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR24: c000000313fafed8 c000000319261de0 0000000100117b01 0000000000000000 
      2012-11-14 16:25:53.838260 {DefaultControlEventListener} [mmcs]{753}.14.1: GPR28: c000000319261ec0 c000000000710380 8000000004792e88 c000000313fafa40 
      2012-11-14 16:25:53.878250 {DefaultControlEventListener} [mmcs]{753}.14.1: NIP [800000000473aae8] .brw_interpret+0x5b8/0x1880 [osc]
      2012-11-14 16:25:53.918267 {DefaultControlEventListener} [mmcs]{753}.14.1: LR [800000000473aa58] .brw_interpret+0x528/0x1880 [osc]
      2012-11-14 16:25:53.958639 {DefaultControlEventListener} [mmcs]{753}.14.1: Call Trace:
      2012-11-14 16:25:53.998166 {DefaultControlEventListener} [mmcs]{753}.14.1: [c000000313fafa40] [800000000473aa40] .brw_interpret+0x510/0x1880 [osc] (unreliable)
      2012-11-14 16:25:54.038257 {DefaultControlEventListener} [mmcs]{753}.14.1: [c000000313fafb80] [8000000003bb6964] .ptlrpc_check_set+0x364/0x4e80 [ptlrpc]
      2012-11-14 16:25:54.078395 {DefaultControlEventListener} [mmcs]{753}.14.1: [c000000313fafd20] [8000000003c0d1cc] .ptlrpcd_check+0x66c/0x8a0 [ptlrpc]
      2012-11-14 16:25:54.118285 {DefaultControlEventListener} [mmcs]{753}.14.1: [c000000313fafe40] [8000000003c0d708] .ptlrpcd+0x308/0x510 [ptlrpc]
      2012-11-14 16:25:54.178067 {DefaultControlEventListener} [mmcs]{753}.14.1: [c000000313faff90] [c00000000001a9e0] .kernel_thread+0x54/0x70
      2012-11-14 16:25:54.218226 {DefaultControlEventListener} [mmcs]{753}.14.1: Instruction dump:
      2012-11-14 16:25:54.258239 {DefaultControlEventListener} [mmcs]{753}.3.0: f9f70050 91770064 f9d70058 e95d0000 e8d900f0 e8ad0c68 e98d0c68 Call Trace:
      2012-11-14 16:25:54.298259 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04baa00] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
      2012-11-14 16:25:54.338096 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04baab0] [c000000000432c0c] .panic+0x80/0x1a8
      2012-11-14 16:25:54.378365 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bab40] [c000000000018d58] .die+0x1a4/0x1bc
      2012-11-14 16:25:54.418056 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04babe0] [c00000000001e9e0] .bad_page_fault+0xb8/0xd4
      2012-11-14 16:25:54.458132 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bac60] [c000000000013e4c] storage_fault_common+0x48/0x4c
      2012-11-14 16:25:54.498250 {DefaultControlEventListener} [mmcs]{753}.3.0: --- Exception: 300 at .osc_io_unplug0+0x138/0x6f0 [osc]
      2012-11-14 16:25:54.538454 {DefaultControlEventListener} [mmcs]{753}.3.0:     LR = .osc_io_unplug0+0xb8/0x6f0 [osc]
      2012-11-14 16:25:54.578285 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04baf50] [c0000003c37906c0] 0xc0000003c37906c0 (unreliable)
      2012-11-14 16:25:54.618099 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb080] [800000000476713c] .osc_queue_sync_pages+0x21c/0x460 [osc]
      2012-11-14 16:25:54.658237 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb160] [8000000004754198] .osc_io_submit+0x228/0x6b0 [osc]
      2012-11-14 16:25:54.698239 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb290] [80000000025fa7d8] .cl_io_submit_rw+0xd8/0x270 [obdclass]
      2012-11-14 16:25:54.738085 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb350] [80000000052d7160] .lov_io_submit+0x3b0/0x10b0 [lov]
      2012-11-14 16:25:54.778232 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb450] [80000000025fa7d8] .cl_io_submit_rw+0xd8/0x270 [obdclass]
      2012-11-14 16:25:54.818624 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb510] [80000000025fea24] .cl_io_read_page+0x124/0x280 [obdclass]
      2012-11-14 16:25:54.858242 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb5d0] [8000000006b1d3fc] .ll_readpage+0xdc/0x2c0 [lustre]
      2012-11-14 16:25:54.898121 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb680] [c000000000096924] .generic_file_aio_read+0x4d8/0x6ec
      2012-11-14 16:25:54.938238 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb7c0] [8000000006b610f4] .vvp_io_read_start+0x274/0x640 [lustre]
      2012-11-14 16:25:54.978092 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb8e0] [80000000025faa3c] .cl_io_start+0xcc/0x220 [obdclass]
      2012-11-14 16:25:55.018114 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bb980] [8000000002602854] .cl_io_loop+0x194/0x2c0 [obdclass]
      2012-11-14 16:25:55.058242 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bba30] [8000000006ada390] .ll_file_io_generic+0x410/0x670 [lustre]
      2012-11-14 16:25:55.098188 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbb30] [8000000006adb134] .ll_file_aio_read+0x1d4/0x3a0 [lustre]38a50360 
      2012-11-14 16:25:55.178442 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbc00] [8000000006adb450] .ll_file_read+0x150/0x320 [lustre]
      2012-11-14 16:25:55.218300 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbce0] [c0000000000d21a0] .vfs_read+0xd0/0x1c4
      2012-11-14 16:25:55.258127 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbd80] [c0000000000d2390] .SyS_read+0x54/0x98
      2012-11-14 16:25:55.298134 {DefaultControlEventListener} [mmcs]{753}.3.0: [c0000003e04bbe30] [c000000000000580] syscall_exit+0x0/0x2c
      2012-11-14 16:25:55.338164 {DefaultControlEventListener} [mmcs]{753}.3.0: 7c030378 e97900e8 e91900f8 
      

      LU-1650 might be related, but it is not clear to me at first glance.
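      The faulting address can be cross-checked against the instruction dump and register dump of the first Oops. The sketch below is a hypothetical helper (not part of Lustre or any kernel tool) that decodes PowerPC64 DS-form loads (primary opcode 58: ld, ldu, lwa) per the Power ISA encoding. It decodes the bracketed faulting word <e8e6018a> as lwa r7,0x188(r6). GPR06 in that dump is 0, so the effective address is 0x188, matching DEAR. The earlier word e8da00f0 decodes as ld r6,0xf0(r26). Assuming nothing in between overwrites r6 (the intervening words load r9 and r5 and adjust r9), this suggests the NULL pointer was read from offset 0xf0 of the object at r26 (c000000319261de0). The same value appears in GPR25 of the second Oops in brw_interpret, which also faulted at 0x188. Which structure and field these offsets correspond to is not determined here.

```python
# Hypothetical helper: decode a PowerPC64 DS-form load (primary opcode 58)
# taken from an Oops "Instruction dump", and compute its effective address.

DS_FORM_LOADS = {0: "ld", 1: "ldu", 2: "lwa"}  # extended opcode in the low 2 bits


def decode_ds_load(word: int) -> tuple[str, int, int, int]:
    """Return (mnemonic, rt, ra, displacement) for a 32-bit opcode-58 instruction."""
    opcode = word >> 26
    if opcode != 58:
        raise ValueError(f"not a DS-form load: primary opcode {opcode}")
    rt = (word >> 21) & 0x1F      # target register
    ra = (word >> 16) & 0x1F      # base register
    disp = word & 0xFFFC          # DS field, already scaled by 4
    if disp & 0x8000:             # sign-extend the 16-bit displacement
        disp -= 0x10000
    xo = word & 0x3
    if xo not in DS_FORM_LOADS:
        raise ValueError(f"unknown DS-form extended opcode {xo}")
    return DS_FORM_LOADS[xo], rt, ra, disp


def effective_address(word: int, gprs: dict[int, int]) -> int:
    """EA = (RA|0) + displacement, using register values from the Oops dump."""
    _, _, ra, disp = decode_ds_load(word)
    base = 0 if ra == 0 else gprs[ra]
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


# Register values copied from the first Oops ('sysiod', CPU 57).
gprs = {6: 0x0000000000000000, 26: 0xC000000319261DE0}

print(decode_ds_load(0xE8DA00F0))                # ('ld', 6, 26, 240)
print(decode_ds_load(0xE8E6018A))                # ('lwa', 7, 6, 392)
print(hex(effective_address(0xE8E6018A, gprs)))  # 0x188
```

The same decode does not work directly on the second Oops, because its instruction dump is interleaved with the first CPU's output and the bracketed faulting word is not visible.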

          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: Christopher Morrone (Inactive)