Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2094

LBUG in osc_req_attr_set along write path

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.2.0, Lustre 2.1.1
    • None
    • Seen on SLES11-SP1 with Lustre 2.1.1 and Lustre 2.2
      Also seen on SLES11-SP2 with Lustre 2.2
    • 3
    • 4377

    Description

      LBUG reported in LU-1650 brings down client nodes. But the LBUG occurs on a write path rather than a read path. This is the occurrence reported by Michael McKay - 17/Sep/12 in LU-1650.

      > 2012-04-20T22:00:48.036542-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) lock@ffff8803f4522358[3 2 0 1 1 00000006] P(0):[0, 18446744073709551615]@[0x100000000:0x4b1f5c0:0x0] {
      > 2012-04-20T22:00:48.065914-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set())     lovsub@ffff8803f525e4e0: [0 ffff880404198470 P(0):[0, 18446744073709551615]@[0x23d979800:0x1:0x0]]
      > 2012-04-20T22:00:48.065973-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set())     osc@ffff8803f4a3ab50: (null) 00001000 0x85ead42e54740750 1 (null) size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
      > 2012-04-20T22:00:48.065987-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) } lock@ffff8803f4522358
      > 2012-04-20T22:00:48.095709-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) no cover page!
      > 2012-04-20T22:00:48.199435-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) lock@ffff8803f4522dd8[1 1 0 1 1 00000000] W(2):[0, 0]@[0x100000000:0x4b1f5c0:0x0] {
      > 2012-04-20T22:00:48.199467-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set())     lovsub@ffff8803f7d0a4e0: [0 ffff8807e9ae1a88 W(2):[0, 18446744073709551615]@[0x23d979800:0x1:0x0]]
      > 2012-04-20T22:00:48.199478-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set())     osc@ffff8803f5c37b50: (null) 00000000 0x0 0 (null) size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
      > 2012-04-20T22:00:48.199489-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) } lock@ffff8803f4522dd8
      > 2012-04-20T22:00:48.199500-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:698:osc_req_attr_set()) no cover page!
      > 2012-04-20T22:00:48.199509-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) page@ffff8803f4901680[3 ffff8807eb639b08:0 ^(null)_ffff8803f49015c0 2 0 1 (null) ffff8807e9b9c380 0x0]
      > 2012-04-20T22:00:48.199523-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) page@ffff8803f49015c0[1 ffff8807ea301b58:0 ^ffff8803f4901680_(null) 2 0 1 (null) (null) 0x0]
      > 2012-04-20T22:00:48.199543-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) vvp-page@ffff8803f7438f50(0:0:0) vm@ffffea000dc16b30 200000000005019 4:0 ffff8803f4901680 0 lru
      > 2012-04-20T22:00:48.199556-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) lov-page@ffff8803f757d978
      > 2012-04-20T22:00:48.199600-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) osc-page@ffff8803f4b317a8: 1< 0x845fed 2 0 - - + > 2< 0 0 2016 0x3 0x420 | (null) ffff880401df2628 ffff8807e847c340 ffffffffa0516a20 ffff8803f4b317a8 > 3< + ffff8807ffdfe0c0 1 39 0 > 4< 0 0 8 33759232 - | - - - - > 5< - - - - | 0 - - | 0 - ->
      > 2012-04-20T22:00:48.226323-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) end page@ffff8803f49015c0
      > 2012-04-20T22:00:48.226361-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:700:osc_req_attr_set()) dump uncover page!
      > 2012-04-20T22:00:48.226372-05:00 c0-0c0s5n1 Pid: 7196, comm: flush-lustre-1
      > 2012-04-20T22:00:48.226381-05:00 c0-0c0s5n1 Call Trace:
      > 2012-04-20T22:00:48.226418-05:00 c0-0c0s5n1 [<ffffffff81007b11>] try_stack_unwind+0x1b1/0x200
      > 2012-04-20T22:00:48.226432-05:00 c0-0c0s5n1 [<ffffffff81006420>] dump_trace+0x90/0x300
      > 2012-04-20T22:00:48.255723-05:00 c0-0c0s5n1 [<ffffffffa014b8d2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
      > 2012-04-20T22:00:48.255753-05:00 c0-0c0s5n1 [<ffffffffa0514536>] osc_req_attr_set+0x206/0x2a0 [osc]
      > 2012-04-20T22:00:48.255775-05:00 c0-0c0s5n1 [<ffffffffa024d0ae>] cl_req_attr_set+0xce/0x1f0 [obdclass]
      > 2012-04-20T22:00:48.255786-05:00 c0-0c0s5n1 [<ffffffffa0501e77>] osc_send_oap_rpc+0xc67/0x1c50 [osc]
      > 2012-04-20T22:00:48.255817-05:00 c0-0c0s5n1 [<ffffffffa0503136>] osc_check_rpcs+0x2d6/0x470 [osc]
      > 2012-04-20T22:00:48.255832-05:00 c0-0c0s5n1 [<ffffffffa0514bf0>] osc_io_submit+0x220/0x4d0 [osc]
      > 2012-04-20T22:00:48.285929-05:00 c0-0c0s5n1 [<ffffffffa024d3d8>] cl_io_submit_rw+0x78/0x1a0 [obdclass]
      > 2012-04-20T22:00:48.285958-05:00 c0-0c0s5n1 [<ffffffffa0575924>] lov_io_submit+0x254/0xb40 [lov]
      > 2012-04-20T22:00:48.311494-05:00 c0-0c0s5n1 [<ffffffffa024d3d8>] cl_io_submit_rw+0x78/0x1a0 [obdclass]
      > 2012-04-20T22:00:48.311524-05:00 c0-0c0s5n1 [<ffffffffa05e8424>] ll_writepage+0x2c4/0x480 [lustre]
      > 2012-04-20T22:00:48.311557-05:00 c0-0c0s5n1 [<ffffffff810d0b22>] __writepage+0x12/0x50
      > 2012-04-20T22:00:48.311580-05:00 c0-0c0s5n1 [<ffffffff810d1213>] write_cache_pages+0x1d3/0x3b0
      > 2012-04-20T22:00:48.311592-05:00 c0-0c0s5n1 [<ffffffff810d140f>] generic_writepages+0x1f/0x30
      > 2012-04-20T22:00:48.337108-05:00 c0-0c0s5n1 [<ffffffff810d143c>] do_writepages+0x1c/0x40
      > 2012-04-20T22:00:48.337139-05:00 c0-0c0s5n1 [<ffffffff8112c422>] writeback_single_inode+0x92/0x2c0
      > 2012-04-20T22:00:48.337195-05:00 c0-0c0s5n1 [<ffffffff8112cfca>] writeback_inodes_wb+0x26a/0x3e0
      > 2012-04-20T22:00:48.337213-05:00 c0-0c0s5n1 [<ffffffff8112d2e0>] wb_writeback+0x1a0/0x260
      > 2012-04-20T22:00:48.337259-05:00 c0-0c0s5n1 [<ffffffff8112d67c>] wb_do_writeback+0x1fc/0x210
      > 2012-04-20T22:00:48.337274-05:00 c0-0c0s5n1 [<ffffffff8112d6cb>] bdi_writeback_task+0x3b/0xc0
      > 2012-04-20T22:00:48.337286-05:00 c0-0c0s5n1 [<ffffffff810e2321>] bdi_start_fn+0x81/0xf0
      > 2012-04-20T22:00:48.362673-05:00 c0-0c0s5n1 [<ffffffff81063dae>] kthread+0x8e/0xa0
      > 2012-04-20T22:00:48.362703-05:00 c0-0c0s5n1 [<ffffffff81003d3a>] child_rip+0xa/0x20
      > 2012-04-20T22:00:48.362745-05:00 c0-0c0s5n1 LustreError: 7196:0:(osc_io.c:702:osc_req_attr_set()) LBUG
      
      These messages are the result of executing this code path in osc_req_attr
      

      This bug occurs frequently during system regression testing, but we have no explicit reproducer.

      The dumps are too large to attach here, so I will put them on the Cray anonymous ftp server.

      2.1 Dump: ftp.cray.com:/outbound/LU1650.21.osc_req_attr_set.lbug.tgz
      2.2 Dump: ftp.cray.com:/outbound/LU1650.22.osc_req_attr_set.lbug.tgz

      Let me know if you want other dumps or if you want me to look for other info in our collection of dumps. Note that dklogs are NOT available; performance issues require that debug tracing be kept to a minimum.

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              amk Ann Koehler (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: