[LU-2958] LBUG triggered in seq_client_alloc_fid() - ASSERTION( seq != ((void *)0) ) failed

    Description

      We're continually hitting this LBUG when running 2.3.62-2chaos.

      2013-03-13 10:28:27.644618 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: 12943:0:(fid_request.c:329:seq_client_alloc_fid()) ASSERTION( seq != ((void *)0) ) failed: 
      2013-03-13 10:28:27.645378 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: 12943:0:(fid_request.c:329:seq_client_alloc_fid()) LBUG
      2013-03-13 10:28:27.645734 {DefaultControlEventListener} [mmcs]{8}.5.1: Call Trace:
      2013-03-13 10:28:27.646114 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b3b0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
      2013-03-13 10:28:27.646480 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b460] [8000000000420cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
      2013-03-13 10:28:27.646853 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b510] [8000000000421480] .lbug_with_loc+0x50/0xc0 [libcfs]
      2013-03-13 10:28:27.647213 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b5a0] [8000000001958b80] .seq_client_alloc_fid+0x4f0/0x740 [fid]
      2013-03-13 10:28:27.647576 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b6c0] [8000000001a4e310] .mdc_fid_alloc+0x140/0x1e0 [mdc]
      2013-03-13 10:28:27.647944 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b770] [8000000001a65818] .mdc_intent_lock+0x508/0x838 [mdc]
      2013-03-13 10:28:27.648345 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b8f0] [8000000001ce72e0] .ll_lookup_it+0x450/0xfb0 [lustre]
      2013-03-13 10:28:27.648774 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268ba70] [8000000001ce7f10] .ll_lookup_nd+0xd0/0x580 [lustre]
      2013-03-13 10:28:27.649207 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bb30] [c0000000000df10c] .__lookup_hash+0x180/0x1c8
      2013-03-13 10:28:27.649699 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bbd0] [c0000000000e3690] .do_filp_open+0x260/0xadc
      2013-03-13 10:28:27.650185 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bd80] [c0000000000d1ca8] .do_sys_open+0x8c/0x18c
      2013-03-13 10:28:27.650632 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268be30] [c000000000000580] syscall_exit+0x0/0x2c
      2013-03-13 10:28:27.651100 {DefaultControlEventListener} [mmcs]{8}.5.1: Kernel panic - not syncing: LBUG
      2013-03-13 10:28:27.651554 {DefaultControlEventListener} [mmcs]{8}.5.1: Call Trace:
      2013-03-13 10:28:27.652039 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b3d0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
      2013-03-13 10:28:27.652453 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b480] [c0000000004557cc] .panic+0xb8/0x1e0
      2013-03-13 10:28:27.652880 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b510] [80000000004214e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
      2013-03-13 10:28:27.653312 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b5a0] [8000000001958b80] .seq_client_alloc_fid+0x4f0/0x740 [fid]
      2013-03-13 10:28:27.653734 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b6c0] [8000000001a4e310] .mdc_fid_alloc+0x140/0x1e0 [mdc]
      2013-03-13 10:28:27.654166 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b770] [8000000001a65818] .mdc_intent_lock+0x508/0x838 [mdc]
      2013-03-13 10:28:27.654595 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b8f0] [8000000001ce72e0] .ll_lookup_it+0x450/0xfb0 [lustre]
      2013-03-13 10:28:27.655016 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268ba70] [8000000001ce7f10] .ll_lookup_nd+0xd0/0x580 [lustre]
      2013-03-13 10:28:27.655440 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bb30] [c0000000000df10c] .__lookup_hash+0x180/0x1c8
      2013-03-13 10:28:27.655861 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bbd0] [c0000000000e3690] .do_filp_open+0x260/0xadc
      2013-03-13 10:28:27.656281 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bd80] [c0000000000d1ca8] .do_sys_open+0x8c/0x18c
      2013-03-13 10:28:27.656711 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268be30] [c000000000000580] syscall_exit+0x0/0x2c
      2013-03-13 10:28:27.657108 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: dumping log to /tmp/lustre-log.1363195707.12943
      

          Activity

            nedbass Ned Bass (Inactive) added a comment - I think this is in fact the same problem as LU-2911. As I understand that bug, it relates to master clients talking to servers upgraded from 1.8. I discovered while investigating LU-2986 that the server in question was upgraded from 1.8 to 2.1. Consequently, the client doesn't load the LMV layer. But LU-1445 (http://review.whamcloud.com/4787) removed the obd_fid_init() calls from llite, on the assumption that LMV would take care of them. Since we don't have LMV, cl_seq is never initialized and we fail the assertion.
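
            To make the failure path described above concrete, here is a minimal user-space sketch. It is not the Lustre code: only the cl_seq field name comes from the comment above, and every sketch_* function, the has_lmv flag, and the struct layouts are hypothetical stand-ins. It assumes, as the comment suggests, that the sequence client is initialized only when the LMV layer is present, and it returns an error at the point where the real client would hit the assertion.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real Lustre structures. Only the
 * cl_seq field name is taken from the discussion in this ticket. */
struct seq_client {
	unsigned long long next_oid;
};

struct client_obd {
	struct seq_client *cl_seq;
};

/* Models the FID initialization step: allocates the sequence client. */
static int sketch_fid_init(struct client_obd *cli)
{
	cli->cl_seq = calloc(1, sizeof(*cli->cl_seq));
	return cli->cl_seq != NULL ? 0 : -1;
}

/* Models client setup. In this sketch, FID initialization is reached
 * only through the LMV layer (assumption based on the comment above). */
static int sketch_client_setup(struct client_obd *cli, int has_lmv)
{
	cli->cl_seq = NULL;
	if (has_lmv)
		return sketch_fid_init(cli);
	return 0; /* no LMV: nothing initializes cl_seq */
}

/* Models FID allocation. Returns -1 where the real client would fail
 * ASSERTION( seq != NULL ) and LBUG. */
static int sketch_alloc_fid(struct client_obd *cli, unsigned long long *oid)
{
	if (cli->cl_seq == NULL)
		return -1;
	*oid = ++cli->cl_seq->next_oid;
	return 0;
}

/* Releases the sequence client, if one was allocated. */
static void sketch_client_cleanup(struct client_obd *cli)
{
	free(cli->cl_seq);
	cli->cl_seq = NULL;
}
```

            Calling sketch_client_setup() with has_lmv set to 0 and then sketch_alloc_fid() takes the error branch, which corresponds to the open() path in the call trace reaching seq_client_alloc_fid() with a NULL sequence client.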

            adilger Andreas Dilger added a comment - Prakash, it is my expectation that 2.4 clients will work with 2.1 servers. I thought the LU-2911 problem was related to upgrade/downgrade for 1.8.9->2.4->1.8.9, but in fact it is an interop problem between 1.8.9 or 2.1.2 clients and master servers.

            di.wang Di Wang added a comment - Duplicate of LU-2911.

            prakash Prakash Surya (Inactive) added a comment - This is on one of our production machines, so I'm unsure what kind of workload the users are putting on the system. We just recently upgraded the clients to run the new 2.3.62 tag (from 2.3.58), which I'm sure is why we're seeing it now. Also, I just noticed that the clients are mounting a file system running 2.1.2-3chaos, so this isn't exactly a supported configuration (2.3.62 clients using 2.1.2 servers).

            adilger Andreas Dilger added a comment - Prakash, any info on what kind of workload/operation is triggering this problem?

            People

              wc-triage WC Triage
              prakash Prakash Surya (Inactive)
              Votes: 0
              Watchers: 4