Details

    • 3
    • 4751

    Description

      We got this general protection fault on a client after some trouble with
      the MDS:

      crash> bt
      PID: 6696 TASK: ffff88184352e700 CPU: 7 COMMAND: "robinhood"
      #0 [ffff8810f44fbc10] machine_kexec at ffffffff8102e77b
      #1 [ffff8810f44fbc70] crash_kexec at ffffffff810a6cd8
      #2 [ffff8810f44fbd40] oops_end at ffffffff8146aad0
      #3 [ffff8810f44fbd70] die at ffffffff8101021b
      #4 [ffff8810f44fbda0] do_general_protection at ffffffff8146a642
      #5 [ffff8810f44fbdd0] general_protection at ffffffff81469e15
      [exception RIP: llog_cat_put+53]
      RIP: ffffffffa09f4f65 RSP: ffff8810f44fbe80 RFLAGS: 00010296
      RAX: 0000000000000000 RBX: ffff8816756d7600 RCX: 000000000000001d
      RDX: 000000000000001d RSI: ffffffff81675340 RDI: 5a5a5a5a5a5a59fa
      RBP: ffff8810f44fbec0 R8: 0000000000000000 R9: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8820517c90c0
      R13: 00000000ffffffe2 R14: ffffffffffffffe2 R15: 0000000040086694
      ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
      #6 [ffff8810f44fbec8] mdc_changelog_send_thread at ffffffffa0c8572f
      #7 [ffff8810f44fbf48] kernel_thread at ffffffff8100d1aa
      [exception RIP: child_rip]
      RIP: ffffffff8100d1a0 RSP: ffff8810f44fbf58 RFLAGS: 00000200
      RAX: 0000000000000000 RBX: ffff8818436c80f8 RCX: 0000000000000001
      RDX: 0000000000000500 RSI: ffff8816756d7600 RDI: ffffffffa0c85490
      RBP: ffff8816a7ddfa58 R8: 0000000000000246 R9: 0000000000000000
      R10: ffff8818436a6800 R11: 0000000000000000 R12: ffff880019b772c0
      R13: ffff8816756d7600 R14: 0000000000000000 R15: 0000000040086694
      ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
      bt: WARNING: possibly bogus exception frame

      with this error message shown just before it in dmesg:
      LustreError: 6696:0:(mdc_request.c:1256:mdc_changelog_send_thread()) llog_create() failed -30
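      The register dump already fits the analysis that follows. As a quick arithmetic
      check (our own, not part of the original report): RDI lies exactly 0x60 below a
      pointer made entirely of 0x5a bytes, which appears to be the fill pattern
      Lustre's debug free path writes over released memory, and R13/R14 hold -30
      (-EROFS on Linux), the same code as in the dmesg line, in 32-bit and 64-bit
      form. Which field the 0x60 offset corresponds to inside llog_cat_put would need
      the disassembly and is not asserted here.

```c
#include <stdint.h>

/* Register values copied from the crash dump above. */
static const uint64_t crash_rdi = 0x5a5a5a5a5a5a59faULL;
static const uint64_t crash_r13 = 0x00000000ffffffe2ULL;
static const uint64_t crash_r14 = 0xffffffffffffffe2ULL;

/* A 64-bit pointer loaded from memory filled with the 0x5a poison byte
 * (assumed to be the pattern the debug free path uses). */
static const uint64_t poison_ptr = 0x5a5a5a5a5a5a5a5aULL;

/* Distance between the all-0x5a pointer and the faulting RDI value. */
static uint64_t rdi_offset_from_poison(void)
{
    return poison_ptr - crash_rdi;
}

/* Non-zero if 'reg' holds 'err' as a 32-bit value with a zero upper half. */
static int holds_err32(uint64_t reg, int err)
{
    return reg == (uint64_t)(uint32_t)err;
}

/* Non-zero if 'reg' holds 'err' sign-extended to 64 bits. */
static int holds_err64(uint64_t reg, int err)
{
    return reg == (uint64_t)(int64_t)err;
}
```

      Here rdi_offset_from_poison() evaluates to 0x60, and both holds_err32(crash_r13, -30)
      and holds_err64(crash_r14, -30) are true.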

      After some investigation, I found that the handle passed to llog_cat_put
      was created along the following call path:

      mdc_changelog_send_thread
      -> llog_create
      -> lop->lop_create (= llog_client_create)

      The way llog_client_create returns the llog_handle on error does not look
      compatible with the error handling in mdc_changelog_send_thread
      (llog_create itself does nothing with the llog_handle).
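      The dispatch step can be modelled as a plain function-pointer table: the generic
      entry point does not build the handle itself, it forwards the out-parameter to
      whatever lop_create points at, so anything the backend leaves in *res reaches
      the caller unchanged. This is a minimal sketch; struct handle_sketch,
      struct llog_ops_sketch, generic_create and failing_backend are hypothetical
      names, and the real Lustre types and signatures differ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct llog_handle. */
struct handle_sketch {
    int id;
};

/* Hypothetical stand-in for the llog operations table. */
struct llog_ops_sketch {
    /* Backend constructor; on a client this slot would hold llog_client_create. */
    int (*lop_create)(struct handle_sketch **res);
};

/* Generic entry point modelled on llog_create() as the report describes it:
 * it forwards the out-parameter and never reads or resets *res.
 * The -EOPNOTSUPP for a missing method is an assumption of the sketch. */
static int generic_create(const struct llog_ops_sketch *lop,
                          struct handle_sketch **res)
{
    if (lop == NULL || lop->lop_create == NULL)
        return -EOPNOTSUPP;
    return lop->lop_create(res);
}

/* Hypothetical backend that writes *res and then fails. */
static struct handle_sketch leftover = { 42 };

static int failing_backend(struct handle_sketch **res)
{
    *res = &leftover;
    return -EROFS; /* -30 on Linux, the code in the dmesg line */
}

static const struct llog_ops_sketch client_ops = { failing_backend };
```

      Calling generic_create(&client_ops, &res) returns -EROFS and leaves res pointing
      at leftover: the failure code and the stale pointer arrive together.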

      In llog_client_create, 'handle' is assigned the return value of
      llog_alloc_handle, which (if not NULL) is immediately stored in the output
      parameter '*res' (a pointer to llh in mdc_changelog_send_thread). If something
      then goes wrong, the code jumps to err_free, where 'handle' is freed and
      poisoned, but *res is left untouched. As a result, llog_client_create returns a
      non-zero value together with a pointer (in *res) to a freed and poisoned area.
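      The ordering described above can be reproduced in a small self-contained
      program. Everything in it is hypothetical: struct handle_sketch, alloc_handle,
      free_handle and buggy_create stand in for struct llog_handle, llog_alloc_handle,
      the free-and-poison path and llog_client_create; a fail_with argument simulates
      the failing RPC; and a single static slot replaces the allocator so that the
      "freed" memory can be inspected without undefined behaviour.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handle; one 64-bit field is enough to observe the poison. */
struct handle_sketch {
    uint64_t magic;
};

/* A single static slot stands in for the allocator. */
static struct handle_sketch slot;

static struct handle_sketch *alloc_handle(void)
{
    memset(&slot, 0, sizeof(slot));
    return &slot;
}

/* Models free-and-poison by filling the object with 0x5a bytes. */
static void free_handle(struct handle_sketch *h)
{
    memset(h, 0x5a, sizeof(*h));
}

/* Buggy ordering from the report: *res is published right after
 * allocation, so the error path leaves it pointing at poisoned memory. */
static int buggy_create(struct handle_sketch **res, int fail_with)
{
    struct handle_sketch *handle = alloc_handle();
    int rc = 0;

    if (handle == NULL)
        return -ENOMEM;
    *res = handle;                  /* published too early */

    if (fail_with != 0) {           /* simulated RPC failure */
        rc = fail_with;
        goto err_free;
    }
    handle->magic = 0x1234;
    return 0;

err_free:
    free_handle(handle);            /* *res still points here */
    return rc;
}
```

      After buggy_create(&llh, -EROFS), llh is non-NULL and llh->magic reads back as
      0x5a5a5a5a5a5a5a5a, which is exactly the kind of value a later dereference turns
      into a general protection fault.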

      In mdc_changelog_send_thread, the error path (label out) relies on the
      assumption that if llh is not NULL, it is valid and should be released with
      llog_cat_put. That is not the case here, because the handle has already been
      freed and poisoned by llog_client_create's own error handling.
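      The caller side can be modelled the same way. In this hypothetical sketch,
      send_thread_sketch mirrors only the shape of the out label described above
      (release the handle whenever llh is non-NULL), and put_handle stands in for
      llog_cat_put but merely counts whether it received a live or a poisoned object
      instead of dereferencing it. The two creators contrast a constructor that
      honours the "NULL on failure" contract with one that does not.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct handle_sketch {
    uint64_t magic;
};

#define HANDLE_MAGIC 0x1234u        /* hypothetical "live handle" marker */

static int puts_on_valid;           /* cleanups on live handles */
static int puts_on_garbage;         /* cleanups on dead or poisoned handles */

/* Stand-in for llog_cat_put(): the real one would use the handle. */
static void put_handle(struct handle_sketch *h)
{
    if (h->magic == HANDLE_MAGIC)
        puts_on_valid++;
    else
        puts_on_garbage++;
}

typedef int (*create_fn)(struct handle_sketch **res);

/* Shape of the caller's cleanup: any non-NULL handle is assumed valid. */
static int send_thread_sketch(create_fn create)
{
    struct handle_sketch *llh = NULL;
    int rc = create(&llh);

    if (rc != 0)
        goto out;
    /* ... changelog records would be processed here ... */
out:
    if (llh != NULL)
        put_handle(llh);
    return rc;
}

/* Violates the contract: fails but leaves a stale, poisoned pointer. */
static struct handle_sketch poisoned = { 0x5a5a5a5a5a5a5a5aULL };

static int stale_create(struct handle_sketch **res)
{
    *res = &poisoned;
    return -EROFS;
}

/* Honours the contract: fails and leaves *res as the caller set it. */
static int clean_create(struct handle_sketch **res)
{
    (void)res;
    return -EROFS;
}
```

      With clean_create the cleanup path does nothing; with stale_create the same
      cleanup path is handed a poisoned object, even though the caller's logic is
      unchanged.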

      A simple fix would be to move the '*res = handle' assignment in
      llog_client_create to just before the EXIT macro, so that *res stays untouched
      unless the function completes successfully.
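      A sketch of the proposed ordering, under the same assumptions as before
      (hypothetical names, a fail_with argument in place of the RPC, and a memset with
      0x5a standing in for the debug poisoning): the handle is published through *res
      only on the success path. It uses calloc/free rather than a static slot because
      nothing reads the handle after it is freed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct handle_sketch {
    uint64_t magic;
};

/* Fixed ordering: *res is written only once the function can no longer
 * fail, so callers keep seeing NULL on every error path. */
static int fixed_create(struct handle_sketch **res, int fail_with)
{
    struct handle_sketch *handle = calloc(1, sizeof(*handle));
    int rc = 0;

    if (handle == NULL)
        return -ENOMEM;

    if (fail_with != 0) {           /* simulated RPC failure */
        rc = fail_with;
        goto err_free;
    }
    handle->magic = 0x1234;

    *res = handle;                  /* published just before the success return */
    return 0;

err_free:
    memset(handle, 0x5a, sizeof(*handle));  /* poison, as the debug free path does */
    free(handle);
    return rc;                      /* *res was never touched */
}
```

      With this ordering, a caller that initialises llh to NULL can keep its single
      cleanup label unchanged: after a failed fixed_create(&llh, -EROFS), llh is still
      NULL and nothing is passed to the release function.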

          Activity

            [LU-980] incompatible error handling

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50511/
            Subject: LU-980 mount: improve mount/unmount messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cba8c65b384f92d269944042047f7a58d2808530

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50511
            Subject: LU-980 mount: improve mount/unmount messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 40638bcfcc4e77dbbaea96f08f5cd78205ff1a34

            bogl Bob Glossman (Inactive) added a comment - http://review.whamcloud.com/#change,2806 backport to b2_1
            pjones Peter Jones added a comment -

            Landed for 2.2

            Integrated in lustre-master » i686,client,el6,ofa #480
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = ABORTED
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            Integrated in lustre-master » x86_64,client,el6,ofa #480
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = FAILURE
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            Integrated in lustre-master » x86_64,server,el6,ofa #480
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = FAILURE
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            Integrated in lustre-master » i686,client,el5,ofa #452
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = SUCCESS
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            Integrated in lustre-master » i686,client,el5,inkernel #452
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = SUCCESS
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #452
            LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

            Result = SUCCESS
            Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
            Files :

            • lustre/ptlrpc/llog_client.c

            People

              hongchao.zhang Hongchao Zhang
              louveta Alexandre Louvet (Inactive)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: