Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10267

Wrong poll() returned revents for changelog device

Details

    • 3
    • 9223372036854775807

    Description

      The following program failed to poll on the changelog device:

           rc = llapi_changelog_start(&chglog_hdlr,
                                         CHANGELOG_FLAG_BLOCK | CHANGELOG_FLAG_JOBID,
                                         mdtname, 0);
           
              fd = llapi_changelog_get_fd(chglog_hdlr);
      
              fds[0].fd = fd;
              fds[0].events = POLLIN;
              rc = llapi_changelog_recv(chglog_hdlr, &rec);
              rc = poll(fds, 1, CHLG_POLL_INTV);
      

      The returned revents are always POLLERR.

      The Lustre log output caused this failure is:

      00000002:20000000:1.0:1511318935.046970:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 30 13TRUNC 1622762232318656574 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.046987:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae10 type 0x10660000
      00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=31
      00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 31 lrh_len: 120 (4592 remains)
      00000002:20000000:1.0:1511318935.047002:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 31 11CLOSE 1622762232427964466 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae88 type 0x10660000
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=32
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 32 lrh_len: 120 (4472 remains)
      00000002:20000000:1.0:1511318935.047004:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 32 13TRUNC 1622762281381438112 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af00 type 0x10660000
      00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=33
      00000040:00001000:1.0:1511318935.047006:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 33 lrh_len: 120 (4352 remains)
      00000002:20000000:1.0:1511318935.047006:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 33 11CLOSE 1622762281387196858 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af78 type 0x0
      00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
      00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012ce4040 type 0x0
      00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
      00000040:00001000:1.0:1511318935.047010:0:110785:0:(llog.c:556:llog_process_thread()) Re-read last llog buffer for new records, index 2, last 1
      00000040:00001000:0.0:1511318935.047146:0:85780:0:(llog_osd.c:859:llog_osd_next_block()) looking for log index 2 (cur idx 1 off8192), size 8256
      00000040:00001000:1.0:1511318935.047240:0:110785:0:(llog.c:483:llog_process_thread()) cur_offset 8192, chunk_offset 8192, buf_offset 64, rc = -5
      00000002:00020000:1.0:1511318935.047242:0:110785:0:(mdc_changelog.c:244:chlg_load()) lustre-MDT0000-mdc-ffff88000cabe000: fail to process llog: rc = -5
      

      And this bug can be easy reproduced by the following commands:

      lctl --device lustre-MDT0000 changelog_register
      dd if=/dev/zero of=/mnt/lustre/test bs=1M count=2
      lfs changelog lustre-MDT0000
      

      ,

      Attachments

        Issue Links

          Activity

            [LU-10267] Wrong poll() returned revents for changelog device

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30582/
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 37047ec8599e9b5ec3884be45c83e58a2b4f5270

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30582/ Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: b2_10 Current Patch Set: Commit: 37047ec8599e9b5ec3884be45c83e58a2b4f5270

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30582
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: d60e632396623c809bfafc27e1b529407106f198

            gerrit Gerrit Updater added a comment - Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30582 Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: b2_10 Current Patch Set: 1 Commit: d60e632396623c809bfafc27e1b529407106f198
            pjones Peter Jones added a comment -

            Landed for 2.11

            pjones Peter Jones added a comment - Landed for 2.11

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30313/
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a485c51379a0218aaf01f7c0baf8e4cc993d8824

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30313/ Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: master Current Patch Set: Commit: a485c51379a0218aaf01f7c0baf8e4cc993d8824
            jhammond John Hammond added a comment -

            > To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this.

            I agree.

            jhammond John Hammond added a comment - > To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this. I agree.

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30313
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: efde5a31f0b5fe4d5667d403fb399f6fb54d3d39

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30313 Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: efde5a31f0b5fe4d5667d403fb399f6fb54d3d39

            People

              bfaccini Bruno Faccini (Inactive)
              qian Qian Yingjin (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: