      Overlapping matches with finditer() in Python

      Date: 2023-10-19

              • This article covers how to handle overlapping matches with finditer() in Python.

                Problem description

                I'm using a regex to match Bible verse references in a text. The current regex is

                import re

                REF_REGEX = re.compile(r'''
                  (?<!\w)                        # Not preceded by any words
                  (?P<quote>q(?:uote)?\s+)?      # Match optional 'q' or 'quote' followed by many spaces
                  (?P<book>
                    (?:(?:[1-3]|I{1,3})\s*)?     # Match an optional arabic or roman number between 1 and 3.
                    [A-Za-z]+                    # Match any alphabetics
                  )\.?                           # Followed by an optional dot
                  (?:
                    \s*(?P<chapter>\d+)          # Match the chapter number
                    (?:
                      [:.](?P<startverse>\d+)   # Match the starting verse number, preceded by ':' or '.'
                        (?:-(?P<endverse>\d+))?  # Match the optional ending verse number, preceded by '-'
                    )?                           # Verse numbers are optional
                  )
                  (?:
                    \s+(?:                       # Here be spaces
                      (?:from\s+)|(?:in\s+)|(?P<lbrace>\())   # Match 'from[:space:]', 'in[:space:]' or '('
                      \s*(?P<version>\w+)        # Match a word preceded by optional spaces
                      (?(lbrace)\))              # Close the '(' if found earlier
                  )?                             # The whole 'in|from|()' is optional
                  ''', re.IGNORECASE | re.VERBOSE | re.UNICODE)
                

                This matches the following expressions fine:

                "jn 3:16":                           (None, 'jn', '3', '16', None, None, None),
                "matt. 18:21-22":                    (None, 'matt', '18', '21', '22', None, None),
                "q matt. 18:21-22":                  ('q ', 'matt', '18', '21', '22', None, None),
                "QuOTe jn 3:16":                     ('QuOTe ', 'jn', '3', '16', None, None, None),
                "q 1co13:1":                         ('q ', '1co', '13', '1', None, None, None), 
                "q 1 co 13:1":                       ('q ', '1 co', '13', '1', None, None, None),
                "quote 1 co 13:1":                   ('quote ', '1 co', '13', '1', None, None, None),
                "quote 1co13:1":                     ('quote ', '1co', '13', '1', None, None, None),
                "jean 3:18 (PDV)":                   (None, 'jean', '3', '18', None, '(', 'PDV'),
                "quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
                "quote malachie 1.1-2 In Colombe":   ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
                "cinq jn 3:16 (test)":               (None, 'jn', '3', '16', None, '(', 'test'),
                "Q   IIKings5.13-58   from   wolof": ('Q     ', 'IIKings', '5', '13', '58', None, 'wolof'),
                "This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'),
                

                but it fails to parse:

                "Found in 2 Cor. 5:18-21 ( Ministers":                    (None, '2 Cor', '5', '18', '21', None, None),
                

                because it returns (None, 'in', '2', None, None, None, None) instead.

                Is there a way to get finditer() to return all matches, even if they overlap, or is there a way to improve my regex so it matches this last bit properly?

                Thanks.

                Solution

                A character that has been consumed stays consumed; you cannot ask the regex engine to go back and match it again.
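A small sketch of what "consumed" means in practice: finditer() resumes scanning where the previous match ended, so an occurrence that shares characters with an earlier match is not reported. The sample string is made up for illustration.

```python
import re

text = "ababa"  # "aba" occurs at index 0 and again at index 2

# finditer() resumes after the end of each match, so the occurrence
# starting at index 2 (which shares the middle "a") is skipped.
starts = [m.start() for m in re.finditer(r"aba", text)]
print(starts)  # [0]
```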

                Judging from your examples, the verse part (e.g. ':1') does not seem to be optional. Making it mandatory, by dropping the '?' after the verse group, lets the regex match the last example:

                import re

                ref_regex = re.compile(r'''
                (?<!\w)                         # Not preceded by any words
                (q(?:uote)?\s+)?                # Match 'q' or 'quote' followed by many spaces
                (
                    (?:(?:[1-3]|I{1,3})\s*)?    # Match an arabic or roman number between 1 and 3.
                    [A-Za-z]+                   # Match many alphabetics
                )\.?                            # Followed by an optional dot
                (?:
                    \s*(\d+)                    # Match the chapter number
                    (?:
                        [:.](\d+)               # Match the verse number
                        (?:-(\d+))?             # Match the ending verse number
                    )                           # <-- no '?' here
                )
                (?:
                    \s+
                    (?:
                        (?:from\s+)|            # Match the keyword 'from' or 'in'
                        (?:in\s+)|
                        (?P<lbrace>\()          # or stuff between (...)
                    )\s*(\w+)
                    (?(lbrace)\))
                )?
                ''', re.X | re.U | re.I)        # re.I replaces the inline (?i), which Python 3.11+ rejects mid-pattern
                

                (If you're going to write a gigantic regex like this, please use the /x flag, i.e. re.VERBOSE or re.X in Python.)
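To check the adjusted pattern against the input that failed, here is a minimal sketch. It uses a condensed transcription of the pattern above (same seven groups, comments shortened) and assumes re.I in place of the inline (?i) flags; the name REF exists only for this example.

```python
import re

# Condensed transcription of the adjusted pattern: the verse part is required.
# Assumption: re.I stands in for the inline (?i) flags of the original answer.
REF = re.compile(r'''
    (?<!\w)
    (q(?:uote)?\s+)?                          # group 1: optional q / quote
    ((?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?    # group 2: book, optional dot
    \s*(\d+)                                  # group 3: chapter
    [:.](\d+)                                 # group 4: start verse (required)
    (?:-(\d+))?                               # group 5: optional end verse
    (?:
        \s+(?:from\s+|in\s+|(?P<lbrace>\())   # group 6: opening parenthesis
        \s*(\w+)                              # group 7: version
        (?(lbrace)\))
    )?
''', re.X | re.I)

text = "Found in 2 Cor. 5:18-21 ( Ministers"
matches = [m.group(0) for m in REF.finditer(text)]
first = REF.search(text)

print(matches)                   # ['2 Cor. 5:18-21']
print(first.group(2, 3, 4, 5))   # ('2 Cor', '5', '18', '21')
```

Because the verse is now mandatory, chapter-only references such as "jn 3" no longer match, so this change only fits if every reference you care about carries a verse number.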


                If you really need overlapping matches, you could use a lookahead. A simple example is

                >>> rx = re.compile('(.)(?=(.))')
                >>> x = rx.finditer("abcdefgh")
                >>> [y.groups() for y in x]
                [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g'), ('g', 'h')]
                

                You may extend this idea to your RegEx.
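One way to extend the idea, sketched under assumptions: wrap the whole pattern in a capturing lookahead, so each match is zero-width and finditer() tries every start position. The helper name find_overlapping and the simplified book-and-chapter pattern are hypothetical, chosen only to keep the example short.

```python
import re

def find_overlapping(pattern, text, flags=0):
    """Yield (start, matched_text) for each position where pattern matches.

    The pattern is wrapped as (?=(...)): the lookahead consumes nothing,
    so later matches may reuse characters seen by earlier ones.
    """
    wrapped = re.compile(r"(?=(" + pattern + r"))", flags)
    for m in wrapped.finditer(text):
        yield m.start(), m.group(1)

# Simplified stand-in for the reference regex: optional 1-3, a book name, a chapter.
simple_ref = r"(?<!\w)(?:[1-3]\s*)?[A-Za-z]+\s*\d+"

results = list(find_overlapping(simple_ref, "in 2 Cor 5"))
print(results)  # [(0, 'in 2'), (3, '2 Cor 5'), (5, 'Cor 5')]
```

Note that the wrapper adds one capturing group around the pattern, so numbered groups inside it shift by one; named groups are a safer way to read the parts. You would still need to decide which of the overlapping candidates to keep.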
