        How do I get the matched terms from a Lucene fuzzy search?

        Date: 2023-06-28

                  Problem description

                  How do you get the matching fuzzy term and its offsets when using Lucene fuzzy search?

                      IndexSearcher mem = ....(some standard code)
                  
                      QueryParser parser = new QueryParser(Version.LUCENE_30, CONTENT_FIELD, analyzer);
                  
                      TopDocs topDocs = mem.search(parser.parse("wuzzy~"), 1);
                      // the ~ triggers the fuzzy search as per "Lucene In Action" 
                  

                  The fuzzy search works fine. If a document contains the term "fuzzy" or "luzzy", it is matched. How do I find out which term matched, and what its offsets are?
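As background on why "wuzzy~" matches both "fuzzy" and "luzzy": Lucene's fuzzy query is documented as being based on Levenshtein edit distance, and both terms are one substitution away from the query term. The plain-Java sketch below only illustrates that distance. It is not Lucene's implementation, and the class and method names (`EditDistanceSketch`, `levenshtein`) are hypothetical.

```java
// Illustrative only: a textbook Levenshtein distance, not Lucene's own code.
final class EditDistanceSketch {

    /** Returns the minimum number of single-character edits turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j characters of b takes j inserts
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("wuzzy", "fuzzy")); // prints 1
        System.out.println(levenshtein("wuzzy", "luzzy")); // prints 1
    }
}
```

The distance tells you why a term qualified, but not which indexed term actually matched in a given document, which is the open question here.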

                  I have made sure that every CONTENT_FIELD is added with term vectors stored, including positions and offsets.

                  Recommended answer

                  There was no straightforward way of doing this. However, I reconsidered Jared's suggestion and was able to get a solution working.

                  I am documenting it here in case someone else has the same issue.

                  Create a class that implements org.apache.lucene.search.highlight.Formatter:

                  import java.util.ArrayList;
                  import java.util.List;

                  import org.apache.lucene.search.highlight.Formatter;
                  import org.apache.lucene.search.highlight.TokenGroup;

                  public class HitPositionCollector implements Formatter
                  {
                      // MatchOffset is a simple DTO
                      private final List<MatchOffset> matchList;

                      public HitPositionCollector()
                      {
                          matchList = new ArrayList<MatchOffset>();
                      }

                      // This is where the matched term and its start and end offsets are captured.
                      // Only token groups with a positive score matched the query.
                      @Override
                      public String highlightTerm(String originalText, TokenGroup tokenGroup)
                      {
                          if (tokenGroup.getTotalScore() > 0)
                          {
                              MatchOffset mo = new MatchOffset(tokenGroup.getToken(0).toString(), tokenGroup.getStartOffset(), tokenGroup.getEndOffset());
                              matchList.add(mo);
                          }

                          // Return the text unchanged: this formatter records matches, it does not mark them up
                          return originalText;
                      }

                      /**
                       * @return the matchList
                       */
                      public List<MatchOffset> getMatchList()
                      {
                          return matchList;
                      }
                  }
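The answer describes `MatchOffset` only as "a simple DTO" and does not show it. A minimal sketch of what it might look like follows, assuming the constructor argument order used in `highlightTerm` and the accessor names (`getToken`, `getStartPos`, `getEndPos`) called in the test code. The field names and the `toString` format are guesses, not the author's.

```java
// Hypothetical reconstruction of the DTO; the original class is not shown in the answer.
final class MatchOffset {
    private final String token; // text of the matched token
    private final int startPos; // start offset in the original text
    private final int endPos;   // end offset in the original text

    MatchOffset(String token, int startPos, int endPos) {
        this.token = token;
        this.startPos = startPos;
        this.endPos = endPos;
    }

    String getToken() {
        return token;
    }

    int getStartPos() {
        return startPos;
    }

    int getEndPos() {
        return endPos;
    }

    @Override
    public String toString() {
        return token + "[" + startPos + "," + endPos + ")";
    }
}
```

Making the fields final keeps each recorded match immutable once the formatter has added it to the list.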
                  

                  Main code

                  public void testHitsWithHitPositionCollector() throws Exception
                  {
                      System.out.println(" .... testHitsWithHitPositionCollector");
                      // Note: "bro*" is a prefix query, despite the variable name
                      String fuzzyStr = "bro*";

                      QueryParser parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
                      Query fzyQry = parser.parse(fuzzyStr);
                      TopDocs hits = searcher.search(fzyQry, 10);

                      QueryScorer scorer = new QueryScorer(fzyQry, "f");

                      // The custom formatter records each matched term instead of marking it up
                      HitPositionCollector myFormatter = new HitPositionCollector();

                      // Highlighter(Formatter formatter, Scorer fragmentScorer)
                      Highlighter highlighter = new Highlighter(myFormatter, scorer);
                      highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

                      Analyzer analyzer2 = new SimpleAnalyzer();

                      // Only the top hit is inspected here; loop over hits.scoreDocs to handle every hit
                      int docId = hits.scoreDocs[0].doc;
                      Document doc = searcher.doc(docId);
                      String title = doc.get("f");

                      TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                                  docId,
                                                  "f",
                                                  doc,
                                                  analyzer2);

                      // Running the highlighter calls highlightTerm(), which fills the match list
                      String fragment = highlighter.getBestFragment(stream, title);

                      System.out.println(fragment);
                      assertEquals("the quick brown fox jumps over the lazy dog", fragment);
                      MatchOffset mo = myFormatter.getMatchList().get(0);

                      assertEquals(10, mo.getStartPos());
                      assertEquals(15, mo.getEndPos());
                      assertEquals("brown", mo.getToken());
                  }
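The asserted offsets can be sanity-checked against the stored text without Lucene. The sketch below assumes the start offset is inclusive and the end offset is exclusive, which is consistent with the values 10 and 15 asserted for "brown". The class and method names (`OffsetCheckSketch`, `slice`) are hypothetical.

```java
// Illustrative only: checks that offsets [10, 15) cover "brown" in the test sentence.
final class OffsetCheckSketch {

    /** Returns the part of text covered by the half-open range [start, end). */
    static String slice(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(slice(text, 10, 15)); // prints "brown"
    }
}
```

Slicing the original field value this way is a quick check that the recorded offsets refer to the stored text and not to an analyzed form of it.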
                  
