      How to get positions from a document term vector in Lucene?

        Date: 2023-06-29

                  Problem description

                  I need to iterate over all documents in a Lucene index and obtain the positions at which each term occurs in each document. As far as I can tell from the Lucene javadoc, the way to do this is something like the following:

                  IndexReader ir = obtainIndexReader();
                  Terms tv = ir.getTermVector( doc, field );
                  TermsEnum terms = tv.iterator();
                  PostingsEnum p = null;
                  while( terms.next() != null ) {
                      p = terms.postings( p, PostingsEnum.ALL );
                      while( p.nextDoc() != PostingsEnum.NO_MORE_DOCS ) {
                          int freq = p.freq();
                          for( int i = 0; i < freq; i++ ) {
                              int pos = p.nextPosition();   // Always returns -1!!!
                              BytesRef data = p.getPayload();
                              doStuff( freq, pos, data ); // Fails miserably, of course.
                          }
                      }
                  }
                  

                  However, even though (1) the index does indeed include positions on the relevant field and (2) the term vector claims to have positions (i.e.: tv.hasPositions() == true), I keep getting "-1" for all positions.

                  First, am I doing something wrong? Is there an alternative way of iterating over postings on a per-document basis? Second: What is going on anyway? The index contains positions, the Terms instance returned by getTermVector claims to include positions, and I'm looking at the correct position values in Luke, yet I still get -1 when I try to access said values in my code. What gives?

                  The relevant field was configured with the following options:

                      FieldType ft = new FieldType();
                      ft.setIndexOptions( IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS );
                      ft.setStoreTermVectors( true );
                      ft.setStoreTermVectorOffsets( true );
                      ft.setStoreTermVectorPayloads( true );
                      ft.setStoreTermVectorPositions( true );
                      ft.setTokenized( true );
                      return ft;
                  

                  Recommended answer

                  Did you set FieldType.setStoreTermVectorPositions(true) on your field type at index time? http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectorPositions(boolean)
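To make the answer's point concrete without depending on a Lucene install, here is a toy, plain-Java model of a per-document term vector. It is only a sketch and not Lucene code: the class `TermVectorSketch`, the helper `build`, and the whitespace tokenizer are all hypothetical. It mimics the behaviour the Lucene javadoc describes for `PostingsEnum.nextPosition()`, which returns -1 when positions were not indexed, so a reader that sees only -1 values would first check whether positions were actually stored at index time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a per-document term vector; NOT Lucene's implementation. */
public class TermVectorSketch {

    /**
     * Hypothetical helper: maps each term to one entry per occurrence.
     * With storePositions == true the entry is the token position;
     * otherwise it is the -1 sentinel (frequency is still recoverable
     * from the list size).
     */
    static Map<String, List<Integer>> build(String text, boolean storePositions) {
        Map<String, List<Integer>> vector = new LinkedHashMap<>();
        // Assumption: naive lowercase + whitespace tokenization.
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            vector.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                  .add(storePositions ? pos : -1);
        }
        return vector;
    }

    public static void main(String[] args) {
        String doc = "to be or not to be";
        // Positions stored: {to=[0, 4], be=[1, 5], or=[2], not=[3]}
        System.out.println(build(doc, true));
        // Positions not stored: {to=[-1, -1], be=[-1, -1], or=[-1], not=[-1]}
        System.out.println(build(doc, false));
    }
}
```

In the real API the equivalent check is the one the question already mentions, `tv.hasPositions()`, together with the index-time `FieldType` settings shown above.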
