    <noframes id='omqps'>

        Getting term frequencies in Lucene

        Date: 2023-06-29

            <noframes id='IUzOD'>

                  Question

                  Is there a fast and easy way of getting term frequencies from a Lucene index, without doing it through the TermVectorFrequencies class, since that takes an awful lot of time for large collections?

                  What I mean is: is there something like TermEnum that provides not just the document frequency but the term frequency as well?

                  UPDATE: Using TermDocs is way too slow.

                  Recommended answer

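                  Use TermDocs to get the term frequency for a given document. As with the document frequency, you obtain the TermDocs for the term of interest from the IndexReader.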
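In the Lucene 3.x API, the per-document loop described here follows the `TermDocs` pattern: call `next()`, then read `doc()` and `freq()`. Summing `freq()` over every posting gives the term's total frequency across the collection. To keep the example runnable without Lucene, the sketch below replaces `TermDocs` with a hypothetical in-memory `PostingsCursor`. It is an assumption for illustration, not a Lucene class, and holds (document, frequency) pairs in document order.

```java
/**
 * Hypothetical stand-in for a postings cursor, loosely modeled on the
 * Lucene 3.x TermDocs next()/doc()/freq() pattern (not a Lucene class).
 * Postings are (docId, freq) pairs stored in increasing docId order.
 */
final class PostingsCursor {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1; // positioned before the first posting

    PostingsCursor(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have equal length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Advances to the next posting; returns false once the postings are exhausted. */
    boolean next() {
        if (pos + 1 >= docs.length) {
            pos = docs.length;
            return false;
        }
        pos++;
        return true;
    }

    int doc() { return docs[pos]; }

    int freq() { return freqs[pos]; }
}

final class TermFreqSum {
    /** Sums freq() over every document containing the term: its total frequency. */
    static long totalTermFreq(PostingsCursor cursor) {
        long total = 0;
        while (cursor.next()) {
            total += cursor.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical term: 3 times in doc 0, once in doc 4, twice in doc 9.
        PostingsCursor c = new PostingsCursor(new int[] {0, 4, 9}, new int[] {3, 1, 2});
        System.out.println(totalTermFreq(c)); // prints 6
    }
}
```

The document frequency is the number of postings visited. The same single pass therefore yields both statistics the question asks about.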

                  You won't find a faster method than TermDocs without losing some generality. TermDocs reads directly from the ".frq" file in an index segment, where each term's frequencies are listed in document order.
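To see why reading this data is cheap, here is a sketch of the .frq postings layout described in the Lucene 3.x file-format documentation (later versions use different postings formats). For each document containing the term, the writer stores a VInt `DocDelta`:

- `DocDelta / 2` is the gap from the previous document number, starting from zero.
- An odd `DocDelta` means the frequency is 1.
- An even `DocDelta` is followed by the frequency as a second VInt.

`FrqCodec` is a hypothetical re-implementation over a byte array, not Lucene code. It shows that decoding is a single forward pass with no random access.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical re-implementation (not Lucene code) of the Lucene 3.x .frq
 * TermFreqs layout: per document, VInt DocDelta = (gap << 1) | (freq == 1 ? 1 : 0),
 * followed by a VInt freq only when freq != 1.
 */
final class FrqCodec {

    /** Writes a non-negative int as a VInt: 7 bits per byte, low bits first, high bit = "more follows". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes (doc, freq) postings; docs must be strictly increasing and freqs >= 1. */
    static byte[] encode(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < docs.length; i++) {
            int gap = docs[i] - prev;
            if (freqs[i] == 1) {
                writeVInt(out, (gap << 1) | 1); // odd DocDelta: frequency is implicitly 1
            } else {
                writeVInt(out, gap << 1);       // even DocDelta: frequency follows
                writeVInt(out, freqs[i]);
            }
            prev = docs[i];
        }
        return out.toByteArray();
    }

    /** Sequential reader whose position only ever moves forward. */
    static final class Reader {
        private final byte[] data;
        private int pos = 0;

        Reader(byte[] data) {
            this.data = data;
        }

        boolean hasMore() {
            return pos < data.length;
        }

        int readVInt() {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return value;
        }
    }

    /** Decodes postings in one forward pass; each entry is {doc, freq}, in document order. */
    static List<int[]> decode(byte[] data) {
        List<int[]> postings = new ArrayList<>();
        Reader in = new Reader(data);
        int doc = 0;
        while (in.hasMore()) {
            int docDelta = in.readVInt();
            doc += docDelta >>> 1;
            int freq = (docDelta & 1) != 0 ? 1 : in.readVInt();
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        // Once in doc 7, three times in doc 11 -> VInts 15, 8, 3.
        byte[] bytes = encode(new int[] {7, 11}, new int[] {1, 3});
        System.out.println(Arrays.toString(bytes)); // [15, 8, 3]
        for (int[] p : decode(bytes)) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

Folding the "frequency is 1" case into the low bit saves a byte for the most common posting. Document gaps stay small, so most VInts fit in one byte.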

                  If that's "too slow", make sure that you've optimized your index to merge multiple segments into a single segment. Iterate over the documents in order (skips are alright, but you can't jump back and forth in the document list efficiently).
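The access-pattern advice can be illustrated with a forward-only cursor. Moving to a later document continues from the current position, while asking about an earlier one forces a restart from the first posting. `ForwardCursor` below is a hypothetical model, not a Lucene class. It counts the postings it steps over as a stand-in for read cost, and it walks linearly instead of using the skip data that the real Lucene 3.x `TermDocs.skipTo` can exploit.

```java
/**
 * Hypothetical forward-only postings cursor (a model, not a Lucene class).
 * Like a sequential file read, it only moves forward; answering a query
 * for an earlier document means restarting from the first posting.
 */
final class ForwardCursor {
    private final int[] docs;   // strictly increasing document numbers
    private final int[] freqs;  // freqs[i] = term frequency in docs[i]
    private int pos = 0;        // index of the first posting not yet stepped over
    private long steps = 0;     // postings stepped over: a stand-in for read cost

    ForwardCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Returns the term frequency in doc {@code target}, or 0 if the term is absent there. */
    int freqIn(int target) {
        // Every posting before pos has a doc number below the previous target.
        // If target is not beyond the last posting passed, it lies behind us: restart.
        if (pos > 0 && target <= docs[pos - 1]) {
            pos = 0;
        }
        while (pos < docs.length && docs[pos] < target) {
            pos++;
            steps++;
        }
        return (pos < docs.length && docs[pos] == target) ? freqs[pos] : 0;
    }

    long steps() {
        return steps;
    }

    /** Looks up each target in the given order and returns the total steps taken. */
    static long costOf(int[] docs, int[] freqs, int[] targets) {
        ForwardCursor cursor = new ForwardCursor(docs, freqs);
        for (int t : targets) {
            cursor.freqIn(t);
        }
        return cursor.steps();
    }

    public static void main(String[] args) {
        int[] docs = {0, 2, 4, 6, 8, 10, 12, 14, 16, 18};
        int[] freqs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(costOf(docs, freqs, new int[] {4, 10, 16})); // 8 (in order)
        System.out.println(costOf(docs, freqs, new int[] {16, 10, 4})); // 15 (back and forth)
    }
}
```

With these ten postings, looking up documents 4, 10 and 16 in order costs 8 steps. The reverse order costs 15, because each backward jump starts over from the beginning.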

                  Your next step might be additional processing to create an even more specialized file structure that leaves out the SkipData. Personally, I would look for a better algorithm to achieve my objective, or provide better hardware: lots of memory, either to hold a RAMDirectory or to give to the OS for its own file-caching system.
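One reading of "a better algorithm" is to pay for a full postings scan only once. Visit each term's postings a single time and record both its document frequency and its total term frequency. Then keep the result, in memory or in a side file, so that later lookups are a map access rather than a `TermDocs` loop. In Lucene 3.x the input would come from iterating `IndexReader.terms()` and reading each term's `TermDocs`. The sketch below assumes that data has already been collected into a hypothetical term → per-document frequencies map.

```java
import java.util.Map;
import java.util.TreeMap;

/** Per-term statistics: documents containing the term, and its total occurrences. */
final class TermStats {
    final int docFreq;
    final long totalTermFreq;

    TermStats(int docFreq, long totalTermFreq) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }

    @Override
    public String toString() {
        return "docFreq=" + docFreq + ", totalTermFreq=" + totalTermFreq;
    }
}

final class TermStatsTable {
    /**
     * One pass over hypothetical input: for each term, its frequency in each
     * document that contains it. Document numbers are not needed for these
     * two statistics, so the input omits them.
     */
    static TreeMap<String, TermStats> build(Map<String, int[]> freqsByTerm) {
        TreeMap<String, TermStats> table = new TreeMap<>(); // sorted by term, like a term dictionary
        for (Map.Entry<String, int[]> e : freqsByTerm.entrySet()) {
            long total = 0;
            for (int f : e.getValue()) {
                total += f;
            }
            table.put(e.getKey(), new TermStats(e.getValue().length, total));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("lucene", new int[] {3, 1, 2}); // occurs in 3 documents
        postings.put("index", new int[] {5});        // occurs in 1 document
        TreeMap<String, TermStats> table = build(postings);
        System.out.println(table.get("lucene")); // docFreq=3, totalTermFreq=6
        System.out.println(table.get("index"));  // docFreq=1, totalTermFreq=5
    }
}
```

The trade-off is staleness. The table reflects the index at build time and must be rebuilt, or updated incrementally, when documents are added or deleted.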

                <noframes id='D8AfH'>