        Term frequency in Lucene 4.0

        Date: 2023-06-28

                  Question

                  I'm trying to calculate term frequency with Lucene 4.0. Document frequency works fine, but I can't figure out how to get term frequency through the API. Here's the code I have:

                  private static void addDoc(IndexWriter writer, String content) throws IOException {
                      FieldType fieldType = new FieldType();
                      fieldType.setStoreTermVectors(true);
                      fieldType.setStoreTermVectorPositions(true);
                      fieldType.setIndexed(true);
                      fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
                      fieldType.setStored(true);
                      Document doc = new Document();
                      doc.add(new Field("content", content, fieldType));
                      writer.addDocument(doc);
                  }
                  
                  public static void main(String[] args) throws IOException, ParseException {
                      Directory directory = new RAMDirectory();  
                      Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
                      IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
                      IndexWriter writer = new IndexWriter(directory, config);
                      addDoc(writer, "Lucene is stupid");
                      addDoc(writer, "Java is great");
                      writer.close();
                      IndexReader reader = DirectoryReader.open(directory);
                      System.out.println(reader.docFreq(new Term("content", "Lucene")));
                      reader.close();
                  }
                  

                  I've tried something like reader.getTermVector(0, "content")..., but I can't find a method that returns just the frequency of a particular term in that document.

                  Thanks!

                  Recommended answer

                  K, figured it out. You can get a DocsEnum object from MultiFields and then iterate over it. For each document that contains the term, freq() returns how many times the term occurs in that document.

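                  import java.io.IOException;

                  import org.apache.lucene.analysis.Analyzer;
                  import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
                  import org.apache.lucene.document.Document;
                  import org.apache.lucene.document.Field;
                  import org.apache.lucene.document.FieldType;
                  import org.apache.lucene.index.DirectoryReader;
                  import org.apache.lucene.index.DocsEnum;
                  import org.apache.lucene.index.FieldInfo.IndexOptions;
                  import org.apache.lucene.index.IndexWriter;
                  import org.apache.lucene.index.IndexWriterConfig;
                  import org.apache.lucene.index.MultiFields;
                  import org.apache.lucene.store.Directory;
                  import org.apache.lucene.store.RAMDirectory;
                  import org.apache.lucene.util.BytesRef;
                  import org.apache.lucene.util.Version;

                  public class TermFrequencyExample {
                      private static void addDoc(IndexWriter writer, String content) throws IOException {
                          FieldType fieldType = new FieldType();
                          fieldType.setStoreTermVectors(true);
                          fieldType.setStoreTermVectorPositions(true);
                          fieldType.setIndexed(true);
                          fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
                          fieldType.setStored(true);
                          Document doc = new Document();
                          doc.add(new Field("content", content, fieldType));
                          writer.addDocument(doc);
                      }

                      public static void main(String[] args) throws IOException {
                          Directory directory = new RAMDirectory();
                          Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
                          IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
                          IndexWriter writer = new IndexWriter(directory, config);
                          addDoc(writer, "bla bla bla bleu bleu");
                          addDoc(writer, "bla bla bla bla");
                          writer.close();
                          DirectoryReader reader = DirectoryReader.open(directory);
                          // Postings for the term "bla" in field "content"; null if the term is absent.
                          DocsEnum de = MultiFields.getTermDocsEnum(reader, MultiFields.getLiveDocs(reader), "content", new BytesRef("bla"));
                          if (de != null) {
                              int doc;
                              while ((doc = de.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
                                  // freq() is the number of times the term occurs in document `doc`.
                                  System.out.println("doc " + doc + ": " + de.freq());
                              }
                          }
                          reader.close();
                      }
                  }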
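
To sanity-check the frequencies the loop above reports, here is a small dependency-free sketch that computes the same two quantities by hand for the two sample documents. It does not use Lucene at all: the class name `TermFrequencySketch` and the helpers `termFreq` and `docFreq` are made up for illustration, and splitting on whitespace is only assumed to approximate what WhitespaceAnalyzer does for plain strings like these.

```java
import java.util.Arrays;
import java.util.List;

public class TermFrequencySketch {

    // Term frequency: how many times `term` occurs in one whitespace-tokenized document.
    public static int termFreq(String doc, String term) {
        int count = 0;
        for (String token : doc.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Document frequency: how many documents contain `term` at least once.
    public static int docFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (termFreq(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("bla bla bla bleu bleu", "bla bla bla bla");
        for (int i = 0; i < docs.size(); i++) {
            // Prints 3 for doc 0 and 4 for doc 1.
            System.out.println("doc " + i + " tf(bla) = " + termFreq(docs.get(i), "bla"));
        }
        // Prints 2: both documents contain "bla".
        System.out.println("df(bla) = " + docFreq(docs, "bla"));
    }
}
```

So for the term "bla", the Lucene loop should report a frequency of 3 for the first document and 4 for the second, while docFreq for the same term is 2.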
                  
