We're setting up Solr to index documents whose title field can be in various languages. After some googling I found two options:
Which one is better? What are the pros and cons?
Thanks
There's also a third alternative where you use a common set of fields for all languages but apply a filter on a `language` field. For instance, if you have the fields `text` and `language`, you can put the text contents for all languages into the `text` field and use e.g. `fq=language:english` to retrieve only English documents.
The downside of this approach is that you cannot use language-specific features such as lemmatisation, stemming, etc.
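To make the filter-query alternative concrete, here is a minimal sketch of how an application might build such a request. The host, the core name `docs`, and the helper name `build_filtered_query` are assumptions for illustration only; the `text` and `language` fields and the `fq` parameter come from the description above. The sketch does not escape Lucene special characters in the user's input.

```python
from urllib.parse import urlencode

# Assumed for illustration: a local Solr instance with a core named "docs".
SOLR_SELECT_URL = "http://localhost:8983/solr/docs/select"


def build_filtered_query(user_query: str, language: str) -> str:
    """Build a select URL that searches the shared `text` field and
    restricts the results with a filter query on the `language` field."""
    params = [
        ("q", f"text:({user_query})"),    # main query against the common field
        ("fq", f"language:{language}"),   # filter query, e.g. language:english
    ]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_filtered_query("solar panels", "english"))
```

Because the language restriction lives in `fq` rather than in `q`, it does not affect relevance scoring, and Solr can cache the filter separately from the main query.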
Define different schema fields for every language, i.e. title_en, title_fr, ..., applying different filters to each language, then query the title field that corresponds to the language.
This approach gives good flexibility, but beware of high memory consumption and complexity when many languages are present. This can be mitigated by using multiple Solr servers.
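On the application side, the per-language-field approach mostly comes down to picking the right field name at query time. A rough sketch, assuming the schema defines one `title_<code>` field per language as in the question; the language set and the function name `title_query_params` are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical: languages for which the schema defines a title_<code> field,
# each with its own analyzer chain (stemming, stop words, and so on).
SUPPORTED_LANGUAGES = {"en", "fr", "de"}


def title_query_params(user_query: str, lang: str) -> str:
    """Return URL-encoded query parameters targeting the title field
    that matches the given language code."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no title field configured for language {lang!r}")
    field = f"title_{lang}"  # e.g. title_en, title_fr
    return urlencode({"q": f"{field}:({user_query})"})


if __name__ == "__main__":
    print(title_query_params("energie solaire", "fr"))
```

Rejecting unknown language codes up front keeps the application from silently querying a field that does not exist in the schema.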
Create a separate Solr core for each language and have our app query the correct Solr core.
Definitely a nice solution. But whether the separate administration and slight overhead will work for you probably depends on the number of languages you wish to support.
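With one core per language, the routing the question mentions ("make our app query correct Solr core") can be a simple lookup. A minimal sketch: the core names, the base URL, and the function name `select_url_for` are made up for illustration, and only the `/solr/<core>/select` URL layout is standard Solr.

```python
# Hypothetical mapping from language code to Solr core name.
CORE_BY_LANGUAGE = {
    "en": "docs_en",
    "fr": "docs_fr",
    "de": "docs_de",
}


def select_url_for(lang: str, base_url: str = "http://localhost:8983/solr") -> str:
    """Return the select endpoint of the core that holds documents
    in the given language."""
    try:
        core = CORE_BY_LANGUAGE[lang]
    except KeyError:
        raise ValueError(f"no core configured for language {lang!r}") from None
    return f"{base_url}/{core}/select"


if __name__ == "__main__":
    print(select_url_for("fr"))
```

Each core then carries its own schema, so the title field can keep a single name while the analysis chain differs per core.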
Unless the first approach is applicable, I would probably lean towards the second one, unless the scalability of cores isn't desired. Either approach is fine though, and I think it basically comes down to preference.