I'm creating a site that allows users to submit quotes. How would I go about creating a (relatively simple?) search that returns the most relevant quotes?


For example, if the search term was "turkey" then I'd return quotes where the word "turkey" appears twice before quotes where it only appears once.


(I would add a few other rules to help filter out irrelevant results, but my main concern is that.)


Everyone is suggesting MySQL fulltext search, but you should be aware of a HUGE caveat. In MySQL versions before 5.6, the fulltext search engine is only available for the MyISAM engine (not InnoDB, which is the most commonly used engine due to its referential integrity and ACID compliance).


1. The simplest approach is outlined by Particle Tree. You can actually get ranked searches out of pure SQL (no fulltext, no nothing). The SQL query below searches a table and ranks results by the number of occurrences of each search string in the searched fields:

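    -- Count per term = (characters removed by REPLACE) / (term length):
    -- 'term' is 4 characters, 'search' is 6. p.id is assumed to be the key.
    SELECT p.id,
           SUM(((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'term', ''))) / 4) +
               ((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'search', ''))) / 6))
           AS Occurrences
    FROM posts AS p
    GROUP BY p.id
    ORDER BY Occurrences DESC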

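As a quick check of the arithmetic in that query, here is a minimal sketch that runs against a string literal rather than a table. The sample sentence is made up, and using `LOWER()` and `LENGTH('turkey')` in place of a hard-coded divisor is a variation for illustration, not part of the Particle Tree query.

```sql
-- Count occurrences of 'turkey' in one (made-up) string.
-- Removing every match shortens the string by (matches * term length),
-- so dividing the length difference by the term length gives the count.
-- LOWER() is an added assumption: REPLACE() is case-sensitive in MySQL,
-- so without it 'Turkey' would not be counted.
SELECT (LENGTH(LOWER('Turkey and more turkey'))
        - LENGTH(REPLACE(LOWER('Turkey and more turkey'), 'turkey', '')))
       / LENGTH('turkey') AS occurrences;
-- (22 - 10) / 6, i.e. two occurrences
```

Note that `LENGTH()` counts bytes in MySQL. If your quotes or search terms may contain multi-byte characters, `CHAR_LENGTH()` is probably the safer choice on both sides of the division.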

Variations on the above SQL query, such as adding WHERE clauses (WHERE p.body LIKE '%whatever%you%want'), will probably get you exactly what you need.
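One such variation might look like the sketch below, which filters with `LIKE` first and then ranks by occurrence count. The `quotes` table, its columns, and the sample rows are hypothetical and only there to make the script self-contained.

```sql
-- Hypothetical table and sample data, for illustration only.
CREATE TABLE quotes (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL
);

INSERT INTO quotes (body) VALUES
    ('Turkey today, turkey tomorrow.'),
    ('I would rather have turkey than ham.'),
    ('No birds are mentioned here.');

-- Keep only rows that contain the term, then rank by how often it appears.
-- LOWER() makes the count case-insensitive, since REPLACE() is case-sensitive.
SELECT q.id,
       q.body,
       (LENGTH(LOWER(q.body)) - LENGTH(REPLACE(LOWER(q.body), 'turkey', '')))
           / LENGTH('turkey') AS occurrences
FROM quotes AS q
WHERE q.body LIKE '%turkey%'
ORDER BY occurrences DESC;
```

With this data the first row should rank above the second, and the third should be filtered out. Whether `LIKE` itself matches case-insensitively depends on the column's collation, and a pattern with a leading `%` cannot use an ordinary index, so expect a full table scan on large tables.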

2. You can alter your database schema to support fulltext. A common way to keep InnoDB's referential integrity, ACID compliance, and speed without installing plugins like the Sphinx fulltext search engine for MySQL is to split the quote data into its own table. Basically, you would have a Quotes table that is InnoDB and that, rather than holding your TEXT field "data", holds a reference "quote_data_id" pointing to the ID on a Quote_Data table, which is MyISAM. You run your fulltext search on the MyISAM table, join the returned IDs with your InnoDB tables, and voila, you have your results.

3. Install Sphinx. Good luck.
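The split-table layout from approach 2 could be sketched roughly as follows. The table names and the `data` and `quote_data_id` columns follow the description above; the `author` column, the index names, and the search term are assumptions added for illustration.

```sql
-- MyISAM table: holds only the searchable text and its FULLTEXT index.
CREATE TABLE Quote_Data (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    data TEXT NOT NULL,
    FULLTEXT KEY ft_data (data)
) ENGINE=MyISAM;

-- InnoDB table: keeps the transactional, relational columns.
-- quote_data_id is a plain indexed column here, because an InnoDB
-- foreign key cannot reference a MyISAM table.
CREATE TABLE Quotes (
    id            INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    author        VARCHAR(255) NOT NULL,   -- illustrative column (assumption)
    quote_data_id INT UNSIGNED NOT NULL,
    KEY idx_quote_data (quote_data_id)
) ENGINE=InnoDB;

-- Fulltext match on the MyISAM side, joined back to the InnoDB rows.
SELECT q.id,
       q.author,
       d.data,
       MATCH(d.data) AGAINST ('turkey') AS relevance
FROM Quote_Data AS d
JOIN Quotes AS q ON q.quote_data_id = d.id
WHERE MATCH(d.data) AGAINST ('turkey')
ORDER BY relevance DESC;
```

One thing to watch when trying this on a small test table: in natural-language mode, MyISAM ignores words that appear in half or more of the rows, so a handful of sample quotes that all mention the same word may return nothing.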

Given what you described, I would HIGHLY recommend you take the 1st approach I presented, since you have a simple database-driven site. The 1st solution is simple and gets the job done quickly. Lucene will be a pain to set up, especially if you want to integrate it with the database, as Lucene is designed mainly to index files, not databases. Google custom site search just makes your site lose tons of reputation (it makes you look amateurish and hacked), and MySQL fulltext will most likely force you to alter your database schema.

