We have a large database with tens of thousands of records. Each record has various text fields that should be searchable. All the records are stored in MySQL.
Initially we used LIKE "%keyword%". However, a search for a common word such as 'flower' took 40 seconds and pushed the server load extremely high. This is not acceptable on a public website.
Next, we rewrote the code to use MySQL's full-text search feature. The result was still disappointing: the same search still took half a minute.
We then discovered a search component provided by Zend called Zend Lucene Search. We had heard about Lucene before, as people at Yell.com use this technology to power their search.
Here is a quick guide on how to get Zend Lucene running for your website:
1. Get and install Zend Framework
The minimal package of Zend Framework is enough for this.
http://framework.zend.com/download/current/
Unzip the file and put it under /usr/local/lib/zend.
Update your php.ini to add the Zend Framework directory to your include_path.
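If editing php.ini is not an option, the include path can also be extended at runtime. Here is a minimal sketch. It assumes the folder that contains the "Zend" directory ends up at /usr/local/lib/zend/library; the "library" subdirectory is an assumption about how the archive unpacks, so adjust the path to your layout.

```php
<?php
// Assumed location of the directory that contains the "Zend" folder.
$zendLibraryPath = '/usr/local/lib/zend/library';

// Append it to the include path unless it is already listed.
$paths = explode(PATH_SEPARATOR, get_include_path());
if (!in_array($zendLibraryPath, $paths)) {
    set_include_path(get_include_path() . PATH_SEPARATOR . $zendLibraryPath);
}
```

After this, require_once('Zend/Search/Lucene.php') should resolve without a full path.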
2. Build the index
In our case, we use DB_DataObject from the PEAR library to read the records from the database and then use Zend Lucene to build the index.
PHP:
    require_once('Zend/Search/Lucene.php');
    require_once('DB/DataObject.php');

    // Use an analyzer that indexes digits as well as letters, ignoring case.
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

    // Create a new, empty index. $zend_indexPath must already hold the index directory.
    $index = Zend_Search_Lucene::create($zend_indexPath);

    // Load every Photo record through DB_DataObject.
    $photo = DB_DataObject::factory('Photo');
    $photo->find();

    while ($photo->fetch()) {
        $doc = new Zend_Search_Lucene_Document();

        // UnIndexed: stored for display, but not searchable.
        $doc->addField(
            Zend_Search_Lucene_Field::UnIndexed('id', $photo->id));

        // UnStored: searchable, but the text is not kept in the index.
        $doc->addField(
            Zend_Search_Lucene_Field::UnStored('code', stripslashes($photo->code)));

        // Text: searchable and stored, so it can be shown in the results.
        $doc->addField(
            Zend_Search_Lucene_Field::Text('name', stripslashes($photo->name)));

        $doc->addField(
            Zend_Search_Lucene_Field::UnStored('detail', stripslashes($photo->detail)));

        $doc->addField(
            Zend_Search_Lucene_Field::UnStored('keywords', stripslashes($photo->keywords)));

        $doc->addField(
            Zend_Search_Lucene_Field::UnIndexed('url', "domain/index.php?id=$photo->id"));

        $index->addDocument($doc);
    }

    // Write pending changes to disk.
    $index->commit();

    // Merge the index segments to speed up searching.
    $index->optimize();
3. Update the index
You can run the above code from a daily cron job to keep your index up to date.
For finer control, you can remove a record from the index and add it back whenever that record changes.
In our case, the database does not change that often, so we simply rebuild the index daily.
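The remove-and-add-back approach could look roughly like the sketch below. Lucene has no in-place update, so the old document is deleted and a new one is added. Note that the 'id' field in step 2 is UnIndexed and therefore cannot be searched; the sketch assumes a hypothetical extra Keyword field called 'record_id' is added to each document for lookups. The function name replaceDocument is also made up for illustration.

```php
<?php
/**
 * Replace one record's document in the index.
 *
 * $index       an open index (the object returned by Zend_Search_Lucene::open())
 * $lookupQuery a query that matches only the old document for this record
 * $newDocument the freshly built Zend_Search_Lucene_Document
 *
 * Returns the number of old documents that were removed.
 */
function replaceDocument($index, $lookupQuery, $newDocument)
{
    $removed = 0;
    foreach ($index->find($lookupQuery) as $hit) {
        // $hit->id is Lucene's internal document number, not the database id.
        $index->delete($hit->id);
        $removed++;
    }
    $index->addDocument($newDocument);
    $index->commit();
    return $removed;
}

// Usage sketch (needs Zend Framework, so it is left commented out).
// 'record_id' is a hypothetical field that would have to be added in step 2:
//   $doc->addField(Zend_Search_Lucene_Field::Keyword('record_id', $photo->id));
//
// $index = Zend_Search_Lucene::open($zend_indexPath);
// $term  = new Zend_Search_Lucene_Index_Term((string) $photo->id, 'record_id');
// $query = new Zend_Search_Lucene_Search_Query_Term($term);
// replaceDocument($index, $query, $doc);
```

Building the lookup as a term query rather than a query string keeps the record id away from the analyzer, which might otherwise alter it.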
4. Query the index
The Lucene query language is easy and simple.
PHP:
    require_once('Zend/Search/Lucene.php');

    // Use the same analyzer that was used to build the index.
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

    // Open the existing index; create() would start a new, empty one.
    $index = Zend_Search_Lucene::open($zend_indexPath);
    $results = $index->find("+flower");

    // count($results) is the number of hits;
    // $index->count() would give the number of documents in the whole index.
    echo count($results) . " records found.\n\n";

    foreach ($results as $result) {
        // Escape the stored values before putting them into HTML.
        echo '<a href="' . htmlspecialchars($result->url) . '">'
            . htmlspecialchars($result->name) . '</a><br/>';
    }
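The "+" prefix in the query marks a term as required. When the query comes from a search box rather than a hard-coded string, one simple option is to turn every word the visitor types into a required term. The helper below is only a sketch: the function name is made up, it keeps ASCII letters and digits only, and it drops the words and, or, not and to because the query language treats them as operators.

```php
<?php
/**
 * Turn free-form user input into a query string in which
 * every word is required, e.g. "Red flower" -> "+red +flower".
 */
function buildRequiredTermsQuery($userInput)
{
    // Split on anything that is not an ASCII letter or digit.
    $words = preg_split('/[^a-z0-9]+/i', $userInput, -1, PREG_SPLIT_NO_EMPTY);

    // Words the query language reserves as operators.
    $reserved = array('and', 'or', 'not', 'to');

    $terms = array();
    foreach ($words as $word) {
        $word = strtolower($word);
        if (!in_array($word, $reserved)) {
            $terms[] = '+' . $word;
        }
    }
    return implode(' ', $terms);
}

echo buildRequiredTermsQuery('Red  flower!'), "\n";
```

The returned string can be passed to $index->find() in place of the literal "+flower"; an empty result means there is nothing to search for.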
The beauty is that MySQL's processing power is not needed at all when a user makes a query. The same search went from 30 seconds to just 1 second. The improvement is amazing.
We can also use this technology to build our own search engine for a large website. Lucene can be used to index HTML, Excel, PDF and Word documents as well as database records.
There is also a plugin for highlighting search results.