Zend Search Lucene and large result sets
Index size does not affect Lucene's search speed per se: what matters is the frequency of the search terms. And terms tend to have larger frequencies in larger indexes. (Doug Cutting on java-user)
Suppose our index includes a keyword field that indicates a type, e.g. whether an index entry represents an article or a document. Queries should be restricted to these type subsets, for example to match only articles that contain `lucene' in the title. Approximately 80-90% of all indexed documents represent articles, and the index holds roughly 500'000 documents overall. A Boolean query consisting of two mandatory subqueries, `title:lucene' and `type:article', takes unexpectedly long to execute. The blame for the delay clearly lies with the `type:article' subquery, which matches a very large result set.
$term1 = new Zend_Search_Lucene_Index_Term('t0', 'type');     // type subquery
$term2 = new Zend_Search_Lucene_Index_Term('lucene', 'title'); // title:lucene
$query = new Zend_Search_Lucene_Search_Query_Boolean();
$queryt1 = new Zend_Search_Lucene_Search_Query_Term($term1);
$queryt2 = new Zend_Search_Lucene_Search_Query_Term($term2);
// true marks a subquery as required, i.e. both terms must match (AND)
$query->addSubquery($queryt1, true);
$query->addSubquery($queryt2, true);
Measurements:
| Subquery 1 (type)  | 10.5533 s  |
| Subquery 2 (title) | 0.0889 s   |
| Combined           | 11.68558 s |
Lucene's inherent way to retrieve documents is to search for every term of a query in turn, collect the results and then compute unions or intersections according to the query operators. The complexity of the query syntax semantics probably rules out a reasonable search-within-search feature that would narrow the search space before the next term is evaluated (e.g. get all documents matching the title and then subtract all items not matching the type). Moreover, all limiting and sorting seems to be applied only after the full retrieval is complete.
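The search-within-search idea can be illustrated outside Lucene with a small sketch. It assumes both posting lists are already in memory as sorted arrays of document ids, which is not how Zend Search Lucene stores them; in a real index, reading the long `type' list is itself the expensive part. The function names and the data are hypothetical. Walking the short `title' list and binary-searching the long `type' list costs O(m log n) comparisons instead of the O(m + n) of a full merge.

```php
<?php
// Hypothetical helper: binary search for $docId in a sorted, 0-indexed list.
function containsSorted(array $list, int $docId): bool
{
    $lo = 0;
    $hi = count($list) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($list[$mid] === $docId) {
            return true;
        }
        if ($list[$mid] < $docId) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return false;
}

// Intersect two sorted doc-id lists by walking the shorter one and probing
// the longer one, so the rare term drives the work.
function intersectByShortest(array $a, array $b): array
{
    if (count($a) > count($b)) {
        [$a, $b] = [$b, $a]; // make $a the shorter list
    }
    $result = [];
    foreach ($a as $docId) {
        if (containsSorted($b, $docId)) {
            $result[] = $docId;
        }
    }
    return $result;
}

// Made-up example data: few title hits, many type hits (array_values
// re-indexes the filtered list so binary search can use positions 0..n-1).
$titleDocs = [3, 17, 42, 150];
$typeDocs = array_values(array_filter(range(0, 199), fn(int $id): bool => $id % 5 !== 0));
print_r(intersectByShortest($titleDocs, $typeDocs)); // ids 3, 17 and 42
```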
Java Lucene, on the other hand, offers several Filter classes that work with cacheable BitSet objects to restrict results efficiently, so the expensive `type' subquery could be implemented as such a filter. Zend Search Lucene does not have filters (yet). For this particular case, using termDocs() to retrieve the ids of the documents matching the title, followed by a crude loop that leaves out all non-article types, proved efficient (measured: 0.08670s).
// Fetch the ids of all documents containing title:lucene, then keep only
// articles until $maxRes hits have been collected.
$hits = array();
$term = new Zend_Search_Lucene_Index_Term('lucene', 'title');
$docIds = $this->searchIndex->termDocs($term);
foreach ($docIds as $docId) {
    if (count($hits) >= $maxRes) {
        break;
    }
    // Load the stored document to check its type field
    $doc = $this->searchIndex->getDocument($docId);
    if ($doc->type === 'article') {
        $hits[] = $doc;
    }
}
return $hits;
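Java Lucene's cached filters can be imitated in plain PHP along the following lines. This is a minimal sketch, not a Zend Search Lucene API: the class CachedTermFilter and the function filterHits are hypothetical, and the loader callback is assumed to return the document ids for one keyword value, for instance by wrapping a termDocs() call with a `type' term. The expensive lookup runs once; afterwards every candidate id from the title term is checked with an array lookup instead of loading the stored document.

```php
<?php
// Hypothetical stand-in for a Java-Lucene-style cached filter: the doc ids of
// one keyword value (e.g. type:article) are loaded once and kept in a set.
final class CachedTermFilter
{
    /** @var callable returns an iterable of matching doc ids */
    private $loadDocIds;

    /** @var array<int, true>|null doc id => true, null until first use */
    private ?array $allowed = null;

    /** Number of times the expensive loader has run. */
    public int $loads = 0;

    public function __construct(callable $loadDocIds)
    {
        $this->loadDocIds = $loadDocIds;
    }

    public function accepts(int $docId): bool
    {
        if ($this->allowed === null) {
            // Build the set lazily on the first call, then reuse it.
            $this->loads++;
            $this->allowed = [];
            foreach (($this->loadDocIds)() as $id) {
                $this->allowed[$id] = true;
            }
        }
        return isset($this->allowed[$docId]);
    }
}

// Keep up to $maxRes candidate ids (e.g. from the title term) that pass the filter.
function filterHits(array $candidateIds, CachedTermFilter $filter, int $maxRes): array
{
    $hits = [];
    foreach ($candidateIds as $docId) {
        if (count($hits) >= $maxRes) {
            break;
        }
        if ($filter->accepts($docId)) {
            $hits[] = $docId;
        }
    }
    return $hits;
}

// Made-up data: pretend the even ids are articles.
$articles = new CachedTermFilter(fn(): array => range(0, 199, 2));
print_r(filterHits([3, 4, 10, 11, 12], $articles, 2)); // ids 4 and 10
```

The cached set has to be rebuilt whenever the index changes, and holding several hundred thousand ids in a PHP array costs considerably more memory than Java's compact BitSet.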
Gumstix with LCD Pt. 1

Managed to get the LCD to display the boot screen and a login prompt. The framebuffer console also works with an external USB keyboard. No luck so far with TinyX, though: it segfaults on startup and complains about several missing utilities and the missing matchbox window manager. After recompiling back and forth, the following post by Christ Dollar on the gumstix-users mailing list helps me stay optimistic:
As you've noticed, the tinyx and microwin packages are broken in the
current buildroot (in fact, I don't think they've ever worked). The
gumstix buildroot is a fork from the main uclibc buildroot, which is
made to compile for lots of arches other than ARM and gumstix. So some
of the packages are just migration artifacts from the original
buildroot.
There are alternatives for getting tinyx and matchbox running for
gumstix. If you are comfortable with linux in general and have a decent
working knowledge of cross compiling then you should checkout the
Openembedded based gumstix build at gumstix.net Just follow the quick
start instructions and build the 'gumstix-X11-image' which will give you
tinyx + matchbox + cairo + gtk etc. It currently is just over 16MB in
size, so it will only fit on a verdex xl6, but work is being done to
make it lighter. I've used this procedure on a number of machines with
great success.
Kermitrc
A convenient rc file for connecting Kermit to the Gumstix over the serial line:
set line /dev/ttyUSB0
set speed 115200
set reliable
fast
set carrier-watch off
set flow-control none
set prefixing all
set file type bin
set rec pack 4096
set send pack 4096
set window 5
Start Kermit with: kermit ./.kermitrc