OMERO search
OMERO.server uses Lucene to index all string and timestamp information in the database, as well as all OriginalFiles that can be parsed to simple text (see File parsers for more information). The index is stored under /OMERO/FullText, or the FullText subdirectory of your omero.data.dir, and can be searched with Google-like queries.
Once an entity is indexed, you can start writing queries against the server via IQuery.findAllByFullText(). Use new Parameters(new Filter().owner()) and .group() to restrict your search by owner or group. Alternatively, use the ome.api.Search interface (below).
See also
- Search and indexing configuration
Section of the sysadmin documentation describing the configuration of the search and indexing for the server.
Field names
Each row in the database becomes a single Lucene Document, which is parsed into several Fields. A field is referenced by prefixing a search term with the field name followed by a colon. For example, name:myImage searches for myImage anywhere in the name field.
Field | Comments |
---|---|
(no prefix) | Any unprefixed field searches the combination of all fields together, i.e. a search for cell AND name:myImage gets translated to combined_fields:cell AND name:myImage. |
<field name> | Each string, timestamp, or |
details.owner.omeName | Login name of the owner of the object |
details.owner.firstName | First name of the owner of the object |
details.owner.lastName | Last name of the owner of the object |
details.group.name | Group name of the owning group of the object |
details.creationEvent.id | ID of the Event of this object's creation |
details.creationEvent.time | When that Event took place |
details.updateEvent.id | ID of the Event of this object's last modification |
details.updateEvent.time | When that Event took place |
details.permissions | Permissions in the form rwrwrw or rw- |
tag | Contents from a |
annotation | Contents from annotations, including |
annotation.ns | Namespace (if present) for any annotations on an object |
annotation.type | Short type name, e.g. |
channel.name | Name of the |
channel.fluor | Fluor value of the |
channel.mode | Mode of the |
channel.photometricInterpretation | Name of the |
file.name | For |
file.format | For |
file.path | For |
file.sha1 | For |
file.contents | For |
fileset.entry.name | Name of an imported file. |
fileset.entry.clientPath | Original, client-side path of an imported file. |
fileset.templatePrefix | Location of the import in the managed repository. |
${NAME} | For |
has_key | As |
Internal | |
combined_fields | The default field prefix. |
_hibernate_class | Used by Hibernate Search to record the entity type. The class value, e.g. ome.model.core.Image, is also entered in combined_fields. Unimportant for the casual user. |
id | The primary key of the entity. Unimportant for the casual user. |
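The first row of the table describes how unprefixed terms are routed to the default field. The following Java sketch illustrates that rule by rewriting unprefixed terms to combined_fields. It is a simplification under stated assumptions. It splits on whitespace only and ignores quotes, parentheses and escaped colons, all of which Lucene's real query parser handles. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not OMERO code: route unprefixed terms to the default field.
class DefaultFieldSketch {
    static final String DEFAULT_FIELD = "combined_fields";
    static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    static String addDefaultField(String query) {
        List<String> parts = new ArrayList<>();
        // Whitespace split only; quotes and parentheses are not handled.
        for (String part : query.trim().split("\\s+")) {
            if (OPERATORS.contains(part) || part.contains(":")) {
                parts.add(part); // boolean operator, or already prefixed with a field name
            } else {
                parts.add(DEFAULT_FIELD + ":" + part);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Prints: combined_fields:cell AND name:myImage
        System.out.println(addDefaultField("cell AND name:myImage"));
    }
}
```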
Queries
Search queries are very similar to Google searches. When search terms are entered without a prefix (“name:”), the default field, which combines all available fields, is used. Otherwise, a prefix can be added to restrict the search. The search term is first split into “tokens”, which are then combined into a search query. Tokenizing happens on all non-alphanumeric characters, such as space, underscore, hyphen, etc. The query is built by combining the tokens with an “OR” operator (see the examples in the Indexing section). The search terms, or the tokens created from them as described above, must precisely match the indexed entries. For example, the search term tes will not match the indexed entry test, and the search will give no result.
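The following Java sketch shows the tokenize, combine-with-OR and exact-match steps described above. It assumes ASCII text and lowercased tokens, as the Indexing examples suggest. It is an illustration, not the FullTextAnalyzer itself, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of tokenizing, OR-combination and exact token matching.
class TokenOrSketch {
    // Lowercase, then split on every run of non-alphanumeric (ASCII) characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // a leading separator yields one empty string; skip it
            }
        }
        return tokens;
    }

    // Combine the tokens of an unquoted search term with OR.
    static String toOrQuery(String searchTerm) {
        return String.join(" OR ", tokenize(searchTerm));
    }

    // An entry matches if at least one query token equals an indexed token exactly.
    static boolean matches(String searchTerm, String indexedText) {
        Set<String> indexed = new HashSet<>(tokenize(indexedText));
        for (String t : tokenize(searchTerm)) {
            if (indexed.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(toOrQuery("GFP-H2B"));             // gfp OR h2b
        System.out.println(matches("tes", "test"));           // false: no partial matches
        System.out.println(matches("test", "my_test image")); // true
    }
}
```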
Indexing
Successful searching depends on understanding how the text is indexed. The default analyzer is the FullTextAnalyzer, which tokenizes text as in the following examples:
1. Desktop/image_GFP-H2B_1.dv ---> "desktop", "image", "gfp", "h2b", "1", "dv"
2. Desktop/image_GFP-H2B_2.dv ---> "desktop", "image", "gfp", "h2b", "2", "dv"
3. Desktop/image_GFP_01-H2B.dv ---> "desktop", "image", "gfp", "01", "h2b", "dv"
4. Desktop/image_GFP-CSFV_a.dv ---> "desktop", "image", "gfp", "csfv", "a", "dv"
Assuming the entries above are values of Image.name:
- searching for GFP-H2B returns 1, 2, 3 and 4, because the term is tokenized on the hyphen and the tokens are joined by an OR.
- searching for “GFP H2B” or “GFP-H2B” returns only 1 and 2, since the quotes enforce the exact sequence of the tokens and the query is built with an AND.
- searching for GFP H2B returns 1, 2, 3 and 4, since the two tokens are joined by an OR.
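To make the difference between unquoted and quoted searches concrete, this Java sketch reproduces the three results above for the four example names. The quoted case is modeled as a phrase match, where the tokens must appear consecutively and in order. That is the "exact sequence" behavior the text describes. It is a simplified stand-in for Lucene's query evaluation, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: OR search vs quoted (phrase) search over the example names.
class OrVsPhraseSketch {
    static final List<String> NAMES = List.of(
            "Desktop/image_GFP-H2B_1.dv",
            "Desktop/image_GFP-H2B_2.dv",
            "Desktop/image_GFP_01-H2B.dv",
            "Desktop/image_GFP-CSFV_a.dv");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Unquoted: 1-based entry numbers sharing at least one token with the query.
    static List<Integer> orSearch(String query) {
        List<String> queryTokens = tokenize(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < NAMES.size(); i++) {
            if (!Collections.disjoint(tokenize(NAMES.get(i)), queryTokens)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    // Quoted: all query tokens must appear consecutively, in order.
    static List<Integer> phraseSearch(String quoted) {
        List<String> queryTokens = tokenize(quoted);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < NAMES.size(); i++) {
            if (!queryTokens.isEmpty()
                    && Collections.indexOfSubList(tokenize(NAMES.get(i)), queryTokens) >= 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(orSearch("GFP-H2B"));     // [1, 2, 3, 4]
        System.out.println(phraseSearch("GFP H2B")); // [1, 2]
        System.out.println(orSearch("GFP H2B"));     // [1, 2, 3, 4]
    }
}
```

Entry 3 fails the phrase search because the token 01 sits between gfp and h2b.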
With the same entries as above and a wildcard:
- searching for *FP returns 1, 2, 3 and 4. As this example shows, leading wildcards are allowed in the graphical user interface, but they must be explicitly enabled when using the API directly (see the developer information below).
- searching for GF* returns 1, 2, 3 and 4.
- searching for GFP-* returns no results, but GFP.* returns 1, 2, 3 and 4. Of the non-alphanumeric characters, only the hyphen and the underscore give no results in this position; the others do.
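The following Java sketch shows single-term wildcard matching against the indexed tokens. In the sketch, * matches any run of characters (including none) and ? matches exactly one character. It does not try to reproduce the GFP-* versus GFP.* behavior or the API restriction on leading wildcards, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch: match a single-term wildcard against each indexed token.
class WildcardSketch {
    static final List<String> NAMES = List.of(
            "Desktop/image_GFP-H2B_1.dv",
            "Desktop/image_GFP-H2B_2.dv",
            "Desktop/image_GFP_01-H2B.dv",
            "Desktop/image_GFP-CSFV_a.dv");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Translate the wildcard into a regex; every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // 1-based entry numbers with at least one token matching the whole pattern.
    static List<Integer> wildcardSearch(String wildcard) {
        Pattern p = toRegex(wildcard);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < NAMES.size(); i++) {
            for (String token : tokenize(NAMES.get(i))) {
                if (p.matcher(token).matches()) {
                    hits.add(i + 1);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(wildcardSearch("*FP")); // [1, 2, 3, 4]
        System.out.println(wildcardSearch("GF*")); // [1, 2, 3, 4]
        System.out.println(wildcardSearch("H2?")); // [1, 2, 3]
    }
}
```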
Wildcards and quotations:
A wildcard inside quotation marks is not parsed as a wildcard, but as a non-alphanumeric character on which the tokenizing happens.
- searching for “*FP-H2B” returns no results, since it is the same as searching for “FP-H2B”.
- searching for “GF*” returns no results, since it is the same as searching for “GF”.
- searching for “GFP-*” returns 1, 2, 3 and 4, since it is the same as searching for “GFP-”.
- searching for “GFP*H2B” returns 1 and 2, since it is the same as searching for “GFP H2B”.
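The four quoted examples can be checked with a sketch that treats * like any other separator. The quoted text is tokenized, so the wildcard simply disappears, and the remaining tokens are then matched as a phrase. This assumes the simplified tokenizer used in the other sketches in this section; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: inside quotes, '*' is just a separator, then a phrase match runs.
class QuotedWildcardSketch {
    static final List<String> NAMES = List.of(
            "Desktop/image_GFP-H2B_1.dv",
            "Desktop/image_GFP-H2B_2.dv",
            "Desktop/image_GFP_01-H2B.dv",
            "Desktop/image_GFP-CSFV_a.dv");

    // '*' is non-alphanumeric, so it is removed here, e.g. GFP*H2B -> [gfp, h2b].
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // 1-based entry numbers whose tokens contain the quoted tokens consecutively.
    static List<Integer> quotedSearch(String quotedText) {
        List<String> queryTokens = tokenize(quotedText);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < NAMES.size(); i++) {
            if (!queryTokens.isEmpty()
                    && Collections.indexOfSubList(tokenize(NAMES.get(i)), queryTokens) >= 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(quotedSearch("*FP-H2B")); // []
        System.out.println(quotedSearch("GF*"));     // []
        System.out.println(quotedSearch("GFP-*"));   // [1, 2, 3, 4]
        System.out.println(quotedSearch("GFP*H2B")); // [1, 2]
    }
}
```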
Information for developers
ome.api.IQuery
The current IQuery implementation restricts searches to a single class at a time. For example:
- findAllByFullText(Image.class, "metaphase"): Images which contain or are annotated with “metaphase”
- findAllByFullText(Image.class, "annotation:metaphase"): Images which are annotated with “metaphase”
- findAllByFullText(Image.class, "tag:metaphase"): Images which are tagged with “metaphase” (a specialization of the previous query)
- findAllByFullText(Image.class, "file.contents:metaphase"): Images which have attached files containing “metaphase”
- findAllByFullText(OriginalFile.class, "file.contents:metaphase"): Files containing “metaphase”
ome.api.Search
The Search API offers a number of different queries along with various filters and settings which are all maintained on the server.
The matrix below shows which combinations of parameters and queries are supported (S), will throw an exception (X), and will be silently ignored (I).
Query method → | byGroupForTags/byTagsForGroup | byFullText/SomeMustNone | byAnnotatedWith |
---|---|---|---|
Parameters | | | |
annotated between | S | S | S |
annotated by | S | S | S |
annotated by | S | I | I |
created between | S | I | I |
modified between | S | I (Immutable) | S |
owned by | S | S | S |
all types | X | I | X |
1 type | S | I | S |
N types | X | I | X |
only ids | S | I | S |
Ordering / Fetches | | | |
orderBy | S | I | S |
fetchAnnotations | I | | |
Other | | | |
setProjections [3] | X | X | X |
current*Metadata [4] | X | X | X |
Footnotes
Leading wildcard searches
Leading wildcard searches are disallowed by default. “?omething” or “*hatever”, for example, would both throw exceptions. They can be run by using:
Search search = serviceFactory.createSearchService();
search.setAllowLeadingWildcards(true);
There is a performance penalty, however. In addition, wildcard searches get expanded on the server to boolean queries. For example, assuming “ACELL”, “BCELL”, and “CCELL” are all terms in your index, then the query:
*CELL
gets expanded to:
ACELL OR BCELL OR CCELL
If there are too many terms in the expansion, an exception is thrown. The user then has to enter a more refined search. This is not because there are too many results, but because there is not enough memory to search on all terms at once.
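The following Java sketch pulls together the leading-wildcard check and the expansion into an OR query described in this section. The allowLeadingWildcards flag only mirrors the idea behind search.setAllowLeadingWildcards(true). The maxTerms parameter is a hypothetical stand-in for the server's real expansion limit, and the class is not OMERO code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of server-side wildcard expansion into an OR query.
class WildcardExpansionSketch {
    static String expand(String wildcard, Collection<String> indexTerms,
                         boolean allowLeadingWildcards, int maxTerms) {
        // Leading wildcards are rejected unless explicitly enabled.
        if (!allowLeadingWildcards
                && (wildcard.startsWith("*") || wildcard.startsWith("?"))) {
            throw new IllegalArgumentException("Leading wildcard not allowed: " + wildcard);
        }
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern p = Pattern.compile(regex.toString());

        List<String> expansion = new ArrayList<>();
        for (String term : new TreeSet<>(indexTerms)) { // sorted for a stable order
            if (p.matcher(term).matches()) {
                expansion.add(term);
                if (expansion.size() > maxTerms) {
                    // Fails on the number of expanded terms, not on the number of hits.
                    throw new IllegalStateException("Too many terms in expansion of " + wildcard);
                }
            }
        }
        return String.join(" OR ", expansion);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("ACELL", "BCELL", "CCELL", "DOG");
        System.out.println(expand("*CELL", terms, true, 1024)); // ACELL OR BCELL OR CCELL
    }
}
```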
Extension points
Two extension points are currently available for searching. The first is the set of File parsers mentioned above. By configuring the map from file Formats (roughly MIME types) to parser instances, you can make extracting information from attached binary files quick and straightforward.
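The idea of a map from formats to parsers can be sketched as a small registry in Java. In this sketch, each format string is mapped to a function that turns a file's bytes into indexable text. This is a hypothetical illustration of the concept only: the class, its methods and the "text/plain" key are assumptions, not OMERO's parser configuration or API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: format (roughly a MIME type) -> parser producing plain text.
class ParserRegistrySketch {
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    void register(String format, Function<byte[], String> parser) {
        parsers.put(format, parser);
    }

    // Formats without a registered parser contribute no text to the index.
    String extractText(String format, byte[] contents) {
        Function<byte[], String> parser = parsers.get(format);
        return parser == null ? "" : parser.apply(contents);
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));
        byte[] file = "metaphase cells".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extractText("text/plain", file));                         // metaphase cells
        System.out.println(registry.extractText("application/octet-stream", file).isEmpty()); // true
    }
}
```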
Similarly, Search bridges provide a mechanism for parsing all metadata entering the system. One built-in bridge (the FullTextBridge) parses out the fields mentioned above, but by creating your own bridge you can extract more information specific to your site.
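Conceptually, a bridge reads an entity and adds named fields to its search document. The Java sketch below illustrates that pattern with two bridges. One copies every property by name, loosely like a built-in bridge. The other is a site-specific bridge that extracts a lab code from image names. The MetadataBridge interface, the LAB42_ naming convention and the site.lab field are all hypothetical. The real bridges build on Hibernate Search and have a different API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical bridge contract: read an entity, add named fields to a document.
interface MetadataBridge {
    void addFields(Map<String, String> entity, Map<String, List<String>> document);
}

class BridgeSketch {
    static final Pattern LAB_PREFIX = Pattern.compile("^(LAB\\d+)_");

    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Stand-in for a built-in bridge: index every entity property under its own name.
    static final MetadataBridge PROPERTIES = (entity, doc) ->
            entity.forEach((field, value) -> add(doc, field, value));

    // Site-specific bridge: assumes image names may start with a lab code like "LAB42_".
    static final MetadataBridge LAB_CODE = (entity, doc) -> {
        Matcher m = LAB_PREFIX.matcher(entity.getOrDefault("name", ""));
        if (m.find()) {
            add(doc, "site.lab", m.group(1));
        }
    };

    // Run every bridge over the entity to build its search document.
    static Map<String, List<String>> index(Map<String, String> entity, List<MetadataBridge> bridges) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (MetadataBridge b : bridges) {
            b.addFields(entity, doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> image = Map.of("name", "LAB42_mitosis_01");
        // Prints: {name=[LAB42_mitosis_01], site.lab=[LAB42]}
        System.out.println(index(image, List.of(PROPERTIES, LAB_CODE)));
    }
}
```

With such a field indexed, a query like site.lab:LAB42 could then select the images of one lab.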
See also
Working with annotations, Search bridges, File parsers, Query Parser Syntax
- Luke
A Java application that you can download and point at your /OMERO/FullText directory to get a better feel for Lucene queries.