Table of Contents

Name

ssfind -- tool for searching semantic spaces

Synopsis

ssfind [-k n | -k=n] [-limit n | -limit=n | -l n | -l=n] [-offset n | -offset=n | -o n | -o=n] [-doc | --search-documents] [-nodoc | --no-search-documents] [-train | --search-training-docs] [-notrain | --no-search-training-docs] [-term | --search-terms] [-noterm | --no-search-terms] [-orig | --search-original] [--cosine | --euclidean | --pearson | --dotproduct] file expression

Description

ssfind searches the semantic space in file for terms and/or documents that match or are similar to expression (composed of terms or documents, as described below). For each result, ssfind outputs the identifier of the term or document, its distance or similarity to the query, and whether the identifier corresponds to a term, a training document or a document. Results are output in order of decreasing similarity (or increasing distance).

The options are as follows:

-k n or -k=n
Sets the maximum number of dimensions used for the search. If this is omitted, or the specified value is larger than the number of supported dimensions, ssfind will silently use the maximum supported dimensionality of the space.
-limit n or -limit=n or -l n or -l=n
Limits the number of returned results to the specified value.
-offset n or -offset=n or -o n or -o=n
Sets the offset into the overall result set at which the returned results begin.
-doc or --search-documents
Enables searching of the document space (enabled by default).
-nodoc or --no-search-documents
Disables searching of the document space.
-train or --search-training-docs
Enables searching of the training document space.
-notrain or --no-search-training-docs
Disables searching of the training document space.
-term or --search-terms
Enables searching of the term space.
-noterm or --no-search-terms
Disables searching of the term space.
-orig or --search-original
Enables (weighted) searching of the original training data. This is like performing a boolean search on the training data, combined with ranking based on spatial location. This option disables searching of all of the spaces, even if they are explicitly enabled.
--cosine
Selects cosine similarity measurement (default).
--euclidean
Selects Euclidean distance measurement.
--pearson
Selects Pearson product-moment correlation similarity measurement.
--dotproduct
Selects dot-product based distance measurement.
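
As an illustration of what the four measurement options compute, the following Python sketch gives the textbook definitions of each measure, together with a truncation step modelled on the description of -k. It is not taken from the ssfind source: the function names and sample vectors are hypothetical, and ssfind may scale or sign its reported values differently.

```python
import math

def truncate(u, v, k=None):
    """Keep the first k dimensions; use all shared dimensions if k is omitted or too large."""
    n = min(len(u), len(v))
    if k is None or k > n:
        k = n
    return u[:k], v[:k]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between u and v (larger means more similar)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean(u, v):
    """Straight-line distance between u and v (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

if __name__ == "__main__":
    # Hypothetical query and item vectors, compared using only 2 dimensions.
    q, d = truncate([1.0, 2.0, 3.0], [2.0, 4.0, 6.5], k=2)
    print(cosine(q, d), euclidean(q, d), dot(q, d))
```

Cosine, Pearson and dot product grow with similarity, whereas Euclidean distance shrinks, which is why results are described as ordered by decreasing similarity or increasing distance.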

Query Expression

The query expression consists of either a list of one or more query terms or one of the following primaries. Multiple query terms are merged to form a pseudo-document query. An individual term can occur multiple times to increase its weight.

-doc name
Search for items similar to the document name.
-tdoc name
Search for items similar to the training document name.
-file name
Use the file name as the query specification. The file should be formatted with each line containing a term followed by a space, followed by a number indicating the weighting or number of term occurrences.
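
The file format described for -file can be produced from a list of terms with a few lines of Python. This is a minimal sketch: the helper names and the file name query.txt are hypothetical, and it assumes that a repeated term is equivalent to a single line carrying the occurrence count, which the description above suggests but does not state outright.

```python
from collections import Counter

def query_file_lines(terms):
    """Return one 'term weight' line per distinct term, in first-seen order."""
    counts = Counter(terms)
    return [f"{term} {count}" for term, count in counts.items()]

def write_query_file(path, terms):
    """Write the query specification to path, one term and weight per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in query_file_lines(terms):
            fh.write(line + "\n")

if __name__ == "__main__":
    # "sky" appears twice, so it is written with a weight of 2.
    write_query_file("query.txt", ["sky", "sky", "plane"])
```

The resulting file could then be passed as the query, for example: ssfind corel.llss -file query.txt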

Examples

The following examples are shown as given to the shell:

ssfind -k=43 -l=10 corel.llss sun
Find the top 10 documents related to the term sun in the corel.llss semantic space using 43 dimensions.

ssfind -nodoc -term corel.llss -doc 1000.jpg
List all the terms in corel.llss in order of decreasing likelihood of being related to the document 1000.jpg.

ssfind -l=10 -o=10 -train -nodoc corel.llss sky plane
List the 11th to 20th training documents related to the terms sky and plane in the corel.llss semantic space.

See Also

ssmake(1) ssutil(1) libSemanticSpace(3)

Copyright

School of Electronics and Computer Science, University of Southampton

Author

Jonathon Hare <jsh2@ecs.soton.ac.uk>

Bugs

Mixing primaries and terms in a query expression is currently unsupported. Also, only a single term can be used when searching the original data.

