ssfind -- tool for searching semantic spaces
ssfind [-k n | -k=n] [-limit n | -limit=n | -l n | -l=n]
[-offset n | -offset=n | -o n | -o=n] [-doc | --search-documents]
[-nodoc | --no-search-documents] [-train | --search-training-docs]
[-notrain | --no-search-training-docs] [-term | --search-terms]
[-noterm | --no-search-terms] [-orig | --search-original]
[--cosine | --euclidean | --pearson | --dotproduct] file
expression
ssfind searches the semantic space in file for terms and/or documents
matching or similar to expression (composed of terms or documents, as
described below). For each result, ssfind outputs the identifier of the
term or document, its distance or similarity to the query, and whether
the identifier corresponds to a term, a training document or a document.
Results are output in order of decreasing similarity (or increasing
distance).
The options are as follows:
- -k n or -k=n
- Sets the maximum number of dimensions used for the search.
If this is omitted, or the specified value is larger than
the number of supported dimensions, ssfind will silently
use the maximum supported dimensionality of the space.
- -limit n or -limit=n or -l n or -l=n
- Limits the number of returned results to the specified
value.
- -offset n or -offset=n or -o n or -o=n
- Sets the offset into the overall result set of the returned
results.
- -doc or --search-documents
- Enables searching of the document space (enabled by
default).
- -nodoc or --no-search-documents
- Disables searching of the document space.
- -train or --search-training-docs
- Enables searching of the training document space.
- -notrain or --no-search-training-docs
- Disables searching of the training document space.
- -term or --search-terms
- Enables searching of the term space.
- -noterm or --no-search-terms
- Disables searching of the term space.
- -orig or --search-original
- Enables (weighted) searching of the original training data.
This is like performing a boolean search on the training
data, combined with ranking based on spatial location. This
option disables searching of all of the spaces, even if
they are explicitly enabled.
- --cosine
- Selects cosine similarity measurement (default).
- --euclidean
- Selects Euclidean distance measurement.
- --pearson
- Selects Pearson product-moment correlation similarity measurement.
- --dotproduct
- Selects dot-product similarity measurement.
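The four measures and the -k, -limit and -offset options can be illustrated with a small sketch. This is not ssfind's implementation: the function names, the in-memory dictionary of vectors, the zero-based offset and the choice to rank dot-product scores from highest to lowest are all assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def euclidean(a, b):
    """Euclidean distance (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical table: name -> (function, True if a larger score means "closer").
MEASURES = {
    "cosine": (cosine, True),
    "euclidean": (euclidean, False),
    "pearson": (pearson, True),
    "dotproduct": (dot, True),  # assumption: treated as a similarity
}


def rank(query, items, measure="cosine", k=None, offset=0, limit=None):
    """Score every item against the query and return one page of results."""
    func, higher_is_closer = MEASURES[measure]
    # Like -k: use at most k dimensions, capped at what the vectors provide.
    dims = len(query) if k is None else min(k, len(query))
    scored = [(name, func(query[:dims], vec[:dims])) for name, vec in items.items()]
    # Decreasing similarity, or increasing distance.
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_closer)
    # Like -offset and -limit (zero-based offset is an assumption).
    end = None if limit is None else offset + limit
    return scored[offset:end]
```

For example, `rank(query, items, measure="euclidean", k=2, offset=0, limit=10)` would return at most ten (name, distance) pairs, nearest first, comparing only the first two dimensions.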
The query expression can consist either of a list of one or more query
terms, or one of the following primaries. Multiple query terms are merged
to form a pseudo-document query. An individual term can occur multiple
times to increase its weight.
- -doc name
- Search for items similar to the document name.
- -tdoc name
- Search for items similar to the training document name.
- -file name
- Use the file name as the query specification. The file
should be formatted with each line containing a term followed
by a space, followed by a number indicating the
weighting or number of term occurrences.
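A query file in the format described for -file can be generated from a plain list of terms. The sketch below is only an illustration: the helper name write_query_file and the file name query.txt are hypothetical, and it assumes that a repeated term should be written once with its occurrence count as the weight.

```python
from collections import Counter


def write_query_file(path, terms):
    """Write one 'term weight' line per distinct term, using counts as weights."""
    counts = Counter(terms)
    with open(path, "w", encoding="utf-8") as handle:
        for term, weight in counts.items():
            # Each line: the term, a single space, then its weight.
            handle.write(f"{term} {weight}\n")
    return counts


if __name__ == "__main__":
    # "sky" occurs twice, so it is written with weight 2.
    write_query_file("query.txt", ["sky", "plane", "sky"])
```

The resulting file could then be passed as the query, for example `ssfind corel.llss -file query.txt`.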
The following examples are shown as given to the shell:
ssfind -k=43 -l=10 corel.llss sun
Find the top 10 documents related to the term sun in the
corel.llss semantic space using 43 dimensions.
ssfind -nodoc -term corel.llss -doc 1000.jpg
List all the terms in corel.llss in order of decreasing
likelihood of being related to the document 1000.jpg.
ssfind -l=10 -o=10 -train -nodoc corel.llss sky plane
List 10 training documents, starting at offset 10 into the
result set, related to the terms sky and plane in the
corel.llss semantic space.
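When ssfind is driven from a script, the same command lines can be assembled programmatically. The following sketch assumes that an ssfind binary is on the PATH. The helper names build_ssfind_args and run_ssfind are hypothetical, and the output is returned as raw text because its exact layout is not specified here.

```python
import subprocess


def build_ssfind_args(space, terms, k=None, limit=None, offset=None, flags=()):
    """Assemble an ssfind argument list: options first, then file, then query."""
    args = ["ssfind"]
    if k is not None:
        args.append(f"-k={k}")
    if limit is not None:
        args.append(f"-l={limit}")
    if offset is not None:
        args.append(f"-o={offset}")
    args.extend(flags)   # e.g. "-train", "-nodoc", "--euclidean"
    args.append(space)   # the semantic space file
    args.extend(terms)   # one or more query terms
    return args


def run_ssfind(args):
    """Run ssfind and return its standard output unparsed."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Equivalent to: ssfind -k=43 -l=10 corel.llss sun
    print(run_ssfind(build_ssfind_args("corel.llss", ["sun"], k=43, limit=10)))
```

Passing the arguments as a list avoids shell quoting problems with terms or file names that contain spaces.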
ssmake(1)
ssutil(1)
libSemanticSpace(3)
School of Electronics and Computer Science, University of Southampton
Jonathon Hare <jsh2@ecs.soton.ac.uk>
Mixing primaries and terms in a query expression is currently
unsupported. Also, only a single term can be used when searching the
original data.