Bulk imports
The CLI import option --bulk specifies a configuration file that
can be used to perform a batch of imports with the same or similar
options. The file is written in a simple YAML syntax and can be named
whatever you would like. It does not need to be placed in the folder
from which the OMERO commands are run.
A minimal YAML file might look like:
---
path: "my-files.txt"
Assuming that my-files.txt is a list of files such as
fileA
fileB
directoryC
this is equivalent to:
$ omero import fileA fileB directoryC
where the files fileA and fileB and all the files of
directoryC will be imported.
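To try the minimal example end to end, the two files can be created from a shell and then passed to the importer. This is only a sketch: the file names match the example above, and the final command is left commented out because it assumes an OMERO CLI installation with an active login session.

```shell
# List the import candidates, one per line.
cat > my-files.txt <<'EOF'
fileA
fileB
directoryC
EOF

# Minimal bulk configuration pointing at that list.
cat > bulk.yml <<'EOF'
---
path: "my-files.txt"
EOF

# Run the bulk import (uncomment on a machine with the OMERO CLI and a session):
# omero import --bulk bulk.yml
```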
Bulk-only options
Path
The path key specifies a file from which each individual line
will be processed as a separate import. In the simplest case, one
file is listed per line, as above. For more complex usages, path
can point to a tab-separated values (TSV) or comma-separated values (CSV)
file in which each field is interpreted according to the columns key.
Columns
A fairly regular requirement in importing many files is that for
each file a similar but slightly different configuration is needed.
This can be accomplished with the columns key. It specifies how
each of the separated fields of the path file should be interpreted.
For example, a bulk.yml file specifying:
---
path: "files.tsv"
columns:
- name
- path
along with a files.tsv of the form:
import-1 fileA
import-2 fileB
would match the two calls:
$ omero import --name import-1 fileA
$ omero import --name import-2 fileB
but in a single call. The same could be achieved with this CSV file:
import-1,fileA
import-2,fileB
Other options like target can also be added as a separate field:
Dataset:name:training-set import-1 fileA
Dataset:name:training-set import-2 fileB
Dataset:name:test-set-001 import-3 fileC
by defining columns in your bulk.yml as:
columns:
- target
- name
- path
which will create the named datasets if they do not exist. See Import targets for more information on import targets and see below for more examples of options you can use.
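The fields of a TSV file are separated by tab characters, which are easy to lose when copying text from a web page or an editor. One way to avoid that is to generate the file with printf. The following sketch uses the target, name and file values from the example above and adds a quick check of the field count:

```shell
# Write one import per line; \t emits a literal tab between fields.
# printf reuses the format string for each group of three arguments.
printf '%s\t%s\t%s\n' \
    'Dataset:name:training-set' 'import-1' 'fileA' \
    'Dataset:name:training-set' 'import-2' 'fileB' \
    'Dataset:name:test-set-001' 'import-3' 'fileC' > files.tsv

# Matching bulk configuration: the field order follows the columns list.
cat > bulk.yml <<'EOF'
---
path: "files.tsv"
columns:
  - target
  - name
  - path
EOF

# Sanity check: every line should have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' files.tsv && echo "files.tsv OK"
```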
Include
The include key specifies another bulk YAML file that should be
included in the current processing. For example, if there is a global
configuration file omero-imports.yml that all users should use,
such as:
---
checksum_algorithm: "File-Size-64"
exclude: "clientpath"
transfer: "ln_s"
then users can make use of this configuration by adding the following
line to their bulk.yml file:
include: /etc/omero-imports.yml
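As an illustration, a user's bulk.yml might combine the shared settings with its own path key. The sketch below writes the shared file to a local shared/ directory rather than /etc so that it can be tried without administrator rights; that location is an assumption made for the example.

```shell
# Shared settings (a stand-in for /etc/omero-imports.yml).
mkdir -p shared
cat > shared/omero-imports.yml <<'EOF'
---
checksum_algorithm: "File-Size-64"
exclude: "clientpath"
transfer: "ln_s"
EOF

# Per-user configuration: pull in the shared settings, then add a file list.
# The heredoc is unquoted so that $PWD expands to an absolute include path.
cat > bulk.yml <<EOF
---
include: "$PWD/shared/omero-imports.yml"
path: "my-files.txt"
EOF
```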
Dry_run
The dry_run key can be set to true, in which case no import
will occur and only the potential actions will be shown.
Alternatively, it can be set to a file path of the form
my_import_%s.sh, where %s will be replaced by a number,
and a file with the resulting name will be written out. Each of these
scripts can then be used independently.
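A sketch of the file-path form follows. It assumes that the generated files are ordinary shell scripts, as the .sh suffix suggests, and that they are written to the current directory; the loop simply picks up whatever my_import_*.sh files exist.

```shell
# Ask for one script per import instead of performing the imports.
cat > bulk.yml <<'EOF'
---
path: "my-files.txt"
dry_run: "my_import_%s.sh"
EOF

# After running `omero import --bulk bulk.yml`, review and run each script.
for script in my_import_*.sh; do
    [ -e "$script" ] || continue   # nothing generated yet: skip the literal glob
    echo "Running $script"
    sh "$script"
done
```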
Other options
Otherwise, all the regular options from the CLI are available for
configuration via --bulk:
checksum_algorithm for faster processing of large files
continue for processing all files even if one errors
exclude for skipping files that have already been imported
parallel_fileset for concurrent imports
parallel_upload for concurrent uploads
target for placing imported images into specific containers
transfer for alternative methods of shipping files to the server
See Import images for more information.
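For instance, several of these options can be combined in one configuration. In the sketch below, the values for checksum_algorithm, exclude and transfer are the ones used earlier in this section; the boolean value given for continue is an assumption about how that option is written in YAML.

```shell
cat > bulk.yml <<'EOF'
---
continue: true                       # assumed boolean form; keep going if one import fails
checksum_algorithm: "File-Size-64"   # faster processing of large files
exclude: "clientpath"                # skip files that were already imported
transfer: "ln_s"                     # alternative transfer method
path: "files.tsv"
columns:
  - target
  - name
  - path
EOF
```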