Bulk imports
The CLI import option --bulk specifies a configuration file that can be used to perform a batch of imports with the same or similar options. The file is written in a simple YAML syntax and can be named whatever you would like. It does not need to be placed in the folder from which the OMERO commands are run.
A minimal YAML file might look like:
---
path: "my-files.txt"
Assuming that my-files.txt is a list of files such as
fileA
fileB
directoryC
this is equivalent to:
$ omero import -k --transfer=ln_s fileA fileB directoryC
where the files fileA and fileB and all the files of directoryC will be imported.
Bulk-only options
Path
The path key specifies a file in which each individual line will be processed as a separate import. In the simplest case, a single file is placed per line, as above. For more complex usages, path can point to a tab-separated value (TSV) or comma-separated value (CSV) file, where each field will be interpreted according to the columns key described below.
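A minimal Python sketch of how a path file could be split into one field list per import is shown here. It is not OMERO's own code: the helper name read_path_file and the rule of choosing the delimiter from the file extension (.tsv for tabs, .csv for commas, anything else treated as one path per line) are assumptions made for illustration.

```python
import csv
import io


def read_path_file(text, filename):
    """Split the contents of a bulk "path" file into one field list per import.

    Hypothetical sketch: the delimiter is chosen from the file extension,
    which is an assumption and not necessarily how OMERO decides.
    """
    if filename.endswith(".tsv"):
        delimiter = "\t"
    elif filename.endswith(".csv"):
        delimiter = ","
    else:
        # Plain list: each non-empty line is a single path to import.
        return [[line.strip()] for line in text.splitlines() if line.strip()]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader yields [] for a blank line; skip those rows.
    return [row for row in reader if row]


if __name__ == "__main__":
    print(read_path_file("fileA\nfileB\ndirectoryC\n", "my-files.txt"))
    print(read_path_file("import-1\tfileA\nimport-2\tfileB\n", "files.tsv"))
    print(read_path_file("import-1,fileA\nimport-2,fileB\n", "files.csv"))
```

Each inner list corresponds to one import; how its fields are used is decided by columns.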
Columns
A common requirement when importing many files is that each file needs a similar but slightly different configuration. This can be accomplished with the columns key, which specifies how each of the separated fields of the path file should be interpreted.
For example, a bulk.yml file specifying:
---
path: "files.tsv"
columns:
- name
- path
along with a files.tsv of the form:
import-1 fileA
import-2 fileB
would be equivalent to the two calls:
$ omero import --name import-1 fileA
$ omero import --name import-2 fileB
but in a single call. The same could be achieved with this CSV file:
import-1,fileA
import-2,fileB
Other options such as target can also be added as a separate field:
Dataset:name:training-set import-1 fileA
Dataset:name:training-set import-2 fileB
Dataset:name:test-set-001 import-3 fileC
by defining columns in your bulk.yml as:
columns:
- target
- name
- path
which will create the named datasets if they do not exist. See Import targets for more information on import targets, and see below for more options you can use.
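The column mapping can be sketched in Python as follows, assuming that every column other than path becomes a long option of the same name (name to --name, target to --target) and that path becomes the positional argument. The function row_to_command is hypothetical and only builds the argument list of the equivalent omero import call; it does not run an import.

```python
def row_to_command(columns, fields):
    """Turn one row of a path file into an equivalent `omero import` argv list.

    Hypothetical sketch: every column except "path" is assumed to become a
    "--<column>" option, and "path" becomes the positional argument.
    """
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    argv = ["omero", "import"]
    paths = []
    for column, value in zip(columns, fields):
        if column == "path":
            paths.append(value)
        else:
            # e.g. "name" -> "--name", "target" -> "--target"
            argv += ["--" + column.replace("_", "-"), value]
    # Positional paths go last, after all options.
    return argv + paths


if __name__ == "__main__":
    columns = ["target", "name", "path"]
    rows = [
        ["Dataset:name:training-set", "import-1", "fileA"],
        ["Dataset:name:training-set", "import-2", "fileB"],
        ["Dataset:name:test-set-001", "import-3", "fileC"],
    ]
    for row in rows:
        print(" ".join(row_to_command(columns, row)))
```

For the first row this prints omero import --target Dataset:name:training-set --name import-1 fileA. Rejecting rows whose field count differs from columns is a design choice of the sketch, so a malformed line fails loudly instead of silently shifting fields.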
Include
The include key specifies another bulk YAML file that should be included in the current processing. For example, if there is a global configuration file omero-imports.yml that all users should use, such as:
---
checksum_algorithm: "File-Size-64"
exclude: "clientpath"
transfer: "ln_s"
then users can make use of this configuration by adding the following line to their bulk.yml file:
include: /etc/omero-imports.yml
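The effect of include can be pictured as merging mappings. The sketch below assumes that a key set in the including file overrides the same key in the included file, and it takes already-parsed YAML as plain Python dicts so that no YAML library is needed. resolve_includes is a hypothetical name, and the precedence rule is an assumption rather than documented OMERO behavior.

```python
def resolve_includes(name, files, _seen=None):
    """Merge a bulk configuration with the chain of files it includes.

    `files` maps file names to already-parsed YAML mappings (plain dicts).
    Assumption: keys in the including file override the same keys in the
    included file. Include cycles are reported instead of recursing forever.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"include cycle at {name}")
    seen.add(name)
    data = dict(files[name])
    parent = data.pop("include", None)
    base = resolve_includes(parent, files, seen) if parent else {}
    # Later (including) values win over earlier (included) ones.
    return {**base, **data}


if __name__ == "__main__":
    files = {
        "/etc/omero-imports.yml": {
            "checksum_algorithm": "File-Size-64",
            "exclude": "clientpath",
            "transfer": "ln_s",
        },
        "bulk.yml": {
            "include": "/etc/omero-imports.yml",
            "path": "files.tsv",
            "columns": ["name", "path"],
        },
    }
    print(resolve_includes("bulk.yml", files))
```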
Dry_run
The dry_run key can either be set to true, in which case no import will occur and only the potential actions will be shown, or it can be set to a file path of the form my_import_%s.sh, where %s will be replaced by a number and a file with the resulting name will be written out. Each of these scripts can then be used independently.
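The script-writing mode can be sketched as below, assuming one script per import and a counter starting at 1 (the starting number OMERO uses is not stated here). write_dry_run_scripts is a hypothetical helper; each script simply contains the equivalent omero import command line, quoted with shlex.join (Python 3.8 or later).

```python
import shlex
import tempfile
from pathlib import Path


def write_dry_run_scripts(template, commands, directory=".", start=1):
    """Write one shell script per import command, naming each from `template`.

    Hypothetical sketch: "%s" in the template is replaced by a counter that
    starts at `start`; the numbering OMERO actually uses is an assumption.
    """
    written = []
    for number, argv in enumerate(commands, start=start):
        script = Path(directory) / (template % number)
        # shlex.join quotes each argument so the line is safe for a POSIX shell.
        script.write_text("#!/bin/sh\n" + shlex.join(argv) + "\n")
        written.append(script)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        commands = [
            ["omero", "import", "--name", "import-1", "fileA"],
            ["omero", "import", "--name", "import-2", "fileB"],
        ]
        for script in write_dry_run_scripts("my_import_%s.sh", commands, tmp):
            print(script.name, "->", script.read_text().splitlines()[-1])
```

Because every script holds a single command, each one can be inspected, edited or run on its own.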
Other options
Otherwise, all the regular options from the CLI are available for configuration via --bulk:
checksum_algorithm for faster processing of large files
continue for processing all files even if one errors
exclude for skipping files that have already been imported
parallel_fileset for concurrent imports
parallel_upload for concurrent uploads
target for placing imported images into specific containers
transfer for alternative methods of shipping files to the server
See Import images for more information.
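These keys mirror command-line options of omero import. A minimal Python sketch of that correspondence follows, assuming that underscores in a key become dashes in the flag (parallel_fileset to --parallel-fileset) and that a value of true becomes a bare flag. options_to_flags is a hypothetical helper, and the example values are illustrative only.

```python
def options_to_flags(options):
    """Translate bulk-file keys into `omero import` command-line flags.

    Hypothetical sketch of the correspondence: underscores become dashes,
    True becomes a bare flag, False/None are dropped, and any other value
    is passed (as a string) as the flag's argument.
    """
    argv = []
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    print(options_to_flags({
        "checksum_algorithm": "File-Size-64",
        "continue": True,
        "exclude": "clientpath",
        "transfer": "ln_s",
    }))
```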