Bulk imports
============

The CLI import option ``--bulk`` specifies a configuration file that
can be used to perform a batch of imports with the same or similar
options. The file is written in a simple YAML syntax and can have any
name. It does not need to be located in the directory from which the
OMERO commands are run.

A minimal YAML file might look like: ::

    ---
    path: "my-files.txt"

Assuming that :file:`my-files.txt` is a list of files such as ::

    fileA
    fileB
    directoryC

this is equivalent to: ::

    $ omero import fileA fileB directoryC

where the files :file:`fileA` and :file:`fileB` and all the files of
:file:`directoryC` will be imported.

Bulk-only options
-----------------

Path
^^^^

The ``path`` key specifies a file from which *each individual line*
will be processed as a separate import. In the simplest case, a single
file is listed per line as above. For more complex usages, ``path``
can point to a tab-separated values (TSV) or comma-separated values
(CSV) file where each field will be interpreted based on ``columns``.

Columns
^^^^^^^

When importing many files, each file often needs a similar but
slightly different configuration. This can be accomplished with the
``columns`` key, which specifies how each of the separated fields of
the ``path`` file should be interpreted.

For example, a :file:`bulk.yml` file specifying: ::

    ---
    path: "files.tsv"
    columns:
    - name
    - path

along with a :file:`files.tsv` of the form: ::

    import-1	fileA
    import-2	fileB

would match the two calls: ::

    $ omero import --name import-1 fileA
    $ omero import --name import-2 fileB

but in a single call. The same could be achieved with this CSV file: ::

    import-1,fileA
    import-2,fileB

Other options like ``target`` can also be added as a separate field: ::

    Dataset:name:training-set	import-1	fileA
    Dataset:name:training-set	import-2	fileB
    Dataset:name:test-set-001	import-3	fileC

by defining ``columns`` in your :file:`bulk.yml` as: ::

    columns:
    - target
    - name
    - path

which will create the named datasets if they do not exist.
See :doc:`import-target` for more information on import targets
and see below for more examples of options you can use.

Include
^^^^^^^

The ``include`` key specifies another bulk YAML file that should be
included in the current processing. For example, if there is a global
configuration file :file:`omero-imports.yml` that all users should use,
such as: ::

    ---
    checksum_algorithm: "File-Size-64"
    exclude: "clientpath"
    transfer: "ln_s"

then users can make use of this configuration by adding the following
line to their :file:`bulk.yml` file: ::

    include: /etc/omero-imports.yml
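
Putting the pieces together, a user's :file:`bulk.yml` might look like
the sketch below. The file names are the ones used in the examples
above. How a key set in both the including and the included file is
resolved is not described here, so the sketch avoids setting the same
key in both places: ::

    ---
    # Shared settings (checksum_algorithm, exclude, transfer) come from here
    include: /etc/omero-imports.yml
    # Per-user settings
    path: "files.tsv"
    columns:
    - name
    - path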

Dry_run
^^^^^^^

The ``dry_run`` key can be set to ``true``, in which case no import
will occur and only the potential actions will be shown. Alternatively,
it can be set to a file path of the form ``my_import_%s.sh``, where
``%s`` will be replaced by a number and a file with the resulting name
will be written out. Each of these scripts can then be used
independently.
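
As an illustration, the two forms might be written as follows in a
:file:`bulk.yml`. Only one ``dry_run`` key should appear in a given
file, so the second form is shown commented out. The script name is
just the placeholder pattern used above: ::

    ---
    path: "my-files.txt"
    # Form 1: show the potential actions without importing anything
    dry_run: true
    # Form 2: write numbered scripts instead of only showing the actions
    # dry_run: "my_import_%s.sh"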

Other options
-------------

Beyond the bulk-only keys, all the regular options from the CLI are
available for configuration via ``--bulk``:

- ``checksum_algorithm`` for faster processing of large files
- ``continue`` for processing all files even if one errors
- ``exclude`` for skipping files that have already been imported
- ``parallel_fileset`` for concurrent imports
- ``parallel_upload`` for concurrent uploads
- ``target`` for placing imported images into specific containers
- ``transfer`` for alternative methods of shipping files to the server

See :doc:`import` for more information.
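
A sketch of a :file:`bulk.yml` combining several of these keys is shown
below. The values for ``checksum_algorithm``, ``exclude``, ``transfer``
and ``target`` are taken from the examples above. The values for
``continue``, ``parallel_fileset`` and ``parallel_upload`` are
illustrative assumptions and should be checked against the CLI help
before use: ::

    ---
    path: "my-files.txt"
    checksum_algorithm: "File-Size-64"
    exclude: "clientpath"
    transfer: "ln_s"
    target: "Dataset:name:training-set"
    # Assumed values, not taken from this page
    continue: true
    parallel_fileset: 2
    parallel_upload: 8

Assuming the file is saved as :file:`bulk.yml`, it would be passed to
the importer with ``omero import --bulk bulk.yml``.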
