FS configuration options
Background
Users import their image files to the OMERO.fs server. The contents of these files are kept intact by the server and the import process preserves the files’ path and name (at least within the rules of omero.fs.repo.path_rules below), so that OMERO.fs can become a trusted repository for the master copy of users’ data. While the default server configuration described in the Configuration properties glossary should typically suffice, omero config set may be used to adjust settings related to file uploads. These settings are explained below.
Repository location
Several properties determine where FS-imported files are stored:
omero.data.dir
- singleton property (i.e. set once globally) which points to the legacy repository location for OMERO. For OMERO to run on multiple systems, the contents of this directory must be on a shared volume.
omero.managed.dir
- singleton property which points to the default ManagedRepository. In an OMERO install in which there is only one Blitz server, this will be the only repository. This need not be located under omero.data.dir but is by default.
omero.repo.dir
(experimental) - value passed to all non-legacy, standalone repositories. This is not actively used, but would allow hosting repositories on multiple physical systems without the need for a shared volume. For example, after running omero admin start on the main machine, it would be possible to launch nodes on various machines via omero node start fs-B, omero node start fs-C, etc. Each of these would pass a different omero.repo.dir value to its process.
Template path
When files are uploaded to the managed repository, a parent directory is created to receive the upload. A multi-file image has all its files stored in the same parent directory, though they may be in different subdirectories of that parent to mirror the original directory structure before upload. The omero.fs.repo.path setting defines the creation of that parent directory. It is this value which makes the ManagedRepository “managed”.
Path naming constraints
There is some flexibility in how this parent directory is named. The constraints are:
- The path components (individual directories in the path) must be separated by / characters.
- A path component separator may be written as // only if followed by at least one more path component. In this case:
  - The server ensures that the path components preceding the // are owned by the root user.
  - Any newly created path components following the // are owned by the user who owns the images.
- If no // is present then all newly created path components are owned by the user who owns the images.
- The path must be unique for each import. It is for this reason that the %time% term expands to a time with millisecond resolution.
- To avoid confusion with the expansion terms enumerated below, avoid other uses of the % character in path components.
In the above, ownership of path components is in the context of OMERO users accessing the OMERO managed repository through its API. It does not relate to operating system users’ permissions for the underlying filesystem.
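The ownership rule around // can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name split_template is hypothetical, the template string is only an illustration, and splitting on the first // is an assumption, since the constraints above do not say how more than one // would be treated.

```python
def split_template(template):
    """Split a template path into (root_owned, user_owned) path components.

    Hypothetical helper for illustration only. Assumption: only the first
    '//' in the template is treated as the ownership boundary.
    """
    if "//" in template:
        root_part, user_part = template.split("//", 1)
    else:
        # Without '//' every newly created component is user-owned.
        root_part, user_part = "", template
    root_owned = [c for c in root_part.split("/") if c]
    user_owned = [c for c in user_part.split("/") if c]
    if "//" in template and not user_owned:
        # '//' is only allowed when at least one more component follows it.
        raise ValueError("'//' must be followed by at least one path component")
    return root_owned, user_owned


# Illustrative template: one root-owned component, three user-owned ones.
root_owned, user_owned = split_template(
    "%user%_%userId%//%year%-%month%/%day%/%time%"
)
print(root_owned)  # components owned by the root user
print(user_owned)  # components owned by the user who owns the images
```

As noted above, this ownership is in the sense of OMERO users, not operating system permissions.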
Expansion terms
Special terms may be used within path components: these are replaced with text that depends on the import.
For any directory in the template path
%userId%
expands to the user’s numerical ID
%user%
expands to the user’s name
%institution%
expands to the user’s institution name; this path component is wholly omitted if the user has no institution set
%institution:default%
expands to the user’s institution name, or to the supplied “default” if the user has no institution set; for instance, %institution:State College of Florida, Manatee-Sarasota% is permitted
%groupId%
expands to the OMERO group’s numerical ID
%group%
expands to the OMERO group’s name
%perms%
expands to the group’s six-character permissions string, for example rw---- for a private group
%year%
expands to the current year number, for example 2014
%month%
expands to the current month number, zero-padded, for example 08
%monthname%
expands to the current month name, for example August
%day%
expands to the current day number in the month, zero-padded, for example 04
%sessionId%
expands to the session’s numerical ID
%session%
expands to the session key (UUID) of the session, for example 6c2dae43-cfad-48ce-af6f-025569f9e6df
%thread%
expands to the name of the server thread that is performing the import
For user-owned directories only
These expansion terms may not precede // in the template path.
%time%
expands to the current time, in hours, minutes, seconds, milliseconds, for example 13-49-07.727
%hash%
expands to an eight-digit hexadecimal hash code that is constant for the set of files being imported, for example 0554E3A1
%hash:digits%
expands as %hash%, where digits is a comma-separated list of how many digits of the hash to use in different subdirectories; for example, hash-%hash:3,3,2% expands to a form like hash-123/456/78
%increment%
expands to an integer that increases consecutively so as to create the next new directory, for example using inc-%increment% with preexisting directories up to inc-24 would expand to inc-25
%increment:digits%
expands as %increment% where digits specifies a minimum length to which to zero-pad the integer, for example using inc-%increment:3% with preexisting directories up to inc-024 would expand to inc-025
%subdirs%
expands to nothing until the preceding directory has more than one thousand entries, in which case it expands to an integer that increases consecutively to similarly limit the entry count in subdirectories; applies recursively to extend the number of path components as needed, so, using example/below-%subdirs% in the path, with example/below-000 to example/below-999 all “full”, three-digit subdirectories below those are created, such as example/below-123/456
%subdirs:digits%
expands as %subdirs% where digits specifies to how many digits %subdirs% may expand for each path component: for example, example/%subdirs:4%-below allows ten thousand directory entries in example before creating example/1234-below and, much later, example/1234-below/5678
No more than one of %time%, %subdirs% or %increment% may be used in any one path component, although they may each be used many times in the whole path. If %subdirs% expands to nothing then its entire path component is omitted: no other expansion terms in that component are used.
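To make the %hash:digits% and %increment:digits% descriptions more concrete, here is a rough Python sketch that reproduces the documented examples. The function names are hypothetical and this is not the server's code. It assumes that hash digits are consumed left to right, and that counting starts at 1 when no matching directory exists; neither detail is stated above.

```python
import re


def expand_hash(hash_hex, digits=""):
    """Mimic %hash% / %hash:digits% for an already computed hash string.

    Assumption: digits are taken from the left of the hash, in order.
    """
    if not digits:
        return hash_hex
    parts = []
    position = 0
    for count in (int(d) for d in digits.split(",")):
        parts.append(hash_hex[position:position + count])
        position += count
    return "/".join(parts)


def next_increment(existing_names, prefix, min_digits=1):
    """Mimic %increment% / %increment:digits% given sibling directory names.

    Assumption: numbering starts at 1 if no directory matches the prefix.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    highest = 0
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:
            highest = max(highest, int(match.group(1)))
    # Zero-pad to at least min_digits characters.
    return f"{prefix}{highest + 1:0{min_digits}d}"


print("hash-" + expand_hash("12345678", "3,3,2"))
print(next_increment(["inc-23", "inc-24"], "inc-"))
print(next_increment(["inc-023", "inc-024"], "inc-", min_digits=3))
```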
Legal file names
Although OMERO.fs attempts to preserve file naming, the server’s operating system or file system is likely to constrain which file names may be stored by OMERO.fs. This is of particular concern when a user may upload from a more permissive system to a server on a less permissive system, or when it is anticipated that the server itself may be migrated to a less permissive system. The server never accepts Unicode control characters in file names.
The omero.fs.repo.path_rules setting defines the combination of restrictions that the server must apply in accepting file uploads. The restrictions are grouped into named sets:
Windows required
prohibits names with the characters ", *, /, :, <, >, ?, \, |, names beginning with $, the names AUX, CLOCK$, CON, NUL, PRN, COM1 to COM9, LPT1 to LPT9, and anything beginning with one of those names followed by .
Windows optional
prohibits names ending with . or a space
UNIX required
prohibits names with the character /
UNIX optional
prohibits names beginning with . or -
These rules are applied to each separate path component of the file name on the client’s system. So, for instance, an upload of a file /tmp/myfile.tif from a Linux system would satisfy the UNIX required restrictions because neither of the path components tmp and myfile.tif contains a / character.
Applying the “optional” restrictions does not assist OMERO.fs at all; those restrictions are designed to ease manual maintenance of the directory specified by the omero.managed.dir setting, which is where the server stores users’ uploaded files.
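The named rule sets can be expressed as simple per-component checks. The sketch below is an illustration written from the descriptions above, not the server's implementation. All function and constant names are hypothetical, and matching the reserved Windows names case-insensitively is an assumption.

```python
WINDOWS_FORBIDDEN_CHARS = set('"*/:<>?\\|')
WINDOWS_RESERVED_NAMES = (
    {"AUX", "CLOCK$", "CON", "NUL", "PRN"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def violates_windows_required(component):
    if any(ch in WINDOWS_FORBIDDEN_CHARS for ch in component):
        return True
    if component.startswith("$"):
        return True
    upper = component.upper()  # assumption: case-insensitive comparison
    return any(
        upper == name or upper.startswith(name + ".")
        for name in WINDOWS_RESERVED_NAMES
    )


def violates_windows_optional(component):
    return component.endswith(".") or component.endswith(" ")


def violates_unix_required(component):
    return "/" in component


def violates_unix_optional(component):
    return component.startswith(".") or component.startswith("-")


def rejected_components(components, checks):
    """Return the path components that break any of the given rule checks."""
    return [c for c in components if any(check(c) for check in checks)]


# The client-side path /tmp/myfile.tif, already split into its components.
print(rejected_components(["tmp", "myfile.tif"], [violates_unix_required]))
# A name that is acceptable on Linux but reserved on Windows.
print(rejected_components(["data", "aux.txt"], [violates_windows_required]))
```

Because the checks run on individual components, the client's own path separator never appears in the strings being tested.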
Checksum algorithm
As the client uploads each file to the server, it calculates a checksum for the file. After the upload is complete the client reports that checksum to the server. The server then calculates the checksum for the corresponding file from its local filesystem and checks that it matches what the client reported. File integrity is thus assured because corruption during transmission or writing would be revealed by a checksum mismatch.
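The verification step can be sketched in Python as follows. This does not use the OMERO API: the helper names are hypothetical, a local file copy stands in for the upload, and SHA-1 from the standard hashlib module is used purely as an example algorithm.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Compute a hex checksum of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_intact(client_checksum, server_path, algorithm="sha1"):
    """Server-side check: recompute from local storage and compare."""
    return file_checksum(server_path, algorithm) == client_checksum


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "client.tif")
    stored = os.path.join(tmp, "server.tif")
    with open(source, "wb") as handle:
        handle.write(b"example pixel data")

    client_checksum = file_checksum(source)  # what the client reports
    shutil.copyfile(source, stored)          # stand-in for the upload
    intact = upload_is_intact(client_checksum, stored)

    # Simulate corruption while writing: append a stray byte.
    with open(stored, "ab") as handle:
        handle.write(b"\x00")
    corruption_detected = not upload_is_intact(client_checksum, stored)

print(intact, corruption_detected)
```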
There are various algorithms by which checksums may be calculated. The list of available algorithms is given by omero.checksum.supported. To calculate comparable checksums the client and server use the same algorithm. The server API permits clients to specify the algorithm, but it is expected that they will typically accept the server default.
The number that suffixes each of the checksum algorithm names specifies the bit width of the resulting checksum. A larger bit width makes it less likely that different files will have the same checksum by coincidence, but lengthens the checksum hex strings that are reported to the user and stored in the hash column of the originalfile table in the database.
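Since each hexadecimal digit encodes four bits, the length of the stored hex string follows directly from the bit-width suffix. The short sketch below shows the arithmetic. The helper name is hypothetical, and the algorithm names passed to it are examples of the suffix convention only; the authoritative list is whatever omero.checksum.supported reports.

```python
import hashlib


def hex_length(algorithm_name):
    """Hex string length implied by a name ending in '-<bit width>'."""
    bits = int(algorithm_name.rsplit("-", 1)[1])
    return bits // 4  # four bits per hexadecimal digit


# Example names following the suffix convention (assumed for illustration).
print(hex_length("SHA1-160"))
print(hex_length("MD5-128"))

# Cross-check against the digests produced by Python's hashlib.
print(len(hashlib.sha1(b"").hexdigest()))
print(len(hashlib.md5(b"").hexdigest()))
```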