Moving data to the archive location using `auto_process archive`

Once the initial processing and QC have been completed and approved, the data can be copied to its final location using the archive command.

Note

In this context the “archive” location is where the data should be stored and accessed by bioinformaticians for subsequent analyses.

Archiving is a two stage process:

The default command:
```
auto_process.py archive
```
copies the final data (a subset of the data in the working directory) to a “staging” directory in the archive location. Multiple invocations synchronise the staging directory with the current state of the working directory.
The command:
```
auto_process.py archive --final
```
updates the staging area and copies the data to the final location; it also performs actions such as setting the group and permissions on the final data.

Once --final has been used subsequent archive commands cannot be used.

The staging directory name is the name of the analysis directory with a double underscore prepended and with the suffix .pending; for example:

__180817_M00123_0001_000000000-BV1X2_analysis.pending

The final directory has the same name as the analysis directory.

By default archive inserts two additional directory levels to the final destination, to create a YEAR/PLATFORM hierarchary. For example, if the archive location was /mnt/archive/ then the full path to the staging directory would look like

/mnt/archive/2018/miseq/__180817_M00123_0001_000000000-BV1X2_analysis.pending

and the final location would be

/mnt/archive/2018/miseq/180817_M00123_0001_000000000-BV1X2_analysis

Logging archived runs

If a run logging file is defined either within the config file, for example:

[archive]
...
logging_file = /data/archive/runs/SEQ_DATA.log

or using the --logging_file option of the archive command, then an entry for the run will be added to this file automatically when it is archived to its final location.

The logging is performed by the `log_seq_data.sh <https://genomics-bcftbx.readthedocs.io/en/latest/usage/general_utils.html#logging-details-of-sequencing-runs`_ utility script, which is part of the genomics-bcftbx package.

Archiving failed runs

By default it is not possible to archive an analysis directory which doesn’t have any project directories.

However in some cases it might be desirable to archive an incomplete analysis directory, for example if the original run had failed.

In this case the --force option of the archive command can be used to force archiving of the analysis directory, provided that a bcl2fastq output subdirectory (or similar) also exists.

Moving data to the archive location using auto_process archive

Logging archived runs

Archiving failed runs

Moving data to the archive location using `auto_process archive`