Starting an analysis using ``auto_process setup``
=================================================

The ``setup`` command is used to start a new analysis: it
initialises a new analysis directory for processing a
sequencing run. Most subsequent ``auto_process`` commands
(for example ``make_fastqs``) are normally issued from within
the analysis directory.

The simplest invocation of the command is:

::

   auto_process.py setup DATA_DIR

where ``DATA_DIR`` specifies the location of the data source
(i.e. the top-level directory containing the output from the
sequencer run which is to be processed). For example:

::

   auto_process.py setup /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

``DATA_DIR`` can be a local or a remote directory; see
:ref:`setup_remote_data_dir`.

By default the name of the new analysis directory will consist
of the basename of ``DATA_DIR`` with the suffix ``_analysis``
appended. For example, the command above will produce an analysis
directory called:

::

   180817_M00123_0001_000000000-BV1X2_analysis
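
The default name can be predicted in the shell before running
``setup``. This is a minimal sketch that only mirrors the naming
rule described above; it is not part of ``auto_process``, the
variable names are illustrative, and it ignores any analysis
number:

::

   # Illustrative only: reproduce the default naming rule
   DATA_DIR=/mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2
   # Basename of DATA_DIR with the "_analysis" suffix appended
   ANALYSIS_DIR="$(basename "$DATA_DIR")_analysis"
   echo "$ANALYSIS_DIR"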

To create an analysis directory with a different name, specify
it using the ``--analysis-dir`` option:

::

   auto_process.py setup DATA_DIR --analysis-dir ANALYSIS_DIR

See :doc:`Analysis Directories <../output/analysis_dirs>` for
details of the analysis directory structure.

The ``setup`` command will also report the expected outputs
based on the sample sheet associated with the sequencing run,
for example:

::

   Predicted projects:
   ===================
   - JohnBleakley
   - LauraBridges
   - MarcusDreng
   - StevenYound

   JohnBleakley (24 samples)
   -------------------------
   JB1	S11	TGCGGCGT-TACCGAGG	L3,4
   JB2	S12	CATAATAC-CGTTAGAA	L3,4
   JB3	S13	GATCTATC-AGCCTCAT	L3,4
   JB4	S14	AGCTCGCT-GATTCTGC	L3,4
   JB5	S15	CGGAACTG-TCGTAGTG	L3,4
   JB6	S16	TAAGGTCA-CTACGACA	L3,4
   ...

It will also flag up any potential issues (for example if
two project names are very similar then this might indicate
a typo in a project name in the sample sheet).

.. note::

   The output prediction can also be generated using the
   ``auto_process samplesheet`` command.

.. _setup_specifying_sample_sheet:

********************************
Specifying the sample sheet file
********************************

The ``setup`` command will try to locate the sample sheet
within the source data and will use this by default.

However, if the sequencing run doesn't include a sample
sheet file (for example, NextSeq runs), or if you want to
use an alternative sample sheet, then the
``-s``/``--sample-sheet`` option can be used to explicitly
specify the sample sheet file.

For example:

::

   auto_process.py setup \
      --sample-sheet /mnt/data/samplesheets/SampleSheet_180817.csv \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

The sample sheet can also be on a remote system, for example:

::

   auto_process.py setup \
      --sample-sheet pjb@kellerman.man.ac.uk:samplesheets/SampleSheet_180817.csv \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

or it can be a URL:

::

   auto_process.py setup \
      --sample-sheet https://example.com/samplesheets/SampleSheet_180817.csv \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

.. _setup_remote_data_dir:

********************************************
Specifying a remote sequencing run directory
********************************************

For data on a remote system which is accessible via ``ssh``,
the ``DATA_DIR`` can be specified using the general syntax

::

   [[USER@]HOST:]DATA_DIR

For example:

::

   pjb@kellerman.man.ac.uk:/mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

(The ``--sample-sheet`` option accepts the same syntax.)
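
To illustrate how the parts of this syntax relate to each other,
the following sketch splits a specification into its user, host
and directory components using POSIX shell parameter expansion.
It is an assumption-laden illustration rather than the way
``auto_process`` itself parses the argument: the variable names
are hypothetical, and URLs (which also contain ``:``) are not
handled:

::

   # Illustrative only: split [[USER@]HOST:]DATA_DIR into its parts
   SPEC=pjb@kellerman.man.ac.uk:/mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2
   case "$SPEC" in
     *:*) REMOTE="${SPEC%%:*}"; DIR="${SPEC#*:}" ;;  # text before/after first ":"
     *)   REMOTE="";            DIR="$SPEC" ;;       # no ":" means a local path
   esac
   case "$REMOTE" in
     *@*) RUSER="${REMOTE%@*}"; RHOST="${REMOTE#*@}" ;;  # USER@HOST
     *)   RUSER="";             RHOST="$REMOTE" ;;       # HOST only (or empty)
   esac
   echo "user=$RUSER host=$RHOST dir=$DIR"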

.. note::

   It is recommended that either passwordless ``ssh`` access
   is configured, or that ``ssh-agent`` is used for the
   current session, to avoid being prompted for a password
   each time the remote system is accessed.

.. _setup_specifying_facility_run_number:

**********************************
Specifying the facility run number
**********************************

The facility run number can be explicitly specified using the
``-r``/``--run-number`` option of the ``setup`` command.
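
For example, as an illustration in which the run number ``87``
is an arbitrary placeholder value:

::

   auto_process.py setup \
      --run-number 87 \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2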

.. _setup_specifying_analysis_run_number:

******************************
Specifying the analysis number
******************************

An arbitrary number can be assigned to the analysis using the
``-n``/``--analysis-number`` option of the ``setup`` command.

.. note::

   If an analysis number is assigned at setup then it will be
   appended to the analysis directory name, unless this is
   overridden by the ``--analysis-dir`` option.
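
For example, as an illustration in which the analysis number
``2`` is an arbitrary placeholder value:

::

   auto_process.py setup \
      --analysis-number 2 \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2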

.. _setup_specifying_additional_files:

***************************
Specifying additional files
***************************

If additional files are required for processing or downstream
analysis (e.g. well list files) then the ``-f``/``--file``
option of the ``setup`` command can be used to specify one or
more additional files which will be copied into the analysis
directory.

For example:

::

   auto_process.py setup \
      --file WTA_probe_allocation.xlsx \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

Files can either be local or on a remote system, or can be
specified as URLs. Multiple ``--file`` options can be specified
to import more than one file.
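
As an illustration, the following invocation imports one local
file and one file from a URL; the URL and the name
``well_list.txt`` are hypothetical placeholders:

::

   auto_process.py setup \
      --file WTA_probe_allocation.xlsx \
      --file https://example.com/files/well_list.txt \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2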

.. _setup_import_fastqs:

************************************
Setup from existing bcl2fastq output
************************************

A new analysis directory can be created from an existing
``bcl2fastq`` output directory using the ``--fastq-dir``
option, which should be used to specify the subdirectory
of the ``DATA_DIR`` which contains the output Fastq files.

For example:

::

   auto_process.py setup \
      --fastq-dir bcl2fastq2 \
      /mnt/data/seqruns/180817_M00123_0001_000000000-BV1X2

where ``bcl2fastq2`` is the output directory from the
BCL-to-Fastq conversion software, within the run data
directory ``180817_M00123_0001_000000000-BV1X2``.