Analysis and Project Directories ================================ ******************** Analysis directories ******************** The top-level analysis directory is created by the :doc:`setup command <../using/setup>`, and typically will have a name based on the sequencing run name with the suffix ``_analysis``, for example: :: 180817_M00123_0001_000000000-BV1X2_analysis The analysis directory will contain the following files and directories produced by the ``auto_process`` commands: .. table:: :widths: auto ========================== ================================== ============ **File or Directory** **Description and contents** **Stage** -------------------------- ---------------------------------- ------------ auto_process.info Parameter file used for processing :doc:`setup <../using/setup>` metadata.info Metadata for the run :doc:`setup <../using/setup>` logs/ Log files from each processing :doc:`setup <../using/setup>` command ScriptCode/ Directory for custom user scripts :doc:`setup <../using/setup>` SampleSheet.orig.csv Copy of the original sample sheet :doc:`setup <../using/setup>` file custom_SampleSheet.csv Updated version of sample sheet :doc:`setup <../using/setup>` used for processing primary_data/ Raw sequencing data :doc:`make_fastqs <../using/make_fastqs>` processing_qc.html Processing QC report :doc:`make_fastqs <../using/make_fastqs>` per_lane_stats.info Per-lane statistics :doc:`make_fastqs <../using/make_fastqs>` per_lane_sample_stats.info Per-lane statistics for samples :doc:`make_fastqs <../using/make_fastqs>` statistics_full.info Per-Fastq statistics :doc:`make_fastqs <../using/make_fastqs>` statistics.info Per-Fastq statistics :doc:`make_fastqs <../using/make_fastqs>` projects.info Metadata for all projects :doc:`make_fastqs <../using/make_fastqs>` / Output from ``bcl2fastq`` :doc:`make_fastqs <../using/make_fastqs>` (can be set explicitly using the ``--output-dir`` option) / Project directory (one for each :doc:`setup_analysis_dirs <../using/setup_analysis_dirs>` project defined in ``projects.info``) undetermined/ Project directory for undetermined :doc:`setup_analysis_dirs <../using/setup_analysis_dirs>` reads README.txt Text file with user notes on the :ref:`commands_readme` run (e.g. unusual processing steps) ========================== ================================== ============ --------------------------- Analysis directory metadata --------------------------- Each analysis has additional data items associated with it which are stored in the ``metadata.info`` file. The most commonly used metadata items are listed in the table below: .. table:: :widths: auto ====================== ======================================== **Item** **Description** ---------------------- ---------------------------------------- ``run_number`` Facility-assigned identifier which can differ from the instrument run number ``source`` Source of the sequencing data, for example the name of the facility, institution or service that provided it ``platform`` The sequencing platform (e.g. ``miseq``) ``bcl2fastq_software`` Location and version of the package used to perform the Fastq generation ====================== ======================================== The full set of metadata items and values for can be viewed using the ``metadata`` command: :: auto_process.py metadata [ANALYSIS_DIR] This metadata is not required for processing, but should be set before the QC is published and the analysis is completed. Items can be set or updated using the ``--set`` option of the ``metadata`` command, for example: :: auto_process.py metadata --set run_number=88 ---------------- Run reference ID ---------------- Each analysis directory has a "run reference ID" which is generated automatically from the associated metadata (specifically the platform, run datestamp, instrument run number and facility run number). The general form of the reference ID is: :: PLATFORM_DATESTAMP[/INSTRUMENT_RUN_NUMBER]#FACILITY_RUN_NUMBER The instrument run number is included if it differs from the facility run number (or if the facility run number is not supplied). For example: :: HISEQ4000_181029/88#72 is a run from a HiSeq 4000 instrument with datestamp ``181029`` and instrument run number ``88``; the facility assigned run ID ``72`` to the run as its local identifier. :: MISEQ_180912#3 is a run a MiSeq instrument with datestamp ``180912``, where both the instrument and facility run numbers are ``3``. ******************* Project directories ******************* Project directories are created within the analysis directory by the :doc:`setup command <../using/setup>` command, based on the contents of the ``projects.info`` file. Each project directory will contain the following files and directories: ========================== ====================================== **File or Directory** **Description and contents** -------------------------- -------------------------------------- README.info Project metadata fastqs/ Fastq files ScriptCode/ Directory for custom user scripts qc/ QC pipeline outputs qc_report.html :doc:`QC report ` qc_report.PROJECT.RUN.zip ZIP file containing all QC outputs and reports multiqc_report.html ``multiqc`` outputs multiqc_report_data/ Data associated with ``multiqc`` cellranger_count/ Full outputs from ``cellranger count`` single library analyses (10xGenomics projects only) ========================== ====================================== -------------------------- Project directory metadata -------------------------- Each analysis project has additional data items associated with it which are stored in project's ``README.info`` file. The most commonly used metadata items are listed in the table below: .. table:: :widths: auto ======================== ========================================= **Item** **Description** ------------------------ ----------------------------------------- ``Name`` Project name ``Run`` Parent run name ``Platform`` Sequencing platform (e.g. ``novaseq600``) ``Sequencer model`` Model of sequencer used (e.g. ``NovaSeq 6000``) ``User`` Name of the user(s) ``PI`` Name of PI(s) ``Organism`` Organism name(s) ``Library type`` The type of experiment (e.g. ``RNA-seq``) ``Single cell platform`` Single cell platform, if applicable ``Number of cells`` Number of cells (single cell only) ``Paired_end`` Whether the data are single- or paired- end ``Primary fastqs`` Subdirectory holding the 'primary' set of Fastq files for the project ``Samples`` Number and list of sample names for physical samples ``Biological samples`` List of subset of physical samples with biological data (if not set then all samples assumed to contain biological information) ``Multiplexed samples`` List of sample names for multiplexed for 10x Genomics CellPlex and Flex data and Parse Evercode data (if not set then there are no multiplexed samples; if set to ``?`` then there are multiplexed samples but the names are not known) ``Comments`` Any additional comments about the project ======================== ========================================= Typically most of the values are populated at setup time from the contents of the ``projects.info`` file (see :doc:`Setting up analysis directories <../using/setup_analysis_dirs>`), with the others being set automatically (for example after running single cell analyses). ----------------------------------- Multiple Fastq sets within projects ----------------------------------- Normally each project will only have one set of Fastq files associated with it, and these will be in the ``fastqs`` subdirectory of the project directory. However some analyses may have more than one sets of associated Fastqs, and in these cases there will be multiple subdirectories (each of which contains one of these sets). The project metadata file includes the item ``Primary fastqs`` which indicates which of the Fastq sets is the principal one. ********************************** ``undetermined`` project directory ********************************** This is a special project that is created for storage and QC of the reads which couldn't be assigned to any samples by the Fastq generation.