Projects metadata file: projects.info
The projects.info
file is used to define the projects within
the analysis directory, and associates metadata information with
each project.
It is initially created when the make_fastqs command completes, and acts as a control file for subsequent operations including:
setup_analysis_dirs (used to populate the project directories)
publish_qc (only publishes data for projects that are listed in
projects.info
)
projects.info
is a tab delimited file with one line for each
project, with the following fields:
Field |
Data item |
---|---|
|
Name of the project |
|
List of sample names associated with the project |
|
Name(s) of the researcher(s) directly associated with the project |
|
Library or application type (e.g. |
|
Single cell platform used to prepare the samples |
|
Organism(s) that the samples in the project
came from (e.g. |
|
Name(s) of the principal investigator(s) (PIs) associated with the project |
|
Free text field for any additional comments |
The project name and sample names are normally added automatically;
the remaining fields must be populated manually by editing the
file. The following fields are compulsory for
setup_analysis_dirs
:
User
Library
Organism
PI
The following conventions are used for data item syntax:
“Null” values are represented by a full stop (i.e.
.
)Unknown values should be represented with a question mark (i.e.
?
)Where there are multiple users, PIs or organisms, they should be separated by comma characters (e.g.
Human,mouse
)Projects preceeded by a comment character (i.e.
#
) are ignored
Currently there are no canonical lists of allowed values for libraries or organism names, however:
Organism name(s) are used to look up appropriate reference data files for the QC pipeline in the configuration file, so names that are used should be consistent with the configuration;
Some combinations of single cell platform and library type are used to determine the appropriate QC protocol for the data, in which case the single cell platform and library values must be a valid combination; Single cell and spatial data below.
Note
The organism names are converted to lower case and non-alphabetic
characters are converted to underscores (_
) when looking up
names in the configuration: for example, Mouse
is converted
to mouse
, D. Melanogaster
is converted to
d_melanogaster
)
Single cell and spatial data
For single cell and spatial data values in the library type
(Library
) and single cell platform (SC_platform
) fields have
special significance.
For more information on the valid options for specific types of data see:
Note
Use .
for the single cell platform for projects that don’t have
single cell or spatial data.