For assistance with GEO submissions, please send an email to [email protected].
- What information do I need to collect for a submission of High Throughput Sequencing Data?
- What information do I need to collect for a submission of Microarray Data?
- I’ve collected, to the best of my ability at this point, all information needed for my submission. What now?
High Throughput Sequencing Data
Before asking the Bioinformatics Core to submit your sequence data to GEO for you, please read through the instructions at http://www.ncbi.nlm.nih.gov/geo/info/seq.html. If, after doing so, you feel it is best for us to handle the submission, please read the instructions below thoroughly. Timeline: Once we submit the data, the time it takes to receive the GEO number for your study is out of our control. In our experience, it generally takes up to TEN business days to receive this number from GEO. To avoid delays in your publication process, we recommend starting the GEO submission FOUR weeks ahead of time.
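The timeline above is simple date arithmetic. The Python sketch below computes the recommended start date (four weeks before a deadline) and a rough upper bound for receiving the GEO number (ten business days after the Core submits). It assumes a hypothetical manuscript deadline and ignores public holidays; the function names are illustrative only.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int) -> date:
    """Return the date n business days (Mon-Fri) after `start`.

    Public holidays are NOT skipped, so real turnaround may be longer.
    """
    current = start
    added = 0
    while added < n:
        current += timedelta(days=1)
        if current.weekday() < 5:  # weekday(): Monday == 0 ... Friday == 4
            added += 1
    return current


def recommended_start(target: date, lead_weeks: int = 4) -> date:
    """Date to begin the GEO submission, FOUR weeks before `target` by default."""
    return target - timedelta(weeks=lead_weeks)


def latest_expected_accession(submitted: date, business_days: int = 10) -> date:
    """Rough upper bound for receiving the GEO number after the Core submits."""
    return add_business_days(submitted, business_days)


if __name__ == "__main__":
    deadline = date(2024, 6, 28)  # hypothetical manuscript deadline
    print("Start GEO submission by:", recommended_start(deadline))
    print("GEO number expected by:", latest_expected_accession(date(2024, 5, 31)))
```

With a deadline of Friday, 2024-06-28, this suggests starting by 2024-05-31; if the Core submitted that day, the ten-business-day bound would fall on 2024-06-14.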
What we need from you: For all HTS submissions, we need three components:
- raw data files
- Files generated by the sequencing instrument, containing reads and quality scores
- e.g., BAM, FASTQ, CSFASTQ/QUAL, SFF, etc. NOTE: SRA prefers BAM over FASTQ whenever possible. See https://www.ncbi.nlm.nih.gov/sra/docs/submitformats/ for details.
- processed/normalized data files
- These are the file(s) you used to draw conclusions in your study. The person who analyzed your data will have this information.
- e.g., normalized abundance measurements for expression profiling data, tag density files or peak files with quantitative data for ChIP-Seq data, etc.
- a metadata worksheet containing descriptive information and protocols for experiments and samples, and information on your publication (title, authors, abstract)
- Gather all known sample information, including but not limited to:
- source
- organism
- cell type
- molecule
- strain
- description
- any clinical characteristics
- Gather protocol/labeling information from the facility that sequenced your samples (if run at MSK, that would be IGO)
- Gather data processing information from the person who analyzed your data (most likely a member of the Bioinformatics Core)
- Download the metadata template
- https://www.ncbi.nlm.nih.gov/geo/info/examples/seq_template.xlsx
- Note: this template is updated frequently by NCBI. If the link above is broken, please find the correct link at http://www.ncbi.nlm.nih.gov/geo/info/seq.html
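Before handing files over, it can help to take a quick inventory. The Python sketch below groups raw data files by format and flags blank sample fields for follow-up. It assumes your raw files follow common naming conventions (the suffix table is a hypothetical starting point, not an SRA rule) and that you keep sample information as simple dictionaries keyed by the field names listed above. A blank field is not necessarily an error, since some fields (such as strain) may not apply to your samples.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical suffix-to-format mapping for some raw formats named on this page;
# extend it for your own data. Longer suffixes are checked first.
RAW_FORMATS = {
    ".bam": "BAM",
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
    ".fastq.gz": "FASTQ",
    ".fq.gz": "FASTQ",
    ".sff": "SFF",
}

# Sample fields listed above; "any clinical characteristics" is left out because
# it is optional, and some fields (e.g. strain) may not apply to every organism.
SAMPLE_FIELDS = ("source", "organism", "cell type", "molecule", "strain", "description")


def classify_raw_file(name: str) -> Optional[str]:
    """Return the raw-data format for a file name, or None if unrecognized."""
    lower = name.lower()
    for suffix in sorted(RAW_FORMATS, key=len, reverse=True):
        if lower.endswith(suffix):
            return RAW_FORMATS[suffix]
    return None


def inventory(file_names: List[str]) -> Dict[str, List[str]]:
    """Group file names by detected format; unknown files go under 'unrecognized'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in file_names:
        groups[classify_raw_file(name) or "unrecognized"].append(name)
    return dict(groups)


def blank_fields(sample: Dict[str, str]) -> List[str]:
    """List sample fields that are missing or empty, so you can follow up on them."""
    return [f for f in SAMPLE_FIELDS if not str(sample.get(f, "")).strip()]
```

For example, `inventory(["s1_R1.fastq.gz", "s2.bam", "README.txt"])` puts `README.txt` under `"unrecognized"`, which is a prompt to check whether it belongs with the submission at all.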
Microarray Data
Before asking the Bioinformatics Core to submit your microarray data to GEO for you, please read through the instructions at https://www.ncbi.nlm.nih.gov/geo/info/submission.html. If, after doing so, you feel it is best for us to handle the submission, please read the instructions below thoroughly. Timeline: Once we submit the data, the time it takes to receive the GEO number for your study is out of our control. In our experience, it generally takes up to TEN business days to receive this number from GEO. To avoid delays in your publication process, we recommend starting the GEO submission FOUR weeks ahead of time.
Note on non-commercial arrays: If your data did not come from commercial arrays, gathering all of the information needed for a successful submission can be quite involved. In this case, it is often best to ask the people who designed the array to handle the submission for you, as they already have much of this information. If you would still rather we do the submission, please get the design file from them and also give us their contact information.
What we need from you: For commercial arrays, we need three components for GEO submission:
- raw data files (*.CEL files for Affymetrix; GEOarchive matrix file for Illumina)
- normalized data files (.CHP files for Affymetrix, or a text/Excel table of normalized values). Note: please make sure you give us the file(s) you used to draw conclusions in your study, and be ready to tell us which normalization method was used (MAS5, RMA, GCRMA, etc.). The person who analyzed your data will have this information.
- a metadata worksheet containing descriptive information and protocols for experiments and samples, and information on your publication (title, authors, abstract)
- Find out the platform:
- Which company did your arrays come from (Affymetrix, Illumina, etc.)?
- What is the name of your array (HGU133a, HumanHT-12, etc.)?
- From what organism were your samples?
- Make sure that the sample names you provide exactly match those in the raw data files
- Gather protocol/labeling information from the facility that ran your array (if run at MSK, that would be IGO)
- Download the metadata template for your array. Most commonly used templates:
If the template for your experiment is not listed here, you can find it at http://www.ncbi.nlm.nih.gov/geo/info/spreadsheet.html#GAtemplates. Note: focus only on the “Metadata Template” tab. Use the “Metadata Example” tab as a guide and fill in the template as best you can (this includes all information except the last two rows). Ignore the “Matrix Template” and “Matrix Example” tabs.
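A mismatch between the sample names in the worksheet and the raw data files is easy to miss. The Python sketch below checks for one. It assumes Affymetrix-style CEL files where each sample name is intended to equal the file name without its extension. That naming convention is an assumption, so adapt it if your files are named differently (for example, Illumina matrix column headers).

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple


def compare_names(
    sample_names: Iterable[str], raw_files: Iterable[str], suffix: str = ".CEL"
) -> Tuple[List[str], List[str]]:
    """Return (names with no matching file, files with no matching name).

    The suffix is matched case-insensitively, but the sample name itself must
    match the file's base name exactly, including case.
    """
    stems = set()
    for path in raw_files:
        base = PurePath(path).name
        if base.lower().endswith(suffix.lower()):
            stems.add(base[: -len(suffix)])  # strip the extension
    names = set(sample_names)
    return sorted(names - stems), sorted(stems - names)
```

For instance, `compare_names(["ctrl_1", "treat_1"], ["data/ctrl_1.CEL", "data/Treat_1.cel"])` reports `treat_1` as a name without a file and `Treat_1` as a file without a name, because the two differ only in capitalization.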
Start the submission process
Timeline: Once we submit the data, the time it takes to receive the GEO number for your study is out of our control. In our experience, it generally takes up to TEN business days to receive this number from GEO. To avoid delays in your publication process, we recommend starting the GEO submission FOUR weeks ahead of time.
Contact: To begin the process of having the Bioinformatics Core handle your GEO submission, send an email to [email protected] with the following information:
- Cost center
- Fund number
- Location of the data files described above (if they are on a shared MSKCC server). If the data were generated at MSKCC, you can get this information from either IGO or BIC.
- If your data is not on a shared server, let us know and we will arrange a time for you to bring it to us on a CD, flash drive, etc.
Please do NOT try to send us your data via email. These files are very large and must be delivered to us via server or disk.
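Because the data must reach us via server or disk, it can help to tell us roughly how many files and how much data to expect. The Python sketch below produces that summary. It assumes your files sit under a single directory you can read. The summary is an optional courtesy, not one of the required items listed above.

```python
import os
from typing import Tuple


def summarize_directory(root: str) -> Tuple[int, int]:
    """Return (number of files, total size in bytes) under `root`, recursively."""
    n_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if os.path.isfile(path):  # skips broken symlinks
                n_files += 1
                total_bytes += os.path.getsize(path)
    return n_files, total_bytes


def human_readable(n_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


if __name__ == "__main__":
    import sys

    count, size = summarize_directory(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{count} files, {human_readable(size)}")
```

Run it with the data directory as its only argument, and paste the printed line into your email alongside the data location.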