Due Friday 01 November
Your answers to all questions should be submitted to myUni as a .zip file containing four bash scripts, a 5_final_assembly folder including the files required in Step 5, a screenshot file showing alternative splicing events from Step 6, and a text file including your answers to the theoretical questions. [1 mark]
The .zip filename must start with your student number, and each bash script must be able to run without errors.
Meaningful comments are strongly advised. [1 mark]
For all scripts, please use the directory ~/Assignment6 as the parent directory for all downloads and analysis.
You will be expected to hard code this into your scripts.
Use an organised folder structure, such as the one below, to store the files generated in this assignment. [1 mark]
Assignment6/
├── data
├── DB
├── results
│ ├── 1_QC
│ ├── 2_clean_data
│ ├── 3_denovo_assembly
│ ├── 4_genome_guided_assembly
│ └── 5_final_assembly
└── scripts
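
The layout above can be created in one go, for example at the top of the first script. This minimal sketch uses bash brace expansion and assumes nothing beyond bash and coreutils.

```bash
#!/usr/bin/env bash
# Create the ~/Assignment6 folder layout.
set -euo pipefail

# Parent directory, hard coded as the assignment requires.
PROJ=~/Assignment6

# -p creates missing parent folders and does not fail if a folder already
# exists, so re-running the script is harmless.
mkdir -p "${PROJ}"/{data,DB,scripts}
mkdir -p "${PROJ}"/results/{1_QC,2_clean_data,3_denovo_assembly,4_genome_guided_assembly,5_final_assembly}

# Print every directory under the project as a quick visual check.
find "${PROJ}" -type d | sort
```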
Step 1. Write the first script to:
- … DB … from the Ensembl ftp directory as you did in assignment 4 [1 mark]
- … DB subdirectory and decompress and untar it [1 mark]
Step 2. Write the second script to copy the sequencing data to your data directory and carry out QC. The data are in ~/data/Transcriptomics_data/Assignment/. [1 mark] Include the following steps:
Step 3. Write the third script to:
- … Trinity … [3 marks]
- … GMAP … [1 mark]
- … BUSCO … [2 marks]
Step 4. Write the fourth script to:
- … fasta file of genome-guided assembled transcripts using gffread [1 mark]
- … BUSCO … [2 marks]
Step 5. Organise the final results:
- … 5_final_assembly … [1 mark]
- … 5_final_assembly … [1 mark]
- … 5_final_assembly … [1 mark]
Step 6. Identify one assembled transcript that includes alternative splicing events with read evidence. Take a screenshot from IGV and include the screenshot file in your assignment submission. [3 marks]