Due before 12pm, Friday 23rd August, 2024
Your answers to all questions should be submitted to myUni as a single .zip file containing 3 files: a script for Q1, a script for Q2, and a single rmarkdown file for Q3 and Q4.
The output file my_species_gff_features.txt is not required as part of your submission for Q1, only the script which will generate this file!
Similarly, for Q2, only the script is required.

Q1. Write a script to:
+ Download the gff3 file for your assigned species (see bottom of page) to your current directory from Ensembl [1 mark]
+ Count how many of each feature type there are, sorted in numerical order [4 marks]
+ Export the results to a file with a name of the form my_species_gff_features.txt, where you use your assigned species name instead of my_species [1 mark].
NB: If your actual species is not included in the name, no marks will be given.
+ The script must also include code to generate one or more comment lines in the output file, before the table, giving the genome build used (hint: grep your gff to find the genome build info, as the header is very large in most cases)
+ The script must also write the code used to generate the summary (counts) data to the output file as part of the file header. [2 marks]
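As a rough illustration of the counting and header steps listed above, the sketch below runs on a tiny made-up GFF3 file rather than a real Ensembl download. It assumes the feature type sits in the third tab-separated column and that the genome build appears on a header line starting with `#!genome-build`; check both against your own file. The file contents and variable names are hypothetical, and the remaining parts of Q1 are left to you.

```shell
#!/bin/sh
# Build a tiny made-up GFF3 file so the sketch runs without any download.
toy_gff=$(mktemp)
{
  printf '##gff-version 3\n'
  printf '#!genome-build TOY1.0\n'
  printf '1\ttoy\tgene\t1\t900\t.\t+\t.\tID=g1\n'
  printf '1\ttoy\tmRNA\t1\t900\t.\t+\t.\tID=t1\n'
  printf '1\ttoy\tmRNA\t1\t600\t.\t+\t.\tID=t2\n'
  printf '1\ttoy\texon\t1\t200\t.\t+\t.\tID=e1\n'
  printf '1\ttoy\texon\t300\t600\t.\t+\t.\tID=e2\n'
  printf '1\ttoy\texon\t700\t900\t.\t+\t.\tID=e3\n'
} > "$toy_gff"

# Pull out the genome-build header line (assumed prefix: '#!genome-build').
build_line=$(grep '^#!genome-build' "$toy_gff")

# Drop header/comment lines, keep column 3 (feature type),
# count each distinct type, then sort by the count.
counts=$(grep -v '^#' "$toy_gff" | cut -f3 | sort | uniq -c | sort -n)

echo "$build_line"
echo "$counts"
rm -f "$toy_gff"
```

A real Ensembl file is gzip-compressed, so it would need to be decompressed (or streamed through a decompression tool) before these steps.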
Q2. For the file we used in the practicals (Drosophila_melanogaster.BDGP6.ncrna.fa), add to the final practical script provided so that:
+ the output contains a meaningful header [1 mark]
+ the output contains column names [2 marks]
+ the output includes: a) gene id; b) chromosome; c) start; d) stop; e) strand and f) gene_biotype [3 marks]
+ appropriate comments are included which make the script easier to understand [1 mark]
NB: If identical comments are identified in any submissions, a mark of zero will be given for this question for all suspicious submissions.
In a single rmarkdown file answer the following questions:
Q3. Two groups of people have volunteered to take part in a genetic study. Group 1 (n = 126) are volunteers with no history of Type I Diabetes in their immediate family, whilst Group 2 (n = 183) have all been diagnosed with Type I Diabetes. A genotyping study was undertaken on these volunteers using 25,786 SNPs selected due to their proximity to key immune genes.
Researchers are looking to identify any SNP genotypes which may increase the risk of Type I Diabetes. In your answer, consider the reference SNP allele as A and the alternate SNP allele as B, using the genotypes AA, AB and BB.
a. For an individual SNP, what test would be appropriate for this comparison? [1 mark]
b. Define H₀ and Hₐ for the genotype at each individual SNP. [2 marks]
c. If there was no true difference in any genotypes between the two groups, how many p-values would you expect to see < 0.05? [1 mark]
d. Using Bonferroni's method, what would a suitable cutoff value be to consider a SNP as being associated with an increased risk of Type I Diabetes, i.e., to reject H₀? [1 mark]
e. Given the following genotype table, would you reject or fail to reject H₀? Provide your working and a full explanation. [3 marks]
Group | AA | AB | BB |
---|---|---|---|
Control | 25 | 60 | 41 |
T1D | 21 | 55 | 103 |
Q4. An experiment was repeated multiple times, in which GFP fluorescence was measured in a cell culture as a measurement of gene expression, both before and after viral transfection.
GFP was present on a plasmid as a reporter for activity at a specific promoter.
The change in fluorescence values obtained for each repeat are given below as the vector x, presented on the log2 scale for your individual subset of experiments.
a. Define H₀ and Hₐ [2 marks]
b. Calculate the sample mean and sample variance in R [2 marks]
c. Calculate the t-statistic using R. [1 mark]
d. What would the degrees of freedom be for your t-test? [1 mark]
e. Calculate the p-value using R [1 mark]
Show all working & code.
If your student number is not listed, please contact Dave to ensure you are added to the list
You can download the gff3 file for your assigned species from 'http://ftp.ensembl.org/pub/release-100/gff3/'. You will need to add the relevant additional information to this URL to specify your species and the '.100.gff3.gz' file.
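The sketch below shows one way the full URL might be assembled from the base address above. It assumes the usual Ensembl layout of a lower-case species directory containing a file named `Species_name.Assembly.100.gff3.gz`. It uses `homo_sapiens` and `GRCh38` purely as an illustration (neither is an assigned species), so check the directory listing for the exact file name for your own species. The variable names are hypothetical.

```shell
#!/bin/sh
base_url='http://ftp.ensembl.org/pub/release-100/gff3'
species='homo_sapiens'   # illustrative only; use your assigned species directory
assembly='GRCh38'        # assumption: look up the assembly name in the directory listing

# Capitalise the first letter of the species name for the file name.
first=$(printf '%s' "$species" | cut -c1 | tr '[:lower:]' '[:upper:]')
rest=$(printf '%s' "$species" | cut -c2-)
file_name="${first}${rest}.${assembly}.100.gff3.gz"

url="${base_url}/${species}/${file_name}"
echo "$url"

# The download itself is left commented out so the sketch runs offline:
# wget "$url"
```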
ID | Species | Taxonomy ID | Common Name |
---|---|---|---|
a1137364 | Monodelphis domestica | 13616 | Gray Short-Tailed Opossum |
| | Neolamprologus brichardi | 32507 | Princess cichlid |
a1645191 | Mola mola | 94237 | Ocean Sunfish |
a1703423 | Pelodiscus sinensis | 13735 | Chinese Soft-Shelled Turtle |
a1749842 | terrapene carolina triunguis | 158814 | Three-Toed Box Turtle |
a1766804 | Hippocampus comes | 109280 | Tiger Tail Seahorse |
a1773594 | bison bison bison | 9901 | Plains Bison |
a1792812 | canis lupus dingo | 9612 | Dingo |
a1795973 | sus scrofa pietrain | 9823 | Pig |
a1797428 | chrysemys picta bellii | 8479 | Western Painted Turtle |
a1823995 | Aotus nancymaae | 37293 | Ma's Night Monkey |
a1839745 | Poecilia mexicana | 48701 | Shortfin molly |
a1843355 | mustela putorius furo | 9668 | Domestic Ferret |
a1850508 | Serinus canaria | 9135 | Common Canary |
a1851176 | Salmo salar | 8030 | Atlantic Salmon |
a1851451 | Poecilia reticulata | 8081 | Guppy |
a1851815 | Macaca fascicularis | 9541 | Crab-Eating Macaque |
a1853428 | Poecilia formosa | 48698 | Amazon Molly |
a1862759 | Pan paniscus | 9597 | Pygmy Chimpanzee |
a1863615 | Neovison vison | 452646 | American Mink |
a1864525 | Fukomys damarensis | 885580 | Damara Mole-Rat |
a1865749 | Gambusia affinis | 33528 | Western Mosquitofish |
a1868878 | sus scrofa usmarc | 9823 | Pig |
a1869981 | Chinchilla lanigera | 34839 | Long-Tailed Chinchilla |
a1871922 | Rhinolophus ferrumequinum | 59479 | Greater Horseshoe Bat |
a1872208 | Monopterus albus | 43700 | Swamp Eel |
a1873151 | mus musculus c57bl6nj | 10090 | Mouse |
a1875420 | Fundulus heteroclitus | 8078 | Mummichog |
a1876866 | Parambassis ranga | 210632 | Indian Glassy Fish |
a1886661 | Castor canadensis | 51338 | American Beaver |
a1893982 | astyanax mexicanus pachon | 7994 | Pachon Cavefish |
a1894721 | Pygocentrus nattereri | 42514 | Red-Bellied Piranha |
a1894991 | Oryzias javanicus | 123683 | Javanese Ricefish |
a1897552 | Clupea harengus | 7950 | Atlantic Herring |
a1899345 | Ictidomys tridecemlineatus | 43179 | Thirteen-Lined Ground Squirrel |
a1900426 | Sarcophilus harrisii | 9305 | Tasmanian Devil |
a1901379 | Otolemur garnettii | 30611 | Small-Eared Galago |
a1903005 | Xiphophorus maculatus | 8083 | Southern Platyfish |
a1904204 | Anabas testudineus | 64144 | Climbing Perch |
a1904509 | Lepidothrix coronata | 321398 | Blue-Crowned Manakin |
a1906661 | mus musculus akrj | 10090 | Mouse |
a1773541 | Strigops habroptila | 2489341 | Kakapo |
a1909154 | Cebus capucinus | 9516 | White-Faced Sapajou |
a1910059 | sus scrofa bamei | 9823 | Pig |
a1909565 | sus scrofa berkshire | 9823 | Pig |
a1932615 | Zonotrichia albicollis | 44394 | White-Throated Sparrow |
a1935423 | Geospiza fortis | 48883 | Medium Ground-Finch |
a1940870 | Pogona vitticeps | 103695 | Central Bearded Dragon |
a1947736 | Equus caballus | 9796 | Horse |
a1947841 | Larimichthys crocea | 215358 | Large Yellow Croaker |
a1954027 | Ictalurus punctatus | 7998 | Channel Catfish |
a1954456 | panthera tigris altaica | 9694 | Amur Tiger |
a1955686 | Ovis aries | 9940 | Sheep |
If your student number is not listed, please contact Dave to ensure you are added to the list
The results you are analysing for Q4 are as follows.
You can simply paste these values into your RMarkdown document as the object x and perform all of your analysis on these values.
ID | Values |
---|---|
a1137364 | x <- c(-0.0433, 2.2891, 1.0951, 1.8736) |
| | x <- c(-0.1669, -0.1257, 0.5036, 0.0166, -2.4042, -0.1197, 2.1225, 1.212, 5.6835) |
a1645191 | x <- c(1.4484, 1.3787, -0.4143, 3.6535, 1.8083, 0.4871) |
a1703423 | x <- c(0.8609, 0.1178, 0.9691, -1.4772, 2.5688, 0.1644, -0.8474, -2.1161) |
a1749842 | x <- c(1.6882, 0.6029, 1.9638, -0.8317, -0.988, -2.3742, -0.4314) |
a1766804 | x <- c(4.8103, -0.8988, -0.7313, 3.5593, -0.4826, 2.4308, -0.4153, 0.0668, 2.867) |
a1773594 | x <- c(0.2608, -1.4151, 2.8979, 1.4756, 2.2516) |
a1792812 | x <- c(-0.6245, -0.2226, 1.5684, -1.2285) |
a1795973 | x <- c(1.8684, 0.2476, 0.5143, 0.3547, -0.1645, 1.251, -1.9733, 2.3365, -0.1596) |
a1797428 | x <- c(-1.2975, 1.7608, -0.0982, 1.8631, -2.3649, 0.1515, 1.709, -1.1104) |
a1823995 | x <- c(0.0863, 0.5653, 0.4356, 2.2952, 0.8758, 0.0343) |
a1839745 | x <- c(0.5774, 1.3214, 2.6437, -0.2024) |
a1843355 | x <- c(1.0581, -1.0416, -0.8031, -0.1871) |
a1850508 | x <- c(-0.9714, 1.6317, -1.3517, 1.1421, -0.2032, 1.0515, 1.9313, 1.8357, -1.4458) |
a1851176 | x <- c(-2.08, -0.8782, 0.285, 1.3781) |
a1851451 | x <- c(1.2449, -0.7304, -0.1646, 2.0851, -0.7812, 0.5486, 0.9953, 0.1769, 0.3606) |
a1851815 | x <- c(0.0134, 1.9545, 1.0156, 2.4974) |
a1853428 | x <- c(-0.0298, -1.3617, 2.1709, 0.8543, 0.2726, -0.7025) |
a1862759 | x <- c(0.0065, 1.0768, -0.7598, 3.0874) |
a1863615 | x <- c(0.7639, 0.0844, 1.6849, -0.5075, -0.8914, 1.0112, 2.7931) |
a1864525 | x <- c(1.1527, 1.6518, 1.6833, -1.3038, 0.6058, 0.4401, -1.1379, 1.6957) |
a1865749 | x <- c(1.5056, 1.647, -0.0661, -2.1343, 1.423, 0.7569, 0.0609) |
a1868878 | x <- c(-2.9709, -0.45, 0.0074, 0.1217, 2.5973) |
a1869981 | x <- c(0.8152, 1.4271, 0.4694, -0.2918, -0.286, -0.6054, 1.2336) |
a1871922 | x <- c(0.5119, -1.8475, 1.8198, -0.1338, 0.8777, 0.0604, -0.4013, 0.3594, 2.3255) |
a1872208 | x <- c(0.7079, 0.3685, 0.2181, 0.5549, -2.3652, -1.2004, 3.4151) |
a1873151 | x <- c(2.4517, 1.4722, 2.6803, -1.9407, -0.6271, -0.2569, -0.4746, 2.6373, -2.0029) |
a1875420 | x <- c(-1.5192, 2.833, 3.3837, -1.0578, 0.8613, 2.5236, -2.2051, -0.4782, 3.1518) |
a1876866 | x <- c(-1.123, -0.3904, -0.4987, 2.0421, -0.8621, -0.7427, 0.7549, -0.5748, 0.4906, -1.0987) |
a1886661 | x <- c(1.129, -0.5184, 0.0759, 0.4659) |
a1893982 | x <- c(0.4595, 1.7962, 2.1128, 2.646, -2.3673, -0.1961) |
a1894721 | x <- c(-2.91, 2.5906, -1.8167, -0.9061, -0.8382, 0.2453, 0.4134, 3.5842) |
a1894991 | x <- c(1.9851, -0.5493, 1.7286, 3.7736, 0.4708, 0.7642, 0.8895, 2.0142, -0.3351) |
a1897552 | x <- c(3.2391, -1.7528, 0.4305, -0.6628, -1.6469, 0.782, 1.5166, 0.237) |
a1899345 | x <- c(0.1474, 1.9407, 0.5564, 1.7953, 1.2655, 0.1691, 0.4547, 0.0187, -2.1591, 0.6655) |
a1773541 | x <- c(-0.3584, 3.5074, 2.9401, 2.1211, -0.1264, 0.9589, -1.1208, -1.9353, 1.7136) |
a1901379 | x <- c(0.3038, -1.1985, -1.7394, 0.2135, 2.5004, 1.6915, 2.8043, 0.5944) |
a1903005 | x <- c(0.6722, -0.1956, 0.5278, 1.8678, 3.1052, 1.2281, 3.0446, -0.5916) |
a1904204 | x <- c(-0.992, 4.0864, 1.3728, 1.2436, -0.6282) |
a1904509 | x <- c(-0.1648, 2.1814, -1.0161, 0.9513, 1.0205, -0.0862, 0.5003, -1.3463, -0.1135, 2.1983) |
a1906661 | x <- c(0.706, -0.9058, 2.1544, 2.9012, 1.1209, -1.3753, 1.9729, 2.875, 3.6917, -1.077) |
a1908426 | x <- c(1.2641, 1.44, 0.1889, 0.1464, -0.4497, 1.3661, 2.1486, 2.4283, 1.108, 1.2788) |
a1909154 | x <- c(2.4852, 1.2269, -0.0679, -0.9611, 2.1799) |
a1910059 | x <- c(2.8994, -3.0528, 0.1018, 1.3461, 0.9224, 1.1955, 2.3327) |
a1909565 | x <- c(2.0478, -1.8305, 1.4769, -0.1325, 1.2525, 1.7385) |
a1932615 | x <- c(0.9234, 1.4015, -0.3947, -3.1329, -0.1542, 1.9771, -0.4997, 0.0226) |
a1935423 | x <- c(0.6379, 2.0681, 2.887, 0.844, 0.398, 0.265, 0.0612, 0.8686) |
a1940870 | x <- c(-1.0123, 2.6733, 0.0865, 2.7096, 2.1062, -0.157, 0.6258, -0.5091, 0.114, 0.0341) |
a1947736 | x <- c(1.2537, 2.7525, 0.5015, 0.3354, -1.129, 1.1688, 3.6342, 1.2302, 0.8297, 1.3461) |
a1947841 | x <- c(-1.5333, 2.0824, 2.7886, -0.0431, 0.0146, -1.404, 0.7452, 3.0861, -1.9524, 0.3867) |
a1954027 | x <- c(0.6892, -0.1318, -0.1043, 4.0503, 0.0989, 0.8987, 1.7833) |
a1954456 | x <- c(-0.0944, -1.7223, 0.9764, 0.657, 0.6068, -1.0101, 0.5493) |
a1955686 | x <- c(0.1344, 0.0336, -1.2412, 0.3731) |