Please load the Version Control Quiz in “my uni”
At 11:15 I will release the password. The quiz is timed individually, so if you finish early please stay quiet to allow others to concentrate.
Git is the command-line tool (and file-based version control database). As we have seen in previous pracs, you can use it entirely on your local computer.
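As a reminder of what "entirely local" looks like, here is a minimal sketch that creates a throwaway repository in a temporary directory and records one commit, with no network access or GitHub account involved. The file name and the identity values are placeholders chosen for illustration.

```shell
# Create a throwaway repository in a temporary directory (no network needed).
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet

# Set a placeholder identity for this repository only, so the commit works
# even on a machine with no global Git configuration.
git config user.name "Example Student"
git config user.email "student@example.com"

# Record one change and show the resulting history.
echo "# Local demo" > README.md
git add README.md
git commit --quiet --message "Add README"
git log --oneline
```

Everything Git knows about this project lives in the hidden `.git` directory inside `$demo_dir`; deleting the directory removes the repository.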
GitHub is a website that hosts Git repositories in the cloud. There are other similar sites, such as GitLab and Bitbucket.
A centrally hosted repo on the internet (with authorization) is useful as you can access it from many computers
They also provide many other features to help you run software projects like issue trackers, wikis and more.
Here are some repositories of famous bioinformatics tools.
Go and view the pages, skim the README and the issue trackers. Open some random issues
IGV - paper in 2013, last commit 2 days ago (as of Oct 5 2023)
Samtools - paper in 2009, last commit 2 weeks ago
BWA - paper in 2009, last commit September 23, 2022. New development is being done on BWA-MEM2
Compare the release date of the paper vs the last commit.
Consider how different it must be from the first release, and whether the authors realised someone (or they themselves) would be working on the code 13 years later!
Note: Notice how the commit frequency and the issue tracker are a good way to evaluate a project’s health.
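If you have cloned a repository locally, one rough way to put numbers on this is sketched below. `repo_activity` is a hypothetical helper name, not a Git command; it only wraps standard `git log` and `git rev-list` calls, and it looks at the currently checked-out branch only.

```shell
# Summarise how active a repository is. Run from inside any cloned repository.
repo_activity() {
    # Date of the most recent commit on the current branch.
    echo "Last commit:  $(git log -1 --date=short --format=%cd)"
    # Date on the last line of the history when it is listed newest-first.
    echo "First commit: $(git log --date=short --format=%cd | tail -n 1)"
    # Number of commits made in the past year.
    echo "Commits in the last year: $(git rev-list --count --since='1 year ago' HEAD)"
}
```

For example, after cloning one of the repositories above, `cd` into it and run `repo_activity`. Commit counts say nothing about open issues, so this complements the issue tracker rather than replacing it.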
Go to IGV issues
Notice how all you can see are the titles. Titles are extremely important.
Before raising a bug, you should always check to see if it has already been reported (i.e. don’t create a “dupe”)
Run a few searches. Experiment with clicking on Open/Closed. Notice how special tokens in the search form change.
Again, notice how critical the title is for finding things, as the issue list does not show the contents. Search does cover the contents, though: e.g. searching for “font” will also bring up issues without “font” in the title, because the word appears in the body of the issue.
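The special tokens in the search box are search qualifiers such as `is:open`, `is:closed` and `in:title`, and they also appear in the page URL. The sketch below builds such URLs by hand so you can see the mapping. `issue_search_url` is a hypothetical helper, and the repository path `igvteam/igv` is an assumption about where the IGV project lives on GitHub.

```shell
# Build a GitHub issue-search URL from a repository and a query string.
# Spaces become "+" and ":" becomes "%3A", which is enough for simple queries
# like the ones below (this is not a general URL encoder).
issue_search_url() {
    repo=$1
    query=$(printf '%s' "$2" | sed -e 's/:/%3A/g' -e 's/ /+/g')
    echo "https://github.com/$repo/issues?q=$query"
}

# Open issues mentioning "font" anywhere in the issue.
issue_search_url "igvteam/igv" "is:issue is:open font"
# Closed issues with "font" in the title only.
issue_search_url "igvteam/igv" "is:issue is:closed font in:title"
```

Pasting either printed URL into a browser should show the same results as typing the query into the search box on the issues page.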
GitHub handles user authentication
You can make organisations (which are basically groups that own projects rather than individual users).
This is a matter of choice (or lab policy) but I generally release work projects under my organisation as it’s more professional, makes it easier for my coworkers to maintain it, and our lab seems more permanent than an individual job. You can always transfer it, and have the old URL redirect, but generally it’s better to make URLs permanent.
https://github.com - You may well keep this GitHub account and use it on your CV etc., so I recommend NOT choosing a silly name or using your Adelaide uni account name
On https://github.com - create a new project:
git clone https://github.com/YOUR_USERNAME_HERE/github_practical_example_project
cd github_practical_example_project # go into the repository you just cloned
nano README.md # Make a change to the file so we have something to commit
git add README.md
git commit --message "edit README"
git push
If the repo is public, you didn’t have to enter any credentials when you cloned it, as anyone can do that. To push, however, you will have to enter your username / password (access token).
Press Ctrl+C to cancel out of this prompt for now.
GitHub uses personal access tokens rather than passwords. To generate one of these for the VM:
DO NOT CLOSE THIS TAB - leave it open so you can copy/paste it below
You can store your token (as plaintext on the VM) via:
git config --global credential.helper store
If you do this you will only have to enter your token once. Anyone who gets this token can make changes to your GitHub repos, so please password-protect your VMs.
git push
# It will now ask for your username - this is what you used to sign up, and can be seen in the URL of the repo you cloned after "github.com/<YOUR_USERNAME>/your_repo"
# Paste username then hit enter
# Copy the Personal access token from the web page (green, click "two squares" copy button)
# Paste access token then hit enter
The push should now complete successfully.
If you did close the tab, just create another token now, and use that.
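Since the token is kept as plaintext, it can be worth checking what Git is actually configured to do and who can read the file. The sketch below assumes the `store` helper is writing to its default location, `~/.git-credentials`; `check_credential_store` is a hypothetical helper name.

```shell
# Report the configured credential helper and the permissions on the token file.
# Assumes the "store" helper's default location, ~/.git-credentials.
check_credential_store() {
    helper=$(git config --global --get credential.helper || true)
    echo "credential.helper: ${helper:-(not set)}"
    cred_file="$HOME/.git-credentials"
    if [ -f "$cred_file" ]; then
        # Make sure only the file's owner can read or write the token.
        chmod 600 "$cred_file"
        ls -l "$cred_file"
    else
        echo "No token stored yet at $cred_file"
    fi
}

check_credential_store
```

The permissions column should start with `-rw-------`. This only guards against other users on the same machine; anyone who can log in as you can still read the token.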
As well as titles, labels can be used to organise issues into related themes. Read the documentation: GitHub - managing labels
Issues move through a lifecycle, e.g.:
raised -> fixed -> tested -> closed
raised -> tagged as “duplicate” or “wontfix” -> comment about which other issue is a duplicate + close
Even for your own projects, try to write each issue with as much information as possible (aiming for it to be standalone, so it doesn’t need much outside context), so that a third party can come in and make sense of things.
If it’s clear - maybe it’ll stop them raising a dupe. Maybe someone will fix it for you!
Depending on the type of bug, there is certain information you will want to include:
For crashes: stack traces and logs
For bugs that involve complicated user steps: try to find a way to “reproduce” the error consistently and write clear step-by-step instructions. Ideally your instructions should reproduce the issue 100% of the time with a minimal number of steps
For bugs involving input files: try to produce a minimal, cut-down input file.
For unexpected behavior or outputs: stating Actual vs Expected results is very useful
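For input-file bugs, cutting the file down is often just a matter of keeping the first few records. The sketch below uses FASTQ as an example and assumes the usual unwrapped layout of four lines per record. The file names are hypothetical, and the large input is generated synthetically so the example is self-contained.

```shell
# Work in a temporary directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Stand-in for a large input: 1000 synthetic FASTQ records (4 lines each).
for i in $(seq 1 1000); do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > big_input.fastq

# Keep only the first 10 records (10 x 4 = 40 lines) as a candidate test case.
head -n 40 big_input.fastq > minimal_input.fastq

# Print the line count of the cut-down file.
wc -l < minimal_input.fastq
```

Before attaching the small file to an issue, re-run the failing command on it to confirm that the bug still reproduces. If it does not, the problem may depend on a record further into the file.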
Milestones are a way of managing releases and deadlines
For instance, you may have 20 issues, but your lab head says you have to stop coding and start writing up your paper in 2 weeks.
Note: Free GitHub accounts can only have wikis on public projects
Check to see if you have a “wiki” menu item on your project. If you don’t, you can either skip this step or make your project public.
To experiment with wikis: