Like most command line programs, git has command line help. Run:
git --help
This shows an overview of subcommands. It’s short, so read it all!
start a working area (see also: git help tutorial)
   clone     Clone a repository into a new directory
   init      Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add       Add file contents to the index
   mv        Move or rename a file, a directory, or a symlink
   restore   Restore working tree files
   rm        Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect    Use binary search to find the commit that introduced a bug
   diff      Show changes between commits, commit and working tree, etc
   grep      Print lines matching a pattern
   log       Show commit logs
   show      Show various types of objects
   status    Show the working tree status

grow, mark and tweak your common history
   branch    List, create, or delete branches
   commit    Record changes to the repository
   merge     Join two or more development histories together
   rebase    Reapply commits on top of another base tip
   reset     Reset current HEAD to the specified state
   switch    Switch branches
   tag       Create, list, delete or verify a tag object signed with GPG

collaborate (see also: git help workflows)
   fetch     Download objects and refs from another repository
   pull      Fetch from and integrate with another repository or a local branch
   push      Update remote refs along with associated objects
You can get help for a subcommand via:
git clone --help
This opens a man (manual) page, and is the equivalent of
man git-clone
Scroll through the man page with the up/down arrow keys, or use page up/page down to move faster.
Press q to quit the man page.
Git tracks which user did what, so you need to identify yourself.
If you didn’t do this in the tutorial, please run this:
git config --global user.name "Your name"
git config --global user.email "your.name@student.adelaide.edu.au"
git config --global core.editor "nano -w"
This is stored in your home directory, so it applies to all of your git repositories (that is what --global means).
To see it:
cat ~/.gitconfig
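Git can also print these settings for you. Below is a minimal sketch that runs in a throwaway temporary repository, so your real configuration is untouched. The name "Temporary Name" is a placeholder. A setting made without --global is stored in that repository's .git/config and overrides the global value there.

```shell
cd "$(mktemp -d)" || exit 1                 # throwaway directory
git init -q
git config user.name "Temporary Name"       # no --global: stored in .git/config only
git config --get user.name                  # the value Git will actually use here
git config --show-origin --get user.name    # ...and the file it came from
```

On your own account, git config --global --list prints everything stored in ~/.gitconfig.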
If you already cloned the course repository in a previous tutorial, update it:
cd BIOTECH-7005-BIOINF-3000 # Wherever you cloned it before
git pull # fetch changes from GitHub and update your local repository
# If you get errors due to changes from the last tutorial, it may be easier to delete the directory and clone it again.
If you haven’t cloned the repo before, do so via:
# Clone a repo
git clone https://github.com/University-of-Adelaide-Bx-Masters/BIOTECH-7005-BIOINF-3000
When you clone a repo, you make a local copy of the whole version control database (history, deleted files etc).
A git repository is a sequence of changes; you can see the history of these via:
git log
This opens an interactive log (shows a “:” at the bottom) of commits in reverse chronological order (ie latest commits first). Press the up/down arrow keys to scroll, or page up and down. Press q to quit.
Notice that at the top of every commit is a hash, which is the unique label you can use to refer to that change.
There are lots of options, eg to show a summary of changes.
git log --stat --summary
Or to see a visual of branching/merging:
git log --graph
Sample output:
| * | commit 4ac3e07643488adad8a5a4462c756503cc5e975a
| |\ \ Merge: 6dea320 bbef0f4
| | | | Author: Dave Adelson <david.adelson@adelaide.edu.au>
| | | | Date: Wed Sep 22 09:41:55 2021 +0930
| | | |
| | | | Merge pull request #27 from University-of-Adelaide-Bx-Masters/NewTranscriptome
| | | |
| | | | update link to merged major project document
| | | |
| * | | commit 6dea320691f4e3d5d5ec1e3b02c1c84db46f5ec6
| |\ \ \ Merge: 8a40db2 5b9a8b1
| | | | | Author: Dave Adelson <david.adelson@adelaide.edu.au>
| | | | | Date: Wed Sep 22 09:33:33 2021 +0930
| | | | |
| | | | | Merge pull request #26 from University-of-Adelaide-Bx-Masters/NewTranscriptome
| | | | |
| | | | | New transcriptome
| | | | |
* | | | | commit cff75d103c70161e44decc9bdf59f4cd8274cfeb
|\ \ \ \ \ Merge: 8a40db2 9d41f5e
| |/ / / / Author: David Adelson <david.adelson@adelaide.edu.au>
|/| | | / Date: Wed Sep 22 09:48:38 2021 +0930
| | |_|/
| |/| | Merge branch 'NewTranscriptome'
| | | |
| * | | commit 9d41f5e071da6e179660adb8a124beb2b190c2c6
| | |/ Author: David Adelson <david.adelson@adelaide.edu.au>
| |/| Date: Wed Sep 22 09:44:34 2021 +0930
| | |
| | | deleted html
| | |
| * | commit bbef0f42d9e7efd2be26cd3d281f55dfcfcfef01
| |/ Author: David Adelson <david.adelson@adelaide.edu.au>
| | Date: Wed Sep 22 09:36:20 2021 +0930
| |
| | update link to merged major project document
Or a more concise graph:
git log --all --decorate --oneline --graph
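You can also limit and filter git log, which helps once a history gets long. Below is a sketch in a throwaway repository with made-up commits.

```shell
cd "$(mktemp -d)" || exit 1                 # throwaway directory
git init -q
git config user.name "Example User"         # identity for this repo only
git config user.email "user@example.com"

for fruit in Apples Bananas Mango; do       # three small commits to README.md
  echo "$fruit" >> README.md
  git add README.md
  git commit -q -m "add $fruit"
done
echo "notes" > NOTES.md                      # one commit touching a different file
git add NOTES.md && git commit -q -m "add notes"

git log --oneline -n 2             # only the 2 most recent commits
git log --oneline -- NOTES.md      # only commits that touched NOTES.md
git log --oneline --grep=Mango     # only commits whose message mentions Mango
```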
To see the details of a particular commit, copy/paste a hash, eg:
git show 23d004b56503d2aaccc5f6e39993a8d10aee334c
To see the differences between two commits, use git diff.
git show gives the same result as diffing a commit against its parent (the previous commit):
git diff 11a25fea8e15523049896381de9312501053843d 23d004b56503d2aaccc5f6e39993a8d10aee334c
but you can also compare commits further apart, eg across 3 commits:
git diff 141d48db593f24783169062b1e94c4259bacfaeb f6cec9c27eea6ef854f754f1adbe3c3310cb708d
Exercise:
Use git log to find 2 commits, and run git diff between them.
Now pick commits further apart (a few before/after) to see how the size of the diff grows.
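For this exercise, git diff --stat prints a per-file summary instead of the full patch. This makes it easy to watch a diff grow. Below is a sketch in a throwaway repository. It saves each made-up commit's hash in a shell variable instead of copy/pasting.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "Apples" > README.md
git add README.md && git commit -q -m "first"
first=$(git rev-parse HEAD)             # remember each commit's full hash
echo "Bananas" >> README.md
git commit -q -am "second"
second=$(git rev-parse HEAD)
echo "Mango" >> README.md
git commit -q -am "third"
third=$(git rev-parse HEAD)

git diff --stat "$second" "$third"      # one commit apart: 1 line added
git diff --stat "$first" "$third"       # two commits apart: 2 lines added
```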
Shortcuts
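HEAD is a shortcut to the commit you currently have checked out (usually the latest commit on your branch).
If you give git diff only one hash, it compares that commit with your working files (the same as HEAD if you have no uncommitted changes). This shows you all the changes since that commit.
Most commits have one “parent” commit, which points to the previous state (merge commits have two). Some shortcuts: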
# These also work on normal hashes not just HEAD
git show HEAD^ # to see the parent of HEAD
git show HEAD^^ # to see the grandparent of HEAD
git show HEAD~4 # to see the great-great grandparent of HEAD
# See last 2 commits:
git diff HEAD^^ # Compares with your working files (same as HEAD if nothing is uncommitted)
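You can check these shortcuts in a throwaway repository with three made-up commits. The same repository also shows that git diff with a single commit compares against your working files.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for fruit in Apples Bananas Mango; do
  echo "$fruit" >> README.md
  git add README.md
  git commit -q -m "add $fruit"
done

git log -1 --format=%s HEAD^      # parent: add Bananas
git log -1 --format=%s HEAD~2     # grandparent, same as HEAD^^: add Apples

echo "Watermelon" >> README.md    # an uncommitted edit
git diff --stat HEAD              # one commit: compared with working files, edit shows
git diff --stat HEAD HEAD         # two commits: commit vs commit, nothing shows
```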
# Go to where you want to put a new git repository
mkdir bioinfo_tute_wk8
cd bioinfo_tute_wk8
git init
There is now a hidden directory called “.git”
ls -R .git
Create a file called README.md in your editor (eg nano README.md) containing 2 lines:
Apples
Watermelon
Then save the file, and exit to the shell. Run:
git status
This should show you that you have 1 untracked file called ‘README.md’.
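Try viewing this as a diff:
git diff
It shows nothing yet: git diff only reports changes to files Git already tracks, and README.md is still untracked.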
We need to tell Git about this change - run:
git add README.md
Now run:
git status
It should show that you have a new file README.md. Now run:
git commit
Because you didn’t specify a message, this will open your editor (which we set to nano earlier) so you can write one. Enter something like “initial shopping list”, then save and exit. Git will create a commit with the message you entered.
Why did we have to run add and then commit? Ie why have a staging area instead of committing directly?
It allows you to carefully add/remove individual files to build a commit - ie you can change a lot of files, then commit a few together, then others in a separate commit.
Think of it as a draft area where you can carefully build exactly what you want to commit.
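Below is a sketch of the staging area in action, in a throwaway repository with made-up file names. Two files change, but only one goes into the next commit. git diff --staged shows what is staged, and plain git diff shows what isn't.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Apples" > README.md
echo "buy bags" > NOTES.md
git add README.md NOTES.md && git commit -q -m "initial"

echo "Bananas" >> README.md        # two unrelated edits...
echo "check prices" >> NOTES.md

git add README.md                  # ...but only one is staged
git diff --staged --name-only      # in the next commit: README.md
git diff --name-only               # changed but not staged: NOTES.md
git commit -q -m "add bananas"     # the NOTES.md edit is left for a later commit
```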
Edit README.md again, adding 2 more lines:
Bananas
Mango
Run:
git commit
It should say “no changes added to commit”, because you haven’t added the file to the staging area. Let’s do that and commit:
git add README.md
git commit -m "add stuff"
We used the -m parameter to provide the commit message on the command line, so we didn’t need to open nano to type one.
Run:
git log
Your commit messages are stored against each commit for all time, so spend a bit of time making them clear and useful!
Perhaps we should make that last message better. To edit it:
git commit --amend -m "Add Cyan monkey's fruit"
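Note that --amend doesn't edit the old commit in place. It replaces it with a new commit, which gets a new hash. Below is a sketch in a throwaway repository that shows this. Because of the new hash, avoid amending commits you have already pushed and shared.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Apples" > README.md
git add README.md && git commit -q -m "add stuff"
before=$(git rev-parse HEAD)

git commit -q --amend -m "Add Cyan monkey's fruit"   # replace the last commit
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"      # a different hash
git log --oneline          # still just one commit, with the new message
```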
You can also see which was the last commit to touch a line in a file - this git command is actually easy to remember!
git blame README.md
Now edit README.md again, and add a “*” to the front of each line so it looks like:
* Apples
* Bananas
* Mango
* Watermelon
Add/commit again:
git add README.md
git commit -m "formatting - markdown bullet points"
Now run git blame again:
git blame README.md
Because we touched every line in the file, git blame now attributes every line to that commit, which isn't very useful. You can tell it to ignore a particular commit:
git blame --ignore-rev HEAD README.md # Or the commit hash of your formatting change
You can see why logically grouping changes in commits is useful, eg separating formatting changes vs adding new things.
I will often break up a change into multiple commits, eg “clean up the area - keep existing functionality the same but make it easier to extend/change” and then a separate commit of “add the new functionality”.
If you run an auto-linter / code formatting tool (and you should!), you can keep track of those commits in a file and pass it to git blame with --ignore-revs-file.
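Below is a sketch of that setup in a throwaway repository, assuming Git 2.23 or newer (when the blame ignore options were added). The file name .git-blame-ignore-revs is a common convention rather than a requirement. The file holds one full commit hash per line.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'Apples\nBananas\n' > README.md
git add README.md && git commit -q -m "shopping list"
printf '* Apples\n* Bananas\n' > README.md
git commit -q -am "formatting - markdown bullet points"

git rev-parse HEAD >> .git-blame-ignore-revs   # full hash of the formatting commit
git add .git-blame-ignore-revs
git commit -q -m "ignore formatting commits in blame"

git blame --ignore-revs-file .git-blame-ignore-revs README.md
git config blame.ignoreRevsFile .git-blame-ignore-revs   # make it the default in this repo
git blame README.md
```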
All commits have unique hashes as labels, however these don’t mean anything and are hard to remember.
You can give commits your own names, which is called tagging.
git tag my_tag <hash from git log> # need to change to your hash
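Below is a sketch of tagging in a throwaway repository. The tag names v1.0 and v1.1 are made up. The -a flag creates an annotated tag, which also records who tagged it, when, and a message.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "Apples" > README.md
git add README.md && git commit -q -m "initial shopping list"
git tag -a v1.0 -m "First release"     # annotated tag on the current commit
echo "Mango" >> README.md
git commit -q -am "add mango"
git tag -a v1.1 -m "Second release"

git tag --list            # lists v1.0 and v1.1
git show --stat v1.0      # tag details, then the commit it points at
git diff v1.0 v1.1        # everything that changed between the two releases
git describe              # nearest tag reachable from HEAD, here v1.1
```

A plain git push does not send tags to a remote; git push origin --tags does.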
A common use for tags is to label a release. This makes it easier to know the state of code in a release, find what releases are affected by bugs, and for instance show what changed between releases, eg:
This is linked from the CHANGELOG file in one of my GitHub projects:
Tip from the scientific reproducibility lecture: if you always use Git, you can keep track of which version of a script was run.
For instance, copy/paste this into a file pipeline.sh
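#!/bin/bash
# Toy pipeline that records which git commit it was run from
DATA_DIR=data
FASTA_FILE=${DATA_DIR}/dna.fasta
TODAY=$(date --iso-8601)               # eg 2022-09-16 (GNU date)
GIT_COMMIT=$(git rev-parse HEAD)       # full hash of the current commit
RUN_LOG=pipeline_run_${TODAY}_${GIT_COMMIT}.log
mkdir -p "${DATA_DIR}"
# Create a random DNA "sequence" if there isn't one already
# (its length is the script's first argument, default 50)
if [[ ! -e "${FASTA_FILE}" ]]; then
  echo ">DNA sequence, ~25% accuracy" > "${FASTA_FILE}"
  tr -dc 'GATC' < /dev/urandom | fold -w "${1:-50}" | head -1 >> "${FASTA_FILE}"
fi
# Record when/where it ran, the git commit, and a checksum of the input
echo "Script ${0} started $(date) in '$(pwd)'" > "${RUN_LOG}"
echo "git: ${GIT_COMMIT}" >> "${RUN_LOG}"
echo "Input: $(md5sum "${FASTA_FILE}")" >> "${RUN_LOG}"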
Now add/commit it:
git add pipeline.sh
git commit -m "reproducible pipeline"
And run it, then examine the logs:
bash pipeline.sh
cat *.log
The log file should contain something like:
Script pipeline.sh started Fri 16 Sep 2022 01:16:32 ACST in '/home/dlawrence/localwork/bioinfo_tute_wk8'
git: 2af18ad1033bbd5efd90a0aa88acf59dc1d9c4c4
Input: 482b49fe8627a6a300b51f46f7e5c6ff data/dna.fasta
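The script has one gap: git rev-parse HEAD records the last commit, but not any uncommitted edits you made to the script before running it. One possible variation, which is not part of the original pipeline, is git describe --always --dirty. It prints an abbreviated commit hash and appends -dirty when tracked files differ from that commit. Below is a demonstration in a throwaway repository.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo 'echo "hello"' > pipeline.sh
git add pipeline.sh && git commit -q -m "reproducible pipeline"

git describe --always --dirty        # clean: just an abbreviated commit hash
echo 'echo "edited"' >> pipeline.sh  # change the script without committing
git describe --always --dirty        # same hash, now with -dirty on the end
```

In pipeline.sh this would mean using GIT_COMMIT=$(git describe --always --dirty) instead of git rev-parse HEAD. The trade-off is that it records an abbreviated hash, or the nearest annotated tag plus a hash if the repository has tags.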
Exercise
Modify this script to generate a longer DNA sequence with a different filename, then check in the changes and re-run it.
Sometimes you deliberately don’t want to track certain files.
Examples include binary files, logs, temporary files or editor configuration (so each person can set their own preferences).
Run
git status
The log file and data directory show up as untracked. Let’s say that in a real experiment they are enormous, and you don’t want to check them in. Tell Git to ignore them:
echo "data/" > .gitignore
echo "*.log" >> .gitignore
Now run:
git status
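If you are ever unsure why a file isn't showing up, git check-ignore -v reports which .gitignore line matched it. Below is a sketch in a throwaway repository, with made-up file names that mirror the pipeline.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
mkdir -p data
echo ">DNA sequence" > data/dna.fasta     # stand-ins for the pipeline's outputs
echo "run details" > pipeline_run.log
echo "data/" > .gitignore
echo "*.log" >> .gitignore

git status --short                       # only .gitignore is listed as untracked
git check-ignore -v pipeline_run.log     # which file, line and pattern ignored it
```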
Exercise
Check in .gitignore
(optional) Write a 2000-word philosophy paper describing what you think should happen if you add .gitignore to the .gitignore file.
Git stores (in .git) every version of the files ever committed to the repo.
It is worth thinking of the “normal files” in the directory as temporary “working files”. Git can reconstruct them as they were at any commit in the history, whether that is the latest version or an old one, so you can move them back and forth through historical versions.
Use cat to view the first log file from your pipeline.sh runs above, and find the commit hash.
To move the working files back to the state of this commit, you can run:
git checkout <commit_hash> # Need to copy/paste from your git history
This will mention being in a “detached HEAD” state. HEAD keeps track of your current point in the repository. Usually it is attached to a branch (eg master) and moves along with it as you commit. While you are moving through history, it is “detached”.
You can also see this if you run
git status
It will say “HEAD detached at” followed by the commit hash.
Exercise
Run
git log
and pick a commit, eg your initial commit
Move the repo back to this commit, and examine the README.md file
To move back to the latest commit, run:
git checkout master # or main, if that is your default branch name
To check you’re back to normal, run git log and make sure the latest commit has HEAD next to it.
Delete the README.md, ie:
rm README.md
Now run:
git status # Shows file was deleted
git diff # Shows patch
Retrieve the file from the repo - to undelete it in the working files
git checkout README.md
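Git 2.23 and newer also has git restore (the restore line in the git --help overview), which only deals with restoring files. Below is a sketch in a throwaway repository.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "* Apples" > README.md
git add README.md && git commit -q -m "initial shopping list"

rm README.md
git status --short        # README.md listed as deleted (D)
git restore README.md     # bring back the committed version
cat README.md
```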
So far in your new repository we’ve been working in a linear way. But remember Git is a graph, and you can have multiple paths that split apart (think back to the git log --graph diagrams above).
The default branch is main/master, but you can diverge from this, and also merge together different branches.
A common task when working with multiple people is to create a branch from current HEAD to do new development, so you can store your code in the repository without breaking master.
You can create a branch then switch to it:
git branch blue_monkey # Creates a new branch from HEAD
git checkout blue_monkey
Or you could do this in 1 line via:
git checkout -b blue_monkey # if you run this after creating the branch above it will error and do nothing
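Git 2.23 and newer also offers git switch, which only handles branches (unlike checkout, which also restores files). Below is a sketch of the equivalents in a throwaway repository. The branch name follows the tutorial. Your default branch may be called master or main, depending on your Git settings.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "* Apples" > README.md
git add README.md && git commit -q -m "initial shopping list"
start=$(git symbolic-ref --short HEAD)   # your default branch: master or main
first=$(git rev-parse HEAD)

git switch -c blue_monkey                # create + switch, like git checkout -b blue_monkey
git switch "$start"                      # switch back, like git checkout master
git switch --detach "$first"             # visit an old commit (detached HEAD)
git switch -                             # return to the branch you were on before
```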
Now edit the README.md and add a line “* Grapes” just after “* Bananas”
git add README.md
git commit -m "blue monkey change"
Now, go back to the master branch.
git checkout master
cat README.md # Double check that grapes are not there
git merge blue_monkey # Merge branch back into master
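Because master had not changed since you created blue_monkey, Git could simply move master forward to your new commit (a “fast-forward” merge). So there were no conflicts and no merge commit was needed.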
But imagine that the blue and cyan monkeys went off and did separate work. Find the commit before the merge, and create a branch from that, eg:
git checkout -b cyan_monkey HEAD^ # 1 commit back
Edit the README.md: say add “* Chocolate covered bananas” just after “* Bananas”, and delete “* Watermelon”
Add/commit
git add README.md
git commit -m "cyan monkey shopping list changes"
Then switch back to master and merge:
git checkout master # or main
git merge cyan_monkey
This can’t be auto-merged and is flagged as a conflict.
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.
If you look at the README.md, Git has merged what it could, but inserted conflict markers around the part it couldn’t merge, so you have to edit it manually:
* Apples
* Bananas
<<<<<<< HEAD
* Grapes
=======
* Chocolate covered bananas
>>>>>>> cyan_monkey
* Mango
Edit this file to keep both entries in alphabetical order, then delete the marker lines (“<<<<<<<”, “=======” and “>>>>>>>”). Then mark the conflict as resolved and commit:
git add README.md
git commit # editor will now suggest a 'merge' commit message
Now run:
git log --graph
Notice the branch and how it was merged back
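To experiment with conflicts without touching your real repository, the sketch below reproduces the blue/cyan monkey conflict in a throwaway repository and resolves it without an editor. git diff --name-only --diff-filter=U lists files that still have conflicts. If you'd rather give up on a merge part-way, git merge --abort returns to the state before it started.

```shell
cd "$(mktemp -d)" || exit 1                  # throwaway directory
git init -q
git config user.name "Example User"          # identity for this repo only
git config user.email "user@example.com"
main=$(git symbolic-ref --short HEAD)        # master or main, depending on your Git defaults

printf '* Apples\n* Bananas\n* Mango\n' > README.md
git add README.md && git commit -q -m "shopping list"

git checkout -q -b blue_monkey               # blue adds Grapes after Bananas
printf '* Apples\n* Bananas\n* Grapes\n* Mango\n' > README.md
git commit -q -am "blue monkey change"

git checkout -q -b cyan_monkey "$main"       # cyan starts from the same commit...
printf '* Apples\n* Bananas\n* Chocolate covered bananas\n* Mango\n' > README.md
git commit -q -am "cyan monkey change"       # ...and edits the same spot

git checkout -q "$main"
git merge -q blue_monkey                     # fast-forward, no conflict
git merge cyan_monkey || true                # stops with a conflict in README.md

git diff --name-only --diff-filter=U         # files still waiting to be resolved

# Resolve: keep both entries in alphabetical order, without the marker lines
printf '* Apples\n* Bananas\n* Chocolate covered bananas\n* Grapes\n* Mango\n' > README.md
git add README.md                            # mark README.md as resolved
git commit -q --no-edit                      # use Git's suggested merge message
git log --graph --oneline
```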
Exercise
Use
git cherry-pick <the hash you want>
to apply just one commit from another branch onto your current branch. This is useful for applying bug fixes to historical branches.
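Below is a sketch of that use case in a throwaway repository. old_list is a hypothetical “historical” branch. We copy just one later commit onto it and leave an unrelated commit behind.

```shell
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Apples\n' > README.md
git add README.md && git commit -q -m "initial shopping list"
git branch old_list                       # hypothetical historical branch, stays here

printf 'Bring bags\n' > NOTES.md          # an unrelated commit we don't want to copy
git add NOTES.md && git commit -q -m "add notes"
printf 'Apples\nMango\n' > README.md      # the "fix" we do want on old_list too
git commit -q -am "add mango"
fix=$(git rev-parse HEAD)

git checkout -q old_list
git cherry-pick "$fix"                    # copy just that one commit (it gets a new hash)
```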
When you clone a repository (eg from GitHub) you make a local copy with all commits and history.
When you edit files and commit changes, you only change this local copy. Git also tracks where you got it from (the remote called origin), so it is easy to keep the two in sync.
Fetch - retrieves commits from another repo, but doesn’t apply them to your working files
Pull - runs fetch, then integrates (merges) the commits, in effect “syncing” your local repo with the remote
Push - sends your changes to the remote server (we will cover this in either the bonus section below or the GitHub prac in Week 9)
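You can try all three without GitHub: a bare repository (one with no working files) in a local folder can act as the remote. Below is a sketch in a temporary directory, where alice and bob are two hypothetical clones of the same repository.

```shell
cd "$(mktemp -d)" || exit 1
git init -q --bare shared.git            # stands in for the copy on GitHub

git clone -q shared.git alice            # Git may warn the repository is empty; that's expected
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Apples" > README.md
git add README.md && git commit -q -m "initial shopping list"
git push -q origin HEAD                  # push: send this branch to the remote
cd ..

git clone -q shared.git bob              # bob gets alice's commit
git -C bob remote -v                     # origin points at shared.git

echo "Mango" >> alice/README.md
git -C alice commit -q -am "add mango"
git -C alice push -q origin HEAD

git -C bob fetch -q                      # fetch: bob now has the commit...
git -C bob log --oneline --all           # ...visible on origin's branch
cat bob/README.md                        # ...but his working files are unchanged
git -C bob pull -q                       # pull: fetch, then merge into bob's branch
cat bob/README.md                        # now includes Mango
```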
When working with others, you need to decide how to use Git in your day to day work. These work patterns are often called git workflows.
For instance - is it ok to have broken code in master/main? If you are working by yourself, or are making very small changes you have tested well, it may be OK to commit directly into the main/master branch.
If you are running an open source project and random people download your code - maybe not!
If I am working on something larger than a day’s work, I like to create a feature branch so that I can push my code to GitHub (for a backup, and to work on different computers), then merge back to master when done.
Some teams only do work in feature branches, then use pull requests to review the code by someone else before it is merged into master.
Note: This may only work on the desktop version of RStudio; it hasn’t been tested on the web version.
Many IDEs integrate with source control.
In RStudio, select via top menu:
There is now a git button in the toolbar (just below the top menu)
Question
GitHub is a website that hosts git repositories on the internet, and also provides a number of services such as authentication, code browsing, issue tracking etc.
GitHub uses “Personal access tokens” rather than passwords. To generate one of these for the VM:
If you have password access to your VM, and you are happy to store your password in plain text on the VM, then you can do so via:
git config --global credential.helper store
Otherwise, just leave the tab open so you can copy/paste it, or copy/paste it into a temporary text editor.
When questioned for your password, paste in the personal access token (if you store credentials, it will save it now to your VM disk)
We will be going into GitHub in depth in the week 9 practical. Have a good mid-semester break!