Concepts around collaboration
Objectives
Be able to decide whether to divide work at the branch level or at the repository level.
Instructor note
15 min teaching
Motivation
Someone has given you access to a repository online and you want to contribute?
We will review how to make a copy and send changes back.
Then, we make a “pull request” that allows a review.
Once we know how code review works, we will be able to:
propose changes to repositories of others
review changes submitted by external contributors.
Cloning a repository
To make a copy of a repository (a clone), use the git clone command.
Cloning of a repository is of relevance in a few different situations:
When working on your own, cloning is the way to copy a repository to, e.g., a personal computer, a server, or a supercomputer.
The original repository could be a repository that you or your colleague own. A common use case for cloning is when working together within a smaller team where everyone has read and write access to the same git repository.
Alternatively, you can clone a public repository of a code that you would like to use. Perhaps you have no intention to work on the code, but would like to stay up to date with the latest developments, including those between releases of new versions of the code.
Your work is not visible to others, because it is on your computer.
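A minimal sketch of cloning, using a throwaway local repository as a stand-in for a remote one so that it runs without network access. The directory names (original, my-copy) are hypothetical; with a real project, the first argument to git clone would be the URL shown on the forge.

```shell
# Work in a scratch directory so nothing existing is touched.
cd "$(mktemp -d)"

# Create a small repository to act as "the original" (stand-in for a remote).
git init --quiet original
git -C original -c user.name="Example" -c user.email="example@example.org" \
    commit --quiet --allow-empty -m "Initial commit"

# Clone it: this copies the full history into a new directory.
git clone --quiet original my-copy

# The clone remembers where it came from under the remote name "origin".
git -C my-copy remote -v
git -C my-copy log --oneline
```

Commits made in my-copy stay there until they are explicitly pushed somewhere.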
Forking a repository
Forking a repository on a forge creates a clone that resides under a different account on the same forge (a fork).
It is typically done to work on a git repository you cannot write to.
Your work is visible to others, because it is on the web.
Commits in the fork can be made to any branch (including main or master).
The commits that are made within the branches of the fork repository can be contributed back to the parent repository by means of pull (or merge) requests.
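The fork workflow can be sketched locally as follows. Forking itself happens on the forge, so here a bare copy (fork.git) stands in for the fork and a plain directory (parent) stands in for the repository you cannot write to. All names, including the branch fix-typo and the file notes.txt, are hypothetical.

```shell
cd "$(mktemp -d)"

# Stand-in for the parent repository that you cannot write to.
git init --quiet parent
git -C parent -c user.name="Maintainer" -c user.email="maintainer@example.org" \
    commit --quiet --allow-empty -m "Initial commit"

# Stand-in for pressing "Fork" on the forge: a server-side copy under your account.
git clone --quiet --bare parent fork.git

# Clone your fork to your computer and commit on a new branch.
git clone --quiet fork.git my-clone
cd my-clone
git config user.name "Contributor"
git config user.email "contributor@example.org"
git checkout --quiet -b fix-typo
echo "Fixed text" > notes.txt
git add notes.txt
git commit --quiet -m "Fix typo in notes"

# Push the branch to the fork; the parent repository is not modified.
git push --quiet origin fix-typo
```

On a real forge, the next step would be to open a pull (or merge) request from the fix-typo branch of the fork to the parent repository through the web interface.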
Exercise
What is the difference between forking and then cloning (your fork, to your computer) vs cloning (to your computer) and then pushing to a brand new repository?
Solution
Forking on a forge and then cloning creates links:
from your fork to the original repository;
from your clone to your fork.
When cloning and then pushing to a new repository, you will create links:
from your clone to the original repository;
from your clone to the new repository.
Your repository on the forge will not have a link to the original repository and will not be listed as a fork of the original repository.
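The second scenario (clone, then push to a brand new repository) can be sketched locally to make the links visible. The local directories below are stand-ins for repositories on a forge, and the remote name new-home is a hypothetical choice.

```shell
cd "$(mktemp -d)"

# Stand-in for the original repository.
git init --quiet original
git -C original -c user.name="Example" -c user.email="example@example.org" \
    commit --quiet --allow-empty -m "Initial commit"

# Stand-in for a brand new, empty repository created on the forge.
git init --quiet --bare new-home.git

# Clone the original, then register the new repository as a second remote.
git clone --quiet original my-clone
git -C my-clone remote add new-home ../new-home.git
git -C my-clone push --quiet new-home HEAD

# The clone links to both repositories ...
git -C my-clone remote -v

# ... but the new repository has no remote pointing back to the original.
git -C new-home.git remote
```

The last command prints nothing: the history was copied, but the new repository itself records no relationship to the original.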
Generating from templates and importing
There are two more ways to create “copies” of repositories into your user space:
A repository can be marked as a template, and new repositories can be generated from it as if using a cookie cutter. The newly created repository will start with a new history.
You can import a repository from another hosting service or web address. This will preserve the history of the imported project and features like Wikis, issues and the like.
Discussion
Visit one of the repositories/projects that you have used recently and try to find out how many forks exist and where they are.
In which situations could it be useful to start from a “template” repository by generating?
Synchronizing changes between repositories
We need a mechanism to communicate changes between the repositories.
We will pull or fetch updates from remote repositories (we will soon discuss the difference between pull and fetch).
We will push updates to remote repositories.
We will learn how to suggest changes within repositories on a forge and across repositories (pull request).
Repositories that are forked or cloned do not automatically synchronize themselves: We will learn how to update forks (by pulling from the “central” repository).
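As a preview of fetching and pulling, here is a minimal local sketch. The directory central is a hypothetical stand-in for the "central" repository; with a fork, the central repository would typically be registered as an additional remote first.

```shell
cd "$(mktemp -d)"

# Stand-in for the "central" repository.
git init --quiet central
git -C central config user.name "Example"
git -C central config user.email "example@example.org"
git -C central commit --quiet --allow-empty -m "First commit"

git clone --quiet central my-clone

# Someone adds a commit to the central repository after the clone was made.
git -C central commit --quiet --allow-empty -m "Second commit"

cd my-clone

# fetch downloads the new commit but leaves the local branch untouched.
git fetch --quiet origin
git log --oneline HEAD
git log --oneline "@{upstream}"

# pull fetches and then integrates the changes into the local branch.
git pull --quiet --ff-only
git log --oneline HEAD
```

The --ff-only option is a cautious choice for this sketch: the pull succeeds only if the local branch can simply be moved forward, without creating a merge commit.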
A main difference between cloning a repository and forking a repository is that
cloning is a general operation for generating copies of a repository to different computers
forking is a particular operation implemented on forges (that includes cloning)
Authentication: connecting to the repository from your computer
There are two main ways to authenticate:
SSH keys
HTTPS
Please have a look at this guide by CodeRefinery for a general introduction to authentication options.
We suggest setting up and using an SSH key, since it is a form of authentication that is also used on other services (e.g., to access HPC systems). For a step-by-step guide look at this walkthrough by Software Carpentry.
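Which method Git uses is determined by the form of the remote URL. The sketch below switches an existing remote from the HTTPS form to the SSH form; the host forge.example.org and the path user/project.git are hypothetical placeholders, and no network connection is made by these commands.

```shell
cd "$(mktemp -d)"
git init --quiet project
cd project

# Register a remote using the HTTPS form of a (hypothetical) repository URL.
git remote add origin https://forge.example.org/user/project.git

# Switch the same remote to the SSH form, so that later fetches and pushes use your SSH key.
git remote set-url origin git@forge.example.org:user/project.git

# Show the URL that is now configured.
git remote get-url origin
```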
Authentication via HTTPS might require less setup,
if password authentication is allowed.
If not, you can use a personal access token
as a drop-in replacement,
which can be configured at these pages:
Problems in collaborative software development
Merging can be a difficult moment in the life cycle of a software project.
Git will try to do reasonable operations when merging two different lines of work, but:
There might be a detectable ambiguity in the way that two different lines of work can be reconciled (this leads to a conflict).
The results are not guaranteed to give you working software every time (i.e., you don’t get a conflict, but the result is not correct either; this is scarier).
One way to reduce the difficulty of merging is to contribute to the main branch as often as possible, which keeps each change small.
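To illustrate the first kind of problem, here is a minimal sketch that provokes a conflict in a throwaway repository. The file name settings.txt and the branch name feature are hypothetical; the starting branch may be called main or master depending on your Git configuration, so its name is read into a variable.

```shell
cd "$(mktemp -d)"
git init --quiet demo
cd demo
git config user.name "Example"
git config user.email "example@example.org"

echo "color = blue" > settings.txt
git add settings.txt
git commit --quiet -m "Add settings"
base=$(git rev-parse --abbrev-ref HEAD)   # main or master, depending on configuration

# One line of work changes the value on a separate branch ...
git checkout --quiet -b feature
echo "color = green" > settings.txt
git commit --quiet -am "Use green"

# ... while another changes the same line on the original branch.
git checkout --quiet "$base"
echo "color = red" > settings.txt
git commit --quiet -am "Use red"

# Git cannot decide which change wins, so the merge stops with a conflict.
git merge feature || echo "Merge stopped: conflict in settings.txt"
git status --short
```

After the failed merge, settings.txt contains conflict markers showing both versions, and a person has to choose how to reconcile them.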
In the following chapters we will focus on tools that ease the communication aspect of collaborative software development.