Getting Started with Git

Outside of learning a programming language, source control is one of the most important skills for developers. Controlling how your code changes over time, and how those changes are shared and integrated into applications over time really all starts with source control. In this post, I’ll try to give you a basic understanding of the popular source control tool in the world - Git.

Before we get started, just know that I’ll be assuming you know at least a little bit about the command line on your operating system, how to open it and how to use some basic commands to change directories or create files. Also, if you’re interested in a video course version of this material, you can always sign up for a free trial of Pluralsight to take my AWS DevOps course that this materials is from.

What Is Version Control?

So let’s start by clarifying what version control is! Here’s my working definition:

Version control systems allow you to track and manage code changes over time

But this only skims the surface! As a developer, you’ll use version control:

To record the history of your changes to code
To help resolve conflicts between different versions of code
As a critical part of systems that will update your applications automatically

And potentially a lot more!

Types of Version Control

Technically, there are lots of different tools that fall under the larger umbrella of version control (also known as “source control”). Here are just a few of them:

Subversion (SVN)
CVS
Perforce
Mercurial
Git

While there are lots of different ways to control versions of your code, Git is by far the most popular today for the majority of development. However, if you end up coding at organizations with much older code bases you could very well see something else.

Installing Git

In order to start using Git, you’ll need to get it set up on your machine. You can download Git from the Git-SCM website.

However, depending on your operating system you might already have it on your machine. To check, you can open up your command line and run git --version.

Git Essentials

Good news bad news.

Bad news - This isn’t going to be a comprehensive introduction to Git. There’s a lot to learn about Git and I could safely bet that most developers don’t have half the possible Git commands memorized.

Good news - There are half a dozen Git commands developers use all the time and the rest we can look up if we ever need them. We’ll focus on the essential concepts and commands you need to understand to work with Git!

Git Concepts

First, let’s define a few concepts:

Repositories are where your code is stored, either on your computer or somewhere else where your team can collaborate on it with you.
Remotes are a type of repository that isn’t on your local machine, but are somewhere else like GitHub, BitBucket or a private server somewhere.
Your Working Directory is the folder you put all your code in on your machine.

Let’s start with these concepts, and add a few more as we go!

Using Git

When we start using Git, we’re either going to want to:

Copy a remote repository to our local machine
Create a new local repository

For this, we’ll imagine we have a remote repository already and we want to copy it to our local environment:

Screenshot of git workflow diagram

To do this, we’d use the git clone command with whatever the repository URL was for our remote repository. For example, to clone a project of mine called Serverless Jams that is hosted on my account in GitHub we might run a command like this:

git clone https://github.com/fernando-mc/serverlessjams

Assuming this repository was public, or we had access to it, Git would then go grab the code for it and bring a copy of the repository locally. Now importantly, if we want to make changes to the remote version of repository, we need to have permissions to do so.

If we did, the next step might be to change directories into this cloned repository and start making changes. Let’s say we create a new file called file_b.txt in the working directory of this Git repository:

Showing file_b.txt being put in the working directory

When we do this, the file doesn’t immediately get saved in the repository or sent up to GitHub. First, we need to make the conscious decision that the file we added is something we want to prepare to be added to our repository. This process is called adding the file to our staging area. We do this with the git add command:

Screenshot of adding the file to staging area

For example, to add file_b.txt to staging we would run:

git add file_b.txt

We can add files to staging (or remove them with another command) as much as we want. But when we’re ready to make these changes official we need to “commit” them. Think of this as us recording the changes locally in our repository, usually with a description of what we changed. We do this with the git commit command:

Screenshot of the git commit step to the local repository

The command we actually run will usually use the -m flag to contain a message related to the code we changed. For example:

git commit -m "added file_b to our code"

With this change, we have finished creating a “commit” locally which records our code changes. But if we want to share those with other developers we will have to “push” these changes to the remote repository. We do this with the git push command:

Screenshot of git push step to the remote repository

If it was just us interacting with the codebase, this might be all we needed for a while to add files and make changes. We’d basically use these commands over and over as we made changes to the codebase to record everything we’ve done. But, if we want to start working with more people, or on open source projects we probably need to understand a few more concepts.

Forks

While in many situations just git push will work, you may need to specify the remote name. By default, the remote is usually called origin. However, in some cases, such as when you are contributing to open source projects, you might have more than one remote. This is usually due to something called Forking.

Forking, is when you take one version of a remote repository, and create another remote repository that copies that one at that point in time (the fork). You then typically clone and push commits to the fork repository until you’re ready to request changes be made to the original.

So first we would create the forked remote repository from the first remote repository:

Screenshot of forking process

When we do this, we have to give each of the remotes names to differentiate them from one another. The first remote repository is usually called upstream, the fork is usually called origin. As a quick note - when we’re not forking repositories, the name of your remote repository usually defaults to origin too, but we didn’t really pay attention to that before because we only had one to work with!

After forking the repository, the same workflow happens locally, where we create new files (or modify old ones) in the working directory. Then we add them to the staging area with git add and commit them locally to history with git commit.

Git workflow with a fork

Importantly though, one thing the screenshot above leaves out is that we may want to specify which remote we’re pushing changes to. We can specify the remote inside of the git push command by adding the name of that remote right after git push. For example, if we want to push to the origin remote we can run:

git push origin

If we wanted to push directly to the upstream remote (assuming we had permissions to do that) we could run:

git push upstream

Note that the names upstream and origin are purely by convention. If you wanted to have fun, you could just as easily call the upstream repository mountaintop and the fork downriver or something else fun.

In a typical workflow, we would push our changes to the origin with git push origin and then open a pull request against the upstream repository:

Git workflow with a PR from a fork

This pull request gives other developers an opportunity to use a tool like GitHub or BitBucket to review our code and approve the changes we want to make. It also allows us do things like run automatic integration tests and other code scanning to check for possible improvements or errors.

Branches

There is one more important Git concept that I’ve neglected to mention called branches. Each time we push changes in Git we’ve been doing so while working on a branch. You can think of branches as ways that we can help compartmentalize all the changes we’re making to the code. Eventually, if we’re working on two new features simultaneously we might want to keep those changes separated in two different branches.

Additionally, because we have a default branch that stores the most recent changes we want to work from, when we have features we’re adding we typically create a new feature branch and merge those changes into that default branch.

Let’s see how this works! Below, we have a main branch from an origin remote repository that we clone locally. But instead of working with the main branch (which is also commonly called master), we make changes on a new feature branch called my-feature:

Git workflow with a PR from the origin main branch

One thing that is not shown in the image above is that to create and switch to a new branch like my-feature we could run a few different commands.

We could switch to the new branch (called checking it out) and create it in just one command:

git checkout -b my-feature

We could also do this in two commands:

git branch my-feature
git checkout my-feature

When we create and switch to this new branch it will have all the files from the main branch we were just on. But for all changes we make to the my-feature branch they wont be reflected in the main branch until later.

So we can still do all the same steps to git add and git commit our changes until we’re ready to push them to the origin remote. In this case however we specify the branch name that we’re pushing to in the remote repository because we’re creating a new one there with:

git push origin my-feature

After we do this, even with a just a single remote we can typically use our GitHub or BitBucket repository system to open up a pull request from our my-feature branch to merge changes into the main branch:

Git workflow with a PR from a fork

And when we’re ready to get things back in sync with our main branch locally, we can check that out with:

git checkout main

When it’s checked out, and after the pull request is merged, we can bring the changes back down from the remote repository with git pull as shown here:

Git workflow showing git pull to main from origin main

Whenever we’re done, we could delete the local feature branch because all it’s changes are merged into main using:

git branch -d my-feature

From here we just start the process again and again as we add more changes and improvements!

Summing Things Up

So we covered a lot of different concepts here including:

Repositories - Where our code is tracked by Git
Remotes - Where we store our code remotely, including both upstream and origin which are two different possible remote repositories
Branches - which we use to help keep different code changes separated into features or more discrete units.
Commiting code changes and new files

As well as a few other concepts and commands to put it all together!

If you have any questions don’t hesitate to leave a comment below or reach out to me on Twitter!