
Git Internals

How Git Works Under the Hood

So, you use Git every day, pushing, pulling, and occasionally rage-quitting because of a merge conflict.

But do you actually know what’s going on under the hood?


1. Git’s Data Model: Snapshots, Not Diffs

Most people assume Git tracks changes (like traditional version control systems such as SVN or Mercurial), but nope.

Git is a snapshot-based system.

Each commit is a complete snapshot of your repository at that moment in time. The magic? If a file hasn’t changed, Git just reuses the same reference instead of storing a duplicate.

Proof With cat-file

Run this in a Git repo:

git cat-file -p HEAD

You’ll see something like:

tree a1b2c3d4e5
parent 1234567890
author You <you@example.com> 1647890189 +0000
committer You <you@example.com> 1647890189 +0000

Added README

That tree hash (a1b2c3d4e5) represents the snapshot of the files at that commit.
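To watch the reuse happen, here is a minimal sketch in a throwaway repository. The file names, the demo identity, and the temp directory are assumptions made for illustration; the only Git feature it relies on is that `<commit>:<path>` resolves to the blob a snapshot records for that path.

```sh
# Throwaway repo; the identity and file names below are made up for the demo
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "stable" > stable.txt
echo "v1" > changing.txt
git add . && git commit -qm "first"

echo "v2" > changing.txt
git commit -qam "second"

# <commit>:<path> resolves to the blob stored for that path in that snapshot
stable_before=$(git rev-parse HEAD~1:stable.txt)
stable_after=$(git rev-parse HEAD:stable.txt)
changing_before=$(git rev-parse HEAD~1:changing.txt)
changing_after=$(git rev-parse HEAD:changing.txt)

echo "stable.txt:   $stable_before -> $stable_after"       # same ID twice
echo "changing.txt: $changing_before -> $changing_after"   # two different IDs
```

Both commits are full snapshots, but the second one simply points at the blob that already exists for stable.txt.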


2. Git’s Directed Acyclic Graph (DAG)

Git stores commits as a Directed Acyclic Graph (DAG)—a fancy way of saying commits point backward in time, forming a tree-like structure with no cycles.

Each commit has:

  • A tree (which maps to actual file contents)
  • A parent (or multiple parents for merges)
  • Metadata (author, message, etc.)

Here’s a quick ASCII example of a Git history:

A ← B ← C ← D  (main branch)
         E ← F (feature branch)

This structure makes history traversal super fast.


3. SHA-1 Hashing: Git’s Fingerprint System

Git identifies everything (commits, trees, blobs, etc.) using SHA-1 hashes. This ensures data integrity.

Let’s see how Git hashes things:

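echo "Hello, World!" | git hash-object --stdin

Output:

8ab686eafeb1f44702738c8b0f24f2567c36da6d

That’s not the SHA-1 of the bare string. Git hashes a small header (the object type and size, here blob 14, followed by a NUL byte) and then the content, including the newline that echo adds. Git does this for all files, commits, and trees.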
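Git's object IDs can be reproduced by hand. The sketch below assumes GNU `sha1sum` is installed and that Git is using its default SHA-1 object format; the sample text is arbitrary. It hashes "blob <size>", a NUL byte, and the content, then compares the result with what `git hash-object` reports.

```sh
content="any text you like"   # arbitrary example input

# Size of the content plus the trailing newline that echo-style input carries
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>", a NUL byte, then the content, using plain sha1sum
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the ID of the same bytes
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual:  $manual"
echo "via git: $via_git"
```

The two lines should print the same 40-character ID, which is why identical content always ends up as one object no matter which file or commit it came from.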


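4. How Git Walks History

How git log Walks the Graph

When you run:

git log --graph --oneline --all

Git starts from every branch and tag tip and follows parent pointers backward. It keeps the commits it has discovered but not yet shown in a queue (ordered by commit date by default; --graph also makes sure a commit is never shown before its children), and it marks each commit as seen so shared history is visited only once.

Example:

If your commit history looks like:

A ← B ← C ← D (main)
         E ← F (feature)

Git doesn’t walk main and feature as two separate lists. It interleaves the two tips and, once both lines reach a commit they share, continues through that shared history a single time. On large repositories the optional commit-graph file (git commit-graph write) speeds this up by caching parent information.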


5. Exploring Git Internals With Code

Want to see Git’s raw storage? Let’s play!

1. List All Git Objects

ls .git/objects

You’ll see folders with weird names (first 2 chars of SHA-1 hashes).

2. Inspect a Commit Object

Find a commit hash and run:

git cat-file -p <commit-hash>

It prints the commit details.

3. Check a Tree Object (File Snapshot)

Grab the tree hash from the commit and run:

git cat-file -p <tree-hash>

Now you see the folder structure!

4. Inspect a Blob (File Content)

Find a file’s blob hash and run:

git cat-file -p <blob-hash>

Boom! The file’s content appears.
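The four steps can be chained without copying hashes by hand. This is a sketch in a scratch repository (the docs/readme.txt path and the demo identity are invented for the example); `HEAD^{tree}` and `HEAD:<path>` are standard revision syntax for a commit's tree and a file's blob.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir docs && echo "hello" > docs/readme.txt
git add . && git commit -qm "Add docs"

commit=$(git rev-parse HEAD)                  # the commit object
tree=$(git rev-parse "HEAD^{tree}")           # the root tree it points to
blob=$(git rev-parse HEAD:docs/readme.txt)    # the blob for one file

# Print the type of each object next to its ID
for id in "$commit" "$tree" "$blob"; do
  printf '%-7s %s\n' "$(git cat-file -t "$id")" "$id"
done

git cat-file -p "$tree"    # one entry: the docs subtree
git cat-file -p "$blob"    # prints: hello
```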


6. Git’s Garbage Collection and Packfiles

Git compresses objects using packfiles. These bundle multiple objects into a single file for efficiency.

Run:

git gc

Git will repack objects, saving space.

To inspect packfiles:

ls .git/objects/pack


7. What Is a Git Branch? (It’s Just a Pointer!)

Most people assume a branch is a separate folder or copy of files. Nope.

A Git branch is just a ref: normally a tiny file that contains a commit hash (after git gc it may instead be a line in .git/packed-refs). That’s it.

Let’s check it out!

cat .git/refs/heads/main

Example output:

9fceb02b3beecf73c4f0d7b24b3b9d09981fb17e

That’s just a commit hash!

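When you create a branch, Git creates a new file under .git/refs/heads/ containing the hash of the commit you branched from, so the new branch and the old one start out identical. This is why branches are cheap: creating one writes a 41-byte file, and committing on it just moves that pointer.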
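A quick check of that claim, as a sketch in a scratch repository. It assumes the default files-based ref storage (so the new ref shows up as a loose file) and uses an invented branch name.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt && git add . && git commit -qm "first"

git branch feature                       # create a branch, without switching to it

main_tip=$(git rev-parse refs/heads/main)
feature_tip=$(cat .git/refs/heads/feature)   # the branch really is just this file

echo "main:    $main_tip"
echo "feature: $feature_tip"
```

Both lines show the same hash: no files were copied, only a new pointer was written.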


8. How HEAD Works: The Active Branch

HEAD is a special reference that tells Git which branch you’re currently on.

Run:

cat .git/HEAD

Example output:

ref: refs/heads/main

This means HEAD is pointing to main. When you switch branches, Git rewrites this file (and brings the index and working directory in line with the new branch).

Try:

git checkout -b new-branch
cat .git/HEAD

Now it shows:

ref: refs/heads/new-branch

No files copied. No magic. Just a simple pointer change.


9. What Happens When You Switch Branches?

Let’s break it down:

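  1. Git checks that your uncommitted changes won’t be overwritten, and refuses to switch if they would be.
  2. Git updates the index and working directory to match the new branch’s commit, carrying over uncommitted changes that don’t conflict.
  3. Git updates HEAD to point to the new branch.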

Try this:

git checkout -b test-branch

Now check:

cat .git/HEAD

Output:

ref: refs/heads/test-branch

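Because test-branch was just created from the current commit, your working files stay exactly as they were. When you switch to a branch that points to a different commit, Git updates them to match that commit.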
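To see both effects of a switch side by side, here is a small sketch with two branches that disagree about one file. The file name and contents are placeholders chosen for the example.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "from main" > file.txt && git add . && git commit -qm "main version"
git checkout -q -b test-branch
echo "from test-branch" > file.txt && git commit -qam "branch version"

# Switch back and forth, recording HEAD and the file each time
git checkout -q main
head_on_main=$(cat .git/HEAD)
content_on_main=$(cat file.txt)

git checkout -q test-branch
head_on_branch=$(cat .git/HEAD)
content_on_branch=$(cat file.txt)

echo "$head_on_main   | $content_on_main"
echo "$head_on_branch | $content_on_branch"
```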


10. How Merging Works Under the Hood

When you merge branches, Git looks for a common ancestor (usually the latest shared commit).

Example:

A ← B ← C (main)
      D ← E (feature)

If you merge feature into main, Git finds commit B (the last shared commit), then combines changes from C and E.

Merge Types

  1. Fast-forward merge

    • If no new commits exist on main, Git just moves the branch pointer forward.

    • Example:

      git checkout main
      git merge feature
      

      This just updates main to point to E.

  2. Three-way merge

    • If main has new commits, Git needs to create a merge commit.

      git merge feature
      

      Git creates a new commit combining changes.
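The common ancestor and the resulting merge commit can both be inspected. The sketch below rebuilds a small diverged history in a scratch repository (commit messages and file names are invented) and uses `git merge-base` to show which commit Git picks as the base.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -qm "B (shared)"
shared=$(git rev-parse HEAD)

git checkout -q -b feature
echo feature > feature.txt && git add . && git commit -qm "E (feature)"

git checkout -q main
echo main > main.txt && git add . && git commit -qm "C (main)"

merge_base=$(git merge-base main feature)   # the common ancestor Git will use
git merge -q --no-edit feature              # both sides moved, so no fast-forward

# A merge commit is an ordinary commit with more than one parent line
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge base: $merge_base, parents of merge commit: $parent_count"
```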


11. How Git Deletes and Recovers Branches

Deleting a Branch

A branch is just a file, so deleting it is easy:

git branch -d feature

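This deletes .git/refs/heads/feature (or its entry in .git/packed-refs). Git refuses if the branch has commits that haven’t been merged yet.

To force delete anyway: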

git branch -D feature

Recovering a Deleted Branch

If you deleted a branch but need it back:

  1. Find the last commit hash:

    git reflog
    
  2. Restore the branch:

    git checkout -b feature <commit-hash>
    

Boom! The branch is back.
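Here is the whole delete-and-recover round trip as a sketch in a scratch repository. The branch name and commit message are placeholders; `%H` and `%gs` are the log format codes for the full hash and the reflog message.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -qm "base"
git checkout -q -b feature
echo work > work.txt && git add . && git commit -qm "feature work"
lost=$(git rev-parse HEAD)

git checkout -q main
git branch -q -D feature     # -D because feature was never merged into main

# No branch points at the commit now, but HEAD's reflog still remembers it
found=$(git reflog --format='%H %gs' | grep 'commit: feature work' | head -n 1 | cut -d' ' -f1)

git branch feature "$found"  # recreate the pointer
echo "recovered feature at $found"
```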



12. How Git Rebase Works Internally

Rebasing is one of Git’s most misunderstood features. Instead of merging branches, it rewrites history by moving commits.

Example scenario:

A ← B ← C (main)
       D ← E (feature)

If we run:

git checkout feature
git rebase main

Git does this internally:

  1. Finds the common ancestor (B).
  2. Moves feature commits (D and E) onto main, replaying them one by one.
  3. Updates feature to point to the new commit history.

The result:

A ← B ← C ← D' ← E' (feature)

Rebasing rewrites commit hashes, creating new commits D' and E'. This is why you shouldn’t rebase shared branches—it changes history!
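The hash change is easy to observe. This sketch rebuilds the A to E history from the example in a scratch repository; `add_commit` is a helper defined here purely for the demo.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Demo helper: one commit that adds a file named after the commit
add_commit() { echo "$1" > "$1.txt" && git add . && git commit -qm "$1"; }

add_commit A; add_commit B
git checkout -q -b feature
add_commit D; add_commit E
old_tip=$(git rev-parse feature)

git checkout -q main
add_commit C

git checkout -q feature
git rebase -q main
new_tip=$(git rev-parse feature)

echo "E  was $old_tip"
echo "E' is  $new_tip"
git log --format=%s feature   # E, D, C, B, A (newest first)
```

The old commits are not destroyed by the rebase; they simply stop being reachable from feature, which is why the reflog can still bring them back.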


13. The Git Reflog: Your Undo Button

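Git rarely loses commits outright, even ones that no branch points to any more. Every time HEAD or a branch moves, the move is logged in the reflog (git reflog). The reflog is local to your clone, and its entries expire eventually (after 90 days by default, or 30 days for commits that are no longer reachable).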

Run:

git reflog

Example output:

9fceb02 HEAD@{0}: commit: Added README
2d4f7b6 HEAD@{1}: checkout: moving from feature to main

This shows recent actions, like branch checkouts and commits.

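If you reset or rebased to the wrong place, you can move the branch back to where HEAD was one step ago (careful: --hard also throws away uncommitted changes):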

git reset --hard HEAD@{1}

Or restore a deleted branch:

git checkout -b feature 2d4f7b6

Git is hard to break, thanks to the reflog.


14. How Git Stash Works Internally

When you run:

git stash

Git doesn’t create a branch. Instead, it:

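  1. Saves your staged and unstaged changes as commit objects (one for the index, one for the working directory).
  2. Resets the index and working directory to match HEAD. HEAD itself doesn’t move.

The newest stash is whatever commit the ref .git/refs/stash points to; older stashes live in that ref’s reflog:

cat .git/refs/stash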

To view them:

git stash list

To restore:

git stash apply

Each stash is a commit, so you can even inspect them:

git stash show -p
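Because a stash is a commit, the usual plumbing works on it. A sketch in a scratch repository (file name and contents are invented) that stashes one modification and then looks at what refs/stash points to:

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > file.txt && git add . && git commit -qm "base"
head_before=$(git rev-parse HEAD)

echo v2 > file.txt           # an uncommitted change
git stash -q

stash_type=$(git cat-file -t refs/stash)                           # what kind of object is it?
stash_parents=$(git cat-file -p refs/stash | grep -c '^parent ')   # HEAD plus the index snapshot
head_after=$(git rev-parse HEAD)

echo "stash is a $stash_type with $stash_parents parents"
echo "file.txt now contains: $(cat file.txt)"
```

HEAD is the same before and after; only the index and working directory were rolled back.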

15. Packfiles: How Git Optimizes Storage

If you check .git/objects, you’ll see lots of small files.

Over time, Git packs these into one or more compressed files called packfiles, each with an index (.idx) for fast lookup.

Check packfiles:

ls .git/objects/pack

To manually optimize storage:

git gc

Git then:

  • Compresses objects into fewer files.
  • Stores similar objects as deltas against each other.
  • Reduces repository size.

This is why Git repos stay efficient, even with thousands of commits.
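The effect of packing shows up in `git count-objects -v`, which reports loose objects as count and packed ones as in-pack. A sketch in a scratch repository with three tiny commits (contents are placeholders):

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3; do
  echo "$i" > file.txt && git add . && git commit -qm "commit $i"
done

# "count" = loose objects, "in-pack" = objects stored in packfiles
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after, packed: $packed_after"
ls .git/objects/pack          # a .pack file and its .idx
```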


16. Git Garbage Collection: Cleaning Up Unused Objects

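Git removes unreachable objects (like old commits no longer referenced by any branch, tag, or reflog entry), but only once they are older than a grace period (two weeks by default).

To list unreachable objects (add --no-reflogs to include ones that only the reflog still references):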

git fsck --unreachable

Run garbage collection manually:

git gc --prune=now

This skips the grace period and removes unreachable objects that no reflog entry protects, such as:

  • Orphaned commits
  • Old packfiles
  • Unreferenced blobs

If you accidentally delete a commit before garbage collection runs, you can still find it with git reflog.



17. Git Bisect: Debugging Like a Time Traveler

Ever had a bug that wasn’t there yesterday? Instead of manually checking old commits, Git can automatically find the exact commit where the bug was introduced.

How It Works

git bisect performs a binary search on your commit history.

  1. Start bisect mode:

    git bisect start
    
  2. Mark a good commit:

    git bisect good <commit-hash>
    
  3. Mark a bad commit:

    git bisect bad HEAD
    
  4. Git will now check out a commit halfway between the two and wait for you to test it.

    • If the commit is good, run:
      git bisect good
      
    • If the commit is bad, run:
      git bisect bad
      
  5. Git repeats this process until it finds the first bad commit.

  6. When finished, reset:

    git bisect reset
    

This automates debugging by letting Git find exactly where the problem started.
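The good/bad answers can themselves be scripted with `git bisect run`, which treats exit code 0 as good and most non-zero codes as bad. The sketch below fabricates an eight-commit history in a scratch repository where commit 5 introduces a "bug" (the word bug in status.txt); the file names and the grep-based test are stand-ins for a real test suite.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo ok > status.txt && git add . && git commit -qm "commit 1"
good=$(git rev-parse HEAD)

for i in 2 3 4 5 6 7 8; do
  if [ "$i" -eq 5 ]; then echo bug > status.txt; fi   # commit 5 breaks things
  echo "$i" > counter.txt
  git add . && git commit -qm "commit $i"
done

git bisect start HEAD "$good" >/dev/null              # bad = HEAD, good = commit 1
# The test command: succeeds (good) while status.txt does not contain "bug"
git bisect run sh -c '! grep -q bug status.txt' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)   # where the search converged
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
```

With seven candidate commits, the search needs only about three test runs instead of seven.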


18. Worktrees: Multiple Checkouts at Once

Ever wanted to work on two branches at the same time without stashing or committing? That’s what git worktree does.

How Worktrees Work

A Git worktree is another checkout of the same repository but in a different folder.

Creating a Worktree:

git worktree add ../feature-branch feature

This creates a new folder ../feature-branch where you can work on the feature branch without switching your main repo.

Listing Worktrees:

git worktree list

Removing a Worktree:

git worktree remove ../feature-branch

This is super useful for working on multiple branches without constantly switching.


19. Bare Repositories: What’s Inside Remote Git Repos?

When you run:

git clone git@github.com:user/repo.git

You’re cloning from a bare repository. (Your local clone is a normal, non-bare repo with a working directory.)

What’s a Bare Repository?

A bare repo has no working directory—just the Git data.

To create one:

git init --bare myrepo.git

A bare repo holds the files that normally live inside .git, directly at its top level:

myrepo.git/
 ├── HEAD
 ├── refs/
 ├── objects/
 ├── hooks/

This is what GitHub, GitLab, and other remote services use to store repos centrally.

Why Use a Bare Repo?

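  • Collaboration: Remote repositories need to accept pushes. By default a regular repo refuses pushes to its checked-out branch, because its working directory would fall out of sync; a bare repo has no working directory, so the problem never arises.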
  • Centralized Storage: CI/CD systems often use bare repositories for automation.
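A local bare repository behaves like a miniature remote. In this sketch everything lives under one temp directory; the work clone and the README are invented for the example.

```sh
base=$(mktemp -d)
git init -q --bare "$base/myrepo.git"
is_bare=$(git -C "$base/myrepo.git" rev-parse --is-bare-repository)

# Clone it (cloning an empty repo only prints a warning) and push a first commit
git clone -q "$base/myrepo.git" "$base/work" 2>/dev/null
cd "$base/work" && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > README.md && git add . && git commit -qm "first"
git push -q origin main

pushed=$(git -C "$base/myrepo.git" rev-parse refs/heads/main)
local_tip=$(git rev-parse HEAD)

echo "bare: $is_bare"
echo "remote main: $pushed"
ls "$base/myrepo.git"    # HEAD, objects, refs, ... but no README.md
```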

20. Git Submodules: Repositories Inside Repositories

Sometimes, you need to include another Git repo inside your own (e.g., a shared library). That’s where submodules come in.

Adding a Submodule

git submodule add https://github.com/some/library.git libs/library

This creates:

libs/library/   # A separate Git repo
.gitmodules     # Tracks submodule settings

Cloning a Repo With Submodules

By default, submodules aren’t cloned. To fix this:

git clone --recurse-submodules <repo-url>

Or if you forgot:

git submodule update --init --recursive

How Submodules Work Internally

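Submodules aren’t stored as normal files. The .gitmodules file only records each submodule’s path and URL. The commit to check out is stored in the parent repo’s tree as a special entry (a gitlink, with mode 160000):

git ls-tree HEAD libs/library

The hash in that entry tells Git which commit of the submodule to check out.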


21. Git Hooks: Automate Everything

Git has built-in automation via hooks—scripts that run before/after Git commands.

Where Hooks Live

Hooks are stored in .git/hooks/:

ls .git/hooks

Common Hooks:

| Hook | Runs When? |
| --- | --- |
| pre-commit | Before a commit is created |
| pre-push | Before a git push |
| commit-msg | When a commit message is entered |
| post-merge | After a successful merge |

Example: Preventing Bad Commit Messages

Create .git/hooks/commit-msg:

#!/bin/sh
if ! grep -qE "^(feat|fix|docs|chore):" "$1"; then
  echo "Commit message must start with feat:, fix:, docs:, or chore:"
  exit 1
fi

Make it executable:

chmod +x .git/hooks/commit-msg

Now Git rejects bad commit messages!
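To try the hook without touching a real project, the sketch below installs the same script in a scratch repository and attempts one bad and one good commit. It assumes no global core.hooksPath setting is redirecting hooks elsewhere; the commit messages are examples.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Install the commit-msg hook shown above; $1 is the file holding the message
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
if ! grep -qE "^(feat|fix|docs|chore):" "$1"; then
  echo "Commit message must start with feat:, fix:, docs:, or chore:"
  exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

echo a > a.txt && git add .

git commit -qm "updated stuff";   rejected=$?   # no prefix: the hook exits 1
git commit -qm "feat: add a.txt"; accepted=$?   # valid prefix: the commit goes through

echo "bad message exit code:  $rejected"
echo "good message exit code: $accepted"
```

Hooks in .git/hooks are not cloned or pushed, so each collaborator has to install them separately.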



22. What Is a Git Patch?

A Git patch is a text-based representation of a commit (or multiple commits). Instead of pushing/pulling, you can export a change as a .patch file and apply it elsewhere.

Think of it like a portable commit—you can send it via email, copy it to another machine, or even manually review it.

Example workflow:

  1. Generate a patch file (git format-patch).
  2. Send it to someone (email, Slack, etc.).
  3. Apply it on another repository (git am, or git apply for a plain diff).

This is how Linux kernel development and many open-source projects handle contributions!


23. Creating a Git Patch

To generate a patch for the last commit:

git format-patch -1

This creates a .patch file like:

0001-Added-feature-X.patch

To generate patches for the last three commits:

git format-patch HEAD~3

This creates one .patch file per commit.

To capture uncommitted changes as a single patch:

git diff > my_changes.patch

This is useful when you haven’t committed changes yet. Plain git diff only covers unstaged changes; use git diff HEAD to include staged ones too.


24. Structure of a Git Patch File

A .patch file is just plain text! Let’s break down an example:

From 4f5a4e7c1342b5a1b6c6ff3a0b2f1d4a52a3d5b8 Mon Sep 17 00:00:00 2001
From: John Doe <johndoe@example.com>
Date: Sat, 1 Mar 2025 14:30:00 -0700
Subject: [PATCH] Fix typo in README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3b18b12..6f7e8f1 100644
--- a/README.md
+++ b/README.md
@@ -1 +1 @@
-Hello Git Usres!
+Hello Git Users!

Breakdown:

  1. Metadata (Commit Info)

    • From (first line, no colon) → Commit hash
    • From: → Author
    • Date: → Timestamp
    • Subject: → Commit message
  2. File Change Summary

    • Shows the number of insertions/deletions.
  3. Unified Diff Format (diff --git)

    • index → Shows blob hashes before/after the change.
    • --- and +++ → The old and new paths of the file being changed.
    • @@ → Shows line numbers where changes occurred.

25. Applying a Git Patch

To apply a patch:

git apply 0001-Added-feature-X.patch

This applies the change but does not commit it.

To apply and commit:

git am 0001-Added-feature-X.patch

The am (apply mailbox) command preserves the original commit message and author.
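A full round trip makes the difference from git apply concrete. This sketch uses two scratch repositories on one machine in place of email; the paths and the committer identity are invented, and the author and README text are borrowed from the sample patch in section 24.

```sh
src=$(mktemp -d); dst="$(mktemp -d)/copy"; out=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$src" && cd "$src"
echo "Hello Git Usres!" > README.md && git add . && git commit -qm "Add README"
git clone -q "$src" "$dst"            # the "other machine"

# A fix authored by someone else, committed only in the source repo
echo "Hello Git Users!" > README.md
GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL=johndoe@example.com \
  git commit -qam "Fix typo in README"

patch=$(git format-patch -1 -o "$out")   # prints the path of the file it wrote

cd "$dst" && git am -q "$patch"          # apply and commit in one step

subject=$(git log -1 --format=%s)
author=$(git log -1 --format='%an <%ae>')
echo "$subject, by $author"
```

The committer of the new commit is whoever ran git am, while the author is taken from the patch's From: header.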


26. How Git Stores and Processes Patches

Internally, a patch file is just a diff. When you run:

git diff > changes.patch

Git runs the diff algorithm to compute differences between the index (staging area) and your working directory. Use git diff HEAD to compare against the latest commit instead.

When applying a patch, Git:

  1. Reads the diff data.
  2. Finds the target file(s).
  3. Applies changes line by line.
  4. Checks for conflicts (if needed).

What Happens If a Patch Fails?

If a patch doesn’t match the current state, you get:

error: patch failed: README.md:1
error: README.md: patch does not apply

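This means the file changed since the patch was created, and git apply has left your files untouched. Either update the patch, or retry with git apply --3way, which falls back to a three-way merge and leaves conflict markers for you to resolve (it needs the blobs named in the patch’s index lines to be available).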


27. Git Patches vs Cherry-Picking

Another way to move changes between branches is git cherry-pick:

git cherry-pick <commit-hash>

This applies a commit from one branch to another.

| Feature | Git Patch | Cherry-Pick |
| --- | --- | --- |
| Requires commit history? | ❌ No | ✅ Yes |
| Can be shared via email? | ✅ Yes | ❌ No |
| Preserves author info? | ✅ Yes | ✅ Yes |
| Can apply multiple commits? | ✅ Yes | ✅ Yes |

Patches are more flexible because they work even if the repo history is different.


28. Interactive Patch Editing

Want to apply only part of a patch? Use --reject:

git apply --reject my_changes.patch

This applies as much as possible and creates .rej files for conflicts.

To check beforehand whether a patch applies cleanly:

git apply --check my_changes.patch

This dry-runs the patch without applying it.


---
title: "Git Internals: How Cherry-Picking Works Under the Hood"
description: "A detailed look at how Git cherry-picking works, its internals, and when to use it effectively."
slug: "git-cherry-pick-internals"
date: 2017-10-22
image: "post/Articles/IMAGES/37.jpg"
categories: ["Git", "Version Control", "Internals"]
tags: ["Git", "Cherry-Pick", "Commits", "Merge", "Conflict Resolution"]
draft: false
weight: 428
---

# Git Internals: How Cherry-Picking Works Under the Hood

Sometimes, you need to **pick just one commit** from another branch without merging the entire branch. That’s where **Git cherry-picking** comes in.

Cherry-picking lets you **selectively apply commits** from anywhere in your repo’s history, without merging unwanted changes.

In this article, we’ll cover:
- What cherry-picking is and why it’s useful
- How cherry-picking works internally
- How to cherry-pick multiple commits
- Handling conflicts during cherry-picking
- Cherry-picking vs merging vs rebasing

Let’s get started!

---

## 29. What Is Git Cherry-Picking?

Git cherry-picking **applies an individual commit** from another branch to your current branch.

### Example Use Case:
You’re working on `main`, but a bugfix was added to `feature-branch`. You don’t want the **entire branch**, just the bugfix.

Instead of merging, you can **cherry-pick** the fix:

```sh
git cherry-pick <commit-hash>
```

This creates a new commit on your branch with the same changes as <commit-hash>, but without merging the whole branch.


30. How Cherry-Picking Works Internally

When you cherry-pick a commit, Git:

  1. Finds the commit you specified.
  2. Applies the changes from that commit onto your current branch.
  3. Creates a new commit with the same changes but a new hash.

Under the Hood:

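A cherry-pick is roughly equivalent to:

  1. Running git diff <commit-hash>^ <commit-hash> to get the changes.
  2. Applying those changes to the index and working directory.
  3. Creating a new commit with those changes, reusing the original author and message.

In practice Git performs a three-way merge with the picked commit’s parent as the base, which is why conflicts show up with the usual conflict markers.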

Example:

A ← B ← C ← D (main)
       E ← F ← G (feature)

If you run:

git cherry-pick F

The result:

A ← B ← C ← D ← F' (main)
       E ← F ← G (feature)

Even though F already exists in feature, a new commit F' is created on main.
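"Same changes, new hash" can be verified with `git patch-id`, which fingerprints a diff independently of the commit that carries it. The sketch below rebuilds the example history in a scratch repository; `add_commit` and `patch_id` are helpers defined here just for the demo.

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Demo helpers: make a one-file commit; fingerprint the diff a commit introduces
add_commit() { echo "$1" > "$1.txt" && git add . && git commit -qm "$1"; }
patch_id()   { git show "$1" | git patch-id --stable | cut -d' ' -f1; }

add_commit A; add_commit B
git checkout -q -b feature
add_commit E; add_commit F; add_commit G
git checkout -q main
add_commit C; add_commit D

f=$(git rev-parse feature~1)          # commit F on the feature branch
git cherry-pick "$f" >/dev/null
f_prime=$(git rev-parse HEAD)         # the copy, F', on main

echo "F : $f  patch-id $(patch_id "$f")"
echo "F': $f_prime  patch-id $(patch_id "$f_prime")"
```

The commit hashes differ because the parent and committer data differ, while the patch IDs match because the change itself is identical.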


31. Cherry-Picking Multiple Commits

You can cherry-pick multiple commits at once:

git cherry-pick <commit1> <commit2>

Or a range of commits:

git cherry-pick <start-commit>^..<end-commit>

For example:

git cherry-pick C^..E

This picks C, D, and E.


32. What Happens If a Cherry-Pick Fails?

If Git can’t apply a commit cleanly, it results in a conflict:

error: could not apply <commit-hash>

Git will stop cherry-picking and let you resolve conflicts manually.

Fixing a Cherry-Pick Conflict

  1. Open conflicted files and fix them.

  2. Mark them as resolved:

    git add <fixed-file>
    
  3. Continue the cherry-pick:

    git cherry-pick --continue
    

If you want to cancel:

git cherry-pick --abort

33. Cherry-Picking vs Merging vs Rebasing

Cherry-picking isn’t the only way to move commits between branches. Here’s how it compares to merging and rebasing:

| Feature | Cherry-Picking | Merging | Rebasing |
| --- | --- | --- | --- |
| Selects specific commits? | ✅ Yes | ❌ No | ❌ No |
| Creates new commit hashes? | ✅ Yes | ❌ No | ✅ Yes |
| Maintains original history? | ❌ No | ✅ Yes | ❌ No |
| Can be undone easily? | ✅ Yes (revert) | ✅ Yes (revert) | ⚠️ No (rewrites history) |

When to Use Cherry-Picking:

✅ When you need only one commit from another branch.
✅ When you don’t want to merge an entire branch.
✅ When applying a hotfix from one branch to another.


34. Recording the Source Commit With -x

By default, cherry-picked commits don’t track where they came from. To include a reference:

git cherry-pick -x <commit-hash>

This appends a line like the following to the commit message (Git writes the full 40-character hash; it is shortened here):

(cherry picked from commit 4a5b3c)

Now it’s clear that the commit came from elsewhere!
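A short sketch of -x in a scratch repository, with an invented hotfix branch and commit messages, showing the exact trailer line that ends up in the new commit:

```sh
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -qm "base"

git checkout -q -b hotfix
echo fix > fix.txt && git add . && git commit -qm "fix: patch the bug"
fix=$(git rev-parse HEAD)

git checkout -q main
echo other > other.txt && git add . && git commit -qm "other work"

git cherry-pick -x "$fix" >/dev/null

# The picked commit's message now ends with a pointer to the original
trailer=$(git log -1 --format=%B | grep "cherry picked")
echo "$trailer"
```

The recorded hash is only useful if the original commit stays reachable somewhere, so -x makes the most sense when picking from long-lived, published branches.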


35. Undoing a Cherry-Pick

If a cherry-pick has stopped on a conflict and you change your mind, you can back out before anything is committed:

git cherry-pick --abort

If you already committed the cherry-pick:

git revert <commit-hash>

This creates a reverse commit that undoes the cherry-picked changes.