Git Internals: Understanding the Plumbing Behind Your Daily Workflow

Every day, millions of developers run git add, git commit, and git push without a second thought. But have you ever wondered what actually happens inside the .git directory? Understanding Git's internals isn't just academic curiosity — it makes you a more effective developer, helps you recover from tricky situations, and demystifies the moments when Git seems to behave unexpectedly.

Figure: Git branch visualization showing a project's version history. Behind every branch visualization lies a content-addressable filesystem built on SHA-1 hashes.

Git Is a Content-Addressable Filesystem

At its core, Git is a content-addressable filesystem with a version control system built on top. What does that mean? Every piece of data Git stores is identified by a hash of its content (SHA-1 by default; newer versions of Git can also create SHA-256 repositories). The same content always produces the same hash, which means:

Duplicate files are stored only once (deduplication for free)
You can verify data integrity — if a single byte changes, the hash changes
Every object is immutable — you never modify, you only create new objects
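
The three properties above can be sketched with a toy key-value store in Python. This is not Git's storage format, only an illustration of content addressing; the ContentStore class and its method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-1 of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical content always lands on the same key (deduplication),
        # and an existing object is never overwritten (immutability).
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hashing on read detects corruption of the stored bytes.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError(f"object {key} is corrupt")
        return data

    def __len__(self):
        return len(self._objects)


store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")    # same content, same key, stored once
changed = store.put(b"same bytes!")  # one extra byte, different key
```

Here first and second are the same key, while changed is a different one, so the store ends up holding two objects.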

Let's look at what's actually inside .git:

$ ls .git
HEAD  config  hooks  index  objects  refs

That objects directory is where everything lives. Let's explore the four types of objects Git stores.

The Four Object Types

Git has exactly four types of objects: blobs, trees, commits, and tags. Everything Git does is built from these four building blocks.

Blobs: Storing File Contents

A blob stores raw file contents — just the data, no filename, no metadata. When you git add a file, Git creates a blob object:

$ echo "Hello, world" | git hash-object -w --stdin
8ba32e485b04e0833ea0867e2e6b23c7e9a0d0e3

# Where does it live?
$ find .git/objects -type f
.git/objects/8b/a32e485b04e0833ea0867e2e6b23c7e9a0d0e3

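The first two characters of the hash become the directory name, and the remaining 38 characters become the filename. Git compresses objects with zlib, but the hash is computed from the uncompressed content prefixed with a short header: the object type, a space, the content length in bytes, and a NUL byte. For the 13 bytes that echo produces above (the text plus a trailing newline), the header is "blob 13" followed by a NUL byte.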
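
As a minimal sketch of that scheme, the following Python reproduces the blob hash, the loose-object path, and the compressed bytes using only the standard library. The function names are made up for this example, and it assumes a SHA-1 repository.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: header plus raw bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split the ID into a 2-character directory and a 38-character filename."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """The bytes stored on disk: zlib-compressed header plus content."""
    return zlib.compress(b"blob %d\x00" % len(content) + content)


# echo appends a newline, so the content here ends with "\n" as well.
oid = blob_id(b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed ID should match what echo "hello world" | git hash-object --stdin reports, which is a quick way to check the sketch against a real Git installation.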

Trees: Storing Directory Structure

A tree object maps filenames to blob or other tree objects. It represents a single directory snapshot:

$ git cat-file -p HEAD^{tree}
100644 blob 8ba32e485b04e0833ea0867e2e6b23c7e9a0d0e3    README.md
040000 tree a3d4b57e6c3f1f8e5a6c5d4e3f2a1b0c9d8e7f6a    src

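Each entry has a mode (one of a small set of values modeled on Unix file modes, such as 100644 for a regular file, 100755 for an executable, and 040000 for a directory), the object type, the hash, and the filename. Because a tree entry can point to another tree, the root tree captures the entire directory hierarchy; this is how Git represents your project structure at any point in time.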
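
To make the tree format concrete, here is a hedged Python sketch of how a tree's ID can be computed. It assumes SHA-1 and the binary layout Git uses for trees (mode, space, name, NUL byte, then the 20 raw hash bytes); the tree_id helper is a name invented for this example. Note that the stored directory mode is 40000, which cat-file prints as 040000.

```python
import hashlib


def tree_id(entries):
    """Hash a tree built from (mode, name, object_id_hex) tuples."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directory names as if
        # they ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, object_id in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(object_id)

    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # ID of a zero-byte blob
root = tree_id([("100644", "README.md", empty_blob)])
print(tree_id([]))  # the well-known empty tree: 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because entries are sorted before hashing, the same directory contents always produce the same tree ID regardless of the order in which files were added.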

Commits: The History Glue

A commit object ties everything together: it points to a tree (the project snapshot), to zero or more parent commits (none for the first commit, two or more for a merge), and carries metadata:

$ git cat-file -p HEAD
tree a3d4b57e6c3f1f8e5a6c5d4e3f2a1b0c9d8e7f6a
parent 1f2e3d4c5b6a7890abcdef1234567890abcdef12
author Davide Andreazzini <davide@example.com> 1745308800 +0000
committer Davide Andreazzini <davide@example.com> 1745308800 +0000

Add circuit breaker implementation

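This is the entire commit. No diffs are stored, just a pointer to a tree, the parent(s), author info, and the message. Git computes diffs on the fly by comparing trees. Because identical subtrees have identical hashes, Git can skip unchanged directories without reading them, which keeps commands like git log --patch fast.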
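
The Python sketch below illustrates both ideas under simplifying assumptions: parse_commit splits the text of a commit like the one above into its fields (ignoring extra headers such as signatures), and changed_paths compares two snapshots represented as flat path-to-hash dictionaries rather than real nested trees. Both function names are hypothetical.

```python
def parse_commit(raw: str) -> dict:
    """Split commit text into header fields and the message."""
    head, _, message = raw.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merges have several parents
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


def changed_paths(old: dict, new: dict) -> dict:
    """Compare two {path: blob_id} snapshots and classify each difference."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"  # different hash means different content
    return changes


raw_commit = (
    "tree a3d4b57e6c3f1f8e5a6c5d4e3f2a1b0c9d8e7f6a\n"
    "parent 1f2e3d4c5b6a7890abcdef1234567890abcdef12\n"
    "author Davide Andreazzini <davide@example.com> 1745308800 +0000\n"
    "committer Davide Andreazzini <davide@example.com> 1745308800 +0000\n"
    "\n"
    "Add circuit breaker implementation\n"
)
commit = parse_commit(raw_commit)
```

Comparing hashes is enough to decide whether a path changed; the file contents only need to be read when a line-by-line patch is requested.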

Figure: Developer terminal and code editor showing a Git workflow. Understanding the object model turns Git from a mysterious black box into a transparent, inspectable system.

References: How Git Finds Things

If everything is addressed by a 40-character SHA-1 hash, how do branch names like main or feature/login work? The answer is references.

In the simplest case, references are plain files containing a hash, stored inside .git/refs/ (Git may also pack many references into a single .git/packed-refs file):

$ cat .git/refs/heads/main
1f2e3d4c5b6a7890abcdef1234567890abcdef12

$ cat .git/refs/tags/v2.0
3a4b5c6d7e8f9012345678901234567890123456

A branch is just a pointer to a commit. Creating a branch literally means writing 41 bytes to a file: the 40-character hash plus a newline. This is why branching in Git is instant. There is no file copying and no directory duplication, just a new file with a hash in it.
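
A small Python sketch shows how little is involved. It writes a ref file into a throwaway directory that stands in for .git; the create_branch helper is hypothetical, and real Git additionally validates the name, takes a lock, and records a reflog entry.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, commit_id: str) -> str:
    """Create a branch by writing the commit ID plus a newline to a ref file."""
    ref_path = os.path.join(git_dir, "refs", "heads", *name.split("/"))
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w", newline="\n") as ref_file:
        ref_file.write(commit_id + "\n")
    return ref_path


# A temporary directory stands in for a real .git directory.
git_dir = tempfile.mkdtemp()
ref_path = create_branch(
    git_dir, "feature/login", "1f2e3d4c5b6a7890abcdef1234567890abcdef12"
)
print(os.path.getsize(ref_path))  # 41: 40 hex characters plus a newline
```

A branch name containing a slash, such as feature/login, simply becomes a nested directory under refs/heads.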

HEAD: The Special Reference

HEAD is Git's way of knowing where you are. Usually it's a symbolic reference:

$ cat .git/HEAD
ref: refs/heads/main

But when you check out a specific commit by hash, HEAD becomes detached: it holds the commit hash directly instead of naming a branch. This is why Git warns you about a "detached HEAD". Commits you make in this state are not on any branch, so once you switch away they are reachable only through the reflog and can eventually be garbage collected.
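
The difference between an attached and a detached HEAD can be sketched in Python by reading the files directly. The read_head function is a hypothetical helper, the directory it reads is a temporary stand-in for .git, and the sketch assumes the classic files layout (loose ref files with an optional packed-refs fallback).

```python
import os
import tempfile


def read_head(git_dir: str):
    """Return (branch_ref_or_None, commit_id_or_None) for a .git directory."""
    with open(os.path.join(git_dir, "HEAD")) as head_file:
        head = head_file.read().strip()

    if not head.startswith("ref: "):
        return None, head  # detached: HEAD holds a commit ID directly

    ref = head[len("ref: "):]
    ref_path = os.path.join(git_dir, *ref.split("/"))
    if os.path.exists(ref_path):
        with open(ref_path) as ref_file:
            return ref, ref_file.read().strip()

    # Fall back to packed-refs, whose ref lines are "<commit_id> <ref_name>".
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed):
        with open(packed) as packed_file:
            for line in packed_file:
                parts = line.split()
                if len(parts) == 2 and parts[1] == ref:
                    return ref, parts[0]
    return ref, None  # a branch with no commits yet


git_dir = tempfile.mkdtemp()
commit_id = "1f2e3d4c5b6a7890abcdef1234567890abcdef12"
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
    f.write(commit_id + "\n")
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/main\n")
attached = read_head(git_dir)  # ("refs/heads/main", commit_id)

with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write(commit_id + "\n")
detached = read_head(git_dir)  # (None, commit_id)
```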

The Staging Area: Git's Index

The staging area (or index) is one of Git's most misunderstood features. It's not a diff — it's a complete snapshot of your next commit, stored in .git/index.

$ git ls-files --stage
100644 8ba32e485b04e0833ea0867e2e6b23c7e9a0d0e3 0    README.md
100644 3c4d5e6f7a8b901234567890123456789012345a 0    src/app.js

When you run git add, Git:

1. Computes the SHA-1 of the file contents, prefixed with the blob header
2. Stores a new blob in the object database if that content is not already there
3. Updates the index entry for that filename to point to the new blob

When you run git commit, Git:

1. Builds tree objects from the index, one per directory
2. Creates a commit object pointing to the root tree, with the current commit as its parent
3. Updates the current branch reference to point to the new commit
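
Both sequences can be modeled in a few dozen lines of Python. The TinyRepo class below is a hypothetical in-memory sketch, not how Git is implemented: it keeps objects in a dictionary, supports only regular files with mode 100644, and uses a placeholder author and timestamp.

```python
import hashlib


class TinyRepo:
    """In-memory sketch of the object database, the index, and one branch."""

    def __init__(self):
        self.objects = {}   # object ID -> (type, body)
        self.index = {}     # path -> blob ID
        self.branch = None  # commit ID the branch points to

    def _store(self, kind, body):
        header = f"{kind} {len(body)}".encode() + b"\x00"
        object_id = hashlib.sha1(header + body).hexdigest()
        self.objects[object_id] = (kind, body)
        return object_id

    def add(self, path, content):
        """git add: hash and store the blob, then point the index entry at it."""
        blob_id = self._store("blob", content)
        self.index[path] = blob_id
        return blob_id

    def write_tree(self, prefix=""):
        """Build one tree per directory from the index; return the root tree ID."""
        entries = {}  # name -> (mode, object ID)
        for path, blob_id in self.index.items():
            if not path.startswith(prefix):
                continue
            name, slash, _ = path[len(prefix):].partition("/")
            if not slash:
                entries[name] = ("100644", blob_id)
            elif name not in entries:
                entries[name] = ("40000", self.write_tree(prefix + name + "/"))

        def sort_key(item):
            name, (mode, _) = item
            return name.encode() + (b"/" if mode == "40000" else b"")

        body = b""
        for name, (mode, object_id) in sorted(entries.items(), key=sort_key):
            body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)
        return self._store("tree", body)

    def commit(self, message, who="A U Thor <author@example.com>", when=0):
        """git commit: tree from the index, commit object, branch update."""
        lines = [f"tree {self.write_tree()}"]
        if self.branch is not None:
            lines.append(f"parent {self.branch}")
        lines += [f"author {who} {when} +0000",
                  f"committer {who} {when} +0000", "", message, ""]
        self.branch = self._store("commit", "\n".join(lines).encode())
        return self.branch


repo = TinyRepo()
readme = repo.add("README.md", b"hello world\n")
repo.add("src/app.js", b"console.log('hi');\n")
first = repo.commit("Initial commit")
repo.add("README.md", b"hello again\n")
second = repo.commit("Update README")
```

After the second commit the store holds eight objects rather than ten: the unchanged src tree and the app.js blob are reused, which is the deduplication described earlier.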

This two-step process is powerful: it lets you compose your commits carefully, staging some changes while leaving others unstaged.

Packfiles: Where Compression Really Happens

If Git stored every version of every file as a separate blob, repositories would be enormous. Git solves this with packfiles.

When Git detects that the object database is getting large (or when you run git gc), it packs loose objects into a single packfile:

$ git gc
Counting objects: 1842, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (631/631), done.
Writing objects: 100% (1842/1842), done.
Total 1842 (delta 872), reused 1842 (delta 872)

Inside a packfile, Git uses delta compression: instead of storing the full content of every similar object, it stores one version in full and the others as differences (deltas) against it. A file that changes by one line between commits can be stored as a small delta rather than a second full copy.

The .git/objects/pack/ directory contains the packfile (.pack) and an index for quick lookups (.idx). Git can reconstruct any object from a packfile by applying deltas in sequence.
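
The idea behind a delta can be sketched in Python as a list of "copy this range from the base" and "insert these literal bytes" instructions. This is a simplified illustration, not Git's binary delta encoding: it uses difflib from the standard library to find matching ranges, and make_delta and apply_delta are names invented for the example.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, b_start, b_end, t_start, t_end in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", b_start, b_end - b_start))
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[t_start:t_end]))
        # "delete" needs no instruction: those base bytes are simply not copied.
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the target by replaying the instructions against the base."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"".join(b"line %d\n" % i for i in range(50))
target = base.replace(b"line 25\n", b"line twenty-five\n")
delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
rebuilt = apply_delta(base, delta)
```

Only the changed bytes appear as literals in the delta; everything else is expressed as references into the base, which is why near-identical versions cost so little to store.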

Plumbing vs Porcelain

Git commands are divided into two categories:

Porcelain: The user-friendly commands you use daily — git add, git commit, git log, git merge
Plumbing: The low-level commands that actually manipulate the object database

Understanding plumbing commands gives you superpowers:

# Inspect any object
$ git cat-file -t 8ba32e485b
blob

# Pretty-print an object
$ git cat-file -p 8ba32e485b
Hello, world

# List the tree at a commit
$ git ls-tree HEAD
100644 blob 8ba32e485b...    README.md
040000 tree a3d4b57e6c3f...    src

# Show what changed between two trees
$ git diff-tree -p HEAD~1 HEAD

# Write a blob from stdin
$ echo "new content" | git hash-object -w --stdin

# Create a tree from the index
$ git write-tree
a3d4b57e6c3f1f8e5a6c5d4e3f2a1b0c9d8e7f6a

# Create a commit object
$ git commit-tree a3d4b57e -p HEAD -m "Custom commit"

These plumbing commands are what the porcelain commands use internally. When you run git commit, Git is essentially running git write-tree, git commit-tree, and updating a reference — all behind the scenes.
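
To demystify git cat-file a little, here is a hedged Python sketch of reading a loose object: decompress, split off the header, and check the size. It handles only loose objects (not packed ones), and read_loose_object is a hypothetical helper; the compressed bytes are built in memory as a stand-in for a file under .git/objects/.

```python
import zlib


def read_loose_object(compressed: bytes):
    """Return (type, body) from the raw bytes of a loose object file."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    kind, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode(), body


# Stand-in for the bytes of a file under .git/objects/.
on_disk = zlib.compress(b"blob 13\x00Hello, world\n")
kind, body = read_loose_object(on_disk)
print(kind)                    # blob, like git cat-file -t
print(body.decode(), end="")   # Hello, world, like git cat-file -p
```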

Practical Power Moves

Understanding the internals unlocks capabilities that feel like magic:

Recover a Lost Commit

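# You accidentally ran git reset --hard and lost commits
# They stay in the object database until their reflog entries expire
# (by default about 30 days for commits no longer reachable from the branch tip)
$ git reflog
# Pick the entry from before the reset; HEAD@{3} is only an example
$ git checkout -b recovery-branch HEAD@{3}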
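
The reflog itself is a plain text file (.git/logs/HEAD) with one line per movement of HEAD: the old ID, the new ID, who made the change and when, then a tab and a message. The following Python sketch looks up HEAD@{n} in such text; the reflog_entry function and the sample IDs are made up for illustration.

```python
def reflog_entry(reflog_text: str, n: int) -> tuple:
    """Return (commit_id, message) for HEAD@{n}; the newest entry is n = 0."""
    lines = reflog_text.splitlines()
    line = lines[-1 - n]  # entries are appended, so the newest is last
    meta, _, message = line.partition("\t")
    old_id, new_id = meta.split(" ")[:2]
    return new_id, message


A, B = "a" * 40, "b" * 40  # hypothetical commit IDs
WHO = "Dev <dev@example.com> 1745308800 +0000"
reflog = (
    f"{'0' * 40} {A} {WHO}\tcommit (initial): Start project\n"
    f"{A} {B} {WHO}\tcommit: Add feature\n"
    f"{B} {A} {WHO}\treset: moving to HEAD~1\n"
)
lost_id, lost_message = reflog_entry(reflog, 1)
```

In this sample, HEAD@{0} is the reset and HEAD@{1} is the commit it discarded, so lost_id is the ID to pass to git checkout -b.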

Inspect What's in the Staging Area

# See exactly what will be committed
$ git diff --cached

# Or more granularly, look at the index itself
$ git ls-files --stage

Bisect to Find the Bug-Introducing Commit

$ git bisect start
$ git bisect bad          # current commit is broken
$ git bisect good v2.0    # v2.0 was working
# Git binary-searches between them
$ git bisect run ./test.sh  # automatically test each commit
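
The search strategy can be sketched in Python as an ordinary binary search. This assumes a linear history in which every commit after the first bad one is also bad; real histories with merges are handled by Git in a more involved way. The first_bad function and the commit names are hypothetical.

```python
def first_bad(commits, is_bad):
    """Binary-search a linear history (oldest first) for the first bad commit.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns the culprit and the number of commits that had to be tested.
    """
    low, high = 0, len(commits) - 1  # low is known good, high is known bad
    tested = 0
    while high - low > 1:
        mid = (low + high) // 2
        tested += 1
        if is_bad(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high], tested


history = [f"c{i}" for i in range(100)]  # hypothetical commit names
culprit, tested = first_bad(history, lambda c: int(c[1:]) >= 37)
print(culprit, tested)  # c37 6
```

With 100 commits between the good and bad endpoints, only a handful of test runs are needed, which is what makes git bisect run practical even with a slow test script.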

Manipulate Commits at the Object Level

# Give a commit a different parent: Git writes a replacement commit
# with the same tree and records a replace ref pointing to it
$ git replace --graft <commit> <new-parent>

# Export objects to a bundle for offline transfer
$ git bundle create repo.bundle --all

Key Takeaways

Git is a content-addressable filesystem: everything is stored and retrieved by its hash (SHA-1 by default), making deduplication and integrity verification automatic.
Four object types: blobs (file contents), trees (directories), commits (snapshots with metadata), and tags (annotated tag objects that name another object, usually a commit) are the building blocks of everything Git does.
Branches are just files: creating a branch writes 41 bytes to a file, which is why it's instant.
The staging area is a complete snapshot: not a diff, but a full representation of your next commit.
Packfiles with delta compression: Git stores similar objects as deltas against a base version, keeping repositories small.
Plumbing commands give you X-ray vision: git cat-file, git ls-tree, and git hash-object let you inspect and manipulate Git at the lowest level.

Next time you run git commit, you'll know exactly what's happening: the blobs were already written by git add, trees are assembled from the index, a commit is created pointing to the root tree, and the branch reference is updated to the new commit. Git isn't magic; it's a beautifully simple key-value store with a VCS built on top. Understanding that foundation transforms Git from a collection of incantations into a tool you can reason about, debug, and truly master.