Git Large File Storage (LFS)
Git Large File Storage (LFS) is an open-source Git extension for versioning large files and binary assets alongside your project's source code. It replaces large files (such as audio samples, videos, datasets, and graphics) with text pointers inside Git, while storing the file content on a separate server. This significantly reduces the size of your Git repository and improves performance, especially for teams collaborating on projects with large assets.
Problem Solved by Git LFS
Git, by default, stores all versions of every file in the repository's history. This becomes problematic when dealing with large files, as the repository size grows rapidly, impacting cloning, fetching, and other Git operations. Storing large binary files directly in Git can lead to performance bottlenecks, longer wait times, and higher storage costs.
Key Concepts
- Pointer Files: Instead of storing the actual large files in the Git repository, Git LFS replaces them with small text files (typically under 1KB) called pointer files. These pointer files contain metadata about the large file, including its object ID (OID) and size.
- LFS Server (Object Storage): The actual large file content is stored on a separate Git LFS server. This server can be a self-hosted solution (e.g., using MinIO, Amazon S3, or other object storage) or a hosted service provided by platforms like GitHub, GitLab, or Bitbucket. The LFS server manages the storage and retrieval of these large files.
- Object ID (OID): A unique identifier (usually a SHA-256 hash) that represents the immutable content of a large file. This is stored within the pointer file.
- .gitattributes File: This file in your Git repository specifies which file types or paths should be managed by Git LFS. It essentially tells Git to treat certain files as LFS objects.
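To make the pointer file and OID concepts concrete, here is a minimal Python sketch that builds and parses a pointer in the layout described by the Git LFS pointer specification (a version line, an `oid sha256:` line, and a `size` line). The helper names `make_pointer` and `parse_pointer` are hypothetical and exist only for this illustration; Git LFS itself generates pointers for you.

```python
import hashlib

# Version line used by the Git LFS pointer format.
LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Build pointer text for the given file content (hypothetical helper)."""
    oid = hashlib.sha256(content).hexdigest()  # OID: SHA-256 of the raw bytes
    size = len(content)                        # size of the real file in bytes
    return (
        f"version {LFS_SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {size}\n"
    )


def parse_pointer(text: str) -> dict:
    """Split pointer text into its version, oid, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],  # drop the "sha256:" prefix
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = make_pointer(b"pretend this is a large binary asset")
    print(pointer, end="")
    print(parse_pointer(pointer))
```

Because the OID is derived from the content alone, two files with identical bytes produce the same pointer, and any change to the file produces a new OID.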
How Git LFS Works
- Tracking Files: You configure Git LFS to track specific file types or paths using the .gitattributes file. This tells Git which files to treat as LFS objects.
- Adding Large Files: When you add (or modify) a large file that is tracked by Git LFS, Git does not directly store the content in the repository.
- Pointer File Creation: Instead, Git LFS generates a pointer file that contains the pointer format version, the OID of the file, and its size.
- Repository Storage: Git stores the pointer file in the repository. This significantly reduces the size of the Git repository.
- LFS Transfer: The actual large file content is kept in a local Git LFS cache and is uploaded to the Git LFS server automatically when you push.
- Cloning/Fetching: When someone clones or fetches the repository, Git retrieves the pointer files.
- LFS Retrieval: On checkout, Git LFS transparently downloads the large files referenced by the checked-out commit from the LFS server and places them in the working tree, reconstructing the complete project on the user's machine.
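The round trip above can be modelled in a few lines of Python. This is a toy sketch, not how Git LFS is implemented: `ToyLfsStore`, `clean`, and `smudge` are hypothetical names, the "server" is an in-memory dictionary, and storing and uploading are collapsed into one step, whereas real Git LFS uploads on push.

```python
import hashlib

POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


class ToyLfsStore:
    """In-memory stand-in for an LFS server: maps OID -> file content."""

    def __init__(self):
        self.objects = {}

    def upload(self, oid: str, content: bytes) -> None:
        self.objects[oid] = content

    def download(self, oid: str) -> bytes:
        return self.objects[oid]


def clean(content: bytes, store: ToyLfsStore) -> str:
    """Commit side: store the content, return the pointer the repository keeps."""
    oid = hashlib.sha256(content).hexdigest()
    store.upload(oid, content)
    return f"version {POINTER_VERSION}\noid sha256:{oid}\nsize {len(content)}\n"


def smudge(pointer: str, store: ToyLfsStore) -> bytes:
    """Checkout side: read the OID from the pointer and fetch the real content."""
    fields = dict(line.split(" ", 1) for line in pointer.splitlines() if line)
    oid = fields["oid"].split(":", 1)[1]
    content = store.download(oid)
    # Check that the downloaded bytes match what the pointer promised.
    if hashlib.sha256(content).hexdigest() != oid or len(content) != int(fields["size"]):
        raise ValueError("downloaded content does not match pointer")
    return content


if __name__ == "__main__":
    server = ToyLfsStore()
    asset = b"\x00" * 1_000_000  # stand-in for a large binary file
    pointer = clean(asset, server)
    print(len(asset), "bytes of content ->", len(pointer), "bytes of pointer")
    print("round trip ok:", smudge(pointer, server) == asset)
```

The repository only ever sees the short pointer string, while the store holds one object per distinct OID, so committing the same content twice does not store it twice.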
Benefits of Using Git LFS
- Reduced Repository Size: The repository remains small because it only stores pointer files, rather than the large file content.
- Improved Performance: Cloning, fetching, and other Git operations are faster because less data needs to be transferred.
- Version Control for Large Files: Git LFS enables version control for large files and binary assets, tracking changes and allowing you to revert to previous versions.
- Collaboration: Enables teams to efficiently collaborate on projects that contain large files.
- Storage Efficiency: Object-based storage solutions used by LFS are often more cost-effective for large file storage compared to traditional Git repositories.
Example Configuration and Workflow
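- Install Git LFS: After installing the Git LFS package for your platform, run this once per user account to set up the Git filters and hooks:
git lfs install
- Track Files: Specify which files Git LFS should manage. For example, to track all .psd files:
git lfs track "*.psd"
This command adds the following line to your .gitattributes file:
*.psd filter=lfs diff=lfs merge=lfs -text
- Commit Files: Stage .gitattributes together with the large file, then commit as usual:
git add .gitattributes
git add my_large_image.psd
git commit -m "Add large image file"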
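To see which paths a tracking rule in .gitattributes would cover, the following Python sketch reads .gitattributes text and reports whether a path falls under an entry carrying `filter=lfs`. It is a rough approximation under stated assumptions: the helper names are hypothetical, only slash-free patterns such as `*.psd` are handled (compared against the file name), and Git's full pattern rules, including quoting and later entries overriding earlier ones, are not reproduced.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list:
    """Return the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list) -> bool:
    """Approximate match: slash-free patterns are compared to the file name only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, p) for p in patterns if "/" not in p)


if __name__ == "__main__":
    attrs = "*.psd filter=lfs diff=lfs merge=lfs -text\n"
    patterns = lfs_patterns(attrs)
    for path in ["my_large_image.psd", "art/cover.psd", "README.md"]:
        print(path, is_lfs_tracked(path, patterns))
```

For an authoritative answer in a real repository, `git check-attr filter -- <path>` asks Git itself which filter applies to a path.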