While working recently on an AI-related app that uses vectors, I found myself with a 115MB file in my repository. When I tried to push to GitHub, I got an error in GitKraken about a hook failing. I didn’t have any hooks locally or remotely, so the issue was a bit perplexing. After trying a few things, I resorted to command-line Git, and that’s when I saw the problem.
GitHub blocks files larger than 100 megabytes. They recommend using Git Large File Storage, but I decided maybe I shouldn’t be adding large vector files into my repository anyway.
So, I deleted the offending file from my repository, committed, and attempted to push. It failed again, with the same error complaining about the same large file. Then it dawned on me that when you delete a file in Git, it doesn’t delete delete it. Because you can revert commits, the file still exists in the repo’s history, and a push sends that history along with it.
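If you aren't sure which files in your history are over the limit, a rough sketch like the one below can list them. The function name list_large_blobs and the default threshold of 100 MiB are my own assumptions for illustration; the Git commands themselves are standard plumbing.

```shell
# list_large_blobs is a hypothetical helper name, not a Git command.
# It prints "<size-in-bytes> <path>" for every blob reachable from any ref
# whose size exceeds the threshold (assumed default: 100 MiB).
list_large_blobs() {
  threshold="${1:-104857600}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$threshold" '
      $1 == "blob" && $2 > limit {
        size = $2
        sub(/^[^ ]+ [^ ]+ /, "")   # strip type and size, keep the full path
        print size, $0
      }'
}

# Usage, from inside the repository:
#   list_large_blobs              # anything over the assumed 100 MiB default
#   list_large_blobs 50000000     # anything over roughly 50 MB
```

Because it walks every commit, this also finds files that were deleted from the working tree but still live in history.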
This is where the filter-branch command saves our bacon:
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch path/to/your-file' \
  --prune-empty --tag-name-filter cat -- --all
If it works, the output indicates that the file was removed from your history and that your refs were rewritten. The important thing to note is that you must provide the full path to your file, relative to the root of the repository. If it lives in myfolder/anotherfolder/somefolder/myfile.jpg, that is the value you provide in the above command.
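To double-check that the rewrite did what you wanted, something like this sketch can help. The helper name file_in_history is made up for this example. It deliberately looks only at branches and tags rather than using --all, because filter-branch keeps backup copies of the old refs under refs/original/, and those would still contain the file.

```shell
# file_in_history is a hypothetical helper name, not a Git command.
# It succeeds (exit 0) if any commit reachable from a branch or tag
# touched the given path, and fails (exit 1) otherwise.
file_in_history() {
  [ -n "$(git log --branches --tags --oneline -- "$1")" ]
}

# Usage, from the repository root, with the same path given to filter-branch:
#   if file_in_history "path/to/your-file"; then
#     echo "still in history"
#   else
#     echo "gone from history"
#   fi
```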
A Quick Check-In With .gitignore
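Now that you’ve dealt with the oversized file, it’s important to protect yourself against accidentally committing it again in the future. Enter .gitignore. If you haven’t already, update your .gitignore so it lists the file (or the folder it lives in), and Git will leave it out of future commits.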
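As a small sketch of that step, the helper below appends a pattern to .gitignore only if it isn't already listed. The name ignore_path is hypothetical and the path in the usage comment is a placeholder; git check-ignore is the real command for confirming that a path matches an ignore rule.

```shell
# ignore_path is a hypothetical helper name, not a Git command.
# It adds a pattern to .gitignore in the current directory,
# skipping the write if the exact line is already present.
ignore_path() {
  pattern="$1"
  grep -qxF -- "$pattern" .gitignore 2>/dev/null ||
    printf '%s\n' "$pattern" >> .gitignore
}

# Usage, from the repository root:
#   ignore_path 'path/to/your-file'
#   git check-ignore -v path/to/your-file   # shows which rule matched
```

Remember to commit the updated .gitignore so the rule travels with the repository.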
Use the force… push
If the file you’re deleting already exists in your repo remotely (say on GitHub), you must do a forced push because you’ve messed with the history.
git push origin --force --all
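One thing worth knowing: --all covers branches, not tags. Since the filter-branch command above also rewrites tags (that's what --tag-name-filter cat is for), you may need to force-push those separately. Here is a minimal sketch, assuming your remote is named origin; the function name force_push_rewritten is my own invention.

```shell
# force_push_rewritten is a hypothetical helper name, not a Git command.
# It force-pushes every branch, then every tag, to the given remote
# (assumed default: origin). It stops if the branch push fails.
force_push_rewritten() {
  remote="${1:-origin}"
  git push "$remote" --force --all &&
    git push "$remote" --force --tags
}

# Usage:
#   force_push_rewritten           # pushes to origin
#   force_push_rewritten upstream  # or to another remote
```

A forced push overwrites the remote's history, so anyone else working on the repo will need to re-clone or reset to the rewritten branches.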
Also, note that the disk space might not be reclaimed right away. You might have to wait for garbage collection to run before the repository actually shrinks. In my case, the file never reached the remote at all: it was larger than GitHub’s limit, so GitHub rejected my push.
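If you don't want to wait for garbage collection in your local clone, the sketch below forces the cleanup. It follows the usual steps for shrinking a repository after filter-branch: remove the backup refs under refs/original/, expire the reflogs, then run gc. The function name purge_rewritten_objects is hypothetical, and this only affects your local copy; when the remote cleans up is up to the host.

```shell
# purge_rewritten_objects is a hypothetical helper name, not a Git command.
# Warning: this throws away the backups filter-branch made and all reflog
# entries, so there is no undo afterwards.
purge_rewritten_objects() {
  # Delete the backup refs that filter-branch leaves under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Expire every reflog entry so the old commits become unreachable.
  git reflog expire --expire=now --all
  # Remove unreachable objects immediately instead of after the grace period.
  git gc --prune=now --quiet
}

# Usage, from inside the repository, after the filter-branch run:
#   purge_rewritten_objects
```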