Git Ghost: A command line tool to execute a program with local modifications without losing reproducibility
We’re happy to open-source Git Ghost, developed by Shingo Omura and Daisuke Taniwaki. With this tool, you can run ML jobs on your git-managed code with locally made modifications without losing reproducibility. You can go back to the code of a specific run at any time during the trial-and-error phase!
Running one ML job for trial and error while waiting for other jobs to finish is a very common use case. Before Git Ghost, the simplest approach was to manage source code with git and use rsync to copy the code, including locally made modifications, to our Kubernetes cluster to run ML jobs. We then realized that we often wanted to revert the code to the state that had produced good results. However, although git versions your code, rsync keeps no record of what it synchronized, so the exact code behind a given run was hard to recover.
The first idea we came up with was simply to commit local modifications and push them to a remote. However, it’s cumbersome to commit and push many times just to run a job with a change of a few characters, and of course you don’t want to clutter your remote repository. So we came up with the idea for this tool.
Assume you have changed the content of a file foo from a to b on your local machine, and you want to send that modification to a directory on a remote server.
First, create a patch of the local modification.
$ git ghost push
xxxxxxx yyyyyyy
$ git ghost show yyyyyyy
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b
Then, you can sync the local modification on the remote server.
$ git ghost diff HEAD
$ git ghost pull yyyyyyy
$ git ghost diff HEAD
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b
There you go! You can see that the modifications in your local machine were synchronized to the remote server.
Although Git Ghost is a very simple tool as shown above, it performs brilliantly when it is integrated with other tools. For example, you can send modifications into a Kaniko container to build Docker images with local modifications. Here’s an example using Argo to execute a job with local modifications in a reproducible manner.
The idea is simple. The tool creates a patch from your locally made commits and modifications, records the base commit that already exists in your remote repository, and pushes the patch to another remote repository. Then, in the remote location, it downloads the base commit and applies the patch. A small trick is that we keep the patch for locally made commits separate from the patch for locally made modifications. With this separation, the modifications patch can still be reused after the locally made commits are pushed to the remote repository.
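To make the two-patch idea concrete, here is a minimal sketch using only plain git commands. It is not Git Ghost’s actual implementation: the scratch directories, the file bar, and the patch file names are hypothetical, and the patches are simply passed through the local filesystem instead of being pushed to a separate patch repository.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directories standing in for the laptop and the remote server.
work=$(mktemp -d)
local_repo="$work/local"
remote_copy="$work/remote"

git init -q "$local_repo"
git -C "$local_repo" config user.email "ghost@example.com"
git -C "$local_repo" config user.name "Ghost Example"

# Base commit: assumed to already exist in the shared remote repository.
echo a > "$local_repo/foo"
git -C "$local_repo" add foo
git -C "$local_repo" commit -q -m "base"
base=$(git -C "$local_repo" rev-parse HEAD)

# The remote side only ever sees the base commit.
git clone -q "$local_repo" "$remote_copy"

# A locally made commit that has not been pushed.
echo local > "$local_repo/bar"
git -C "$local_repo" add bar
git -C "$local_repo" commit -q -m "local commit"

# A locally made, uncommitted modification: foo changes from a to b.
echo b > "$local_repo/foo"

# Patch 1: locally made commits, relative to the base commit.
git -C "$local_repo" diff "$base" HEAD > "$work/commits.patch"
# Patch 2: uncommitted modifications, relative to the local HEAD.
git -C "$local_repo" diff HEAD > "$work/mods.patch"

# Remote side: check out the base commit, then apply both patches in order.
git -C "$remote_copy" checkout -q "$base"
git -C "$remote_copy" apply "$work/commits.patch"
git -C "$remote_copy" apply "$work/mods.patch"

# The remote working tree now matches the local one.
cat "$remote_copy/foo" "$remote_copy/bar"
```

Because the second patch is taken against the local HEAD rather than the base commit, it would still apply after the local commit is pushed and becomes the new base, which is the reuse the separation is meant to allow.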
The reason why we chose a git repository for the patch storage is that it doesn’t require extra tools and credentials.
Although we’re going to use this tool in a Kubernetes cluster, we believe its use is not limited to Kubernetes clusters. You can also use it to send changes from your laptop to an on-premises server if you want to track those changes.
Please try it and give us your feedback on GitHub!