Before self-service platforms existed, source code repositories were typically managed by the IT organization. In my discussions with organizations, I have found that many of them still use a single repository that stores the source code for all of their projects. This is especially common in organizations where software development is not the core business but is still heavily needed. In this blog post, I will pinpoint three distinct drawbacks of this approach.
1. Access management
Access management of projects is a simple precaution against human error. The most common scenario I have seen in a single-repository setup is that everybody is simply granted access to everything. This is partly because it requires less management overhead, but mostly because of the "we are all smart people, and we know what we are doing" attitude.
Even though this attitude makes sense most of the time and I strongly believe in empowerment, it's good to have a safety net that prevents accidents where people mistakenly change parts of the code base they were not supposed to touch in the first place. By dividing the repository into multiple ones, each containing the source code of a single project, we can manage access rights at the project level and at the same time get a clearer picture of who is actually allowed to access what.
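To make the "who is allowed to access what" point concrete, here is a minimal sketch of a per-repository access table. The repository names, user names and helper functions are all hypothetical, and a real hosting platform would keep this information in its own permission system rather than in a script.

```python
# Hypothetical access table: repository name -> users with write access.
# All names here are made up for illustration.
ACCESS = {
    "billing-service": {"alice", "bob"},
    "mobile-app": {"carol"},
}


def can_write(user, repo):
    """Return True if `user` may push to `repo`; unknown repositories deny by default."""
    return user in ACCESS.get(repo, set())


def repos_for(user):
    """List the repositories a user may write to, sorted for stable output."""
    return sorted(repo for repo, users in ACCESS.items() if user in users)
```

With one repository per project, a question such as "what can carol change?" becomes a simple lookup (`repos_for("carol")`) instead of a matter of convention inside one shared repository.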
2. Single point of failure on repository corruption
What would be the benefit of having a version control system without the assurance that you have the whole version history available? Subversion has the svnadmin verify command to check the integrity of the repository. Git and Mercurial offer similar functionality with the git fsck and hg verify commands respectively. When the repository grows larger in both size and the number of commits, verifying the integrity can turn out to be tediously slow. When the repository is split into multiple smaller ones, the job can be parallelized or run per repository.
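As a rough sketch of what running the check per repository in parallel could look like, the following Python script starts one integrity check per repository using a small thread pool. The function names, the worker count and the idea of running each command from inside the repository directory are my own assumptions for illustration, not part of any of the tools.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Integrity-check commands named in this post, run from inside the repository.
# Passing "." as the path for svnadmin is an assumption of this sketch.
VERIFY_COMMANDS = {
    "git": ["git", "fsck"],
    "hg": ["hg", "verify"],
    "svn": ["svnadmin", "verify", "."],
}


def verify_repo(path, command):
    """Run one integrity check inside `path` and return (path, passed)."""
    result = subprocess.run(command, cwd=path, capture_output=True, text=True)
    return str(path), result.returncode == 0


def verify_all(paths, command, max_workers=4):
    """Check several repositories concurrently; return {path: passed}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda p: verify_repo(p, command), paths))
```

With hypothetical paths, a call such as `verify_all(["/srv/repos/project-a", "/srv/repos/project-b"], VERIFY_COMMANDS["git"])` would return a dictionary that tells you which repositories passed, so a failure points at one project rather than at everything.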
Verification time is not the most important factor to consider, though. What matters more is what happens if the repository that holds all of the projects gets corrupted. With one repository per project, the damage from a corrupted repository is limited to a single project.
3. Slowness in version control operations
We all want fast feedback, whether we are looking up the change history or merging a branch. Git and Mercurial have improved the speed of some operations by keeping the information available locally. However, the larger your repository is, the longer some operations against it are going to take. An extreme example is this forum post from a Facebook developer from 2012 where a 'git blame' command against their repository took 11 and 44 minutes with a warm OS file cache and a cold cache respectively. No one can honestly argue that 11 minutes is fast enough to see who altered a specific line in a file.
Regardless of the number of projects, you should always keep them in different repositories. Even when the projects are tightly coupled, it's better to use Git submodules, SVN externals or Mercurial subrepositories to achieve unified tracking of changes between components than to have everything piled up in one repository. I would love to hear success stories of splitting large repositories consisting of multiple projects into individual project repositories. If you have any to share, please leave a comment.
That being said, if you are afraid of the extra overhead that management of multiple source code repositories brings, why not give Deveo a try? With Deveo you can set up and manage any number of Git, Subversion and Mercurial repositories behind your company firewall.