In addition to supporting multiple version control systems (Git, Subversion, and Mercurial), Deveo has been built to scale to thousands of simultaneous users and to support high-availability, horizontally scalable installations. Both are necessities for large corporations that wish to consolidate code hosting and other development tools under one platform. When a large organization starts planning a centralized, large-scale code hosting deployment, the infrastructure requirements are naturally a concern that should not be taken lightly.
In this blog post, I'm going to cover the things you need to consider when planning a large-scale, self-hosted Deveo-based installation behind your company firewall. The information is based on our customer cases and on over 10 years of experience with enterprise-level version control systems and other software development tool deployments.
Reasons for an enterprise-scale installation
When you have a development organization of 10 people, you are likely better off hosting your code in the cloud rather than behind your company firewall. The maintenance overhead of monthly upgrades, keeping security up to date, and handling backups simply takes precious time from your development team. In larger organizations with thousands of developers, self-hosting pays off. Sometimes it is also required by stricter compliance and security policies that dictate how and where the company's intellectual property, such as source code assets, should be stored. For companies with such requirements, hosting the source code on-premises, either behind the company firewall or otherwise internally, is a suitable solution.
As the headcount of the development organization in such companies is counted in hundreds or thousands, the code hosting deployment needs to provide high availability and scalability to serve the whole organization. Such a service, provided through the centralized IT organization, needs to be up and running 24/7 and withstand even the harshest load spikes thrown at it.
High-availability Deveo setup
A typical high-availability, enterprise-scale code hosting setup with Deveo behind the company firewall looks like the picture below:
A high-availability Deveo setup consists of multiple application nodes placed behind a load balancer. The application nodes share the load of simultaneous version control system operations. In addition, if one of the servers breaks or needs to be taken down, the other servers carry the load while that server is out of service. This allows things such as zero-downtime deployments of software updates: take one server out of the load balancer, update it, and bring it back. It also allows scaling the setup by simply adding more application nodes as usage grows.
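The update procedure described above can be sketched as a small simulation. This is only an illustration of the idea, not Deveo's or any load balancer's actual API: the `LoadBalancer` class, the `rolling_update` function, and the node names are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real load balancer's backend pool."""
    active: set = field(default_factory=set)

    def remove(self, node: str) -> None:
        self.active.discard(node)

    def add(self, node: str) -> None:
        self.active.add(node)


def rolling_update(lb: LoadBalancer, nodes: list, update_node) -> int:
    """Update nodes one at a time and return the lowest number of nodes
    that were in rotation at any point during the update."""
    min_active = len(lb.active)
    for node in nodes:
        lb.remove(node)                      # stop sending traffic to this node
        min_active = min(min_active, len(lb.active))
        update_node(node)                    # assumed to install the new version
        lb.add(node)                         # put the node back into rotation
    return min_active


nodes = ["app-1", "app-2", "app-3"]          # hypothetical node names
lb = LoadBalancer(active=set(nodes))
updated = []
lowest = rolling_update(lb, nodes, updated.append)
print(f"lowest number of nodes in rotation: {lowest}")
```

With three application nodes, two stay in rotation throughout the update, which is why the remaining nodes need enough headroom to carry the full load.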
In large enterprises, multiple sites might be involved in common development efforts. Product development can be conducted, for example, in both Europe and Asia. To guarantee fast access to the Git, Subversion, and Mercurial repositories from multiple sites, the data can also be replicated to off-site servers, which provide blazing fast read speeds at all sites, while all write operations go to a master instance.
CPU and Memory requirements for application nodes
One of the most demanding tasks for CPU and RAM is Git cloning. For a large Git repository (1.5 GB, 500K commits), git-pack-objects utilizes from 45% to around 90% of a single CPU core and around 10% of RAM. CPU usage then drops to about 20% during the git-receive-pack operation, with about 40% of RAM in use. For the same repository, the initial Git push/import triggers git-index-pack, which utilizes from 45% to around 90% of CPU and about 10% of RAM.
Even though the example above is probably on the demanding side, it's a good idea to have enough horsepower in the application nodes. As an example, for a setup with 2000 users doing version control system operations daily, the high-availability Deveo setup would require three application nodes, each with 8 cores and 32GB of memory.
In addition to the local disks that contain the operating system files, the application nodes are all connected to a shared storage where the Git, Subversion, and Mercurial repositories and other shared files are stored. The shared storage can be, for example, a Network File System (NFS), a Common Internet File System (CIFS), also known as Server Message Block (SMB), GlusterFS, or any other distributed file system. The basic requirements for shared storage are high data throughput to the application nodes and low latency of data retrieval. The speed of the shared storage impacts clone speeds roughly according to the following formula:
Storage throughput (Mbps) / number of simultaneous clone or checkout operations = clone or checkout speed per client (Mbps)
In more concrete terms, if your shared storage has a throughput of 1 Gbps and 100 Git clone operations run simultaneously, each cloning client receives data at about 10 Mbps. If you use Git appropriately and do not store large binary files in the repositories, the repository sizes should stay small.
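The rule of thumb above is easy to turn into a quick estimate. The following is a minimal sketch; the function names are made up for illustration, it assumes the storage throughput is split evenly between clients, and it takes 1 GB as 1000 MB.

```python
def per_client_mbps(storage_mbps: float, simultaneous_ops: int) -> float:
    """Approximate per-client transfer speed when storage throughput is shared evenly."""
    if simultaneous_ops < 1:
        raise ValueError("simultaneous_ops must be at least 1")
    return storage_mbps / simultaneous_ops


def transfer_seconds(repo_megabytes: float, client_mbps: float) -> float:
    """Approximate transfer time; megabytes are converted to megabits (x8)."""
    return repo_megabytes * 8 / client_mbps


speed = per_client_mbps(storage_mbps=1000, simultaneous_ops=100)   # 1 Gbps, 100 clones
seconds = transfer_seconds(repo_megabytes=1500, client_mbps=speed)  # 1.5 GB repository
print(f"{speed} Mbps per client, about {seconds / 60:.0f} minutes per clone")
```

Under these assumptions, the 1.5 GB repository mentioned earlier would take roughly 20 minutes to clone for each of the 100 clients, which shows why keeping repositories small matters.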
The latency of the shared storage should be minimized. Git, Subversion, and Mercurial are all very sensitive to latency. For example, in Git, a git log might require thousands of Git objects to be loaded and traversed sequentially. If there's latency in the low-level disk accesses, performance naturally suffers. Appropriate latency can be achieved, however, by using a fast network connection between the shared storage and the application nodes. In addition, the choice of file system affects latency and performance, so file system recommendations should be followed.
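To get a feel for why latency matters, consider a rough back-of-the-envelope model. The object count and the latency values below are illustrative assumptions, not measurements, and the model ignores caching and parallel reads.

```python
def sequential_read_seconds(object_count: int, latency_ms: float) -> float:
    """Total wait time when each object is loaded one after another."""
    return object_count * latency_ms / 1000


OBJECTS = 5000  # assumed number of Git objects traversed by a single git log

for latency_ms in (0.1, 1.0, 10.0):  # assumed per-access storage latencies
    total = sequential_read_seconds(OBJECTS, latency_ms)
    print(f"{latency_ms} ms per access -> about {total:.1f} s in total")
```

Because the accesses are sequential, the total time grows linearly with latency: a tenfold increase in per-access latency makes the same operation roughly ten times slower.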
Database and load balancer requirements
In a Deveo high-availability setup, the databases are kept on a single server. Deveo uses both MongoDB and Redis for storing data. The Deveo web UI uses MongoDB extensively, so it is recommended to give MongoDB enough memory for the working set to reside in memory for fast access. For the same 2000 daily users, another node with 8 cores and 32GB of memory should be sufficient.
A load balancer distributes the traffic to the application nodes and also handles SSL offloading. If you use HAProxy for load balancing, we recommend 1-2 CPU cores and 2GB of memory. A reliable and fast network between the load balancer and the Deveo web servers is also extremely important.
To summarize, a high-availability Deveo setup for the 2000-user example takes the following ingredients:
- 3 application nodes with 8 cores and 32GB memory each
- 1 database node with 8 cores and 32GB memory
- 1 load balancer with 2 cores and 2GB of memory
- a shared storage of your choice
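For capacity planning, it can help to add up the totals for the list above. The snippet below only sums the figures given in this post; the tuple layout is an arbitrary choice for illustration, and shared storage is left out because its sizing depends on your repositories.

```python
# (role, node count, cores per node, memory in GB per node), as listed above
nodes = [
    ("application", 3, 8, 32),
    ("database", 1, 8, 32),
    ("load balancer", 1, 2, 2),
]

total_nodes = sum(count for _, count, _, _ in nodes)
total_cores = sum(count * cores for _, count, cores, _ in nodes)
total_memory_gb = sum(count * mem for _, count, _, mem in nodes)

print(f"{total_nodes} nodes, {total_cores} cores, {total_memory_gb} GB of memory")
```

For the 2000-user example, this comes to 5 nodes with 34 cores and 130 GB of memory in total.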
For the full guide to the high-availability setup, see our administration guide, which provides detailed step-by-step instructions. If you would like to know whether the high-availability Deveo setup would suit your needs, contact us. We are glad to discuss whether, for example, a combo setup, where all of the services run on one machine, is more appropriate for your case, or whether high availability and scalability would bring additional value.
If you liked this article or would like to ask something specific about Deveo high-availability setups, do leave a comment.
Read more about Deveo on-premises installations: