How to improve svn performance over 7 times

In Deveo you can set up, manage and maintain Git, Subversion and Mercurial repositories. All version control repositories in Deveo can be accessed through either HTTP(S) or SSH. We use the Apache HTTP Server to handle all HTTP-based authentication and authorization. In this blog post, I'm going to describe how we achieved a significant improvement in our Subversion performance over HTTP.

If you are new to Subversion (SVN) and would like to learn how to use it, check out our SVN tutorial.

The problem

A Subversion client talking HTTP can very quickly generate hundreds or thousands of requests against the server for a single operation, and the cost of opening a new connection to the server is non-trivial. The Version Control with Subversion book explains that this problem can be mitigated by keeping HTTP connections alive longer: set KeepAlive to On and MaxKeepAliveRequests to a large number (at least 1000).

On top of keeping track of the connections between clients and the server, Deveo has various authentication and authorization schemes to support. Authentication can happen against a local user database, against an LDAP or Active Directory service, or even both. Authorization-related information is stored in the database and can be accessed through the Deveo REST APIs.

Every request that goes through Apache also goes through the authentication and authorization process. This means that those hundreds or even thousands of requests that Subversion clients send are each authenticated against the local database, the LDAP or Active Directory service or, in the worst case, both. As you might imagine, this puts considerable overhead on the Deveo servers as well as on the database or directory service in question.

Initial approach to solving the problem

Situations like this are traditionally handled by caching the authentication and authorization results. Apache offers the mod_authnz_ldap, mod_authn_dbd and mod_authz_dbd modules for handling authentication and authorization through a database or an LDAP or Active Directory service. The mod_authn_socache module also offers a way to cache the authentication results. Due to our more complex authentication and authorization scheme, however, we could not use the standard modules and had to take a more roundabout route.

The obstacle

As authentication and authorization in Deveo can happen in multiple ways, we handle the requests through our REST APIs. To do that, we have implemented a custom Python module that Apache calls through mod_wsgi's authentication handler. This means that none of the modules Apache provides by default were applicable in our case.

The solution

Because of these limitations, the solution was to add caching to the authentication module itself. We chose Redis for the cache, as we already use it for other purposes. In practice, once a user is authenticated for the first time, the result is stored in Redis, encrypted of course. Any subsequent request from the same user then simply fetches the encrypted result from Redis instead of calling the API. Luckily for us, communication between Python and Redis is straightforward using the redis module.
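As a rough illustration of the idea, here is a minimal sketch of a caching authentication handler. It is not Deveo's actual module: the make_check_password factory, the authenticate callback standing in for the REST API call, the five-minute lifetime and the SECRET value are all assumptions made for the example. It follows the check_password(environ, user, password) shape that mod_wsgi expects from an authentication script. Instead of encrypting the stored result, the sketch uses a keyed HMAC digest as the cache key, so the plaintext password never reaches the cache. An in-memory stand-in with the same get and setex methods as the redis module's client keeps it runnable without a Redis server.

```python
import hashlib
import hmac
import time

CACHE_TTL_SECONDS = 300   # assumed lifetime, not taken from the post
SECRET = b"change-me"     # hypothetical server-side secret


class InMemoryCache:
    """Stand-in exposing the get/setex subset of redis.Redis used below."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        value, expires_at = self._data.get(name, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            del self._data[name]   # entry has expired
            return None
        return value

    def setex(self, name, ttl, value):
        self._data[name] = (value, time.monotonic() + ttl)


def cache_key(user, password):
    # Keyed digest so the plaintext password is never written to the cache.
    message = user.encode("utf-8") + b"\0" + password.encode("utf-8")
    return "auth:" + hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_check_password(cache, authenticate):
    """Build a mod_wsgi-style check_password(environ, user, password)."""

    def check_password(environ, user, password):
        key = cache_key(user, password)
        if cache.get(key) is not None:
            return True                    # cache hit: skip the API call
        if authenticate(user, password):   # slow path: hypothetical REST call
            cache.setex(key, CACHE_TTL_SECONDS, b"1")
            return True
        return False                       # failures are never cached

    return check_password


if __name__ == "__main__":
    api_calls = []

    def fake_api(user, password):
        api_calls.append(user)
        return password == "s3cret"

    check_password = make_check_password(InMemoryCache(), fake_api)
    check_password({}, "alice", "s3cret")
    check_password({}, "alice", "s3cret")
    print(len(api_calls))  # 1: the second request was served from the cache
```

In a real deployment you would pass a redis.Redis(...) client in place of InMemoryCache(). Keeping the lifetime short limits how long a changed or revoked password can still be accepted from the cache.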

Results

With the caching in place, we achieved significant improvements in our test scenarios. The first scenario was a single-machine combo installation. Checking out a large Subversion repository initially took 1 minute and 2 seconds. With the Redis-based caching in place, the time dropped to 8 seconds: a whopping 7.75x improvement.

Improvements in Deveo combo installation

In the second scenario, we set up Deveo in clustered mode. Here the checkout took 26 seconds before the caching was enabled and 6 seconds after. In the clustered setup, the improvement was 4.3x.

Improvements in Deveo cluster installation
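The improvement factors quoted for the two scenarios are simply the ratio of the checkout time before caching to the time after. A small sketch of that arithmetic, using only the timings reported above (the speedup helper is just a name chosen for this example):

```python
def speedup(before_seconds, after_seconds):
    """Return how many times faster the operation became."""
    return before_seconds / after_seconds


combo = speedup(62, 8)     # 1 min 2 s -> 8 s on the combo installation
cluster = speedup(26, 6)   # 26 s -> 6 s on the clustered installation

print(f"combo: {combo:.2f}x, cluster: {cluster:.1f}x")
# prints "combo: 7.75x, cluster: 4.3x"
```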

We used virtual machines in Google Compute Engine to conduct the testing. The test repository used in both test cases had roughly 300 directories and 1800 files and was approximately 100 MB in size. The results vary depending on multiple factors. At least the following can affect how much the performance improves:

  • Size of the repository
  • Complexity of the repository (number of files and directories)
  • Deveo setup type (combo or clustered setup)
  • Authentication method used (local database authentication, LDAP/Active Directory or both)

Summary

Caching authentication and authorization results in Apache is a great way to improve HTTP-based Subversion performance, regardless of whether the caching is done by the default modules Apache offers or by something you implement yourself. The improvements were released in Deveo 2.11.1, which can be downloaded through our customer portal.
