How to migrate a Confluence space to Markdown

Deveo offers a simple project-based Wiki for storing project-related information in a neat way. Although companies typically keep their central documentation in Confluence, some wish to migrate away from Confluence to a simpler and more connected approach: Deveo Wiki. This blog post introduces a way to migrate an existing Confluence space to Deveo Wiki. The approach can be used to migrate a Confluence space to any Markdown Wiki, but for the sake of context, we did it for Deveo Wiki.

What is migrated

For this article, we wanted to migrate only the most up-to-date version of each page. We were interested in migrating the page content with support for formatting, images, links, and attachments. We were not interested in the history, which could, in theory, be migrated as well. Deveo stores its project Wiki content in a Git repository. Thus, the migrated content is stored in Deveo Wiki format in a Git repository, which can then be pushed to a Deveo Wiki repository in a given project. The content can also be transferred to any other Markdown-based Wiki or content management system.

TLDR

If you want to migrate a Confluence space to Markdown with attachments, links, and everything, check out the GitHub project here. If you wish to find out how the migration actually works, read on.

How the migration works

The migration from a given Confluence space to Deveo Wiki happens through Confluence APIs. From the technical point of view, the steps to migrate a Confluence space to Deveo Wiki are:

  1. Read all page ids from a given Confluence space
  2. For each Confluence page do the following:
    1. Download the page to a directory
    2. Prepend and append appropriate XML metadata to the stored page
    3. Download page attachments and store them in the attachments directory
    4. Convert the page to Deveo Wiki [Markdown](https://en.wikipedia.org/wiki/Markdown) format and store it in the pages directory
  3. Create a page named Home, unless it already exists

Reading all page ids from a given Confluence space

Getting the page ids happens through a simple call to the Confluence REST API. You may use the following curl command to test it out (the URL is quoted so that the shell does not interpret the `?`):

curl -u USER:PASSWORD "https://CONFLUENCE_URL/rest/api/content?spaceKey=CONFLUENCE_SPACE" | python -m json.tool
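The same call can be scripted. The following is a minimal Python sketch, not the code from the GitHub project. It assumes the response is JSON with a `results` list whose entries carry `id` and `title`, and that paging is driven by `start` and `limit` query parameters. The helper names `fetch_json` and `list_pages` and the `type=page` filter are illustrative choices.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_json(url, user, password):
    """GET a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_pages(base_url, space_key, get=fetch_json, limit=25, **auth):
    """Return (id, title) for every page in a space, following pagination."""
    pages, start = [], 0
    while True:
        query = urllib.parse.urlencode(
            {"spaceKey": space_key, "type": "page", "start": start, "limit": limit}
        )
        data = get(f"{base_url}/rest/api/content?{query}", **auth)
        results = data.get("results", [])
        pages += [(item["id"], item["title"]) for item in results]
        # Assumption: a short (or empty) batch means the last page was reached.
        if len(results) < limit:
            return pages
        start += limit
```

A call would look like `list_pages("https://CONFLUENCE_URL", "CONFLUENCE_SPACE", user="USER", password="PASSWORD")`. The `get` parameter exists only so the paging logic can be exercised without a live server.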

Saving the content with the XML metadata

From a technical point of view, we save the content and prepend and append the XML metadata to it in the same step. Getting the page content in Confluence's storage format, which is XHTML-based, happens with the following API call:

curl -u USER:PASSWORD "https://CONFLUENCE_URL/rest/api/content/PAGE_ID?expand=body.storage" | python -m json.tool

The expand=body.storage parameter returns the content, or "body", of the page in the Confluence storage format, which is the format we convert to Markdown. For converting the page, we use an open-source tool called Confluence to Markdown Converter. It uses XSL to transform the XHTML, which is XML, but requires that each page contains not just the content but also a document type declaration that links to the document type definition (DTD).
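To make the shape of that response concrete, here is a small Python sketch that pulls the title and the storage-format body out of the JSON. It assumes the body sits under `body.storage.value`, which is how the response is laid out when `expand=body.storage` is set. The function name `storage_body` is hypothetical.

```python
import json


def storage_body(page_json):
    """Pull the title and the storage-format XHTML out of a page response."""
    # Accept either the raw JSON text or an already decoded dictionary.
    page = json.loads(page_json) if isinstance(page_json, (str, bytes)) else page_json
    try:
        return page["title"], page["body"]["storage"]["value"]
    except KeyError as missing:
        raise ValueError(f"response lacks {missing}; was expand=body.storage set?") from None
```

Raising an error for a missing body, rather than writing an empty page, makes a forgotten `expand` parameter visible early in the migration.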

In order to automate the conversion of multiple pages, we write each page to a separate text file, with the following XML at the beginning of each file.

<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE ac:confluence SYSTEM "../confluence-to-markdown-converter/dtd/confluence-all.dtd" [<!ENTITY clubs    "&#9827;"><!ENTITY nbsp   "&#160;"><!ENTITY ndash   "&#8211;"><!ENTITY mdash   "&#8212;">]><ac:confluence xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/" xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/" xmlns="http://www.atlassian.com/schema/confluence/4/">  

The DTD file needs to be present at the path the declaration points to. We also need to append the following closing tag at the end of the document to make each page well-formed.

</ac:confluence>  

So the basic structure of each page is:

- XML Metadata -
- Page content -
- Closing tag -
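The wrapping step can be sketched in Python as follows. The header and closing tag are the ones shown above, with whitespace inside the entity declarations normalized. The function names, the `.txt` suffix, and the assumption that `page_name` is already safe to use as a file name are illustrative.

```python
from pathlib import Path

# XML metadata placed in front of every page (taken from the article).
XML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE ac:confluence SYSTEM '
    '"../confluence-to-markdown-converter/dtd/confluence-all.dtd" ['
    '<!ENTITY clubs "&#9827;"><!ENTITY nbsp "&#160;">'
    '<!ENTITY ndash "&#8211;"><!ENTITY mdash "&#8212;">]>'
    '<ac:confluence xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/" '
    'xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/" '
    'xmlns="http://www.atlassian.com/schema/confluence/4/">\n'
)
# Closing tag appended after the page content.
XML_FOOTER = "\n</ac:confluence>\n"


def wrap_page(content):
    """Return the page content surrounded by the XML metadata and closing tag."""
    return XML_HEADER + content + XML_FOOTER


def save_wrapped(content, directory, page_name):
    """Write the wrapped page to <directory>/<page_name>.txt and return the path."""
    target = Path(directory) / f"{page_name}.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(wrap_page(content), encoding="utf-8")
    return target
```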

Download page attachments

Before converting the page from the Confluence format to Deveo Markdown format, we need to check whether the page contains any attachments and download them to the appropriate directory. Deveo Wiki stores attachments in the attachments directory. We get a list of attachments for a given Confluence page with the following curl command:

curl -u USER:PASSWORD https://CONFLUENCE_URL/rest/api/content/PAGE_ID/child/attachment  

We need to fetch each attachment individually. Fetching the attachment happens with the following curl command:

curl -u USER:PASSWORD -s -S -o markdown/attachments/ATTACHMENT_NAME https://CONFLUENCE_URL/download/attachments/PAGE_ID/ATTACHMENT_NAME  
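Both calls can be combined in a script. The Python sketch below assumes each entry in the attachment listing has a `title` field holding the file name, and it builds the download URL with the same pattern as the curl command above. The function names are hypothetical, and attachment names are percent-encoded because they may contain spaces.

```python
import base64
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def attachment_urls(base_url, page_id, listing):
    """Map each attachment in a child/attachment response to (name, download URL)."""
    urls = []
    for item in listing.get("results", []):
        name = item["title"]
        quoted = urllib.parse.quote(name, safe="")
        urls.append((name, f"{base_url}/download/attachments/{page_id}/{quoted}"))
    return urls


def download(url, target, user, password):
    """Stream one attachment to disk using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    Path(target).parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
```

Each `(name, url)` pair would then be passed to `download(url, f"markdown/attachments/{name}", "USER", "PASSWORD")`.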

Convert the page to Deveo Wiki format

After we have all Confluence pages from a given Confluence space stored with the correct metadata and attachments, we can convert the pages one by one to Deveo Markdown format. Deveo stores wiki pages in the pages directory, so we use that. We use a fork of the Confluence to Markdown Converter tool for the conversion. Deveo has its own syntax for linking to attachments, so we needed to modify the XSL rules.

During test migrations, we found problematic content, such as table rows that contain both th heading cells and td table cells. We added an XSL rule that handles those cases, as otherwise the affected files would have been skipped. The conversion of a single page happens with the following command:

java -jar confluence-to-markdown-converter/lib/saxon9he.jar -s:./confluence/PAGE_NAME.txt -xsl:confluence-to-markdown-converter/xslt/c2deveo.xsl -o:./markdown/pages/PAGE_NAME.txt  

The Confluence to markdown converter tool allows specifying the XSL transformations, so we can use our modified version of the original transformation file.
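To run the conversion over every stored page, the command above can be built and executed per file. The Python sketch below reuses the paths from the command line shown earlier. The function names and the decision to collect failures instead of stopping at the first one are assumptions of this example.

```python
import subprocess
from pathlib import Path


def saxon_command(page_name, converter_dir="confluence-to-markdown-converter",
                  source_dir="confluence", target_dir="markdown/pages"):
    """Build the Saxon command line for a single page."""
    return [
        "java", "-jar", f"{converter_dir}/lib/saxon9he.jar",
        f"-s:./{source_dir}/{page_name}.txt",
        f"-xsl:{converter_dir}/xslt/c2deveo.xsl",
        f"-o:./{target_dir}/{page_name}.txt",
    ]


def convert_all(source_dir="confluence"):
    """Convert every stored page and return (page name, error output) for failures."""
    failed = []
    for source in sorted(Path(source_dir).glob("*.txt")):
        command = saxon_command(source.stem, source_dir=source_dir)
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((source.stem, result.stderr.strip()))
    return failed
```

Passing the command as a list avoids shell quoting problems with page names that contain spaces.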

Create a page named "Home"

Deveo Wiki requires that a page called "Home" exists. Unless the Confluence space that we are migrating already contains a page called "Home", we need to create one. The page can, for example, contain links to the other pages.
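A minimal Python sketch of this step is shown below. It assumes converted pages are stored as `.txt` files in the pages directory, as in the conversion command above. Because Deveo has its own link syntax, the generated page only lists the page names as plain bullets. The function name `ensure_home` is hypothetical.

```python
from pathlib import Path


def ensure_home(pages_dir, extension=".txt"):
    """Create a Home page listing all other pages; return False if one already exists."""
    pages = Path(pages_dir)
    home = pages / f"Home{extension}"
    if home.exists():
        return False
    titles = sorted(page.stem for page in pages.glob(f"*{extension}"))
    # Plain bullets only: adapt this line to the target Wiki's link syntax.
    body = "\n".join(f"- {title}" for title in titles)
    home.write_text(f"# Home\n\n{body}\n", encoding="utf-8")
    return True
```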

Automate all the things

Luckily, you don't need to care about the details above if you just want to migrate your Confluence space to Deveo. We have packaged the steps above into the GitHub project, which also includes usage instructions.

What is missing?

Our implementation is still missing some things that might be required for a full-blown migration. The missing features are listed below:

  1. Page and space history
  2. Support for attachments with the same name
  3. Creating the Git repository and pushing it to Deveo

Page history support can be implemented by initializing a Git repository, requesting each version of a page, converting it, and committing it to the Git repository. Since Deveo Wiki uses a Git repository as its backend for content and attachments, the history of individual pages can be preserved. In the context of this blog post, our aim was simply to migrate the current version of the content.
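As a rough illustration of the commit part of that idea, the Python sketch below writes each version of a page to the same file and commits it, oldest first. It is not part of our tool. It assumes the versions have already been fetched and converted to Markdown, that `git` is installed with a user name and email configured, and that pages live in a `pages` directory with a `.txt` suffix. The function name and the `run` parameter are hypothetical.

```python
import subprocess
from pathlib import Path


def commit_versions(repo_dir, page_name, versions, run=subprocess.run):
    """Write each (message, markdown) pair to the page file and commit it in order."""
    repo = Path(repo_dir)
    page = repo / "pages" / f"{page_name}.txt"
    page.parent.mkdir(parents=True, exist_ok=True)
    # Running "git init" on an existing repository is harmless.
    run(["git", "init", "--quiet"], cwd=repo, check=True)
    for message, markdown in versions:
        page.write_text(markdown, encoding="utf-8")
        run(["git", "add", str(page.relative_to(repo))], cwd=repo, check=True)
        # --allow-empty keeps a commit even when two versions convert identically.
        run(["git", "commit", "--quiet", "--allow-empty", "-m", message],
            cwd=repo, check=True)
```

The `run` parameter defaults to `subprocess.run` and can be replaced with a stub to check the sequence of Git commands without creating a repository.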

Attachments that share a name but are different files are currently not supported. In the uncommon case where a file with the same name but different content is present, it can be renamed on the Confluence side before the migration.
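One way to find such files before migrating is to compare attachment names across pages. The Python sketch below is a hypothetical helper, not part of our tool. It takes a mapping from page id to attachment names and reports every name that appears on more than one page. It cannot tell whether the contents actually differ, so the result is a list of candidates to check by hand.

```python
from collections import defaultdict


def find_name_collisions(attachments_by_page):
    """Return {attachment name: sorted page ids} for names used on several pages."""
    pages_by_name = defaultdict(set)
    for page_id, names in attachments_by_page.items():
        for name in names:
            pages_by_name[name].add(page_id)
    return {
        name: sorted(page_ids)
        for name, page_ids in pages_by_name.items()
        if len(page_ids) > 1
    }
```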

We left the last step intentionally unimplemented. The steps described above and the tool we provided can be used to migrate from Confluence to any Markdown-based Wiki or content management system. So the last step can be chosen by the user.

Conclusion

I hope you enjoyed reading the instructions. If you have any questions or comments, do leave them below. If you wish to see Deveo Wiki in action, sign up to Deveo here.
