Deveo offers a simple project-based wiki for storing project-related information in a neat way. Although many companies keep their central documentation in Confluence, some wish to migrate away from Confluence to a simpler and more connected approach: Deveo Wiki. This blog post introduces a way to migrate an existing Confluence space to Deveo Wiki. The approach can be used to migrate a Confluence space to any Markdown wiki, but for the sake of context, we did it for Deveo Wiki.
What is migrated
For the sake of this article, we wanted to migrate only the most up-to-date version of each page. We were interested in migrating the page content, including formatting, images, links, and attachments. We were not interested in the history, which could, in theory, be migrated as well. Deveo stores its project wiki content in a Git repository, so the migrated content is written in Deveo Wiki format to a Git repository. That repository can then be pushed to the Deveo Wiki repository of a given project. The content can also be transferred to any other Markdown-based wiki or content management system.
If you want to migrate a Confluence space to Markdown with attachments, links, and everything, check out the GitHub project here. If you wish to find out how the migration actually works, read on.
How the migration works
The migration from a given Confluence space to Deveo Wiki happens through Confluence APIs. From the technical point of view, the steps to migrate a Confluence space to Deveo Wiki are:
- Read all page ids from a given Confluence space
- For each Confluence page do the following:
  - Download the page to a directory
  - Prepend and append appropriate XML metadata to the stored page
  - Download page attachments and store them to
  - Convert the page to Deveo Wiki [Markdown](https://en.wikipedia.org/wiki/Markdown) format and store it to
- Create a page named Home, unless it already exists
Reading all page ids from a given Confluence space
Getting the page ids takes a single call to the Confluence API. You may use the following curl command to test it out:
curl -u USER:PASSWORD "https://CONFLUENCE_URL/rest/api/content?spaceKey=CONFLUENCE_SPACE" | python -mjson.tool
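To pull the ids out of that response in a script, a minimal Python sketch could look like the one below. It assumes the response has the usual shape of a Confluence content listing: a `results` array whose entries carry `id` and `title`, plus an optional `_links.next` path when more results remain. The function names and the sample payload are ours and purely illustrative.

```python
import json


def extract_page_ids(listing):
    """Return the id of every entry in one batch of a content listing."""
    return [entry["id"] for entry in listing.get("results", [])]


def next_listing_path(listing):
    """Return the relative path of the next batch of results, or None."""
    return listing.get("_links", {}).get("next")


# Sample payload trimmed to the fields this sketch relies on (assumed shape).
sample = json.loads("""
{
  "results": [
    {"id": "65537", "type": "page", "title": "Home"},
    {"id": "65540", "type": "page", "title": "Release notes"}
  ],
  "_links": {"next": "/rest/api/content?spaceKey=DEMO&limit=25&start=25"}
}
""")

print(extract_page_ids(sample))
print(next_listing_path(sample))
```

Because the listing is returned in batches, a space with many pages needs a loop that keeps requesting the next path until none is returned.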
Saving the content with the XML metadata
From a technical point of view, we save the content and prepend and append the XML metadata in the same step. Getting the page content in Confluence's storage format, which is XHTML-based, happens with the following API call:
curl -u USER:PASSWORD "https://CONFLUENCE_URL/rest/api/content/PAGE_ID?expand=body.storage" | python -mjson.tool
The expand=body.storage parameter returns the content, or "body", of the page in the Confluence storage format, which is the format we convert to Markdown. For converting the page, we use an open-source tool called Confluence to Markdown Converter. It uses XSL to transform the XHTML, which is XML, but it requires that each file contains not just the page content but also a document type declaration.
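As a small illustration, the storage-format body can be picked out of that JSON response as follows. This is a sketch that assumes the body sits under `body.storage.value`, which is how the expanded response is normally structured; the sample document and the function name are made up for the example.

```python
import json


def storage_body(page):
    """Return the storage-format XHTML of a page fetched with expand=body.storage."""
    return page["body"]["storage"]["value"]


# Sample response trimmed to the fields used here (assumed shape).
sample = json.loads("""
{
  "id": "65540",
  "title": "Release notes",
  "body": {
    "storage": {
      "value": "<p>Hello <strong>wiki</strong></p>",
      "representation": "storage"
    }
  }
}
""")

print(sample["title"])
print(storage_body(sample))
```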
In order to automate the conversion of multiple pages, we write each page to a separate text file, with the following XML at the beginning of each file.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE ac:confluence SYSTEM "../confluence-to-markdown-converter/dtd/confluence-all.dtd" [<!ENTITY clubs "♣"><!ENTITY nbsp " "><!ENTITY ndash "–"><!ENTITY mdash "—">]><ac:confluence xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/" xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/" xmlns="http://www.atlassian.com/schema/confluence/4/">
The DTD file referenced by the document type declaration needs to be present at the path we point to. We also need to append the following closing tag, matching the root element opened above, to the end of the document so that each page is well-formed:
</ac:confluence>
So the basic structure of each page is:
- XML metadata
- Page content
- Closing tag
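The wrapping step can be sketched in Python as below. The header mirrors the XML shown earlier, with two assumptions on our side: the entity values are written as numeric character references so the script stays plain ASCII, and the closing tag is taken to be `</ac:confluence>`, matching the root element the header opens. The names `wrap_page` and `save_page` are hypothetical helpers, not part of the converter.

```python
from pathlib import Path

# Header modelled on the article's XML; entity values are expressed as
# numeric character references (an assumption made for this sketch).
XML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<!DOCTYPE ac:confluence SYSTEM "
    '"../confluence-to-markdown-converter/dtd/confluence-all.dtd" ['
    '<!ENTITY clubs "&#9827;"><!ENTITY nbsp "&#160;">'
    '<!ENTITY ndash "&#8211;"><!ENTITY mdash "&#8212;">]>\n'
    "<ac:confluence"
    ' xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/"'
    ' xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/"'
    ' xmlns="http://www.atlassian.com/schema/confluence/4/">\n'
)
# Assumed closing tag for the root element opened in the header.
XML_FOOTER = "\n</ac:confluence>\n"


def wrap_page(body):
    """Surround storage-format content with the XML header and closing tag."""
    return XML_HEADER + body + XML_FOOTER


def save_page(directory, page_name, body):
    """Write the wrapped page to <directory>/<page_name>.txt and return the path."""
    target = Path(directory) / f"{page_name}.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(wrap_page(body), encoding="utf-8")
    return target


print(wrap_page("<p>Hello</p>"))
```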
Download page attachments
Before converting the page from the Confluence format to Deveo Markdown format, we need to check whether the page contains any attachments and download them to the appropriate location. Deveo Wiki stores attachments in the attachments directory. We get a list of attachments for a given Confluence page with the following call:
curl -u USER:PASSWORD https://CONFLUENCE_URL/rest/api/content/PAGE_ID/child/attachment
We need to fetch each attachment individually, which happens with the following command:
curl -u USER:PASSWORD -s -S -o markdown/attachments/ATTACHMENT_NAME https://CONFLUENCE_URL/download/attachments/PAGE_ID/ATTACHMENT_NAME
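The two calls can be tied together with a short Python sketch that turns an attachment listing into the local path and download URL for each file. It assumes each entry in the listing's `results` array exposes the file name as `title`, and it builds URLs with the same pattern as the curl command above; `attachment_downloads` and the example host are hypothetical.

```python
from urllib.parse import quote


def attachment_downloads(base_url, page_id, listing,
                         target_dir="markdown/attachments"):
    """Map each listed attachment to a (local path, download URL) pair."""
    downloads = []
    for entry in listing.get("results", []):
        name = entry["title"]  # assumed to hold the attachment file name
        # Percent-encode the name so spaces and similar characters are URL-safe.
        url = f"{base_url}/download/attachments/{page_id}/{quote(name)}"
        downloads.append((f"{target_dir}/{name}", url))
    return downloads


# Sample listing trimmed to the field used here (assumed shape).
listing = {"results": [{"title": "my diagram.png"}, {"title": "spec.pdf"}]}

for path, url in attachment_downloads("https://wiki.example.com", "65540", listing):
    print(path, "<-", url)
```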
Convert the page to Deveo Wiki format
After we have all Confluence pages from a given Confluence space stored with the correct metadata and attachments, we can convert the pages one by one to Deveo Markdown format. Deveo stores the wiki pages in the pages directory, so we use that. We use a fork of the Confluence to Markdown Converter tool for the conversion. Deveo has its own syntax for linking to attachments, so we needed to modify the XSL rules.
During test migrations, we found problematic content, such as table rows that contain both th heading cells and td table cells. We added an XSL rule that handles those cases, as otherwise the files would have been skipped. The conversion of a single page happens with the following command:
java -jar confluence-to-markdown-converter/lib/saxon9he.jar -s:./confluence/PAGE_NAME.txt -xsl:confluence-to-markdown-converter/xslt/c2deveo.xsl -o:./markdown/pages/PAGE_NAME.txt
The Confluence to markdown converter tool allows specifying the XSL transformations, so we can use our modified version of the original transformation file.
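To run the conversion over many pages, the command can be assembled per page and the failures collected. The Python sketch below reuses the paths and flags from the command above; the helper names and the injectable `run` parameter are our own additions for illustration and are not part of the converter.

```python
import subprocess


def saxon_command(page_name, converter_dir="confluence-to-markdown-converter"):
    """Build the Saxon invocation for one page as an argument list."""
    return [
        "java", "-jar", f"{converter_dir}/lib/saxon9he.jar",
        f"-s:./confluence/{page_name}.txt",
        f"-xsl:{converter_dir}/xslt/c2deveo.xsl",
        f"-o:./markdown/pages/{page_name}.txt",
    ]


def convert_pages(page_names, run=subprocess.run):
    """Convert each page and return the names whose conversion failed."""
    failed = []
    for name in page_names:
        result = run(saxon_command(name))
        if result.returncode != 0:
            failed.append(name)
    return failed


# Only prints the command; calling convert_pages() would actually run Java.
print(" ".join(saxon_command("Home")))
```

Passing the command as a list avoids shell quoting problems with page names that contain spaces.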
Create a page named "Home"
Deveo Wiki requires that a page called "Home" exists. Unless the Confluence space that we are migrating already contains a page called "Home", we need to create one. The page can, for example, contain links to the other pages.
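A possible way to generate such a page is sketched below in Python. Note the assumption: the links use plain Markdown syntax with a percent-encoded page title as the target, which may not match the link syntax Deveo Wiki actually expects. The function names are hypothetical.

```python
from urllib.parse import quote


def home_page(titles):
    """Build a Home page listing every migrated page as a Markdown link."""
    lines = ["# Home", ""]
    for title in sorted(titles):
        # Link target format is an assumption, not confirmed Deveo syntax.
        lines.append(f"- [{title}]({quote(title)})")
    return "\n".join(lines) + "\n"


def ensure_home(pages):
    """Add a generated Home page to a {title: markdown} dict if it is missing."""
    if "Home" not in pages:
        pages["Home"] = home_page(list(pages))
    return pages


pages = ensure_home({"Release notes": "...", "Setup": "..."})
print(pages["Home"])
```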
Automate all the things
Luckily, you don't need to care about the details above if you just want to migrate your Confluence space to Deveo. We have packaged the steps above into a tool that can be found in the GitHub project, along with instructions for proper usage.
What is missing?
Our implementation is still missing some things that might be required for a full-blown migration. The missing functionalities are listed below:
- Page and space history
- Support for attachments with same names
- Creating the Git repository and pushing it to Deveo.
Page history support can be implemented by initializing a Git repository, requesting each version of a page, converting it, and committing the result to the repository. Since Deveo Wiki uses a Git repository as its backend for content and attachments, the history of individual pages can be preserved. In the context of this blog post, our aim was simply to migrate the current version of the content.
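The committing half of that idea could be sketched in Python as follows. This is not something the tool implements: `commit_history` is a hypothetical helper, it expects the versions to have been fetched and converted already, and it stores pages under a `pages` directory with a `.txt` suffix as in the conversion command. It also assumes Git is installed and a committer identity is configured.

```python
import subprocess
from pathlib import Path


def commit_history(repo_dir, page_name, versions, run=subprocess.run):
    """Commit every converted version of a page, oldest first.

    versions: iterable of (version_number, markdown_text) pairs.
    """
    repo = Path(repo_dir)
    page_path = repo / "pages" / f"{page_name}.txt"
    page_path.parent.mkdir(parents=True, exist_ok=True)
    run(["git", "init"], cwd=repo, check=True)
    for number, text in sorted(versions):
        page_path.write_text(text, encoding="utf-8")
        run(["git", "add", "--", str(page_path.relative_to(repo))],
            cwd=repo, check=True)
        # --allow-empty keeps the loop going when two versions are identical.
        run(["git", "commit", "--allow-empty", "-m",
             f"{page_name}: version {number}"], cwd=repo, check=True)
```

A call such as `commit_history("wiki-repo", "Notes", [(1, "first draft"), (2, "second draft")])` would leave two commits for the page, with the newest content in the working tree.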
Attachments that share a name but are different files are currently not supported. In the uncommon case where files with the same name but different content are present, they can be renamed on the Confluence side before the migration.
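Before migrating, it may help to list the attachment names that appear on more than one page, since those are the candidates for renaming. The Python sketch below is a hypothetical helper that works on (page id, file name) pairs collected from the attachment listings. It only flags shared names and cannot tell whether the files actually differ.

```python
from collections import defaultdict


def find_name_collisions(attachments):
    """Return {file_name: [page_ids]} for names attached to more than one page.

    attachments: iterable of (page_id, file_name) pairs.
    """
    pages_by_name = defaultdict(list)
    for page_id, name in attachments:
        pages_by_name[name].append(page_id)
    return {name: ids for name, ids in pages_by_name.items() if len(ids) > 1}


collected = [
    ("65537", "logo.png"),
    ("65540", "logo.png"),
    ("65540", "spec.pdf"),
]
print(find_name_collisions(collected))
```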
We left the last step intentionally unimplemented. The steps described above and the tool we provided can be used to migrate from Confluence to any Markdown-based Wiki or content management system. So the last step can be chosen by the user.
I hope you enjoyed reading the instructions. If you have any questions or comments, do leave them below. If you wish to see Deveo Wiki in action, sign up to Deveo here.