There are two possibilities in Confluence when it comes to the question of where to store attachments: database and file system. There are situations where this option must be changed after the system has been operating for a long time, possibly having accumulated large amounts of attachments. This task is called attachment migration and is especially important on older Confluence systems, where the default storage option was database. In the meantime, it has become clear that it is generally better to store attachments on the file system, within the confluence-data directory. This article provides a guide on how to migrate attachments safely from the database to the file system.

Problems

Now, the alert reader might ask: “why do we need an article about this, if there is already official documentation by the vendor?”. The reason is that we here at Scandio experienced a few serious drawbacks with the attachment migration feature built into Confluence. We have customers whose wikis are used primarily to organize considerable amounts of attachments (> 50 GB). We experienced the following problems:

  • Database inconsistencies. On different systems, errors occurred due to an inconsistent database state. There are two tables in the database associated with attachments: attachments and attachmentdata. By definition, each entity in the attachments table references one entity in the attachmentdata table. Now, I don’t know how this actually happens, but we’ve seen attachment entities reference multiple attachmentdata entities in almost every migration we did. If this is the case, the attachment migration will fail completely, possibly after having already run for a very long time.
  • Long-running task. If there are large amounts of attachments, the attachment migration takes a very long time. For example, we’ve experienced durations of around 3-4 days with roughly 100,000 attachments and 40 GB. Most of the time during this process, Confluence is not accessible to users.
  • One monolithic process. One main issue is that the migration process can only be executed all at once. There is no possibility of migrating incrementally or, in case of errors, continuing the migration somewhere in the middle.
  • Non-verbose. With Confluence’s default settings, there is almost no information given to the user about the state of the migration. Of course you can view the progress bar in the browser, but once you close the browser or tab, this screen cannot be restored and you are completely lost without any output.
  • Lack of analysis tools. When the migration process is finished, it is very hard to find out “how successful” it was, i.e. how many attachments were migrated and how many (and which) failed.
  • Clearing database table fails. At the end of the migration process, Confluence tries to clear the attachmentdata table. This often fails for different reasons and has to be done manually after the process.

During the attachment migrations we did here at Scandio, we gradually established our own mechanisms that avoid or compensate for these problems. If you proceed carefully along the following steps, your migrations should be as successful as ours are now.

Note: I’m sure there are further problems we have not experienced yet. Please let us know in the comments which issues you experienced and how you solved them!

Preparations

Note: since we manipulate the Confluence database with queries, make sure you back up your database or operate on a dedicated test system before you start.

1.) Erase database inconsistencies

Find out which attachment entities reference more than one attachmentdata entity:

SELECT DISTINCT ATTACHMENTID, COUNT(ATTACHMENTID) FROM ATTACHMENTDATA GROUP BY ATTACHMENTID ORDER BY 2 DESC;

If there are no rows at the top of the result whose second column has a value greater than 1, everything is ok. Otherwise, you need to delete the duplicate attachmentdata entries.

For each attachment entity with multiple attachmentdata references, find the duplicate attachmentdata entities:

SELECT * FROM ATTACHMENTDATA WHERE ATTACHMENTID = [previously found attachmentid];

Given the result of the query above, delete all attachmentdata entities found except the one with the highest attachmentdataid. Repeat the following query for each of these entities.

DELETE FROM ATTACHMENTDATA WHERE ATTACHMENTDATAID = [duplicate attachmentdataid];

Do this until the first SELECT query returns only attachment entities with one single attachmentdata entity.
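If there are many attachments with duplicates, working out the DELETE statements by hand is tedious. The selection logic can be sketched in Ruby (a hypothetical helper, not part of Confluence or our tooling): given the (attachmentid, attachmentdataid) pairs from the ATTACHMENTDATA table, it returns the attachmentdataids to delete, keeping only the highest id per attachment.

```ruby
# Given (attachmentid, attachmentdataid) pairs exported from the
# ATTACHMENTDATA table, return the attachmentdataids that should be
# deleted: for every attachment, keep only the highest attachmentdataid.
def duplicate_attachmentdata_ids(rows)
  rows.group_by { |attachment_id, _| attachment_id }
      .flat_map do |_, pairs|
        data_ids = pairs.map { |_, data_id| data_id }
        data_ids.sort[0...-1] # everything except the highest id
      end
end

# Example: attachment 4521985 has three attachmentdata rows.
rows = [[4521985, 10], [4521985, 12], [4521985, 11], [4521986, 20]]
duplicate_attachmentdata_ids(rows) # => [10, 11]
```

The returned ids can then be plugged into the DELETE statement above, one by one.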

2.) Collect database information

We need some information about the attachments in the database for the analysis purposes explained later. Write down all information collected here.

First, count the total number of attachments in the database:

SELECT COUNT(*) FROM ATTACHMENTS;

Determine the size of the attachmentdata table, e.g. with the following statement on MSSQL (this varies depending on the database system used):

EXEC sp_spaceused 'ATTACHMENTDATA';

Write down the value of the data column.

For the analysis described later, we need an export containing all attachment filenames. Run the following statement.

SELECT DISTINCT TITLE FROM ATTACHMENTS;

In your favourite database tool, export the result of this query. For the analysis we do later, there are two requirements:

  • the file must contain one filename per line (and nothing else!)
  • name the file _db-export-filenames.txt (incl. the leading underscore)

Note: the leading underscore in filenames indicates that this file is used as input for the script introduced in the next section.
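Because the analysis later treats every line of this file as a filename, it is worth sanity-checking the export before moving on. A small Ruby snippet along these lines (a hypothetical check, not part of our tooling) can catch stray whitespace, empty lines, or header rows left behind by the database tool:

```ruby
# Sanity-check the filename export: every line must contain exactly one
# non-empty filename and nothing else (no surrounding whitespace, no
# header rows left over from the database tool).
def valid_filename_export?(lines)
  !lines.empty? && lines.all? { |line| line == line.strip && !line.empty? }
end

if File.exist?("_db-export-filenames.txt")
  lines = File.readlines("_db-export-filenames.txt", chomp: true)
  puts valid_filename_export?(lines) ? "export looks ok" : "export needs cleanup"
end
```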

3.) Logging

In order to make the migration more verbose, we need to adjust the log level in Confluence:

  • Go to the Confluence Administration Console at [confluence-path]/admin
  • In the left navigation, go to Logging and Profiling
  • In order to keep the logfiles from being spammed by other processes, set all log levels under Existing Levels to WARN
    • It might be easier to do this via the browser’s JavaScript console: jQuery('[name="levelNames"]').val('WARN')
  • Add the package com.atlassian.confluence.pages.persistence.dao with level ALL under Add New Entry

Now, after you start the migration, a lot of information will appear in your atlassian-confluence.log file.

4.) Make the system Read-Only

Most of the time during the attachment migration, users accessing the Confluence system will see a simple HTML page indicating that the system is currently locked by the migration and is therefore not usable. However, this is not guaranteed for the whole migration process. There is a high chance that users can access the system while the process is still running. In this case, we want to prevent users from creating any content in order to keep the database consistent. We want to make the system read-only.

To achieve this, we put an HTML fragment into Admin > Custom HTML > At the beginning of body:

<style>
[id^="add"], [id^="edit"], [id^="create"], [id*="admin"],
#content-metadata-attachments, #wysiwyg, #view-attachments-link,
.show-labels-editor {
  display: none !important;
}
#read-only-maintenance {
  text-align: center;
  height: 80px;
  background-color: red;
  color: white;
  font-size: 40px;
  padding-top: 30px;
}
</style>
<div id="read-only-maintenance">The system is currently in read-only maintenance mode</div>

This will hide all links associated with add/edit functionality. The CSS selector list is probably not complete; please feel free to contact us to make the read-only snippet better. Furthermore, an eye-catching red header will be shown that tells users the system is in read-only mode, as seen in the following screen:

[Screenshot: Confluence in read-only maintenance mode]

During Attachment Migration

As soon as you start the migration process, the migration progress bar is shown in the Admin Console. It is very important that you do not close the browser window or tab this is shown in; otherwise you will not be able to restore this screen. Fortunately, we now have another way to get information about the migration process: the atlassian-confluence.log file. Due to our logging settings from the previous section, Confluence will inform you about every attachment it migrates and every error that occurs. For example, look at the following output:

2013-04-15 20:42:54,722 INFO [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao] migrate Performing pre-migration tasks.
2013-04-15 20:42:54,738 INFO [Attachment data migration task] [pages.persistence.dao.FileSystemAttachmentDataDao] prepareForMigrationTo Creating attachments directory on the filesystem at 'E:\confluence-data\attachments\ver003'
2013-04-15 20:42:54,738 INFO [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Beginning copy from 'com.atlassian.confluence.pages.persistence.dao.hibernate.HibernateAttachmentDataDao@4823163b' to 'com.atlassian.confluence.pages.persistence.dao.HierarchicalFileSystemAttachmentDataDao@7788a07c'
2013-04-15 20:42:54,879 INFO [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Found 109034 to copy
2013-04-15 20:42:54,988 DEBUG [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Copying attachment 'Attachment: bsh_K1_02_04_de_A1.doc v.2 (4521985) Mozen' and 1 previous versions.
2013-04-15 20:42:55,113 DEBUG [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Copying attachment 'Attachment: bsh_U6_03_03_A2_de.pdf v.1 (4521986) admin' and 0 previous versions.
...
2013-04-16 06:08:17,314 DEBUG [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Copying attachment 'Attachment: sample attachment name v.79 (8134269) username' and 92 previous versions.
2013-04-16 06:08:31,736 ERROR [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy There was a problem retrieving attachment 'Attachment: sample attachment name v.79 (8134269) username' from the data store. Skipping.
...

While the migration task is running, you should observe the system carefully.

  • Inspect the logfiles to check that everything is ok (does it keep writing “Copying attachment…”?)
  • If you manage to keep the process status screen, observe how far it has come and what’s remaining
  • From time to time, try to access the system and see if the system is accessible and if the Read-Only code is working
  • Look at the properties (number of files, folder size) of the attachment folder in the Confluence home directory
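Since the log is the most reliable progress indicator, a rough progress check can be scripted. The following Ruby sketch (an assumption-laden helper of our own, relying on the log lines looking like the excerpt above) counts copied attachments and errors in the log content:

```ruby
# Summarize migration progress from the content of atlassian-confluence.log:
# number of "Copying attachment" lines versus errors from the migration task.
def migration_progress(log_text)
  {
    copied: log_text.scan(/copy Copying attachment/).size,
    errors: log_text.scan(/ERROR \[Attachment data migration task\]/).size
  }
end

log_file = "atlassian-confluence.log"
p migration_progress(File.read(log_file)) if File.exist?(log_file)
```

Running this from time to time against the current logfile gives you a rough “n copied, m failed so far” without the browser screen.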

When the migration task is finished, the following log will appear in your logfile:

2013-04-29 05:35:46,861 INFO [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao$IntraHibernateAttachmentCopier] copy Copy completed.

If this happens, the migration has completed successfully. Note that this doesn’t mean that all attachments were migrated successfully. It only indicates that the process ran through all of your attachments. We will see in the next section how we investigate which attachments were not migrated successfully.

Note that it is quite common that some exceptions occur after this log, for example:

2013-04-29 05:35:46,861 INFO [Attachment data migration task] [persistence.dao.hibernate.AbstractHibernateAttachmentDao] migrate Performing post-migration tasks.
2013-04-29 05:58:13,488 WARN [Attachment data migration task] [sf.hibernate.util.JDBCExceptionReporter] logExceptions SQL Error: 0, SQLState: 08S01 ...
2013-04-29 05:58:13,488 ERROR [Attachment data migration task] [sf.hibernate.util.JDBCExceptionReporter] logExceptions I/O Error: There is not enough space on the disk ...
2013-04-29 05:58:13,503 WARN [Attachment data migration task] [sf.hibernate.util.JDBCExceptionReporter] logExceptions SQL Error: 0, SQLState: 08S01 ...
2013-04-29 05:58:13,503 ERROR [Attachment data migration task] [sf.hibernate.util.JDBCExceptionReporter] logExceptions I/O Error: There is not enough space on the disk ...
2013-04-29 05:58:13,503 WARN [Attachment data migration task] [v2.c3p0.impl.NewPooledConnection] handleThrowable [c3p0] A PooledConnection that has already signalled a Connection error is still in use! ...
2013-04-29 05:58:13,503 WARN [Attachment data migration task] [v2.c3p0.impl.NewPooledConnection] handleThrowable [c3p0] Another error has occurred [java.sql.SQLException: Invalid state, the Connection object is closed.] which will not be reported to listeners! ...
java.sql.SQLException: Invalid state, the Connection object is closed.
	at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java:1634)
    ...
2013-04-29 05:58:13,503 ERROR [Attachment data migration task] [sf.hibernate.transaction.JDBCTransaction] rollback Rollback failed ...
java.sql.SQLException: Invalid state, the Connection object is closed.
	at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java:1634)
    ...

It is a post-migration task to delete all content from the attachmentdata table. If this table is very large, this step might fail and has to be done by hand later on. This is not a big deal. In general, everything is fine if this state is reached.

After the migration - Analysis of the migration

Now it is getting interesting. Without an extended toolset and course of action (i.e. without the earlier steps), we’d have immense problems now. How do we know which attachments were migrated successfully? How do we know which attachments failed the migration? How do we know what the database state is? If we proceed along the following points, we’ll have a detailed and professional analysis of the migration task.

1.) Compare database content and file system content

First, we have a look at the properties of the attachments folder in the confluence-data directory. Compare the database information retrieved in Preparations 2.) with the file count and size of the folder. If these roughly match, it is a first sign that everything is fine. Otherwise, don’t worry: we’ll find out more details in the next step.

2.) Programmatic migration analysis

We at Scandio developed a little tool called Att (click here to download). This tool is written in the Ruby language and does the following:

  • Read and interpret the atlassian-confluence.log
  • Read and interpret the database export generated under Preparations 2.)
  • Generate summary reports about the whole attachment migration, matching logfile and database export
  • Generate detailed success and error lists of individual attachments

In order to use this tool, you need to do the following:

  • Create a directory (from now on referred to as DIR) you want to use for the migration analysis.
  • If there was a large number of attachments, the log output might be spread over several atlassian-confluence.log files. Find the starting point that logged migrate Performing pre-migration tasks and the end point that logged migrate Performing post-migration tasks, combine all logfiles into one file, and save it as _atlassian-confluence.txt (the leading underscore is required). Put this file in DIR.
  • Take the file _db-export-filenames.txt generated under Preparations 2.) and put it in DIR.
  • Put the Att tool (simply a Ruby file, att.rb) into DIR.
  • Install Ruby if it is not yet installed on your system.

After these preparations, you should have three files in DIR:

  • _atlassian-confluence.txt
  • _db-export-filenames.txt
  • att.rb

Now, you can run the tool. Navigate to DIR in your terminal and run the Ruby file:

ruby att.rb

The script will output some information like the following:

fgrund@felix-pc:~/att$ ruby att.rb
Att: Starting migration analysis. This takes time...
Att: Performing log analysis...
Att: Performing db analysis...
Att: Finished migration analysis. Look into the files generated in this folder.

When the tool is finished, there will be several files created in the same directory:

  • log-analysis-summary.txt: summary of the logfile analysis, i.e. how many attachments succeeded and how many failed the migration.
  • log-success-compact.txt: a list of filenames that were migrated successfully.
  • log-success-detailed.txt: a list of successfully migrated attachments with detailed information.
  • log-error-compact.txt: a list of filenames that failed the migration.
  • log-error-detailed.txt: a list of attachments that failed the migration, with detailed information.
  • db-analysis-summary.txt: summary of the comparison of the db export and the logfile analysis; this matches the database export and the logfile analysis against each other.
  • db-success-compact.txt: list of filenames that appear both in the db export and in log-success-compact
  • db-error-compact.txt: list of filenames that are in the db export but not in log-success-compact
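Conceptually, the db matching behind the last three files boils down to a set comparison between the database export and the log success list. A simplified sketch of this idea (not the actual att.rb code):

```ruby
require "set"

# The db analysis boils down to a set comparison: filenames that appear in
# both the database export and the log success list were migrated; filenames
# only in the database export failed.
def db_analysis(db_filenames, migrated_filenames)
  db = Set.new(db_filenames)
  ok = Set.new(migrated_filenames)
  { success: (db & ok).to_a.sort, error: (db - ok).to_a.sort }
end

db_analysis(%w[a.doc b.pdf c.png], %w[a.doc c.png])
# => {:success=>["a.doc", "c.png"], :error=>["b.pdf"]}
```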

3.) Time for decisions

With these analysis files at hand, it is now time to make a decision. Was the attachment migration “successful enough”? Do I have to examine the log files further and fix more database inconsistencies? In that case, you might have to restart the migration afterwards, since it’s an “all or nothing” process. However, if the analysis indicates just a few errors, you should keep going and find a solution for the attachments that failed the migration; or you ignore them, it’s your decision.

4.) Steps afterwards

The attachment migration process tries to delete all content from the attachmentdata table when it has finished. This operation sometimes fails on different database systems, one example being MSSQL, where the DELETE FROM statement overflows the database transaction log. So, find out if your attachmentdata table still has entries and make sure you clear them. For example, on MSSQL, try using the TRUNCATE statement instead of the DELETE statement.

Now you should check whether Confluence succeeded in switching the attachment storage to the file system at the application level. Go to Confluence Admin => Attachment Storage and inspect which option is selected. If it is still database, then Confluence failed to change this after the migration. In this case, proceed along the official fix for this issue provided by Atlassian.

Conclusion

Confluence provides two ways to store attachments, one being the database, the other the file system. Experience has shown that the file system is the better option on most systems. However, if your installation is set to database and you want to change to the file system, you must undergo the process of attachment migration. If you have many attachments on your system, it can become a pain to make this process run through. This article provides a way to ease this pain with a clear description of the preparations and an analysis tool to inspect to what degree the migration was successful. If you carefully proceed along the steps described, your migration should work like a charm.