WHITEPAPER: A Guideline for Architecting Terabyte Storage in AEM Using Amazon S3

Written by Niels Hardeman

TERABYTE STORAGE IN AEM USING AMAZON S3

Accommodating a huge amount of assets in any content management platform is challenging. Adobe Experience Manager (AEM) offers an integration with the Amazon S3 storage solution, allowing binary data for images, documents and videos to be stored in an S3 bucket. Amazon S3 is highly performant and offers practically unlimited storage capacity.
Ever migrated 5 terabytes of images, documents and videos from one platform to another? If so, then you are probably well aware of how important it is to think ahead and to look beyond the out-of-the-box storage option with its default settings. When talking about terabyte storage, performance is everything. The choices made during the planning and architecting phase can literally make or break the performance of a CMS and the websites running on it.

AVAILABLE CHOICES

Adobe Experience Manager offers a number of storage methods, each storing data in a different way and each with its own strengths and weaknesses. In AEM these storage mechanisms are called Micro Kernels, or MK for short.

[Diagram: the Oak repository stores data through either TarMK, backed by the on-disk Segment Store, or MongoMK, backed by MongoDB.]

The following storage methods are available in Adobe Experience Manager:

1. TarMK: the default storage mechanism. Uses the on-disk Segment Store.
2. MongoMK: the storage mechanism for MongoDB. Uses NoSQL to store data.

AEM distinguishes two types of data when it comes to storage:

- Binary data
- Content nodes

The way in which each type of data is stored is configurable through OSGi. By default, binary data is stored alongside the content nodes. It is however possible to indicate that binary data needs to be stored in an alternative way. This is done in the TarMK or MongoMK configuration by setting the customBlobStore property to true, as sketched below.
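A minimal sketch of that switch for TarMK, assuming an AEM 6.x instance where OSGi configurations are dropped into crx-quickstart/install (the exact service PID differs per AEM/Oak version):

    # crx-quickstart/install/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config
    # Route binary data to a separately configured blob store instead of the Segment Store.
    customBlobStore=B"true"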
For storing binary data the following alternative methods are available:

1. FileDataStore: stores binary data on disk.
2. S3DataStore: stores binary data in Amazon S3.

MAKING A CHOICE

As stated, binary data is stored alongside the content nodes by default. When the amount of binary data exceeds 500 gigabytes, we advise switching from the on-disk storage solution to Amazon S3. The main reasons are:

- Maintainability: maintenance jobs take ever longer to run. This holds for Tar compaction in particular; when it can no longer complete effectively, performance degradation is imminent.
- Growth: a repository of this size will most likely continue to grow. Switching to Amazon S3 provides practically unlimited capacity without the worry of having to continuously monitor and expand the available space.

As with all computing systems, the downside of using an external storage method is a slight sacrifice in performance, but we are talking about milliseconds here. Depending on how close your AEM server is to the Amazon cloud, the latency will be smaller or larger. Should any performance issues arise, the local cache serves as a buffer between S3 and AEM, and fine-tuning it can solve these problems.

HOW ABOUT MONGODB?

Everything we have heard about MongoDB states that it is designed to handle vast amounts of data quickly. So why not choose MongoDB instead of S3? To answer this question we have to take a step back and look at it from the perspective of the AEM architecture. Within the AEM architecture the primary purpose of MongoDB is to enable horizontal scaling of the number of servers used for authoring. This is purely aimed at accommodating a vast number of people editing content simultaneously. The secondary purpose is to enable the community features of AEM to share posts, replies and other types of user-generated content between publishers. These are the main business cases for deploying MongoDB with AEM.

Can we use MongoDB for storing immense amounts of binary data? The answer is yes, but a couple of things need to be considered:

- Setting up and running a large MongoDB deployment requires expertise. Adobe advises that MongoDB should only be considered if there is enough expertise available within the company to support it.
- Professional (paid) support from MongoDB is definitely advised.
- Having enough disk space will remain a concern.
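For reference, a MongoMK-backed instance is configured through the DocumentNodeStoreService. A minimal sketch, assuming an AEM 6.x instance and placeholder connection values; note that even then, binaries would typically be routed to a separate blob store such as the S3 data store:

    # crx-quickstart/install/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config
    # Connection string and database name for the MongoDB cluster (placeholder values).
    mongouri="mongodb://mongo-host:27017"
    db="aem"
    # Keep binary data out of MongoDB by delegating it to a custom blob store.
    customBlobStore=B"true"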
HOW DOES S3 WORK WITH AEM?

There are two ways in which AEM can be set up with Amazon S3 storage:

- Clustered, meaning all servers share a single bucket
- Stand-alone, wherein each server has its own bucket

A good starting point for explaining how S3 works with AEM is the stand-alone method. The installation and configuration process is described on docs.adobe.com; a special feature pack needs to be installed. Once the necessary configuration changes have been made and AEM is started, it will automatically start syncing all its binary data to the S3 bucket.

[Diagram: the repository writes binary data to Amazon S3 through a local cache.]

A number of important aspects are configurable (see the configuration sketch at the end of this section):

- Size of the local cache
- Maximum number of threads to be used for uploading
- Minimum size of objects to be stored in S3
- Maximum size of binaries to be cached

Depending on the number of assets that need to be migrated, the initial import may take anywhere from several hours up to a few days. This process can be sped up by allowing many simultaneous uploads; if the server has enough CPU power and network bandwidth, this can make a great difference. Amazon S3 can easily handle 100 requests per second, with bursts of up to 800 requests. The available bandwidth and CPU power are therefore the main factors at play during the migration process.
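Each of these aspects maps to a property in the S3 connector's OSGi configuration. A minimal sketch, assuming the AEM 6.x S3 connector and placeholder credentials; exact property names and defaults vary slightly between connector versions:

    # crx-quickstart/install/org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config
    accessKey="<access-key>"
    secretKey="<secret-key>"
    s3Bucket="<bucket-name>"
    s3Region="eu-west-1"
    # Size of the local cache in bytes (here: 64 GB).
    cacheSize="68719476736"
    # Maximum number of threads used for uploading to S3.
    writeThreads="30"
    # Binaries smaller than this (in bytes) stay in the repository instead of S3.
    minRecordLength="16384"
    # Binaries larger than this (in bytes) are not kept in the local cache.
    maxCachedBinarySize="17408"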
In the Amazon S3 bucket each asset is stored as a separate entry. The S3 connector uses unique identifiers, derived from each binary's content, to keep track of each asset. Once imported into S3, the data can be viewed via the AWS console:

[Screenshot: the bucket contents in the AWS console, showing the prefixed identifiers.]

Notice the unique identifiers. These are made up of an identifier of 36 characters and a prefix of 4 characters. For example, a binary whose (hypothetical) content hash is 8f3d0a1b2c3d4e5f60718293a4b5c6d7e8f90123 is stored under the key 8f3d-0a1b2c3d4e5f60718293a4b5c6d7e8f90123. The prefix is important for maintaining the performance of the bucket's index, because it ensures that key names are distributed evenly across index partitions. This is quite technical, but vital to the performance of an S3 bucket; the S3 connector takes care of all of it.

CLUSTERED S3 ARCHITECTURE

When the storage size runs into the terabyte range, a clustered S3 architecture becomes interesting: instead of having 3 buckets for 3 servers with a total of 15 TB, it is also possible to share a single bucket. Think of it as the family bucket at KFC. This approach saves two thirds of the storage space. Besides saving space, it also eliminates the need to replicate huge binary files such as videos between authors and publishers; this is called binaryless replication.

[Diagram: an author and two publishers sharing a single S3 bucket.]
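Binaryless replication is typically enabled on the replication agents themselves. A minimal sketch, assuming an AEM 6.x default replication agent and a hypothetical publish host: appending binaryless=true to the agent's transport URI tells AEM to ship only the content nodes and blob references, while the binaries themselves remain in the shared bucket:

    Replication agent on author, Transport tab (placeholder host and port):
    URI: http://publish-host:4503/bin/receive?sling:authRequestLogin=1&binaryless=true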
HOW TO PLAN FOR MIGRATION TO S3

Before considering a move to S3 it is important to know the answers to the following questions:

- What is the number of assets that will be imported?
- What is their total size?
- How many different renditions will be generated?
- How much space will the renditions take up?
- What is the expected growth for the upcoming years?

The number of assets to be imported gives some sense of the amount of time and work involved. If the number of assets is huge, then it may be worthwhile to automate or script the import process. In any case it is a good idea to separate the import into two stages:

1. Importing the original files
2. Generating renditions and extracting the XMP metadata

Running the workflows for renditions and metadata after all files have been imported ensures that these processes don't slow each other down. It is also a checkpoint: a good opportunity to create a backup.

Finally, as a rule of thumb: if the total size equals or exceeds 500 GB, a move to S3 is advisable. Similarly, if the repository is expected to grow beyond 500 GB, or its growth cannot easily be accommodated on the current hosting platform, then a move to S3 is also favorable.

MORE INFO?

Are you interested in Adobe Experience Manager for your content-driven solutions? Tricode is capable of analyzing your needs and advising you on the implementation of Adobe Experience Manager. We offer training, consultancy, development teams and a service desk for support & maintenance. For advice, training or other inquiries please contact the Tricode Professional Services sales department:

Tricode Professional Services BV, Sales department
De Schutterij 12
3905 PL Veenendaal
Tel.: +31 318 55 92 10
E-mail: info@tricode.nl

FOLLOW US

facebook.com/tricode
linkedin.com/company/tricode
slideshare.net/tricode
twitter.com/tricode
www.tricode.nl