2019-11-06 | Scandio | 5 min read
Topic | Agility on Scale with Atlassian Datacenter on AWS
Technology | AWS and Atlassian Services and Products
Coping with user demands challenges the processes and capacities of on-premise infrastructure. For example, when a large number of teams adopt a brand new DevOps toolchain, the demand for computing power across the whole IT stack increases significantly and abruptly.
The fast increase in processing demand, paired with an inability to instantly provision the additional capacity, often leads to unstable systems and a poor user experience. It is key to provide any such resource increase within minutes to prevent users from perceiving a troublesome performance degradation.
Implementing changes in such heavily used systems requires a very robust testing and quality assurance process which guarantees that a new release, upgrade or patch does not have a negative impact on user experience and business-critical functionality.
In the era of rapid technological change, the cloud is not only useful for backup and storage; it is key for future-proofing businesses and maintaining a competitive edge. Businesses moving massive amounts of data to the cloud can take advantage of the latest technology and move away from aging on-premises infrastructure, putting themselves at the forefront of innovation.
Why Scandio and ByteSource?
Scandio and ByteSource are both Atlassian Platinum Enterprise and AWS Consulting Partners with extensive experience in deploying Atlassian data center solutions, preferably in the AWS public cloud. Together with the BMW Group-internal team, the providers took responsibility for the migration of the Atlassian toolstack to AWS.
The solution is based on Atlassian’s reference architecture for AWS. AWS autoscaling technology is used:
- to provide dynamic scaling of the EC2 nodes,
- to automatically provision new nodes based on usage thresholds,
- or to dispose of nodes which are no longer needed or unhealthy.
Running EC2 nodes in multiple availability zones guarantees the resilience of the entire system, taking into account the unlikely possibility of two availability zones failing at once.
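The threshold-based behaviour listed above can be illustrated with a small sketch. This is not the BMW Group configuration and not the AWS API: the thresholds, node limits and function names below are assumptions chosen purely for illustration of how a scaling decision could be derived from a usage metric and health checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy values, not taken from the actual deployment."""
    min_nodes: int = 2
    max_nodes: int = 8
    scale_out_cpu: float = 70.0  # add a node above this average CPU percentage
    scale_in_cpu: float = 30.0   # remove a node below this average CPU percentage


def desired_capacity(current_nodes: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the node count the group should converge to for the given load."""
    target = current_nodes
    if avg_cpu > policy.scale_out_cpu:
        target += 1  # usage threshold exceeded: provision one more node
    elif avg_cpu < policy.scale_in_cpu:
        target -= 1  # capacity no longer needed: dispose of one node
    # Never leave the configured bounds, whatever the metric says.
    return max(policy.min_nodes, min(policy.max_nodes, target))


def nodes_to_replace(health: dict[str, bool]) -> list[str]:
    """Return the names of nodes that failed their health check."""
    return sorted(name for name, healthy in health.items() if not healthy)
```

In this sketch, three nodes at 85% average CPU would lead to a target of four nodes, while the lower bound keeps at least two nodes running even when the load is very low.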
AWS Relational Database Service (RDS), deployed across multiple availability zones with provisioned IOPS, provides a well-suited solution for the heavily loaded central database that is typical of Atlassian data center deployments.
The shared file storage required by the Atlassian data center architecture is provided by AWS Elastic File System (EFS), a highly available NFS-based file system.
Security and Compliance
All system components are deployed in private VPCs connected to the corporate network infrastructure via AWS Direct Connect and a fast dedicated broadband connection. All EC2 EBS volumes, S3 buckets, EFS volumes and AMIs are encrypted with strong, self-signed cryptographic keys managed by AWS KMS.
All data is encrypted at rest and in transit with TLS-enabled network protocols.
Even the backups and their transfer to a second physical location are encrypted.
All configuration management items including secrets like database passwords and certificates are stored in a highly available and secure HashiCorp Vault deployment, running on a multi-AZ Consul backend with encrypted storage.
Deployment and Orchestration
Using autoscaling groups requires fully automated provisioning of the involved EC2 nodes. In the BMW Group solution, all nodes are provisioned by Ansible playbooks, executed by a highly available Ansible Tower instance running on ECS, which is itself managed by another autoscaling group across multiple AZs and exposed by an application load balancer.
The orchestration solution for the involved infrastructure is based on Terraform. Its extensive support for AWS as well as for HashiCorp Vault enables seamless and secure deployment and provisioning. Environment separation is implemented by using completely independent VPCs in different AWS accounts.
All environments are deployed via a fully integrated Jenkins-based continuous delivery pipeline. New releases can always be tested in a staging environment, where load and stress tests are conducted with the same performance suite used for the initial implementation. This mitigates the risk of unexpected functional changes and performance degradation.
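How a staging load test can guard against performance degradation may be easier to see in a small sketch. The function name, the 10% tolerance and the sample values are assumptions for illustration only; the actual performance suite and its acceptance criteria are not described in this article.

```python
from statistics import mean


def passes_performance_gate(baseline_s: list[float],
                            candidate_s: list[float],
                            max_regression: float = 0.10) -> bool:
    """Compare mean response times (in seconds) of two load test runs.

    Returns True if the candidate release is at most `max_regression`
    (a fraction, e.g. 0.10 for 10%) slower than the baseline run.
    """
    if not baseline_s or not candidate_s:
        raise ValueError("both runs need at least one sample")
    allowed = mean(baseline_s) * (1.0 + max_regression)
    return mean(candidate_s) <= allowed


# Hypothetical samples from a staging run of the same test suite.
baseline = [0.5, 0.6, 0.7]
candidate = [0.55, 0.6, 0.65]
print("release may proceed:", passes_performance_gate(baseline, candidate))
```

A check of this kind could run as a pipeline step after the load test, so that a release with a clear slowdown is stopped before it reaches production.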
The result of the migration is an extremely stable system with response times decreased by an order of magnitude. Here are some examples of the measured metrics:
- Average response times of the typical CRUD (create issue, read / search for an issue with JQL, update issue and delete issue) REST calls before the migration were at approximately 7 seconds with very high variation (minimum response times of 2 seconds, maximum response times of 30+ seconds).
- Average response times of the typical CRUD REST calls after the migration are under 1 second (e.g. 0.58s average response time for 7 days) with very little variation (minimum times of 0.3 seconds and maximum times of around 3 seconds).
- Power users measured a sustained decrease in the response times of their specific JQL filters from 25 seconds down to 2 seconds, a speed-up of more than 10 times.
- In the first month of operation no major incidents occurred, and an uptime of nearly 100% was achieved.
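The "order of magnitude" claim can be checked against the figures above with a short calculation. The helper name is an assumption, and the 7 second value is the approximate pre-migration average quoted in the list, so the results are indicative rather than exact.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' response time is."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("response times must be positive")
    return before_s / after_s


# Figures quoted in the list above (seconds).
crud = speedup(7.0, 0.58)   # average CRUD REST call, before vs. after
jql = speedup(25.0, 2.0)    # power-user JQL filters, before vs. after

print(f"CRUD REST calls: {crud:.1f}x faster")         # prints 12.1x
print(f"Power-user JQL filters: {jql:.1f}x faster")   # prints 12.5x
```

Both ratios are slightly above 12, which is consistent with the stated improvement of more than 10 times.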
In addition to these hard measures, "soft" benefits such as a lower workload for the support and service teams and time gained for functional improvements are worth mentioning.
Even the best solution can always be improved over time. The lower effort for maintenance and incidents leaves the BMW Group team with more capacity for improvements, new integrations and new features.
After the successful migration to the AWS cloud, the first natural step is to look for cost optimization options. It is generally advisable to start with a slightly overprovisioned baseline in order to guarantee a good user experience. After the load patterns have become stable and measurable, it is possible to plan and settle on fixed VM counts, VM instance types etc. Reliable metrics enable various options for cost optimization.
Mid-term statistics of usage patterns allow for the implementation of predictive autoscaling. For example, if a large usage peak is known to occur at approximately 9:00 AM, it can be of great value for the users if scaling is triggered shortly beforehand, and not when the CPU utilization is already high. On days with predictably low usage (like weekends and public holidays), nodes can be reduced to a minimum of two and even switched to VM types with less capacity.
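A calendar-driven plan of this kind could look like the following sketch. Only the 9:00 AM peak and the minimum of two nodes come from the text; the pre-scaling window, the end of the peak and the other node counts are hypothetical values, and the function is a plain illustration rather than a call into any AWS service.

```python
from datetime import date, datetime, time

# Illustrative assumptions, not the BMW Group configuration.
PRESCALE_START = time(8, 30)  # scale out shortly before the ~9:00 AM peak
PEAK_END = time(11, 0)        # hypothetical end of the morning peak
MIN_NODES = 2                 # minimum mentioned in the text
BASE_NODES = 3                # hypothetical weekday baseline
PEAK_NODES = 6                # hypothetical capacity for the peak


def scheduled_capacity(now: datetime,
                       holidays: frozenset[date] = frozenset()) -> int:
    """Return the node count planned for `now`, based on the calendar alone."""
    # Saturday is weekday 5 and Sunday is weekday 6.
    if now.weekday() >= 5 or now.date() in holidays:
        return MIN_NODES
    if PRESCALE_START <= now.time() < PEAK_END:
        return PEAK_NODES
    return BASE_NODES


print(scheduled_capacity(datetime(2019, 11, 6, 8, 45)))  # Wednesday morning
print(scheduled_capacity(datetime(2019, 11, 9, 9, 0)))   # Saturday
```

In practice such a schedule would complement, not replace, the metric-based scaling, so that unexpected load outside the planned windows is still handled.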
Improving Performance Monitoring
More meaningful performance metrics enable more detailed analytics. These, in turn, enable more in-depth insights into the system and speed up resolution times whenever an issue occurs (caused, e.g., by a new release or a newly installed app).
Migration to AWS Aurora as soon as it is supported by Atlassian
Atlassian has already announced future support for AWS Aurora. This can increase the resilience of the solution significantly and also bring a major performance improvement (according to AWS, Aurora can be up to 8 times faster than the RDS equivalent).
Stability and scaling
- No more instability due to scaling issues observed after the migration.
- Virtually infinite scaling available (performance tested with a predicted load of 100k users).
- “Everything as code” (versioned in git repositories).
- Identical environments are easy to set up.
- Feature and release testing boils down to changing a few lines of code.
- Changing the system parameters is possible at any time. Long lead times are a thing of the past.
- Architectural and security / compliance requirements are fully met.
- All data is encrypted at rest and in transit.
- Advanced secrets management is implemented.
Scandio GmbH is a software company based in Munich, made up of technical consultants and developers. Based on our know-how and experience, we have been successfully developing tailor-made solutions for our customers for over 16 years. We have been an Atlassian Platinum Solution Partner for over a decade and an AWS Select Consulting Partner since 2013. Since 2018 we have been a proud winner in the category "Atlassian Partner of the Year - Enterprise".
BYTESOURCE, based in Vienna, is a leading specialist in team collaboration solutions based on Atlassian and an expert in agile software development, DevOps and technical consulting. BYTESOURCE is an "Atlassian Platinum Solution Partner Enterprise" and an AWS Select Consulting Partner.