AWS Redshift is a columnar data warehouse service that is often used for large-scale data aggregation and correlation, and it can host petabyte-scale data in a clustered model. In a typical SDLC setup on the cloud, separate accounts are used for separate environments such as Dev, Test, Stage, and Production. As with any other database system, data held in AWS Redshift clusters often needs to be ported from one environment to another. Because this data can be massive, moving it across accounts can be challenging, and it can increase cost and create redundant copies in the other accounts.
One option for sharing data with other accounts is to extract all of the data from the Redshift cluster into another service such as AWS S3, transfer it either programmatically online or offline through an appliance or an on-premises location, and then upload it again in the new account. These methods achieve the purpose, but they are neither scalable nor cost-efficient. Also, once the data is taken out of the cluster, the metadata and the model of the database objects may be lost. A standard way to transfer data between AWS Redshift clusters, within an AWS account as well as across accounts, is to create a snapshot of the cluster and restore it into the cluster of choice. In this article, we will learn how to address this scenario by creating an AWS Redshift snapshot, sharing it with the desired account, and restoring it into an Amazon Redshift cluster.
AWS Redshift Setup
In this article, we start with a working AWS Redshift cluster, and it’s assumed that the cluster already holds the data that needs to be shared with a different AWS account. Those who are new to AWS Redshift can refer to this article, Getting started with AWS Redshift, to create a new Redshift cluster. Once the cluster is created, it would look as shown below on the Amazon Redshift Clusters page. To simulate the scenario, it’s recommended to create test data of a reasonable volume so that the resulting snapshot is large. Sample data is not strictly necessary for this exercise, but with it you would be able to appreciate the value of this feature for sharing large backups across AWS accounts, compared to other indirect methods of porting data.
In my last article, Managing snapshots in AWS Redshift clusters, we discussed AWS Redshift manual and automated snapshots, which are used for backup and recovery. Snapshots are also a vehicle for moving data from one cluster to another. When a cluster is restored from the snapshot of another cluster, the data held in that backup is ported automatically. We need a mechanism by which snapshots of an Amazon Redshift cluster hosted in one account can be accessed from a different account. Both automated and manual snapshots would look as shown below in the Snapshots section of the cluster properties.
There may be a need to access data held in manual as well as automated snapshots from a different AWS account. To share the data in an automated snapshot, first create a manual snapshot of it from the Actions menu, as the automated snapshot gets deleted automatically after its retention period. So it’s assumed that either a manual snapshot or a manual copy of an automated snapshot is already in place. Click on the manual snapshot to navigate to its details, and it would look as shown below. Let’s say that we intend to use the data contained in this snapshot in a different AWS account. For this purpose, we need to make this snapshot accessible to that account.
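The manual copy of an automated snapshot described above can also be made programmatically. The following is a minimal sketch, assuming `client` is a Redshift client such as the one returned by `boto3.client("redshift")` under Account-1 credentials. The function name and all identifiers are hypothetical and are not part of the original walkthrough.

```python
def copy_automated_to_manual(client, cluster_id, automated_snapshot_id, manual_snapshot_id):
    """Copy an automated snapshot to a manual one so that it is not
    deleted when the automated retention period ends.

    `client` is assumed to expose the Redshift `copy_cluster_snapshot` call.
    """
    response = client.copy_cluster_snapshot(
        SourceSnapshotIdentifier=automated_snapshot_id,
        SourceSnapshotClusterIdentifier=cluster_id,
        TargetSnapshotIdentifier=manual_snapshot_id,
    )
    # The service returns the description of the newly created manual copy.
    return response["Snapshot"]["SnapshotIdentifier"]
```

Passing the client in as a parameter keeps the function independent of how credentials are obtained, so the same code can be pointed at either account.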
Click on the Edit button, and you would find the snapshot settings as shown below. Here we have the option to specify another AWS account with which we intend to share the snapshot. Assuming you have another account, type its 12-digit account ID in the Account box under the Manage Access section, click on the Add account button, and then click on Save to save the modifications. This manual snapshot was created in Account-1, as you can see in the top-right section of the figure below, and the account to which we are granting access is Account-2. To access the snapshot from Account-2, we need to log in to Account-2 and navigate to the Snapshots section of AWS Redshift.
After logging on to Account-2, you would be able to find the shared snapshot listed as shown below.
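For readers who prefer scripting to the console, the sharing step can be sketched as follows. This is an illustrative example rather than part of the original walkthrough. It assumes `client` is a Redshift client (for example `boto3.client("redshift")`) created with Account-1 credentials. The function name and identifiers are hypothetical.

```python
import re


def share_snapshot(client, snapshot_id, cluster_id, target_account_id):
    """Grant another AWS account restore access to a manual snapshot.

    Returns the account IDs that have restore access after the call.
    """
    # AWS account IDs are 12 digits; reject anything else before calling AWS.
    if not re.fullmatch(r"[0-9]{12}", target_account_id):
        raise ValueError("target_account_id must be a 12-digit AWS account ID")

    response = client.authorize_snapshot_access(
        SnapshotIdentifier=snapshot_id,
        SnapshotClusterIdentifier=cluster_id,
        AccountWithRestoreAccess=target_account_id,
    )
    shared_with = response["Snapshot"].get("AccountsWithRestoreAccess", [])
    return [entry["AccountId"] for entry in shared_with]
```

The early check on the account ID is a convenience added for this sketch, so that a typo fails locally instead of producing a service error.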
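The same check can be made from code running in Account-2. The sketch below assumes `client` is a Redshift client created with Account-2 credentials and that the 12-digit ID of the owning account (Account-1) is known. Snapshots owned by another account are only returned when the owner account is given as a filter. The function name is hypothetical.

```python
def list_shared_snapshots(client, owner_account_id):
    """List snapshots that the given owner account has shared with the caller.

    Returns (snapshot_id, source_cluster_id, status) tuples.
    """
    kwargs = {"OwnerAccount": owner_account_id}
    found = []
    while True:
        page = client.describe_cluster_snapshots(**kwargs)
        for snap in page["Snapshots"]:
            found.append(
                (snap["SnapshotIdentifier"], snap["ClusterIdentifier"], snap["Status"])
            )
        # Results may be paginated; follow the marker until it is absent.
        marker = page.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker
    return found
```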
Click on the Actions menu to see the list of actions that can be performed on this snapshot. Not all the actions shown in the Actions menu can be performed on shared snapshots, though they may appear to be available. One such option is the Delete snapshot option. Let’s say that the owner account that created the snapshot shared it with multiple accounts. If one of the accounts with access to this snapshot mistakenly deleted it, the other accounts would lose access to it, and the source account that created the snapshot could be affected as well. So, ideally, the consuming accounts should not be able to delete a snapshot shared by the owner account.
And that is the case here. Select the snapshot, click on the Actions menu, and select the Delete snapshot option. A message would pop up as shown below, informing you that only the original account that created the snapshot can delete it.
Navigate to the cluster page in Account-2, and you would find that although the snapshot is available through sharing, no cluster is created automatically to use it. It is up to the user to restore the snapshot by creating a cluster.
To restore the snapshot, navigate back to the snapshots, select the shared snapshot and click on the Restore snapshot option. That would bring up a page as shown below. The settings would be pre-populated and would be identical to the settings of the cluster from which the snapshot was created.
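The restore step can also be sketched in code. This is an illustration under stated assumptions: `client` is a Redshift client created with Account-2 credentials, and the function name and identifiers are hypothetical. When restoring a snapshot owned by another account, the owner account ID has to be supplied along with the snapshot identifier.

```python
def restore_shared_snapshot(client, new_cluster_id, snapshot_id,
                            source_cluster_id, owner_account_id):
    """Create a new cluster in the caller's account from a shared snapshot.

    Settings that are not passed explicitly default to those recorded in
    the snapshot, mirroring the pre-populated console page.
    """
    response = client.restore_from_cluster_snapshot(
        ClusterIdentifier=new_cluster_id,          # name of the cluster to create
        SnapshotIdentifier=snapshot_id,
        SnapshotClusterIdentifier=source_cluster_id,
        OwnerAccount=owner_account_id,             # the account that shared the snapshot
    )
    # The restore is asynchronous; the returned status reflects its progress.
    return response["Cluster"]["ClusterStatus"]
```

Because the restore runs asynchronously, the returned status only reflects the state at the time of the call, and the cluster would need to be polled until it becomes available.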
Once the cluster is restored from the snapshot, the data becomes accessible in Account-2. After this cluster is created, Account-2 no longer depends on the snapshot that was shared from the original account, so the sharing can be removed, and the snapshot can even be deleted if it’s no longer required.
To delete the snapshot in the original account, i.e., Account-1, log on to this account and navigate to the Snapshots section. Select the snapshot, open the Actions menu, and click on Delete snapshot. You would find an error as shown below. The reason for this error is that as long as the snapshot is shared with other accounts, even the original account that created the snapshot cannot delete it.
To remove the sharing, open the manual snapshot and click on the Edit button. Click on the Remove account button to remove all the accounts to which access has been provided. Once all accounts are removed, repeat the above step and the snapshot should get deleted.
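The clean-up sequence, revoking every grant and then deleting the snapshot, can be sketched as follows. This is an illustration only, assuming `client` is a Redshift client created with Account-1 credentials. The function name and identifiers are hypothetical.

```python
def unshare_and_delete(client, snapshot_id, cluster_id):
    """Revoke restore access from all accounts, then delete the snapshot.

    Returns the list of account IDs whose access was revoked.
    """
    snapshots = client.describe_cluster_snapshots(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=cluster_id,
    )["Snapshots"]
    if not snapshots:
        raise LookupError(f"snapshot {snapshot_id!r} not found")

    revoked = []
    # Remove every sharing grant first; deletion fails while any remain.
    for entry in snapshots[0].get("AccountsWithRestoreAccess", []):
        client.revoke_snapshot_access(
            SnapshotIdentifier=snapshot_id,
            SnapshotClusterIdentifier=cluster_id,
            AccountWithRestoreAccess=entry["AccountId"],
        )
        revoked.append(entry["AccountId"])

    client.delete_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        SnapshotClusterIdentifier=cluster_id,
    )
    return revoked
```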
In this article, we learned how to create AWS Redshift snapshots, configure them to grant access to other accounts, and use the shared snapshots to restore an Amazon Redshift cluster. We also learned the conditions that must be satisfied to delete a shared snapshot and the type of access that consuming accounts have on a shared snapshot.