Why would you bother?
Well, let’s ignore the obligation to get it for AWS partnership reasons. I think there are 2 good reasons why you should make the effort:
- Because you really want to learn about AWS in-depth: This is what I call “high-ROI education”. These days, we’re bombarded with blogs (like this one), tweets, tutorials, … You touch 5 technologies for 20 minutes and you move on to the next fad. That type of shallow learning is useful in a discovery phase, but it never gets committed to long-term memory. At least not with me. In contrast, in my first job in 2005, I had to squeeze the last bit of performance out of MySQL, and I learned a ton of things that are relevant even today, e.g. the difference between the MyISAM and InnoDB storage engines. That actually came back in this AWS exam. I’ve been lucky enough to have done similar deep dives in my career in technologies such as C++, C#, Postgres, Vertica, Hadoop, Spark, Python, … You can never know 100%. But you get to a level where you are confident enough to face any problem the tech throws at you. You know you’ve gone deep enough when you find the documentation limiting and you routinely read the source code (when available) to understand what’s going on. And for sure, knowing one thing in-depth helps you understand other things. Once you know AWS, the step to Azure or GCP is not that scary anymore. So dive in, the water is refreshing!
- Because it looks good on your CV: There has been a lot of discussion on the value of certifications for job roles. I can tell from my experience as an employee, a freelancer and now an employer: it just ticks a few boxes with people in decision roles. Love it or hate it, that’s the way it is. Very often the organisation looking for help has no in-depth knowledge about the matter; otherwise they probably wouldn’t need help. But then how do they judge that the new hire / contractor / supplier is actually knowledgeable? Anybody can make fancy slides. A certification is an external entity validating that you’re at least knowledgeable enough to pass a test. Certifications alone won’t get you there. In fact, when candidates put their certifications too prominently on their CV, like that’s their proudest achievement, that’s a red flag to me. But still, most engineers I know are terrible at marketing. Consider certifications a form of marketing.
You’ll learn a ton about AWS
The Pro exam actually covers only about 20% of the services in AWS. So you can safely ignore things like AWS Device Farm, AWS RoboMaker, AWS EventBridge, AWS Sumerian, AWS Lumberyard, … But get ready to see every nook and cranny of EC2, ELB, Auto Scaling groups, VPC, security groups, NACLs, VGW, Direct Connect, EBS, IAM, S3, SQS, Lambda, CloudFront, DynamoDB, Redshift, Fargate, Storage Gateway, …
If you’ve worked with AWS for a while, you’ll have come across most of the services mentioned above. But I had always approached them on a “need-to-know” basis. Now it was time to really get to know these services, and it gave me a much richer understanding of AWS. Here are my 3 main lessons learned. They might be trivial for some, but here goes:
1. Federated authentication is really powerful. Although IAM allows you to create your own users, it is often much better to use an external identity provider such as Okta, Auth0 or even Active Directory. User management in AWS is really basic, and in any reasonably large environment you want to manage users centrally somewhere. Even for a small organisation such as Data Minded, we are going to look at an identity provider. As we are on Google Apps for Business, maybe Google is good enough, although I’ve seen Okta used at clients and I’ve been pretty impressed with its capabilities. With federated authentication, a user is mapped to an IAM role, and that role gets access to certain services, just like it’s best practice to create roles for services to talk to each other. You can ask AWS STS to generate temporary credentials for you, should you need them. If you use plain IAM users, by contrast, you get permanent credentials that you usually store in a ~/.aws/credentials file. That feels so dirty compared to having a central login system. I had always avoided the topic because it looked complex. But really, you only need to learn the flow once, and once you understand it, it’s relatively simple.
2. There is actually plenty of support for hybrid cloud. I’ll confess: I’ve been frustrated by clients who didn’t want to go ALL IN on cloud after we’d done a two-week proof of concept. Silly, right? But that’s not reality. Most IT departments are understaffed and overworked. Fancy consultants coming in, asking for extensions of their network to the cloud and for access to a bunch of production databases that can break if you look at them the wrong way, is really not top of mind for them. I don’t blame them. IT infra is a department judged by how stably it can run the things built by a department (dev) judged by how fast it can ship things. Services like Storage Gateway enable you to gradually move data to the cloud while at the same time offering a reliable backup service. RDS databases can be extended to the cloud with read replicas, and over time you can make that read replica the master. You can import VMs into EC2 and manage them through Systems Manager. Slowly, you can move to managed services. You do need a reliable connection: ideally a Direct Connect, with a redundant line in case the Direct Connect fails. And BGP can magically route traffic over this redundant setup. Pretty cool. Although, honestly, I’ve never touched any of these techs in real life, so what do I know?
3. RPO and RTO are a fact of life. Deal with it. I never really knew the difference between a Multi-AZ RDS and an RDS with read replicas. Well, Multi-AZ RDS is for high availability: replication to the standby is synchronous, so an entire AZ can go down and you will lose no data or availability. Read replicas, on the other hand, are asynchronous: they sync data from the master to the replica with some lag. If the master dies, you can promote a read replica to become the master. But… there’s a but: you might lose data. There is a last point at which the replica received data, and a bit later the master crashed. Then you realise you crashed, and it takes time to recover. The RPO (Recovery Point Objective) is the time between that last received data and the crash. The RTO (Recovery Time Objective) is the time between the crash and the recovery. This is relevant for read replicas, but also for snapshots, and basically anything that can crash. So yes, also your Multi-AZ RDS database, when you corrupt the data because of a bug in your code, or in case of region outages. These are very unlikely events, but not impossible. Remember when S3 was down last year? Reality is that disasters do occur. That’s OK. You just need to have a conversation with the business about what their expectations are, and what your plan is for disaster recovery.
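To illustrate lesson 1 above: the difference between permanent and temporary credentials is easy to see in ~/.aws/credentials. A plain IAM user just has an access key pair that lives forever; STS-issued credentials add a session token and expire on their own. All profile names and values below are made up:

```ini
# ~/.aws/credentials (all values are made up)

# Permanent credentials for a plain IAM user: valid until someone rotates them
[iam-user]
aws_access_key_id = AKIAEXAMPLEEXAMPLE
aws_secret_access_key = exampleSecretKey0000000000000000000000

# Temporary credentials from AWS STS, e.g. after assuming a role through your
# identity provider: note the extra session token, and these expire by themselves
[federated-session]
aws_access_key_id = ASIAEXAMPLEEXAMPLE
aws_secret_access_key = exampleSecretKey1111111111111111111111
aws_session_token = exampleSessionTokenAbCdEfGhIjKlMnOpQrStUv
```

You can even spot the difference in the key ID itself: permanent IAM keys start with AKIA, temporary STS keys with ASIA.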
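And to make lesson 3 concrete, here is a toy calculation with a made-up incident timeline. The RPO is measured backwards from the failure to the last point your data was safely captured; the RTO is measured forwards from the failure to the moment service is restored:

```python
from datetime import datetime

# Made-up incident timeline
last_capture = datetime(2019, 6, 1, 3, 0)   # last moment data was safely replicated or backed up
crash        = datetime(2019, 6, 1, 3, 45)  # the master dies
restored     = datetime(2019, 6, 1, 5, 15)  # a replica is promoted and service is back

rpo = crash - last_capture  # worst-case data loss, expressed as time
rto = restored - crash      # downtime

print(f"RPO: up to {rpo} of data lost")  # up to 0:45:00 of data lost
print(f"RTO: {rto} of downtime")         # 1:30:00 of downtime
```

If the business tells you 45 minutes of lost data or 90 minutes of downtime is unacceptable, that directly drives the architecture: synchronous replication for a tighter RPO, automated failover for a tighter RTO.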
It won’t make you a great architect
Despite learning a ton about AWS, that doesn’t mean you’ll actually be a great architect. The exam is surprisingly NOT opinionated about how you build stuff. AWS is like Lego: they offer a bunch of services, and you pick and choose what you want. Even about security, AWS is very clear about your responsibilities. In their famous shared responsibility model, they only take care of the bottom layers and throw a bunch of tools at you for the top layers. But in the end, it’s really up to you how to design a secure system.
For highly scalable websites, I just sat back and soaked it all up. How does CloudFront work? What’s a WAF? How do you configure health checks on an ELB? But for data & analytics, which is our area of expertise, I found the questions often lacking. “You want to migrate an HPC system with 20PB of image data and many computational jobs to the cloud. Do you choose EMR or Fargate?” The actual question is longer, but you get the gist. Well, I don’t know what I would choose. Can I assume those computational jobs can run in a Docker container, on a single node? Can I assume they can run on Spark? Wouldn’t AWS Batch or even EKS make more sense? Do we need one long-running EMR cluster, or can we spin up different clusters on demand? What are the latency requirements?
I also think the exam is outdated at times. While the world we work in is very focused on DevOps, CI/CD, Docker containers, …, the exam had a big emphasis on traditional VM workloads. “What do you do when you have to maintain 20 Windows Server 2016 machines that always need the latest patches installed within 24h?” Well, I would question all the life choices that led me to this point. But that wasn’t an option. Apparently it has something to do with Patch Groups, Patch Baselines and Maintenance Windows. I get it. Hybrid cloud is a reality, and I really like the support. There will be plenty of traditional workloads running in the cloud for a long time. But I would’ve expected AWS to push us a bit more in the other direction.
So no, just because an AWS Pro Architect walks into the room doesn’t mean he or she will actually build a strong, modern architecture for you. For all you know, you’re stuck patching Windows Server 2016 machines for the next 5 years. But at least you have Patch Manager! 😉
So how did I do it?
For those only in it for the tips and tricks, the links at the beginning of the article should already help. But this was my strategy. It probably doesn’t work for everybody, but it did for me:
- Hours spent: About 80 hours, all in all. That’s more than I spent on most of my university exams 😀
- acloud.guru for videos. The video course on AWS Pro Architect is really just an index of the stuff you need to know, and not nearly enough to actually know it all. I recommend watching the Advanced CloudFormation, AWS Networking Specialty and AWS Security Specialty courses as well. It’s a lot of video, so I usually watched at 1.3x or 1.5x speed. There is a practice exam on acloud.guru too, but it did not reflect the actual exam for me at all: I failed the acloud.guru exam pretty hard a few days before passing the real one.
- Whizlabs for practice exams. Some might consider this cheating, but I honestly think it’s not: not a single one of the 505 Whizlabs questions came up on the exam, so no copy-paste was possible. Whizlabs did give me two important things. First, I learned how to deal with the crazy long question style: how to quickly rule out bullshit answers and how to quickly identify the important constraints in a question. Second, I learned which topic areas I was still lacking in. If you fail 10 questions in a row on Organizational Units and Service Control Policies, you don’t know the topic and you should spend time studying it. In the end, I did all 6 practice exams at least 3 times. Each had 80 questions, so that’s a ton of questions. On my final practice exam I scored 79/80, and the one mistake I made was a stupid one: I did know the answer. So Whizlabs was a very good, short, direct feedback loop for me.
- A clean AWS account: You can’t do without one. Set up a CloudFront distribution. Deploy a webserver in an Auto Scaling group. Configure Route 53. Take down a server. What happens?
- Study buddies: I can definitely recommend this. Three of us at Data Minded studied together, every Friday for 4 Fridays. One guy knew tons about security because he’d been involved in a banking project for the last 6 months. Another knew networking quite well because a current client needed it. I could bring in bits and pieces because I’ve seen so many different deployments by now. It’s also great for morale: you’re suffering together, but you’re also learning together.
- What I didn’t do: I barely read any whitepapers, even though that is recommended. So maybe I’ve been stupid. But I much preferred short videos about a topic, some experimentation in the AWS console, and trial and error on Whizlabs.