Hey Sean - hope you’re well! Aware we have quite a time difference, so I’ve erred on the side of being verbose here.
Been investigating this today - started out by creating the full set of AWS stacks from scratch. All seemed to work fine for me - including setting up the `xtdb-ecs` stack.
A few questions/steps to check things from my end (mostly assuming usage of the AWS dashboard for the below):
- Based on the name of the original discuss thread - “Adding XTDB 2 to an existing clustered app deployment” - curious if you set up all of the stacks in that guide, or reused components from other ones? Ie - is it a new VPC set up via the `xtdb-vpc` stack, or an existing VPC with existing public/private subnets and security groups?
- If you look within `EC2` → `Instances` - do you see any running instances that would have been created by our ECS stack/launch configuration?
  - These should be recently launched and on whichever VPC/security group you specified.
  - There should be 1 by default (as the default value for `DesiredCapacity` is 1).
  - They should be in the Running instance state.
  - If there are any issues setting these up (ie, they are not in the Running state) - you should be able to get some logs from the startup by doing the following:
    - Clicking on the instance and going to the instance summary page.
    - Clicking on `Actions` → `Monitor and troubleshoot` → `Get system log`.
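If the CLI is more convenient than clicking through the console, a rough equivalent of the above checks (the instance id below is a placeholder - assumes your CLI is configured against the same account/region as the stacks):

```sh
# List instances with their state, launch time and security groups
aws ec2 describe-instances \
  --query "Reservations[].Instances[].{Id:InstanceId,State:State.Name,Launched:LaunchTime,SecurityGroups:SecurityGroups[].GroupName}"

# Equivalent of Actions → Monitor and troubleshoot → Get system log
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text
```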
- If there is an EC2 instance present and Running, worth taking a look at the ECS cluster itself:
  - Under `Elastic Container Service` → `Clusters` → `<name of cluster>` (defaults to `xtdb-cluster` if left unchanged).
  - Should see a cluster overview at the top - we expect to see `Registered container instances` equal to 1 (if using a `DesiredCapacity` of 1).
  - If this is not the case - there may be some issue with EC2 registering the running instance to the ECS cluster.
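Again, the CLI equivalent if that’s quicker (assuming the cluster name was left as the default `xtdb-cluster`):

```sh
# Check how many instances have registered with the ECS cluster
aws ecs describe-clusters --clusters xtdb-cluster \
  --query "clusters[].{Name:clusterName,RegisteredInstances:registeredContainerInstancesCount,RunningTasks:runningTasksCount}"
```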
- If there is a `Registered container instances` count of 1 - we should look at the service definition within the current cluster:
  - Under `Tasks` - we should see one running task with the TaskDefinition created by `xtdb-ecs`.
  - Under `Logs` - we should see some logs from the node starting up - something like the following:
    - Starting XTDB 2.x (pre-alpha) @ "dev-SNAPSHOT" @ `commit-sha`
    - Creating AWS resources for watching files on bucket `<bucket-name>`
    - Creating SQS queue `xtdb-object-store-notifs-queue-<uuid>`
    - Adding relevant permissions to allow SNS topic with ARN `<SNS topic ARN>`
    - Subscribing SQS queue `xtdb-object-store-notifs-queue-<uuid>` to SNS topic with ARN `<SNS topic ARN>`
    - Initializing filename list from bucket `<bucket-name>`
    - Watching for file changes from bucket `<bucket-name>`
    - HTTP server started on port: 3000
    - Node started
  - If, in the process of creating the ECS service, you see the above logs being outputted multiple times and/or the task is getting restarted a number of times - it may be the case that the service is failing healthchecks and restarting the task.
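To check the task status and logs from the CLI (the log group name below is a guess - `aws logs describe-log-groups` will list the real one):

```sh
# List the tasks in the cluster, then check their status / why they stopped
aws ecs list-tasks --cluster xtdb-cluster
aws ecs describe-tasks --cluster xtdb-cluster --tasks <task-arn-from-above> \
  --query "tasks[].{LastStatus:lastStatus,DesiredStatus:desiredStatus,StoppedReason:stoppedReason}"

# Tail the container logs (log group name is a placeholder)
aws logs tail /ecs/xtdb --follow
```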
- If all of the above is working - you should be able to send a request to the node using the `LoadBalancerUrl` outputted from the `xtdb-alb` stack: `curl -v <LoadBalancerUrl>/status`
  - Should receive a status code 200 response with `{"latest-completed-tx":null,"latest-submitted-tx":null}` returned - if it is anything else, curious to see what response you get?
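For reference, you can pull that URL straight out of the stack outputs rather than copying it from the console (assuming the stack was deployed under the name `xtdb-alb`):

```sh
# Fetch the LoadBalancerUrl output from the xtdb-alb stack and hit /status
ALB_URL=$(aws cloudformation describe-stacks --stack-name xtdb-alb \
  --query "Stacks[0].Outputs[?OutputKey=='LoadBalancerUrl'].OutputValue" --output text)
curl -v "$ALB_URL/status"
```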
I have a few suspicions/further questions on the above, based on what you see:
- If the `EC2` instance hasn’t set up correctly/isn’t Running:
  - Curious to know which AWS region you are setting up the template in?
  - There may be an issue with how we select the `ImageId` for the instance to use based on the region.
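One way to sanity-check that (I’m assuming the template launches the standard ECS-optimized Amazon Linux 2 AMI - the region and instance id below are placeholders):

```sh
# AMI the instance was actually launched with
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query "Reservations[].Instances[].ImageId"

# Recommended ECS-optimized AMI for your region, via the public SSM parameter
aws ssm get-parameters \
  --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
  --region eu-west-1 --query "Parameters[].Value"
```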
- If the `EC2` instance has been set up correctly and is Running:
  - If it has not been registered as a container instance - curious to see if there’s anything in the System Log that may suggest why.
  - If it has been registered as a container instance and the task is constantly restarting / you do not get anything back from `curl`:
    - I suspect that the AWS resources (in particular, the Application Load Balancer) may not have internal access to the container/node - this would cause healthchecks to fail and the service to never be in the ‘ready’ state.
    - If you were using an existing VPC (ie, not setting up a new one using `xtdb-vpc`) it might be the case that the security group you are using does not allow the necessary ingress/permissions - worth a look at how it is set up within `xtdb-vpc` for reference.
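If you want to check the healthcheck side directly, something like this should show whether the ALB can see a healthy target and what ingress the instance security group actually allows (the target group ARN and group id are placeholders):

```sh
# Is the ALB's target group reporting healthy targets?
aws elbv2 describe-target-groups \
  --query "TargetGroups[].{Name:TargetGroupName,Arn:TargetGroupArn}"
aws elbv2 describe-target-health --target-group-arn <target-group-arn-from-above>

# What ingress rules does the instance/container security group allow?
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
  --query "SecurityGroups[].IpPermissions"
```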