Terraform AWS Athena to use Glue catalog as db
























I'm confused as to how I should use terraform to connect Athena to my Glue Catalog database.



I use



resource "aws_glue_catalog_database" "catalog_database" {
  name = "${var.glue_db_name}"
}

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.crawler_iam_role.name}"
  description   = "${var.crawler_description}"
  table_prefix  = "${var.table_prefix}"
  schedule      = "${var.schedule}"

  s3_target {
    path = "s3://${var.data_bucket_name[0]}"
  }

  s3_target {
    path = "s3://${var.data_bucket_name[1]}"
  }
}


to create a Glue DB and a crawler to crawl S3 buckets (here only two), but I don't know how to link the Athena query service to the Glue DB. In the Terraform documentation for Athena, there doesn't appear to be a way to connect Athena to a Glue catalog, only to an S3 bucket. Clearly, however, Athena can be integrated with Glue.



How can I terraform an Athena database to use my Glue catalog as its data source rather than an S3 bucket?
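In case it helps, here is the shape of the `aws_athena_database` resource I was looking at. Note that (as it turned out) its `bucket` argument is for query results output, not source data; the resource and bucket names below are placeholders:

```hcl
# Sketch only: resource and bucket names are placeholders. The "bucket"
# argument holds Athena query *results*, not the source data -- the
# tables themselves come from the Glue catalog.
resource "aws_s3_bucket" "athena_results" {
  bucket = "my-athena-query-results"
}

resource "aws_athena_database" "queries" {
  name   = "${var.glue_db_name}"
  bucket = "${aws_s3_bucket.athena_results.bucket}"
}
```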





























  • Did you run the crawler? Did it create AWS Glue tables? If you do not define aws_glue_catalog_table resources with terraform that point to their respective S3 locations, the crawler will need to run at least once to create the tables. Once they are created, your Glue DB and the tables should become visible in Athena, even without defining a terraform aws_athena_database resource. – Martin, Mar 28 at 15:29

  • I've run the crawler. There are no tables created in the Glue db. To be honest, I don't even know if I've given the crawler the correct role/policies. I've created a gist that includes the code I'm using: gist.github.com/stevenranney/3f5545e1e736266807b1f337e0be58e0 – Steven, Mar 28 at 18:52

  • I suggest also adding the tag "terraform" (or "terraform-aws") to the question. – Martin, Apr 4 at 6:38
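For reference, an aws_glue_catalog_table of the kind Martin mentions, defined directly in Terraform rather than created by the crawler, might look like this. The column, location, and format details are illustrative assumptions (the SerDe shown assumes Parquet data):

```hcl
# Illustrative only: the column, path, and Parquet SerDe choices here
# are assumptions, not taken from the question's actual data.
resource "aws_glue_catalog_table" "products" {
  name          = "products"
  database_name = "${aws_glue_catalog_database.catalog_database.name}"
  table_type    = "EXTERNAL_TABLE"

  storage_descriptor {
    location      = "s3://${var.data_bucket_name[0]}/products"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    columns {
      name = "product_id"
      type = "string"
    }
  }
}
```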

















Tags: amazon-web-services, terraform, aws-glue, terraform-provider-aws, aws-glue-data-catalog






asked Mar 12 at 19:08 by Steven, edited Apr 4 at 12:42







2 Answers
































Our current basic setup for having Glue crawl one S3 bucket and create/update a table in a Glue DB, which can then be queried in Athena, looks like this:



Crawler role and role policy:



  • The assume_role_policy of the IAM role needs only Glue as principal

  • The IAM role policy allows actions for Glue, S3, and logs

  • The Glue actions and resources can probably be narrowed down to the ones really needed (this is work in progress)

  • The S3 actions are limited to those needed by the crawler

resource "aws_iam_role" "glue_crawler_role" {
  name = "analytics_glue_crawler_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}


resource "aws_iam_role_policy" "glue_crawler_role_policy" {
  name   = "analytics_glue_crawler_role_policy"
  role   = "${aws_iam_role.glue_crawler_role.id}"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:*"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::analytics-product-data"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:/aws-glue/*"
      ]
    }
  ]
}
EOF
}



S3 Bucket, Glue Database and Crawler:



resource "aws_s3_bucket" "product_bucket" {
  bucket = "analytics-product-data"
  acl    = "private"
}

resource "aws_glue_catalog_database" "analytics_db" {
  name = "inventory-analytics-db"
}

resource "aws_glue_crawler" "product_crawler" {
  database_name = "${aws_glue_catalog_database.analytics_db.name}"
  name          = "analytics-product-crawler"
  role          = "${aws_iam_role.glue_crawler_role.arn}"

  schedule = "cron(0 0 * * ? *)"

  configuration = "{\"Version\": 1.0, \"CrawlerOutput\": {\"Partitions\": {\"AddOrUpdateBehavior\": \"InheritFromTable\"}, \"Tables\": {\"AddOrUpdateBehavior\": \"MergeNewColumns\"}}}"

  schema_change_policy {
    delete_behavior = "DELETE_FROM_DATABASE"
  }

  s3_target {
    path = "s3://${aws_s3_bucket.product_bucket.bucket}/products"
  }
}
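As an aside, the crawler configuration JSON doesn't have to be a hand-escaped string; on newer Terraform (0.12+, an assumption about your setup) jsonencode produces the same value more readably:

```hcl
# Same crawler configuration as the escaped string, built with
# jsonencode (Terraform 0.12+) inside the aws_glue_crawler resource:
configuration = jsonencode({
  Version = 1.0
  CrawlerOutput = {
    Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
    Tables     = { AddOrUpdateBehavior = "MergeNewColumns" }
  }
})
```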





















































    I had many things wrong in my Terraform code. To start with:



    1. The S3 bucket argument in the aws_athena_database code refers to the bucket for query output, not the data the table should be built from.

    2. I had set up my aws_glue_crawler to write to a Glue database rather than an Athena db. Indeed, as Martin suggested above, once correctly set up, Athena was able to see the tables in the Glue db.


    3. I did not have the correct policies attached to my crawler. Initially, the only policy attached to the crawler role was



      resource "aws_iam_role_policy_attachment" "crawler_attach" {
        policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
        role       = "${aws_iam_role.crawler_iam_role.name}"
      }


      After creating a second policy that explicitly allowed S3 access to all of the buckets I wanted to crawl, and attaching that policy to the same crawler role, the crawler ran and updated tables successfully.



    The second policy:



    resource "aws_iam_policy" "crawler_bucket_policy" {
      name        = "crawler_bucket_policy"
      path        = "/"
      description = "Gives crawler access to buckets"
      policy      = <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1553807998309",
          "Action": "*",
          "Effect": "Allow",
          "Resource": "*"
        },
        {
          "Sid": "Stmt1553808056033",
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::bucket0"
        },
        {
          "Sid": "Stmt1553808078743",
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::bucket1"
        },
        {
          "Sid": "Stmt1553808099644",
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::bucket2"
        },
        {
          "Sid": "Stmt1553808114975",
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::bucket3"
        },
        {
          "Sid": "Stmt1553808128211",
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::bucket4"
        }
      ]
    }
    EOF
    }
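    For completeness, attaching that policy is one more resource (a sketch; the resource label is arbitrary, and the role name follows my earlier code):

```hcl
# Attaches the second (bucket) policy to the same crawler role that
# already has the AWSGlueServiceRole managed policy attached.
resource "aws_iam_role_policy_attachment" "crawler_bucket_attach" {
  policy_arn = "${aws_iam_policy.crawler_bucket_policy.arn}"
  role       = "${aws_iam_role.crawler_iam_role.name}"
}
```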



    I'm confident that I can get away from hardcoding the bucket names in this policy, but I don't yet know how to do that.
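    One way to drop the hardcoding, sketched under the assumption that the bucket names already live in a list variable like var.data_bucket_name (as in my crawler code) and that Terraform 0.12+ for expressions are available:

```hcl
# Sketch: builds the ARN list from a list variable instead of one
# statement per hardcoded bucket. Assumes Terraform 0.12+ and a
# list variable var.data_bucket_name.
resource "aws_iam_policy" "crawler_bucket_policy" {
  name        = "crawler_bucket_policy"
  path        = "/"
  description = "Gives crawler access to buckets"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "AllowCrawlerBuckets"
        Action   = "s3:*"
        Effect   = "Allow"
        Resource = [for b in var.data_bucket_name : "arn:aws:s3:::${b}"]
      }
    ]
  })
}
```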





























    • crawler_bucket_policy has one statement allowing action "*" on resource "*" – this seems very open; is it really necessary? Our current setup, which I'll list as an answer below, allows action "glue:*" on resource "*" in addition to the s3 actions. – Martin, Mar 29 at 23:42












First answer by Martin: answered Mar 30 at 0:06, edited Mar 30 at 0:34
            0














            I had many things wrong in my Terraform code. To start with:



            1. The S3 bucket argument in the aws_athena_database code refers to the bucket for query output not the data the table should be built from.

            2. I had set up my aws_glue_crawler to write to a Glue database rather than an Athena db. Indeed, as Martin suggested above, once correctly set up, Athena was able to see the tables in the Glue db.


            3. I did not have the correct policies attached to my crawler. Initially, the only policy attached to the crawler role was



              resource "aws_iam_role_policy_attachment" "crawler_attach" 
              policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
              role = "$aws_iam_role.crawler_iam_role.name"



              after setting a second policy that explicitly allowed all S3 access to all of the buckets I wanted to crawl and attaching that policy to the same crawler role, the crawler ran and updated tables successfully.



            The second policy:



            resource "aws_iam_policy" "crawler_bucket_policy" 
            name = "crawler_bucket_policy"
            path = "/"
            description = "Gives crawler access to buckets"
            policy = <<EOF

            "Version": "2012-10-17",
            "Statement": [

            "Sid": "Stmt1553807998309",
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*"
            ,

            "Sid": "Stmt1553808056033",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::bucket0"
            ,

            "Sid": "Stmt1553808078743",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::bucket1"
            ,

            "Sid": "Stmt1553808099644",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::bucket2"
            ,

            "Sid": "Stmt1553808114975",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::bucket3"
            ,

            "Sid": "Stmt1553808128211",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::bucket4"

            ]

            EOF



            I'm confident the bucket names can be factored out of this policy rather than hardcoded, but I don't yet know how to do that.
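            One possible way to avoid hardcoding the bucket names is to drive the policy from a list variable and build the resource ARNs with `formatlist` and `jsonencode`. This is a sketch, not part of the original answer; the variable name `crawler_buckets` is an assumption:

            ```hcl
            # Hypothetical sketch: generate the bucket ARNs from a list variable
            # instead of repeating one statement per hardcoded bucket.
            variable "crawler_buckets" {
              type    = "list"
              default = ["bucket0", "bucket1", "bucket2", "bucket3", "bucket4"]
            }

            resource "aws_iam_policy" "crawler_bucket_policy" {
              name        = "crawler_bucket_policy"
              path        = "/"
              description = "Gives crawler access to buckets"
              policy      = <<EOF
            {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Action": "s3:*",
                  "Effect": "Allow",
                  "Resource": ${jsonencode(formatlist("arn:aws:s3:::%s", var.crawler_buckets))}
                }
              ]
            }
            EOF
            }
            ```

            Adding a bucket then only means appending to the list, and the single statement carries a JSON array of ARNs instead of one statement per bucket.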






            share|improve this answer























            • crawler_bucket_policy has one statement allowing action "*" on resource "*" - this seems very open, is it really necessary? Our current setup, which I'll list as an answer below, allows action "glue:*" on resource "*" in addition to the s3 actions.

              – Martin
              Mar 29 at 23:42
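              The narrower setup the comment describes (allow `glue:*` broadly, but scope `s3:*` to the crawled buckets) could be sketched roughly as follows; the resource name and bucket ARNs here are illustrative assumptions, not Martin's actual configuration:

              ```hcl
              # Sketch of a narrower crawler policy: Glue actions on all resources,
              # S3 actions only on the buckets (and objects) being crawled.
              resource "aws_iam_policy" "scoped_crawler_policy" {
                name   = "scoped_crawler_policy"
                policy = <<EOF
              {
                "Version": "2012-10-17",
                "Statement": [
                  {
                    "Action": "glue:*",
                    "Effect": "Allow",
                    "Resource": "*"
                  },
                  {
                    "Action": "s3:*",
                    "Effect": "Allow",
                    "Resource": [
                      "arn:aws:s3:::bucket0",
                      "arn:aws:s3:::bucket0/*"
                    ]
                  }
                ]
              }
              EOF
              }
              ```

              Note the second statement lists both the bucket ARN and the `/*` object ARN, since bucket-level and object-level S3 actions are authorized against different resource ARNs.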
















            answered Mar 28 at 22:20









            Steven
