Athena query to table stored as parquet files on s3 returns empty result


I have a table backed by Parquet files with Snappy compression stored on S3. I created it like this:

CREATE EXTERNAL TABLE my_table( text STRING, url STRING ) STORED AS PARQUET LOCATION 's3://some_path/*.parquet' TBLPROPERTIES ("parquet.compress"="SNAPPY");

This matches the Parquet schema, and the table is created successfully. However, when I query the table, I always get an empty result. I tried MSCK REPAIR TABLE my_table; it succeeds but reports "Tables missing on filesystem". The file pattern I used seems correct, though.

How can I fix this?

  • It looks like *.parquet isn't supported in the table definition. What are ways to work around this? I use this pattern because the directory also contains other files (e.g. .json files), so I can't point the table at the directory directly.

asked 21 days ago · 81 views
1 Answer
Accepted Answer

You can't restrict the S3 location with anything other than a path prefix, and even then the prefix must end in a forward slash (/).
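As a sketch of what this means for the table in the question: assuming the .parquet files are moved to a dedicated prefix (s3://some_path/parquet/ below is a hypothetical name) and the .json files are kept somewhere else, the table can be recreated to point at that prefix. Dropping an external table removes only its metadata, not the S3 objects. The compression property can be left out, because Parquet readers take the Snappy codec from each file's own metadata. MSCK REPAIR TABLE only registers partitions, so it isn't needed for an unpartitioned table like this one.

```sql
-- Dropping an EXTERNAL table removes only the table metadata;
-- the objects in S3 are left untouched.
DROP TABLE IF EXISTS my_table;

-- Hypothetical layout: only the .parquet files live under this prefix.
CREATE EXTERNAL TABLE my_table (
  text STRING,
  url  STRING
)
STORED AS PARQUET
-- A plain prefix ending in "/": no wildcards and no file names.
LOCATION 's3://some_path/parquet/';
```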

The first warning box on this documentation page https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html states this explicitly:

Important: Athena reads all data stored in the Amazon S3 folder that you specify. If you have data that you do not want Athena to read, do not store that data in the same Amazon S3 folder as the data that you do want Athena to read.
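Once the files are separated, one way to confirm which objects Athena actually reads is the "$path" pseudo-column. This is a sketch against the question's my_table: each returned path should be a .parquet object under the table's location, and a .json path showing up would mean the prefix still mixes file types.

```sql
-- "$path" is an Athena pseudo-column holding the S3 object each row came from.
-- It must be double-quoted because of the leading "$".
SELECT "$path", count(*) AS row_count
FROM my_table
GROUP BY "$path";
```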

EXPERT
Leo K
answered 21 days ago
  • Thank you! I have a follow-up question and would appreciate your help: IIUC, I can use AWS Glue to define such a table, but I'm wondering whether I need a crawler to make it work, since I suspect Glue itself doesn't support wildcard paths either. Can you shed some light on this?