Discussion:
QUESTION: Drill Configuration to access S3 buckets
Сергей Боровик
2017-06-14 18:38:38 UTC
Hi!
I have an AWS EC2 instance running Apache Drill 1.10.0 with an IAM role configured.

I am able to access and query an S3 bucket in the US East (N. Virginia) region,
but not buckets in the US East (Ohio) region. Those queries fail with:
"error: system error: amazons3exception: status code 400, AWS Service: Amazon S3,
AWS Request ID:9D54A8310F26582B, AWS Error Code: null, AWS Error Message:
Bad Request"


I've tried setting this property in conf/core-site.xml:

<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-2.amazonaws.com</value>
</property>

In this case Ohio fails with the same error,
and N. Virginia fails with status code 301, AWS Error Code: PermanentRedirect,
AWS Error Message: The bucket you are attempting to access must be
addressed using the specified endpoint

1) Is there any specific configuration that needs to be enabled in Drill
for the Ohio region?
2) Does Drill not work with AWS Signature Version 4?

Thank you in advance.
Any advice is much appreciated!
Jack Ingoldsby
2017-06-15 04:18:53 UTC
Useful to know, thanks. I am also having problems with Ohio. Will try another
region.
Shankar Mane
2017-06-15 04:36:54 UTC
The newer AWS regions support only the Signature Version 4 protocol for S3. Other
regions are compatible with both V2 and V4. Drill works very well with regions
that support both signature versions.

Adding the endpoint does not help; the same problem persists. Maybe the Drill API
doesn't support the V4 protocol yet.

This V4 problem also affects native Hadoop versions prior to 2.8.0.
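
As an illustration of what a Hadoop 2.8.0+ setup could look like: setting the global fs.s3a.endpoint to the Ohio endpoint is what produced the 301 PermanentRedirect for the N. Virginia bucket above, and S3A in Hadoop 2.8.0 and later allows the endpoint to be set per bucket instead. The fragment below is a minimal sketch, not a tested Drill configuration. It assumes hadoop-aws 2.8.0 or later is on Drill's classpath (the 2.7.x series does not support per-bucket options), and the bucket name my-ohio-bucket is a hypothetical placeholder.

```xml
<!-- conf/core-site.xml: assumes hadoop-aws 2.8.0+ (per-bucket S3A options). -->
<!-- "my-ohio-bucket" is a hypothetical bucket name; substitute your own. -->
<property>
  <name>fs.s3a.bucket.my-ohio-bucket.endpoint</name>
  <value>s3.us-east-2.amazonaws.com</value>
</property>
```

With this form the global endpoint is left at its default, so buckets in other regions are not forced through the Ohio endpoint.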
Uwe L. Korn
2017-06-15 05:39:02 UTC
The current Drill releases use the hadoop-io libraries from the 2.7.x series. Locally I have built against the 3.0.0 alpha (2.8 should also work) and can access the regions with newer signature versions. But you should be careful with that, as I had to make some code changes to get it to build with the 3.0 jars, and some unit tests failed afterwards.

Also note that 2.8/3.0 greatly improves S3 performance if you select the new (and experimental) random-access mode in s3a. For me this resulted in massive improvements for queries that access only a fraction of all columns or that read Parquet files containing multiple RowGroups.
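
The post does not name the setting for the random-access mode. In Hadoop 2.8 and later the S3A input policy is controlled by fs.s3a.experimental.input.fadvise, which is presumably the option meant here. A minimal sketch, assuming a Drill build that uses hadoop-aws 2.8.0 or later:

```xml
<!-- conf/core-site.xml: assumes hadoop-aws 2.8.0+; the option is experimental. -->
<!-- "random" favours seek-heavy reads such as columnar Parquet access. -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
```

The random policy trades sequential throughput for cheaper seeks, so it suits columnar formats better than full-file scans.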
Jack Ingoldsby
2017-06-15 11:20:59 UTC
Thanks for this. Sounds like a combination of AWS and Drill factors.
Are we likely to address the Drill side in a subsequent release?
Jack Ingoldsby
2017-06-15 18:08:58 UTC
I was able to connect to N. Virginia, thanks.
But to use Drill as a standard tool, we would need to be able
to connect to all regions, of course.