Discussion:
Connecting to S3 bucket which does not seem to require a key
Jack Ingoldsby
2017-06-07 22:02:34 UTC
Hi,
I'm trying to access the NYC Citibike S3 bucket, which seems to be publicly
available:

https://s3.amazonaws.com/tripdata/index.html
If I leave the Access Key & Secret Key empty, I get the following message

0: jdbc:drill:zk=local> !tables
Error: Failure getting metadata: Unable to load AWS credentials from any
provider in the chain (state=,code=0)

If I try entering random numbers as keys, I get the following message

Error: Failure getting metadata: Status Code: 403, AWS Service: Amazon S3,
AWS Request ID: 1C888A3A21D79F87, AWS Error Code: InvalidAccessKeyId, AWS
Error Message: The AWS Access Key Id you provided does not exist in our
records. (state=,code=0)

Is it possible to connect to a data source that does not seem to require a
key?

Thanks,
Jack
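A quick way to confirm that an object really is readable without keys is to fetch it with no credentials at all. The sketch below is an illustration added alongside the post, not something the poster ran. It assumes Python 3 and the path-style URL form shown above, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def path_style_url(bucket, key=""):
    """Build a path-style S3 URL like the one quoted in the post."""
    return f"https://s3.amazonaws.com/{bucket}/{key}"


def is_readable_without_keys(url, timeout=10):
    """Return True if an unsigned GET succeeds, False on an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 403 (access denied) and 404 (no such key) both land here.
        return False


if __name__ == "__main__":
    # Needs network access.
    print(is_readable_without_keys(path_style_url("tripdata", "index.html")))
```

A True result only shows that plain HTTPS access works. It says nothing about whether Drill's S3a connector will accept empty keys.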
Abhishek Girish
2017-06-12 05:48:14 UTC
Drill connects to S3 buckets (AWS) via the S3a library, and the storage
plugin configuration requires the access & secret keys [1].

I'm not sure if Drill can access S3 without the credentials. It might be
possible via custom authenticators [2]. Hopefully others who have tried
this will comment.


[1] https://drill.apache.org/docs/s3-storage-plugin/
[2] http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
Andries Engelbrecht
2017-06-12 14:43:14 UTC
You may be better off downloading the NYC bike data set locally and converting it to Parquet.
Converting from csv.zip to Parquet will result in large improvements in performance if you run various queries on the data set.

--Andries
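As a sketch of the first step in that conversion, the downloaded .zip archives can be unpacked to plain CSV files. This assumes Python 3, and the function name and paths are hypothetical, not from the thread. The Parquet conversion itself could then be done in Drill with a CREATE TABLE AS statement after setting `store.format` to `parquet`, which is untested here.

```python
import zipfile
from pathlib import Path


def extract_csv_members(archive_path, out_dir):
    """Extract every .csv member of a zip archive into out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Skip anything that is not a CSV file (readme files, folders, etc.).
            if name.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(name, out_dir)))
    return extracted
```

Calling `extract_csv_members("trips.zip", "citibike_csv")` would leave the CSV files in a directory that a Drill `dfs` workspace can point at.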

Jack Ingoldsby
2017-06-12 15:34:43 UTC
Hi,
Thanks. I'm actually just playing around with a proof of concept to show that I can
query S3 using our tool via Drill.
So I downloaded the Citibike data and created my own S3
bucket with an access ID and secret key, but I'm having a problem connecting.
I get the following error message when running a query:

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request
ID: 439EE2E823001E80, AWS Error Code: null, AWS Error Message: Bad Request
[Error Id: 9da0c6bd-b173-48e0-aeac-47179812e696 on
LAP-NY-CHENO.corp.sisense.com:31010]

It appears to be a connection issue, but I can connect to the bucket
sisense.citibike using the AWS command line utility with the same access key and
secret key.
Does anything leap out?

The configuration is set to

{
"type": "file",
"enabled": true,
"connection": "s3a://sisense.citibike",
"config": {
"fs.s3a.access.key": "ID",
"fs.s3a.secret.key": "SECRET"
},


Core-site.xml is set to

<configuration>

<property>
<name>fs.s3a.access.key</name>
<value>ID</value>
</property>

<property>
<name>fs.s3a.secret.key</name>
<value>SECRET</value>
</property>

</configuration>

Thanks,
Jack
Abhishek Girish
2017-06-12 15:59:59 UTC
I hope you haven't shared your actual access / secret keys with the
community. If you have, please work on securing your account [1]!


[1] https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/
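One way to avoid pasting live keys into a post is to keep them in environment variables and share only a redacted copy of the config. This is a minimal sketch, assuming Python 3 and the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` variable names. The helper names are hypothetical, and the dictionary covers only the fields quoted in this thread, not a complete storage plugin definition.

```python
import json
import os


def build_s3_plugin_config(bucket, endpoint=None, env=os.environ):
    """Build the S3 part of a Drill storage plugin config without hard-coding keys."""
    config = {
        "fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }
    if endpoint is not None:
        config["fs.s3a.endpoint"] = endpoint
    return {
        "type": "file",
        "enabled": True,
        "connection": f"s3a://{bucket}",
        "config": config,
    }


def redacted(plugin):
    """Return a deep copy with the key values masked, safe to paste into a post."""
    copy = json.loads(json.dumps(plugin))
    for name in ("fs.s3a.access.key", "fs.s3a.secret.key"):
        copy["config"][name] = "<redacted>"
    return copy
```

Printing `json.dumps(redacted(plugin), indent=2)` gives something that can be shared on the list, while the unredacted dictionary stays local.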
Jack Ingoldsby
2017-06-12 16:13:14 UTC
Well, these keys are for a specific user I created for this bucket. The user
only has read access to this bucket, which only contains this public
citibike data, and has no access to permissions.
So I'm fine if anyone can connect (at least until I figure out the problem).
Abhishek Girish
2017-06-12 16:41:30 UTC
That's good to know. I just didn't want the Drill community to be the place
your keys were leaked :)

I tried with your keys and could reproduce the issue. One guess is that
it could be due to location constraints [1].

You can try setting the "fs.s3a.endpoint" property in the S3 config. For example:

{
"type": "file",
"enabled": true,
"connection": "s3a://sisense.citibike",
"config": {
"fs.s3a.access.key": "ID",
"fs.s3a.secret.key": "SECRET",
"fs.s3a.endpoint": "s3-us-west-2.amazonaws.com" // Pointing to the region of the bucket
}
...
...
}


[1] http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
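To find the region to point the endpoint at, one option is to ask S3 directly. This is a minimal sketch, assuming Python 3 and assuming S3 reports the region in the `x-amz-bucket-region` response header on an unsigned HEAD request (it normally does, even on 301 or 403 responses). The function names are hypothetical.

```python
import urllib.error
import urllib.request


def region_from_headers(headers):
    """Return the value of the x-amz-bucket-region header, or None if absent."""
    return headers.get("x-amz-bucket-region")


def lookup_bucket_region(bucket, timeout=10):
    """Send an unsigned HEAD request for the bucket and read its region header."""
    # Path-style URL, so bucket names containing dots do not break TLS hostname checks.
    request = urllib.request.Request(
        f"https://s3.amazonaws.com/{bucket}", method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return region_from_headers(response.headers)
    except urllib.error.HTTPError as error:
        # Error responses carry their own headers, which usually include the region.
        return region_from_headers(error.headers)


if __name__ == "__main__":
    # Needs network access.
    print(lookup_bucket_region("sisense.citibike"))
```

The returned region can then be used to pick the matching hostname from the endpoint table in [1].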
Jack Ingoldsby
2017-06-12 17:38:57 UTC
Thanks, but unfortunately that didn't work either.... hmm

{
"type": "file",
"enabled": true,
"connection": "s3a://sisense.citibike",
"config": {
"fs.s3a.access.key": "ID",
"fs.s3a.secret.key": "SECRET",
"fs.s3a.endpoint": "s3-us-east-2.amazonaws.com"
},
Jack Ingoldsby
2017-06-12 18:25:55 UTC
Thanks, but unfortunately that didn't work either....
{
"type": "file",
"enabled": true,
"connection": "s3a://sisense.citibike",
"config": {
"fs.s3a.access.key": "ID",
"fs.s3a.secret.key": "SECRET",
"fs.s3a.endpoint": "s3-us-east-2.amazonaws.com"
},
Jack Ingoldsby
2017-06-14 13:12:07 UTC
I think I will start a second thread for this, as it has moved
on from my original question.
It sounds like one cannot connect without a key even if the bucket is
public, which is good to know.
Thx
Jack