Discussion:
Writing to s3 using Drill
Shuporno Choudhury
2017-05-22 10:27:24 UTC
Hi,

Is it possible to write to a folder in an s3 bucket using the *s3.tmp*
workspace?
Whenever I try, I get the following error:

*Error: VALIDATION ERROR: Schema [s3.tmp] is not valid with respect to
either root schema or current default schema.*
*Current default schema: s3.root*

Also, s3.tmp doesn't appear in the output of "*show schemas*", even though
the tmp workspace exists in the web console.

I am using Drill Version 1.10; embedded mode on my local system.

I have no problem reading from an S3 bucket; the problem is only with
writing to one.
--
Regards,
Shuporno Choudhury
Gautam Parai
2017-05-22 20:00:50 UTC
Hi Shuporno,


Could you please share your S3 storage plugin configuration and the output of `show schemas` as it pertains to s3? Is `writable` set to true?


Gautam

Sorabh Hamirwasia
2017-05-22 19:55:44 UTC
Hi Shuporno,

Can you please share your S3 plugin configuration? It looks like your configuration might be missing a workspace definition like the one below:


"tmp": {
  "location": "drill-tmp",
  "writable": true,
  "defaultInputFormat": null
}
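
One quick way to see which workspaces are marked writable is to scan the plugin JSON for the `writable` flag. What follows is only a sketch: the helper name `writable_workspaces`, the bucket name and the sample configuration are made up for illustration and are not part of Drill.

```python
import json

# Hypothetical sample, shaped like the storage plugin JSON in Drill's web console.
SAMPLE_PLUGIN = """
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://my-bucket",
  "workspaces": {
    "root": {"location": "/", "writable": false, "defaultInputFormat": null},
    "tmp": {"location": "/drill-tmp", "writable": true, "defaultInputFormat": null}
  }
}
"""


def writable_workspaces(plugin_json):
    """Return the sorted names of workspaces whose "writable" flag is true."""
    config = json.loads(plugin_json)
    workspaces = config.get("workspaces", {})
    return sorted(name for name, ws in workspaces.items()
                  if ws.get("writable") is True)


if __name__ == "__main__":
    # Lists the workspaces a CTAS statement could target in this sample.
    print(writable_workspaces(SAMPLE_PLUGIN))
```

If the workspace you are writing to is not in the returned list, that is the first thing to fix in the plugin configuration.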


Thanks,
Sorabh


Shuporno Choudhury
2017-05-26 04:59:51 UTC
Hi,
Can someone at Drill help me with this issue, please?

On Thu, May 25, 2017 at 1:33 PM, Shuporno Choudhury wrote:
Hi,
I corrected the "show schemas" output by putting only "/" in the
"location". Now it shows s3.tmp in the output.
But there is a weird problem:
the moment I add a folder to the location, e.g. "/myfolder", s3.tmp
vanishes from the "show schemas" output.
Exception in thread "drill-executor-9" java.lang.
org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
This is only a snippet of the error associated with writing to s3.
On Thu, May 25, 2017 at 12:41 PM, Shuporno Choudhury wrote:
My s3 plugin info is as follows:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://abcd",
  "config": {
    "fs.s3a.access.key": "abcd",
    "fs.s3a.secret.key": "abcd"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": "parquet"
    }
  }
}
I have removed the info about the formats to keep the mail small.
Also, I am using Drill on *Windows 10*.
Shuporno Choudhury
2017-05-26 12:08:52 UTC
Hi Nitin,

The "tmp" object exists inside the s3 bucket. Even so, it throws the
same error:

Error: SYSTEM ERROR: IllegalArgumentException: URI has an authority
component
Fragment 0:0
Post by Nitin Pawar
You have to create a tmp object in your bucket to make it work:
s3://bucket_name/tmp has to be created, and then it should work.
On Fri, May 26, 2017 at 5:02 PM, Shuporno Choudhury wrote:
Hi Nitin,
Thanks for the config settings.
Now, after entering those config settings:
1. s3.tmp does appear in the "show schemas" result.
2. It doesn't disappear when I add a custom folder in the
location attribute.
*Error: SYSTEM ERROR: IllegalArgumentException: URI has an authority
component*
*Fragment 0:0*
*create table s3.tmp.`abcd` as select 1 from (values(1));*
However, this query runs when I use dfs.tmp instead of s3.tmp.
Post by Nitin Pawar
Can you try with the following s3 config?
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://bucket_name",
  "config": {
    "fs.s3a.connection.maximum": "10000",
    "fs.s3a.access.key": "access_key",
    "fs.s3a.secret.key": "secret_key",
    "fs.s3a.buffer.dir": "/tmp",
    "fs.s3a.multipart.size": "10485760",
    "fs.s3a.multipart.threshold": "104857600"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": ["tbl"],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "extractHeader": true,
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": ["json"]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": ["seq"]
    },
    "csvh": {
      "type": "text",
      "extensions": ["csvh"],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
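
The advice about creating the tmp object in the bucket makes more sense once each workspace is mapped to the path it points at. The sketch below assumes that a workspace's `location` is simply resolved against the plugin's `connection`; that is a reading of the configuration above, not a statement about Drill's internals. The helper name `workspace_uri` and the trimmed-down `PLUGIN` dictionary are illustrative only.

```python
# Trimmed-down, hypothetical copy of the plugin configuration (only the
# fields this sketch needs).
PLUGIN = {
    "connection": "s3a://bucket_name",
    "workspaces": {
        "root": {"location": "/", "writable": False},
        "tmp": {"location": "/tmp", "writable": True},
    },
}


def workspace_uri(plugin_config, workspace):
    """Join the plugin's "connection" with a workspace's "location"."""
    connection = plugin_config["connection"].rstrip("/")
    location = plugin_config["workspaces"][workspace]["location"]
    # Normalise so that "tmp" and "/tmp" both yield ".../tmp".
    return connection + "/" + location.strip("/")


if __name__ == "__main__":
    # Print the path each workspace is assumed to resolve to.
    for name in sorted(PLUGIN["workspaces"]):
        print(name, "->", workspace_uri(PLUGIN, name))
```

Under that assumption, the path printed for `tmp` is the one that has to exist in the bucket before a write to `s3.tmp` can succeed.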
Abhishek Girish
2017-05-24 19:57:30 UTC
Sorry, I was wrong - please ignore my previous message. It looks like we do
support writing to S3, but a couple of small changes were necessary to make
this work:

First, I had to prefix the CTAS table name with the S3 plugin name. And
second, I had to either update the s3 storage plugin configuration to
include the default workspace and set writable to true, or create a
workspace with a path and set the writable option to true.

Example:

create table s3.abc.a_ctas as select * from s3.a

"abc": {
  "location": "/a",
  "writable": true,
  "defaultInputFormat": null
}

OR

create table s3.a_ctas as select * from s3.a

"default": {
  "location": "/",
  "writable": true,
  "defaultInputFormat": null
}
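
The two rules in this message (prefix the table with the plugin name, and target a workspace marked writable) can be checked before a statement is sent to Drill. This is a minimal sketch under those two rules only; `ctas_target`, `ctas_statement` and the `PLUGIN` dictionary are hypothetical names, and the backtick quoting of the table name follows the `abcd` query quoted elsewhere in this thread.

```python
# Hypothetical, trimmed-down plugin configuration for the sketch.
PLUGIN = {
    "workspaces": {
        "root": {"location": "/", "writable": False},
        "abc": {"location": "/a", "writable": True},
        "default": {"location": "/", "writable": True},
    }
}


def ctas_target(plugin, workspace, table):
    """Build the qualified table name for a CTAS statement.

    The "default" workspace is addressed without a workspace segment,
    as in the second example of the message.
    """
    parts = [plugin] if workspace == "default" else [plugin, workspace]
    parts.append("`" + table + "`")
    return ".".join(parts)


def ctas_statement(plugin_config, plugin, workspace, table, select_sql):
    """Return a CTAS statement; reject missing or read-only workspaces."""
    ws = plugin_config.get("workspaces", {}).get(workspace)
    if ws is None:
        raise ValueError("workspace %r is not defined" % workspace)
    if not ws.get("writable", False):
        raise ValueError("workspace %r is not writable" % workspace)
    target = ctas_target(plugin, workspace, table)
    return "create table %s as %s" % (target, select_sql)


if __name__ == "__main__":
    print(ctas_statement(PLUGIN, "s3", "abc", "a_ctas", "select * from s3.a"))
```

Failing early with a clear message is a deliberate choice here: it separates a configuration problem from an error raised later during the write itself.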
Post by Abhishek Girish
I don't think we support writing to object stores such as S3. We do
support reading from S3 buckets via the S3a library. However, we have
limited support with the plugin. You could file an enhancement request on
JIRA [1].
If someone has any experience with it, they can share details on the JIRA, or
work on it. You are welcome to contribute yourself.
[1] https://issues.apache.org/jira/browse/DRILL
Gautam Parai
2017-05-26 07:10:59 UTC
Hi Shuporno,


Did you try following the suggestions from Abhishek? Please let us know your observations. Also, please share the CTAS command you are using to write to s3.


Gautam


