Vitalii Diravka
2018-09-15 17:31:02 UTC
Hi James,
This mail is for the user mailing list.
There is no attachment. Please upload it to Google Drive, for instance, and
give us the link.
Did you try to use Drill SqlLine?
Kind regards
Vitalii
Hey,
I've had pretty great success using Drill on top of S3, but I'm hitting one
big issue: a "long running" query (more than 4.5 minutes) will succeed
""'. See attachment.
Running Drill 1.14 on Amazon Linux. The only modification I made is this:
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dcom.amazonaws.services.s3.enableV4"
The query is:
select distinct(column_name) from s3.`/path/to/files/year/month/day/hour/`
All the files are well-formed parquet files and querying any single file
returns fine in a few seconds. When I scale the cluster up to 50+ nodes,
the query obviously returns much faster and no timeout occurs. However,
more complicated/higher data volume queries (i.e., querying a whole day's
worth of data instead of one hour) suffer the same timeout.
Are there settings I can tweak to prevent this timeout from occurring? Can
I save the results of the query somewhere since it's succeeding in the
background?
Drill demolishes our current solution with its performance and we really
want to use it but this bug is making it tricky to sell.
Thanks,
James