Discussion:
Is it possible to evaluate different physical plans for a given query with Apache Drill?
Felipe Gutierrez
2018-12-07 11:06:57 UTC
Hi,

I watched this video about Apache Drill (
https://www.youtube.com/watch?time_continue=14&v=0rurIzOkTIg), which says I
can install a Drillbit on each node of my cluster and the Drill engine will
choose the best physical plan to execute a query. I can then run EXPLAIN
PLAN for a query (https://drill.apache.org/docs/query-plans/) and see
whether Drill decided to use data locality and in-memory processing, along
with its other cost-based decisions. This is another reference that I was
reading (Apache Drill vs Spark
<https://stackoverflow.com/questions/29790655/apache-drill-vs-spark>). I
also see that Drill has a storage plugin for file systems, so I imagine I
can install Drill on 3 computers and query the log files stored on them.

I want to evaluate the physical plans of a given query. I wonder if it is
possible to install Drill on Raspberry Pis that are linked by a variety of
connections (wired, wireless, radio, ...) and evaluate the different
physical plans of my query. Is it possible?

Thanks, Felipe
*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*
Ted Dunning
2018-12-07 11:35:46 UTC
If your data is separated from Drill by a high-latency / high-cost link,
then it's probably better to move the data closer to Drill before starting
the query. The rationale is that when one cost absolutely dominates, it's
really better to optimize the overall process essentially by hand.

But to your question: Drill can run on a Raspberry Pi, but it probably just
isn't a great idea to run a distributed version across several Pis. To run
on a Pi you may need to tune the memory configuration pretty carefully.
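
For the EXPLAIN PLAN step raised in the question, here is a minimal Python
sketch of how a plan could be requested from a running Drillbit over Drill's
REST interface (POST to /query.json on the web port, 8047 by default). The
function names, the host name and the sample query are hypothetical and only
for illustration; the exact layout of the returned JSON should be checked
against your Drill version.

```python
import json
import urllib.request


def build_explain_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a REST request asking Drill to explain `sql`.

    Assumption: `base_url` points at a Drillbit's web port (8047 by default).
    """
    # Prefix the statement with EXPLAIN PLAN FOR; drop a trailing semicolon.
    statement = "EXPLAIN PLAN FOR " + sql.strip().rstrip(";")
    payload = {"queryType": "SQL", "query": statement}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_plan(sql, base_url="http://localhost:8047", timeout=30):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_explain_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_plan("SELECT * FROM dfs.`/tmp/logs/app.json`", "http://pi-node1:8047")
against each setup (the path and host here are made up) and saving the
responses would give one plan per configuration to compare side by side.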