Discussion:
Drill Cluster without HDFS/MapR-FS?
Matt
2017-05-08 18:41:56 UTC
I have seen some posts in the past about Drill nodes mounted "close to
the data", and am wondering if it's possible to use Drill as a cluster
without HDFS?

Using ZK would not be an issue in itself, and there are apparently
options like https://github.com/mhausenblas/dromedar

Any experiences with this?
ankit beohar
2017-05-08 19:25:05 UTC
Hey Matt,

Yes, you can run Drill in distributed mode on a cluster. We did that, but
only for development; in our production environment we had Hadoop. Still,
it can be done, and the steps are documented at
https://drill.apache.org/docs/installing-drill-on-the-cluster/

Best Regards,
ANKIT BEOHAR
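
For reference, a minimal sketch of the per-node configuration that the linked install guide describes: each Drillbit's drill-override.conf names a shared cluster ID and a ZooKeeper connect string. The cluster ID "drillbits1" and the hosts zk1, zk2, zk3 are placeholders, not values from this thread, and the helper function is hypothetical.

```python
def render_drill_override(cluster_id, zk_hosts, zk_port=2181):
    """Return the text of a minimal drill-override.conf.

    Every Drillbit that should join the same cluster needs the same
    cluster-id and the same ZooKeeper connect string.
    """
    connect = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{connect}"\n'
        "}\n"
    )


# Hypothetical cluster ID and ZooKeeper hosts, for illustration only.
conf_text = render_drill_override("drillbits1", ["zk1", "zk2", "zk3"])
print(conf_text)
```

The same rendered text would be written to conf/drill-override.conf on every node before starting the Drillbits.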
Ted Dunning
2017-05-10 03:35:17 UTC
Using Drill against any kind of distributed data store is a fine thing. If
data locality matters, then it is nice if Drill can see what data is where.
Regardless, using Drill without HDFS works great.

I should point out that using Drill with MapR is technically using it
without HDFS, but since MapR FS implements the HDFS API, the distinction is
kind of technical.
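
To illustrate the point about the file system API: Drill's file-based storage plugin selects the file system through a connection URI, so moving off HDFS is largely a matter of changing that one value. The sketch below builds such a plugin definition as JSON. The workspace name "data" and the path /mnt/shared/data are assumptions made up for this example, and the helper function is hypothetical.

```python
import json


def file_storage_plugin(connection, data_dir):
    """Build a file-type storage plugin definition for Drill.

    connection is the file system URI, for example "file:///" for a
    local or mounted file system, "hdfs://<namenode>:<port>" for HDFS,
    or "maprfs:///" for MapR-FS.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,
        "workspaces": {
            # Hypothetical workspace name and location.
            "data": {
                "location": data_dir,
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        "formats": {
            "parquet": {"type": "parquet"},
            "json": {"type": "json", "extensions": ["json"]},
        },
    }


# Assumed mount point; it must resolve to the same data on every node.
local_plugin = file_storage_plugin("file:///", "/mnt/shared/data")
print(json.dumps(local_plugin, indent=2))
```

The printed JSON can be pasted into the storage plugin editor of the Drill Web UI.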
Abhishek Girish
2017-05-10 05:41:27 UTC
Do you wish to use Drill in distributed mode with each node having its own
local file system, or do you plan to use it with a different data source
which is also a distributed file system (but not HDFS / MapR-FS)?

If the former, yes, you should be able to form a Drill cluster by bringing
up Drillbits in standalone mode on multiple disjoint nodes. You will still
need ZooKeeper for cluster coordination. But understand that since each
node can only read files on its local file system, the Drill cluster
will not have a unified view of, or access to, the files for distributed
processing. Your queries may fail, as a Drillbit might fail to access data.
To experiment, you can make sure the directories and files you need to
query are identical on each node. However, this is untested and I'm not
sure it will actually work.
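
One way to check the "identical on each node" condition is to compute a fingerprint of the data directory on every node and compare the results. This is a minimal sketch using only the Python standard library; the function names are made up for this example, and it only verifies file paths and contents, not permissions or timestamps.

```python
import hashlib
import os
import sys


def directory_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in 1 MiB chunks so large data files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[rel_path] = digest.hexdigest()
    return manifest


def manifest_fingerprint(manifest):
    """Collapse a manifest into one hex digest that is easy to compare."""
    combined = hashlib.sha256()
    for rel_path in sorted(manifest):
        combined.update(rel_path.encode("utf-8"))
        combined.update(b"\0")
        combined.update(manifest[rel_path].encode("ascii"))
        combined.update(b"\n")
    return combined.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run with the data directory as the only argument on each node.
    print(manifest_fingerprint(directory_manifest(sys.argv[1])))
```

If every node prints the same fingerprint for the same directory path, the files match byte for byte. Whether Drill then plans and runs queries correctly over them remains untested, as noted above.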

If it's the latter, can you share what data source you have in mind?
Rahul Raj
2017-05-10 06:56:28 UTC
Does anyone have experience running Drill on GlusterFS or similar storage
systems? How much performance would be lost without data locality?

Regards,
Rahul
--
**** This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom it is
addressed. If you are not the named addressee then you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately and delete this e-mail from your system.****
Ted Dunning
2017-05-10 08:33:32 UTC
I have no such experience.

The performance loss could vary from minor to profound depending on your
query, network and disk setup.