Discussion:
Segregating Foreman from leaf-worker fleet
Lokendra Singh Panwar
2018-11-13 23:37:21 UTC
Hi,

Is it possible to configure Drill such that the Foreman and leaf-worker
fleets are separate fleets of nodes?
Or if this needs changing the source of Drill, any pointers are appreciated
too.

Thanks,
Lokendra
Timothy Farkas
2018-11-14 00:17:25 UTC
Hi Lokendra,

All Drillbits can function as a Foreman if a query is sent to them, and all
Drillbits are considered worker nodes. This is ingrained deeply into the
design of Drill, and it was done with the intention of making Drill
symmetric. Symmetric here means that each Drillbit is identical to all the
others. Separating the roles would be a significant design change.

Why are you interested in running Drill in this way? Do you have a specific
use case in mind?

Thanks,
Tim
Lokendra Singh Panwar
2018-11-14 01:31:57 UTC
Hi Tim,

Thanks for the reply.

My use case is as follows:

- My main DB table is huge, so it is sharded among multiple
storage-nodes.
- Each storage-node stores its assigned shard in a local relational
db engine.

I was planning to use Drill as a distributed query engine that can
scatter-gather data from these storage-nodes.

So, my overall plan for such an architecture, as per my limited understanding
of Drill so far, is:

- Have a Drillbit instance run on each storage-node; this fleet will
act as the leaf-worker fleet.
- (I will write a Storage Plugin to transform data from my local
relational DB engine to the Drill record format.)
- Maintain another fleet that will serve as Foreman and Intermediate
query workers, still part of the same Drill cluster.
- The reason I intend to keep the leaf-query fleet (storage-nodes)
segregated from the Foreman/Intermediate workers (working on major
fragments) is:
  - storage-nodes (acting as leaf-workers) are a premium commodity in
  my cluster, involved in data ingestion as well as serving query
  traffic as leaf-workers.
  - So, I do not intend to overload them further with the intermediate
  query fragment processing and aggregation that the Foreman and
  Intermediate pool of workers are involved in.

Does the above make sense?

Thanks,
Lokendra
Paul Rogers
2018-11-14 06:26:51 UTC
Hi Lokendra,

Your use case is a typical old-school sharded DB app. The design itself is fine. However, as Tim noted, Drill is not designed for this case. Still, perhaps Drill could be extended.


As Tim suggested, Drill assumes any Drillbit can operate in any role. So, in your setup, you would run Drillbits on all your shard storage-nodes. Drill would schedule reads (more on this shortly) on those nodes. Then, Drill would do shuffles to other nodes to perform query operations.

In this model, one of your nodes would act as Foreman for a user. ZooKeeper (ZK) tracks all nodes, and each user randomly chooses a Drillbit to act as Foreman, which means Foreman load is shared across all your Drillbits.
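
As a rough illustration of why random Foreman selection spreads the load, here is a minimal Python sketch. It is not Drill code: the host names and the functions `choose_foreman` and `foreman_load` are hypothetical, and a plain list stands in for the set of Drillbits registered in ZK.

```python
import random
from collections import Counter


def choose_foreman(drillbits, rng):
    """Pick one registered Drillbit at random to act as Foreman.

    `drillbits` is a hypothetical stand-in for the endpoints listed in ZK.
    """
    if not drillbits:
        raise ValueError("no Drillbits registered")
    return rng.choice(drillbits)


def foreman_load(drillbits, connections, seed=0):
    """Count how often each Drillbit is chosen as Foreman over many connections."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return Counter(choose_foreman(drillbits, rng) for _ in range(connections))


if __name__ == "__main__":
    # Hypothetical three-node cluster; each node ends up with roughly a third.
    print(foreman_load(["node-a", "node-b", "node-c"], 3000))
```

With every node eligible, no single node carries the Foreman work, which is the symmetric behavior described above.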

Suppose you wanted to change this. You'd have to extend the way that Drillbits register themselves in ZK. A Drillbit, when it starts, would be assigned one or more roles which it would advertise in ZK. The distribution mechanisms in the Planner would have to be aware of scan-only nodes, compute-only nodes, and Foreman-only nodes.
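
To make the idea concrete, here is a small Python sketch of role-aware assignment. It is only a model of the proposal, not existing Drill behavior: the `Role` enum, the `Endpoint` record, and the `eligible` and `assign_round_robin` helpers are all hypothetical names, and a list of endpoints stands in for what each Drillbit would advertise in ZK.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    """Hypothetical roles a Drillbit could advertise when it registers."""
    FOREMAN = auto()
    SCAN = auto()
    COMPUTE = auto()


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical registration record: a host plus its advertised roles."""
    host: str
    roles: frozenset


def eligible(endpoints, role):
    """Return only the endpoints that advertise the given role."""
    return [e for e in endpoints if role in e.roles]


def assign_round_robin(fragments, endpoints, role):
    """Spread fragments over the endpoints that advertise `role`."""
    nodes = eligible(endpoints, role)
    if not nodes:
        raise ValueError(f"no endpoint advertises {role.name}")
    return {frag: nodes[i % len(nodes)].host for i, frag in enumerate(fragments)}


if __name__ == "__main__":
    cluster = [
        Endpoint("storage-1", frozenset({Role.SCAN})),
        Endpoint("storage-2", frozenset({Role.SCAN})),
        Endpoint("compute-1", frozenset({Role.COMPUTE, Role.FOREMAN})),
    ]
    print(assign_round_robin(["scan-0", "scan-1"], cluster, Role.SCAN))
    print(assign_round_robin(["agg-0", "agg-1"], cluster, Role.COMPUTE))
```

The point of the sketch is that every place the Planner picks nodes would need a filter like `eligible`, which is why this is a design change rather than a configuration switch.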

Unless you plan to put heavy load on your scan nodes, it is not clear what benefit you'd gain from forcing Drill into a particular distribution model.

Perhaps you can start by running Drill on just your storage nodes, then noting performance.

One final point. Drill today knows to use HDFS to work out data locality for scans. You'd need to modify this to plug in your own data distribution mechanism so that Drill knows which shards to scan on which nodes. I don't believe Drill has a plugin API for this, but I could be wrong. If not, this would be a great opportunity to define such an API.

Such an API might be helpful for other storage plugins such as Kafka so that scans are done on nodes with data.
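
A sketch of what such a locality hook might compute, in Python for brevity. Nothing here is an existing Drill API: `plan_scans`, the shard names, and the shard-to-host map are hypothetical, and the fallback rule (round-robin over the available Drillbits) is just one possible choice.

```python
def plan_scans(shard_locations, drillbit_hosts):
    """Map each shard to the Drillbit that should scan it.

    shard_locations: hypothetical {shard: host} map supplied by the storage layer.
    drillbit_hosts:  set of hosts that currently run a Drillbit.
    A shard is scanned on its own host when a Drillbit runs there; otherwise
    it falls back to the available Drillbits in round-robin order.
    """
    if not drillbit_hosts:
        raise ValueError("no Drillbits available")
    hosts = sorted(drillbit_hosts)
    plan = {}
    fallback = 0
    for shard in sorted(shard_locations):
        owner = shard_locations[shard]
        if owner in drillbit_hosts:
            plan[shard] = owner  # local scan, no data leaves the node
        else:
            plan[shard] = hosts[fallback % len(hosts)]  # remote scan
            fallback += 1
    return plan


if __name__ == "__main__":
    shards = {"shard-0": "storage-1", "shard-1": "storage-2", "shard-2": "storage-9"}
    print(plan_scans(shards, {"storage-1", "storage-2"}))
```

In this example `shard-2` lives on a host with no Drillbit, so it is the only shard scanned remotely.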

Thanks,
- Paul


