Not really :)
You get into the problem of having to deal with cache management. Once you start using memory as a cache for holding a table in-memory, you are taking that memory away from the actual computation. Also, Drill actually tries to work with Direct Memory rather than heap. To keep a cache from starving queries, you would then have to introduce a swapping (eviction) policy to reclaim the memory.
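To make the "swapping policy" point concrete, here is a minimal sketch of a table cache with a fixed byte budget that evicts the least recently used table once the budget is reached. The class name and its methods are hypothetical and only model the bookkeeping. Drill has no such cache, and real Drill buffers live in direct memory, not in Python objects.

```python
from collections import OrderedDict


class BudgetedTableCache:
    """Hypothetical LRU cache that holds tables within a fixed byte budget.

    Every byte handed to the cache is a byte unavailable to query execution,
    so entries are evicted (least recently used first) once the budget is hit.
    """

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # table name -> bytes payload

    def get(self, name):
        data = self._entries.get(name)
        if data is not None:
            self._entries.move_to_end(name)  # mark as most recently used
        return data

    def put(self, name, data):
        if len(data) > self.budget_bytes:
            return False  # too large to cache; caller must read from storage
        if name in self._entries:
            self.used_bytes -= len(self._entries.pop(name))
        # Evict least recently used tables until the new one fits.
        while self.used_bytes + len(data) > self.budget_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._entries[name] = data
        self.used_bytes += len(data)
        return True
```

With a 100-byte budget holding tables "a" (60 bytes) and "b" (30 bytes), reading "a" and then caching a 40-byte table "c" evicts "b", because "b" is now the least recently used entry. Every eviction means a later query pays the full cost of reading from storage again.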
If you were to use heap for storing the table in memory, Drill would need to copy the data into Direct Memory to do any useful work. So now you have about 2x the memory being used for the same data!
If you are using HDFS (or MapR-FS), these filesystems implement their own cache management, so we are already leveraging (to a limited extent) the benefits of an in-memory cache.
From: Michael Shtelma <***@gmail.com>
Sent: Wednesday, May 10, 2017 9:44:50 AM
Subject: Re: In-memory cache in Drill
Yes, this is certainly also a viable approach... but it would be far
better to be able to keep the data in memory as well.
Does it make sense to have something like an in-memory storage plugin?
In that case it could also be used as storage for the temporary
Post by Kunal Khatua
Drill does not cache data in memory because doing so introduces the risk of serving stale data when working at a large scale.
If you want to avoid hitting the actual storage repeatedly, one option is to use the 'CREATE TEMPORARY TABLE AS' (CTTAS) feature (https://drill.apache.org/docs/create-temporary-table-as-cttas/). This lets you land the data on a local (or distributed) file system and use that copy instead. These tables exist only for the lifetime of the session (the connection your client/SQLLine makes to the Drill cluster).
There is a second benefit to this approach: you can translate the original data source into a format that is well suited to what you are doing with the data. For example, you could pull in data from an RDBMS or a JSON store and write the temp table in Parquet for running analytics.
Sent: Wednesday, May 10, 2017 9:16:30 AM
Subject: In-memory cache in Drill
Is there any way to cache the data that was loaded from the actual
storage plugin in Drill?
As far as I understand, when a query is executed, the data is first
loaded from the storage plugin and handled by the format plugin. After
that, the data is stored in an internal vectorized representation and
the query is executed. Is that correct? I am wondering if there is a
way to store these data vectors somewhere, so that they do not have to
be loaded from the actual storage for each query. Spark does something
like that by storing data frames in off-heap storage.
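The idea in the question, keeping columnar vectors around between queries, can be sketched as follows. This is a Python analogy, not Drill's value-vector API: each CSV column is loaded into an `array("d")`, and the vectors are reused until the file's modification time or size changes. `load_columns` and `VectorCache` are hypothetical names. The (mtime, size) check is a heuristic, and it is exactly the kind of staleness detection that becomes hard at scale or for sources such as an RDBMS that have no file to `stat`.

```python
import csv
import os
import tempfile
from array import array


def load_columns(path):
    """Read a numeric CSV file into one typed vector per column."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = {name: array("d") for name in header}
        for row in reader:
            for name, value in zip(header, row):
                columns[name].append(float(value))
    return columns


class VectorCache:
    """Hypothetical cache of column vectors, invalidated when the file changes."""

    def __init__(self):
        self._cache = {}  # path -> ((mtime_ns, size), columns)
        self.loads = 0    # how many times we went back to storage

    def columns(self, path):
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == key:
            return hit[1]  # fresh according to our heuristic: reuse vectors
        cols = load_columns(path)  # miss or stale: reload from storage
        self.loads += 1
        self._cache[path] = (key, cols)
        return cols


# Usage demo with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w") as f:
    f.write("a,b\n1,2\n3,4\n")

cache = VectorCache()
first = cache.columns(path)
second = cache.columns(path)  # served from the cache, no reload
with open(path, "w") as f:
    f.write("a,b\n1,2\n3,4\n5,6\n")  # the file changes (different size)
third = cache.columns(path)  # detected as stale, reloaded
print(cache.loads, sum(third["a"]))
```

Note that every cached vector occupies memory that queries cannot use, and any change the heuristic misses is silently served stale. Those are the two costs raised in the replies above.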