Discussion:
Reg; Apache Drill
Pritam Tambe
2017-06-06 12:07:17 UTC
Dear Sir,

I want to do Social Media Data analysis for Health Domain using Big Data.

I am confused whether to go for Apache Drill or Hive.

Please Guide.
--
Thanks & Regards,
Pritam Tambe,
Project Engineer - AAI Group,
Centre for Development of Advanced Computing [C-DAC],

-------------------------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
-------------------------------------------------------------------------------------------------------------------------------
Bob Rudis
2017-06-07 01:32:05 UTC
You should likely spend some time studying statistics and machine
learning, then examine the pluses and minuses of a few "data
science"-oriented programming languages and focus on one that has
idioms that make sense to you. Then you'll see just how inappropriate
your question is.
Ted Dunning
2017-06-07 07:28:17 UTC
Pritam,

Let me rephrase what Bob has to say. It has some merit, but it also
probably has a bit more sting than it needs to have.

The first question that you need to look at in any kind of textual analysis
project is what kind of data you are likely to have. How will the data be
presented to you? For instance, at two different extremes there are the
Twitter API (with a very well-specified data format and lots of well-coded
meta-data) and patient notes in raw image form (hand-written data
with no transcriptions and possibly very little meta-data). As you can
imagine, the tasks that you need to do on each extreme are very, very
different.

Another key aspect of your data is how big it really is. If you only have
millions of examples, then big data is going to be just a hindrance, not a
help. If you have billions of text examples, then big data may become a
requirement.

Beyond the data source, you need to look at what kind of analysis you need
to do. In particular, it is likely that there will be some sort of
statistical analysis of the data that you are looking at. You might be
looking at some indicators of particular test results that might be found
in social media. Or you might be looking to predict cases of misdiagnosis.
In either case Drill (or Hive) would only be useful for counting up the
cases that have specific features. Finding the features and interpreting
the counts you produce would require other software.

This means that a SQL system like Drill or Hive will have a very minor role
in your analysis. Indeed, many systems that are good for data reduction
(like R or Spark) can do all the counting that Drill or Hive can do.
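
To make the counting point concrete, here is a minimal Python sketch. It
assumes the features have already been extracted by other software; the
records, the field names, and the count_by helper are all hypothetical and
only stand in for real data. It shows that the part a SQL engine would
handle is roughly a GROUP BY with COUNT(*), which a general-purpose
language can do as well.

```python
from collections import Counter

# Hypothetical, hand-made records standing in for posts whose features
# were already extracted elsewhere (not real data).
posts = [
    {"source": "twitter", "mentions_symptom": True},
    {"source": "twitter", "mentions_symptom": False},
    {"source": "forum", "mentions_symptom": True},
    {"source": "forum", "mentions_symptom": True},
]


def count_by(records, key):
    """Count records per distinct value of `key`.

    Roughly what `SELECT key, COUNT(*) ... GROUP BY key` would return
    in a SQL engine such as Drill or Hive.
    """
    return Counter(record[key] for record in records)


source_counts = count_by(posts, "source")
symptom_counts = count_by(posts, "mentions_symptom")

for value, n in sorted(source_counts.items()):
    print(value, n)
```

Deciding what counts as a feature, and what the resulting counts mean, is
outside this sketch; that is the statistical work mentioned above.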

I hope this helps.