Discussion: Comments in Drill Views
John Omernik
2016-06-23 18:48:45 UTC
I am looking for discussion here. A colleague was asking me how to add
comments to the metadata of a view. (He's new to Drill, thus the idea of
not having metadata for a table is one he's warming up to).

That got me thinking... why couldn't we use Drill Views to store
table/field comments? This could be a great way to help add contextual
information for users. Here are some current observations when I issue a
describe view_myview:


1. I get three columns: COLUMN_NAME, DATA_TYPE, and IS_NULLABLE.
2. Even though the underlying Parquet table has types, the view does not
pass the types of the underlying Parquet files through. (The type is ANY.)
3. The data for the view is just a JSON file that could easily be
extended.
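
To make observation 3 concrete, here is a minimal Python sketch of extending a view definition with comments. The JSON layout shown (name, sql, fields with type ANY) is an assumption about what the view file looks like and may not match the real file exactly. The "description" keys and the add_comments helper are hypothetical; nothing in Drill reads them today.

```python
import json

# Assumed shape of a view definition file; the real file may differ.
VIEW_JSON = """
{
  "name": "view_myview",
  "sql": "SELECT col1, col2 FROM dfs.data.mytable",
  "fields": [
    {"name": "col1", "type": "ANY", "isNullable": true},
    {"name": "col2", "type": "ANY", "isNullable": true}
  ]
}
"""


def add_comments(view, view_comment, column_comments):
    """Return a copy of the view dict with hypothetical description keys added."""
    updated = dict(view)
    updated["description"] = view_comment
    updated["fields"] = [
        # Columns without a comment get an empty description.
        {**field, "description": column_comments.get(field["name"], "")}
        for field in view["fields"]
    ]
    return updated


view = json.loads(VIEW_JSON)
commented = add_comments(
    view,
    "Example view over mytable",
    {"col1": "Primary identifier"},
)
print(json.dumps(commented, indent=2))
```

The original dict is left untouched, so an existing reader that ignores unknown keys would presumably keep working with the extended file; that compatibility is an assumption, not something verified here.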


So, a few things would be nice to have:

1. Table comments: When I issue a describe table, if the view has a
"Description" field, then having that print out as a description for the
whole view would be nice. This is harder, I think, because it's not just
extending the view information.

2. Column comments: A text field that could be added to the view and
printed as another column with the description. This would be very helpful.
While Drill being schemaless is awesome, the ability to add information to
known data is huge.

3. Ability to use the types from the Parquet files (without manually
specifying each type). If we could provide an option at view creation to
attempt to infer types, that would be handy. I realize that folks are using
LIMIT 0 to get metadata, but describe could do this well too.

4. Ability to update the view column descriptions and the description of
the view itself using ANSI SQL.

5. I believe Avro has the ability to add this information to the files, so
if the data exists outside of views (such as in Avro files), should we
present it to the user in describe table output as well?
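
On item 5: Avro schemas do carry an optional "doc" attribute on records and on individual fields. The sketch below, in plain Python with only the standard library, shows that the comments are recoverable from the schema JSON. The schema itself and the extract_docs helper are made up for illustration, and how Drill would surface this in describe is left open.

```python
import json

# A small Avro record schema; "doc" is a standard optional attribute
# on records and on individual fields.
AVRO_SCHEMA = """
{
  "type": "record",
  "name": "Customer",
  "doc": "One row per customer account",
  "fields": [
    {"name": "id", "type": "long", "doc": "Unique customer id"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""


def extract_docs(schema):
    """Return (record_doc, {field_name: field_doc}) from a parsed Avro schema."""
    field_docs = {f["name"]: f.get("doc", "") for f in schema.get("fields", [])}
    return schema.get("doc", ""), field_docs


table_doc, column_docs = extract_docs(json.loads(AVRO_SCHEMA))
print(table_doc)
for name, doc in column_docs.items():
    print(f"{name}: {doc or '(no comment)'}")
```

Fields without a "doc" attribute simply come back with an empty comment, which mirrors how an uncommented view column might be shown.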

Curious if folks think this would be valuable, how much work an addition
like this would be to Drill, and other thoughts in general.


John
Ted Dunning
2016-06-23 21:02:23 UTC
This is very interesting. I love docstrings in Lisp and Python and Javadoc
in Java.

Basically this is like that, but for SQL. Very helpful.
John Omernik
2017-03-01 17:55:27 UTC
Sorry, I let this idea drop. (I didn't follow up, and found it again when
searching for something else...) Any other thoughts on this idea?

Should I open a JIRA if people think it would be handy?
Kunal Khatua
2017-03-02 02:08:22 UTC
+1


I think this can be very useful. My only worry is someone abusing it, so we should probably put a limit on its size. I'm not sure how else it could be exposed and consumed.


Kunal Khatua
Engineering, MapR
www.mapr.com

John Omernik
2017-03-02 16:42:53 UTC
On your worry: I think that's an easily definable "abuse" condition. If we
set a limit of, say, 1024 characters, that provides ample space for
descriptions, and at 1 KB per view it would be hard to abuse ... or am I
missing something?
Ted Dunning
2017-03-03 20:56:44 UTC
Is it really necessary to put a technical limit in place to prevent people
from OVER-documenting views?


When was the last time you saw code that had too many comments in it?
Kunal Khatua
2017-03-03 23:19:01 UTC
It might be, in case someone begins to dump a massive design doc into the comment field of a view's JSON.


I'm also not sure how this information would be consumed. If it is through the CLI, either we rely on the SQLLine shell to trim the output, or we don't worry about it at all. I'm assuming we'd probably also want something like

DESCRIBE VIEW ...

to be enhanced to something like

DESCRIBE VIEW WITH COMMENTARY ...
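
As a rough illustration of what such output could look like, here is a Python sketch that formats describe rows with an extra COMMENT column and trims long comments for a terminal. DESCRIBE VIEW WITH COMMENTARY is only the syntax proposed in this thread, not an existing Drill statement; the describe_with_commentary helper, the "description" key, and the 40-character width are all hypothetical.

```python
def describe_with_commentary(fields, max_width=40):
    """Format rows of (COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COMMENT) as text lines."""
    lines = ["COLUMN_NAME | DATA_TYPE | IS_NULLABLE | COMMENT"]
    for field in fields:
        comment = field.get("description", "")
        if len(comment) > max_width:
            # Trim long comments so one row stays on one terminal line.
            comment = comment[: max_width - 3] + "..."
        nullable = "YES" if field.get("isNullable", True) else "NO"
        lines.append(f"{field['name']} | {field['type']} | {nullable} | {comment}")
    return lines


fields = [
    {"name": "col1", "type": "ANY", "isNullable": True,
     "description": "Primary identifier"},
    {"name": "col2", "type": "ANY", "isNullable": True,
     "description": "x" * 100},
]
rows = describe_with_commentary(fields)
print("\n".join(rows))
```

Trimming at display time rather than at storage time would keep the full comment available to other consumers while leaving the CLI output readable.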


A 1 KB field is quite generous IMHO. That's more than 7 tweets to describe something!


Kunal Khatua

Ted Dunning
2017-03-04 02:28:05 UTC
All of War and Peace is only 3MB.

Let people document however they want. Don't over-optimize for problems
that have never occurred.
John Omernik
2017-03-06 13:55:42 UTC
I can see both sides. But Ted is right, this won't hurt anything from a
performance perspective. Even if they put War and Peace in there 30 times,
that's roughly 100 MB of information to serve. People may choose to use a
markup language for formatting. I do think we should have a limit so we
know what happens if someone tries to exceed it (from a security
perspective), but we could set it quite high, and then just test putting in
data that exceeds it as a unit test.
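
A minimal sketch of that idea in Python, assuming a byte-size limit checked when the comment is stored. The 1 MiB value, the validate_comment name, and the choice to raise an error rather than truncate are illustrative assumptions, not anything Drill defines.

```python
# Hypothetical limit; the thread only suggests it could be set quite high.
MAX_COMMENT_BYTES = 1024 * 1024


def validate_comment(comment, limit=MAX_COMMENT_BYTES):
    """Return the comment unchanged, or raise ValueError if it is too large."""
    size = len(comment.encode("utf-8"))
    if size > limit:
        raise ValueError(f"comment is {size} bytes; limit is {limit}")
    return comment


def test_comment_limit():
    # A comment exactly at the limit is accepted.
    at_limit = "a" * MAX_COMMENT_BYTES
    assert validate_comment(at_limit) == at_limit

    # One byte over the limit is rejected with a clear error.
    try:
        validate_comment("a" * (MAX_COMMENT_BYTES + 1))
    except ValueError:
        pass
    else:
        raise AssertionError("oversized comment was not rejected")


test_comment_limit()
print("comment limit checks passed")
```

Measuring encoded bytes rather than characters keeps the limit meaningful for non-ASCII comments, where one character can take several bytes.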
Post by Ted Dunning
All of War and Peace is only 3MB.
Let people document however they want. Don't over-optimize for problems
that have never occurred.
Post by Kunal Khatua
It might be, incase someone begins to dump a massive design doc into the
comment field for a view's JSON.
I'm also not sure about how this information can be consumed. If it is
through CLI, either we rely on the SQLLine shell to trim the output, or
not
Post by Kunal Khatua
worry at all. I'm assuming we'd also probably want something like a
DESCRIBE VIEW ...
to be enhanced to something like
DESCRIBE VIEW WITH COMMENTARY ...
A 1KB field is quite generous IMHO. That's more than 7 tweets to describe
something ! [?]
Kunal Khatua
________________________________
Sent: Friday, March 3, 2017 12:56:44 PM
To: user
Subject: Re: Discussion: Comments in Drill Views
It it really necessary to put a technical limit in to prevent people from
OVER-documenting views?
What is the last time you saw code that had too many comments in it?
Post by John Omernik
So I think on your worry that's an easily definable "abuse"
condition...
Post by Kunal Khatua
Post by John Omernik
i.e. if we set a limit of say 1024 characters, that provides ample
space
Post by Kunal Khatua
Post by John Omernik
for descriptions, but at 1kb per view, that's an allowable condition,
i.e.
Post by John Omernik
it would be hard to abuse it ... or am I missing something?
Post by Kunal Khatua
+1
I this this can be very useful. The only worry is of someone abusing
it,
Post by John Omernik
Post by Kunal Khatua
so we probably should have a limit on the size of this? Not sure else
it
Post by John Omernik
Post by Kunal Khatua
could be exposed and consumed.
Kunal Khatua
Engineering
[MapR]<http://www.mapr.com/>
www.mapr.com<http://www.mapr.com/>
________________________________
Sent: Wednesday, March 1, 2017 9:55:27 AM
To: user
Subject: Re: Discussion: Comments in Drill Views
Sorry, I let this idea drop (I didn't follow up and found when
searching
Post by John Omernik
Post by Kunal Khatua
for something else...) Any other thoughts on this idea?
Should I open a JIRA if people think it would be handy?
Post by Ted Dunning
This is very interesting. I love docstrings in Lisp and Python and
Javadoc
Post by Ted Dunning
in Java.
Basically this is like that, but for SQL. Very helpful.
John Omernik
2017-05-02 18:08:47 UTC
I created a JIRA for this based on the Hangout today!

https://issues.apache.org/jira/browse/DRILL-5461
Post by John Omernik
I can see both sides. But Ted is right, this won't hurt anything from a
performance perspective, even if they put War and Peace in there 30 times,
that's 100mb of information to serve. People may choose to use formatting
languages like Markup or something. I do think we should have a limit so we
know what happens if someone tries to break that limit (from a security
perspective) but we could set that quite high, and then just test putting
data that exceeds that as a unit test.
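A rough sketch of the limit-plus-test idea above, in Python. Everything here is an assumption for illustration: the `description` key, the `MAX_COMMENT_CHARS` value, and the `set_view_comment` helper are hypothetical and not part of Drill, and the dict only stands in for a view's JSON definition.

```python
import json

# Hypothetical limit; the thread only says it could be set "quite high".
MAX_COMMENT_CHARS = 1_000_000


class CommentTooLongError(ValueError):
    """Raised when a view comment exceeds the configured limit."""


def set_view_comment(view, comment, limit=MAX_COMMENT_CHARS):
    """Return a copy of the view definition with a 'description' field added.

    The 'description' key is an assumed name, not an existing Drill field.
    """
    if len(comment) > limit:
        raise CommentTooLongError(
            f"comment is {len(comment)} characters; limit is {limit}"
        )
    updated = dict(view)  # shallow copy so the caller's dict is untouched
    updated["description"] = comment
    return updated


# Stand-in for a view's JSON definition (field names are illustrative).
view = {"name": "view_myview", "sql": "SELECT * FROM t", "fields": []}

documented = set_view_comment(view, "Daily sales, one row per order.")
print(json.dumps(documented, indent=2))

# The "unit test" case: data that exceeds the limit is rejected loudly.
try:
    set_view_comment(view, "x" * (MAX_COMMENT_CHARS + 1))
except CommentTooLongError as exc:
    print("rejected:", exc)
```

Rejecting an oversized comment with an explicit error, rather than silently truncating it, keeps the behavior at the limit well defined, which is the point of having the test.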
Post by Ted Dunning
All of War and Peace is only 3MB.
Let people document however they want. Don't over-optimize for problems
that have never occurred.
Post by Kunal Khatua
It might be, in case someone begins to dump a massive design doc into the
comment field for a view's JSON.
I'm also not sure about how this information can be consumed. If it is
through CLI, either we rely on the SQLLine shell to trim the output, or not
worry at all. I'm assuming we'd also probably want something like a
DESCRIBE VIEW ...
to be enhanced to something like
DESCRIBE VIEW WITH COMMENTARY ...
A 1KB field is quite generous IMHO. That's more than 7 tweets to describe
something!
Kunal Khatua
________________________________
Sent: Friday, March 3, 2017 12:56:44 PM
To: user
Subject: Re: Discussion: Comments in Drill Views
Is it really necessary to put a technical limit in to prevent people from
OVER-documenting views?
When was the last time you saw code that had too many comments in it?
Post by John Omernik
So I think on your worry that's an easily definable "abuse" condition...
i.e. if we set a limit of say 1024 characters, that provides ample space
for descriptions, but at 1kb per view, that's an allowable condition, i.e.
it would be hard to abuse it ... or am I missing something?
Post by Kunal Khatua
+1
I think this can be very useful. The only worry is of someone abusing it,
so we probably should have a limit on the size of this? Not sure how else it
could be exposed and consumed.
Kunal Khatua
Engineering
MapR (http://www.mapr.com/)
________________________________
Sent: Wednesday, March 1, 2017 9:55:27 AM
To: user
Subject: Re: Discussion: Comments in Drill Views
Sorry, I let this idea drop (I didn't follow up and found when searching
for something else...) Any other thoughts on this idea?
Should I open a JIRA if people think it would be handy?
Post by Ted Dunning
This is very interesting. I love docstrings in Lisp and Python and Javadoc
in Java.
Basically this is like that, but for SQL. Very helpful.