Reed Villanueva
2018-12-04 20:50:15 UTC
After upgrading drill on our cluster to drill-1.12.0-mapr, testing our
daily ETL scripts (which all use drill for converting parquet files to
tsv), a validation error ("*table or view with given name already exists*")
is always thrown when trying to run a `CREATE TABLE` statement on some
empty directories in a writable workspace.
[Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on
mapr002.ucera.local:31010]
at
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
:
:
:
Caused by: org.apache.drill.common.exceptions.UserRemoteException:
VALIDATION ERROR: A table or view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
After some brief debugging, I see that the directory in question under the
workspace (ie. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
*is in fact empty*, yet still throwing these errors.
Looking for the error ID in the drillbit.log file in the associated node in
the error message above, we see
2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id
23f92019-db56-862f-e7b9-cd51b3e174ae: create table
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`
as
select <a bunch of fields>
from
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or
view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A
table or view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
[Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
at
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[na:1.8.0_151]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to
complete before shutting down
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 running
fragments to complete before shutting down
This error occurs even when using `DROP TABLE [IF EXISTS]
<workspace>.<table path name>` before the `CREATE TABLE` statement.
Furthermore, the configurations for the dfs workspace itself does not
appear to be changed from before upgrading to drill-1.12, see below:
:
:
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"etl_internal": {
"location": "/etl/internal",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
:
:
Note that the full process in question is intended to `mv` the directory
contents every day and `CREATE TABLE` with new data from current day (in
case that makes a difference) and this process had been working fine when
we were using drill-1.11.
If anyone with more experience using drill knows what could be happening
here, any opinions or advice would be appreciated.
--
This electronic message is intended only for the named
recipient, and may
contain information that is confidential or
privileged. If you are not the
intended recipient, you are
hereby notified that any disclosure, copying,
distribution or
use of the contents of this message is strictly
prohibited. If
you have received this message in error or are not the
named
recipient, please notify us immediately by contacting the
sender at
the electronic mail address noted above, and delete
and destroy all copies
of this message. Thank you.
daily ETL scripts (which all use drill for converting parquet files to
tsv), a validation error ("*table or view with given name already exists*")
is always thrown when trying to run a `CREATE TABLE` statement on some
empty directories in a writable workspace.
[Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on
mapr002.ucera.local:31010]
at
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
:
:
:
Caused by: org.apache.drill.common.exceptions.UserRemoteException:
VALIDATION ERROR: A table or view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
After some brief debugging, I see that the directory in question under the
workspace (ie. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
*is in fact empty*, yet still throwing these errors.
Looking for the error ID in the drillbit.log file in the associated node in
the error message above, we see
2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id
23f92019-db56-862f-e7b9-cd51b3e174ae: create table
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`
as
select <a bunch of fields>
from
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1
2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or
view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A
table or view with given name
[/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
in schema [dfs.etl_internal]
[Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
at
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[na:1.8.0_151]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to
complete before shutting down
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 running
fragments to complete before shutting down
This error occurs even when using `DROP TABLE [IF EXISTS]
<workspace>.<table path name>` before the `CREATE TABLE` statement.
Furthermore, the configurations for the dfs workspace itself does not
appear to be changed from before upgrading to drill-1.12, see below:
:
:
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"etl_internal": {
"location": "/etl/internal",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
:
:
Note that the full process in question is intended to `mv` the directory
contents every day and `CREATE TABLE` with new data from current day (in
case that makes a difference) and this process had been working fine when
we were using drill-1.11.
If anyone with more experience using drill knows what could be happening
here, any opinions or advice would be appreciated.
--
This electronic message is intended only for the named
recipient, and may
contain information that is confidential or
privileged. If you are not the
intended recipient, you are
hereby notified that any disclosure, copying,
distribution or
use of the contents of this message is strictly
prohibited. If
you have received this message in error or are not the
named
recipient, please notify us immediately by contacting the
sender at
the electronic mail address noted above, and delete
and destroy all copies
of this message. Thank you.