Balasubramanian Naganathan
2018-10-18 19:00:51 UTC
Hello,
We have a Tableau BI tool that reads data from MongoDB through Apache
Drill.
We are running Drill on 5 nodes, each with 8 cores and 16 GB of RAM, but they
are not running as a cluster; each node is an individual instance. We have
a load balancer that distributes requests across these 5 nodes.
Each Drillbit runs with the following command line:
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/bin/java
-Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
-Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
-XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
-Duser.timezone=UTC -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=27017
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
-Dlog.path=/opt/apache-drill-1.14.0/log/drillbit.log
-Dlog.query.path=/opt/apache-drill-1.14.0/log/drillbit_queries.json -cp
/opt/apache-drill-1.14.0/conf:/opt/apache-drill-1.14.0/jars/*:/opt/apache-drill-1.14.0/jars/ext/*:/opt/apache-drill-1.14.0/jars/3rdparty/*:/opt/apache-drill-1.14.0/jars/classb/*:/opt/apache-drill-1.14.0/jars/3rdparty/linux/*
org.apache.drill.exec.server.Drillbit
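A quick way to sanity-check the flags above is to add up the configured memory ceilings and look for repeated options. The Python sketch below is illustrative only: the helper names (`size_in_bytes`, `memory_budget`, `duplicated`) are hypothetical, the flag list is copied from the command line with the classpath and main class left out, and it assumes the node's 16 GB can be compared directly with the JVM's binary units. `-XX:MaxDirectMemorySize` is an upper limit rather than a reservation, so the total is a worst case, not actual usage.

```python
import re
from collections import Counter

# Flags copied from the command line above (classpath and main class omitted).
JVM_FLAGS = """
-Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
-Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
-XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
-Duser.timezone=UTC -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=27017
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
""".split()

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
NODE_RAM_GIB = 16  # assumption: the stated 16 GB is treated as 16 GiB


def size_in_bytes(text):
    """Convert a JVM size such as '6G' or '1024m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.lower(), 1)


def memory_budget(flags):
    """Collect the configured memory ceilings (heap, direct, code cache)."""
    prefixes = {
        "-Xmx": "heap",
        "-XX:MaxDirectMemorySize=": "direct",
        "-XX:ReservedCodeCacheSize=": "code_cache",
    }
    budget = {}
    for flag in flags:
        for prefix, name in prefixes.items():
            if flag.startswith(prefix):
                budget[name] = size_in_bytes(flag[len(prefix):])
    return budget


def duplicated(flags):
    """Return the flags that appear more than once, sorted."""
    return sorted(f for f, n in Counter(flags).items() if n > 1)


budget = memory_budget(JVM_FLAGS)
total_gib = sum(budget.values()) / 1024 ** 3
print(f"worst-case JVM memory: {total_gib:.0f} GiB on a {NODE_RAM_GIB} GiB node")
print("repeated flags:", duplicated(JVM_FLAGS))
```

With these flags the heap (6G), direct memory limit (13G), and code cache (1024m) add up to more than the node's physical RAM, and `-Dproperty=value` and `-Duser.timezone=UTC` each appear more than once.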
We see Apache Drill reaching 100% CPU when we run 30 queries per
second. All of the queries are very simple, without any aggregation.
Also, each Apache Drill query is converted into 100 queries in
MongoDB. 97% of those are find(1) and count() MongoDB queries. We are not sure
why they are triggered.
We also tried adding the JVM parameter "-XX:+UseStringDeduplication". After
adding it, the load became uneven. Are there any other JVM
parameters that we can add to improve Drill performance?
Even when there are no calls to Drill, the heap memory usage
fluctuates: it goes up to around 70% (~4 GB) and then drops back to around 1 GB.
Thanks,
Bala