Discussion:
UNORDERED_RECEIVER taking 70% of query time
j***@accenture.com
2017-06-02 05:18:07 UTC
Hi,

I am running a simple query that joins two Parquet files. It takes around 3-4 seconds, and I noticed that 70% of the time is spent in UNORDERED_RECEIVER.

The sample query is:

select sum(sales),week from dfs.`C:\parquet-location\F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet` where model_component_id in(
select model_component_id from dfs.`C:\parquet-location\poc48k.parquet`) group by week


Can we somehow reduce unordered receiver time?

Please find below a screenshot of the visualized plan.

[Screenshot of the visualized plan; image not preserved]

________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________

www.accenture.com
Abhishek Girish
2017-06-02 05:29:57 UTC
Attachment hasn't come through. Can you upload the query profile to some
cloud storage and share a link to it?

Also, please share details on how large your dataset is, number of
Drillbits, memory and other configurations.
j***@accenture.com
2017-06-02 07:13:21 UTC
Hi,

Please find the attached query profile.

I am running Drill in local mode on my laptop, with the default memory allocation for Apache Drill.

Let me know if you are not able to find the attachment.

I am also sending the file in RAR format.

Regards,
Jasbir Singh


Kunal Khatua
2017-06-02 18:29:49 UTC
Hi Jasbir


I don't think the Apache mailing lists allow attachments, except maybe text files. (The txt file made it through.)

In your Operator Profile, you'll see two columns: %Fragment Time and %Query Time.

Hovering your mouse over those table headers should show a description of each.

%Fragment Time is the fraction of time spent by the threads of that Major Fragment on a specific operator. It tells you which operator the threads of a major fragment spent the most time on.

%Query Time is the fraction of time spent by the threads of ALL the Major Fragments on a specific operator. It tells you which operator, implicitly, consumed the most CPU resources.

From the latter, it appears that the HashJoin (03-xx-04) and Parquet Scan (03-xx-06) are the biggest bottlenecks. The unordered receiver is not the bottleneck in the query.
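
To illustrate why the two columns can disagree, here is a worked example with made-up numbers (they are assumptions for illustration, not values from the profile in this thread). Suppose the threads of one major fragment spend 100 ms in total across its operators, 70 ms of it in UNORDERED_RECEIVER, while the threads of all major fragments together spend 2000 ms:

```latex
% Hypothetical timings: 70 ms in the receiver, 100 ms for its fragment, 2000 ms for the whole query
\[
\%\text{Fragment Time} = \frac{70\ \text{ms}}{100\ \text{ms}} = 70\%,
\qquad
\%\text{Query Time} = \frac{70\ \text{ms}}{2000\ \text{ms}} = 3.5\%
\]
```

With numbers like these, an operator can dominate its own small fragment and still account for only a sliver of the whole query.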


~ Kunal



j***@accenture.com
2017-06-03 12:51:20 UTC
Thanks Kunal.

The query that I am running on Apache Drill is:

select sum(sales), week from dfs.`C:\parquet-location\F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet` where model_component_id in(select model_component_id from dfs.`C:\parquet-location\poc48k.parquet`) group by week

The record counts and sizes of my two Parquet files are:

F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet: approx. 4,000,000 records, 7 MB
poc48k.parquet: approx. 48,000 records, 1.68 MB

From the query profile, I can see that PARQUET_ROW_GROUP_SCAN takes most of the % query time. With data of this size, is it expected to take this long, or is there any way we can reduce it?

Regards,
Jasbir Singh





Kunal Khatua
2017-06-04 06:45:49 UTC
I suspect you are running this on a laptop or something else with a small number of cores (2 or 4, perhaps?). That would explain the


You need to try and reduce the parallelization, but looking at the profile, you're already pretty low.


If you are using Drill 1.9+, the Async Parquet Reader can be disabled (or tweaked) to reduce the number of active threads. This might put less strain on your system.
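
A minimal sketch of what that could look like, assuming the option names documented for the asynchronous Parquet reader in Drill 1.9 and later (check them against sys.options on your version before relying on them). The values shown are illustrative, not recommendations:

```sql
-- List the Parquet reader options available in this Drill version.
SELECT * FROM sys.options WHERE name LIKE 'store.parquet.reader%';

-- Turn off the asynchronous page reader for the current session only.
ALTER SESSION SET `store.parquet.reader.pagereader.async` = false;

-- Optionally cap the number of minor fragments per node (illustrative value).
ALTER SESSION SET `planner.width.max_per_node` = 2;
```

Re-running the query afterwards and comparing the new profile with the old one would show whether either setting makes a difference on this machine.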


Kunal
