John Omernik
2018-12-04 16:05:39 UTC
Time for a fun question: How to be clever with queries!
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout the
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
So if I run a query like this:
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
I get:
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what that
report time was, Great.
So what if I wanted to show the values through the day? Is there a way from
a query perpective, to perhaps use Windowing or something to show results
like the above raw table, but instead of just having the random times
returned, group by opt_id and get the max_ts as I am doing, but do that for
each say 5 minute, or 30 minute, or 60 minute block in a day?
The best example to explain what I am looking for is using 3 groups of 1
hour blocks: I could use a UNION like this:
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)
Where I would get something like:
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and being
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout the
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
So if I run a query like this:
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
I get:
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what that
report time was, Great.
So what if I wanted to show the values through the day? Is there a way from
a query perpective, to perhaps use Windowing or something to show results
like the above raw table, but instead of just having the random times
returned, group by opt_id and get the max_ts as I am doing, but do that for
each say 5 minute, or 30 minute, or 60 minute block in a day?
The best example to explain what I am looking for is using 3 groups of 1
hour blocks: I could use a UNION like this:
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)
Where I would get something like:
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and being
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John