Discussion:
Time for a fun Query Question
John Omernik
2018-12-04 16:05:39 UTC
Permalink
Time for a fun question: How to be clever with queries!


I have a table that takes readings from an IoT type device


opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2


Pretty simple. Essentially, each "device" (opt_id) reports throughout the
day, what it's cumulative value (eday), at that time, for that day

These devices belong to groups which is it's own column (opt_string)


So if I run a query like this:

select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC

I get:

opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250

Which shows me the last time each device reported on a day, and what that
report time was, Great.


So what if I wanted to show the values through the day? Is there a way from
a query perpective, to perhaps use Windowing or something to show results
like the above raw table, but instead of just having the random times
returned, group by opt_id and get the max_ts as I am doing, but do that for
each say 5 minute, or 30 minute, or 60 minute block in a day?

The best example to explain what I am looking for is using 3 groups of 1
hour blocks: I could use a UNION like this:

select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)


Where I would get something like:

index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25


That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?

Is there a way to get these types of results without the Union and being
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!

John
John Omernik
2018-12-04 16:39:58 UTC
Permalink
So one approach I have is this:



select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday, b.ts_win as
rpt_ts_win, b.ts
FROM
(
select distinct ds, opt_string, opt_id, concat(substr(ts, 1, 4), '0:00') as
ts_win
from dfs.prod.view_optdata where ds = '2018-12-02' order by ds, ts_win,
opt_string
) a JOIN
(
select opt_id, ds, opt_string, concat(substr(ts, 1, 4), '0:00') as ts_win,
ts, eday
from dfs.prod.view_optdata where ds = '2018-12-02'
) b ON a.ds = b.ds and a.opt_string = b.opt_string
WHERE b.ts <= a.ts_win
) c
group by ds, ts_win, opt_string, opt_id
order by ds, ts_win ASC, opt_string ASC, opt_id ASC

This breaks up my day into 10 minute intervals (based on string slicing)
and then shows me what's reported before that... I think :)

This is limited in that I can do it only in time intervals that can be
substringed out of a time field in string format. Still open to other
ideas :)
Post by John Omernik
Time for a fun question: How to be clever with queries!
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout the
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what that
report time was, Great.
So what if I wanted to show the values through the day? Is there a way
from a query perpective, to perhaps use Windowing or something to show
results like the above raw table, but instead of just having the random
times returned, group by opt_id and get the max_ts as I am doing, but do
that for each say 5 minute, or 30 minute, or 60 minute block in a day?
The best example to explain what I am looking for is using 3 groups of 1
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and being
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John
Ted Dunning
2018-12-04 21:02:39 UTC
Permalink
I would parse the timestamp into seconds since epoch. Then divide by use
floor(ts/600) as the key to group on 10 minute boundaries.

This works because:

- all timezones are multiples of 10 minutes away from UTC

- all leap seconds are hidden in the seconds since epoch conversions

- the epoch was on an even 10 minute interval

These all apply for any time interval that is a multiple of 10 minutes as
well (30 minutes, hourly, daily, weekly). More advanced calendar
manipulation is required for things like monthly grouping with timezone and
DST awareness. String hacking might work if you can take on some important
assumptions. Putting the calendar manipulations into a UDF wouldn't be hard.
Post by John Omernik
select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday, b.ts_win as
rpt_ts_win, b.ts
FROM
(
select distinct ds, opt_string, opt_id, concat(substr(ts, 1, 4), '0:00') as
ts_win
from dfs.prod.view_optdata where ds = '2018-12-02' order by ds, ts_win,
opt_string
) a JOIN
(
select opt_id, ds, opt_string, concat(substr(ts, 1, 4), '0:00') as ts_win,
ts, eday
from dfs.prod.view_optdata where ds = '2018-12-02'
) b ON a.ds = b.ds and a.opt_string = b.opt_string
WHERE b.ts <= a.ts_win
) c
group by ds, ts_win, opt_string, opt_id
order by ds, ts_win ASC, opt_string ASC, opt_id ASC
This breaks up my day into 10 minute intervals (based on string slicing)
and then shows me what's reported before that... I think :)
This is limited in that I can do it only in time intervals that can be
substringed out of a time field in string format. Still open to other
ideas :)
Post by John Omernik
Time for a fun question: How to be clever with queries!
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout the
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what that
report time was, Great.
So what if I wanted to show the values through the day? Is there a way
from a query perpective, to perhaps use Windowing or something to show
results like the above raw table, but instead of just having the random
times returned, group by opt_id and get the max_ts as I am doing, but do
that for each say 5 minute, or 30 minute, or 60 minute block in a day?
The best example to explain what I am looking for is using 3 groups of 1
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as max_ts,
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and being
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John
Joel Pfaff
2018-12-05 16:01:17 UTC
Permalink
Hello,

Since I had to maintain a Timezone library a while ago, I discovered that
not all timezones are multiple of 10 minutes away from UTC, a small number
of them are using 15 mins steps.
See https://www.timeanddate.com/time/time-zones-interesting.html

And not all timezone are less than 12 hours away from UTC:
https://www.timeanddate.com/worldclock/kiribati/kiritimati (-14h from UTC).

And some places have more than one DST period per year as well.

Maintaining this lib was definitively a funny time.

Regards, Joel
Post by Ted Dunning
I would parse the timestamp into seconds since epoch. Then divide by use
floor(ts/600) as the key to group on 10 minute boundaries.
- all timezones are multiples of 10 minutes away from UTC
- all leap seconds are hidden in the seconds since epoch conversions
- the epoch was on an even 10 minute interval
These all apply for any time interval that is a multiple of 10 minutes as
well (30 minutes, hourly, daily, weekly). More advanced calendar
manipulation is required for things like monthly grouping with timezone and
DST awareness. String hacking might work if you can take on some important
assumptions. Putting the calendar manipulations into a UDF wouldn't be hard.
Post by John Omernik
select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday, b.ts_win as
rpt_ts_win, b.ts
FROM
(
select distinct ds, opt_string, opt_id, concat(substr(ts, 1, 4), '0:00')
as
Post by John Omernik
ts_win
from dfs.prod.view_optdata where ds = '2018-12-02' order by ds, ts_win,
opt_string
) a JOIN
(
select opt_id, ds, opt_string, concat(substr(ts, 1, 4), '0:00') as
ts_win,
Post by John Omernik
ts, eday
from dfs.prod.view_optdata where ds = '2018-12-02'
) b ON a.ds = b.ds and a.opt_string = b.opt_string
WHERE b.ts <= a.ts_win
) c
group by ds, ts_win, opt_string, opt_id
order by ds, ts_win ASC, opt_string ASC, opt_id ASC
This breaks up my day into 10 minute intervals (based on string slicing)
and then shows me what's reported before that... I think :)
This is limited in that I can do it only in time intervals that can be
substringed out of a time field in string format. Still open to other
ideas :)
Post by John Omernik
Time for a fun question: How to be clever with queries!
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout
the
Post by John Omernik
Post by John Omernik
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what
that
Post by John Omernik
Post by John Omernik
report time was, Great.
So what if I wanted to show the values through the day? Is there a way
from a query perpective, to perhaps use Windowing or something to show
results like the above raw table, but instead of just having the random
times returned, group by opt_id and get the max_ts as I am doing, but
do
Post by John Omernik
Post by John Omernik
that for each say 5 minute, or 30 minute, or 60 minute block in a day?
The best example to explain what I am looking for is using 3 groups of
1
Post by John Omernik
Post by John Omernik
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
group by ds, opt_id
order by opt_id ASC)
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that Union
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and
being
Post by John Omernik
Post by John Omernik
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John
Ted Dunning
2018-12-05 23:52:00 UTC
Permalink
Argh ...

I didn't want to know that. Just when I thought things made a slight bit of
sense. Happily I have never known of any users in Eucla or the Chatham
islands. But Nepal doing that hurts.

I will have to see if I can remove that knowledge from my head.
Post by Joel Pfaff
Hello,
Since I had to maintain a Timezone library a while ago, I discovered that
not all timezones are multiple of 10 minutes away from UTC, a small number
of them are using 15 mins steps.
See https://www.timeanddate.com/time/time-zones-interesting.html
https://www.timeanddate.com/worldclock/kiribati/kiritimati (-14h from UTC).
And some places have more than one DST period per year as well.
Maintaining this lib was definitively a funny time.
Regards, Joel
Post by Ted Dunning
I would parse the timestamp into seconds since epoch. Then divide by use
floor(ts/600) as the key to group on 10 minute boundaries.
- all timezones are multiples of 10 minutes away from UTC
- all leap seconds are hidden in the seconds since epoch conversions
- the epoch was on an even 10 minute interval
These all apply for any time interval that is a multiple of 10 minutes as
well (30 minutes, hourly, daily, weekly). More advanced calendar
manipulation is required for things like monthly grouping with timezone
and
Post by Ted Dunning
DST awareness. String hacking might work if you can take on some
important
Post by Ted Dunning
assumptions. Putting the calendar manipulations into a UDF wouldn't be hard.
Post by John Omernik
select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday, b.ts_win as
rpt_ts_win, b.ts
FROM
(
select distinct ds, opt_string, opt_id, concat(substr(ts, 1, 4),
'0:00')
Post by Ted Dunning
as
Post by John Omernik
ts_win
from dfs.prod.view_optdata where ds = '2018-12-02' order by ds, ts_win,
opt_string
) a JOIN
(
select opt_id, ds, opt_string, concat(substr(ts, 1, 4), '0:00') as
ts_win,
Post by John Omernik
ts, eday
from dfs.prod.view_optdata where ds = '2018-12-02'
) b ON a.ds = b.ds and a.opt_string = b.opt_string
WHERE b.ts <= a.ts_win
) c
group by ds, ts_win, opt_string, opt_id
order by ds, ts_win ASC, opt_string ASC, opt_id ASC
This breaks up my day into 10 minute intervals (based on string
slicing)
Post by Ted Dunning
Post by John Omernik
and then shows me what's reported before that... I think :)
This is limited in that I can do it only in time intervals that can be
substringed out of a time field in string format. Still open to other
ideas :)
Post by John Omernik
Time for a fun question: How to be clever with queries!
I have a table that takes readings from an IoT type device
opt_id dt ts eday opt_string
2.1.1 2018-12-01 10:43:43 12.5 1
2.1.2 2018-12-01 10:32:43 5.5 1
2.1.3 2018-12-01 10:55:02 20.5 1
2.2.1 2018-12-01 10:43:43 12.2 2
2.2.2 2018-12-01 10:12:56 43.7 2
2.2.3 2018-12-01 10:50:23 12.2 2
2.1.1 2018-12-01 13:43:43 45.5 1
2.1.2 2018-12-01 13:32:43 20.5 1
2.1.3 2018-12-01 13:55:02 45.5 1
2.2.1 2018-12-01 13:43:43 30.2 2
2.2.2 2018-12-01 13:12:56 81.7 2
2.2.3 2018-12-01 13:50:23 50.2 2
Pretty simple. Essentially, each "device" (opt_id) reports throughout
the
Post by John Omernik
Post by John Omernik
day, what it's cumulative value (eday), at that time, for that day
These devices belong to groups which is it's own column (opt_string)
select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
max(Eday) as max_Eday
from `mydata`
where ds = '2018-12-01'
group by ds, opt_id, order by opt_id ASC
opt_id ds opt_string max_ts max_Eday
2.1.1 2018-12-02 1 15:56:33 181.000
2.1.3 2018-12-02 1 15:47:02 162.500
2.1.4 2018-12-02 1 15:54:19 122.250
2.1.5 2018-12-02 1 15:40:15 132.000
2.1.6 2018-12-02 1 15:45:04 183.250
2.1.7 2018-12-02 1 15:41:15 161.000
2.1.8 2018-12-02 1 15:42:10 166.250
2.1.9 2018-12-02 1 15:39:23 126.250
2.2.1 2018-12-02 2 16:02:54 163.250
2.2.2 2018-12-02 2 15:57:39 178.750
2.2.3 2018-12-02 2 15:46:53 148.250
2.2.4 2018-12-02 2 15:40:27 163.500
2.2.5 2018-12-02 2 15:50:49 147.500
2.2.6 2018-12-02 2 15:40:24 156.000
2.2.7 2018-12-02 2 15:43:00 104.750
2.2.8 2018-12-02 2 15:41:55 170.750
2.2.9 2018-12-02 2 15:41:04 127.250
Which shows me the last time each device reported on a day, and what
that
Post by John Omernik
Post by John Omernik
report time was, Great.
So what if I wanted to show the values through the day? Is there a
way
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
from a query perpective, to perhaps use Windowing or something to
show
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
results like the above raw table, but instead of just having the
random
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
times returned, group by opt_id and get the max_ts as I am doing, but
do
Post by John Omernik
Post by John Omernik
that for each say 5 minute, or 30 minute, or 60 minute block in a
day?
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
The best example to explain what I am looking for is using 3 groups
of
Post by Ted Dunning
1
Post by John Omernik
Post by John Omernik
select * from (
select opt_id, ds, max(opt_string) as opt_string, '10:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <=
'10:00:00'
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
group by ds, opt_id
order by opt_id ASC
)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '11:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <=
'11:00:00'
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
group by ds, opt_id
order by opt_id ASC)
UNION
(select opt_id, ds, max(opt_string) as opt_string, '12:00:00' as
max_ts,
Post by John Omernik
Post by John Omernik
max(Eday) as max_Eday
from dfs.prod.view_optdata where ds = '2018-12-02' and ts <=
'12:00:00'
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
group by ds, opt_id
order by opt_id ASC)
index opt_id ds opt_string max_ts max_Eday
0 2.1.1 12/2/18 1 10:00:00 18
1 2.1.3 12/2/18 1 10:00:00 15.75
2 2.1.4 12/2/18 1 10:00:00 12.5
3 2.1.5 12/2/18 1 10:00:00 12.75
4 2.1.6 12/2/18 1 10:00:00 18.75
5 2.1.7 12/2/18 1 10:00:00 7.25
6 2.1.8 12/2/18 1 10:00:00 16.25
7 2.1.9 12/2/18 1 10:00:00 9.75
8 2.2.1 12/2/18 2 10:00:00 18
9 2.2.2 12/2/18 2 10:00:00 15.25
10 2.2.3 12/2/18 2 10:00:00 2.75
11 2.2.4 12/2/18 2 10:00:00 6.5
12 2.2.5 12/2/18 2 10:00:00 16.25
13 2.2.6 12/2/18 2 10:00:00 11.25
14 2.2.7 12/2/18 2 10:00:00 13.5
15 2.2.8 12/2/18 2 10:00:00 13.75
16 2.2.9 12/2/18 2 10:00:00 14.5
17 2.1.1 12/2/18 1 11:00:00 41
18 2.1.3 12/2/18 1 11:00:00 32
19 2.1.4 12/2/18 1 11:00:00 25
20 2.1.5 12/2/18 1 11:00:00 26.75
21 2.1.6 12/2/18 1 11:00:00 42.25
22 2.1.7 12/2/18 1 11:00:00 27
23 2.1.8 12/2/18 1 11:00:00 35.75
24 2.1.9 12/2/18 1 11:00:00 22
25 2.2.1 12/2/18 2 11:00:00 34.5
26 2.2.2 12/2/18 2 11:00:00 42.5
27 2.2.3 12/2/18 2 11:00:00 19
28 2.2.4 12/2/18 2 11:00:00 29.25
29 2.2.5 12/2/18 2 11:00:00 32.25
30 2.2.6 12/2/18 2 11:00:00 30.25
31 2.2.7 12/2/18 2 11:00:00 26.75
32 2.2.8 12/2/18 2 11:00:00 33.5
33 2.2.9 12/2/18 2 11:00:00 28.5
34 2.1.1 12/2/18 1 12:00:00 81.5
35 2.1.3 12/2/18 1 12:00:00 62.75
36 2.1.4 12/2/18 1 12:00:00 48.5
37 2.1.5 12/2/18 1 12:00:00 48.25
38 2.1.6 12/2/18 1 12:00:00 80.5
39 2.1.7 12/2/18 1 12:00:00 62
40 2.1.8 12/2/18 1 12:00:00 66.25
41 2.1.9 12/2/18 1 12:00:00 43.75
42 2.2.1 12/2/18 2 12:00:00 62.75
43 2.2.2 12/2/18 2 12:00:00 83.5
44 2.2.3 12/2/18 2 12:00:00 49
45 2.2.4 12/2/18 2 12:00:00 66.25
46 2.2.5 12/2/18 2 12:00:00 56.5
47 2.2.6 12/2/18 2 12:00:00 68
48 2.2.7 12/2/18 2 12:00:00 43
49 2.2.8 12/2/18 2 12:00:00 72.5
50 2.2.9 12/2/18 2 12:00:00 49.25
That's great, but if I wanted to say all the hours in a day, that
Union
Post by Ted Dunning
Post by John Omernik
Post by John Omernik
would get ugly, or what about 30 minute blocks? 5 minute blocks?
Is there a way to get these types of results without the Union and
being
Post by John Omernik
Post by John Omernik
explicit about the times? Any advice, including "Look at X" would be
helpful. Or if you need a diversion, dig on in!
John
Loading...