Postgresql generate date series (performance)
up vote
1
down vote
favorite
Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series
function. In essence, it does not accord for the day of the month
correctly.
I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly
combined with a start date of 2018-01-31
or 2018-01-30
as shown in the output below.
I created a solution and wanted to post this here for others to use as I could not find any other solution.
However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series
when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?
TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.
Built-in Output
select generate_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')::date
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari
.
Expected Output
As a result of this issue I created a custom function:
create or replace function generate_date_series(
startsOn date,
endsOn date,
frequency interval)
returns setof date as $$
declare
intervalOn date := startsOn;
count int := 1;
begin
while intervalOn <= endsOn loop
return next intervalOn;
intervalOn := startsOn + (count * frequency);
count := count + 1;
end loop;
return;
end;
$$ language plpgsql immutable;
select generate_date_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
Performance comparison
No matter what date range is provided, the built-in generate_series
has a performance of 2ms on average for:
select generate_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
while the custom function generate_date_series
has a performance of 120ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
Question
In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series
will attain the same performance. Although, I do wonder what causes the difference.
Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?
Is there a better way to implement generate_date_series
that performs as well as the built-in generate_series
?
Improved implementation without loops
(derived from the answer of @eurotrash)
create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
select (startsOn + (frequency * count))::date
from (
select (row_number() over ()) - 1 as count
from generate_series(startsOn, endsOn, frequency)
) series
$$ language sql immutable;
with the improved implementation, the generate_date_series
function has a performance of 45ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series
function twice.
sql postgresql dateinterval generate-series
add a comment |
up vote
1
down vote
favorite
Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series
function. In essence, it does not accord for the day of the month
correctly.
I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly
combined with a start date of 2018-01-31
or 2018-01-30
as shown in the output below.
I created a solution and wanted to post this here for others to use as I could not find any other solution.
However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series
when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?
TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.
Built-in Output
select generate_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')::date
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari
.
Expected Output
As a result of this issue I created a custom function:
create or replace function generate_date_series(
startsOn date,
endsOn date,
frequency interval)
returns setof date as $$
declare
intervalOn date := startsOn;
count int := 1;
begin
while intervalOn <= endsOn loop
return next intervalOn;
intervalOn := startsOn + (count * frequency);
count := count + 1;
end loop;
return;
end;
$$ language plpgsql immutable;
select generate_date_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
Performance comparison
No matter what date range is provided, the built-in generate_series
has a performance of 2ms on average for:
select generate_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
while the custom function generate_date_series
has a performance of 120ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
Question
In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series
will attain the same performance. Although, I do wonder what causes the difference.
Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?
Is there a better way to implement generate_date_series
that performs as well as the built-in generate_series
?
Improved implementation without loops
(derived from the answer of @eurotrash)
create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
select (startsOn + (frequency * count))::date
from (
select (row_number() over ()) - 1 as count
from generate_series(startsOn, endsOn, frequency)
) series
$$ language sql immutable;
with the improved implementation, the generate_date_series
function has a performance of 45ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series
function twice.
sql postgresql dateinterval generate-series
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series
function. In essence, it does not accord for the day of the month
correctly.
I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly
combined with a start date of 2018-01-31
or 2018-01-30
as shown in the output below.
I created a solution and wanted to post this here for others to use as I could not find any other solution.
However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series
when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?
TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.
Built-in Output
select generate_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')::date
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari
.
Expected Output
As a result of this issue I created a custom function:
create or replace function generate_date_series(
startsOn date,
endsOn date,
frequency interval)
returns setof date as $$
declare
intervalOn date := startsOn;
count int := 1;
begin
while intervalOn <= endsOn loop
return next intervalOn;
intervalOn := startsOn + (count * frequency);
count := count + 1;
end loop;
return;
end;
$$ language plpgsql immutable;
select generate_date_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
Performance comparison
No matter what date range is provided, the built-in generate_series
has a performance of 2ms on average for:
select generate_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
while the custom function generate_date_series
has a performance of 120ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
Question
In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series
will attain the same performance. Although, I do wonder what causes the difference.
Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?
Is there a better way to implement generate_date_series
that performs as well as the built-in generate_series
?
Improved implementation without loops
(derived from the answer of @eurotrash)
create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
select (startsOn + (frequency * count))::date
from (
select (row_number() over ()) - 1 as count
from generate_series(startsOn, endsOn, frequency)
) series
$$ language sql immutable;
with the improved implementation, the generate_date_series
function has a performance of 45ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series
function twice.
sql postgresql dateinterval generate-series
Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series
function. In essence, it does not accord for the day of the month
correctly.
I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly
combined with a start date of 2018-01-31
or 2018-01-30
as shown in the output below.
I created a solution and wanted to post this here for others to use as I could not find any other solution.
However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series
when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?
TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.
Built-in Output
select generate_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')::date
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari
.
Expected Output
As a result of this issue I created a custom function:
create or replace function generate_date_series(
startsOn date,
endsOn date,
frequency interval)
returns setof date as $$
declare
intervalOn date := startsOn;
count int := 1;
begin
while intervalOn <= endsOn loop
return next intervalOn;
intervalOn := startsOn + (count * frequency);
count := count + 1;
end loop;
return;
end;
$$ language plpgsql immutable;
select generate_date_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')
as frequency;
generates:
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
Performance comparison
No matter what date range is provided, the built-in generate_series
has a performance of 2ms on average for:
select generate_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
while the custom function generate_date_series
has a performance of 120ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
Question
In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series
will attain the same performance. Although, I do wonder what causes the difference.
Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?
Is there a better way to implement generate_date_series
that performs as well as the built-in generate_series
?
Improved implementation without loops
(derived from the answer of @eurotrash)
create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
select (startsOn + (frequency * count))::date
from (
select (row_number() over ()) - 1 as count
from generate_series(startsOn, endsOn, frequency)
) series
$$ language sql immutable;
with the improved implementation, the generate_date_series
function has a performance of 45ms on average for:
select generate_date_series(date '1900-01-01',
date '10000-5-31',
interval '1 month')::date
as frequency;
The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series
function twice.
sql postgresql dateinterval generate-series
sql postgresql dateinterval generate-series
edited Nov 11 at 12:14
asked Nov 10 at 13:55
chvndb
14519
14519
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
up vote
3
down vote
accepted
Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.
CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;
The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.
On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call togenerate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
add a comment |
up vote
1
down vote
REVISED SOLUTION
This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years
where February have 29 days:
SELECT t.day_of_month
FROM (
SELECT ds.day_of_month
, date_part('day', ds.day_of_month) AS day
, date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
FROM (
SELECT generate_series( date '1900-01-01',
date '10000-12-31',
INTERVAL '1 day')::DATE as day_of_month
) AS ds
) AS t
--> REMEMBER to change the day at both places below (eg. 31)
WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)
Resulting output:
Please ensure you change the day on BOTH the RED numbers.
The output data:
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
add a comment |
up vote
1
down vote
You may use date_trunc
and add a month to the output ofgenerate_series
, the performance should be almost similar.
SELECT
(date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
FROM
generate_series(
DATE '2018-01-31', DATE '2018-05-31',
interval '1 MONTH'
) AS dt
Demo
Test
knayak=# select generate_series(date '2018-01-31',
knayak(# date '2018-05-31',
knayak(# interval '1 month')::date
knayak-# as frequency;
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
(5 rows)
Time: 0.303 ms
knayak=#
knayak=#
knayak=# SELECT
knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
knayak-# FROM
knayak-# generate_series(
knayak(# DATE '2018-01-31', DATE '2018-05-31',
knayak(# interval '1 MONTH'
knayak(# ) AS dt
knayak-# ;
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
(5 rows)
Time: 0.425 ms
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
add a comment |
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.
CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;
The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.
On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call togenerate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
add a comment |
up vote
3
down vote
accepted
Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.
CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;
The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.
On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call togenerate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
add a comment |
up vote
3
down vote
accepted
up vote
3
down vote
accepted
Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.
CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;
The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.
On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.
CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
RETURNS SETOF DATE AS
$BODY$
SELECT (starts_on + (frequency * g))::DATE
FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
$BODY$
LANGUAGE SQL IMMUTABLE;
The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.
On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.
answered Nov 10 at 19:06
eurotrash
2,3751625
2,3751625
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call togenerate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
add a comment |
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call togenerate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to
generate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.– chvndb
Nov 11 at 9:56
Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to
generate_series
did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.– chvndb
Nov 11 at 9:56
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48
add a comment |
up vote
1
down vote
REVISED SOLUTION
This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years
where February have 29 days:
SELECT t.day_of_month
FROM (
SELECT ds.day_of_month
, date_part('day', ds.day_of_month) AS day
, date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
FROM (
SELECT generate_series( date '1900-01-01',
date '10000-12-31',
INTERVAL '1 day')::DATE as day_of_month
) AS ds
) AS t
--> REMEMBER to change the day at both places below (eg. 31)
WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)
Resulting output:
Please ensure you change the day on BOTH the RED numbers.
The output data:
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
add a comment |
up vote
1
down vote
REVISED SOLUTION
This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years
where February have 29 days:
SELECT t.day_of_month
FROM (
SELECT ds.day_of_month
, date_part('day', ds.day_of_month) AS day
, date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
FROM (
SELECT generate_series( date '1900-01-01',
date '10000-12-31',
INTERVAL '1 day')::DATE as day_of_month
) AS ds
) AS t
--> REMEMBER to change the day at both places below (eg. 31)
WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)
Resulting output:
Please ensure you change the day on BOTH the RED numbers.
The output data:
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
add a comment |
up vote
1
down vote
up vote
1
down vote
REVISED SOLUTION
This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years
where February have 29 days:
SELECT t.day_of_month
FROM (
SELECT ds.day_of_month
, date_part('day', ds.day_of_month) AS day
, date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
FROM (
SELECT generate_series( date '1900-01-01',
date '10000-12-31',
INTERVAL '1 day')::DATE as day_of_month
) AS ds
) AS t
--> REMEMBER to change the day at both places below (eg. 31)
WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)
Resulting output:
Please ensure you change the day on BOTH the RED numbers.
The output data:
REVISED SOLUTION
This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years
where February have 29 days:
SELECT t.day_of_month
FROM (
SELECT ds.day_of_month
, date_part('day', ds.day_of_month) AS day
, date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
FROM (
SELECT generate_series( date '1900-01-01',
date '10000-12-31',
INTERVAL '1 day')::DATE as day_of_month
) AS ds
) AS t
--> REMEMBER to change the day at both places below (eg. 31)
WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)
Resulting output:
Please ensure you change the day on BOTH the RED numbers.
The output data:
edited Nov 10 at 15:51
answered Nov 10 at 14:15
Joseph Lee
2,0591223
2,0591223
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
add a comment |
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
Revised the solution.
– Joseph Lee
Nov 10 at 15:51
add a comment |
up vote
1
down vote
You may use date_trunc
and add a month to the output ofgenerate_series
, the performance should be almost similar.
SELECT
(date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
FROM
generate_series(
DATE '2018-01-31', DATE '2018-05-31',
interval '1 MONTH'
) AS dt
Demo
Test
knayak=# select generate_series(date '2018-01-31',
knayak(# date '2018-05-31',
knayak(# interval '1 month')::date
knayak-# as frequency;
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
(5 rows)
Time: 0.303 ms
knayak=#
knayak=#
knayak=# SELECT
knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
knayak-# FROM
knayak-# generate_series(
knayak(# DATE '2018-01-31', DATE '2018-05-31',
knayak(# interval '1 MONTH'
knayak(# ) AS dt
knayak-# ;
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
(5 rows)
Time: 0.425 ms
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
add a comment |
up vote
1
down vote
You may use date_trunc
and add a month to the output ofgenerate_series
, the performance should be almost similar.
SELECT
(date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
FROM
generate_series(
DATE '2018-01-31', DATE '2018-05-31',
interval '1 MONTH'
) AS dt
Demo
Test
knayak=# select generate_series(date '2018-01-31',
knayak(# date '2018-05-31',
knayak(# interval '1 month')::date
knayak-# as frequency;
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
(5 rows)
Time: 0.303 ms
knayak=#
knayak=#
knayak=# SELECT
knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
knayak-# FROM
knayak-# generate_series(
knayak(# DATE '2018-01-31', DATE '2018-05-31',
knayak(# interval '1 MONTH'
knayak(# ) AS dt
knayak-# ;
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
(5 rows)
Time: 0.425 ms
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
add a comment |
up vote
1
down vote
up vote
1
down vote
You may use date_trunc
and add a month to the output ofgenerate_series
, the performance should be almost similar.
SELECT
(date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
FROM
generate_series(
DATE '2018-01-31', DATE '2018-05-31',
interval '1 MONTH'
) AS dt
Demo
Test
knayak=# select generate_series(date '2018-01-31',
knayak(# date '2018-05-31',
knayak(# interval '1 month')::date
knayak-# as frequency;
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
(5 rows)
Time: 0.303 ms
knayak=#
knayak=#
knayak=# SELECT
knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
knayak-# FROM
knayak-# generate_series(
knayak(# DATE '2018-01-31', DATE '2018-05-31',
knayak(# interval '1 MONTH'
knayak(# ) AS dt
knayak-# ;
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
(5 rows)
Time: 0.425 ms
You may use date_trunc
and add a month to the output ofgenerate_series
, the performance should be almost similar.
SELECT
(date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
FROM
generate_series(
DATE '2018-01-31', DATE '2018-05-31',
interval '1 MONTH'
) AS dt
Demo
Test
knayak=# select generate_series(date '2018-01-31',
knayak(# date '2018-05-31',
knayak(# interval '1 month')::date
knayak-# as frequency;
frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28
(5 rows)
Time: 0.303 ms
knayak=#
knayak=#
knayak=# SELECT
knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
knayak-# FROM
knayak-# generate_series(
knayak(# DATE '2018-01-31', DATE '2018-05-31',
knayak(# interval '1 MONTH'
knayak(# ) AS dt
knayak-# ;
frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
(5 rows)
Time: 0.425 ms
edited Nov 10 at 18:50
answered Nov 10 at 18:44
Kaushik Nayak
15.7k31128
15.7k31128
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
add a comment |
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239669%2fpostgresql-generate-date-series-performance%23new-answer', 'question_page');
}
);
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password