Time series data is data collected at regular time intervals. It is used to make long-term forecasts and to observe time-dependent trends or seasonality. It is very useful and commonly used in financial institutions, retail businesses, real estate, and many other types of businesses. But what if you have the data and the dates were not recorded? What if you need to reuse last year's data this year to generate an experimental report? Or you may need to use last quarter's data in this quarter. Sometimes this is required for research, analysis, or forecasting purposes. But this year's or this quarter's holidays and weekends will be different.
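In this article, I will explain how to generate a series of dates for situations like these.
Luckily, Pandas has a function named date_range that generates a series of dates or times. We will see how we can use it to solve some problems that we may encounter at work. Here, we will work through a few examples.
1. Generate a series of dates between a start date and an end date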
import pandas as pd
pd.date_range(start='1/1/2020', end='1/15/2020')
#Output:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12', '2020-01-13', '2020-01-14', '2020-01-15'], dtype='datetime64[ns]', freq='D')
2. Generate a series of dates with an interval of two days
pd.date_range('1/1/2020', '1/15/2020', freq='2D')
#Output:
DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-05', '2020-01-07', '2020-01-09', '2020-01-11', '2020-01-13', '2020-01-15'], dtype='datetime64[ns]', freq='2D')
Notice that we do not need to write the start and end keywords every time. When the dates are passed as positional arguments, the first one is taken as the start date and the second one as the end date.
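As a quick check of that equivalence, here is a minimal sketch that builds the same range both ways and compares the results. The variable names are only for illustration.

```python
import pandas as pd

# Positional form: first argument is the start, second is the end.
positional = pd.date_range('1/1/2020', '1/15/2020', freq='2D')

# Keyword form: the same range with start and end spelled out.
keyword = pd.date_range(start='1/1/2020', end='1/15/2020', freq='2D')

# Both calls should produce the same DatetimeIndex.
print(positional.equals(keyword))
```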
3. Make a time series with the business days only
pd.date_range('1/1/2020', '1/15/2020', freq='B')
#Output:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-13', '2020-01-14', '2020-01-15'], dtype='datetime64[ns]', freq='B')
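Here, freq='B' means business days, that is, Monday through Friday. Note that 'B' only skips weekends. A public holiday such as 2020-01-01 is still included in the output.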
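If holidays also need to be skipped, one option is the custom business day frequency. The sketch below uses pd.bdate_range with freq='C' and a holidays list. The choice of New Year's Day as the only holiday is an assumption for illustration, so substitute the holiday calendar that applies to your data.

```python
import pandas as pd

# Assumption: New Year's Day is the only holiday in this window.
holidays = ['2020-01-01']

# freq='C' is the custom business day frequency.
# It skips weekends as well as the dates listed in holidays.
working_days = pd.bdate_range('1/1/2020', '1/15/2020', freq='C', holidays=holidays)
print(working_days)
```

Compared with the freq='B' result above, this series should start on 2020-01-02 and contain one date fewer.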
4. What if we have a start date but no end date? In that case, we can pass periods, which is the number of dates or times we need.
pd.date_range('1/1/2020', periods=9, freq='B')
#Output:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-13'], dtype='datetime64[ns]', freq='B')
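This start-plus-periods form is handy for the problem mentioned at the beginning, where the data exists but the dates were not recorded. Here is a minimal sketch, assuming a small DataFrame named sales whose values are made up purely for illustration, with one row per business day.

```python
import pandas as pd

# Hypothetical data with no recorded dates (values are made up for illustration).
sales = pd.DataFrame({'sales': [120, 135, 128, 150, 142]})

# Generate one business day per row, starting from an assumed start date,
# and use the result as the index.
sales.index = pd.date_range('1/1/2020', periods=len(sales), freq='B')
print(sales)
```

Using len(sales) for periods keeps the number of generated dates in step with the number of rows.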
5. If we have an end date and a number of periods (how many dates we require), date_range counts back from the end date:
pd.date_range(end='1/1/2020', periods=5)
#Output:
DatetimeIndex(['2019-12-28', '2019-12-29', '2019-12-30', '2019-12-31', '2020-01-01'], dtype='datetime64[ns]', freq='D')
If we do not specify a frequency, the date_range function uses freq='D', which means calendar days.
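The end and periods arguments can be combined with a frequency as well. As a small sketch, this asks for the last five business days up to and including an end date. The variable name is only for illustration.

```python
import pandas as pd

# Five business days ending on 2020-01-01, counted backwards from the end date.
last_five = pd.date_range(end='1/1/2020', periods=5, freq='B')
print(last_five)
```

Because weekends are skipped, the series should reach further back than the five calendar days in the example above.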
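6. The frequency does not have to be days or business days. It can be months, hours, minutes, seconds, and many other frequencies. Let's use a 3M frequency, which means every third month end. In pandas 2.2 and later, the month-end alias is 'ME', so the equivalent is freq='3ME'.
pd.date_range('1/1/2020', periods=5, freq='3M')
#Output: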
DatetimeIndex(['2020-01-31', '2020-04-30', '2020-07-31', '2020-10-31', '2021-01-31'], dtype='datetime64[ns]', freq='3M')
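Frequency aliases have changed between pandas versions, so a version-independent alternative is to pass an offset object instead of a string. The sketch below uses pd.offsets.MonthEnd(3) for the same three-month step, plus a 30-minute frequency as an example of a sub-daily interval. The variable names are only for illustration.

```python
import pandas as pd

# Every third month end, expressed as an offset object instead of an alias string.
quarterly = pd.date_range('1/1/2020', periods=5, freq=pd.offsets.MonthEnd(3))
print(quarterly)

# Sub-daily frequency: one timestamp every 30 minutes.
half_hourly = pd.date_range('1/1/2020', periods=4, freq='30min')
print(half_hourly)
```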