I'm trying to capture dates of the following forms:
20 Apr 2009
20 April 2009
20 Apr. 2009
20 April, 2009
...from raw text in a pandas DataFrame. I want to get rid of the rest of the text and keep only the dates.
I've been partially successful in my attempt:
df['some_column'] = df['some_column'].str.replace(r'(.*?)(\d{1,2}[ ](?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,?[ ]\d{4})(.*?)\n\1', lambda x: x.groups()[1])
But in some cases I'm getting the preceding/succeeding text as well. Any input would be appreciated.
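For reference, here is a minimal sketch of what I'm after, written with plain `re` instead of pandas. It pulls the date out directly rather than replacing the text around it, since `str.replace` only rewrites the part of the string the pattern matched and leaves everything else in place. The helper name `extract_date` and the sample strings are made up for illustration; the date pattern is the one from my attempt without the surrounding `(.*?)` groups and the trailing `\n\1`.

```python
import re

# Date pattern from the attempt above, minus the surrounding (.*?) groups.
DATE_RE = re.compile(
    r'\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,? \d{4}'
)

def extract_date(text):
    """Return the first date found in text, or None if there is none.

    Hypothetical helper, named here only for illustration.
    """
    match = DATE_RE.search(text)
    return match.group(0) if match else None

# Made-up sample rows covering the four formats listed above.
samples = [
    "Seen on 20 Apr 2009 for follow-up",
    "20 April 2009: discharged",
    "Admitted 20 Apr. 2009\nsecond line",
    "Visit of 20 April, 2009 went fine",
    "no date here",
]
results = [extract_date(s) for s in samples]
print(results)
```

If this is the right idea, I assume the pandas equivalent would be something like `df['some_column'].str.extract('(' + DATE_RE.pattern + ')', expand=False)`, with the extra parentheses there because `str.extract` needs a capture group. Rows without a date would then come back as NaN instead of keeping their original text.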
#regex #pandas