Capturing dates of a particular format from raw data

I'm trying to capture dates in the following formats:

20 Apr 2009

20 April 2009

20 Apr. 2009

20 April, 2009

...from raw text in a pandas DataFrame column. I want to discard all of the text apart from the dates.
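As a quick sanity check, the date sub-pattern used in the attempt below can be tested on its own against the four sample formats. This is a minimal sketch using Python's `re.fullmatch`, which succeeds only if the whole string is a date. The name `DATE_PATTERN` is hypothetical. Note that `[a-z]*` is permissive: it also accepts strings such as "20 Aprxyz 2009".

```python
import re

# Day (1-2 digits), space, month name or abbreviation, optional "." and ",",
# space, 4-digit year. Same logic as the pattern in the question.
DATE_PATTERN = (
    r"\d{1,2} "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
    r"\.?,? "
    r"\d{4}"
)

samples = ["20 Apr 2009", "20 April 2009", "20 Apr. 2009", "20 April, 2009"]

for s in samples:
    # fullmatch requires the entire string to be a date, with nothing around it
    print(s, "->", bool(re.fullmatch(DATE_PATTERN, s)))
```

All four samples should print `True`, so the date part of the pattern itself is not what lets extra text through.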

I've been partially successful so far:

df['some_column'] = df['some_column'].str.replace(r'(.*?)(\d{1,2}[ ](?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,?[ ]\d{4})(.*?)\n\1', lambda x: x.groups()[1], regex=True)

However, in some cases the preceding or succeeding text is kept as well. Any input would be appreciated.
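A plausible cause is that `str.replace` only rewrites the part of each string that the pattern actually matched: a lazy `(.*?)` group matches as little as possible, and `.` does not cross newlines by default, so text outside the match stays in place. One common alternative, offered here as a sketch rather than a definitive fix, is `Series.str.extract`, which returns only the captured group, so the surrounding text never needs to be matched at all. The sample DataFrame below is hypothetical.

```python
import pandas as pd

# Same date pattern, wrapped in one capture group so extract() keeps only the date.
DATE_PATTERN = (
    r"(\d{1,2} "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
    r"\.?,? "
    r"\d{4})"
)

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({
    "some_column": [
        "Meeting held on 20 Apr 2009 in the main hall",
        "Report dated 20 April, 2009.",
        "No date here",
    ]
})

# expand=False returns a Series; rows with no match become NaN.
df["some_column"] = df["some_column"].str.extract(DATE_PATTERN, expand=False)
print(df)
```

`str.extract` keeps only the first date in each row; if a row can contain several dates, `str.findall` with the same pattern returns all of them as a list.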

#regex #pandas
