dateparser – Python parser for human-readable dates
dateparser provides modules to easily parse localized dates in almost any string format commonly found on web pages.
Features
- Generic parsing of dates in English, Spanish, Dutch, Russian and several other languages and formats.
- Generic parsing of relative dates like: '1 min ago', '2 weeks ago', '3 months, 1 week and 1 day ago'.
- Generic parsing of dates with time zone abbreviations or UTC offsets like: 'August 14, 2015 EST', 'July 4, 2013 PST', '21 July 2013 10:15 pm +0500'.
- Extensive test coverage.
Usage
The most straightforward way is to use the dateparser.parse function, which wraps most of the functionality in the module.
dateparser.parse(date_string, date_formats=None, languages=None)
Parse date and time from given date string.
Parameters:
- date_string (str|unicode) – A string representing date and/or time in a recognizably valid format.
- date_formats (list) – A list of format strings using strftime/strptime-style directives (such as '%d %B %Y'). The parser applies formats one by one, taking into account the detected languages.
- languages (list) – A list of two-letter language codes, e.g. ['en', 'es']. If languages are given, it will not attempt to detect the language.
Returns: a datetime.datetime if successful, else None.
Raises: ValueError - Unknown Language
Popular Formats
>>> import dateparser
>>> dateparser.parse('12/12/12')
datetime.datetime(2012, 12, 12, 0, 0)
>>> dateparser.parse(u'Fri, 12 Dec 2014 10:55:50')
datetime.datetime(2014, 12, 12, 10, 55, 50)
>>> dateparser.parse(u'Martes 21 de Octubre de 2014') # Spanish (Tuesday 21 October 2014)
datetime.datetime(2014, 10, 21, 0, 0)
>>> dateparser.parse(u'Le 11 Décembre 2014 à 09:00') # French (11 December 2014 at 09:00)
datetime.datetime(2014, 12, 11, 9, 0)
>>> dateparser.parse(u'13 января 2015 г. в 13:34') # Russian (13 January 2015 at 13:34)
datetime.datetime(2015, 1, 13, 13, 34)
>>> dateparser.parse(u'1 เดือนตุลาคม 2005, 1:00 AM') # Thai (1 October 2005, 1:00 AM)
datetime.datetime(2005, 10, 1, 1, 0)
This will try to parse a date from the given string, attempting to detect the language each time.
You can specify the language(s), if known, using the languages argument. In this case, the given languages are used and language detection is skipped:
>>> dateparser.parse('2015, Ago 15, 1:08 pm', languages=['pt', 'es'])
datetime.datetime(2015, 8, 15, 13, 8)
If you know the possible formats of the dates, you can use the date_formats argument:
>>> dateparser.parse(u'22 Décembre 2010', date_formats=['%d %B %Y'])
datetime.datetime(2010, 12, 22, 0, 0)
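The idea of applying formats one by one can be sketched with the standard library alone. The helper below, parse_with_formats, is a hypothetical name and not part of dateparser. It ignores language handling entirely and only shows a first-match-wins loop:

```python
from datetime import datetime


def parse_with_formats(date_string, date_formats):
    """Try each format in order and return the first successful parse.

    Illustrative only: dateparser also takes detected languages into
    account, which this sketch does not attempt.
    """
    for fmt in date_formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    return None


print(parse_with_formats('22/12/2010', ['%Y-%m-%d', '%d/%m/%Y']))
print(parse_with_formats('not a date', ['%Y-%m-%d']))
```

Listing the most likely format first keeps the number of failed attempts low.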
Relative Dates
>>> from dateparser import parse
>>> parse('1 hour ago')
datetime.datetime(2015, 5, 31, 23, 0)
>>> parse(u'Il ya 2 heures') # French (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)
>>> parse(u'1 anno 2 mesi') # Italian (1 year 2 months)
datetime.datetime(2014, 4, 1, 0, 0)
>>> parse(u'yaklaşık 23 saat önce') # Turkish (23 hours ago)
datetime.datetime(2015, 5, 31, 1, 0)
>>> parse(u'Hace una semana') # Spanish (a week ago)
datetime.datetime(2015, 5, 25, 0, 0)
>>> parse(u'2小时前') # Chinese (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)
Note
Running the code above may return different values depending on your environment's current date and time.
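To see why the results depend on the current time, here is a minimal standard-library sketch of resolving an English "... ago" string against an explicit reference time. The function parse_relative and the UNITS table are hypothetical and are not how dateparser is implemented. Months and years are left out because timedelta has no such units:

```python
import re
from datetime import datetime, timedelta

# Unit names mapped to timedelta keyword arguments (assumed, minimal set).
UNITS = {
    'min': 'minutes',
    'minute': 'minutes',
    'hour': 'hours',
    'day': 'days',
    'week': 'weeks',
}


def parse_relative(text, now):
    """Resolve strings such as '2 weeks ago' relative to the given `now`."""
    if not text.strip().lower().endswith('ago'):
        return None
    delta = timedelta()
    found = False
    for amount, unit in re.findall(r'(\d+)\s+([a-z]+)', text.lower()):
        key = UNITS.get(unit.rstrip('s'))
        if key is None:
            return None  # unsupported unit, e.g. 'months'
        delta += timedelta(**{key: int(amount)})
        found = True
    return now - delta if found else None


reference = datetime(2015, 6, 1, 0, 0)
print(parse_relative('1 hour ago', reference))            # 2015-05-31 23:00:00
print(parse_relative('1 week and 1 day ago', reference))  # 2015-05-24 00:00:00
```

Passing the reference time explicitly makes the result reproducible, which is useful when testing code that handles relative dates.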
Dependencies
dateparser translates non-English dates to English and uses the dateutil module's parser to parse the translated date.
It also requires PyYAML for its language detection module to work.
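The translate-then-parse approach can be illustrated with a toy example. Everything below is an assumption made for illustration: the word list is tiny and hypothetical, and parse_english is a simplified stand-in for an English-only parser rather than the dateutil parser itself:

```python
import re
from datetime import datetime

# Hypothetical, deliberately tiny Spanish-to-English word list.
ES_TO_EN = {
    'martes': 'tuesday',
    'septiembre': 'september',
    'octubre': 'october',
}
# Words carrying no date information in this toy example.
SKIP_WORDS = {'de'}

EN_MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
             'july', 'august', 'september', 'october', 'november', 'december']


def translate_to_english(text):
    """Replace known Spanish words with English ones and drop filler words."""
    words = re.findall(r'\w+', text.lower())
    return ' '.join(ES_TO_EN.get(w, w) for w in words if w not in SKIP_WORDS)


def parse_english(text):
    """Stand-in for an English-only parser: expects day, month name and year."""
    day = month = year = None
    for token in text.split():
        if token in EN_MONTHS:
            month = EN_MONTHS.index(token) + 1
        elif token.isdigit() and len(token) == 4:
            year = int(token)
        elif token.isdigit():
            day = int(token)
    if None in (day, month, year):
        return None
    return datetime(year, month, day)


english = translate_to_english('Martes 21 de Octubre de 2014')
print(english)                 # tuesday 21 october 2014
print(parse_english(english))  # 2014-10-21 00:00:00
```

Separating translation from parsing means only the word lists need to grow when a language is added; the English parsing step stays the same.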
Limitations
- Limited language support.
Using DateDataParser
dateparser.parse() uses a default parser that tries to detect the language every time it is called, which is not the most efficient approach when parsing dates from the same source.
dateparser.date.DateDataParser provides an alternative, more efficient way to control language detection behavior.
An instance of dateparser.date.DateDataParser reduces the number of applicable languages until only one or no language is left. It assumes the previously detected language for all subsequent dates and does not run language detection again after a language is discarded.
This class wraps the core dateparser functionality and, by default, assumes that all of the dates fed to it are in the same language.
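The narrowing behavior described above can be modeled with a small toy class. This is a sketch under stated assumptions, not dateparser's implementation: NarrowingDetector, its VOCAB table and the applicable method are all hypothetical, and real language data is far richer than two short word sets:

```python
import re


class NarrowingDetector:
    """Toy model of language narrowing across successive date strings."""

    # Hypothetical vocabularies: words each language can explain.
    VOCAB = {
        'en': {'august', 'tuesday', 'october', 'september'},
        'es': {'martes', 'octubre', 'septiembre', 'de'},
    }

    def __init__(self, languages=None):
        # Start with every known language unless a subset is given.
        self.candidates = list(languages or self.VOCAB)

    def applicable(self, text):
        """Return the remaining candidates that explain every word in text.

        Candidates that fail are discarded for all later calls; if none
        match, the current candidates are kept and an empty list is returned.
        """
        words = re.findall(r'[^\W\d_]+', text.lower())
        matching = [lang for lang in self.candidates
                    if all(word in self.VOCAB[lang] for word in words)]
        if matching:
            self.candidates = matching
        return matching


detector = NarrowingDetector()
print(detector.applicable('Martes 21 de Octubre de 2014'))  # ['es']
print(detector.applicable('11 August 2012'))                # []
print(detector.candidates)                                  # ['es']
```

Once English has been discarded, a later English string finds no applicable language, which mirrors why a single parser instance should only be fed dates from one source.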
class dateparser.date.DateDataParser(languages=None, allow_redetect_language=False)
Class which handles language detection, translation and subsequent generic parsing of strings representing date and/or time.
Parameters:
- languages (list) – A list of two-letter language codes, e.g. ['en', 'es']. If languages are given, it will not attempt to detect the language.
- allow_redetect_language (bool) – Enables/disables language re-detection.
Returns: A parser instance
Raises: ValueError - Unknown Language, TypeError - Languages argument must be a list
get_date_data(date_string, date_formats=None)
Parse string representing date and/or time in recognizable localized formats. Supports parsing multiple languages and timezones.
Parameters:
- date_string (str|unicode) – A string representing date and/or time in a recognizably valid format.
- date_formats (list) – A list of format strings using strftime/strptime-style directives (such as '%d %B %Y'). The parser applies formats one by one, taking into account the detected languages.
Returns: a dict mapping keys to a datetime.datetime object and period. For example: {'date_obj': datetime.datetime(2015, 6, 1, 0, 0), 'period': u'day'}
Raises: ValueError - Unknown Language
Note
Period values can be 'day' (default), 'week', 'month' or 'year'.
Period represents the granularity of the date parsed from the given string.
In the example below, since no day information is present, the day is taken from the current date (16, as the current date is June 16, 2015, at the time of writing). Hence, the level of precision is month.
>>> DateDataParser().get_date_data(u'March 2015')
{'date_obj': datetime.datetime(2015, 3, 16, 0, 0), 'period': u'month'}
Similarly, for date strings with no day and month information present, the level of precision is year, and the day (16) and month (6) are taken from the current date.
>>> DateDataParser().get_date_data(u'2014')
{'date_obj': datetime.datetime(2014, 6, 16, 0, 0), 'period': u'year'}
Dates with time zone indications or UTC offsets are returned in UTC time.
>>> DateDataParser().get_date_data(u'23 March 2000, 1:21 PM CET')
{'date_obj': datetime.datetime(2000, 3, 23, 14, 21), 'period': 'day'}
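The period logic described in the note can be sketched as follows. The function fill_missing is a hypothetical helper, not part of dateparser. It assumes the year is always known and takes the current date as an explicit argument so the result is reproducible:

```python
from datetime import datetime


def fill_missing(year, month=None, day=None, today=None):
    """Fill absent month/day from `today` and report the resulting period."""
    today = today or datetime.now()
    if month is None:
        period = 'year'
    elif day is None:
        period = 'month'
    else:
        period = 'day'
    # Note: borrowing today's day can be invalid for the target month
    # (e.g. day 31 in a 30-day month); datetime() raises ValueError then.
    date_obj = datetime(year, month or today.month, day or today.day)
    return {'date_obj': date_obj, 'period': period}


today = datetime(2015, 6, 16)
print(fill_missing(2015, 3, today=today))
print(fill_missing(2014, today=today))
```

With the current date fixed at June 16, 2015, the two calls reproduce the 'month' and 'year' results shown in the note.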
Once initialized, dateparser.date.DateDataParser.get_date_data() parses date strings:
>>> from dateparser.date import DateDataParser
>>> ddp = DateDataParser()
>>> ddp.get_date_data(u'Martes 21 de Octubre de 2014') # Spanish
{'date_obj': datetime.datetime(2014, 10, 21, 0, 0), 'period': u'day'}
>>> ddp.get_date_data(u'13 Septiembre, 2014') # Spanish
{'date_obj': datetime.datetime(2014, 9, 13, 0, 0), 'period': u'day'}
Warning
It fails to parse English dates in the example below, because Spanish was detected and stored with the ddp instance:
>>> ddp.get_date_data('11 August 2012')
{'date_obj': None, 'period': 'day'}
dateparser.date.DateDataParser can also be initialized with known languages:
>>> ddp = DateDataParser(languages=['de', 'nl'])
>>> ddp.get_date_data(u'vr jan 24, 2014 12:49')
{'date_obj': datetime.datetime(2014, 1, 24, 12, 49), 'period': u'day'}
>>> ddp.get_date_data(u'18.10.14 um 22:56 Uhr')
{'date_obj': datetime.datetime(2014, 10, 18, 22, 56), 'period': u'day'}