Python Deserves Better Date Parsing
The Zen of Python
Python is a great programming language for non-programmers, and I don't mean that as faint praise. Even a small amount of knowledge and a rough understanding make the language both powerful and empowering.
We learn from the Zen of Python[1] that beautiful is better than ugly; explicit is better than implicit; simple is better than complex; and if the implementation is hard to explain, it's a bad idea. And I think this philosophy — that our code should be elegant, clear, and "Pythonic" — is a big part of Python's success. The language is inviting, not intimidating. It values clarity, not cleverness. If you borrow code from a coworker or Stack Overflow, you can quickly get the gist of what it's doing, change it for your use case, and see immediate results.
To put this philosophy another way, how would you explain your code? Would you give a straightforward breakdown of the problem it solves? Or would you dissect implementations, obsess over optimizations, and walk through the history of Bell Labs? Git, for example, is the polar opposite of "Pythonic". People who've used it for years don't understand it, and explanations tend to go like this:
Q: How do I undo my changes since the last commit?
A: Use "git reset --hard" or "git checkout -- ."
Q: Is that a typo?
A: No.
Q: Wait, "checkout" can switch branches, or create a new branch, or undo? Why is it so overloaded? And what's the difference between those two commands?
A: Okay, so, a commit is a pointer to a node in a directed acyclic graph…
Q: [Screaming internally]
The Zen of Python has always been aspirational, not descriptive. Python has plenty of rough edges and awkward design (looking at you, if __name__ == '__main__'). But these stick out because the language otherwise tries to live up to its ideals, and these edges do get smoother over time. Dictionaries now preserve order, so beginners aren't surprised when data "changes" underneath them, and don't have to walk through the hash tables under the hood.[2] F-strings are clearer and easier than littering sentences with %s.[3]
We should keep refining this language to meet its own vision, and be appropriately critical when it falls short. And right now, there are still parts that don't feel like Python; they feel like Git.
A Straw Example — Chmod
This is the least Pythonic standard library function I've personally used:
Q: How do I make this file read-only?
A: Use os.chmod(filepath, 0o444)
Q: chmod? I'm guessing "mod" means modify; does this modify something that starts with ch, like characters or channels?
A: No, it stands for Change Mode.
Q: They removed one letter from the word Mode, even though that changes the meaning?
A: That's how Bell Labs named it in 1971.
Q: But Python was released in… never mind. So, I pass the filepath and 444?
A: No, you need to pass 0o444, so Python knows it's base 8 instead of base 10.
Q: Should I know base 8?
A: No. This exact situation is the only time you'll ever use it.
Q: If I just pass 444, without the 0o in front, will it raise a ValueError?
A: No, it will still work. But instead of making the file read-only, you'll make it executable.
Q: Wow, that seems dangerous. Did they also write 0o444 in 1971?
A: No, they just wrote 444. If you aren't going through Python, adding the 0o is an error.
Q: There's no better way to do this? I can't write os.set_permissions(filepath, read=True, write=False)?
A: You could use the stat library, import masks with names like S_IRWXU,[4] and define the permissions you want with bitwise | and &.
Q: I am never programming again.
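That 444-versus-0o444 trap is easy to check without touching a real file. Here's a minimal sketch using the standard library's stat.filemode, which renders a mode number the way ls would. The variable names are mine, and the S_IFREG flag is only there so filemode treats the number as a regular file.

```python
import stat

read_only = 0o444   # octal: r--r--r--
mistaken = 444      # decimal 444 is octal 674, a different set of permissions

# The named masks from the stat module spell out the same read-only mode.
named = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH

print(oct(mistaken))                             # 0o674
print(stat.filemode(stat.S_IFREG | read_only))   # -r--r--r--
print(stat.filemode(stat.S_IFREG | mistaken))    # -rw-rwxr--
print(named == read_only)                        # True
```

So the un-prefixed 444 quietly becomes 0o674: writable by the owner, and writable and executable by the group.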
Yes, I know, this is unfair. Beginners are not setting Unix file permissions, this is an OS convention in an OS library, and details like user vs. group permissions would complicate a cleaner API. It would be nice if Python did more than pass along half-century-old bad decisions, but I understand why it doesn't; I won't die on this hill.
In any case, let's agree this is not at all Pythonic. We feel a bit gross making excuses for it. This wouldn't fly in code designed from scratch, and we'd never do this in something we want beginners to use. Right?
The Real Example — Date Formats
Say we have a date in the format MM/DD/YYYY, and we want to subtract one day. This is the standard, idiomatic approach:
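from datetime import datetime, timedelta

def get_previous_date(input_date):
    date_format = '%m/%d/%Y'
    # Parse the string, step back one day, and format it the same way.
    current_date = datetime.strptime(input_date, date_format)
    previous_date = current_date - timedelta(days=1)
    return previous_date.strftime(date_format)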
This is worse than chmod.
Unlike file permissions, dates are a basic, fundamental concept we work with every day. They're ubiquitous, easy to get wrong (years don't evenly divide into 52 weeks), and filled with edge cases (February). It's the perfect use case for a nice, Pythonic library. As you're reading this, beginners are learning these functions to solve real-world problems, and having a needlessly difficult time.[5]
Like chmod, the names strptime and strftime aggressively violate Python's style guide,[6] which encourages "words separated by underscores as necessary to improve readability" and names that "reflect usage rather than implementation." The words "str" and "time" jump out (at least they didn't shorten it to tim), but they're separated by a single letter: f for Format, p for Parse. Or, wait, does p stand for Print? Does strptime turn a datetime into a string, or is it the opposite? This isn't just confusing for beginners; I've used these functions for years, but when I wrote that code above, I got them wrong.[7]
Their format strings are equally opaque. %Y and %y both represent Year, but %M means Minute while %m means Month. And while %d represents the Day,[8] %D gives you the Month, Day, and Year in the American MM/DD/YY format, with no corresponding code for the DD/MM/YY format more common worldwide. Like chmod's base 8, these codes are hard to decipher, easy to screw up,[9] and not used for anything else. Even the documentation doesn't use this format, preferring the much clearer MM, DD, and YYYY.[10] The codes you actually use are buried at the bottom, alongside (what else?) obscure implementation details, as Python points its finger at (what else?) decisions made in 1989.[11]
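To see how quietly the %m/%M mix-up fails, here's a small sketch with a sample date of my own choosing. Both calls succeed; only one of them means what you intended.

```python
from datetime import datetime

text = "10/11/2021"  # sample input: October 11, 2021

right = datetime.strptime(text, "%m/%d/%Y")  # %m is Month
wrong = datetime.strptime(text, "%M/%d/%Y")  # %M is Minute, so the month defaults to 1

print(right)  # 2021-10-11 00:00:00
print(wrong)  # 2021-01-11 00:10:00
```

No exception, no warning: the 10 lands in the minutes field and the date silently becomes January 11.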
We all make fun of comments like i++; //increment i, but surely that's a strawman. You'd never see that in real software, right? Well, this is the actual docstring in the actual function Python uses to talk about dates:[12]
Even with an IDE, this is all the help you get. There's no hint what fmt it expects. You don't even get the word "format"! At least it can tell the functions apart, right? strftime is a method on a datetime instance, while strptime belongs to datetime itself, and your IDE knows the right context, so it should make it harder to use the wrong one.
…Right?
A: No. strptime can also be called on an instance.
Q: Oh… Does that update the instance with a new value?
A: No.
>>> now = datetime.now()
>>> now.strptime('1999-01-01', '%Y-%m-%d')
datetime.datetime(1999, 1, 1, 0, 0)
>>> now
datetime.datetime(2021, 10, 11, 23, 24, 58, 525334)
This whole thing is a confusing, un-Pythonic mess. It's Git and chmod all over again.
We Can Do Better
Why not change strftime and strptime to to_string and from_string? Those names are clear and Pythonic; you'll never mix them up.[13]
current_date = datetime.from_string(input_date, date_format)
Then again, do we even need from_string? Datetimes are inflexible and don't allow easy construction. You can't even cast between datetimes and dates.
>>> now = datetime.now()
>>> date(now)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type datetime.datetime)
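To be fair, the conversion does exist today; it's just spelled as a method and a classmethod rather than a constructor. A quick sketch, using a fixed timestamp of my own so the output is predictable:

```python
from datetime import date, datetime, time

now = datetime(2021, 10, 11, 23, 24, 58)  # fixed example value, not the real clock

as_date = now.date()                              # datetime -> date
as_datetime = datetime.combine(as_date, time())   # date -> datetime at midnight

print(as_date)       # 2021-10-11
print(as_datetime)   # 2021-10-11 00:00:00
```

It works, but you have to already know the names, which is rather the point.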
But outside this library, we construct things from strings all the time, like int('100') or int('100', base=16). Why can't we do the same here?
current_date = datetime(input_date, format=date_format)
Why not accept formats like 'MM/DD/YYYY'? These are easier to change, match Python's own documentation, and save us from traps like "week-based years".[14] The old functions and exotic formats would still exist for compatibility, and to help the few people who need them. The new ones could use strftime under the hood, but most users wouldn't need to know; we'd think about usage, not implementation.
While we're making changes, why not let us add or subtract days with ints? This would simplify a common use case and match a familiar pattern from Excel.
def get_previous_date(input_date):
    date_format = 'MM/DD/YYYY'
    current_date = datetime(input_date, format=date_format)
    previous_date = current_date - 1
    return previous_date.to_string(date_format)
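The int arithmetic is something you can approximate today with a small subclass. This is only a sketch of the idea, not a proposal for how the standard library should implement it; FriendlyDate is a hypothetical name, and it treats any int as a number of days.

```python
from datetime import date, timedelta

class FriendlyDate(date):
    """Hypothetical date subclass where a bare int means 'this many days'."""

    def __add__(self, other):
        if isinstance(other, int):
            other = timedelta(days=other)
        return super().__add__(other)

    def __sub__(self, other):
        if isinstance(other, int):
            other = timedelta(days=other)
        # Subtracting another date still yields a timedelta, as usual.
        return super().__sub__(other)

today = FriendlyDate(2021, 10, 11)
print(today - 1)   # 2021-10-10
print(today + 30)  # 2021-11-10
```

Existing timedelta arithmetic keeps working, so nothing that runs today would break.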
This would not be a fundamental rewrite, and it wouldn't even try to solve more intractable problems like time zones.[15] But these small changes would reduce so much friction and help people learn and get things done. The language would be cleaner, more empowering, and more Pythonic. Why handcuff ourselves to decisions made decades ago?
- For what it's worth, I also think sets should be ordered. The standard excuse is that, mathematically, sets don't have an order, but Practicality is supposed to beat Purity. Preserving order wouldn't make sets less useful, but would make some things cleaner. For example, set(data) is the Pythonic way to remove duplicates, but if the data was sorted, this un-sorts it!
- Python's often criticized for having too many ways to format strings, and, yes, it's annoying that F-strings weren't always around. But does anyone honestly think we shouldn't have them? The best time to implement F-strings was 20 years ago, but the second best time was 2016.
- https://docs.python.org/3/library/stat.html#stat.S_IRWXU
- We all know exiting Vim is confusing; it's one of the four jokes on r/ProgrammerHumor. Well, 2.5 million people have asked Stack Overflow how to quit Vim, but 3.7 million have asked how to get a date from a string.
- https://www.python.org/dev/peps/pep-0008/#overriding-principle
- Yes, Hacker News commenter, I'm sure a diligent, "real" programmer would never, ever make this mistake, and it's my fault for being confused. But Python doesn't just cater to macho, professional, obsessively diligent programmers, and that's a feature, not a bug.
- Python's first attempt at string formatting, which was also inherited from the '70s, used %d as a placeholder for numbers. To reflect this, my IDE still makes the %d in '%m/%d/%Y' a different color, adding nothing but further confusion.
- https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat
- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
- https://github.com/python/cpython/blob/master/Lib/datetime.py#L927-L929
- To really nitpick, I wish Python had used "text" instead of "string". "String" is confusing and meaningless jargon (cue the joke that Strings and Threads are unrelated), and we don't call lists "arrays" just because C did. But whatever. That ship hasn't just sailed; it's circumnavigated the globe.
- While we're dreaming, this format argument could even be optional. If it's not provided, it could default to YYYY-MM-DD, or even regex match a few common formats and raise an Exception if the input's ambiguous (e.g. 01/02/2021 could mean January 2nd or February 1st). Also, when I import the library, Łukasz Langa should send me a pony.
- https://www.vox.com/2014/8/5/5970767/case-against-time-zones