Time For A Dating Opportunity
- Posted in:
- postgreSQL
- data synchronization
- c#
Date and Time data structures are always so fun and easy to use, aren’t they?
Nope.
They are, in fact, the devil. No matter how hard I try (and believe me, I’ve tried pretty hard) or what decisions I make, I always find myself facing subtle issues relating to time that go unnoticed until the problem is way harder to fix than it should be. Like say, after your customers have created millions and millions of entities with important timestamps attached to them that aren’t timezone aware.
More recently, we were bitten when we tried to sync fields representing a whole day (i.e. 1 January 2017, no time) through our data synchronization algorithm.
To our surprise, a different day came out of the other end, which was less than ideal.
Dates Are Delicious
During the initial analysis of the tables earmarked to be synced from client databases, we located quite a few fields containing dates with time information. That is, usage of the actual DateTime data structure in SQL Server. As most of us had been bitten in the past by subtle conversion bugs when playing with raw DateTimes in .NET, we made the decision to convert all DateTimes to DateTimeOffsets (i.e. 2017-01-01 00:00+10) at the earliest possible stage in the sync process, using the timezone information on the client server that the sync process was running on.
What we didn’t know was that some of the DateTime fields actually represented whole dates, and they were just represented as that day at midnight because there was no better structure available when they were initially created.
Unfortunately, converting whole days stored as DateTimes into DateTimeOffsets isn’t actually the best idea, because an atomic days representation should not change when you move into different timezones. For example, 1 January 2017 in Brisbane does not magically turn into 31 December 2016 22:00 just because you’re in Western Australia. Its still 1 January 2017.
This is one of the weird and frustrating things about the difference between whole Dates and DateTimes. Technically speaking, a Date as explained above probably should be location aware, especially as the timezone differences get more extreme. The difference between WA and QLD is pretty academic, but there’s a whole day between the US and Australia. If two users were to calculate something like rental arrears in two different places using whole dates, they would probably get two different numbers, which could lead to poor decisions. Of course, from a users point of view, the last thing they would expect is to have one day turn into another, or to add a time representation to something they entered as a day using a calendar selector, so all told, its confusing and terrible and I hate it.
If you want to get technical, the converted DateTime still represents the same instantin time, so as long as you know what the original offset was, you can use that to revert back to the original value (which is some day at midnight) without too much trouble, and then interpret it as necessary.
Of course, that’s when PostgreSQL decided to get involved.
Lost Time
A long time ago when we started the data synchronization project, we decided to use PostgreSQL as the remote store. We did this mostly because PostgreSQL was cheaper to run in AWS via RDS (the licensing costs for SQL Server in RDS were kind of nuts in comparison).
In hindsight, this was a terrible decision.
We might have saved raw money on a month to month basis, but we exposed ourselves to all sorts of problems inherent to the differences between the two database engines, not to mention the generally poor quality of the PostgreSQL tools, at least in comparison SQL Server Management Studio.
Returning to the date and time discussion; we chose to use Entity Framework (via NPGSQL) as our interface to PostgreSQL and to be honest we pretty much just trusted it to get the database schema right. All of our DateTimeOffsets got mapped to the PostgreSQL data structure timestamp_with_timezone, which looks like its pretty much the same thing.
Except its not. Its not the same at all. It actually loses date when storing a DateTimeOffset, and it does this by design.
In PostgreSQL terms, using a timestamp_with_timezone structure actually means “please automatically adjust the data I insert into this field using the given offset, so store it as UTC”. This makes sense, in a way, because strictly speaking, the data still represents the same instant in time, and can be freely adjusted to the users current offset as necessary (i.e. show me what the data looks like in +10).
Unfortunately, this approach means that the actual offset the data was inserted with is lost completely.
PostgreSQL has another data type called timestamp_without_timezone, but all it does it ignore the offset completely, while still stripping it out. Less than useful.
To summarise, here is the chain of events:
- The user enters some data, representing a whole day: 1 Jan 2017
- The system stores this data in SQL Server as a DateTime: 1 Jan 2017 00:00
- Our sync process reads the data as a DateTimeOffset, using the local timezone: 1 Jan 2017 00:00 +10
- The data is pushed to PostgreSQL and stored: 31 Dec 2016 14:00
Technically the data still represents the exact same point in time, but its actual meaning is now compromised. If anyone reads the remote data and assumes its still just a date, they are now a full day off, which is terrible.
Daylight savings doesn’t help either, because now the offset is inconsistent, so in NSW sometimes you will see the date as the previous day at 1400 and sometimes at 1300.
I mean honestly, daylight savings doesn’t really help anyone anyway, but that’s a different story.
That Approach Is Sooooo Dated
For whole dates, the only realistic solution is to treat them exactly as they should be treated, as dates with no time component.
Completely obvious in retrospect.
Both SQL Server and PostgreSQL have a Date data type which does exactly what it says on the box and leaves no room for misinterpretation or weird conversion errors.
Of course, .NET still only has the good old DateTime and DateTimeOffset data types, so there is room for shenanigans there, but at least the storage on either end would be correct.
For dates that actually do come with a time component (like an appointment), you have to really understand whether or not the time should be able to be interpreted in another timezone. Taking the appointment example, it might be reasonable to think that a company using your software product to track their appointments might exist in both NSW and QLD. The person executing the appointment would want to schedule it in their local time, but a remote administration person might want to know what time the appointment was in their own local time so that they know now to transfer calls.
SQL Server is fine in this respect, because a DateTimeOffset is perfect, freely able to be converted between one timezone and another with no room for misinterpretation.
In PostgreSQL, the timestamp_with_timezone data type might be good enough, assuming you don’t need to know with absolute certainty what the original offset was (and thus the time in the original creators context). If you do need to know that (maybe for business intelligence or analytics) you either need to know some information about the creator, or you should probably just use a timestamp data type, convert it to UTC yourself and store the original offset separately.
Conclusion
Representing time is hard in software. I’ve heard some people say this is because the fundamentally explicit structures that we have to have in software are simply unable to represent such a fluid and mutable construct, and I’m not sure if I disagree.
For the case I outlined above, we made some bad decisions with good goals (lets use DateTimeOffset, its impossible to misinterpret!) that built on top of other peoples bad decisions, but at least we understand the problem well enough now to potentially fix it.
Unfortunately, its unlikely that the concept of dates and times in software is just going to go away, as it seems pretty important.
The best we can hope for is probably just for existence to end.
Someone would probably want to know when existence is going to end taking daylight savings into account.