A recent conversation about solving hard problems reminded me of a lightning talk I gave in 2012 at an internal micro-conference at my then-employer. The idea was to point out cases where perfectly reasonable assumptions turned out not to be so reasonable after all, and to challenge ourselves to question those assumptions. I had two specific examples.
The first was an issue with the online store we were tasked with maintaining. I feel comfortable talking about it publicly now because that version of the store has long since been decommissioned.
When I first joined the team maintaining the store, we had an external-facing store database and an internal-facing sales database. They largely had the same information but it was organized a little differently.
Store data was transferred to the sales database via email as a way to get around our corporate firewall. A scheduled task on the external store server collected all of the recent order data from the store database, built an XML document out of it, and then emailed it to a shared email address. Another process then ran regularly on a machine inside the firewall to check that email inbox. Any XML found would be parsed and the data would be entered into the sales database.
Needless to say, this was awful for many reasons. So, eventually, we reworked it. We updated the store to send an MSMQ message for each order, with a firewall exception opened specifically for MSMQ to make that possible. Then we added an internal process that listened to that queue, pulling in the data, parsing it, and adding it to the sales database.
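The shape of that listener was roughly the following (a minimal Python sketch for illustration only; the real process used MSMQ and wrote to our internal database, so both ends are stubbed and the order fields are made up):

```python
# A minimal sketch of the new listener, in Python for illustration only; the
# real process sat on an MSMQ queue and wrote to the internal sales database,
# so both ends are stubbed here and the order fields are hypothetical.
import xml.etree.ElementTree as ET

def receive_next_message():
    # Stand-in for pulling the next message body off the MSMQ queue.
    return "<Order><OrdID>1001</OrdID><OrdDate>2012-03-15 11:34:27</OrdDate></Order>"

def insert_into_sales_db(fields):
    # Stand-in for the insert into the sales database.
    print("inserting order", fields["OrdID"])

def process_one_message():
    order = ET.fromstring(receive_next_message())
    fields = {child.tag: child.text for child in order}
    insert_into_sales_db(fields)

process_one_message()
```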
Along the way, we also added quite a bit more data validation to this new listener process. We’d never validated the data before because we were sending it to ourselves. We knew it was safe.
Until we started getting errors.
Specifically, we were getting XML parsing errors. Which we thought was really strange because this was the same XML format that had been working fine for years.
So we took a “good” XML document and compared it to a “bad” one, and were surprised to see that, in the bad one, the OrdDate (which was actually a DateTime recording when the customer reached the payment page in the store) was only a date (e.g. 2012-03-15), while the good XML included a full DateTime (2012-03-15 11:34:27). Why would that happen for only some orders?
We went into the store database and found that the orders with unparseable XML had an OrdDate of exactly midnight, down to the second. And it turned out that, back then, some combination of Microsoft SQL Server and Classic ASP caused the time portion of a DateTime to be removed if it was 00:00:00.
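In miniature, the problem looked something like this (Python for illustration, not our actual parser; the point is that validation expecting a full date and time rejects the bare date that midnight orders produced):

```python
# The failure in miniature (illustrative, not the real parser): validation
# that expects a full date and time rejects a bare date.
from datetime import datetime

ORD_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

datetime.strptime("2012-03-15 11:34:27", ORD_DATE_FORMAT)  # parses fine
datetime.strptime("2012-03-15", ORD_DATE_FORMAT)           # raises ValueError
```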
Because we’d never validated the data before, we’d never noticed. And because we assumed that every OrdDate would be a full DateTime, we built our parser around that assumption, and it failed.
We couldn’t fix the combination of MSSQL and Classic ASP, so our solution was to bolt the missing time portion back on whenever we detected it was gone, before sending the XML. Inelegant, but it solved the problem.
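Something along these lines, assuming the date strings shown above (just a sketch, not the real store-side code):

```python
# A sketch of the workaround: if an OrdDate comes out of the store database
# with no time portion, bolt midnight back on before the XML is built and sent.
def ensure_time_component(ord_date):
    return ord_date if " " in ord_date else ord_date + " 00:00:00"

print(ensure_time_component("2012-03-15"))           # 2012-03-15 00:00:00
print(ensure_time_component("2012-03-15 11:34:27"))  # unchanged
```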
The other example I spoke about was related to the company’s user-to-user forums and took place in 2008 or so.
We had an out-of-the-box vBulletin instance that was barely supported. It was just me and one guy from IT, with moderation handled by a group of volunteers from outside of the company.
One of those moderators reported that she was getting censored whenever she typed the word “specialized” in a post. We had a badword filter enabled but we couldn’t see why it would be triggered in that case.
We kind of ignored it for a while, but one day I wrote it out on my whiteboard:

specialized
spe******ed
cializ
And then I realized what was happening.
There was no reason for the badword filter to censor “specialized,” but that’s not what it was censoring. This particular moderator was German and used British English. That meant she wasn’t typing “specialized,” she was typing “specialised.” The censored letters didn’t spell out “cializ”; they spelled out “cialis,” which was in our badword filter to prevent spam.
We didn’t consider this immediately because we assumed that our posters would be using the same American English that we were using ourselves.
We solved that issue by configuring the badword filter not to censor “cialis” when it appeared in the middle of another word.
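I don’t remember the exact vBulletin setting we used, but the general idea is the difference between substring matching and whole-word matching. A rough sketch, not vBulletin’s actual code:

```python
# Not vBulletin's implementation, just the general idea: a naive substring
# filter censors "cialis" inside "specialised", while a whole-word match
# leaves the longer word alone.
import re

def censor_substring(text, word):
    return re.sub(re.escape(word), "*" * len(word), text, flags=re.IGNORECASE)

def censor_whole_word(text, word):
    return re.sub(r"\b" + re.escape(word) + r"\b", "*" * len(word), text,
                  flags=re.IGNORECASE)

print(censor_substring("our specialised cialis deals", "cialis"))
# our spe******ed ****** deals
print(censor_whole_word("our specialised cialis deals", "cialis"))
# our specialised ****** deals
```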
These two examples may not apply to anyone else but the idea of challenging your assumptions does. It’s hard to solve problems when your assumption is what’s getting in the way.