Social media – must have

backup by Tony Austin
When Your Corporate Social Platform Becomes Mission Critical:
[Via A Journey In Social Media]

Life is full of learning experiences, and we had one yesterday.

A minor patch to our environment exposed underlying database corruption, which resulted in our internal social platform being unavailable for almost a full business day.

The backups? They were corrupt as well.

Thanks to the exceptional effort of everyone involved, nothing significant was really lost.

Sure, there are lessons to be learned on proper support practices for important applications (and our social platform is now one of those), but there are other lessons to be learned as well.

Things happen. While these tools are becoming as mature as email, they still rely on databases that can become corrupted. Chuck details some of the lessons learned. And this is from a company where “mission critical” is a well-known term. For these tools, it should become a term that almost everyone uses.

#1 — The Impact Was Stunning

Len wrote a great post on “The Air That I Breathe“, and I think that’s a great analogy.

All day long, it was hard for many of us to get business done, simply because the platform wasn’t available. It was pretty much in the same league as “email unavailable”.

So, at what point did this social platform go from “nice to have” to “need to have”? There wasn’t a defined point that I can see, it just kind of snuck up on us.

People were resilient, and adapted — that’s what we all do anyway. But it was a huge impact to a lot of people’s workday, and didn’t do anything to help with establishing confidence around the platform.

When tools become a part of a person’s workflow, removing them has huge effects on productivity. It’s like running out of toilet paper. No one really notices until it happens. They can get by with alternatives, but no one is very happy.

#2 — At Some Point, Declare Your Social Platform As Mission Critical

We didn’t do that.

As a result, we didn’t get the same operational procedures that EMC’s top-tier applications get. I’m *not* blaming the IT guys — they have a schema as to how they categorize things, and our application wasn’t in the appropriate tier.

Why does that matter? For top-tier applications, more scrutiny and extra effort are applied to make sure that the application is always available, usually at significant additional cost.

Some of the investments that top-tier applications get include:

Advanced test, dev and staging environments to allow quick roll-back if there’s a problem
Snapping off disk copies of your database and running consistency checks before it goes to tape or other backup device
HA failover of servers, storage, or even physical locations!
Maintenance at off-hours, rather than prime time
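
To make the second item concrete, here is a minimal sketch of "copy first, check the copy, then archive." It is not what EMC or its vendor actually ran. It assumes a SQLite database purely for illustration, and the function names (snapshot_and_verify, backup_if_healthy) and all paths are hypothetical.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import Optional


def snapshot_and_verify(live_db: Path, snapshot: Path) -> bool:
    """Copy live_db to snapshot, then run a consistency check on the copy."""
    src = sqlite3.connect(str(live_db))
    dst = sqlite3.connect(str(snapshot))
    try:
        # Online copy of the live database into the snapshot file.
        src.backup(dst)
        # SQLite reports a single row containing "ok" when no problems are found.
        rows = dst.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # An unreadable or damaged database is treated as a failed check.
        return False
    finally:
        dst.close()
        src.close()
    return rows == [("ok",)]


def backup_if_healthy(live_db: Path, snapshot: Path, archive_dir: Path) -> Optional[Path]:
    """Archive the snapshot only if it passed the consistency check."""
    if not snapshot_and_verify(live_db, snapshot):
        return None  # never overwrite good backups with a corrupt copy
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / snapshot.name
    shutil.copy2(snapshot, target)
    return target
```

The point of the design is the ordering: the check runs on the copy before anything reaches the backup location, so a failed check leaves the previous backups untouched and can raise an alarm instead.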

Well, now we have a case to elevate the category, so to speak.

And probably a willingness to spend more $$$ to keep this from happening again.

It is sometimes hard to convince the powers that be to put money and resources into new areas such as this. But it should not take a meltdown to see the need. There should be a well-defined process in place to determine what has moved from “nice to have” to “can’t live without it.”

#3 — Vendors In This Space Will Need To Revisit Their Processes

EMC sells mission-critical hardware and software for a living. We know what top-tier customer support looks like — it’s an integral part of our business.

You never can get good enough at this stuff, trust me.

Now, we’re not blaming anyone here, but I think it’s safe to say that we were exercising our software vendor’s support processes in a very unique and unexpected manner. We had 10,000+ users down, and things were pretty bleak there for a while.

Everyone pitched in and helped once an emergency was declared, but it was pretty clear that it was an immature process, relatively speaking.

If you’re a vendor in this space, and you’re convincing customers that your product is essential to their business, and your customer does what you told them to do and now has their entire company running on your stuff, you’re going to have to start thinking like a mission-critical vendor, and invest appropriately.

Everything breaks now and then — it’s what technology does.

What can’t break are the service and support processes: problem escalation, expert triage, advance notice of potential problem areas, proactive preventative fixes … the whole ball of wax.

This is one of the worries about the cloud. An organization’s capability is determined by another organization’s view of mission-critical processes. If a company says it is prepared, it had better be, because customers will not accept ‘an accident that could not be foreseen.’
