
[resolved] Colo4 Power Outage


164 replies to this topic

#41 Web Studio West

    Member

  • Members
  • 48 posts

Posted 10 August 2011 - 11:31 AM

fr0zen, on 10 August 2011 - 11:26 AM, said:

All our websites are up now.

I know problems like these are unforeseen and you can't be prepared for every little outcome. In the end we are still very happy with Cartika and the support you guys provide.


Thank you

Websites are still all down here.

#42 websherpa

    Junior Member

  • Members
  • 9 posts

Posted 10 August 2011 - 11:31 AM

CH-Andrew, on 10 August 2011 - 11:03 AM, said:

I can say that, based on the data I have across ALL of our services, we have not missed our SLA in more than 2-3 months, on a scale larger than a single server, in the last 12-18 months. I am not sure what else to say other than that this is more than reasonable. Typically, these issues impact a percentage of our customers. What I am sure is frustrating is the situation for the group of people where there is overlap and they are impacted by more than one issue, or even all of the major issues.

Since there have been overlapping issues (and some service change issues which were perceived to be a problem but may not have been), is it fair to ask for a rundown report and explanation of the largest issues affecting the most customers once the dust has cleared? I for one would like to see a line by line explanation from Cartika about what they are doing to ensure non-repeat failures.

Quote

Things like this one are most frustrating though because they are completely out of our control. There is nothing to improve upon - there is nothing to do better. We need to rely on our facility to figure out what happened and why, and prevent this from happening again in the future (and we trust they will as they have been a good vendor for 7-8 years now) - we are not in the data center business (at least not yet).

Never say never. Basically we rely on you to get an adequate response from Colo4 and relay your response to us so that we can evaluate whether the combined response/report mandates any changes at our end of the business. It may be enough that you continue to trust Colo4, but I think we'd still like to hear what assurances / measures you have obtained.

#43 cvonnieda

    Junior Member

  • Members
  • 11 posts
  • Location: AZ

Posted 10 August 2011 - 11:34 AM

I've been with Cartika 5 or 6 years now and host approximately 100 websites with them as a reseller, including my own business. Before that, nightmares. I can say without a shadow of doubt the service Cartika provides is 2nd to none! This is not their fault and clearly is a rare occurrence. The fact is they are here providing feedback and information as it is made available. I am always amazed at how great the service level is and I recommend them to everyone.

I'm not going anywhere and if I lose customers it is their loss not mine.

Keep up the great work guys. You are much appreciated!
Chris von Nieda
LenderTech Web Solutions
www.LenderTech.com
www.roi4my.com

#44 Carl

    Member

  • Members
  • 57 posts
  • Location: Locked in a small dark room.

Posted 10 August 2011 - 11:43 AM

cvonnieda, on 10 August 2011 - 11:34 AM, said:

...I can say without a shadow of doubt the service Cartika provides is 2nd to none! This is not their fault and clearly is a rare occurrence. The fact is they are here providing feedback and information as it is made available. I am always amazed at how great the service level is and I recommend them to everyone.

I'm not going anywhere and if I lose customers it is their loss not mine.

Keep up the great work guys. You are much appreciated!

I'll second that!

Edited by Carl, 10 August 2011 - 11:44 AM.


#45 gatnews

    Junior Member

  • Members
  • 4 posts

Posted 10 August 2011 - 11:49 AM

I think the less we ask, the more time they will have to focus on the problem and on communicating with Colo4. I've had hosting servers before and know how tough it can be to deal with emergencies such as this.

#46 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 11:53 AM

websherpa, on 10 August 2011 - 11:31 AM, said:

Since there have been overlapping issues (and some service change issues which were perceived to be a problem but may not have been), is it fair to ask for a rundown report and explanation of the largest issues affecting the most customers once the dust has cleared? I for one would like to see a line by line explanation from Cartika about what they are doing to ensure non-repeat failures.

All outages and maintenance (scheduled or otherwise) are reported in this network news forum. There was only one incident which was avoidable, and that was human error. I really do not see the need to review past incidents. Each incident is reviewed when it happens, and if anything can be implemented at that time to improve things and avoid future incidents, it absolutely is.


Quote

Never say never. Basically we rely on you to get an adequate response from Colo4 and relay your response to us so that we can evaluate whether the combined response/report mandates any changes at our end of the business. It may be enough that you continue to trust Colo4, but I think we'd still like to hear what assurances / measures you have obtained.

I apologize, but what assurances/measures would you accept? Frankly, they have not had a power outage that impacted us in the 7-8 years we have been colocating at that facility. I can guarantee you they will say something like this: we have had continual power uptime on your circuits for x years - this component degraded and failed, which caused this to happen. This same component was certified in an inspection on this date (which would have been in the last 6 months).

How exactly would you address this? What is there to debate or be concerned about?

At the end of the day, we need to decide which vendors we will use and why. We would not leave a vendor simply based on a power outage (you may or may not feel the same way). We would, however, leave a vendor if communication is poor and if the same issue kept happening over and over again without a resolution. That is how we operate - obviously everyone else may look at things differently and may handle things differently - and I completely understand that as well.
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#47 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 11:54 AM

Thank you to everyone for your positive support and comments and feedback.

I have spoken to 100s of customers today on the phone and obviously have been busy on this forum as well.

I wanted to thank everyone for their patience and understanding in this matter - I personally very much appreciate it

Andrew
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#48 ICS

    Newbie

  • Members
  • 5 posts

Posted 10 August 2011 - 11:57 AM

CH-Andrew, on 10 August 2011 - 11:11 AM, said:

Hello,

That is a fair question

Firstly, the fact that our primary network is up and several pieces of our infrastructure are up indicates that we are indeed using A+B power. We just aren't using it on all of our infrastructure - frankly, it is very expensive and people already complain that our $10 shared hosting is too expensive.

Having said this, as I indicated in the very first post, the comments from Paul are not completely accurate - we have several pieces of A+B infrastructure that are also down. What A+B means is that power is served to a single device from different PDUs. This outage is high enough up the power chain that multiple PDUs are impacted, meaning that in some scenarios A+B power is still down - and we have hard proof of that. Our own server is hooked up as A+B and it went down; we needed to move it to a different circuit, for example. There are many such examples across our fleet.

Hopefully this makes sense and answers your questions


Thanks Andrew, that makes much more sense. I certainly don't think you're too expensive. Thanks for keeping us all up to date.

#49 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 11:59 AM

Hello,

We have an update from Colo4 on this issue.

Apparently an ATS (automatic transfer switch) completely failed. The ATS is responsible for flipping load between the active and backup power (which is why backup power failed to start as the brain that is responsible for it is fried).

Colo4 has a spare ATS onsite and is currently replacing it. The ETA provided is 2-3 hours to restore remaining power.

We are not sure why they are not able to bypass their ATS (as we would expect they could) and are waiting for a response to that question.

thanks and we will continue to keep you updated
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#50 nineDs

    Junior Member

  • Members
  • 4 posts

Posted 10 August 2011 - 12:01 PM

CH-Andrew, on 10 August 2011 - 11:54 AM, said:

Thank you to everyone for your positive support and comments and feedback.

I have spoken to 100s of customers today on the phone and obviously have been busy on this forum as well.

I wanted to thank everyone for their patience and understanding in this matter - I personally very much appreciate it

Andrew

Thanks for keeping us updated Andrew. You always do a great job with communication during issues :)

#51 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 12:11 PM

ICS, on 10 August 2011 - 11:57 AM, said:

Thanks Andrew, that makes much more sense. I certainly don't think you're too expensive. Thanks for keeping us all up to date.

Thank you ICS...

I think I am just really slammed here. I didn't mean to come off so short in my previous reply - I apologize if I was curt.

For example, our Exchange and our websitepanel DNS are completely redundant - A+B, etc, etc - and they are down, because both PDUs that the A+B feeds are plugged into are down. Part of our cloud is also down, and that is completely redundant over 4 circuits - but an A+B pair is down, and so some parts of our cloud are down.
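As a rough illustration of the A+B point above, here is a minimal sketch in Python. The topology and component names (`pdu_a`, `pdu_b`, `ats`, and so on) are hypothetical and are not Colo4's or Cartika's actual layout; the only assumption carried over from the thread is that both PDUs of an A+B pair can sit behind the same failed component higher up the power chain.

```python
# Hypothetical power topology, for illustration only.
# Each component maps to the upstream components it can draw power from.
FEEDS = {
    "server": ["pdu_a", "pdu_b"],      # A+B: one device fed from two PDUs
    "pdu_a": ["ats"],
    "pdu_b": ["ats"],                  # assumed: both PDUs share one transfer switch
    "ats": ["utility", "generator"],   # the ATS selects between active and backup power
    "utility": [],
    "generator": [],
}


def has_power(component, failed, feeds=FEEDS):
    """Return True if the component has not failed and at least one
    upstream feed is powered. Components with no upstream are sources."""
    if component in failed:
        return False
    upstream = feeds[component]
    if not upstream:
        return True
    return any(has_power(u, failed, feeds) for u in upstream)


# Losing a single PDU is survivable with A+B power.
print(has_power("server", {"pdu_a"}))   # True
# Losing the shared component above both PDUs is not.
print(has_power("server", {"ats"}))     # False
```

In this sketch, A+B protects against the loss of either PDU, but not against a failure in anything both PDUs depend on.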

There is nothing positive about this scenario and nothing we could have done to avoid it. We are still working on this with Colo4 and we obviously have some decisions to make moving forward

thanks
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#52 Web Studio West

    Member

  • Members
  • 48 posts

Posted 10 August 2011 - 12:13 PM

CH-Andrew, on 10 August 2011 - 11:59 AM, said:

Hello,

We have an update from Colo4 on this issue.

Apparently an ATS (automatic transfer switch) completely failed. The ATS is responsible for flipping load between the active and backup power (which is why backup power failed to start as the brain that is responsible for it is fried).

Colo4 has a spare ATS onsite and is currently replacing it. The ETA provided is 2-3 hours to restore remaining power.

We are not sure why they are not able to bypass their ATS (as we would expect they could) and are waiting for a response to that question.

thanks and we will continue to keep you updated


"The ETA provided is 2-3 hours to restore remaining power."

That is very bad.

#53 gesc

    Newbie

  • Members
  • 2 posts

Posted 10 August 2011 - 01:03 PM

Test message to determine time zone.

#54 gesc

    Newbie

  • Members
  • 2 posts

Posted 10 August 2011 - 01:15 PM

CH-Andrew, on 10 August 2011 - 11:59 AM, said:

Hello,

We have an update from Colo4 on this issue.

Apparently an ATS (automatic transfer switch) completely failed. The ATS is responsible for flipping load between the active and backup power (which is why backup power failed to start as the brain that is responsible for it is fried).

Colo4 has a spare ATS onsite and is currently replacing it. The ETA provided is 2-3 hours to restore remaining power.

We are not sure why they are not able to bypass their ATS (as we would expect they could) and are waiting for a response to that question.

thanks and we will continue to keep you updated

Andrew, I have set up backup MX records for our corporate domains (the strategy has worked and damage is mitigated with some catchall email accounts) and am trying to figure out if I should point our web traffic to the backup servers (DNS propagation may take some more time). However, I see the post from an hour ago giving hope that the servers will be back online in the next 1-2 hours. Is the fix on track, and is the timeline to recovery still 1-2 hours? Please let me know so I can proceed with DNS updates accordingly.

-o

#55 James

    Newbie

  • Members
  • 1 posts

Posted 10 August 2011 - 01:43 PM

Is there an ETA on mail servers being back up and running?

#56 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 01:52 PM

James, on 10 August 2011 - 01:43 PM, said:

Is there an ETA on mail servers being back up and running?

Hello James

Many mail servers are up and running. We are still waiting on power to be returned to the remaining infrastructure. Once this is done, all remaining servers, including mail servers, will be back online

thanks
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#57 smithwood

    Junior Member

  • Members
  • 10 posts

Posted 10 August 2011 - 02:04 PM

CH-Andrew, on 10 August 2011 - 01:52 PM, said:

Hello James

Many mail servers are up and running. We are still waiting on power to be returned to the remaining infrastructure. Once this is done, all remaining servers, including mail servers, will be back online

thanks

Can you actually respond to the question about an ETA? Not that I'm trying to be snotty here, but I have a lot of angry, angry clients here, and timing is something that placates them.

Edited by smithwood, 10 August 2011 - 02:05 PM.


#58 CH-Andrew

    Cartika Staff

  • Managers
  • 2,697 posts

Posted 10 August 2011 - 02:13 PM

smithwood, on 10 August 2011 - 02:04 PM, said:

Can you actually respond to the question about an ETA? Not that I'm trying to be snotty here, but I have a lot of angry, angry clients here, and timing is something that placates them.


Hello,

I completely understand.

Having said this, I provided you with the last update we received. Contractors are onsite at Colo4 and they are currently replacing the ATS unit.

As soon as I have another update, I will absolutely provide it

I apologize that I cannot be more precise here
www.cartika.com
www.andrewr.biz
www.bacula4hosts.com

#59 robgoldstein

    Junior Member

  • Members
  • 2 posts

Posted 10 August 2011 - 02:36 PM

Are they making any progress? I need something to tell my clients who are calling. Who knows how many angry clients I have emailing me.

#60 frodojrr

    Junior Member

  • Members
  • 14 posts

Posted 10 August 2011 - 02:41 PM

We are still down: enewspf.com

I am posting mainly to see the time zone.






© 2012 Cartika Hosting. All rights reserved