Status

Check the status of our services at a glance

Title	http4 hardware issue
ID	Operation #64
Status	completed
Start date	20 oct. 2012 16:51
End date	21 oct. 2012 10:20
Affected servers	http4

Messages

20 oct. 2012 17:05	We’re investigating.
20 oct. 2012 17:10	Back again. The machine was found frozen and had to be rebooted. Probably a kernel issue; we’ll have to update it soon.
20 oct. 2012 22:01	It happened again. We’re forcing the kernel upgrade immediately.
20 oct. 2012 22:15	The kernel has been upgraded. We remain vigilant, as these freezes may be caused by a hardware issue.
20 oct. 2012 23:47	We’ve found evidence of a probable hardware issue in the logs after the second freeze. We’ll schedule a motherboard replacement.
21 oct. 2012 00:25	The motherboard will be replaced at 1:00 (in 35 minutes).
21 oct. 2012 01:02	The operation is starting.
21 oct. 2012 01:47	The hardware is still being replaced; it is taking longer than usual.
21 oct. 2012 03:16	We’re still waiting to hear from our provider. They’re clearly not doing a good job right now.
21 oct. 2012 04:11	It seems like the new motherboard was not the exact same model as before, and it has incompatibility issues with the kernel. We’ll know more when the operation has completed.
21 oct. 2012 05:08	Apparently, they didn’t get the new motherboard to work. The old one is being put back into the server.
21 oct. 2012 05:48	And it still doesn’t work (network down, as before). They must have misconfigured something during the operation. We’re very, very sorry about all of this. We’ll keep you informed as soon as we get more details.
21 oct. 2012 06:05	They’re still trying to figure this out. Here is the error message that gets printed every 3 seconds, to be specific: ixgbe 0000:04:00.4 eth0: reset adapter
21 oct. 2012 06:36	Nothing new since the last message. Just to make things clear: your data is fine, it’s “just” a network issue. The machine is fully accessible by KVM.
21 oct. 2012 07:49	They still haven’t managed to make it work. A senior technician will arrive at 10:00. We cannot give an ETA right now, I’m sorry.
21 oct. 2012 08:03	They’re preparing a spare server where our disks will be inserted. Hopefully that will solve the issue.
21 oct. 2012 10:04	The senior technician is now investigating this issue. They’re looking for another network card model, as the current card seems to be the cause of all this.
21 oct. 2012 10:17	The server is up again. It will be slower than usual for several minutes.
21 oct. 2012 10:40	The network card has been replaced. Why the previous one, which worked normally for 3 years, stopped working when the motherboard was replaced is still a mystery. We will have a chat with our provider next week to understand how such an extended downtime could have happened, especially on a high-end server such as http4. That’s the worst downtime we’ve experienced with a server, by far. We will obviously take action to prevent this from happening again. All customers on http4 can ask for a full refund for October by opening a support ticket. We are very, very sorry. This is clearly not the quality of service you should expect from us.