

CH-Andrew

Member Since 17 Apr 2004
Offline Last Active Today, 09:40 AM

Posts I've Made

In Topic: Cartika Launches Bacula4Hosts in our Canadian Facility

Today, 08:44 AM

CH-Andrew, on 18 May 2012 - 07:12 AM, said:

Now that we have a good amount of data and restore dates for Bacula4Hosts on our Canadian cluster, we are seeing a small performance issue with the time it takes to initially load data in the hsphere bacula4hosts module. The issue is that the system loads all data on the first load, which obviously isn't efficient. We are cleaning this up so that the module loads data only for the selected restore date, and only when that date is selected. This means there will be a small wait on every restore date selected, versus a long wait on initial load and no wait thereafter. This seems the more logical approach for day-to-day usability.

Hello, we have addressed the performance issue above by adding a "calendar selection tool". By default, 2 days of data are loaded when selecting a service, and the calendar tool can be used to load more dates. So: select the service to restore, select a date range, then select the backup job from that date range you wish to work with.
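For anyone curious how this kind of date-range loading works in principle, here is a minimal Python sketch. It is an illustration only, not the actual hsphere module code: the `CATALOG` data, the job names, and the `default_range` and `load_jobs` helpers are all hypothetical. The only detail taken from the post is the 2-day default.

```python
from datetime import date, timedelta

# Hypothetical in-memory stand-in for the backup catalog: job date -> job names.
CATALOG = {
    date(2012, 5, 19): ["web-files-full"],
    date(2012, 5, 20): ["web-files-incr", "mysql-incr"],
    date(2012, 5, 21): ["web-files-incr"],
}

DEFAULT_DAYS = 2  # the post states that 2 days of data load by default


def default_range(today, days=DEFAULT_DAYS):
    """Return (start, end) covering the most recent `days` days, inclusive."""
    return today - timedelta(days=days - 1), today


def load_jobs(catalog, start, end):
    """Return only the jobs whose date falls inside [start, end], oldest first."""
    return {d: jobs for d, jobs in sorted(catalog.items()) if start <= d <= end}


# Initial view: only the default 2-day window is loaded.
today = date(2012, 5, 21)
start, end = default_range(today)
initial_view = load_jobs(CATALOG, start, end)

# The user widens the range with the calendar tool, so more dates are loaded.
wider_view = load_jobs(CATALOG, date(2012, 5, 19), today)
```

The point of the design is that the cost of each load scales with the selected window rather than with the full backup history.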

Screen captures attached

Attached File  Screen shot 2012-05-21 at 11.53.43 AM.png   21.51K   3 downloads

Attached File  Screen shot 2012-05-21 at 11.55.53 AM.png   26.03K   2 downloads

This wraps up our hsphere integration with Bacula4Hosts.

We are going to proceed with migrating our Canadian customers from R1Soft to Bacula4Hosts and will then begin the process of migrating our Dallas customers. As always, we will keep this thread updated as we go along.

In Topic: Cartika Launches Bacula4Hosts in our Canadian Facility

18 May 2012 - 07:12 AM

CH-Andrew, on 24 April 2012 - 07:01 AM, said:

Hello,

we have been made aware of a bug with the bacula4hosts hsphere module where users are not shown the job list (restore points) for mail or DB services. Windows and Linux web files are shown properly.

Hello,

The above-mentioned bug has been resolved. I previously posted a screen capture of mysql along with the new feature that allows mysql table-level restores. We have now also corrected email and mssql end-user access, as seen in the screen captures below.

Attached File  Screen shot 2012-05-18 at 10.14.52 AM.png   54.75K   3 downloads
Attached File  Screen shot 2012-05-18 at 10.24.37 AM.png   47.61K   3 downloads

Now that we have a good amount of data and restore dates for Bacula4Hosts on our Canadian cluster, we are seeing a small performance issue with the time it takes to initially load data in the hsphere bacula4hosts module. The issue is that the system loads all data on the first load, which obviously isn't efficient. We are cleaning this up so that the module loads data only for the selected restore date, and only when that date is selected. This means there will be a small wait on every restore date selected, versus a long wait on initial load and no wait thereafter. This seems the more logical approach for day-to-day usability.

In Topic: Cartika Launches Bacula4Hosts in our Canadian Facility

06 May 2012 - 08:52 AM

CH-Andrew, on 24 April 2012 - 07:01 AM, said:

Hello,

We have been made aware of a bug with the bacula4hosts hsphere module where users are not shown the job list (restore points) for mail or DB services. Windows and Linux web files are shown properly. The issue is that we do not use default hsphere partitioning on our production cluster for mail and DB services, so the module is looking in the wrong location (the dev environment had default locations for this data). We are going to fix this over the next couple of days and make the API more intelligent so that it can find these services and data even if they are not in the default location.
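To illustrate the general idea of looking beyond the default location, here is a minimal Python sketch. It is an assumption-laden example, not the actual API fix: `find_service_data`, the `default` and `custom` directory names, and the `mail` and `mssql` service names used below are hypothetical.

```python
import os
import tempfile


def find_service_data(service, default_root, extra_roots=()):
    """Return the first existing directory for `service`, trying the default root first."""
    for root in (default_root, *extra_roots):
        candidate = os.path.join(root, service)
        if os.path.isdir(candidate):
            return candidate
    return None  # the service data was not found under any known root


# Usage: the mail data lives under a custom root, not under the default one.
with tempfile.TemporaryDirectory() as base:
    default_root = os.path.join(base, "default")  # hypothetical default location
    custom_root = os.path.join(base, "custom")    # hypothetical production location
    os.makedirs(os.path.join(custom_root, "mail"))

    found = find_service_data("mail", default_root, [custom_root])
    found_ok = found == os.path.join(custom_root, "mail")
    missing = find_service_data("mssql", default_root, [custom_root])
```

A lookup that only checked `default_root` would return nothing for the mail service here, which matches the symptom described above: an empty job list when the data sits in a non-default location.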

In the interim, customers can request support to restore any DB or eMail data they require.

Hello,

Just a quick note to let everyone know that we have resolved the bug referenced above. We also took this one step further and added functionality that was not on the roadmap for our initial release. It is a net new feature that was not available with r1soft.

***NEW - Users are able to restore individual mysql/pgsql tables from their hosting control panel

Restore an individual table by selecting a mysql/pgsql database, and then selecting an individual table

Attached File  Screen shot 2012-05-06 at 11.59.12 AM.png   74.84K   21 downloads

or restore an entire database to any restore point by selecting the database name (which includes all tables automatically)

Attached File  Screen shot 2012-05-06 at 12.02.49 PM.png   77.81K   19 downloads

We will resolve the bug with hsphere MSSQL end-user restores and update everyone once that is complete. In the meantime, Windows end users can already self-service restore Windows web files, as well as mysql/pgsql DBs and tables if they use those DB platforms.

Thanks, and please let us know if you have any additional questions.

In Topic: [resolved ] Dallas Network Connectivity

05 May 2012 - 08:26 PM

Hello,

Earlier this evening we put our redundant switching back into service. During this period, traffic was redirected to the new switch. Customers may have noticed a brief period of traffic reconvergence, or 3rd party monitors may have alerted, but service was maintained throughout. We believe this incident to be resolved and are closing this out. We will open a new notice for an upcoming maintenance window to apply Juniper patches to our routing and switching layers. We do not expect any impact during this maintenance and, again, will announce and schedule it accordingly.

In Topic: [resolved ] Dallas Network Connectivity

05 May 2012 - 12:27 PM

Hello,

I have had a few customers call/email me asking what the above post by Jordan means in layman's terms. So, I wanted to take a minute and provide a less technical explanation of the issue we have been facing with our Dallas network this past week. Most customers did not notice the issue, or, if they did notice anything, it was simply some intermittent network latency, which did not really impact service. Unfortunately, those truly impacted by this scenario are cloud customers, as the cloud relies heavily on network connectivity to connect to storage and is quite sensitive in this regard.

We have been working with Juniper all week and we believe we have finally identified and resolved the issue.

The ultimate issue was a bug in the Juniper code base. The bug was triggered by a combination of a failing redundant switch and a massive flood of traffic (i.e. from DDOS). In a normal situation, neither the DDOS nor the failing switch should have impacted connectivity to storage for cloud devices. In this scenario, however (and this is not exactly technically accurate; see Jordan's post above for the more correct technical analysis), the networking equipment above the defective switch was blocking traffic to it, so the defective switch should not have been accepting traffic on the other end either. Because of a bug in the Juniper system, the defective switch continued to accept traffic from devices below it while the network devices above it had it blocked. This created network loops and traffic floods within our network, and these specifically hit the storage devices, as one of their redundant paths used this switch.

Under normal operating conditions, we were operating without any impact from the failing switch or the bug. However, when the network was flooded with traffic by the DDOS attacks we have been seeing, the loop was amplified exponentially and caused a condition Juniper is calling a layer2 storm (basically, it became really difficult for devices within our network to communicate with other devices in our network, and the effect was worst for those behind the failing switch).

Once our team and Juniper engineers identified the issue (and unfortunately, sometimes you need to accumulate data, i.e. several events, in order to identify this sort of issue), we simply took the failing switch completely offline, thereby forcing all traffic over the redundant switch and removing the network loop. We have been running the storage network without a redundant path since last night. We have replaced the failed switch with a new one, have configured, racked, and cabled it, and will be bringing it online later today, once again establishing a redundant path to our storage network.

We believe we have found and isolated the root cause of the issue and expect it to be permanently resolved once we bring the replacement redundant switch online and once Juniper produces a bug patch for us. The incident should not resurface now that we have replaced the defective switch, since the bug should no longer be triggered. Even so, we have also removed the target of the DDOS attacks to take that trigger event out of the equation.

© 2012 Cartika Hosting. All rights reserved