Questions Data Center Operators Don’t Want You to Ask

We live in a world of clouds, SaaS, outsourcing, and Everything over IP (EoIP). The challenges IT professionals face when trying to sort through the maze of technology, globalization, SOX, HIPAA, PUE, and so on result in daunting confusion. Mix in a few overzealous sales people, an inquiring CFO, and incorrigible users within the organization, and you have all the prerequisites for a world-class, globalized migraine headache.

Now let’s go out and consider throwing all this confusion into an outsourced data center. You know your company wants to save money, have better quality facilities, be close to network and Internet exchange points, and be close to carriers who can support your nationally distributed offices. So you do what anybody might consider doing – you call on a data center sales person.

Each company has a pitch. That pitch is refined based on what resources the company has to sell, and the thought leadership provided by the data center operator will most certainly promote their “unique” product or service. As the overzealous sales person goes into their pitch, several topics will no doubt emerge:

  • Their power stability
  • Mechanical and Electrical Systems (including maintenance)
  • Their remote hands, smart hands, on-site tech support, and “nutty” devotion to service
  • Completion of SAS70 audits
  • Facility structure
  • Security
  • And so on…

This article will walk through a few topics that are normally not well explained by data center operators, avoided, or simply misrepresented.

The Data Center Compromise, Mixed-Use Buildings

Any data center presents the potential tenant with a series of compromises. Very few commercial data centers are custom-built from the ground up. Most are built either into mixed-use properties (those originally built as office space) or into conversions (properties built for another purpose, such as a retail outlet <we built a large data center in a former WalMart property in Seoul a few years ago>, a warehouse <such as the original Equinix/Pihana site in Tokyo>, or a factory <such as the original Level 3 gateway in Brussels>).

Data center operators choose mixed-use buildings primarily when they are in an attractive location, such as near a carrier hotel, near a major fiber optic terminal, or in a strategic central business district. Mixed-use buildings are normally built for limited floor loading (how much weight you can actually place on a slab of concrete, and where you can place it, such as over a structural beam) and with lower floor-to-ceiling separation (in the US, this is normally around 12.5 ft).

In addition, mixed-use buildings may have one or more of the following shortfalls:

  • Limited access to utility power
  • Limited “riser” space within the building (for telecom, power, and cooling infrastructure needing to transit the building from basement/ground level or from the rooftop)
  • Antiquated power distribution within the building (such as old buss ducts, switch gear, panels, etc)
  • Limited cooling capacity
  • Limited ability to either power or cool tenants with higher “watts/sqft” requirements (server farms)
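
The “watts/sqft” constraint in the last item lends itself to a quick back-of-the-envelope check. The following is a minimal sketch in Python, not a sizing tool: the function names, the cabinet count, the per-cabinet load, and the facility limit are all hypothetical values chosen for illustration, and a real evaluation would also need to account for cooling and power redundancy.

```python
def power_density_w_per_sqft(total_load_watts: float, area_sqft: float) -> float:
    """Average power density of a deployment, in watts per square foot."""
    if area_sqft <= 0:
        raise ValueError("area_sqft must be positive")
    return total_load_watts / area_sqft


def fits_facility(total_load_watts: float, area_sqft: float,
                  facility_limit_w_per_sqft: float) -> bool:
    """True if the deployment stays at or under the density the facility supports."""
    density = power_density_w_per_sqft(total_load_watts, area_sqft)
    return density <= facility_limit_w_per_sqft


# Hypothetical server farm: 20 cabinets at 4 kW each in a 500 sqft cage.
load_watts = 20 * 4000
print(power_density_w_per_sqft(load_watts, 500))   # 160.0
# Hypothetical mixed-use building that can power and cool only 100 watts/sqft.
print(fits_facility(load_watts, 500, 100))         # False
```

If the required density exceeds what the operator can document, the tenant is either spreading the same equipment over more (expensive) floor space or looking at a different building.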

Mixed-use buildings are best used by tenants with the following profile:

  • Telecom, routing, and switching carriers/networks
  • Members/participants in a carrier hotel meet-me-room
  • Tenants with limited requirement to support large server installations

While mixed-use buildings may have the most technical limitations, they also tend to be the most expensive space. This is primarily due to the lower cost of telecom carrier and network interconnections, limited need for interconnection backhaul (if the property has an open meet-me-room or distribution frame), and in most cases simply legacy network effect. The Newby-ism “if you are a network, and not present in a carrier hotel, then you are paying somebody to be present in a carrier hotel” is still valid (Hunter Newby, CEO, Allied Fiber).

For those who are considering outsourcing into a mixed-use building, make sure you understand your requirement for long term growth, the power, cooling, structural, and telecom restrictions, and the safety record of the building. MOST major electrical failures and events which have occurred in the data center industry over the past ten years have been in mixed-use buildings. Find out if your building has had failures, and if so, get a very detailed accounting of how the data center owner has corrected the infrastructure problems which caused them.

Do not accept explanations that it (the failure) was human error. While many electrical failures in mixed-use buildings are probably caused by sloppy maintenance, the age of the infrastructure should be considered the greater concern. To understand the infrastructure in a building, ask the data center operator to produce a recent, stamped (by a certified electrical engineer), single line diagram showing not only the infrastructure, but also the age of the infrastructure. Only those with something to hide will refuse the request. Stay away from them…

Bring a qualified consultant with you to the sales meeting, and understand the burden is on the data center operator to answer your questions.

Conversion Buildings

In many cases the conversion building will meet all requirements for building out a high quality data center. If the conversion building is essentially a shell that meets all structural requirements, such as near-unlimited floor loading, high floor-to-ceiling clearance, very large floor plates (greater than 40,000 sqft per plate), adequate space for high-capacity cooling systems (chilled water preferred), generator backup, and fuel storage, and it has good proximity to multiple facility-based telecom carriers, then you can do a lot of good things with a conversion.

Things to keep in mind with conversions:

  • They are often built outside of the city center, limiting high concentrations of facility-based fiber and carrier diversity
  • They are often located in areas sensitive to natural disasters such as flooding
  • They are often located in industrial areas, presenting both physical security challenges to the property (vandalism), as well as physical danger to people who need 24×7 access to their equipment (assault)

With the conversion, just as with the mixed-use building, you will need to ensure you fully understand the electrical and mechanical source and distribution. You need to know the age of the equipment, confirm that existing single line diagrams are accurate and certified, and ensure the facility has infrastructure laid out for future growth – and that the local utilities can support that growth (will the power utility provide more power? Will the city allow additional generators and fuel storage?).

The conversion is often a very good choice for server farms and large deployments. The cost of space is normally cheaper, power may be cheaper, and floor loading is normally not an issue. Many satellite data center clusters are popping up in locations such as El Segundo near Los Angeles, offering very high quality data center space developed from conversions.

Site Commissioning, SAS 70, and CMMS

We covered this pretty well in a previous article, and will not go into complete detail here. However the main theme cannot be avoided:

No company should consider collocation within a facility that cannot produce complete documentation that integration testing and commissioning was completed prior to facility operations – and that testing should be at NETA Level 5. In some cases, documentation of “retro” testing is acceptable, however potential tenants in a facility should be aware that is still a compromise, as it is almost impossible to complete a retro-commissioning test in a live facility.

This is most critical in a mixed-use building, where there have been numerous electrical failures due to lack of any commissioning, limited commissioning, or major infrastructure upgrades without any significant level of integration testing. The candidate data center should provide all historical information on the electrical system, as well as commissioning documentation, on demand to the prospective tenant. Reticence or reluctance to provide the documentation probably indicates a major problem.

Understanding SAS70 Audits

One thing to keep in mind about SAS70 audits… The audit only reviews items the data center operator chooses to audit. Thus, a company may have a very nice and polished SAS70 audit documentation, however the contents may not include every item you need to ensure the data center operator has a comprehensive operations plan. You may consider finding an experienced consultant to review the SAS70 document, and provide any additional guidance on whether or not the audit actually includes all facility maintenance and management items needed to ensure continuing protection from mechanical, monitoring/management, electrical, security, or human staffing failures.

Comprehensive SAS70 audits will go into a fair level of detail. If your candidate data center offers a SAS70 audit of 5~10 pages, then you might find it lacking the level of detail needed to give you confidence your mission-critical equipment and applications are being facility-managed in a data center that really “walks the talk.”

The SAS70 audit should include all the following sections:

Security

  • Security Company profile
  • Key inventories
  • Access management
  • Badges
  • Biometrics
  • Staff selection criteria
  • Materials control
  • Confirmation each security guard has completed a background check
  • Security equipment is routinely inspected/tested
  • Security “rounds” are recorded and confirmed
  • Security camera images and access logs are kept for a minimum 60 days, longer is preferred

Maintenance/CMMS (Computerized Maintenance Management System)

  • Comprehensive preventive maintenance/testing schedule for ALL mechanical and electrical equipment
  • UPS
  • Emergency generators
  • Rectifiers/DC Plant
  • ATS
  • Switchgear
  • Complete semi-annual (or more frequent) infrared scan
  • Breaker audit for NEC compliance (or automated view via current transformers)
  • Service level agreements
  • Emergency call out for all critical M&E equipment
  • Diesel refueling during emergencies or extended operation

Human Resources

  • Staffing process
  • Background checks
  • Certifications
  • Termination management

NOTE: While all of us have examples and stories of people who became super routing engineers, electrical staff, and field ops professionals without formal certification, having a high number of network, cabling (BICSI), or electrical certifications does give you a level of confidence that the data center company's knowledge and experience are sufficient to perform at the desired or marketed service level.

Operations

  • Recurring training
  • Recurring staff meetings
  • Business continuity and disaster recovery plans
  • Daily site verifications
  • Escalation process

Again, the more detailed an audit, the greater your confidence that the data center is being managed and operated to a level at which you can bring your business into their environment for outsourcing.
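
Since the audit only covers what the operator chose to include, comparing its contents against the checklist above is a simple set-difference exercise. The sketch below assumes you (or your consultant) have already transcribed the audit's table of contents by hand; the `REQUIRED_ITEMS` table is an abbreviated copy of the lists above, and `sample_audit` is an invented example, not a real report.

```python
# Abbreviated copy of the checklist sections listed above.
REQUIRED_ITEMS = {
    "Security": {"Access management", "Badges", "Biometrics", "Key inventories"},
    "Maintenance/CMMS": {"UPS", "Emergency generators", "ATS", "Switchgear"},
    "Human Resources": {"Staffing process", "Background checks",
                        "Certifications", "Termination management"},
    "Operations": {"Recurring training", "Escalation process",
                   "Business continuity and disaster recovery plans"},
}


def audit_gaps(audit_contents):
    """Return {section: sorted list of checklist items the audit does not cover}."""
    gaps = {}
    for section, required in REQUIRED_ITEMS.items():
        covered = set(audit_contents.get(section, ()))
        missing = sorted(required - covered)
        if missing:
            gaps[section] = missing
    return gaps


# Hypothetical audit contents, transcribed by hand from an operator's report.
sample_audit = {
    "Security": ["Access management", "Badges", "Biometrics", "Key inventories"],
    "Maintenance/CMMS": ["UPS", "Emergency generators"],
    "Human Resources": ["Staffing process", "Background checks",
                        "Certifications", "Termination management"],
}

for section, missing in audit_gaps(sample_audit).items():
    print(f"{section}: missing {', '.join(missing)}")
```

In this invented example the audit says nothing about ATS or switchgear maintenance and omits the Operations section entirely, which is exactly the kind of gap to raise with the operator.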

The SAS70 Type 1 audit is a paper audit, and the Type 2 audit actually includes measurement and compliance of each control or observation.

Final Recommendation

The bottom line is that your business, whether it is in a cabinet, a 1000 sqft cage, or a private suite, depends on the data center operator to support mission-critical applications and functions essential to your business. If you do not believe you have the knowledge or ability to drive a hard, factual line of due diligence in your data center search, find a consultant who can provide that guidance and ensure you are getting exactly what you are paying to receive.

If the data center operator is reluctant to support your requests for audit or compliance, then chances are the operator is treating your company with a high level of contempt, has problems which may make a potential tenant reluctant to use that facility, or, even worse, simply does not have the needed documentation.

John Savageau, Long Beach

Telecom Risk and Security Part 4 – Facilities

A 40 year old building with much of the original mechanical and electrical infrastructure. A 40 year old 4000 amp, 480 volt aluminum electrical buss duct, which had been modified and “tapped” often during its life, with much of the work done violating equipment specifications. With the old materials such as buss insulation gradually deteriorating, the duct expanding and contracting over the years, the fact aluminum was used during the initial installation to either save money or test a new technology vision – it all becomes a risk. A risk of buss failure, or at worst a buss failing to the point it results in a massive electrical explosion.

Sound extreme? Now add a couple of additional factors. The building is a mixed-use telecom carrier hotel, with additional space used for commercial collocation and standard commercial office space. This narrows it down to most of the carrier hotel facilities in the US and Europe. Old buildings, converted to mixed-use carrier hotel and collocation facilities, due mainly to an abundance of vacant space during the mid-1990s, and a need for telecom interconnection space following the Telecommunications Act of 1996.

Over the past four years the telecom, Internet, and data center industry has suffered several major electrical events. Some have resulted in complete facility outages, others have been saved by backup systems which operated as designed, preventing significant disruption to tenants and the services operated within the building.

A partial list of recent carrier hotel and data center facility outages or significant events includes some of the most important facilities in the telecom and Internet-connected industry:

  • 365 Main in San Francisco
  • RackSpace hosting facilities in Dallas
  • Equinix facilities in Australia and France
  • MPT in San Jose
  • IBM facility in NZ
  • Fisher Plaza in Seattle
  • Cincinnati Bell

And the list goes on. Facilities which are managed by good companies, but have many issues in common. Most of those issues are human issues. The resulting outages caused havoc or chaos throughout a wide range of commercial companies, telecom companies, Internet services and content.

The Human Factor in Facility Failures

Building a modern data center or carrier interconnection point follows a fairly simple series of tasks. Following a data center design and construction checklist, with strict compliance to the process and individual steps, can often mean the difference between a well-run facility and one that is at risk of failure during a commercial power outage, or systems failure.

In the design/construction phase, data center operators follow a system of:

  • Determining the scope of the project
  • Developing a data center design specification based on both company and industry standards
  • Designing a specific facility based on business scope and budget, which will comply with the standard design specification
  • Publishing the design specification and distributing it to several candidate construction management companies and engineering companies
  • Using a strong project manager to drive the construction, permitting, certification, and vendor management process
  • Completing systems integration and commissioning prior to actual operations

Of all the above tasks, a complete commissioning plan and integration test is essential to building confidence the data center or telecom facility will operate as planned. Many outages in the past have resulted from systems that were not fully tested or integrated prior to operations.

An example may be a breaker coordination study. This is the process of ensuring switch gear and panel breakers, from the point of electrical presentation by the local power utility down to individual breaker panels, are set, tested, and integrated according to vendor specification. Without a complete coordination study, there is no assurance components within an electrical system will operate correctly, either during normal conditions or during equipment failures. The study is an essential component of a complete systems integration test. Failure to complete a simple breaker coordination study during commissioning has resulted in major electrical failures in data centers as recently as 2008.

The InterNational Electrical Testing Association (NETA) provides guidance on electrical commissioning for data centers under “full design load” conditions. This includes testing recommendations to test performance and operations including the sequence of operations for electrical, mechanical, building management systems/BMS, and power monitoring/management. The actual levels of NETA testing are:

  • Level 1- Submittal Review and Factory Testing
  • Level 2- Site Inspection and Verification to Submittal
  • Level 3- Installation Inspections and Verifications to Design Drawings
  • Level 4- Component Testing to Design Loads
  • Level 5- System Integration Tests at Full Design Loads

No company should consider collocation within a facility that cannot produce complete documentation that integration testing and commissioning was completed prior to facility operations – and that testing should be at NETA Level 5. In some cases, documentation of “retro” testing is acceptable, however potential tenants in a facility should be aware that is still a compromise, as it is almost impossible to complete a retro-commissioning test in a live facility.
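
The Level 5 rule above can be written down as a small decision function, which is handy when screening several candidate facilities side by side. This is a minimal sketch: the enum member names are paraphrases of the level descriptions listed above, and the three outcome strings (“acceptable”, “compromise”, “reject”) are labels invented here rather than NETA terminology.

```python
from enum import IntEnum


class NetaLevel(IntEnum):
    """Commissioning levels as listed above (member names are paraphrased)."""
    SUBMITTAL_REVIEW_AND_FACTORY_TESTING = 1
    SITE_INSPECTION_AND_VERIFICATION = 2
    INSTALLATION_INSPECTION = 3
    COMPONENT_TESTING_TO_DESIGN_LOADS = 4
    SYSTEM_INTEGRATION_AT_FULL_DESIGN_LOADS = 5


def assess_commissioning(documented_level, retro_commissioned=False):
    """Classify a facility by its documented commissioning level.

    documented_level is the highest level the operator can document,
    or None if no commissioning documentation is produced at all.
    """
    required = NetaLevel.SYSTEM_INTEGRATION_AT_FULL_DESIGN_LOADS
    if documented_level is None or documented_level < required:
        return "reject"
    if retro_commissioned:
        # Retro testing in a live facility is still a compromise.
        return "compromise"
    return "acceptable"


print(assess_commissioning(NetaLevel.SYSTEM_INTEGRATION_AT_FULL_DESIGN_LOADS))  # acceptable
print(assess_commissioning(NetaLevel.COMPONENT_TESTING_TO_DESIGN_LOADS))        # reject
print(assess_commissioning(NetaLevel(5), retro_commissioned=True))              # compromise
```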

Bottom Line – even a multi-million dollar facility has no integrity without a detailed design specification and complete integration/commissioning test.

The Human Factor in Continuing Facility Operations

Assuming the facility adequately completes integration and commissioning at NETA Level 5, the next step is ensuring the facility has a comprehensive continuing operations plan to manage their electrical (and mechanical/air conditioning) systems. There are two main recommendations for ensuring the annual, monthly, and even daily equipment maintenance and inspection plans are being completed.

Computerized Maintenance Management System (CMMS)

Data centers and central offices are complex operations. Thousands of moving parts, thousands of things that can potentially break or go wrong. A CMMS tries to bring all those components together into an integrated resource that includes (according to Wikipedia):

  • Work orders: Scheduling jobs, assigning personnel, reserving materials, recording costs, and tracking relevant information such as the cause of the problem (if any), downtime involved (if any), and recommendations for future action
  • Preventive maintenance (PM): Keeping track of PM inspections and jobs, including step-by-step instructions or check-lists, lists of materials required, and other pertinent details. Typically, the CMMS schedules PM jobs automatically based on schedules and/or meter readings. Different software packages use different techniques for reporting when a job should be performed.
  • Asset management: Recording data about equipment and property including specifications, warranty information, service contracts, spare parts, purchase date, expected lifetime, and anything else that might be of help to management or maintenance workers. The CMMS may also generate Asset Management metrics such as the Facility Condition Index, or FCI.
  • Inventory control: Management of spare parts, tools, and other materials including the reservation of materials for particular jobs, recording where materials are stored, determining when more materials should be purchased, tracking shipment receipts, and taking inventory.
  • Safety: Management of permits and other documentation required for the processing of safety requirements. These safety requirements can include lockout-tagout, confined space, foreign material exclusion (FME), electrical safety, and others.

And we can also add additional steps such as daily equipment inspections, facility walkthroughs, and staff training.
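
The preventive maintenance function described above, scheduling jobs automatically from a fixed interval, reduces to simple date arithmetic. The sketch below is an illustration only and does not model any particular CMMS product; the `PmTask` class, the asset names, the intervals, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PmTask:
    """One recurring preventive maintenance job (hypothetical structure)."""
    asset: str
    interval_days: int
    last_done: date

    def next_due(self) -> date:
        """Date on which the job should next be performed."""
        return self.last_done + timedelta(days=self.interval_days)


def overdue(tasks, today):
    """Names of assets whose PM job was due before `today`."""
    return [t.asset for t in tasks if t.next_due() < today]


# Hypothetical schedule: a semi-annual infrared scan and a monthly generator test.
tasks = [
    PmTask("Infrared scan", interval_days=182, last_done=date(2024, 1, 1)),
    PmTask("Emergency generator test", interval_days=30, last_done=date(2024, 7, 15)),
]

print(overdue(tasks, today=date(2024, 8, 1)))  # ['Infrared scan']
```

A real CMMS would also trigger jobs from meter readings and attach work orders, materials, and safety permits, but even this small view shows what to ask an operator for: the last completion date and the interval for every critical asset.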

SAS 70 Audits

The SAS 70 audit is becoming more popular with companies as a way to force the data center operator to provide documentation, audited by a neutral evaluator, that the operator is actually completing the maintenance, security, staffing, and permitting activities as stated in marketing and sales negotiations.

Wikipedia defines a SAS70 Audit as:

“… the professional standards used by a service auditor to assess the internal controls of a service organization and issue a service auditor’s report. Service organizations are typically entities that provide outsourcing services that impact the control environment of their customers. Examples of service organizations are insurance and medical claims processors, trust companies, hosted data centers, application service providers (ASPs), managed security providers, credit processing organizations and clearinghouses.

There are two types of service auditor reports. A Type I service auditor’s report includes the service auditor’s opinion on the fairness of the presentation of the service organization’s description of controls that had been placed in operation and the suitability of the design of the controls to achieve the specified control objectives. A Type II service auditor’s report includes the information contained in a Type I service auditor’s report and also includes the service auditor’s opinion on whether the specific controls were operating effectively during the period under review.”

Many companies in the financial services industries that are looking at outsourcing now regard a SAS70 audit as essential when evaluating candidate data center facilities to host their data and applications. Startup companies with savvy investors are demanding SAS70 audits. In fact, any company considering outsourcing their data or applications into a commercial data center should demand to obtain or review SAS70 audits for each facility considered.

Otherwise, you are forced to “believe” the words of a marketer’s spin, a salesman’s desperate pitch, or the words of others to provide confidence your business will be protected in another company’s facility.

One thing to keep in mind about SAS70 audits… The audit only reviews items the data center operator chooses to audit. Thus, a company may have a very nice and polished SAS70 audit documentation, however the contents may not include every item you need to ensure the data center operator has a comprehensive operations plan. You may consider finding an experienced consultant to review the SAS70 document, and provide any additional guidance on whether or not the audit actually includes all facility maintenance and management items needed to ensure continuing protection from mechanical, monitoring/management, electrical, security, or human staffing failures.

Finally, Know Your Facility

Facility operators are traditionally reluctant to show a potential customer or tenant their electrical and mechanical diagrams and “as-built” documentation for the facility. This is where you would find a 40 year old aluminum buss duct, single points of failure, and other infrastructure designs and realities you should know about before putting your business into a data center or carrier hotel.

So, when all other data center and carrier hotel facilities appear equal in geography and interconnections, look at facilities which will incur the least impact if your interconnections are disrupted, and demand that your candidate data center operator and hosting provider give you complete documentation on the facility, commissioning, CMMS, and SAS70.

Your business, the global marketplace, and network-connected world depend on forcing the highest possible standards of facility design and operation.

John Savageau, Long Beach

