Deleting Your Hard Drives – Entering a Green Data Center Future of SSDs

For those of us old-timers who muscled 9-track tapes onto 10 ft tall tape drives on Burroughs B-3500 mainframe computers, with a total storage capacity of about 5 kilobytes, the idea of sticking a 64 gigabyte SD memory card into my laptop computer is pretty cosmic.

Terms like PCAM (punched card accounting machines) are no longer part of the taxonomy of information technology, nor would any young person in the industry comprehend the idea of a disk platter or disk pack.

Skipping a bit ahead, we find a time when you could purchase an IBM “XT” computer with an integrated 10 megabyte hard drive. No more reliance on 5.25″ or, later, 3.5″ floppy disks. Hard drives have since evolved to the point where Fry’s will pitch you a USB or home network 1 terabyte drive for about $100.

Enter the SSD

October 2009 brings us to the point where hard drives are becoming a compromise solution. The SSD (Solid State Disk) has jumped onto the data center stage. With MySpace’s announcement that they are replacing all 1770 of their existing disk drive-based server systems with higher capacity SSDs, and their claim that SSDs use only 1% of the power required by disk drives, data center rules are set to change again.

SSDs are efficient. If you read press releases and marketing material supporting SSD sales you will hear numbers like:

  • “…single-server performance levels with 1.5GB/sec. throughput and almost 200,000 IOPS
  • … a 320GB ioDrive can fill a 10Gbit/sec. Ethernet pipe
  • … four ioDrive Duos in a single server can scale linearly, which provides up to 6GB/sec. of read bandwidth and more than 500,000 read IOPS” (Fusion.io)

This means not only are you saving power per server, you are also able to pack a multiple of existing storage capacity into the same space as currently possible with traditional disk systems. As clusters of SSDs become possible through additional development of parallel systems, we need to get our heads around the concept of a three dimensional storage system, rather than the linear systems used today.

The concept of RAID and tape backup systems may also become obsolete, as SSDs hold their images when primary power is removed.

Now companies like MySpace will be in a really great position to re-negotiate their data center and colocation deals, as their actual energy and space requirements will potentially be a fraction of existing installations. Even considering their growth potential, the reduction in actual power and space will no doubt give them more leverage to use in the data center agreements.

Why? Data center operators are now planning their unit costs and revenues based on power sales and consumption. If a company like MySpace is able to reduce their power draw by 30% or more, this represents a potentially huge opportunity cost to the data center in space and power sales. Advantage goes to the tenant.

The Economics of SSDs

Today, the cost of SSDs is higher than that of traditional disk systems, even those with Fibre Channel or InfiniBand supporting large disk (SAN or NAS) installations. According to Yahoo Tech, the cost of an SSD is about 4 times that of a traditional disk. However, they also indicate that the cost is quickly dropping, and we will probably see near parity within the next 3~4 years.

Now, recall the claim MySpace made that with the SSD migration they will consume only 1% of the power used by traditional disk (that is only the disk, not the entire chassis or server enclosure). If you look through a great white paper (actually it is called a “Green Paper”) provided by Fusion.io, you will see that implementing their SSD systems in a large disk farm of 250 servers (components include main memory, 4xnet cache, 4x tier 1/2/3 storage, tape storage) reduces the site’s draw from 146.6 kW to 32 kW.

Data centers can charge anywhere from $120 to $225 per kW, so if you believe the marketing material, that 114.6 kW reduction could mean a savings of roughly $20,000/month at $180/kW. This would also represent 47 tons of carbon, using the Carbon Footprint Calculator.
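The savings figure can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the site power numbers from the Fusion.io paper and assuming the quoted price is billed per kW per month; the function and constant names are made up for illustration.

```python
# Figures quoted in the article; the per-month billing basis is an assumption.
BASELINE_KW = 146.6          # site draw with traditional disk
SSD_KW = 32.0                # site draw after the SSD migration
PRICE_PER_KW_MONTH = 180.0   # assumed colocation power price, $/kW per month


def monthly_power_savings(baseline_kw: float, new_kw: float, price_per_kw: float) -> float:
    """Return the monthly saving in dollars from a reduction in power draw."""
    return (baseline_kw - new_kw) * price_per_kw


savings = monthly_power_savings(BASELINE_KW, SSD_KW, PRICE_PER_KW_MONTH)
print(f"Reduction: {BASELINE_KW - SSD_KW:.1f} kW, saving about ${savings:,.0f}/month")
```

Across the quoted $120 to $225 price range, the same reduction works out to roughly $13,750 to $25,800 per month.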

Fusion.io reminds us that:

“In 2006, U.S. data centers consumed an estimated 61 billion kilowatt-hours (kWh) of energy, which accounted for about 1.5% of the total electricity consumed in the U.S. that year, up from 1.2% in 2005. The total cost of that energy consumption was $4.5 billion, which is more than the electricity consumed by all color televisions in the country and is equivalent to the electricity consumption of about 5.8 million average U.S. households.

• Data centers’ cooling infrastructure accounts for about half of that electricity consumption.

• If current trends continue, by 2011, data centers will consume 100 billion kWh of energy, at a total annual cost of $7.4 billion and would necessitate the construction of 10 additional power plants. (from “Taming the Power Hungry Data Center”)”

When we consider the potential impact of data center consolidation through use of virtualization and cloud computing, and the rapid advancements of SSD technologies and capacities, we may be able to make a huge positive impact by reducing the load Internet, entertainment, content delivery, and enterprise systems will have on our use of electricity – and subsequent impact on the environment.

Of course we need to keep our eyes on the byproducts of technology (e-Waste), and ensure making improvements in one area does not create a nightmare in another part of our environment.

Some Additional Resources

StorageSearch.Com has a great listing of current announcements and articles both following and describing the language of the SSD technology and industry. There is still a fair amount of discussion on the quality and future direction of SSDs, however the future does look very exciting and positive.

For those of us who can still read the Hollerith coding on punch cards, the idea of >1.25TB on an SSD is abstract. But abstract in a fun, exciting way.

How do you feel about the demise of disk? Too soon to consider? Ready to install?

John Savageau, Long Beach

How Green is Your Data Center?

Data Center “X” just announced a 2 megawatt expansion to their facility in Northern California. A major increase in data center capacity, and a source of great joy for the company. And the source of potentially 714 additional tons of carbon introduced each month into the environment.

Many groups and organizations are gathering to address the need to bring our data centers under control. Some are focused on providing marketing value for their members; most others appear genuinely concerned with the amount of power being consumed within data centers, the amount of carbon being produced by data centers, and the potential for using alternative or clean energy initiatives within data centers. There are stories around which claim the data center industry is actually using up to 5% of the power consumed within the United States, which, if true, makes this a really important discussion.

If you do a “Bing” search on the topic of “green data center,” you will find around 144 million results, three times as many as a “paris hilton” search. That makes it a fairly saturated topic, indicating a heck of a lot of interest. The first page of the Bing search gives you a mixture of commercial companies, blogs, and “ezines” covering the topic – as well as an organization or two.

With this level of interest you might expect just about everybody in the data center industry to be aggressively implementing “green data center best practices.” Well, not really. In the past month the author (me!) toured no fewer than six commercial data centers. In every data center I saw major violations of best practices, including:

  • Large spacing within cabinets forcing hot air recirculation (not using blanking panels, as well as loose PCs and tower servers placed ad hoc on a cabinet shelf)
  • Failure to use Hot/Cold aisle separation
  • High density cabinets using open 4 post racks
  • Spacing in high density server areas between cabinets
  • Failure to use any level of hot or cold air containment in high density data center spaces, including those with raised floors and drop-ceilings which would support hot air plenums

And other more complicated issues such as not integrating the electrical and environmental data into a building management system.

The Result of Poor Data Center Management

The Green Grid developed a metric called Power Usage Effectiveness (PUE) to measure the effectiveness of power usage within a data center. The equation is very simple: PUE is the total facility power consumption divided by the power actually consumed by internal IT equipment or, in the case of a public data center, by customer-facing or revenue-producing equipment. A factor of 2.0 would indicate that for every watt consumed by IT equipment, another watt is required by support equipment (such as air conditioning, lighting, or other).
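As a minimal sketch, the ratio described above can be written as a small Python function. The function names and the 2,000 kW facility are hypothetical and used only for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Total facility power divided by the power consumed by IT equipment."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts of support load (cooling, lighting, losses) per watt of IT load."""
    return pue_value - 1.0


# Hypothetical facility: 2,000 kW total, of which 1,000 kW reaches IT equipment.
example = pue(2000.0, 1000.0)
print(f"PUE {example:.1f}, overhead {overhead_per_it_watt(example):.1f} W per IT watt")
```

A value of 1.0 would mean every watt entering the facility reaches IT equipment, which is why lower numbers are better.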

Most data centers today consider a target value of 1.5 good, with some companies such as Google trying to drive their PUE below 1.2 – an industry benchmark.

Other data centers are not even at the point where they can collect meaningful PUE data. Google has published an extended description of its data collection methodology, which is a great introduction to the concept. The Uptime Institute also has a large amount of support materials. And a handy Bing search reveals another 995,000 results on the topic. There is no reason why any data center operator should be in the dark or uninformed on the topic.

So let’s use a simple PUE example and carbon calculation to determine the effect of a poor PUE:

Let’s start with a 4 MW data center. The data center currently has a PUE of 4.0, meaning that of the 4 MW of power consumed within the data center, 3 MW are consumed by support equipment and 1 MW by actual IT equipment. In California, using the carbon calculator, this would return 357 tons of carbon produced by the IT equipment and 1071 tons of carbon produced by support equipment such as air conditioning, lights, poorly maintained electrical equipment, etc.

1071 tons of carbon each month, possibly generated by waste which could be controlled through better design, management, and operations in our data centers. Most commercial data centers are in the 4~10MW range. Scary.
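The arithmetic behind this example can be sketched in Python. It takes the 4 MW total and 1 MW IT load at face value, which by the total-over-IT definition is a ratio of 4.0, and uses the 357 tons per MW per month implied by the California figure above. The function name and the 1.5 comparison case are illustrative assumptions, not output from the carbon calculator.

```python
TONS_PER_MW_MONTH = 357.0  # implied by the article's California figure for 1 MW


def carbon_split(total_mw: float, it_mw: float, tons_per_mw: float = TONS_PER_MW_MONTH):
    """Return (total/IT ratio, IT tons per month, support tons per month)."""
    support_mw = total_mw - it_mw
    ratio = total_mw / it_mw
    return ratio, it_mw * tons_per_mw, support_mw * tons_per_mw


ratio, it_tons, support_tons = carbon_split(total_mw=4.0, it_mw=1.0)
print(f"ratio {ratio:.1f}: IT {it_tons:.0f} t, support {support_tons:.0f} t per month")

# Hypothetical comparison: the same 1 MW of IT load in a facility drawing 1.5 MW total.
_, _, improved_support = carbon_split(total_mw=1.5, it_mw=1.0)
print(f"support carbon at a ratio of 1.5: {improved_support:.1f} t per month")
```

Under these assumptions, moving the same IT load to a facility running at 1.5 would cut the support-side carbon from 1071 tons to about 179 tons a month.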

The US Department of Energy recently did an audit entitled “Department of Energy Efforts to Manage Information Technology in an Energy-Efficient and Environmentally Responsible Manner,” which highlights the fact that even tightly regulated agencies within the US Government have ample room for improvement.

“We concluded that Headquarters programs offices (which are part of the Department of Energy’s Common Operating Environment) as well as field sites had not developed and/or implemented policies and procedures necessary to ensure that information technology equipment and supporting infrastructure was operated in an energy-efficient manner and in a way that minimized impact on the environment.” (OAS-RA-09-03)

What Can We Do?

The easiest thing to do is quickly replace all traditional lighting with low power draw LED lamps, and only use the lamps when human beings are actually within the data center space working. Lights generate a tremendous amount of heat, and consume a tremendous amount of electricity. Heat=air-conditioning load if that wasn’t already obvious. Completely wasted power, and completely unnecessary production of carbon. If you are in a 10,000sqft data center, you may have 100 lighting fixtures in the room. Turn them off.

If your data center requires security cameras 24×7, consider using dual-mode cameras that have low light vision capability.

Place blanking panels in all cabinets. Consider removing all open racks from your data center unless you are using them for passive cabling, cross-connects, or very low power equipment. Consider using hot or cold aisle containment models for your cabinet lineups. There is lots of debate on the merits of hot aisle containment vs. cold aisle containment, but the bottom line is that cool air going into a server makes the server run better, reduces the electrical draw on fans, and increases the value of every watt applied to your data center.

Consider this – if you have 10 servers using a total of 1920 watts (120 V on a 20 amp breaker, at 16 amps draw), that gives you the potential of running those 10 servers at full specification draw. That includes internal fans which start as needed to keep internal components cool enough to operate within equipment thresholds. If the servers are running hot, then you are using your full 192 watts per server. If a server is running with cool air on the intake side, and no hot air recirculation producing heat on the circuit boards, then you can reasonably expect to reduce the electrical draw of that server.

If you are able to reduce the actual draw each server consumes by 30~40% by removing hot air recirculation and keeping the supply side cool, then you may be able to add additional servers to the cabinet and increase your potential processing capacity for each breaker and cabinet by another 30~40%. This will definitely increase your efficiency, cost you less in electricity, and give you additional processing potential.
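A rough Python sketch of the breaker arithmetic follows, assuming the same 120 V circuit held to 16 amps and a 192 watt full-specification draw per server. The 30% and 40% reductions are the article's estimates, not measurements, and the function name is illustrative.

```python
import math

CIRCUIT_WATTS = 120 * 16   # 120 V circuit on a 20 amp breaker, loaded to 16 amps
FULL_DRAW_WATTS = 192.0    # per-server draw at full specification


def servers_per_circuit(circuit_watts: float, server_watts: float) -> int:
    """Whole servers that fit within the circuit's usable wattage."""
    return math.floor(circuit_watts / server_watts)


for reduction in (0.0, 0.30, 0.40):
    draw = FULL_DRAW_WATTS * (1.0 - reduction)
    count = servers_per_circuit(CIRCUIT_WATTS, draw)
    print(f"{reduction:.0%} lower draw: {draw:.1f} W per server, {count} servers per circuit")
```

Because capacity scales with 1 / (1 - reduction), the headroom in this sketch comes out a little above the reduction percentage itself: 14 servers at a 30% lower draw and 16 at 40%, compared with 10 at full draw.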

Sources of Information

Quite a few sources of information, beyond the Bing search, are available to help IT managers and data center managers. APC probably has the most comprehensive library of white papers supporting the data center discussion (although, as with all commercial vendors, you will see a few references to their own hardware and solutions). HP also has several great, easy to understand white papers, including one of the best reviewed, entitled “Optimizing facility operation in high density data center environments” – a step-by-step guide to deploying an efficient data center.

The Bing search will give you more data than you will ever be able to absorb; however, the good news is that it is a great way to read through individual experiences, including both success stories and horror stories. Learn through others’ experiences, and start on the road to both reducing your carbon footprint and getting the most out of your data center or data center installation.

Give us your opinions and experiences designing and implementing the green data center – leave a comment and let others learn from you too!

John Savageau, Long Beach

Telecom Risk and Security Part 4 – Facilities

A 40 year old building with much of the original mechanical and electrical infrastructure. A 40 year old 4000 amp, 480 volt aluminum electrical buss duct, which had been modified and “tapped” often during its life, with much of the work done violating equipment specifications. With the old materials such as buss insulation gradually deteriorating, the duct expanding and contracting over the years, the fact aluminum was used during the initial installation to either save money or test a new technology vision – it all becomes a risk. A risk of buss failure, or at worst a buss failing to the point it results in a massive electrical explosion.

Sound extreme? Now add a couple of additional factors. The building is a mixed-use telecom carrier hotel, with additional space used for commercial colocation and standard commercial office space. This narrows it down to most of the carrier hotel facilities in the US and Europe: old buildings, converted to mixed-use carrier hotel and colocation facilities, due mainly to an abundance of vacant space during the mid-1990s and a need for telecom interconnection space following the Telecommunications Act of 1996.

Over the past four years the telecom, Internet, and data center industry has suffered several major electrical events. Some have resulted in complete facility outages; others have been saved by backup systems which operated as designed, preventing significant disruption to tenants and the services operated within the building.

A partial list of recent carrier hotel and data center facility outages or significant events includes some of the most important facilities in the telecom and Internet-connected industry:

  • 365 Main in San Francisco
  • RackSpace hosting facilities in Dallas
  • Equinix facilities in Australia and France
  • MPT in San Jose
  • IBM facility in NZ
  • Fisher Plaza in Seattle
  • Cincinnati Bell

And the list goes on. Facilities which are managed by good companies, but have many issues in common. Most of those issues are human issues. The resulting outages caused havoc or chaos throughout a wide range of commercial companies, telecom companies, Internet services and content.

The Human Factor in Facility Failures

Building a modern data center or carrier interconnection point follows a fairly simple series of tasks. Following a data center design and construction checklist, with strict compliance to the process and individual steps, can often mean the difference between a well-run facility and one that is at risk of failure during a commercial power outage, or systems failure.

In the design/construction phase, data center operators follow a system of:

  • Determining the scope of the project
  • Developing a data center design specification based on both company and industry standards
  • Designing a specific facility based on business scope and budget, which will comply with the standard design specification
  • Publishing the design specification and distributing it to several candidate construction management companies and engineering companies
  • Using a strong project manager to drive the construction, permitting, certification, and vendor management process
  • Completing systems integration and commissioning prior to actual operations

Of all the above tasks, a complete commissioning plan and integration test is essential to building confidence that the data center or telecom facility will operate as planned. Many outages in the past have resulted from systems that were not fully tested or integrated prior to operations.

An example may be a breaker coordination study. This is the process of ensuring that switch gear and panel breakers, from the point of electrical presentation by the local power utility down to individual breaker panels, are set, tested, and integrated according to vendor specification. Without a complete coordination study, there is no assurance that components within an electrical system will operate correctly, either during normal conditions or during equipment failures. It is an essential component of a complete systems integration test. Failure to complete a simple breaker coordination study during commissioning has resulted in major electrical failures in data centers as recently as 2008.

The InterNational Electrical Testing Association (NETA) provides guidance on electrical commissioning for data centers under “full design load” conditions. This includes recommendations for testing performance and operations, including the sequence of operations for electrical, mechanical, building management systems (BMS), and power monitoring/management. The actual levels of NETA testing are:

  • Level 1- Submittal Review and Factory Testing
  • Level 2- Site Inspection and Verification to Submittal
  • Level 3- Installation Inspections and Verifications to Design Drawings
  • Level 4- Component Testing to Design Loads
  • Level 5- System Integration Tests at Full Design Loads

No company should consider colocation within a facility that cannot produce complete documentation that integration testing and commissioning was completed prior to facility operations – and that testing should be at NETA Level 5. In some cases, documentation of “retro” testing is acceptable; however, potential tenants in a facility should be aware that this is still a compromise, as it is almost impossible to complete a retro-commissioning test in a live facility.

Bottom Line – even a multi-million dollar facility has no integrity without a detailed design specification and complete integration/commissioning test.

The Human Factor in Continuing Facility Operations

Assuming the facility adequately completes integration and commissioning at NETA Level 5, the next step is ensuring the facility has a comprehensive continuing operations plan to manage their electrical (and mechanical/air conditioning) systems. There are two main recommendations for ensuring the annual, monthly, and even daily equipment maintenance and inspection plans are being completed.

Computerized Maintenance Management System (CMMS)

Data centers and central offices are complex operations. Thousands of moving parts, thousands of things that can potentially break or go wrong. A CMMS tries to bring all those components together into an integrated resource that includes (according to Wikipedia):

  • Work orders: Scheduling jobs, assigning personnel, reserving materials, recording costs, and tracking relevant information such as the cause of the problem (if any), downtime involved (if any), and recommendations for future action
  • Preventive maintenance (PM): Keeping track of PM inspections and jobs, including step-by-step instructions or check-lists, lists of materials required, and other pertinent details. Typically, the CMMS schedules PM jobs automatically based on schedules and/or meter readings. Different software packages use different techniques for reporting when a job should be performed.
  • Asset management: Recording data about equipment and property including specifications, warranty information, service contracts, spare parts, purchase date, expected lifetime, and anything else that might be of help to management or maintenance workers. The CMMS may also generate Asset Management metrics such as the Facility Condition Index, or FCI.
  • Inventory control: Management of spare parts, tools, and other materials including the reservation of materials for particular jobs, recording where materials are stored, determining when more materials should be purchased, tracking shipment receipts, and taking inventory.
  • Safety: Management of permits and other documentation required for the processing of safety requirements. These safety requirements can include lockout-tagout, confined space, foreign material exclusion (FME), electrical safety, and others.

And we can also add additional steps such as daily equipment inspections, facility walkthroughs, and staff training.
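To make the preventive maintenance item concrete, here is a minimal Python sketch of how a CMMS might decide that a job is due, based on either a calendar schedule or a meter reading. The `PMJob` record, its field names, and the generator figures are all hypothetical; real CMMS packages use their own data models and rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PMJob:
    """A hypothetical preventive maintenance record; field names are illustrative."""
    name: str
    last_done: date
    interval_days: int
    meter_at_last_service: float = 0.0
    meter_interval: Optional[float] = None  # e.g. generator run hours between services


def is_due(job: PMJob, today: date, meter_reading: float = 0.0) -> bool:
    """A job is due when its calendar interval or its meter interval has elapsed."""
    calendar_due = today >= job.last_done + timedelta(days=job.interval_days)
    meter_due = (
        job.meter_interval is not None
        and meter_reading - job.meter_at_last_service >= job.meter_interval
    )
    return calendar_due or meter_due


# Made-up example: a generator serviced every 180 days or every 250 run hours.
generator = PMJob("generator service", date(2009, 1, 15), 180, 1200.0, 250.0)
print(is_due(generator, date(2009, 3, 1), meter_reading=1300.0))
print(is_due(generator, date(2009, 3, 1), meter_reading=1460.0))
```

Checking both triggers means a heavily used generator gets serviced early, while a rarely used one still gets its calendar inspection.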

SAS 70 Audits

The SAS 70 Audit is becoming more popular as a way for companies to force the data center operator to provide documentation, audited by a neutral evaluator, that they are actually completing the maintenance, security, staffing, and permitting activities claimed in marketing and sales negotiations.

Wikipedia defines a SAS70 Audit as:

“… the professional standards used by a service auditor to assess the internal controls of a service organization and issue a service auditor’s report. Service organizations are typically entities that provide outsourcing services that impact the control environment of their customers. Examples of service organizations are insurance and medical claims processors, trust companies, hosted data centers, application service providers (ASPs), managed security providers, credit processing organizations and clearinghouses.

There are two types of service auditor reports. A Type I service auditor’s report includes the service auditor’s opinion on the fairness of the presentation of the service organization’s description of controls that had been placed in operation and the suitability of the design of the controls to achieve the specified control objectives. A Type II service auditor’s report includes the information contained in a Type I service auditor’s report and also includes the service auditor’s opinion on whether the specific controls were operating effectively during the period under review.”

Many companies in the financial services industries now consider a SAS70 audit essential when evaluating candidate data center facilities to host their data and applications. Startup companies with savvy investors are demanding SAS70 audits. In fact, any company considering outsourcing their data or applications into a commercial data center should demand to obtain or review SAS70 audits for each facility considered.

Otherwise, you are forced to “believe” the words of a marketer’s spin, a salesman’s desperate pitch, or the words of others to provide confidence your business will be protected in another company’s facility.

One thing to keep in mind about SAS70 audits… The audit only reviews items the data center operator chooses to audit. Thus, a company may have very nice and polished SAS70 audit documentation; however, the contents may not include every item you need to ensure the data center operator has a comprehensive operations plan. You may consider finding an experienced consultant to review the SAS70 document, and provide any additional guidance on whether or not the audit actually includes all facility maintenance and management items needed to ensure continuing protection from mechanical, monitoring/management, electrical, security, or human staffing failures.

Finally, Know Your Facility

Facility operators are traditionally reluctant to show a potential customer or tenant their electrical and mechanical diagrams and “as-built” documentation for the facility. This is where you would find a 40 year old aluminum buss duct, single points of failure, and other infrastructure designs and realities you should know before putting your business into a data center or carrier hotel.

So, when all other data center and carrier hotel facilities appear equal in geography and interconnections, look at facilities which will incur the least impact if your interconnections are disrupted, and demand that your candidate data center operator and hosting provider give you complete documentation on the facility, commissioning, CMMS, and SAS70.

Your business, the global marketplace, and network-connected world depend on forcing the highest possible standards of facility design and operation.

John Savageau, Long Beach
