Data center and workload management

Data Center

A data center or computer centre (also datacenter) is a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communications connections, environmental controls (e.g., air conditioning, fire suppression), and security devices. Large data centers are industrial-scale operations that use as much electricity as a small town and are sometimes a significant source of air pollution in the form of diesel exhaust.

A data center (sometimes spelled datacenter) is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data and information organized around a particular body of knowledge or pertaining to a particular business.

The National Climatic Data Center (NCDC), for example, is a public data center that maintains the world's largest archive of weather information. A private data center may exist within an organization's facilities or may be maintained as a specialized facility. Almost every organization has a data center of some kind, although it might be referred to as a server room or even a computer closet.

In that sense, data center may be synonymous with network operations center (NOC), a restricted access area containing automated systems that constantly monitor server activity, Web traffic, and network performance.

The Uptime Institute's tiered classification system is an industry-standard approach to describing site infrastructure functionality, and it addresses the common need for a benchmarking standard. Its four tiers provide a simple and effective means of identifying different data center site infrastructure design topologies. The four tiers, as classified by the Uptime Institute, are the following:

Tier I: composed of a single path for power and cooling distribution, without redundant components, providing 99.671% availability.
Tier II: composed of a single path for power and cooling distribution, with redundant components, providing 99.741% availability.
Tier III: composed of multiple power and cooling distribution paths, of which only one is active; has redundant components and is concurrently maintainable, providing 99.982% availability.
Tier IV: composed of multiple active power and cooling distribution paths; has redundant components and is fault tolerant, providing 99.995% availability.
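
To make the availability percentages easier to compare, they can be converted into expected downtime per year. The following is a minimal sketch, assuming a 365-day year and treating each percentage as the fraction of the year the site is available; the function and constant names are illustrative only and are not part of the Uptime Institute's classification.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch

# Availability percentages as listed in the tier descriptions above.
TIER_AVAILABILITY = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the expected hours of downtime per year for a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, availability in TIER_AVAILABILITY.items():
        hours = annual_downtime_hours(availability)
        print(f"{tier}: {availability}% available, about {hours:.1f} hours of downtime per year")
```

Under these assumptions, Tier I corresponds to roughly 28.8 hours of downtime per year, while Tier IV corresponds to about 0.4 hours (around 26 minutes).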

Figure: Racks of telecommunications equipment in part of a data center.

Workload management

One of the challenges in earlier implementations of virtualized environments was the task of workload balancing. As multiple virtual servers were deployed on a single physical machine, it was difficult to tell if the various servers were competing for resources. The question of which workloads were where and how many of them could comfortably co-exist was more a matter of intuition than information.

With increasing maturity in the technology and its management tools, that reality has changed. Workload balancing is now able to keep up with shifting business requirements, and the intelligence incorporated into management consoles allows for memory management, automated resource optimization, and policy control to keep critical processes from being deprived of resources. The value of being able to easily move a virtualized server from one physical machine to another, without losing track of the resources, is best realized by using automated virtualization management tools.

Workload management enables you to manage workload distributions to provide optimal performance for users and applications. Workload management comprises the following:

Connection Load Balancing: Balances incoming connections across all of the instances that provide the requested service.
High Availability Framework: Enables the database to maintain components in a running state at all times.
Fast Connection Failover: The ability of clients to provide rapid failover of connections by subscribing to network events.
Runtime Connection Load Balancing: The ability of clients to allocate connections from the connection pool intelligently, based on the current service level provided by the database instances, when applications request a connection to complete some work.
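
The idea behind connection load balancing and failover can be sketched in a few lines. This is an illustration only, not the API of any particular database product: the `Instance` class and `pick_instance` function are hypothetical, and the "service level" is approximated here by the number of active connections on each instance.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical database instance offering the requested service."""
    name: str
    active_connections: int
    healthy: bool = True


def pick_instance(instances: list[Instance]) -> Instance:
    """Route a new connection to the healthy instance with the fewest active connections."""
    candidates = [inst for inst in instances if inst.healthy]
    if not candidates:
        raise RuntimeError("no healthy instance offers the requested service")
    chosen = min(candidates, key=lambda inst: inst.active_connections)
    chosen.active_connections += 1  # account for the connection just handed out
    return chosen


if __name__ == "__main__":
    pool = [Instance("db1", 12), Instance("db2", 7), Instance("db3", 3, healthy=False)]
    print(pick_instance(pool).name)  # db2: db3 is skipped because it is marked unhealthy
    pool[1].healthy = False          # simulate a failure event for db2
    print(pick_instance(pool).name)  # db1: the only healthy instance left
```

Marking an instance unhealthy stands in for the failure events that clients subscribe to; subsequent connection requests are then routed to the remaining instances.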

Workload is the amount of work assigned to, or done by, a client, workgroup, server, or internetwork in a given time period. In a manufacturing organization, for example, a workload can be a combination of:

Interactive or Network-Intensive Workloads: The volume of online entry of sales orders, program planning, warranty claims referred to a help desk, and similar interactive applications.

Content or Storage-Intensive Workloads: The volume handled by large content management systems that store terabytes of data, especially engineering drawings and CAD/CAM-related files.

In-Memory / CPU-Intensive or Calculation-Related Workloads: Most of the advanced algorithms in a typical product design, such as those that calculate the mass, width, and breadth of a product or its power consumption, are highly resource intensive and are typically proprietary scientific calculations specific to the industry.

Batch Workloads: These workloads may use a combination of processor and storage but are not as calculation intensive. Instead, they perform repetitive tasks over a large volume of records. Examples include generating a compliance document for the federal government covering all products manufactured in the last quarter, or running a billing batch job.

As stored data grows into the petabytes, and the associated processing grows with it, the biggest challenge enterprises will face in the near future is how to build an optimal computing environment for these workloads: one that finishes them within the latency the business requires, uses no more computing power than necessary, and remains dynamic and scalable enough to handle increased demand.

Workload Optimization and Challenges
The biggest challenge most enterprises face today is, first, how to measure the size of their workload. Unlike other sizing parameters such as Function Points (which define the size of an application) or LOC (lines of code, the size of the raw source code), there are few good industry-standard measures that give an indication of a workload.

In today's world the complexity of an IT organization is usually expressed by the dollar value of its IT budget (as in "ours is a $5 billion IT shop") rather than by the amount of workload it processes in a month. MIPS (millions of instructions per second) is one example of a measure that can be used to characterize an enterprise's workload.
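
As a rough illustration of how a MIPS figure is derived, the sketch below divides an instruction count by the elapsed time in seconds and scales the result to millions. The function name and the sample numbers are assumptions made for the example, not measurements from a real system.

```python
def mips(instruction_count: int, elapsed_seconds: float) -> float:
    """Return millions of instructions executed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return instruction_count / (elapsed_seconds * 1_000_000)


if __name__ == "__main__":
    # Hypothetical job: 4.5 billion instructions executed in 3 seconds.
    print(mips(4_500_000_000, 3.0))  # 1500.0
```

A figure like this describes processor throughput only; it says nothing about storage or network demand, which is one reason it is an incomplete measure of workload.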

The other issues in today's enterprise workload processing are:

Most workloads are written for specific hardware or software environments, which makes it difficult for enterprises to allocate them dynamically to the available compute and storage capacity.
Newer developers often lack the depth of business-logic knowledge that legacy-era developers had, so critical workloads are written in a serial or single-threaded manner, and scaling them is difficult even on a cloud infrastructure.
Batch jobs are used, but their ability to divide and conquer the processing needs is limited.

Because of these application characteristics, most organizations are not able to optimize their workloads: the workloads tend to contend for the same resources, which can leave them deadlocked. In addition, operational and capital expenses remain the same even when the workloads are moved to a dynamic infrastructure environment such as the cloud.

Best Practices from Legacy-Era Batch Jobs

Most batch jobs are written with parallel processing and workload scalability in mind. It is hard to find a batch program that does not use an organizational parameter such as Division, State, or Country, which allows it to run in parallel on multiple servers at the same time.
Within a single instance of a batch job, the concept of divide and conquer is employed well enough to scale across multiple virtual servers in a cloud environment. For example, most batch jobs split their tasks into job steps, and resource-intensive operations such as SORT, MERGE, and TRANSFORM are performed independently, so that multiple fine-grained resources can work on them in parallel.
Most batch jobs have restart logic, so they can pick up right where they left off. In a cloud infrastructure, where the workload can be moved internally among the available virtual machines, this characteristic is highly desirable for optimizing the workload. It ensures that no processing power is wasted even when a batch job fails, because the job can always continue from where it stopped.
Good monitoring and instrumentation options: monitoring the flow of batch jobs has been given great importance, so it is easy to track their progress even when the workload is moved to different servers.
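
The restart logic described above can be sketched as a job that records each completed step in a checkpoint file and skips those steps on the next run. This is a minimal illustration under stated assumptions: the `run_job` function, the step names, and the JSON checkpoint format are all hypothetical and are not taken from any specific batch scheduler.

```python
import json
import tempfile
from pathlib import Path


def run_job(steps, checkpoint_path: Path) -> list[str]:
    """Run named steps in order, skipping any already recorded in the checkpoint file."""
    done = json.loads(checkpoint_path.read_text()) if checkpoint_path.exists() else []
    executed = []
    for name, action in steps:
        if name in done:
            continue  # completed in an earlier run; do not repeat the work
        action()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        checkpoint_path.write_text(json.dumps(done))  # persist progress after each step
        executed.append(name)
    return executed


if __name__ == "__main__":
    attempts = {"merge": 0}

    def flaky_merge():
        # Fails on the first attempt only, to simulate a mid-job failure.
        attempts["merge"] += 1
        if attempts["merge"] == 1:
            raise RuntimeError("simulated failure")

    steps = [("sort", lambda: None), ("merge", flaky_merge), ("transform", lambda: None)]
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "job.checkpoint.json"
        try:
            run_job(steps, checkpoint)
        except RuntimeError:
            pass  # the first run stops at the merge step
        print(run_job(steps, checkpoint))  # ['merge', 'transform']
```

Because the checkpoint lives in a file rather than in memory, the second run could just as well take place on a different server, provided that server can read the same file.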

Compare these characteristics of older batch jobs with monolithic stored procedures or business components that perform most of their processing in a single thread: even when a dynamic computing facility is available, such components will not scale up much.
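
The contrast can be made concrete with a small sketch of partitioned processing. The records are split by an organizational parameter (a hypothetical `division` field) and each partition is processed independently. Threads are used here only to keep the example self-contained; in the setting described above, each partition would be dispatched to a separate server or virtual machine. All names and data are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(records, key):
    """Group records by the value of an organizational parameter."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def bill_partition(records):
    """Process one partition independently of all the others."""
    return sum(record["amount"] for record in records)


def run_partitioned(records, key, max_workers=4):
    """Process every partition in parallel and return one total per partition."""
    partitions = partition_by(records, key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        totals = pool.map(bill_partition, partitions.values())
        return dict(zip(partitions.keys(), totals))


if __name__ == "__main__":
    records = [
        {"division": "East", "amount": 100},
        {"division": "West", "amount": 250},
        {"division": "East", "amount": 50},
    ]
    print(run_partitioned(records, "division"))  # {'East': 150, 'West': 250}
```

A monolithic component that loops over all records in a single thread has no such partition boundary, so adding servers does not shorten its run time.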

There are many tools for the technical management of virtual infrastructures: hypervisor configuration, VM performance monitoring, memory management, and so on. Without doubt, such low-level infrastructure management is important, but ultimately IT needs to support business-critical application services, not just infrastructure. In fact, Enterprise Management Associates (EMA) research shows that 74% of enterprises with virtualization are using their virtual infrastructure to support production applications.

Managing a vendor-specific or platform-specific virtual infrastructure is just one part of managing business-critical applications. EMA data shows that most enterprises have multiple virtualization vendors (not just VMware, but also Microsoft, Citrix, Red Hat, IBM, Oracle/Sun, etc.) and multiple platforms (not just Windows, but also Linux, UNIX, i5/OS, z/OS, etc.). In fact, the average enterprise has four different vendors and four different platforms in its virtualization environment alone. Moreover, the average enterprise also has a significant traditional, physical infrastructure: EMA research shows that in most cases only 25-30% of the server environment is actually virtualized.

While virtualization vendors like VMware have great tools to support virtual infrastructure, they do not provide sophisticated tools for broad, multi-platform, multi-vendor business workload management across both physical and virtual environments.

Workload Management (WM) is an emerging paradigm for IT systems management arising from the intersection of dynamic infrastructure, virtualization, identity management, and the discipline of software appliance development. WM enables the management and optimization of computing resources in a secure and compliant manner across physical, virtual and cloud environments to deliver business services for end customers.

The WM paradigm builds on the traditional concept of workload management, whereby processing resources are dynamically assigned to tasks, or "workloads," based on criteria such as business process priorities (for example, in balancing business intelligence queries against online transaction processing), resource availability, security protocols, or event scheduling. WM extends that concept into the structure of the individual workloads themselves.
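
The traditional form of workload management described here, assigning processing resources to workloads according to business priority, can be sketched as follows. This is an assumption-laden illustration: the `Workload` class, the priority numbers, and the CPU units are invented for the example, and real workload managers weigh many more criteria, such as security protocols and event schedules.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int    # lower number means more important to the business
    cpu_demand: int  # CPU units requested


def allocate(workloads, cpu_capacity):
    """Grant CPU units in priority order until the capacity runs out."""
    remaining = cpu_capacity
    grants = {}
    for workload in sorted(workloads, key=lambda w: w.priority):
        granted = min(workload.cpu_demand, remaining)
        grants[workload.name] = granted
        remaining -= granted
    return grants


if __name__ == "__main__":
    workloads = [
        Workload("bi_queries", priority=2, cpu_demand=50),
        Workload("oltp", priority=1, cpu_demand=60),
        Workload("nightly_batch", priority=3, cpu_demand=30),
    ]
    print(allocate(workloads, cpu_capacity=100))
    # {'oltp': 60, 'bi_queries': 40, 'nightly_batch': 0}
```

With 100 units available, online transaction processing is served in full, business intelligence queries receive what is left, and the batch job waits until capacity frees up.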

Certified Cloud Computing Professional Data center and workload management
