A lot of the work on OSS has been centered on defining its architecture. Put simply, there are four key elements of OSS:
- the sequence of events
- the information that is acted upon
- the components that implement processes to manage data
- how we implement the applications
The ITU-T TMN model defines four management levels:
- Business Management Level (BML)
- Service Management Level (SML)
- Network Management Level (NML)
- Element Management Level (EML)
(Note: a fifth level, the network elements themselves, is sometimes mentioned, though the standards speak of only four levels.) This model was the basis for later work. Network management was further defined by the ISO using the FCAPS model: Fault, Configuration, Accounting, Performance and Security. The ITU-T TMN standards adopted this as the functional model for their technology base (the M.3000 to M.3599 series). Although the FCAPS model was originally conceived for, and is applicable to, an IT enterprise network, it was adopted for use in the public networks run by telecommunication service providers adhering to ITU-T TMN standards.
A major challenge in network and service management is managing and controlling the network elements of the access and core networks. Historically, standardization fora (ITU-T, 3GPP) have put considerable effort into defining a standard protocol for network management, but without success or practical results. By contrast, the IETF's Simple Network Management Protocol (SNMP) has become the de facto standard for internet and telco management at the EML-NML communication level.
Since 2000, with the growth of new broadband and VoIP services, the management of home networks has also entered the scope of OSS and network management. The DSL Forum's TR-069 specification defines the CPE WAN Management Protocol (CWMP), suitable for managing home network devices and terminals at the EML-NML interface.
Communications products and services range from voice services to IP and data services to hosting and CPE services. Some examples are:
- Voice – Basic telephony, long distance, toll-free, Voice over IP (VoIP), Contact Center, Local Access, etc.
- Internet Protocol (IP) – Internet Access, VPN, Contact Center, VoIP, Remote Access, etc.
- Data – Layer 1 Wide Area Network (WAN) Services such as SONET, Layer 2 WAN services such as ATM, Frame Relay, Private Lines, Layer 2 VPN and Metro Ethernet, etc.
- Hosting – Custom Application Environments, Disaster Recovery, Managed Services such as storage, security and network services, Web Site Hosting, etc.
Order Management systems are complex systems that allow customers or customer service representatives to capture and process new orders, modify existing orders, process customer moves and changes, price quotes and orders, validate orders, etc., while supporting multiple channels, such as the Web, order template documents and partner applications, as well as multiple lines of business.
Order Management includes the following areas:
Order Entry and validation – The Order Entry process captures order details such as package or plan, service address, service details, customer accounts, relevant contacts and applicable contracts. Data entered during Order Entry is also validated against predetermined rules.
Orders can be validated as the data is entered and/or after all the data has been entered. Products that validate order data as it is entered and walk the user through the product configuration process are known as “Product Configurators”. One such tool on the market is Selectica Configurator.
Order Decomposition – A single customer order can be decomposed into one or more service requests, typically based on service types or quantities, in order to be able to fulfill an order.
For example, if a customer order contains both a VoIP order and a phone line order, two service requests would be created, one each for VoIP and the phone line, each of which would be sent to the appropriate provisioning systems.
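The VoIP and phone line example above can be sketched in a few lines of Python. This is only an illustration: the `OrderLine` type, the `decompose` function and the service-type labels are hypothetical names, not part of any particular order management product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    service_type: str  # hypothetical label, e.g. "voip" or "phone_line"
    quantity: int


def decompose(order_id: str, lines: list[OrderLine]) -> list[dict]:
    """Group order lines by service type into one service request each."""
    grouped: dict[str, int] = defaultdict(int)
    for line in lines:
        grouped[line.service_type] += line.quantity
    return [
        {"order_id": order_id, "service_type": stype, "quantity": qty}
        for stype, qty in grouped.items()
    ]


# One customer order containing a VoIP service and a phone line.
requests = decompose("ORD-1", [OrderLine("voip", 1), OrderLine("phone_line", 1)])
for request in requests:
    print(request)
```

Each resulting request would then be routed to the provisioning system responsible for that service type.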
One of the major problems service providers often grapple with is that, as new services are added to the offerings by different business units, the lack of a flexible order management platform results in product- or service-specific OSS/BSS applications. These in turn lead to longer time-to-market as well as increased costs of maintaining many different applications and systems. Product catalog based Order Management solutions attempt to solve these problems by storing and processing qualification rules for services based on customer profiles, ordering channels, service locations, product interdependencies, availability, customer eligibility and other business constraints.
Service Provisioning systems are used to set up products/services for the customer after an order for the services has been created and accepted by the CSP.
Service provisioning activities include specifying the pieces of equipment and parts of the network needed to fulfill the service, configuring the customer’s routing path, allocating bandwidth in the transport network, setting up wiring and transmission, etc.
Some of the systems that constitute provisioning systems are: Circuit Design & Assignment Tools, Activation systems, and Field Service Management systems.
Circuit design refers to specifying whether facilities exist to provide the service and which pieces of the network equipment and routes the service shall utilize.
One of the most widely used systems providing Circuit Design facility is Telcordia TIRKS. Apart from Circuit Design support, it also provides circuit order control, inventory record maintenance, selection and assignment of components from inventory, and preparation and distribution of circuit work orders. The order control module in TIRKS works with a circuit provisioning system and operates in conjunction with other TIRKS components to assign facility and equipment information for circuit orders and design circuits. TIRKS can then provide automated design criteria for certain circuit orders. The circuit design generated in TIRKS is then communicated to field operations or automated activation systems for implementation.
Circuit Design and Assignment tools these days often have graphical tools that allow a user to create services on a network map using mouse clicks and drag-and-drop rather than drawing maps by hand or using an abstract set of equipment identifiers displayed in a table.
After a service is designed based on the existing equipment and circuit inventory, it is ready to be activated. If new equipment or lines need to be configured manually, a Field Service Management (FSM) system is notified which in turn dispatches technicians.
Moreover, certain activations can be performed automatically, for example by issuing commands to ATM or circuit switches to provision circuits, to SONET terminals to allocate bandwidth, and to a wide array of access devices such as DSLAMs, Digital Loop Carriers (DLC), or cable modems. For such activations, Service Activation systems pass the device-specific commands and configuration changes to the network elements, Element Management Systems (EMS), Network Management Systems (NMS) or application hosts.
EMSs are designed to receive and execute commands sent by activation systems on the devices. EMSs can also feed equipment status data back to network and trouble management systems. EMSs use protocols such as Common Management Information Protocol (CMIP), Transaction Language 1 (TL1) or Simple Network Management Protocol (SNMP) to communicate with activation and other systems.
Activation systems often comprise a library of adapters to various network systems. They usually also support transaction control, i.e. the capability to roll-back operations already performed, in case an error occurs.
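The roll-back capability can be illustrated with a minimal sketch. It assumes each activation step is modelled as a pair of callables (apply, undo); the `activate` function and the step names are hypothetical and stand in for real device adapters.

```python
from typing import Callable

# Each step is a hypothetical (apply, undo) pair of callables.
Step = tuple[Callable[[], None], Callable[[], None]]


def activate(steps: list[Step]) -> bool:
    """Apply steps in order; on failure, undo completed steps in reverse."""
    undo_stack: list[Callable[[], None]] = []
    for apply, undo in steps:
        try:
            apply()
        except Exception:
            for rollback in reversed(undo_stack):
                rollback()
            return False
        undo_stack.append(undo)
    return True


log: list[str] = []


def failing_step() -> None:
    raise RuntimeError("device rejected command")


ok = activate([
    (lambda: log.append("create port"), lambda: log.append("delete port")),
    (lambda: log.append("assign vlan"), lambda: log.append("release vlan")),
    (failing_step, lambda: None),
])
print(ok, log)
```

Undoing in reverse order matters because later steps often depend on resources created by earlier ones.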
Provisioning systems interact with the Inventory systems, both to verify that the required network elements and other facilities are available and, once the resources are provisioned, to reflect the changed on-line configuration of the facilities. Provisioning systems are therefore closely integrated with inventory systems, and some vendors have combined workflow capabilities with inventory management capabilities in their products.
Tracking inventory involves tracking equipment, facilities and circuits.
Some examples of information tracked are: the location and quantities of the equipment, how a piece of equipment is configured and its status, etc.
Inventory Management Systems track physical network assets (such as equipment and devices) as well as “logical” inventory (such as active ports, circuit IDs, IP addresses, etc.), although not all systems support both.
By relating usage of network assets to specific customers and services, an inventory system can help network operations determine the network usage and available capacity as well as enable automated network design and planning. Inventory Management Systems also enable Service Assurance systems to find the impact of a network fault on the customer’s circuits.
Some tools also have “auto-discovery” features to automatically check physical network assets and match the results with the information held in the inventory. However, these work only with some of the newer intelligent network elements.
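Matching discovery results against the inventory is essentially a set comparison. The sketch below assumes both sides can be reduced to sets of element identifiers; the `reconcile` function and the identifiers are hypothetical.

```python
def reconcile(inventory: set[str], discovered: set[str]) -> dict[str, set[str]]:
    """Compare recorded inventory with what discovery actually found."""
    return {
        "matched": inventory & discovered,
        "missing_from_network": inventory - discovered,
        "missing_from_inventory": discovered - inventory,
    }


result = reconcile(
    inventory={"dslam-01", "router-07"},
    discovered={"router-07", "switch-99"},
)
for category, elements in result.items():
    print(category, sorted(elements))
```

Elements in either "missing" category would typically be queued for manual review rather than corrected automatically.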
Communications service providers (CSPs) strive to differentiate themselves from their competitors by offering attractive Service Level Agreements (SLAs). SLAs are formal contracts that stipulate the level of service the CSP delivers to its customers. An SLA may specify levels of service availability, performance, operation, etc., as well as penalties for violating the SLA.
Offering SLAs implies that the service provider is able to monitor, act on and report the level of service, in order to assure the quality of the services delivered to its customers. Service Assurance refers to all the activities performed to provide that assurance. Its goal is to provide an optimal customer experience that helps retain existing customers, attract new customers and avoid penalties arising from SLA violations.
The following sub-sections introduce some of the common Service Assurance systems.
Fault and Trouble Management
Fault Management Systems are designed for detection, isolation and correction of malfunctions in a communications network. They monitor and process network alarms generated by network elements (routers, switches, gateways, etc.). An alarm is a persistent indication of a fault that is cleared only when the triggering condition is resolved.
Examples of trouble or faults in a network are damage to an optical fiber line, switch failure, etc. Such a problem can result in a chain reaction in which many network elements in a certain path produce alarms.
Fault Management Systems may be either a component within a Network Management System or a standalone set of system and application software.
Network Elements are designed to provide various levels of self-diagnosis. Older Network Elements might simply send an alarm notifying a problem while newer Network Elements can provide more precise and detailed messages. Fault Management Systems may collect alarms via SNMP traps, CMIP events or proprietary agents, via EMS. They use complex filtering systems to assign alarms to specific severity levels and correlate different alarms to locate the source and cause of a problem.
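One simple correlation heuristic, assuming the elements on the affected path are known and ordered from upstream to downstream, treats the most upstream alarming element as the probable source and the remaining alarms as symptoms. This is an illustrative assumption; real fault management systems use far richer rule sets. The function and element names are hypothetical.

```python
from typing import Optional


def probable_root_cause(path: list[str], alarming: set[str]) -> Optional[str]:
    """Return the first (most upstream) element on the path that is alarming."""
    for element in path:
        if element in alarming:
            return element
    return None


# Hypothetical path, ordered upstream -> downstream.
path = ["core-router", "agg-switch", "dslam", "cpe"]
root = probable_root_cause(path, {"dslam", "cpe", "agg-switch"})
print(root)
```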
After a problem is identified, the FMS notifies the appropriate network operators and passes the problem information to a Trouble Management System, which in turn logs the problem and issues a trouble ticket to start the repair process.
The Trouble Management System then sends commands to appropriate systems such as Field Service Management to schedule and dispatch technicians to repair the equipment and/or to EMS to reroute network traffic around the problem areas.
Trouble Management systems also handle automatic escalation, such as progression of a ticket from minor to major or from major to critical, and support a variety of notification methods such as paging, email and synthesized-voice dial-out.
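Automatic escalation can be sketched as follows, assuming, purely for illustration, that a ticket climbs one severity level for every fixed interval it remains open. The 30-minute default and the `escalate` function are hypothetical; actual escalation policies are operator-specific.

```python
SEVERITIES = ["minor", "major", "critical"]


def escalate(severity: str, age_minutes: float, step_minutes: float = 30.0) -> str:
    """Raise severity one level per elapsed step, capped at 'critical'."""
    start = SEVERITIES.index(severity)
    steps = int(age_minutes // step_minutes)
    return SEVERITIES[min(start + steps, len(SEVERITIES) - 1)]


print(escalate("minor", 45))   # one step elapsed
print(escalate("minor", 120))  # capped at the highest level
```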
Fault Management systems usually provide graphical network displays which are projected on large screens at the Network Operations Centres (NOC). NOC operators can see role-based views on their consoles, shortcuts to operations they perform the most as well as tools to quickly make connections to EMS to perform any testing or diagnostic operation.
Network Performance Management
Performance Management components in NMS and other Alarm Handlers monitor applications and systems and collect performance variables of interest at specified intervals. Such variables may include service provider network edge availability, customer premises availability, response times, packet delivery rate, packet loss, latency, jitter and out-of-sequence packet reordering.
One way to capture performance metrics is to collect event logs, CDRs and other performance data, such as counters or timers, that the network and system elements maintain as part of their normal operation. This is referred to as passive measurement. Performance data is captured by polling MIBs using SNMP, or via syslog, RMON (I & II), FTP, EMS feeds, etc. Most passive measurements report on a single network element.
For example, an Ethernet Switch may have a MIB which provides in and out data volumes of each port, histograms of frame sizes, number and types of erroneous frames, central processing unit (CPU) busy status. Associated Remote Monitoring (RMON) MIB-type data can then list ten most active users, etc. Performance Management tools can access the data by using SNMP to poll the MIBs at predefined intervals.
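Polled counters are cumulative, so a rate has to be derived from the difference between two successive polls. The sketch below assumes a 32-bit octet counter (such as the standard `ifInOctets` object) that wraps to zero at most once between polls; the `rate_bps` helper is a hypothetical name and no actual SNMP library is involved.

```python
COUNTER32_MODULUS = 2**32


def rate_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Bits per second between two polls of a wrapping 32-bit octet counter."""
    delta = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta * 8 / interval_s


# Normal case: 3000 octets transferred in a 60-second polling interval.
print(rate_bps(1_000, 4_000, 60.0))
# Wrapped case: the counter rolled over between the two polls.
print(rate_bps(2**32 - 100, 200, 60.0))
```

The modulo handles a single wrap; with long polling intervals on fast links the counter may wrap more than once, which this sketch cannot detect.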
Statistics on performance variables can also be captured via dedicated network appliances such as “probes” and “sniffers” that monitor or probe customers’ local loop connections, packet performance, etc. This form of performance testing is usually referred to as active testing.
Packet sniffers typically monitor protocols such as SIP and RTP by inspecting packets on the wire/fiber, using pings, DNS, FTP, HTTP fetches, etc. Examples include Wireshark and Geoprobes.
Probes such as Brix Networks BrixWorks Verifiers and Tektronix/Minacom IVR tools typically emulate customer traffic in order to test or probe specific paths to measure the quality of the services supported. Probes could be either placed into the network or could be built into network elements such as in the case of Cisco’s IP Service Level Agreements tools.
Note that active measurement measures a service, such as application response time, instead of the internal operation of a network element.
An example of active network performance test is injecting “ping” (short, network layer echo packet) into the network aimed at a remote IP address. Round-trip time is measured if the ping packet returns, and an error counter is incremented if it doesn’t.
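The ping test described above can be sketched without touching a real network by injecting the echo operation as a callable. Here `send_echo` is a hypothetical function that blocks until a reply arrives (returning `True`) or times out (returning `False`); `ProbeStats` and `run_probe` are likewise illustrative names.

```python
import time
from typing import Callable


class ProbeStats:
    def __init__(self) -> None:
        self.rtts_ms: list[float] = []  # round-trip times of answered probes
        self.errors = 0                 # probes that never returned


def run_probe(send_echo: Callable[[], bool], count: int) -> ProbeStats:
    """Send `count` echo probes, recording RTTs and counting losses."""
    stats = ProbeStats()
    for _ in range(count):
        start = time.perf_counter()
        replied = send_echo()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if replied:
            stats.rtts_ms.append(elapsed_ms)
        else:
            stats.errors += 1
    return stats


# Stand-in for a real echo: the second of three probes is "lost".
replies = iter([True, False, True])
stats = run_probe(lambda: next(replies), count=3)
print(len(stats.rtts_ms), stats.errors)
```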
Performance statistics captured by “active” or “passive” performance tests are normalized and routed to relational databases and/or data-warehouses. An alternative is to pass the performance data directly to Performance Management tools. For example, Concord eHealth could collect performance statistics from Netcool agents via SNMP polls at a pre-defined interval.
Performance statistics are initially analyzed to determine the normal (baseline) levels. Appropriate thresholds are then determined for each performance variable of interest, so that exceeding a threshold indicates a problem.
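One common way to turn a baseline into a threshold, used here only as an assumption since the text does not prescribe a method, is to take the mean of the baseline samples plus a multiple of their standard deviation. The sample values and the multiplier `k` are hypothetical.

```python
from statistics import mean, pstdev


def threshold(baseline: list[float], k: float = 3.0) -> float:
    """Upper threshold: baseline mean plus k population standard deviations."""
    return mean(baseline) + k * pstdev(baseline)


def exceeds(value: float, limit: float) -> bool:
    return value > limit


# Hypothetical baseline latency samples in milliseconds.
baseline_ms = [10.0, 12.0, 11.0, 9.0, 13.0]
limit = threshold(baseline_ms)
print(round(limit, 2), exceeds(14.0, limit), exceeds(20.0, limit))
```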
Performance Management tools then measure the performance variables on an ongoing basis against SLAs defined as thresholds per application or service, and report any exceptions to alarm handlers. This form of performance monitoring is reactive. Some tools also support proactive monitoring by providing simulation tools that help network operators project how growth in network traffic will affect performance metrics and plan countermeasures such as increasing capacity.
Performance Management tools may also support real-time and historical reporting. Some CSPs have taken performance statistics of the network affecting customers’ circuits to their customer self-service portals.
Topology & Configuration Management
Older networks and systems were static, with network wiring fixed in place, and changes to the network and its configuration sometimes required long outages. Any error or inconsistency in the configuration files of different network devices caused problems, and therefore these changes were well controlled.
With the rise of IP-based, dynamically routed networks, network topologies became dynamic, because a few routers might decide, on their own, to shift routing patterns, or because a network operator group might add a new router or switch to the network, possibly without everyone else in the network operations center being aware of the change. Instead of static associations between users and network addresses (as set in the old “hosts” file), DHCP and other techniques allowed users to appear, move, and disappear without giving prior notice to the network administration.
Most major NMSs therefore provide capabilities to automatically discover a network’s actual topology, which is critical for understanding network performance, the root cause of network alarms, etc.
Probes are placed into the network to automatically find devices and circuits. Also, most network elements provide MIBs that can be polled via SNMP to discover the network, although discovering the network topology in its entirety may not be guaranteed. Backup paths, virtual private networks, MPLS, etc., can make it very difficult to discover actual paths, through multiplexed links, patch panels, and test equipment.
Also, most Topology Management Systems allow the network operator to provide hints so that the system can, for instance, ignore certain portions of the network. This makes it easier to discover the relevant portions of the network more accurately.
Some service providers may run network discovery routines on a daily basis to discover any unauthorized changes to the network topology as a result of security intrusions or unplanned insertion of devices.
Moreover, network elements and computer systems have a variety of version information associated with them. For example, a workstation may have: Operating System, version 32, Ethernet Interface, version 5.4, TCP/IP Software, version 2.0 and SNMP Software, version 3.1. Since multiple engineers and network operators make changes to the network equipment, tracking the changes manually would be very tedious and error-prone. Configuration Management tools help automate the tracking of these changes. Configuration Management systems store the configurations in a database or LDAP server for easy access.
They also enable network operators to change configurations of the network elements as well as to roll back a change to a previous configuration, if required.
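The change-and-roll-back behaviour can be illustrated with a minimal in-memory version history. The `ConfigHistory` class and the configuration strings are hypothetical; a real system would persist versions in a database or LDAP server as noted above.

```python
class ConfigHistory:
    """Keeps every applied configuration so a change can be rolled back."""

    def __init__(self, initial: str) -> None:
        self._versions = [initial]

    @property
    def current(self) -> str:
        return self._versions[-1]

    def apply(self, new_config: str) -> None:
        self._versions.append(new_config)

    def roll_back(self) -> str:
        """Discard the latest change, never removing the initial version."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = ConfigHistory("mtu 1500")
history.apply("mtu 9000")
print(history.current)
print(history.roll_back())
```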
When a problem in the network occurs, network operators often search the Configuration Management database for clues that can help solve the problem.
Planning & Testing
Network Planning solutions help determine when a communication network needs an upgrade or additional equipment, and predict the impact of changes to the topology, configuration, traffic and technology of a service provider’s network. They provide simulation tools that help network operators project how growth in network traffic will affect network performance. Based on the results and other planning activities, network operators can take countermeasures such as increasing capacity.
Testing is an important activity in setting up a network or customer circuits. For simplicity in understanding the gamut of testing activities, let us divide them into the following:
1. Testing of the existing network or of a change to it
2. Integration testing of services configured for the customer
3. End-to-end testing of services configured for the customer
Testing the entire network platform - including the equipment, services and call quality – is critical for assessing the system prior to deployment and for service assurance in production environments.
Network testing tools usually simulate a production environment and generate synthetic voice, video and data traffic, which helps measure call/data quality, network performance, and the effects of changes to the network, increased traffic or new applications. These typically include tests such as DNS, HTTP, RTP and ping. During ongoing operations, these testing tools also enable active testing of facilities.
Another form of testing is integration testing of network setup for the customer, i.e., routes, circuits, etc. configured for a customer. Network operators or field engineers perform integration testing of services upon completion of activations and other provisioning activities. Field engineers typically use equipment and network element specific applications to perform integration testing.
Upon completion of integration testing, field operations teams are notified to perform end-to-end testing. End-to-end testing includes testing of circuits, both within the CSP’s network and on local access circuits between the CSP and the customer premises. Some service providers use craft access systems that give field technicians access to internal systems through a handheld terminal. The handheld terminal lets them access the loop testing system and view the complete test summary from remote locations.
IDC defines Billing as: the processing and compiling of charges and enabling of revenue collection for network usage, feature transactions, and access charges of the services.
Mediation systems collect network usage data from the network elements and convert it to billable statistics.
Traditionally, Call Detail Records (CDRs) have been used to record the details of circuit-switched phone calls. A CDR includes the start time, end time and duration of the call, and the originating and terminating numbers. CDRs are stored until a billing cycle runs. For IP-based services, a newer standard called Internet Protocol Detail Record (IPDR) is gaining acceptance. IPDR supports both voice and data.
Billing systems use mediation output to determine charges for the customers. It is also used to feed other downstream applications such as Fraud and Churn Management.
Rating systems calculate the charge for an individual call, IP usage event, etc. using the CDRs/IPDRs. Rating systems apply charges based on pre-configured pricing rules, applicable discounts and rebates from promotions.
This rating process has grown increasingly complex in recent years. In older times, it was solely a matter of taking the length of the call, assigning a price based on the mileage band (calculated by cross-referencing the prefix of the originating and terminating numbers in a table of values), and assigning discounts based on the time of day (peak, evening, night), day of the week, and holidays.
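The older rating scheme described above amounts to a pair of table lookups and one multiplication. The following sketch assumes three-digit prefixes, two mileage bands and three time-of-day periods; every table value, and the `rate_call` function itself, is hypothetical and chosen only to make the arithmetic visible.

```python
from decimal import Decimal

# Hypothetical lookup tables; real tariffs are far larger.
BAND_BY_PREFIXES = {("212", "718"): "near", ("212", "213"): "far"}
PRICE_PER_MINUTE = {"near": Decimal("0.05"), "far": Decimal("0.20")}
DISCOUNT_BY_PERIOD = {
    "peak": Decimal("0"),
    "evening": Decimal("0.25"),
    "night": Decimal("0.50"),
}


def rate_call(originating: str, terminating: str, minutes: int, period: str) -> Decimal:
    """Price a call from its length, mileage band and time-of-day discount."""
    band = BAND_BY_PREFIXES[(originating[:3], terminating[:3])]
    gross = PRICE_PER_MINUTE[band] * minutes
    return gross * (1 - DISCOUNT_BY_PERIOD[period])


# A 10-minute "far" call in the evening: 10 x 0.20 = 2.00, less 25%.
charge = rate_call("2125550100", "2135550199", 10, "evening")
print(charge)
```

`Decimal` is used instead of floating point so that monetary amounts are not subject to binary rounding error.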
Modern rating systems can assign discounts based on calling circles, provide flexible rating plans based on the size of accounts, and increase switching costs. These serve as strategic marketing tools but can be very complex to administer and operate.
Billing systems aggregate rated calls, IP/data usage events, etc. and calculate customer invoices. In the United States, billing is usually performed once a month.
Billing systems combine rated records with prior balance information, payment records, recurring charges (such as line rentals), one-time fees (such as installation and service charges), promotions and discounts associated with the customer account, taxes and credits. Overnight billing batch jobs are among the largest batch workloads in a CSP’s operating environment. Each customer is assigned a specific billing cycle.
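How these components combine into an amount due can be sketched as simple arithmetic. The formula below is an assumption made for illustration (in particular, tax is applied to new charges after discounts, and a single flat tax rate is used); actual billing rules vary by jurisdiction and provider. All names and figures are hypothetical.

```python
from decimal import Decimal


def amount_due(
    prior_balance: Decimal,
    payments: Decimal,
    rated_usage: list[Decimal],
    recurring: Decimal,
    one_time: Decimal,
    discounts: Decimal,
    credits: Decimal,
    tax_rate: Decimal,
) -> Decimal:
    """Combine the invoice components into a single amount due."""
    new_charges = sum(rated_usage, Decimal("0")) + recurring + one_time - discounts
    tax = new_charges * tax_rate  # assumption: tax on discounted new charges
    return prior_balance - payments + new_charges + tax - credits


total = amount_due(
    prior_balance=Decimal("50.00"),
    payments=Decimal("50.00"),
    rated_usage=[Decimal("1.50"), Decimal("2.25")],
    recurring=Decimal("20.00"),
    one_time=Decimal("0.00"),
    discounts=Decimal("3.75"),
    credits=Decimal("0.00"),
    tax_rate=Decimal("0.10"),
)
print(total)
```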
According to Insight, the holy grails of the billing industry are unified billing and convergent billing. With unified billing, a customer gets a single bill for all services provided (or billed) by the service provider, appropriately rated, discounted, and taxed, and a single contact for inquiries and negotiation.
In the competitive world of communications, service providers often tie up with partners in order to bundle their own products with their partners’ products. This helps them offer attractive bundles of products and services. However, settling the resulting interconnect charges successfully requires an effective Interconnection Billing system.
Interconnection Billing products support inter-working of a service provider’s billing systems with the corresponding systems of another service provider, based on interconnect agreements and contracts.
Revenue Assurance & Fraud Management systems verify billing and detect and identify unauthorized usage of service provider network assets. Two common kinds of fraud are usage fraud and subscription fraud.
Usage Fraud means that a customer uses the telecommunications network illegally. This is accomplished either by obtaining a service with no intent to pay or by obtaining unauthorized access to the network (i.e. “hacking” or “cracking”).
Fraud Management systems typically detect and prevent unauthorized access to a communications network by analyzing traffic patterns on the network. Some examples follow:
- One technique involves analyzing the average call duration or the number of calls placed to foreign countries to determine whether the traffic patterns are consistent with a subscriber's call history or pattern. If a call is inconsistent with the subscriber's call pattern profile, the subscriber is provided with a report of the abnormal call activity.
- Other methods for dealing with the problem of unauthorized use involve automatically denying or blocking access to the network when abnormal use is detected to minimize the subscriber's financial loss.
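The first technique, comparing a call against the subscriber's own history, can be sketched as below. The rule used here (flag a call longer than a fixed multiple of the historical average duration) is an assumption for illustration; the `is_abnormal` function, the factor of 3 and the sample durations are all hypothetical.

```python
from statistics import mean


def is_abnormal(history_minutes: list[float], call_minutes: float, factor: float = 3.0) -> bool:
    """Flag a call much longer than the subscriber's average call duration."""
    return call_minutes > factor * mean(history_minutes)


# Hypothetical subscriber whose calls usually last a few minutes.
history = [3.0, 5.0, 4.0]
print(is_abnormal(history, 30.0))
print(is_abnormal(history, 6.0))
```

A flagged call could then feed either of the two responses described above: a report to the subscriber or an automatic block.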
Subscription fraud means that a customer obtains a service account by giving a false identity (name and/or SSN) or by giving a false address or false credit worthiness.
Detecting subscription fraud involves searching recent order and existing customer data for multiple orders and/or accounts with the same customer name, SSN, or service address.
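Such a search can be sketched as grouping accounts by a shared attribute and reporting any group with more than one member. The record layout, field names and sample data are hypothetical, and the normalization (trimming and lower-casing) is a simplifying assumption; real address matching is considerably fuzzier.

```python
from collections import defaultdict


def duplicate_groups(accounts: list[dict], key: str) -> dict[str, list[str]]:
    """Return account IDs that share the same normalized value for `key`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for account in accounts:
        normalized = account[key].strip().lower()
        groups[normalized].append(account["account_id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


# Hypothetical account records with obviously fake identifiers.
accounts = [
    {"account_id": "A1", "ssn": "000-00-0001", "address": "1 Main St"},
    {"account_id": "A2", "ssn": "000-00-0002", "address": "1 main st "},
    {"account_id": "A3", "ssn": "000-00-0001", "address": "9 Oak Ave"},
]
print(duplicate_groups(accounts, "ssn"))
print(duplicate_groups(accounts, "address"))
```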
Common subscription fraud patterns include:
- Change of billing address within a few weeks of opening an account.
- Substantial deviation of usage profile of a new user from an average new user.
Common techniques to control subscription fraud include threshold-based analysis, inference rules analysis, profile-based analysis (such as habitual user profiles) and neural networks.
Fraud Management Systems typically read and store usage data from the service provider’s network switching equipment and allow queries to be executed against the data to detect suspicious usage patterns.
They also allow operators to review customer accounts that have suspicious activity, to track their investigation and record the final case resolution.
Fraud is different from revenue leakage. Revenue leakage is the loss of revenue through operational or technical loopholes; the resulting losses are sometimes recoverable and are generally detected through audits or similar procedures. Fraud, on the other hand, is theft by deception, typically characterized by evidence of intent; the resulting losses are often not recoverable, and it may be detected by analysis of calling patterns.
Another important class of Revenue Assurance tools is Churn Management tools. Churn management is an important area for service providers with subscription-based businesses, owing to price wars, aggressive marketing and promotions from competing service providers, and customers’ expectations of customer service.
Churn Management tools provide functions such as automated behavior analysis, forecasting and simulation, empirical profiling, churn metrics capture, that enable service providers to learn which customers are likely to leave and take appropriate countermeasures.