AutoNOC 2.5 User Guide

3.14 Performance Tuning
In terms of operational complexity, we have often said that AutoNOC is as much as 10X more complex than a SQL server. Our reasoning for this claim is the number of moving pieces in the technology. In general, any of the following operations may be occurring at a given time:
  • Active Data Acquisition
    AutoNOC is breaking down probe expressions, sending out related packets, receiving results, rebuilding the expressions, and storing the data.

  • Passive Data Acquisition
    New data and events are coming in and AutoNOC is evaluating, processing, filtering, and storing them.

  • Alarm Computation
    At scheduled intervals, AutoNOC processes the existing alarms to determine which objects are tripping each alarm. When the process every data point algorithm is used, every uniquely acquired one-second time interval since the alarm was last processed is evaluated.

  • User GUI Queries
    User HTTP queries require various forms of graph analysis, HTTP page generation, and database queries. They only occur when there is an active user.

  • OSP Queries
    OSP queries are programmatic queries that come in via the OSP command port. These queries can involve a number of user-driven computations and data queries; however, processing is used only when there are OSP queries.

  • Portal to Portal Queries
    Portal to portal queries arise from the peer to peer model. Aside from a nominal amount of sharing and replication of the global network map and routing, the majority of the overhead of portal to portal queries is user or task driven.

  • Reporting Computation
    Reports are generated live by the user as well as on a schedule. Reports are computed and mailed after midnight one at a time to minimize overhead on the server.

  • Computation of Precomputed Variables
    In addition to midnight processing of reports, AutoNOC also computes any scheduled precomputed variables. This is mostly the case with baselines and trending, whose averages are updated after midnight.

  • Persistence, RDB Compression, Indexing, and Formation
    The last major computationally intensive process in AutoNOC is the creation, compression, and management of RDB files. This is typically performed automatically on an interval.

As should be apparent, AutoNOC performs a significant number of on-going real-time tasks to manage and monitor the network.

3.14.1 Multiprocessing and Number of Processors
AutoNOC's operations core is fully multiprocessing, and because of the number of tasks occurring at the same time it can benefit greatly from multiple CPUs.

  • Two CPUs Recommended
    AutoNOC spawns and manages multiple operations threads for acquisition and user query response. Many of these operations occur at the same time. With large network models, the many different asynchronous operations can lead to a lot of overhead related to thread switching. This has consequences both for the cost of the switch itself and for the internal CPU caches, which remain valid less often. AutoNOC is definitely a product that can benefit from multiple CPUs.

  • Two Slower CPUs Better than One Fast One
    In our experience, a single server with two 1 GHz processors will perform better than a server with a single 2.4 GHz processor. This is because of the number of on-going tasks that all run at the same time. Two CPUs means there is a greater likelihood of a processor being free at a given time to perform some new task.

3.14.2 In Memory Database and RAM Utilization
AutoNOC uses system RAM in the following manner:

  • Newly Acquired Data Kept In Memory
    All new incoming data in AutoNOC is kept in memory until the next recoil. This is done for performance reasons and ensures the software is able to handle the large volumes of incoming event, passive, and active network operations data required. Therefore, the more RAM you have, the longer the recoiling interval can be.

  • Pre-Computed RDB Accesses
    AutoNOC keeps RDB queries cached in memory for each probe for a period of time that lasts up to an hour in case they may be used again. This is designed to improve the performance of system graphing, alarm analysis, reporting and similar types of RDB queries that may require multiple accesses to the same range of data. AutoNOC will release this data periodically when it does not need it.
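The caching behavior described above can be pictured with a small time-to-live cache. This is only a sketch of the general technique, not AutoNOC's implementation: the class name `ProbeQueryCache`, its methods, the `load` callback, and the cache key layout are all hypothetical, and the one hour lifetime is taken from the "up to an hour" figure in the text.

```python
import time


class ProbeQueryCache:
    """Hypothetical per-probe query cache with a time-to-live (illustration only)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # (probe_id, start, end) -> (time stored, query result)
        self._entries = {}

    def get(self, probe_id, start, end, load):
        """Return a cached result if it is still fresh, otherwise call load()."""
        key = (probe_id, start, end)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]
        result = load(probe_id, start, end)
        self._entries[key] = (now, result)
        return result

    def release_expired(self):
        """Drop entries older than the time-to-live; return how many were dropped."""
        now = self.clock()
        expired = [key for key, (stored_at, _) in self._entries.items()
                   if now - stored_at >= self.ttl_seconds]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

In this sketch a graph, an alarm, and a report that all ask for the same probe and time range within the hour would trigger only one call to `load`.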

3.14.3 Recoiling Intervals
According to a user-defined interval, AutoNOC will write out the data it has acquired, compress it, index it, and archive it for high-speed retrieval later. This process is computationally intensive; however, it is what allows AutoNOC to manage and store the large data sets and "infinite" histories that it manages. On a typical test portal with two CPUs, the Processor Usage looks like the following graph (from AutoNOC). This process of filling up a queue (which has a memory map that looks much like a spring) and then processing, compressing, indexing, and writing the data away is called recoiling.

Each spike in the graph below corresponds to a recoil.

pic_recoils.png (43827 bytes)
The following graph is the above graph zoomed in. In this case, the user can see that the recoil interval is set to 30 minutes.

pic_recoils2.png (20691 bytes)
The recoiling interval can be adjusted by the user under Configure | Services | Persistence on the Volume Settings tab. Note that AutoNOC also writes out the operations model during recoiling if it has been flagged as changed, or if a backup has become necessary.
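Because newly acquired data stays in memory until the next recoil, a rough estimate can help when choosing an interval. The sketch below is back-of-the-envelope arithmetic only. The function names, the data rate, and the bytes-per-point figure are assumptions made for illustration, not measured AutoNOC values.

```python
def buffered_bytes(points_per_second, bytes_per_point, recoil_interval_minutes):
    """Rough upper bound on unrecoiled data held in RAM (illustrative only)."""
    return points_per_second * bytes_per_point * recoil_interval_minutes * 60


def max_interval_minutes(ram_budget_bytes, points_per_second, bytes_per_point):
    """Longest whole-minute recoil interval that fits in the given RAM budget."""
    bytes_per_minute = points_per_second * bytes_per_point * 60
    return ram_budget_bytes // bytes_per_minute


# Assumed workload: 2000 data points per second at 16 bytes each.
print(buffered_bytes(2000, 16, 30))                    # 57600000 bytes for a 30 minute interval
print(max_interval_minutes(256 * 1024 * 1024, 2000, 16))  # 139 minutes within a 256 MiB budget
```

Real memory use will differ, since this ignores per-object overhead and the query cache described in 3.14.2.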

3.14.4 Alarm Processing Performance
The computation and analysis of alarms in AutoNOC can require a good amount of CPU usage. A number of variables determine how much processor time an alarm needs, and these are described below:

  • Size of Monitored Alarm Set
    AutoNOC has to look at each object in the monitored set and determine if the object trips the alarm expression.

     
  • Complexity of Trigger Expression
    The more complex the trigger expression, the longer it will take to compute the alarm set.

      
  • Compute Interval
    AutoNOC will attempt to compute each alarm based on the compute interval. The shorter the compute interval, the more times the alarm will need to be processed.

  • Processing of Every Data Point
    AutoNOC includes the option of processing every data point. When this option is used, AutoNOC will query the objects that affect the expression for all data points whose data has changed. For instance, when analyzing the level of a device, it will look at all probe data points since the last alarm computation. This can use significantly more CPU than just looking at the current state, but it results in alarm analysis that will not miss conditions related to some complex trigger functions.

pic_alarms_settings.png (32088 bytes)
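The trade-off between looking only at the current state and processing every data point can be illustrated with a simple threshold check. This is a sketch under stated assumptions: the function names, the `(timestamp, value)` sample layout, and the threshold trigger are hypothetical and stand in for a real trigger expression.

```python
def trips_current_state(samples, threshold):
    """Evaluate only the most recent sample (one evaluation per object)."""
    return bool(samples) and samples[-1][1] > threshold


def trips_every_data_point(samples, threshold, last_computed):
    """Evaluate every sample acquired since the previous alarm computation."""
    return any(value > threshold
               for timestamp, value in samples
               if timestamp > last_computed)


# Assumed probe data as (timestamp in seconds, CPU percent).
# The spike at t=120 has already cleared by the time the alarm is computed.
samples = [(60, 20), (120, 95), (180, 25)]

print(trips_current_state(samples, threshold=90))                       # False
print(trips_every_data_point(samples, threshold=90, last_computed=0))   # True
```

The second function does three evaluations where the first does one, which is the extra CPU cost the option trades for not missing the transient condition.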

3.14.5 Improving Report Computation and Performance
AutoNOC can use significant CPU power for large computationally intensive reports. Because of this, the software provides the ability to do data sampling while computing reports. The following diagram demonstrates this.

pic_reports_sampling.png (59577 bytes)

In the above example report, a sampling size of 50 is used. That means the time range for the report will be broken up into 50 segments, and AutoNOC will use the most recent data point in each segment when evaluating the expression and the report. Increase the sampling size to get more accurate reports, decrease it for faster computation, or disable sampling by specifying 0. Note that specifying 0 means every data point, to the 60 second resolution, will be evaluated for every object, and this can take quite some time.
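The sampling rule described above (split the time range into equal segments and keep the most recent data point in each) can be sketched as follows. The function name `sample_points` and the `(timestamp, value)` layout are assumptions for illustration; this is not AutoNOC's code.

```python
def sample_points(points, start, end, sampling_size):
    """Keep the most recent point in each of sampling_size equal time segments.

    points is a list of (timestamp, value) pairs; a sampling_size of 0
    disables sampling and returns every point in [start, end).
    """
    in_range = sorted((t, v) for t, v in points if start <= t < end)
    if sampling_size == 0:
        return in_range
    width = (end - start) / sampling_size
    latest = {}
    for timestamp, value in in_range:
        segment = min(int((timestamp - start) / width), sampling_size - 1)
        latest[segment] = (timestamp, value)  # later points overwrite earlier ones
    return [latest[segment] for segment in sorted(latest)]


# One hour of data at 60 second resolution, sampled into 4 segments.
hour = [(t, t) for t in range(0, 3600, 60)]
print(sample_points(hour, 0, 3600, 4))
# [(840, 840), (1740, 1740), (2640, 2640), (3540, 3540)]
print(len(sample_points(hour, 0, 3600, 0)))  # 60
```

With a sampling size of 4 the report evaluates 4 points instead of 60, which shows why a smaller sampling size computes faster and a larger one tracks the data more closely.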

3.14.6 Model Design Tips
The following are some tips for designing your model for higher performance:

  • Use Folders to Maximize Binary Digital Search Tree (BDST) Performance
    AutoNOC's operations model makes heavy use of digital partition trees for high performance access to objects within the operations model. It maintains internal cache and direct map ID information that make it very quick to rip through a tree and find the appropriate objects. However, if all of your devices or objects are located in one folder, then AutoNOC will not be able to make use of the BDST technologies and will have to search through all of the objects.

  • Monitor Set Query Performance
    AutoNOC provides a performance counter, in seconds, on the set results display that is useful for tuning the performance of your set queries.
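The folder tip above comes down to partitioning: when objects are spread across folders, a lookup only has to examine one partition. The sketch below illustrates that general idea with a plain dictionary of folders. It does not reproduce AutoNOC's BDST structures, and the device names, folder names, and `find_in_folder` helper are all hypothetical.

```python
def find_in_folder(folder, name):
    """Linear scan of one folder; returns (object or None, objects examined)."""
    for examined, obj in enumerate(folder, start=1):
        if obj == name:
            return obj, examined
    return None, len(folder)


# 1000 hypothetical devices, first in a single folder...
devices = [f"dev-{i:03d}" for i in range(1000)]
flat_folder = devices

# ...then spread across 100 folders of 10 devices each.
folders = {}
for i, device in enumerate(devices):
    folders.setdefault(f"site-{i // 10:02d}", []).append(device)

_, flat_examined = find_in_folder(flat_folder, "dev-999")
_, tree_examined = find_in_folder(folders["site-99"], "dev-999")

print(flat_examined)  # 1000 objects examined in the single folder
print(tree_examined)  # 10 objects examined once the folder is known
```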
(C) 2007 - All Rights Reserved - AutoNOC LLC