3.14 Performance Tuning
In terms of operational complexity, we have often said that AutoNOC is
as much as 10X more complex than a SQL server. Our reasoning behind this
claim relates to the number of moving pieces in the technology. At any
given time, the following operations may be occurring:
- Active Data Acquisition
AutoNOC is breaking down probe expressions, sending out related packets,
receiving results, rebuilding the expressions, and storing the data.
- Passive Data Acquisition
New data and events are coming in and AutoNOC is evaluating, processing,
filtering, and storing them.
- Alarm Computation
On scheduled intervals, AutoNOC processes the existing alarms to determine
which objects are tripping each alarm. When the process every data point algorithm is
used, the alarm is evaluated for every distinct acquisition time (to the second) since
the alarm was last processed.
- User GUI Queries
User HTTP queries require various forms of graph analysis, HTTP page
generation, and database queries. They only occur when there is an active user.
- OSP Queries
OSP queries are programmatic queries that come in via the OSP command port.
These queries can involve a number of user-driven computations and data queries; however,
this processing occurs only when there are OSP queries.
- Portal to Portal Queries
Portal to portal queries arise from the peer to peer model. Aside from
a nominal amount of global network map and routing sharing and replication, the majority
of the overhead of portal to portal queries is user or task driven.
- Reporting Computation
Reports are generated live by the user as well as on a schedule. Reports are
computed and mailed after midnight one at a time to minimize overhead on the server.
- Computation of Precomputed Variables
In addition to midnight processing of reports, AutoNOC also performs
computation of any necessary scheduled precomputed variables. This is mostly the case with
baselines and trending, whose averages are updated after midnight.
- Persistence, RDB Compression, Indexing, and Formation
The last major computationally intensive process in AutoNOC is the creation,
compression, and management of RDB files. This is typically performed automatically on an
interval.
As should be apparent, AutoNOC performs a significant
number of ongoing real-time tasks to manage and monitor the network.
3.14.1 Multiprocessing and Number of Processors
AutoNOC's operations core is fully multiprocessing and
because of the number of tasks occurring at the same time, it can greatly benefit from
multiple CPUs.
- Two CPUs Recommended
AutoNOC spawns and manages multiple operations threads for acquisition and
user query response. Many of these operations occur at the same time. With large network
models, the many different asynchronous operations can lead to a lot of thread-switching
overhead. The cost comes both from the switch itself and from the internal CPU caches,
whose contents are less likely to remain valid after a switch. AutoNOC is definitely a
product that can benefit from multiple CPUs.
- Two Slower CPUs Better than One Fast One
In our experience, a single server with two 1 GHz processors will perform
better than a server with a single 2.4 GHz processor. This is because of the number of
ongoing tasks that all run at the same time. Two CPUs means there is a greater likelihood
of a processor being free at a given time to perform some new task.
3.14.2 In Memory Database and RAM Utilization
AutoNOC uses system RAM in the following manner:
- Newly Acquired Data Kept In Memory
All new incoming data in AutoNOC is kept in memory until the next recoil
(see 3.14.3). This is done for performance reasons and ensures the software is able to
handle the large volumes of incoming event, passive, and active network operations data
required. Therefore, the more RAM you have, the longer the recoiling interval can be.
- Pre-Computed RDB Accesses
AutoNOC keeps the results of RDB queries cached in memory for each probe for
up to an hour in case they are needed again. This is designed to improve
the performance of system graphing, alarm analysis, reporting, and similar types of RDB
queries that may require multiple accesses to the same range of data. AutoNOC
periodically releases cached data that it no longer needs.
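The caching behavior described above can be pictured with a minimal sketch. This is not AutoNOC's implementation; the class name, the cache key, and the `load` callback are all hypothetical, and the one-hour lifetime is simply the upper bound mentioned above.

```python
import time


class ProbeQueryCache:
    """Hypothetical per-probe cache of query results with a time-to-live."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # (probe_id, start, end) -> (stored_at, result)

    def get(self, probe_id, start, end, load):
        """Return a cached result, or call load() and cache what it returns."""
        key = (probe_id, start, end)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        result = load(probe_id, start, end)
        self._entries[key] = (now, result)
        return result

    def release_expired(self):
        """Drop entries older than the time-to-live; return how many were dropped."""
        now = self.clock()
        expired = [key for key, (stored_at, _) in self._entries.items()
                   if now - stored_at >= self.ttl_seconds]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

A graph, an alarm, and a report that all ask for the same probe and time range within the hour would hit the stored result instead of reading the RDB three times.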
3.14.3 Recoiling Intervals
According to a user-defined interval, AutoNOC writes out the data it has acquired,
compresses it, indexes it, and archives it for high-speed retrieval later. This process is
computationally intensive; however, it is what allows AutoNOC to manage and store the
large data sets and "infinite" histories that it maintains. This process of filling
up a queue (which has a memory map that looks much like a spring) and then processing,
compressing, indexing, and writing the data away is called recoiling.
On a typical test portal with two CPUs, the Processor Usage looks like the following
graph (from AutoNOC). Each spike in the graph below marks a recoil, and the spacing
between spikes is the recoil interval.

The following graph is the above graph zoomed in. In this case, the user can see
that the recoil interval is set to 30 minutes.

The recoiling interval can be adjusted by the user under Configure
| Services | Persistence on the Volume Settings tab. Note that AutoNOC also
writes out the operations model during recoiling if it has been flagged as changed,
or if a backup has become necessary.
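As a rough illustration of the recoil cycle described in this section, the sketch below buffers points in memory and, once the interval has elapsed, sorts, compresses, and archives them as one block. It is a simplified model under our own assumptions, not AutoNOC's RDB format: the class and method names are hypothetical, and JSON with zlib merely stands in for the real persistence and compression.

```python
import json
import zlib


class RecoilBuffer:
    """Hypothetical in-memory queue that is flushed on a fixed interval."""

    def __init__(self, interval_seconds=1800):
        self.interval_seconds = interval_seconds
        self.pending = []    # points acquired since the last recoil
        self.archive = []    # (first_ts, last_ts, compressed_block)
        self.last_recoil = 0

    def add(self, timestamp, probe_id, value):
        """Queue one data point and recoil if the interval has elapsed."""
        self.pending.append((timestamp, probe_id, value))
        if timestamp - self.last_recoil >= self.interval_seconds:
            self.recoil(timestamp)

    def recoil(self, now):
        """Sort, compress, and archive the queued points, then empty the queue."""
        if self.pending:
            self.pending.sort()
            block = zlib.compress(json.dumps(self.pending).encode("utf-8"))
            # The (first_ts, last_ts) pair acts as a very small index.
            self.archive.append((self.pending[0][0], self.pending[-1][0], block))
            self.pending = []
        self.last_recoil = now

    def read_block(self, index):
        """Decompress one archived block back into a list of points."""
        block = self.archive[index][2]
        return [tuple(p) for p in json.loads(zlib.decompress(block).decode("utf-8"))]


buf = RecoilBuffer(interval_seconds=1800)   # 30 minute recoil interval
for ts in range(0, 3601, 60):               # one point per minute for an hour
    buf.add(ts, "probe-1", float(ts % 7))
print(len(buf.archive), len(buf.pending))   # 2 0
```

The sketch also shows the trade-off noted in 3.14.2: a longer interval means fewer CPU spikes but a larger `pending` list held in RAM between recoils.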
3.14.4 Alarm Processing Performance
The computation and analysis of alarms in AutoNOC can require a good amount of
CPU usage. A number of variables determine how much processor time an alarm needs,
and these are described below:
- Size of Monitored Alarm Set
AutoNOC has to look at each object in the monitored set and determine if the
object trips the alarm expression.
- Complexity of Trigger Expression
The more complex the trigger expression, the longer it will take to compute
the alarm set.
- Compute Interval
AutoNOC will attempt to compute each alarm based on the compute interval. The
shorter the compute interval, the more times the alarm will need to be processed.
- Processing of Every Data Point
AutoNOC includes the option of processing every possible data point. When
this option is used, AutoNOC will query the objects that affect the expression for all
data points whose data has changed. For instance, when analyzing the level of a device, it
will look at all probe data points since the last alarm computation. This can use
significantly more CPU than just looking at the current state, but it results in alarm
analysis that will not miss conditions related to some complex trigger functions.
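The difference between evaluating only the current state and evaluating every data point can be seen in a small sketch. The function names and the simple threshold trigger are our own illustration, not AutoNOC's trigger expression language.

```python
def trips_on_current_state(points, threshold):
    """Evaluate the trigger against the most recent data point only."""
    return bool(points) and points[-1][1] > threshold


def trips_on_every_data_point(points, threshold, last_computed):
    """Evaluate the trigger against every point since the last alarm computation.

    Returns the timestamps at which the trigger condition held.
    """
    return [ts for ts, value in points
            if ts > last_computed and value > threshold]


# (timestamp in seconds, value); a short spike occurs between two alarm runs.
points = [(60, 20), (120, 95), (180, 30)]

print(trips_on_current_state(points, threshold=90))                       # False
print(trips_on_every_data_point(points, threshold=90, last_computed=0))   # [120]
```

The current-state check does one comparison and misses the spike at 120 seconds; the every-data-point check finds it, at the cost of one comparison per acquired point per monitored object.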

3.14.5 Improving Report Computation and Performance
AutoNOC can use significant CPU power for
large computationally intensive reports. Because of this, the software provides the
ability to do data sampling while computing reports. The following diagram demonstrates
this.
In the above example report, a sampling
size of 50 is used. That means the time range for the report will be broken up into 50
segments and AutoNOC will use the most recent data point in each segment when evaluating
the expression and the report. Increase the sampling size to get more accurate
reports, decrease it for faster computation, or disable sampling by specifying 0. Note
that specifying 0 means every data point, down to the 60 second resolution, for every
object will be evaluated, and this can take quite some time.
3.14.6 Model Design Tips
The following are some tips for designing your model for higher
performance:
- Use Folders to Maximize Binary Digital Search Tree (BDST) Performance
AutoNOC's operations model makes heavy use of digital partition trees for
high performance access to objects within the operations model. It maintains internal
cache and direct map ID information that make it very quick to rip through a tree and find
the appropriate objects. However, if all of your devices or objects are located in one
folder, then AutoNOC will not be able to make use of the BDST technologies and will have
to search through all of the objects.
- Monitor Set Query Performance
AutoNOC provides a performance counter, in seconds, on the set results display
that is useful for tuning the performance of your set queries.
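To illustrate the folder tip above, the sketch below counts how many objects must be examined to find one device when everything sits in a single folder versus when the same devices are split across folders. It is a deliberately simplified model, not the BDST itself; the function names, device names, and folder names are hypothetical.

```python
def comparisons_flat(objects, target):
    """Scan one big folder; return how many names were compared."""
    for count, name in enumerate(objects, start=1):
        if name == target:
            return count
    return len(objects)


def comparisons_foldered(folders, folder, target):
    """Scan only the named folder; the other folders are never touched."""
    return comparisons_flat(folders.get(folder, []), target)


# 1,000 hypothetical devices, first in one folder, then split into 10 folders.
flat = [f"device-{i:04d}" for i in range(1000)]
folders = {f"site-{s}": flat[s * 100:(s + 1) * 100] for s in range(10)}

print(comparisons_flat(flat, "device-0950"))                   # 951
print(comparisons_foldered(folders, "site-9", "device-0950"))  # 51
```

The same reasoning applies to set queries: a query scoped to a folder has far fewer objects to consider, which should show up as a lower value on the set results performance counter.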