Performance Module
The following are the metrics in each tab of the AimBetter Platform Performance Module.
-Windows
— Hosts
Metric | Description | Investigate this alert |
CPU Usage | The overall percentage of time the CPU spends executing non-idle tasks. High CPU utilization may indicate that the CPU is under heavy load and could be a bottleneck in system performance. | Read more |
CPU Queue Length | The number of processes waiting to be executed by the CPU in a system at a given time. Can provide insights into system performance and potential bottlenecks. | |
CPU Hardware Interrupts | These interrupts are signals that occur when a hardware device requires servicing or when a specific event or condition, such as a timer or a peripheral input, needs to be handled by the CPU. | Read more |
CPU DPC Interrupts | Deferred Procedure Call (DPC) interrupts. In Windows operating systems, device drivers generate this type of interrupt to request deferred execution of time-sensitive tasks that cannot be performed immediately. | |
Total memory | The total amount of RAM (Random Access Memory) in GB. It is a volatile memory that provides temporary storage for data and instructions that the CPU needs to access quickly. | Read more |
Memory free | The amount of system memory (RAM) in GB that is currently not used by any active processes or applications. It represents the memory portion readily available for immediate allocation and usage by the operating system or any newly launched programs. | Read more |
Memory free % (percentage) | The percentage level of the system’s free memory in comparison to the total memory. | Read more |
Disk Usage | Indicates how much of the total disk capacity is currently being utilized by various files and system components in GB. | Read more |
Disk Busy | Reflects the proportion of time (percentage) the disk is actively engaged in performing read or write operations compared to the total available time. | Read more |
Disk IOPS | Disk IOPS (Input/Output Operations per Second) indicates how much work the disk is handling. Each IO operation involves a block of data, with a size defined in kilobytes (for SQL Server, typically 64 KB). The size of the data block can impact the reported IOPS value. | Read more |
Last Restart | Last system restart. | |
Ping Lost Packets (0-12) | The number of unsuccessful communication integrity checks out of 12 attempts. By default, the ping is sent to google.com; this can be changed in the AimBetter configuration. | Read more |
Network Jitter | The variation in milliseconds in the delay of packet delivery during all 12 communication integrity checks. It is a measure of the variability or inconsistency in the timing of data packets as they travel from the source to the destination. | Read more |
Network Latency | The time taken in milliseconds for a packet to travel from its source to its destination, including the time spent in transit and any processing delays along the way. | Read more |
Internet Latency | The delay or latency in milliseconds experienced when data packets travel between a user’s device and a remote server or destination over the internet. | Read more |
Internet Jitter | The variation in latency or delay experienced by data packets as they travel over the internet during all 12 communication integrity checks. | Read more |
Uptime | The duration of time that a server has been continuously running without experiencing a restart or shutdown. | |
OS | The operating system version name. | |
SP | The operating system updated version (service pack). | |
CPU Cores | The number of individual processing units within a central processing unit (CPU). Each CPU core can execute instructions and perform calculations independently of other cores. | Read more |
Memory Page Read | The time for retrieving data from a page of memory into the processor’s cache. Memory page reads are essential for efficient memory access and are typically performed automatically by the hardware and operating system. To minimize the number of page reads, frequently accessed data should be kept in the processor’s cache. | |
Paging Used | The amount of Pagefile in use. The Pagefile, also referred to as virtual memory, resides on disk storage and supplements the system’s physical memory (RAM) when processes need additional memory. | Read more |
Total Disk IO | The number of read (input) and write (output) operations on the system disk storage, measured as IO operations per second (IOPS). |
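The ping-based rows above (Ping Lost Packets, Network Latency, Network Jitter) can be illustrated with a small sketch. It assumes one round of 12 checks, represents a lost packet as `None`, and assumes jitter is the mean absolute difference between consecutive successful replies; the table does not state the exact formula AimBetter uses, and the function name `summarize_ping` is hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def summarize_ping(samples_ms: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one round of ping checks.

    Each entry is a round-trip time in milliseconds, or None for a lost packet.
    """
    replies = [s for s in samples_ms if s is not None]
    lost = len(samples_ms) - len(replies)
    # Average latency over the successful replies only.
    latency = mean(replies) if replies else None
    # Assumed jitter definition (not taken from the product documentation):
    # mean absolute difference between consecutive successful replies.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"lost": lost, "latency_ms": latency, "jitter_ms": jitter}


# Example round of 12 checks with one lost packet.
round_of_12 = [10.0, 12.0, None, 11.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
summary = summarize_ping(round_of_12)
```

A round in which every packet is lost returns a latency of `None` rather than zero, so it cannot be mistaken for a fast link.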
— Network
Metric | Description | Investigate this alert |
Card Name | The sampled network card name | |
Bandwidth | The amount of data that can be transmitted over a network in gigabits per second (Gbps). Bandwidth relates to the speed at which data can be transmitted between devices. A higher bandwidth allows for faster and more extensive data transfer. In some cases the card is configured sub-optimally, for example when the card supports 1 Gbps but is set to 100 Mbps. | |
Network utilization | The percentage of available network bandwidth that is being used by data traffic at a given time. A high percentage indicates an extensive transfer of data. This will cause data transfer slowness between different programs and systems throughout the network. | Read more |
Receive Kbyte (sec) | The amount of data received through the network card in KB per second (KB/s). High values indicate that the server is receiving large amounts of data, which can be the cause of system slowness. | |
Send Kbyte (sec) | The amount of data sent through the network card in KB per second (KB/s). High values indicate that the server is sending large amounts of data, which can be the cause of system slowness. | |
Model | The name and model number of a specific network interface card (NIC). |
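As a rough illustration of how Network utilization relates to Bandwidth and the Receive/Send rates above, the sketch below converts KB per second to bits per second and divides by the card's configured bandwidth. It assumes 1 KB = 1000 bytes and that send and receive traffic are summed; the table does not specify how the platform combines them, and the function name is hypothetical.

```python
def network_utilization_pct(receive_kb_s: float, send_kb_s: float, bandwidth_gbps: float) -> float:
    """Return the share of the configured bandwidth in use, as a percentage."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth_gbps must be positive")
    # Assumption: 1 KB = 1000 bytes, 1 byte = 8 bits.
    bits_per_second = (receive_kb_s + send_kb_s) * 1000 * 8
    return 100.0 * bits_per_second / (bandwidth_gbps * 1_000_000_000)


# The same traffic on a card set to 1 Gbps versus one set to 100 Mbps.
at_1_gbps = network_utilization_pct(5000, 1250, 1.0)
at_100_mbps = network_utilization_pct(5000, 1250, 0.1)
```

The second call shows why a card that supports 1 Gbps but is set to 100 Mbps matters: identical traffic consumes ten times the share of the available bandwidth.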
— Disk
Metric | Description | Investigate this alert |
Drive | The drive name. | |
Total | The total disk storage capacity (in GB). | |
Disk Usage (GB) | The disk storage usage in GB. Usage higher than 95% of the storage space can lead to loss of information and the integrity of programs and processes in the system. | Read more |
Free Space | The free disk storage space in GB. Low free storage space can lead to loss of information and the integrity of the programs and processes in the system. | Read more |
Busy Time | Indicates the percentage of time the disk is actively handling I/O operations (read/write) rather than sitting idle. A high value can cause system slowness. | Read more |
Write /R (ms) | The time taken to write to the disk in milliseconds. Writing time higher than one millisecond indicates a load on the disk or a lack of integrity. | Read more |
Read /R (ms) | The time taken to read from the disk in milliseconds. Reading time higher than one millisecond indicates a load on the disk or a lack of integrity. | Read more |
IO (sec) | The amount of data that is read from or written to the disk per second. If the amount of reading and writing is high, the system will respond slowly. | Read more |
IO Write(sec) | The amount of writing to the disk per second. If the writing amount is high, the system will respond slowly. | Read more |
IO Read(sec) | The amount of reading from the disk per second. If the amount of reading is high, the system response may be slow. | Read more |
Disk Free (%) | The percentage of disk storage space that is free. A low percentage indicates that disk storage capacity has almost reached its limit. It’s recommended to look for the processes or files that consume most of the storage. | Read more |
Est. Max IO | Estimated maximum I/O operations rate that the disk can reach. This parameter is important to understand whether the system can handle the expected workload and identify potential performance limitations or the need for additional disk resources. | Read more |
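The relationship between Total, Disk Usage (GB), Free Space, and Disk Free (%) in the table above can be sketched as follows. The 95% usage threshold comes from the Disk Usage (GB) row; the function name and the returned field names are hypothetical.

```python
from typing import Dict, Union


def disk_status(total_gb: float, used_gb: float, usage_alert_pct: float = 95.0) -> Dict[str, Union[float, bool]]:
    """Derive free space and percentages from total and used capacity."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    free_gb = total_gb - used_gb
    used_pct = 100.0 * used_gb / total_gb
    free_pct = 100.0 * free_gb / total_gb
    return {
        "free_gb": free_gb,
        "used_pct": used_pct,
        "free_pct": free_pct,
        # Usage above the threshold is the condition the table warns about.
        "usage_alert": used_pct > usage_alert_pct,
    }


status = disk_status(total_gb=500.0, used_gb=480.0)
```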
— CPU
Metric | Description | Investigate this alert |
Core Usage | The percentage of time an individual core spends executing non-idle tasks. | Read more |
Core No. | The core number. | |
Core Hardware interrupts | Indicates the signals a specific core receives from hardware devices external to the OS. A high value may indicate that there are processes that can be the cause of slowness in the OS. | Read more |
Core DPC interrupts | Indicates the DPC signals a specific core receives. DPCs (deferred procedure calls) are interrupts that run at a lower priority than standard interrupts. A high value indicates that there may be a processor bottleneck or an application or hardware-related issue that can significantly diminish overall system performance. |
— Paging
Metric | Description | Investigate this alert |
Pagefile | The Pagefile path that the operating system uses as an extension of physical memory. When physical memory (RAM) becomes scarce, the operating system can move infrequently accessed or idle pages to the page file to free up memory for other processes or data. | |
Used | The amount of Pagefile usage on the disk. Consider adding physical memory if Pagefile usage is consistently high. | Read more |
Max | If the Pagefile limit has been manually set, this metric indicates the maximum storage space assigned to the Pagefile. | |
Init | If the Pagefile limit has been manually set, this metric indicates the initial storage space assigned to the Pagefile. | |
Manage type | How the Pagefile limit (virtual Memory) has been set: manually or automatically (the latter is less recommended). | |
Allocated | The physical size assigned to Pagefile (virtual memory) on the disk storage space. |
— Services
Metric | Description | Investigate this alert |
Host | The host name as defined in the AimBetter Configuration | |
Name | The name of the service in the registry (service’s key) | |
Display Name | A user-friendly name for the service that appears in the Services control panel application | |
State | The service status: Running, Stopped or Paused | |
Mode | The service operation mode: Manual, Automatic, or Disabled. We recommend setting critical services to Automatic mode. The Manual setting is for when we want to control the moment the service turns on. | |
Account | The authorization level with which the service is running. This is important in cases where we want to grant limited permissions to specific users of the service. | |
Path | The service executable file location. | |
Running | The service status: 0 is Down, and 1 is Up. The graph indicates if a service is either Up or Down. It samples every minute and illustrates the uptime continuity. | Read more |
Host | The host name as defined in the AimBetter Configuration | |
Status | Indicates the service’s current state or condition. | |
Description | The location of the service description. This description includes information and configuration settings of the system service. |
— Process
Metric | Description | Investigate this alert |
Host | The host name as defined in the AimBetter Configuration | |
User Name | The name of the user running the process. In cases where a process needs to be shut down due to high system resource consumption, it is important to know who is running the process. | |
Process Name | The running process name. | |
CPU | The processor usage (percentage) by the process. High values can lead to system slowness. | Read more |
Memory | The amount of physical memory utilized by the process in MB. High values may lead to slowness in system programs and processes. | Read more |
Page Files | The amount of Pagefile (virtual memory) used by the process in MB. A high value can be indicative of a problem with the physical memory. | Read more |
Virtual Memory | The combined amount of physical memory and Pagefile (virtual memory) used by the process. | |
Reads | The number of reads the process has performed from the physical memory since it started or since the counter was last reset. | |
Writes | The number of writes the process has performed to the physical memory since it started or since the counter was last reset. | |
Process ID | A number identifying the process in the system. | |
Command Line | The running command of the executable file which the process is running, including parameters. | |
Last initialization | The time when the process was initiated. | |
Path | The path of the executable file. | |
Page Fault/sec | The rate at which page faults occur. It’s used to assess the efficiency of memory management in a system. | |
Uptime | The duration of time that the process has been continuously running without a restart or pause. | |
Private Memory | A memory exclusively allocated and accessible to a specific process or application. It’s used to store information that is unique to a particular process, such as the process’s variables, stack, and heap. | |
Shared Memory | A memory that can be accessed by multiple processes simultaneously. It’s used for inter-process communication (IPC), where one process writes data into the shared memory region, and other processes can read that data from the same memory region. | |
Physical Mem. | The actual hardware memory used by the process. | |
Virtual Mem. | The process’s use of memory beyond physical memory, such as disk space or swap space used as an extension of RAM. The general recommendation is for this value to be as low as possible. |
— Images
Metric | Description | Investigate this alert |
Host | The name of the server where the IMAGE is located, according to the name given in the UI during the installation. | |
Image | The Image name. An Image is a compiled binary file that contains the machine code representation of a program or software application after it undergoes the compilation process (the executable source code of a program). | |
CPU | CPU consumption (%). Values above the average indicate an increase in system resource consumption, which may lead to system latency. | Read more |
Memory | The amount of physical memory (RAM) utilized by the program in MB or GB. High values may lead to slowness in system programs and processes. | Read more |
Process Count | The number of processes currently running on the server related to the selected IMAGE. A high number can indicate that many instances of the image are running, which may lead to high resource consumption. | |
Path | The image path. |
— MSSQL on Windows
Metric | Description | Investigate this alert |
Version | The SQL Server version installed on the server. | |
Instance | The SQL server instance name given in the installation. | |
Test connection | A check of the time to establish a connection to the SQL server in milliseconds. A high value indicates that there are network communication problems or a load on the SQL server. | Read more |
Last Restart | SQL server last restart | |
Collation | In SQL Server, a collation is a set of rules that determine how data is sorted and compared for string-based operations. SQL collations allow database administrators to define the appropriate rules for sorting and comparing strings based on the specific language and cultural context of the data being stored. | |
Edition | The installed SQL Server edition. There are numerous editions (for example, Express, Developer, and Enterprise), and each edition has two runtimes: 32-bit or 64-bit. | |
SP | The Service Pack, which includes cumulative updates of all the fixes and improvements from previous service packs and cumulative updates for a specific version of SQL Server. | |
Page life expectancy | The time, in seconds, that SQL Server keeps retrieved data pages in the server’s physical memory. Low values indicate that SQL Server is replacing the pages held in physical memory at a high frequency and needs more physical memory in order to perform faster. | Read more |
User Connections | A connection established between a client application and a database server using SQL credentials is considered a single user connection. A large number may indicate a load on the system, a fault, or a security error. | Read more |
Connection reuse/sec | The total number of logins started from the connection pool per second. Apps tend to open and close connections repeatedly – this value indicates the amount of the connections’ reuse. | |
Batch requests/sec | The number of updates, retrievals, deletions, or saving operations in the SQL per second. This metric enables the user to detect abnormalities in the operations amount on the SQL server. | Read more |
Buffer cache hit ratio | The percentage of memory requests that are satisfied from the cache (physical memory of the SQL server). Values below 90% indicate multiple reads/writes from/to the main memory or disk storage. You should investigate whether there is a high physical memory consumption by different programs or processes and consider the need to add physical memory to the SQL server. | Read more |
Page reads/sec | The number of page reads (each page is 8 KB) from the disk per second. Many reads indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
Page writes/sec | The number of page writes (each page is 8 KB) to the disk per second. Many writes indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
SP Compilation | The number of times per second that SQL Server compiles the execution plans of queries. A large number of compilations along with a small number of Batch requests indicates heavy use of direct queries and sp_executesql, and few procedures with defined parameters. | |
SP Re Compilation | The number of times per second that SQL Server recompiles the execution plans of queries. A large number of recompilations, combined with a small number of Batch requests, indicates that the amount of data retrieved has grown, statistics have been updated, or indexes have been rebuilt. We should investigate the amount of data and whether these other operations have been performed. | |
Page Lookups | The number of times SQL Server looks up pages (each page is 8 KB) in physical memory. A ratio of (Page lookups/sec) / (Batch requests/sec) greater than 100 indicates that some queries are not running optimally. | |
Latches Times | The duration in seconds for which a thread holds exclusive access (a “latch”) to a shared resource (for example, a “latched table”). A high number of latches slows data retrieval from the latched tables. We should investigate changing the Update or Deletion method. | |
Page Splits/sec | The number of page splits per second that occur when an index page does not have enough space for new data. A rate higher than 20 per second calls for a review of the index specifications. | |
Checkpoint Pages/sec | The data pages that are written per second to disk during a checkpoint operation. A checkpoint is a process in which the SQL Server ensures that all modified data pages in memory are flushed and written to disk to maintain data consistency and durability. | |
DB IO/sec | The amount of reads and writes of the entire database per second | |
Target Memory | The target RAM memory limit that the SQL Server is allowed to consume and utilize for its internal operations. | |
Memory | The amount of memory SQL Server is utilizing in MB. If SQL is not using the maximum memory amount specified, we should consider lowering this amount. | |
Memory Details | A pie chart showing the division of the SQL Server’s physical memory usage into database memory, internal needs, and free memory, in MB. | |
DB Memory | The memory used by the SQL Server instance to cache data and other objects related to specific databases. | |
Free Memory | The amount of physical memory not utilized by SQL Server in MB. A high value may indicate that the assigned memory to the SQL Server can be reduced. | |
Internal Memory | The amount of physical memory which the SQL Server is utilizing for internal operations, not including operations for the database, in MB. For example: buffer pool, execution plans, system tables, procedures cache, and management. | Read more |
Memory (min) | The minimum amount of assigned physical memory which the SQL can use in MB. | |
Memory(max) | The maximum amount of assigned physical memory which the SQL can use in MB. | |
Temp table creation/sec | Amount of temp table creations per second. | |
Uptime | How long the SQL Server instance has been up and running. It’s recommended to have as high an uptime value as possible on high-traffic instances. | |
Cluster active name | The name of the active cluster in clustered instances. | |
Cluster nodes down | The amount of cluster nodes that are down. The server may be one of the cluster nodes. | |
Transactions/sec | Number of open transactions per second. Values higher than usual can cause system slowness. | Read more |
Lazy writes/sec | Measures the process of flushing modified data pages from memory (buffer cache) to disk (data files) per second. By deferring the immediate disk writes and batching them together, lazy writes help improve system performance. It reduces the frequency of disk I/O operations, minimizes disk access latency, and allows the system to perform multiple writes in a more efficient manner. A high value indicates that the SQL server needs more memory and can affect other OS resources, such as disk IO and CPU usage. | |
Index Full scan/sec | The number of full index scans per second. A full index scan is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has a “NOT NULL” constraint. | |
Index page splits/sec | The number of index page splits per second, affected by fragmentation. A page split occurs when a page has no free space for a value being inserted or updated, so the page is split to free space for the command to complete. | |
Logins/sec | The number of logins to the SQL server per second. A high value may point to security or application problems. | |
Logouts/sec | The number of logouts from the SQL server per second. A high value may point to connection problems. | |
Core available | The total number of cores in the server. | |
Core in use | The number of cores that are assigned for SQL Server use. The recommendation is that the SQL Server will use all cores. | |
Session Memory wait | The number of sessions that are waiting for free memory. Those queries don’t have enough RAM memory to start running, so they are “delayed.” | |
Create temp table/ variables | The number of created temporary tables/ variables available. A high value can indicate unnecessary open connections. | |
TempDB free space | Tempdb database unused data space in KB. A high value may indicate unusual data growth. | |
Session avg. wait for signal | The average time, in milliseconds, that SQL Server reports sessions spending in a wait. The threshold is based on past activity and behavior. When the value is higher than average, it can slow SQL Server performance. | Read more |
Session CPU wait | The number of queries that SQL Server reports as waiting for CPU availability. The threshold is based on past activity and behavior. When the value is higher than average, it’s recommended to investigate and look for those queries. | Read more |
Currently Active | The number of queries that are currently running (status is running; for Oracle, status is active) | |
Currently Blocked | The number of queries that are currently blocked (status is suspended) | Read more |
Currently Sleeping | The number of queries that are currently sleeping. A query that has been executed, and its results have been returned to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Currently Background | The number of queries that are running in the background. Background execution is separate from the main execution flow of a program or application, which helps prevent blocking. | |
Currently Open Transactions | The number of queries that have open transactions at the moment. | Read more |
Currently Killed | The number of queries that were killed in the last minute. | Read more |
Currently Avg Duration/sec | Measures the average time taken to execute a single query or command in seconds. | |
Number of Queries 0-9.99 | A count of queries running up to 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries running between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries running between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries running between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries running over 60 seconds. | Read more |
Subscriber High latency | The time it takes for changes made at the publisher to be replicated to the subscriber. | Read more |
Distributor High latency | The time it takes for transactional changes generated at the publisher to be delivered to the distributor for further replication processing. | Read more |
LogReader High latency | The time it takes for the LogReader agent to read the transaction log from the publisher and deliver the changes to the distributor for replication. | Read more |
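Several rows in the table above quote numeric rules of thumb: Buffer cache hit ratio below 90%, a Page lookups to Batch requests ratio above 100, and Page Splits/sec above 20. The sketch below gathers them into one check. The thresholds are the ones stated in the table; the function name, parameter names, and message texts are hypothetical, and this is not a description of how the platform evaluates alerts.

```python
from typing import List


def sql_server_flags(buffer_cache_hit_pct: float,
                     page_lookups_per_sec: float,
                     batch_requests_per_sec: float,
                     page_splits_per_sec: float) -> List[str]:
    """Return the rule-of-thumb warnings that apply to one sample."""
    flags: List[str] = []
    if buffer_cache_hit_pct < 90.0:
        flags.append("buffer cache hit ratio below 90%")
    # Skip the ratio when there are no batch requests, to avoid dividing by zero.
    if batch_requests_per_sec > 0:
        if page_lookups_per_sec / batch_requests_per_sec > 100.0:
            flags.append("page lookups per batch request above 100")
    if page_splits_per_sec > 20.0:
        flags.append("page splits above 20 per second")
    return flags


sample_flags = sql_server_flags(
    buffer_cache_hit_pct=85.0,
    page_lookups_per_sec=60000.0,
    batch_requests_per_sec=500.0,
    page_splits_per_sec=25.0,
)
```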
— Oracle on Windows
Metric | Description | Investigate this alert |
Database | The Database name. | |
Edition | The Database edition. | |
32/64 | The Database runtime – 32bit or 64bit. | |
Version | The Database version. | |
Log Mode | Refers to how the Database redo logs are managed, used for data integrity or recovery in case of disaster. There are several modes: ● ARCHIVELOG – allows creating backups that capture changes made to the database since the last backup ● NOARCHIVELOG – limits the ability to perform point-in-time recovery, since archived redo logs are not available ● FORCE LOGGING – all data changes made to the database are logged to the redo log files, even for operations that would not typically generate redo logs | |
National Language (NLS) | Refers to a set of features and settings that allow Oracle Database to handle multiple languages. | |
Patch Level | The version number of the Oracle software and the cumulative updates and releases. | |
Last Restart | Last restart in the format date:hour:minute | |
Test Connection | A check of the time to establish a connection to the Database in milliseconds. A high value indicates that there are network communication problems or a load on the Oracle Database. | Read more |
Session Limit | The number of user sessions currently connected to the database, as a percentage of the maximum sessions allowed. | |
Session (Max) | The maximum number of concurrent user sessions allowed to connect to the database. Each user session represents a connection with the database. | |
Processes (Max) | The maximum number of concurrent user processes allowed to connect to the database. | |
Default block size | The standard size of a data block used for storing data and managing database objects within the database’s data files. As of Oracle Database 12c, the default block size is typically 8192 bytes (8 KB) for a general-purpose database. | |
Open Cursors (max) | The maximum number of open cursors allowed in the database. A cursor is a programmatic handle or pointer used by the database to access or process the results of queries or DML statements. It is essential for developers to explicitly close cursors when they are no longer needed. | |
DR Last Sync Date | The last synchronization date and time for a Data Guard configuration (a high-availability and disaster recovery solution that keeps standby databases synchronized with the primary database). | Read more |
Physical Reads | The number of reads across the entire database, measured in blocks as defined in “Default block size” (default is 8 KB). | |
Physical Writes | The number of writes across the entire database, measured in blocks as defined in “Default block size” (default is 8 KB). | |
DR Full Backup | The last date and time of a full backup taken from the primary database and used to initialize or restore a standby database in a Data Guard configuration. | Read more |
Archive Log Backup | The last date and time of a backup operation that specifically targets the archived redo logs. | Read more |
CTL SP File Backup | The last date and time of a backup of the “control file and server parameter (SP) file.” It helps to ensure the recoverability of the database in case of disasters, media failures, or user errors. | Read more |
Avg. Threads | The average amount of concurrent executions of multiple tasks or processes, such as: Operating System threads, Java threads, Database sessions, or parallel query executions. Calculated as the amount of active sessions / CPU cores (in percentage). | |
Database CPU Time | The percentage of CPU time remaining for the execution of SQL statements by the Oracle database processes and other database-related operations. Higher values mean fewer waits for CPU, which improves performance. Best practices include minimizing scans and waits in query execution. | |
Buffer Cache Hit | The percentage of requested data blocks found in the database buffer cache, thereby avoiding the need to read the block from disk. | Read more |
PGA Cache Hit | The percentage of times process data requests are found in the Program Global Area (PGA) cache allocation, without a need for additional memory or reads from disk. The higher this value, the more efficient the database is. | Read more |
Deadlocks | The amount of deadlocks in the database server. | Read more |
Invalid Object | The amount of database objects that are currently in an invalid state. | |
Redo Entries (rows update) | The amount of records that capture changes made to the database, related to redo log. When this value is higher than usual, it may indicate a possible cause for slowness in performance. | |
Query Execute | The amount of queries executed at the moment. | |
Avg. Sessions | The average amount of sessions at the moment. | |
Avg. Active | The average amount of active sessions at the moment. | |
Avg. Blocking | The average amount of blocking sessions at the moment. | Read more |
Avg. Sleeping Blocking | The average amount of both sleeping and blocking sessions at the moment. | |
Avg. Blocked | The average amount of blocked sessions at the moment. | Read more |
Avg. Open Transaction | The average amount of open transactions at the moment. | Read more |
Avg. Sleeping | The average amount of sleeping sessions at the moment. A query that has been executed, and its result has been returned to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Avg. Background | The average amount of background sessions at the moment. Background execution is separate from the main execution flow of a program or application, which helps prevent blocking. | |
Avg. Duration | The average duration, in seconds, of all queries currently running in different sessions. | |
Number of Queries 0-9.99 | A count of queries running up to 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries running between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries running between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries running between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries running over 60 seconds. | Read more |
Archive Logs Retention | The retention period for archived redo log files. | |
Log Switch | The count of completed log file switches. A log switch occurs when the database stops writing redo log entries to one redo log group (also known as a redo log file) and begins writing to another. |
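Two of the Oracle rows above are defined as ratios: Session Limit (connected sessions out of Session (Max), as a percentage) and Avg. Threads (active sessions divided by CPU cores, as a percentage). A minimal sketch of both calculations follows; the function names are hypothetical, and the inputs are assumed to be values already collected elsewhere.

```python
def session_limit_pct(current_sessions: int, max_sessions: int) -> float:
    """Connected sessions as a percentage of the maximum sessions allowed."""
    if max_sessions <= 0:
        raise ValueError("max_sessions must be positive")
    return 100.0 * current_sessions / max_sessions


def avg_threads_pct(active_sessions: float, cpu_cores: int) -> float:
    """Active sessions divided by CPU cores, expressed as a percentage."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return 100.0 * active_sessions / cpu_cores


limit = session_limit_pct(current_sessions=150, max_sessions=600)
threads = avg_threads_pct(active_sessions=12, cpu_cores=8)
```

Under this formula, an Avg. Threads value above 100 means there are more active sessions than CPU cores.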
— DB on Windows MSSQL
Metric | Description | Investigate this alert |
Status | Database status: ● Online – the database is available. ● Offline – the database is not in use. ● Mirror Disconnect – the mirroring sync is disconnected. ● Mirror Principal – the database is the principal, the source of all updates in the mirroring session. ● Mirror – the database is synchronized. ● Restoring – the database is currently being restored. ● Suspect – the database is defective. | Read more |
Instance | The SQL Server instance name given during installation. | |
Database | The database name. | |
Recovery | The recovery model, which determines the restore options available for the database. It defines how the database transaction logs are managed and which data can be recovered in case of a failure. | |
Full Backup | The date of the last full backup performed on the database. Full backups are stored as .bak files or snapshots. A full backup once a day is the general recommendation. | Read more |
Log Backup | The date of the last transaction log backup performed on the database. Log backups are stored as .trn files. A log backup once an hour is the general recommendation, but if the recovery model is “simple” this value should be null. | Read more |
Diff Backup | The date of the last differential backup performed on the database. In general, a differential backup once a day is recommended, but this depends on the full backup frequency. | Read more |
Memory | The amount of physical memory the database occupies, in MB. | |
Size | A pie chart of the distribution of data and log file sizes on the disk storage, measured in MB. It is not recommended that the log take up more than 60% of the database size. If it does, investigate the processes working with this database, such as transactions (including recursive ones) and backups. Bandwidth relates to the speed at which data can be transmitted between devices. A higher bandwidth allows for faster and more extensive data transfer. | |
Disk IO/sec | The number of disk reads and writes per second. Usually, the main or largest databases have a high value. Values higher than usual can indicate a performance problem in which some queries cause other queries to wait for free IO. | |
Data Growth | The rate of information growth in the database on the disk storage in MB, which includes all the filegroups that contain the primary data file (.mdf). A lack of space in the disk storage may indicate substantial data growth in the database. | Read more |
Log Growth | The rate of log growth in the database on the disk storage in MB. A lack of space in the disk storage may indicate substantial log growth in the database. | Read more |
In-Memory | Known as In-Memory OLTP, a feature in SQL Server that uses memory-resident tables and natively compiled stored procedures for better performance in specific transactions. | |
Unused data space | Free space in the database: space that is allocated to the data files but not in use. A value higher than 50% indicates that a shrink should be considered. Keeping at least 10% of the database space unused is recommended, to leave room for indexes and other growth. | |
Collation | The language and the manner of string comparison defined for the database. | |
Page Verify | Page Verify is a database option that defines the SQL Server mechanism of verifying page consistency when the page is written to disk and when it is read again from disk. The recommendation is CHECKSUM. | |
DBCC last success | The date of the last successful database check. The check verifies the integrity of the database, its tables, indexes, schema, etc., on both a physical level and a logical level. Running it on a daily basis is very important for the proper functioning of the organization’s databases. | |
Compatibility | The compatibility level set for the database, that is, the SQL Server version whose behavior the engine follows for this database. | |
Transactions | The number of transaction operations (UPDATE, INSERT, DELETE, BEGIN TRAN) executed per second. A high value (above average) may be the reason for slowness or log growth issues. | |
Log Flush | The time it takes to write the log held in physical memory to the disk storage. High values slow down transaction operations, updates, and writes to SQL Server. | |
File stream Growth | The storage volume used by FILESTREAM data. FILESTREAM enables the storage of large amounts of data (more than 2 GB), such as large documents, images, or files. High values may cause storage problems on the data drive. | |
File stream Drive | The drive where the FILESTREAM data is stored. | |
IO | The number of read and write operations on the disk storage at the sampled time. A high value can cause slowness as a result of load on the disk storage. | |
Log size | The size assigned to the log files of the database in MB. | |
Log Use | The amount of log space in use, in MB. | |
Log Flush | The process of writing the contents of the transaction log buffer to the physical transaction log file on the disk, measured in milliseconds. Longer flush times may increase the chance of data loss. | |
Log Reuse Wait | The reason the transaction log of a database is unable to truncate or reuse log space. NOTHING is the healthy value. REPLICATION appears for a database that takes part in replication. | |
Creation Date | The Database creation date. | |
Data Files | The number of .mdf files the database contains (filegroups). | |
Log Files | The number of .ldf files the database contains. | |
Disk Log IO/sec | The number of log read and write operations on the disk storage per second. | |
Data Read IO/sec | The number of reads from the disk storage per second. | |
Data Write IO/sec | The number of writes from the disk storage per second. | |
Open transactions | The number of open transactions per second. A high number of open transactions can cause excessive log growth. | Read more |
Log transactions | The amount of log, in MB, held while a transaction is open. Log growth can increase when transactions do not complete and release their log space while running. | |
Transaction Duration | The duration of the transaction. | |
AlwaysOn State | The database’s AlwaysOn state: synchronized or not synchronized. Not synchronized means that there is a problem with AlwaysOn. | Read more |
AlwaysOn Status | The AlwaysOn status: healthy or not healthy. Not healthy means that there is a problem with AlwaysOn. | Read more |
AlwaysOn graph | If AlwaysOn is active, the value is 1. If it is not active, the value is 0. | |
AlwaysOn Log records not committed at Secondary | The amount of AlwaysOn log records that have not yet been committed from the primary to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn Log records waiting to send to Secondary | The amount of AlwaysOn log records waiting to be sent from the primary server to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn is Primary | 0 for secondary server databases, 1 for primary server databases. | |
AlwaysOn group name | The group that a database in AlwaysOn belongs to. The group can hold several databases in the Enterprise edition. | |
Mirror Status | The database’s mirror status: synchronized or not synchronized. Not synchronized means that there is a problem with the mirror. | Read more |
Mirror Status Graph | If mirroring is active, the value is 1. If not, the value is 0. | |
Mirror Mode | The database’s mirror mode: “principal” on the primary server and “mirror” on the secondary server. | Read more |
Mirror Log records not committed at Secondary | Amount of the Mirror logs that couldn’t be committed yet from the primary to the secondary server. | Read more |
Mirror Log records waiting to send to Secondary | Amount of the Mirror logs waiting to be sent to the secondary server (from the primary server). | |
Data Drive | The drive where the data files of the database are located. It is recommended to keep the data, log, and tempdb files on separate drives. | |
Log Drive | The drive where the log files of the database are located. It is recommended to keep the data, log, and tempdb files on separate drives. |
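Several rows in the table above state rules of thumb: the log should not exceed 60% of the database size, unused data space should stay between 10% and 50%, a full backup is recommended once a day, and a log backup once an hour unless the recovery model is simple. The sketch below turns those rules into checks. It is illustrative only: the function names are hypothetical, and treating "database size" as data size plus log size is an assumption.

```python
from datetime import datetime, timedelta


def log_share_percent(data_mb, log_mb):
    """Log size as a percentage of (data + log); assumes that sum is the database size."""
    total = data_mb + log_mb
    return 0.0 if total == 0 else 100.0 * log_mb / total


def size_warnings(data_mb, log_mb, unused_percent):
    """Apply the Size and Unused data space recommendations from the table."""
    warnings = []
    if log_share_percent(data_mb, log_mb) > 60:
        warnings.append("log exceeds 60% of database size")
    if unused_percent > 50:
        warnings.append("unused data space above 50%: consider a shrink")
    elif unused_percent < 10:
        warnings.append("unused data space below 10%")
    return warnings


def backup_warnings(now, last_full, last_log, recovery_model):
    """Apply the Full Backup and Log Backup recommendations from the table."""
    warnings = []
    if last_full is None or now - last_full > timedelta(days=1):
        warnings.append("full backup older than one day")
    if recovery_model.lower() == "simple":
        # Under the simple recovery model the log backup value should be null.
        if last_log is not None:
            warnings.append("log backup present under simple recovery")
    elif last_log is None or now - last_log > timedelta(hours=1):
        warnings.append("log backup older than one hour")
    return warnings
```

The thresholds are the general recommendations quoted in the table; an environment with a different backup schedule would pass its own limits instead.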
— Wait Stats on Windows MSSQL
Metric | Description | Investigate this alert |
Wait Type | The SQL Server wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there is a wait for the specific resource, which can be caused by one or more delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. |
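As an illustration of the Wait (%) column, the sketch below computes each wait type's share of the total wait time. It assumes the percentage is taken over the sum of all sampled waits, which the table implies but does not spell out; the function name is hypothetical.

```python
def wait_percentages(wait_ms_by_type):
    """Map each wait type to its share (in percent) of the total wait time."""
    total_ms = sum(wait_ms_by_type.values())
    if total_ms == 0:
        # No waits were recorded, so every share is zero.
        return {name: 0.0 for name in wait_ms_by_type}
    return {name: 100.0 * ms / total_ms for name, ms in wait_ms_by_type.items()}
```

For example, a sample with 300 ms of WRITELOG and 100 ms of CXPACKET would report WRITELOG as three quarters of the total wait time.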
Wait Type | Description |
Azure Limit | Waits commonly seen in Azure SQL Database environments, particularly in scenarios where resources are limited or throttling is in place to manage log generation rates. Common wait types included: LOG_RATE_GOVERNOR. |
Change Tracking | Waits related to modifications on tables where Change Tracking has been enabled. Common wait types included: COMMIT_TABLE. |
CPU | Waits related to a situation where a query or process waits for CPU resources to become available. Common wait types included: SOS_SCHEDULER_YIELD. |
DB Log | Waits related to the logging operations that SQL Server performs to ensure transaction durability (ACID compliance). Common wait types included: WRITELOG, LOGMGR_FLUSH, LOGBUFFER. |
DB Maintenance | Waits related to backup operations performed as a maintenance task. Common wait types included: BACKUPIO, BACKUPTHREAD. |
DB Mirror/Snapshot | Waits related to mirror or snapshot operations. Common wait types included: FCB_REPLICA_READ. |
Full Text | Happens when SQL Server is performing full-text indexing operations, such as a query involving a full-text search or updates to full-text indexes. Common wait types included: FT_IFTS_RWLOCK. |
I/O Complete | Waits related to disk I/O operations caused by disk bottlenecks or slow storage subsystems. Common wait types included: PAGEIOLATCH_EX, PAGEIOLATCH_SH, PAGELATCH_EX. |
Latch | Lightweight synchronization waits for in-memory data structures caused by buffer pool contention or heavy I/O operations. Common wait types included: LATCH_EX, LATCH_SH. |
Lock | Waits caused by contention over database objects – high transaction concurrency or blocking. Common wait types included: LCK_M_IX, LCK_M_S, LCK_M_IS. |
Low Memory | Specific waits triggered by low memory conditions during query compilation or execution. |
Memory | Waits involving memory allocation or resource semaphore management. Common wait types included: MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, RESOURCE_SEMAPHORE. |
Network I/O | Waits involving network communication for sending/receiving data. Common wait type included: ASYNC_NETWORK_IO. |
Parallelism | Waits caused by query parallelism, where multiple threads synchronize tasks. Common wait types included: CXPACKET, CXSYNC_PORT, EXECSYNC, CXCONSUMER. |
Remote Transaction | Waits involving distributed transactions across servers or slow network communication. Common wait types included: OLEDB, DTC. |
SQL CLR/Extended SP | Waits related to the SQL CLR (Common Language Runtime) or extended stored procedures. |
SQL Internal | Internal system waits used by SQL Server for background or maintenance tasks. Common wait types included: MISCELLANEOUS, SLEEP_TASK. |
SQL Trace | Waits related to SQL Server tracing mechanisms, such as Extended Events or SQL Profiler. Common wait types included: SQLTRACE_FILE_BUFFER. |
Thread Pool | Occurs when SQL Server cannot allocate worker threads to execute queries due to thread pool exhaustion. |
Transaction | Waits associated with transaction management, such as locks, log writing, or dependency resolution. Common wait types included: TRANSACTION_MUTEX. |
XTP In-Memory | These waits occur during operations involving memory-optimized tables and natively compiled stored procedures. Common wait types included: WAIT_XTP_RECOVERY. |
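The category table above can be expressed as a lookup from wait type name to category. The sketch below contains only the wait types listed in the table as common examples, so it is not a complete classification; the `categorize` function and the "Uncategorized" fallback label are hypothetical.

```python
# Wait types per category, copied from the examples in the table above.
WAIT_CATEGORIES = {
    "Azure Limit": ["LOG_RATE_GOVERNOR"],
    "Change Tracking": ["COMMIT_TABLE"],
    "CPU": ["SOS_SCHEDULER_YIELD"],
    "DB Log": ["WRITELOG", "LOGMGR_FLUSH", "LOGBUFFER"],
    "DB Maintenance": ["BACKUPIO", "BACKUPTHREAD"],
    "DB Mirror/Snapshot": ["FCB_REPLICA_READ"],
    "Full Text": ["FT_IFTS_RWLOCK"],
    "I/O Complete": ["PAGEIOLATCH_EX", "PAGEIOLATCH_SH", "PAGELATCH_EX"],
    "Latch": ["LATCH_EX", "LATCH_SH"],
    "Lock": ["LCK_M_IX", "LCK_M_S", "LCK_M_IS"],
    "Memory": ["MEMORY_ALLOCATION_EXT", "RESERVED_MEMORY_ALLOCATION_EXT",
               "RESOURCE_SEMAPHORE"],
    "Network I/O": ["ASYNC_NETWORK_IO"],
    "Parallelism": ["CXPACKET", "CXSYNC_PORT", "EXECSYNC", "CXCONSUMER"],
    "Remote Transaction": ["OLEDB", "DTC"],
    "SQL Internal": ["MISCELLANEOUS", "SLEEP_TASK"],
    "SQL Trace": ["SQLTRACE_FILE_BUFFER"],
    "Transaction": ["TRANSACTION_MUTEX"],
    "XTP In-Memory": ["WAIT_XTP_RECOVERY"],
}

# Invert the table once so each lookup is a single dictionary access.
CATEGORY_BY_WAIT = {
    wait: category
    for category, waits in WAIT_CATEGORIES.items()
    for wait in waits
}


def categorize(wait_type):
    """Return the category for a wait type, or a fallback label if it is not listed."""
    return CATEGORY_BY_WAIT.get(wait_type.upper(), "Uncategorized")
```

Categories that the table describes without example wait types (Low Memory, SQL CLR/Extended SP, Thread Pool) are left out because the source does not name their members.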
— Wait Stats on Windows Oracle
Metric | Description | Investigate this alert |
Wait Type | The Oracle Database wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there is a wait for the specific resource, which can be caused by one or more delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. |
— App Pools
Metric | Description | Investigate this alert |
App Pool | The application pool name. | |
State | The application pool status: running or stopped. | |
Total App Recycles | The number of application pool recycles. A nonzero value without a related scheduled task indicates a problem. | |
Uptime | The application uptime in days. This value should be as high as possible. | |
Process | The App Pool’s process name according to a specific ID. Shown only if there is consumption of CPU, memory, or page file. | |
CPU | The processor usage (%) of the app pool. Shown only above a defined minimum value that is considered significant. | |
Memory | The RAM consumption of the app pool in MB. Shown only above a defined minimum value that is considered significant. | |
Page files | The paging consumption of the app pool in MB. Shown only above a defined minimum value that is considered significant. |
— Web Sites
Metric | Description | Investigate this alert |
Web Site | The website name. | |
App Pool | The application pool name of the website. | |
Current Connections | The number of connections to the website at the moment. A high value indicates that there are probably unnecessary open connections. | |
Get Request/Sec | The number of GET requests per second to the website at the moment. A high value indicates higher activity, which may cause slowness. | |
Post Requests/sec | The number of POST requests per second to the website at the moment. A high value indicates higher activity, which may cause slowness. | |
Bytes Received/Sec | The number of bytes received per second. A high value indicates higher activity, which may cause slowness. | |
Bytes Sent/Sec | The number of bytes sent per second. A high value indicates higher activity, which may cause slowness. | |
Deleted Requests/Sec | The number of deleted requests per second. If not 0, it indicates an error status. | |
Files received/sec | The number of files received per second. | |
Files sent/sec | The number of files sent per second. | |
Not found Errors/sec | The number of “page not found” (404) errors per second. Indicates a problem reaching the URL page. | |
Put Requests/Sec | The number of PUT requests per second. A high value indicates higher activity, which may cause slowness. |
-Linux
— Hosts
Metric | Description | Investigate this alert |
CPU Usage | The overall percentage of time the CPU spends executing non-idle tasks. High CPU utilization may indicate that the CPU is under heavy load and could be a bottleneck in system performance. | Read more |
Total memory | The total amount of RAM (Random Access Memory) in GB. It is a volatile memory that provides temporary storage for data and instructions that the CPU needs to access quickly. | Read more |
Memory free | The amount of system memory (RAM) in GB that is currently not used by any active processes or applications. It represents the memory portion readily available for immediate allocation and usage by the operating system or any newly launched programs. | Read more |
Memory free % (percentage) | The percentage level of the system’s free memory in comparison to the total memory. | Read more |
Ping Lost Packets (0-12) | The amount of unsuccessful communication integrity checks out of 12 attempts. | Read more |
Buffer Cache | A portion of the system memory used to cache data from disk storage devices. The buffer cache operates as a dynamic pool of memory that grows or shrinks based on the system’s demand for caching data. | |
Swap Cache | A cache for swapped-out pages that allows the kernel to quickly access and retrieve frequently accessed pages from memory without needing to perform disk I/O. | |
Total Swap | The total amount of swap space available on the system that functions as a designated area on the disk used by the Operating System as virtual memory when the physical memory is fully utilized. | |
Used Swap | The amount of used swap space. Low values mean the machine has free physical memory. | |
Free Swap | The amount of available swap space out of the total swap. High values mean free physical memory. |
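The percentage and swap rows above are simple ratios of the other values in the table. A minimal sketch of those calculations follows. The function names are hypothetical, and the assumptions are that free swap equals total swap minus used swap and that the ping metric counts losses out of 12 attempts, as stated in the table.

```python
def memory_free_percent(free_gb, total_gb):
    """Memory free % = free memory as a percentage of total memory."""
    if total_gb <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * free_gb / total_gb


def ping_loss_percent(lost_packets, attempts=12):
    """Share of failed ping checks; the table samples 12 attempts."""
    if not 0 <= lost_packets <= attempts:
        raise ValueError("lost packets must be between 0 and the attempt count")
    return 100.0 * lost_packets / attempts


def free_swap(total_swap, used_swap):
    """Free swap is whatever part of the total swap space is not in use."""
    if used_swap > total_swap:
        raise ValueError("used swap cannot exceed total swap")
    return total_swap - used_swap
```

A host with 4 GB free out of 16 GB has 25% free memory, and 3 lost pings out of 12 is a 25% loss rate.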
— Network
Metric | Description | Investigate this alert |
Card Name | The sampled network card name | |
Model | The name and model number of a specific network interface card (NIC). | |
Receive Kbyte (sec) | The amount of data received through the network card in KB per second (KB/s). High values indicate that the server is receiving large amounts of data, which can be the cause of system slowness. | |
Send Kbyte (sec) | The amount of data sent through the network card in KB per second (KB/s). High values indicate that the server is sending large amounts of data, which can be the cause of system slowness. |
— Disk
Metric | Description | Investigate this alert |
Drive | The drive name. | |
File System | A structured collection of files on a disk drive or a partition, which is a segment of the storage containing some specific data. | |
Mounted | The directory (e.g. /usr/local) in the file system hierarchy through which the disk content is accessed. | |
Total | The total disk storage capacity (in GB). | |
Disk Usage (GB) | The disk storage usage in GB. Usage higher than 95% of the storage space can lead to loss of information and compromise the integrity of programs and processes in the system. | Read more |
Disk Free | The free disk storage space in GB. Low free storage space can lead to loss of information and compromise the integrity of programs and processes in the system. | Read more |
IO Write(sec) | The amount of writing to the disk per second. If the writing amount is high, the system will respond slowly. | |
IO Read(sec) | The amount of reading from the disk per second. If the amount of reading is high, the system response may be slow. | |
Utilization % (%) | The percentage of the disk storage space that is in use. A high percentage indicates that the disk storage capacity has almost reached its limit. It is recommended to look for the processes or files that consume most of the storage. | Read more |
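Utilization % relates Disk Usage to Total, and the table flags usage above 95% as risky. The sketch below shows that calculation. The function names are hypothetical, and the use of Python's standard `shutil.disk_usage` to read live values is only one possible way to sample a mount point, not a description of how the platform collects data.

```python
import shutil


def utilization_percent(used_gb, total_gb):
    """Used space as a percentage of total disk capacity."""
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * used_gb / total_gb


def is_over_threshold(used_gb, total_gb, threshold_percent=95.0):
    """True when usage is strictly higher than the threshold (95% in the table)."""
    return utilization_percent(used_gb, total_gb) > threshold_percent


def current_utilization_percent(mount_point="/"):
    """Sample a mounted file system with shutil.disk_usage (values are in bytes)."""
    usage = shutil.disk_usage(mount_point)
    return utilization_percent(usage.used, usage.total)
```

The ratio is unit-independent, so the same helper works for GB values from the table and for the byte counts returned by `shutil.disk_usage`.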
— Swaps
Metric | Description | Investigate this alert |
FileName | The swap file name. | |
Type | One of the two main types of swap storage: swap partitions and swap files. A swap file is a regular file on the file system that is used as virtual memory when the physical memory is full. A swap partition is a dedicated section of a hard disk or solid-state drive (SSD) that is reserved separately for use as swap space. | |
Total Size | The total amount of available storage for the swap file. | |
Used | The storage used for the swap file. |
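On Linux, the same four fields (file name, type, total size, used) are exposed by the kernel in `/proc/swaps`, with sizes reported in KB. The sketch below parses that format. It is an assumption that this table corresponds to `/proc/swaps`; the source does not say where the values come from, and the `SwapEntry` and `parse_proc_swaps` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SwapEntry:
    filename: str
    type: str       # "partition" or "file"
    total_kb: int
    used_kb: int


def parse_proc_swaps(text):
    """Parse the contents of /proc/swaps into a list of SwapEntry objects."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entries.append(
            SwapEntry(fields[0], fields[1], int(fields[2]), int(fields[3]))
        )
    return entries
```

On a live host the input would come from reading the file, for example `parse_proc_swaps(open("/proc/swaps").read())`.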
— Services
Metric | Description | Investigate this alert |
Host | The host name as defined in the AimBetter Configuration | |
Service | The name of the service in the registry (service’s key) | |
Loaded | In Linux, a service can be “loaded” or “not-found”. “Loaded” refers to a service whose configuration (unit) file was found and read successfully, so it can be managed by the system’s service manager. “Not-found” refers to a configuration file or service unit that does not exist and therefore cannot be managed by the system’s service manager. | |
IsActive | Indicates whether the service is running and available for use or not. | |
Status | Indicates the service’s current state or condition. |
— Process
Metric | Description | Investigate this alert |
Host | The host name as defined in the AimBetter Configuration | |
CPU | The processor usage (percentage) by the process. High values can lead to system slowness. | Read more |
Process ID | A number identifying the process in the system. | |
Command Line | The command that started the process, including the executable file and its parameters. | |
Physical Mem. | The actual hardware memory used by the process. | |
Virtual Mem. | The process usage of memory beyond the physical memory such as disk space or swap space as an extension of the RAM memory. The general recommendation is for this value to be as low as possible. | |
User Name | The name of the user running the process. In cases where a process needs to be shut down due to high system resource consumption, it is important to know who is running the process. | |
Start | The time when the process was started. | |
Status | The process status. In Linux it is shown as a state code, for example Ss (sleeping, session leader) or Sl (sleeping, multi-threaded). | |
Terminal | The process terminal. |
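Status values such as Ss and Sl follow the state codes used by the Linux `ps` command: the first character is the process state and any following characters are modifier flags. The sketch below decodes such a code using the meanings documented in the `ps` man page. It assumes the Status column shows these `ps` codes unchanged, and the `decode_status` name is hypothetical.

```python
# Process states (first character of the code), per the Linux ps man page.
STATES = {
    "D": "uninterruptible sleep",
    "I": "idle kernel thread",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during tracing",
    "X": "dead",
    "Z": "zombie",
}

# Modifier flags (any characters after the first).
FLAGS = {
    "<": "high priority",
    "N": "low priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}


def decode_status(code):
    """Translate a ps state code such as 'Ss' into readable descriptions."""
    if not code:
        raise ValueError("empty status code")
    parts = [STATES.get(code[0], f"unknown state {code[0]!r}")]
    parts.extend(FLAGS.get(ch, f"unknown flag {ch!r}") for ch in code[1:])
    return parts
```

A code of `Ssl` therefore describes a sleeping process that is both a session leader and multi-threaded.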
— MSSQL on Linux
Metric | Description | Investigate this alert |
Version | The SQL Server version installed on the server. | |
Instance | The SQL Server instance name given during installation. | |
Test connection | A check of the time to establish a connection to the SQL server in milliseconds. A high value indicates that there are network communication problems or a load on the SQL server. | Read more |
Last Restart | The time of the last SQL Server restart. | |
Collation | In SQL Server, a collation is a set of rules that determine how data is sorted and compared, for string based operations. SQL collations allow database administrators to define the appropriate rules for sorting and comparing strings based on the specific language and cultural context of the data being stored. | |
Edition | The installed SQL Server edition. There are numerous editions (e.g. Express, Developer, Enterprise), and each edition has two runtimes: 32-bit or 64-bit. | |
SP | The Service Pack, which includes cumulative updates of all the fixes and improvements from previous service packs and cumulative updates for a specific version of SQL Server. | |
Page life expectancy | The time, in seconds, that SQL Server keeps retrieved data in the server’s physical memory. Low values indicate that SQL Server is replacing the data held in physical memory at a high frequency and needs more physical memory in order to perform faster. | Read more |
User Connections | A connection established between a client application and a database server using SQL credentials is considered a single user connection. A large number may indicate a load on the system, a fault, or a security error. | Read more |
Connection reuse/sec | The total number of logins started from the connection pool per second. Apps tend to open and close connections repeatedly – this value indicates the amount of the connections’ reuse. | |
Batch requests/sec | The number of updates, retrievals, deletions, or saving operations in the SQL per second. This metric enables the user to detect abnormalities in the operations amount on the SQL server. | Read more |
Buffer cache hit ratio | The percentage of memory requests that are satisfied from the cache (physical memory of the SQL server). Values below 90% indicate multiple reads/writes from/to the main memory or disk storage. You should investigate whether there is a high physical memory consumption by different programs or processes and consider the need to add physical memory to the SQL server. | Read more |
Page reads/sec | The number of page reads (each page is 8 KB) from the disk per second. Many reads indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
Page writes/sec | The number of page writes (each page is 8 KB) to the disk per second. Many writes indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
SP Compilation | The number of times per second that SQL Server compiles the execution plans of queries. A large number of compilations along with a small number of batch requests indicates heavy use of direct queries and sp_executesql, and little use of procedures with defined parameters. | |
SP Re Compilation | The number of times per second that SQL Server recompiles the execution plans of queries. A large number of recompilations, combined with a small number of batch requests, indicates that the amount of data retrieved has grown, statistics have been updated, or indexes have been rebuilt. We should investigate the amount of information and whether or not these other operations have been performed. | |
Page Lookups | The number of times SQL Server looks up pages (each page is 8 KB) in the physical memory. A ratio of (Page lookups/sec) / (Batch requests/sec) greater than 100 indicates that some queries are not running optimally. | |
Latches Times | The duration in seconds for which a thread holds exclusive access (“latch”) to a shared resource (for example, a “latched table”). A large number of latches causes slowness in retrieving data from the latched tables. We should investigate a change in the update or deletion method. | |
Page Splits/sec | The number of page splits per second that occur when an index page does not have enough free space for new data. A rate higher than 20 per second requires a check of the index specifications. | |
Checkpoint Pages/sec | The data pages that are written per second to disk during a checkpoint operation. A checkpoint is a process in which the SQL Server ensures that all modified data pages in memory are flushed and written to disk to maintain data consistency and durability. | |
DB IO/sec | The number of reads and writes across the entire database per second. | |
Target Memory | The target RAM memory limit that the SQL Server is allowed to consume and utilize for its internal operations. | |
Memory | The amount of memory SQL Server is utilizing in MB. If SQL Server is not using the maximum memory amount specified, we should consider lowering that maximum. | |
Memory Details | A pie chart of the division of the SQL Server’s physical memory usage between the database, internal needs, and free memory, in MB. | |
DB Memory | The memory used by the SQL Server instance to cache data and other objects related to specific databases. | |
Free Memory | The amount of physical memory not utilized by SQL Server in MB. A high value may indicate that the assigned memory to the SQL Server can be reduced. | |
Internal Memory | The amount of physical memory which the SQL Server is utilizing for internal operations, not including operations for the database, in MB. For example: buffer pool, execution plans, system tables, procedures cache, and management. | Read more |
Memory (min) | The minimum amount of assigned physical memory which the SQL can use in MB. | |
Memory(max) | The maximum amount of assigned physical memory which the SQL can use in MB. | |
Temp table creation/sec | Amount of temp table creations per second. | |
Uptime | How long the SQL Server instance has been up and running. It is recommended to keep the uptime as high as possible on high-traffic instances. | |
Cluster active name | The name of the active cluster in clustered instances. | |
Cluster nodes down | The amount of cluster nodes that are down. The server may be one of the cluster nodes. | |
Transactions/sec | The number of open transactions per second. Values higher than usual can cause system slowness. | Read more |
Lazy writes/sec | Measures the process of flushing modified data pages from memory (buffer cache) to disk (data files) per second. By deferring the immediate disk writes and batching them together, lazy writes help improve system performance. It reduces the frequency of disk I/O operations, minimizes disk access latency, and allows the system to perform multiple writes in a more efficient manner. A high value indicates that the SQL server needs more memory and can affect other OS resources, such as disk IO and CPU usage. | |
Index Full scan/sec | The number of indexes that were scanned per second. An index full scan is an alternative to a full table scan when the index contains all the columns that are needed for the query and at least one column in the index key has a “NOT NULL” constraint. | |
Index page splits/sec | The number of index page splits per second, affected by fragmentation. A page split occurs when there is no free space on the page for updating or inserting a value into the table; the split frees space so that the command can complete. | |
Logins/sec | The number of logins to the SQL server per second. A high value may point to security or application problems. | |
Logouts/sec | The number of logouts from the SQL server per second. A high value may point to connection problems. | |
Core available | The total number of cores in the server. | |
Core in use | The number of cores that are assigned for SQL Server use. The recommendation is that SQL Server use all cores. | |
Session Memory wait | The number of sessions that are waiting for free memory. Those queries don’t have enough RAM memory to start running, so they are “delayed.” | |
Create temp table/ variables | The number of created temporary tables/ variables available. A high value can indicate unnecessary open connections. | |
TempDB free space | Tempdb database unused data space in KB. A high value may indicate unusual data growth. | |
Session avg. wait for signal | The average time, in milliseconds, that SQL Server reports sessions spending in a wait. The threshold is based on past activity and behavior. When the value is higher than average, it can cause slow SQL Server performance. | Read more |
Session CPU wait | The number of queries that SQL Server reports as waiting for CPU availability. The threshold is based on past activity and behavior. When the value is higher than average, it is recommended to investigate and look for those queries. | Read more |
Currently Active | The number of queries that are currently running (status is running; for Oracle, status is active). | |
Currently Blocked | The number of queries that are currently blocked (status is suspended) | Read more |
Currently Sleeping | The number of queries that are currently sleeping. A query that has been executed, and its results have been returned to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Currently Background | The number of queries that are running in the background. Background queries run separately from the main execution flow of a program or application in order to prevent blocking. | |
Currently Open Transactions | The number of queries that have open transactions at the moment. | Read more |
Currently Killed | The number of queries that were killed in the last minute. | Read more |
Currently Avg Duration/sec | Measures the average time taken to execute a single query or command in seconds. | |
Number of Queries 0-9.99 | A count of queries that have been running for less than 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries that have been running for at least 10 but less than 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries that have been running for at least 20 but less than 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries that have been running for at least 30 but less than 60 seconds. | Read more |
Number of Queries over 60 | A count of queries that have been running for 60 seconds or more. | Read more |
Subscriber High latency | The time it takes for changes made at the publisher to be replicated to the subscriber. | Read more |
Distributor High latency | The time it takes for transactional changes generated at the publisher to be delivered to the distributor for further replication processing. | Read more |
LogReader High latency | The time it takes for the LogReader agent to read the transaction log from the publisher and deliver the changes to the distributor for replication. | Read more |
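Three rows in the table above give numeric rules of thumb: a buffer cache hit ratio below 90%, a ratio of page lookups to batch requests greater than 100, and more than 20 page splits per second. The sketch below applies those thresholds to a set of sampled counter values. The function and parameter names are hypothetical, and the thresholds are the ones quoted in the table, not tuned values.

```python
def sql_counter_warnings(page_lookups_per_sec, batch_requests_per_sec,
                         buffer_cache_hit_ratio, page_splits_per_sec):
    """Return a warning for each threshold from the table that is crossed."""
    warnings = []
    # The lookup ratio is undefined when there are no batch requests.
    if batch_requests_per_sec > 0:
        lookups_per_batch = page_lookups_per_sec / batch_requests_per_sec
        if lookups_per_batch > 100:
            warnings.append("page lookups per batch request above 100")
    if buffer_cache_hit_ratio < 90:
        warnings.append("buffer cache hit ratio below 90%")
    if page_splits_per_sec > 20:
        warnings.append("page splits above 20 per second")
    return warnings
```

For example, 50,000 page lookups per second against 400 batch requests per second gives 125 lookups per batch request, which crosses the first threshold.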
— Oracle on Linux
Metric | Description | Investigate this alert |
Database | The Database name. | |
Edition | The Database edition. | |
32/64 | The Database runtime – 32bit or 64bit. | |
Version | The Database version. | |
Log Mode | Refers to how the database manages its redo logs, which are used for data integrity and for recovery in case of a disaster. There are several modes: ● ARCHIVELOG mode: allows backups that capture the changes made to the database since the last backup ● NOARCHIVELOG mode: limits the ability to perform point-in-time recovery, since archived redo logs are not available ● FORCE LOGGING mode: all data changes made to the database are logged to the redo log files, even for operations that would not typically generate redo | |
National Language (NLS) | Refers to a set of features and settings that allow Oracle Database to handle multiple languages. | |
Patch Level | The version number of the Oracle software and the cumulative updates and releases. | |
Last Restart | Last restart in the format date:hour:minute | |
Test Connection | A check of the time to establish a connection to the Database in milliseconds. A high value indicates that there are network communication problems or a load on the Oracle Database. | Read more |
Session Limit | The number of user sessions currently connected to the database, as a percentage of the maximum sessions allowed. | |
Session (Max) | The maximum number of concurrent user sessions allowed to connect to the database. Each user session represents a connection with the database. | |
Processes (Max) | The maximum number of concurrent user processes allowed to connect to the database. | |
Default block size | The standard size of a data block used for storing data and managing database objects within the database’s data files. As of Oracle Database 12c, the default block size is typically 8192 bytes (8 KB) for a general-purpose database. | |
Open Cursors (max) | The maximum number of cursors that can be open in the database. A cursor is a programmatic handle or pointer used by the database to access or process the results of queries or DML statements. Developers should explicitly close cursors once they are no longer needed. | |
DR Last Sync Date | The last synchronization date and time for a Data Guard configuration (a high-availability and disaster recovery solution that keeps standby databases synchronized with the primary database). | Read more |
Physical Reads | The number of reads across the entire database, measured in blocks as defined in “Default block size” (the default is 8 KB). | |
Physical Writes | The number of writes across the entire database, measured in blocks as defined in “Default block size” (the default is 8 KB). | |
DR Full Backup | The last date and time of a full backup taken from the primary database and used to initialize or restore a standby database in a Data Guard configuration. | Read more |
Archive Log Backup | The last date and time of a backup operation that specifically targets the archived redo logs. | Read more |
CTL SP File Backup | The last date and time of a backup of the “control file and server parameter (SP) file.” It helps to ensure the recoverability of the database in case of disasters, media failures, or user errors. | Read more |
Avg. Threads | The average number of concurrently executing tasks or processes, such as operating system threads, Java threads, database sessions, or parallel query executions. Calculated as the number of active sessions divided by the number of CPU cores (as a percentage). | |
Database CPU Time | The percentage of CPU time that remains available for the execution of SQL statements and other database-related operations by the Oracle database processes. Higher values mean fewer waits for CPU and better performance. Best practices include minimizing scans and waits during query execution. | |
Buffer Cache Hit | The percentage of a requested data block found in the database buffer cache, thereby avoiding the need to read the block from disk. | Read more |
PGA Cache Hit | The percentage of process data requests that are satisfied from the Program Global Area (PGA) cache allocation, without the need for additional memory or a read from disk. The higher this value, the more efficient the database is. | Read more |
Deadlocks | The number of deadlocks in the database server. | Read more |
Invalid Object | The number of database objects that are currently in an invalid state. | |
Redo Entries (rows update) | The number of redo log records that capture changes made to the database. A value that is higher than usual may indicate a possible cause of slow performance. | |
Query Execute | The number of queries being executed at the moment. | |
Avg. Sessions | The average number of sessions at the moment. | |
Avg. Active | The average number of active sessions at the moment. | |
Avg. Blocking | The average number of blocking sessions at the moment. | Read more |
Avg. Sleeping Blocking | The average number of both sleeping and blocking sessions at the moment. | |
Avg. Blocked | The average number of blocked sessions at the moment. | Read more |
Avg. Open Transaction | The average number of open transactions at the moment. | Read more |
Avg. Sleeping | The average number of sleeping sessions at the moment. A sleeping session has executed a query and returned its result to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Avg. Background | The average number of background sessions at the moment. Background work runs separately from the main execution flow of a program or application in order to prevent blocking. | |
Avg. Duration | The average duration, in seconds, of all queries running at the moment in different sessions. | |
Number of Queries 0-9.99 | A count of queries that have been running for less than 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries that have been running for between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries that have been running for between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries that have been running for between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries that have been running for more than 60 seconds. | Read more |
Archive Logs Retention | The retention period for archived redo log files. | |
Log Switch | The count of completed log file switches. A log switch occurs when the database stops writing redo log entries to one redo log group (also known as a redo log file) and starts writing to another. | |
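The percentage-based Oracle metrics above ("Session Limit", "Avg. Threads") and the block-based "Physical Reads" and "Physical Writes" counters come down to simple arithmetic. The following Python sketch shows that arithmetic only; the function names are hypothetical and the way the platform collects the inputs is not covered.

```python
def session_limit_pct(current_sessions: int, max_sessions: int) -> float:
    """Connected sessions as a percentage of "Session (Max)"."""
    if max_sessions <= 0:
        raise ValueError("max_sessions must be positive")
    return 100.0 * current_sessions / max_sessions


def avg_threads_pct(active_sessions: float, cpu_cores: int) -> float:
    """Active sessions divided by CPU cores, expressed as a percentage."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return 100.0 * active_sessions / cpu_cores


def blocks_to_bytes(blocks: int, block_size_bytes: int = 8192) -> int:
    """Convert a block count to bytes using the "Default block size"."""
    return blocks * block_size_bytes
```

With 150 of 600 sessions in use, `session_limit_pct(150, 600)` gives 25.0; six active sessions on eight cores give an "Avg. Threads" value of 75.0.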
— DB on Linux MSSQL
Metric | Description | Investigate this alert |
Status | Database Status: ● Online – the database is available ● Offline – the database is not in use ● Mirror Disconnect – the sync is disconnected. ● Mirror Principal – the principal sync of all updating of the database. ● Mirror – the database is synchronized. ● Restoring – the database is currently being restored ● Suspect – the database is defective | Read more |
Instance | The SQL server instance name given in the installation. | |
Database | The Database name | |
Recovery | The recovery model determines the possible restore options specified for the database. It defines how the database transaction logs are managed and which data type can be recovered in case of a failure. | |
Full Backup | The date of the last Full Backup performed on the database. The Full Backup documents are .bak files or snapshots. A Full Backup once a day is the general recommendation. | Read more |
Log Backup | The date of the last Log changes backup performed on the database. The Log Backup documents are .trn files. A log backup once an hour is the general recommendation, but if the recovery model is “simple” this value should be null. | Read more |
Memory | The amount of memory the database is taking up in the physical memory in MB. | |
Size | A pie chart showing the distribution of data and log file sizes occupying the disk storage, measured in MB. It is not recommended that the log take up more than 60% of the database size. If it does, investigate the process integrity of this database, such as transactions (containing recursion) and backups. Bandwidth relates to the speed at which data can be transmitted between devices. A higher bandwidth allows for faster and more extensive data transfer. | |
Disk IO/sec | Amount of disk reads and writes per second. Usually, the main or biggest DBs will have a high value. Higher values than usual can indicate a performance problem of queries causing other queries to wait for free IO. | |
Data Growth | The rate of information growth in the database on the disk storage in MB, which includes all the filegroups that contain the primary data file (.mdf). A lack of space in the disk storage may indicate substantial data growth in the database. | Read more |
Log Growth | The rate of log growth in the database on the disk storage in MB. A lack of space in the disk storage may indicate substantial log growth in the database. | Read more |
In-Memory | Known as In-Memory OLTP, a feature in SQL Server that leverages memory-resident tables and natively compiled stored procedures for a better performance in specific transactions. | |
Unused data space | The free space of the DB, that is, allocated data space that is not in use. A value higher than 50% indicates that a shrink should be considered. Keeping at least 10% of the DB as unused space, for indexes and other needs, is recommended. | |
Collation | The language and the manner of string comparison defined for the database. | |
Page Verify | Page Verify is a database option that defines the SQL Server mechanism of verifying page consistency when the page is written to disk and when it is read again from disk. The recommendation is CHECKSUM. | |
DBCC last success | Last successful Database check. Checks the database’s integrity, tables, indexes, schema, etc. Running this test on a daily basis is very important for the proper functioning of the organization with the databases. The test runs on both a physical level and a logical level. | |
Compatibility | The Compiler version at the Database level. | |
Diff Backup | Date of the last differential backup performed for the Database. In general, it’s recommended once a day but it depends on the full backup frequency. | Read more |
Transactions | The number of transaction operations UPDATE, INSERT, DELETE, BEGIN TRAN executed per second. A high value (above average) may be the reason for slowness or log growth issues. | |
Log Flush | The time it takes to save the log found in the physical memory to the disk storage. High values affect Transaction operations, Update, and saving to SQL times causing slowness. | |
File stream Growth | The storage volume used by file streams. File streams enable the storage of large amounts of data (more than 2 GB), such as large documents, images, or files. High values may cause storage problems on the data drive. | |
File stream Drive | The file stream’s drive. | |
IO | The amount of read and write operations from the disk storage at the sampled time. A high value can cause slowness as a result of a load on the disk storage | |
Log size | The size assigned to the log files of the database in MB. | |
Log Use | The size of the log used in MB | |
Log Flush | The process of writing the contents of the transaction log buffer to the physical transaction log file on the disk, measured in milliseconds. Higher time may increase the chance for data loss. | |
Log Reuse Wait | A condition where the transaction log of a database is unable to truncate log space for reuse. NOTHING is a good value. REPLICATION appears for a database that participates in replication. | |
Creation Date | The Database creation date. | |
Data Files | The number of .mdf files the database contains (filegroups). | |
Log Files | The number of .ldf files the database contains. | |
Disk Log IO/sec | The number of log read and write operations on the disk storage per second. | |
Data Read IO/sec | The number of reads from the disk storage per second. | |
Data Write IO/sec | The number of writes to the disk storage per second. | |
Open transactions | The number of open transactions per second. A high number of open transactions can cause log oversize. | Read more |
Log transactions | The log amount in MB while there’s an open transaction. An increased log growth can be caused when transactions don’t clean themselves while running. | |
Transaction Duration | The duration of the transaction. | |
AlwaysOn State | The DB’s AlwaysOn state: synchronized or not synchronized. Not synchronized means that there’s a problem with the AlwaysOn. | Read more |
AlwaysOn Status | The DB’s AlwaysOn status: healthy or not healthy. Not healthy means that there’s a problem with the AlwaysOn. | Read more |
AlwaysOn graph | The value is 1 if AlwaysOn is active and 0 if it is not. | |
AlwaysOn Log records not committed at Secondary | The amount of AlwaysOn log records that have not yet been committed from the primary to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn Log records waiting to send to Secondary | The amount of AlwaysOn log records waiting to be sent from the primary server to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn is Primary | 0 for secondary server databases, 1 for primary server databases | |
AlwaysOn group name | The group that a DB in AlwaysOn belongs to. The group can hold several databases in the Enterprise edition. | |
Mirror Status | The DB’s Mirror status: synchronized or not synchronized. Not synchronized means that there’s a problem with the Mirror. | Read more |
Mirror Status Graph | The value is 1 if the Mirror is active and 0 if it is not. | |
Mirror Mode | The DB’s mirror mode on the primary server is “principal,” and on the secondary server is “mirror.” | Read more |
Mirror Log records not committed at Secondary | Amount of the Mirror logs that couldn’t be committed yet from the primary to the secondary server. | Read more |
Mirror Log records waiting to send to Secondary | Amount of the Mirror logs waiting to be sent to the secondary server (from the primary server). | |
Data Drive | The drive where the data files of the DB are located. It is recommended to keep the data, log, and tempdb files on separate drives. | |
Log Drive | The drive where the log files of the DB are located. It is recommended to keep the data, log, and tempdb files on separate drives. | |
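Two of the rules of thumb in this table ("Size": the log should not exceed 60% of the database size; "Unused data space": consider a shrink above 50% and keep at least 10%) can be expressed as a small Python sketch. The function names and returned strings are hypothetical, and the sketch assumes that "database size" means data size plus log size.

```python
def log_share_pct(data_mb: float, log_mb: float) -> float:
    """Log size as a percentage of total database size (data + log, an assumption)."""
    total_mb = data_mb + log_mb
    if total_mb <= 0:
        raise ValueError("database size must be positive")
    return 100.0 * log_mb / total_mb


def log_share_too_high(data_mb: float, log_mb: float, limit_pct: float = 60.0) -> bool:
    """True when the log takes up more than the recommended share."""
    return log_share_pct(data_mb, log_mb) > limit_pct


def unused_space_advice(unused_pct: float) -> str:
    """Map the "Unused data space" percentage to the guidance in the table."""
    if unused_pct > 50.0:
        return "consider shrink"
    if unused_pct < 10.0:
        return "low unused space"
    return "ok"
```

A database with 400 MB of data and 700 MB of log would be flagged by `log_share_too_high(400, 700)`, while `unused_space_advice(30)` returns "ok".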
— Wait Stats on Linux MSSQL
Metric | Description | Investigate this alert |
Wait Type | The SQL Server wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there’s a wait for the specific resource, which can be caused by specific delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. | |
Wait Type | Description |
Azure Limit | Waits commonly seen in Azure SQL Database environments, particularly in scenarios where resources are limited or throttling is in place to manage log generation rates. Common wait types included: LOG_RATE_GOVERNOR. |
Change Tracking | Waits related to modifications on tables where Change Tracking has been enabled. Common wait types included: COMMIT_TABLE. |
CPU | Waits related to a situation where a query or process waits for CPU resources to become available. Common wait types included: SOS_SCHEDULER_YIELD. |
DB Log | Waits related to the logging operations that SQL Server performs to ensure transaction durability (ACID compliance). Common wait types included: WRITELOG, LOGMGR_FLUSH, LOGBUFFER. |
DB Maintenance | Waits related to backup operations performed as a maintenance task. Common wait types included: BACKUPIO, BACKUPTHREAD. |
DB Mirror/Snapshot | Waits related to mirror or snapshot operations. Common wait types included: FCB_REPLICA_READ. |
Full Text | Happens when SQL Server is performing full-text indexing operations, such as a query involving a full-text search or updates to full-text indexes. Common wait types included: FT_IFTS_RWLOCK. |
I/O Complete | Waits related to disk I/O operations caused by disk bottlenecks or slow storage subsystems. Common wait types included: PAGEIOLATCH_EX, PAGEIOLATCH_SH, PAGELATCH_EX. |
Latch | Lightweight synchronization waits for in-memory data structures caused by buffer pool contention or heavy I/O operations. Common wait types included: LATCH_EX, LATCH_SH. |
Lock | Waits caused by contention over database objects – high transaction concurrency or blocking. Common wait types included: LCK_M_IX, LCK_M_S, LCK_M_IS. |
Low Memory | Specific waits triggered by low memory conditions during query compilation or execution. |
Memory | Waits involving memory allocation or resource semaphore management. Common wait types included: MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, RESOURCE_SEMAPHORE. |
Network I/O | Waits involving network communication for sending/receiving data. Common wait type included: ASYNC_NETWORK_IO. |
Parallelism | Waits caused by query parallelism, where multiple threads synchronize tasks. Common wait types included: CXPACKET, CXSYNC_PORT, EXECSYNC, CXCONSUMER. |
Remote Transaction | Waits involving distributed transactions across servers or slow network communication. Common wait types included: OLEDB, DTC. |
SQL CLR/Extended SP | Waits related to the SQL CLR (Common Language Runtime) or extended stored procedures. |
SQL Internal | Internal system waits used by SQL Server for background or maintenance tasks. Common wait types included: MISCELLANEOUS, SLEEP_TASK. |
SQL Trace | Waits related to SQL Server tracing mechanisms, such as Extended Events or SQL Profiler. Common wait types included: SQLTRACE_FILE_BUFFER. |
Thread Pool | Occurs when SQL Server cannot allocate worker threads to execute queries due to thread pool exhaustion. |
Transaction | Waits associated with transaction management, such as locks, log writing, or dependency resolution. Common wait types included: TRANSACTION_MUTEX. |
XTP In-Memory | These waits occur during operations involving memory-optimized tables and natively compiled stored procedures. Common wait types included: WAIT_XTP_RECOVERY. |
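The grouping of wait types into categories, together with the "Wait (%)" metric, can be sketched in Python as follows. The mapping contains only a few of the wait types listed above, and the function name and the "Uncategorized" fallback are assumptions for illustration rather than the platform's actual logic.

```python
# Partial mapping, copied from the category table above.
CATEGORY_BY_WAIT_TYPE = {
    "LOG_RATE_GOVERNOR": "Azure Limit",
    "SOS_SCHEDULER_YIELD": "CPU",
    "WRITELOG": "DB Log",
    "LOGBUFFER": "DB Log",
    "PAGEIOLATCH_SH": "I/O Complete",
    "LCK_M_S": "Lock",
    "RESOURCE_SEMAPHORE": "Memory",
    "ASYNC_NETWORK_IO": "Network I/O",
    "CXPACKET": "Parallelism",
}


def wait_pct_by_category(wait_ms_by_type):
    """Sum wait time per category and express it as a share of all waits."""
    totals = {}
    for wait_type, wait_ms in wait_ms_by_type.items():
        # Hypothetical fallback for wait types missing from the mapping.
        category = CATEGORY_BY_WAIT_TYPE.get(wait_type, "Uncategorized")
        totals[category] = totals.get(category, 0.0) + wait_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: 100.0 * ms / grand_total for category, ms in totals.items()}
```

Given 300 ms of WRITELOG, 100 ms of LOGBUFFER, and 600 ms of CXPACKET, the sketch attributes 40% of the wait time to "DB Log" and 60% to "Parallelism".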
— Wait Stats on Linux Oracle
Metric | Description | Investigate this alert |
Wait Type | The Oracle Database wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there’s a wait for the specific resource, which can be caused by specific delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. | |
-Azure
— Service
Metric | Description | Investigate this alert |
Cloud | The Database name given in the AimBetter configuration. | |
SKU | SKU (Stock Keeping Unit) is an identifier used to represent different service tiers and performance levels for Azure products. | |
Capacity | The capacity of the Azure SQL Database, defined by the selected service tier and the number of DTUs. | |
DTU | Database Throughput Units: a performance metric that measures resource usage (Data IO, Log Write, CPU) as a percentage. A higher value may indicate that the cloud database is overloaded by heavy activity. | Read more |
Free Storage Space | The free storage space in GB out of the allocated storage. | |
Data I/O | The amount of data that is read from or written to storage, as a percentage. | |
Log Write | The transaction log write activity, as a percentage. | |
CPU | The overall percentage of time the CPU spends executing non-idle tasks. | |
Max Worker | The percentage of worker threads that can be used (out of maximum possible) to process concurrent queries and tasks within the database engine. | Read more |
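The DTU metric combines the CPU, Data I/O, and Log Write percentages listed in this table. The Python sketch below assumes that the DTU percentage is the highest of the three, which is how Azure's DTU-based model is commonly described; whether AimBetter computes it the same way is not confirmed by this page, and the function names are hypothetical.

```python
def dtu_pct(cpu_pct: float, data_io_pct: float, log_write_pct: float) -> float:
    """Assumed DTU percentage: the most heavily used of the three resources."""
    return max(cpu_pct, data_io_pct, log_write_pct)


def dominant_resource(cpu_pct: float, data_io_pct: float, log_write_pct: float) -> str:
    """Name the resource that drives the DTU percentage (first one wins a tie)."""
    usage = {"CPU": cpu_pct, "Data I/O": data_io_pct, "Log Write": log_write_pct}
    return max(usage, key=usage.get)
```

For a sample with 35% CPU, 80.5% Data I/O, and 12% Log Write, `dtu_pct` returns 80.5 and `dominant_resource` returns "Data I/O", which points at storage throughput as the first thing to investigate.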
— MSSQL on Azure
Metric | Description | Investigate this alert |
Version | The SQL version, installed on the server | |
Instance | The SQL server instance name given in the installation. | |
Test connection | A check of the time to establish a connection to the SQL server in milliseconds. A high value indicates that there are network communication problems or a load on the SQL server. | Read more |
Last Restart | SQL server last restart | |
Collation | In SQL Server, a collation is a set of rules that determine how data is sorted and compared, for string based operations. SQL collations allow database administrators to define the appropriate rules for sorting and comparing strings based on the specific language and cultural context of the data being stored. | |
Edition | The installed SQL Server edition. There are numerous editions (for example: Express, Developer, Enterprise), and each edition has two runtimes, 32bit or 64bit. | |
SP | The Service Pack, which includes cumulative updates of all the fixes and improvements from previous service packs and cumulative updates for a specific version of SQL Server. | |
Page life expectancy | The time, in seconds, that SQL Server keeps retrieved data in the server’s physical memory. Low values indicate that SQL Server is replacing the data held in physical memory at a high frequency and needs more physical memory in order to perform faster. | Read more |
User Connections | A connection established between a client application and a database server using SQL credentials is considered a single user connection. A large number may indicate a load on the system, a fault, or a security error. | Read more |
Connection reuse/sec | The total number of logins started from the connection pool per second. Apps tend to open and close connections repeatedly – this value indicates the amount of the connections’ reuse. | |
Batch requests/sec | The number of updates, retrievals, deletions, or saving operations in the SQL per second. This metric enables the user to detect abnormalities in the operations amount on the SQL server. | Read more |
Buffer cache hit ratio | The percentage of memory requests that are satisfied from the cache (physical memory of the SQL server). Values below 90% indicate multiple reads/writes from/to the main memory or disk storage. You should investigate whether there is a high physical memory consumption by different programs or processes and consider the need to add physical memory to the SQL server. | Read more |
Page reads/sec | The number of page reads (each page is 8 KB) from the disk per second. Many reads indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
Page writes/sec | The number of page writes (each page is 8 KB) to the disk per second. Many writes indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
SP Compilation | The number of times per second that SQL Server compiles execution plans for queries. A large number of compilations along with a small number of Batch requests indicates heavy use of direct queries or sp_executesql, and few procedures with defined parameters. | |
SP Re Compilation | The number of times per second that SQL Server recompiles execution plans for queries. A large number of recompilations, combined with a small number of Batch requests, indicates that the volume of retrieved data has grown, statistics have been updated, or indexes have been rebuilt. We should investigate the amount of information and whether or not the other operations have been performed. | |
Page Lookups | The number of times SQL Server looks up pages (each page is 8 KB) in physical memory. A ratio of (Page lookups/sec) / (Batch requests/sec) greater than 100 indicates that some queries are not running optimally. | |
Latches Times | The duration in seconds for which a thread holds exclusive access (“latch”) to a shared resource (for example, a latched table). A high number of latches slows data retrieval from the latched tables. We should investigate a change in the Update or Deletion method. | |
Page Splits/sec | The number of page splits per second, which occur when an index page does not have enough free space for new data. A rate higher than 20 per second requires a check of the index specifications. | |
Checkpoint Pages/sec | The data pages that are written per second to disk during a checkpoint operation. A checkpoint is a process in which the SQL Server ensures that all modified data pages in memory are flushed and written to disk to maintain data consistency and durability. | |
DB IO/sec | The amount of reads and writes of the entire database per second | |
Target Memory | The target RAM memory limit that the SQL Server is allowed to consume and utilize for its internal operations. | |
Memory | The amount of memory SQL Server is utilizing in MB. If SQL is not using the maximum memory amount specified, we should consider lowering this amount. | |
Memory Details | A pie chart showing how the SQL Server’s physical memory usage is divided between the database, internal needs, and free memory, in MB. | |
DB Memory | The memory used by the SQL Server instance to cache data and other objects related to specific databases. | |
Free Memory | The amount of physical memory not utilized by SQL Server in MB. A high value may indicate that the assigned memory to the SQL Server can be reduced. | |
Internal Memory | The amount of physical memory which the SQL Server is utilizing for internal operations, not including operations for the database, in MB. For example: buffer pool, execution plans, system tables, procedures cache, and management. | Read more |
Memory (min) | The minimum amount of assigned physical memory which the SQL can use in MB. | |
Memory (max) | The maximum amount of assigned physical memory which the SQL can use in MB. | |
Temp table creation/sec | Amount of temp table creations per second. | |
Uptime | How long the SQL Server instance has been up and running. It’s recommended to have as high an uptime value as possible on high-traffic instances. | |
Cluster active name | The name of the active cluster in clustered instances. | |
Cluster nodes down | The amount of cluster nodes that are down. The server may be one of the cluster nodes. | |
Transactions/sec | Number of open transactions per second. Values higher than usual can cause system slowness. | Read more |
Lazy writes/sec | Measures the process of flushing modified data pages from memory (buffer cache) to disk (data files) per second. By deferring the immediate disk writes and batching them together, lazy writes help improve system performance. It reduces the frequency of disk I/O operations, minimizes disk access latency, and allows the system to perform multiple writes in a more efficient manner. A high value indicates that the SQL server needs more memory and can affect other OS resources, such as disk IO and CPU usage. | |
Index Full scan/sec | The number of full index scans per second. A full index scan is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has a “NOT NULL” constraint. | |
Index page splits/sec | The number of index page splits per second, affected by fragmentation. A page split occurs when there is no room on a page for an updated or inserted value, so the page is split to free space for the command to complete. | |
Logins/sec | The number of logins to the SQL server per second. A high value may point to security or application problems. | |
Logouts/sec | The number of logouts from the SQL server per second. A high value may point to connection problems. | |
Core available | Total cores number in the server. | |
Core in use | The number of cores that are assigned for SQL Server use. The recommendation is that the SQL Server will use all cores. | |
Session Memory wait | The number of sessions that are waiting for free memory. Those queries don’t have enough RAM memory to start running, so they are “delayed.” | |
Create temp table/ variables | The number of created temporary tables/ variables available. A high value can indicate unnecessary open connections. | |
TempDB free space | Tempdb database unused data space in KB. A high value may indicate unusual data growth. | |
Session avg. wait for signal | The average time, in milliseconds, that SQL Server reports sessions as being in a wait. The threshold is based on past activity and behavior. A value that is higher than average can slow SQL Server performance. | Read more |
Session CPU wait | The number of queries that SQL Server reports as waiting for CPU availability. The threshold leans on past activity and behavior. When the value is higher than average, it’s recommended to investigate and look for those queries. | Read more |
Currently Active | The number of queries that are currently running (status is running; for Oracle, status is active) | |
Currently Blocked | The number of queries that are currently blocked (status is suspended) | Read more |
Currently Sleeping | The number of queries that are currently sleeping. A sleeping query has been executed and its results have been returned to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Currently Background | The number of queries that are running in the background. Background work runs separately from the main execution flow of a program or application in order to prevent blocking. | |
Currently Open Transactions | The number of queries that have open transactions at the moment. | Read more |
Currently Killed | The number of queries that were killed in the last minute. | Read more |
Currently Avg Duration/sec | Measures the average time taken to execute a single query or command in seconds. | |
Number of Queries 0-9.99 | A count of queries that have been running for less than 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries that have been running for between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries that have been running for between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries that have been running for between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries that have been running for more than 60 seconds. | Read more |
Subscriber High latency | The time it takes for changes made at the publisher to be replicated to the subscriber. | Read more |
Distributor High latency | The time it takes for transactional changes generated at the publisher to be delivered to the distributor for further replication processing. | Read more |
LogReader High latency | The time it takes for the LogReader agent to read the transaction log from the publisher and deliver the changes to the distributor for replication. | Read more |
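Several rows in this table state numeric thresholds: a buffer cache hit ratio below 90%, a page lookups to batch requests ratio above 100, and more than 20 page splits per second. A minimal Python sketch of those checks follows; the function name, parameter names, and warning strings are hypothetical, and the thresholds are simply the ones quoted above.

```python
def instance_warnings(buffer_cache_hit_pct, page_lookups_per_sec,
                      batch_requests_per_sec, page_splits_per_sec):
    """Return a list of warnings based on the thresholds quoted in the table."""
    warnings = []
    if buffer_cache_hit_pct < 90.0:
        warnings.append("buffer cache hit ratio below 90%")
    # Skip the ratio check when there are no batch requests to divide by.
    if batch_requests_per_sec > 0:
        if page_lookups_per_sec / batch_requests_per_sec > 100.0:
            warnings.append("page lookups per batch request above 100")
    if page_splits_per_sec > 20.0:
        warnings.append("page splits/sec above 20")
    return warnings
```

An instance with an 85% hit ratio, 50,000 page lookups and 200 batch requests per second, and 25 page splits per second would trigger all three warnings; a healthy sample returns an empty list.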
— DB on Azure MSSQL
Metric | Description | Investigate this alert |
Status | Database Status: ● Online – the database is available ● Offline – the database is not in use ● Mirror Disconnect – the sync is disconnected. ● Mirror Principal – the principal sync of all updating of the database. ● Mirror – the database is synchronized. ● Restoring – the database is currently being restored ● Suspect – the database is defective | Read more |
Instance | The SQL server instance name given in the installation. | |
Database | The database name. | |
Recovery | The recovery model determines the restore options available for the database. It defines how the database transaction logs are managed and what data can be recovered in case of a failure. | |
Full Backup | The date of the last Full Backup performed on the database. Full Backup files are .bak files or snapshots. A Full Backup once a day is the general recommendation. | Read more |
Log Backup | The date of the last transaction log backup performed on the database. Log Backup files are .trn files. A log backup once an hour is the general recommendation, but if the recovery model is “simple” this value should be null. | Read more |
Memory | The amount of memory the database is taking up in the physical memory in MB. | |
Size | A pie chart showing the distribution of data and log file sizes occupying the disk storage, measured in MB. It is not recommended that the log take up more than 60% of the database size. If it does, investigate the process integrity of this database, such as transactions (containing recursion) and backups. Bandwidth relates to the speed at which data can be transmitted between devices. A higher bandwidth allows for faster and more extensive data transfer. | |
Disk IO/sec | Amount of disk reads and writes per second. Usually, the main or biggest DBs will have a high value. Higher values than usual can indicate a performance problem of queries causing other queries to wait for free IO. | |
Data Growth | The rate of information growth in the database on the disk storage in MB, which includes all the filegroups that contain the primary data file (.mdf). A lack of space in the disk storage may indicate substantial data growth in the database. | Read more |
Log Growth | The rate of log growth in the database on the disk storage in MB. A lack of space in the disk storage may indicate substantial log growth in the database. | Read more |
In-Memory | Known as In-Memory OLTP, a feature in SQL Server that leverages memory-resident tables and natively compiled stored procedures for better performance in specific transactions. | |
Unused data space | Free space of the DB, meaning allocated data space that is not in use. A value higher than 50% indicates that a shrink should be considered. Keeping at least 10% of the DB as unused space, for indexes and other needs, is recommended. | |
Collation | The language and the manner of string comparison defined for the database. | |
Page Verify | Page Verify is a database option that defines the SQL Server mechanism of verifying page consistency when the page is written to disk and when it is read again from disk. The recommendation is CHECKSUM. | |
DBCC last success | The date of the last successful database check. The check covers the database’s integrity, tables, indexes, schema, etc. Running it daily is very important for keeping the organization’s databases functioning properly. The check runs on both a physical level and a logical level. | |
Compatibility | The Compiler version at the Database level. | |
Diff Backup | Date of the last differential backup performed for the Database. In general, it’s recommended once a day but it depends on the full backup frequency. | Read more |
Transactions | The number of transaction operations UPDATE, INSERT, DELETE, BEGIN TRAN executed per second. A high value (above average) may be the reason for slowness or log growth issues. | |
Log Flush | The time it takes to save the log held in physical memory to the disk storage. High values slow down transaction operations, updates, and writes to SQL Server. | |
File stream Growth | The storage volume used by file streams. File streams enable the storage of large amounts of data (objects larger than 2 GB), such as large documents, images, or files. High values may cause storage problems on the data drive. | |
File stream Drive | The file stream’s drive. | |
IO | The amount of read and write operations from the disk storage at the sampled time. A high value can cause slowness as a result of a load on the disk storage. | |
Log size | The size assigned to the log files of the database in MB. | |
Log Use | The size of the log used in MB. | |
Log Flush | The process of writing the contents of the transaction log buffer to the physical transaction log file on the disk, measured in milliseconds. Higher time may increase the chance for data loss. | |
Log Reuse Wait | The reason the transaction log of a database is unable to truncate log space for reuse. NOTHING is a good value. REPLICATION appears for a database that participates in replication. | |
Creation Date | The Database creation date. | |
Data Files | The number of .mdf files the database contains (filegroups). | |
Log Files | The number of .ldf files the database contains. | |
Disk Log IO/sec | The number of log input and output operations on the disk storage per second. | |
Data Read IO/sec | The number of reads from the disk storage per second. | |
Data Write IO/sec | The number of writes from the disk storage per second. | |
Open transactions | The number of open transactions per second. A high number of open transactions can cause log oversize. | Read more |
Log transactions | The log amount in MB while there’s an open transaction. An increased log growth can be caused when transactions don’t clean themselves while running. | |
Transaction Duration | The duration of the transaction. | |
AlwaysOn State | The DB’s AlwaysOn state: synchronized or not synchronized. Not synchronized means that there’s a problem with AlwaysOn. | Read more |
AlwaysOn Status | The AlwaysOn status: healthy or not healthy. Not healthy means that there’s a problem with AlwaysOn. | Read more |
AlwaysOn graph | If it’s active, the value is 1. If it is not active, the value is 0. | |
AlwaysOn Log records not committed at Secondary | The amount of AlwaysOn log records that have not yet been committed from the primary to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn Log records waiting to send to Secondary | The amount of AlwaysOn log records waiting to be sent from the primary server to the secondary server. When these graphs show data, this is the secondary group; if there is no data, it is the primary group. | |
AlwaysOn is Primary | 0 for secondary server databases, 1 for primary server databases. | |
AlwaysOn group name | The availability group that a DB in AlwaysOn belongs to. In the Enterprise edition, a group can hold several databases. | |
Mirror Status | The DB’s Mirror status: synchronized or not synchronized. Not synchronized means that there’s a problem with the Mirror. | Read more |
Mirror Status Graph | If it’s active, the value is 1. If not, the value is 0. | |
Mirror Mode | The DB’s mirror mode on the primary server is “principal,” and on the secondary server is “mirror.” | Read more |
Mirror Log records not committed at Secondary | Amount of the Mirror logs that couldn’t be committed yet from the primary to the secondary server. | Read more |
Mirror Log records waiting to send to Secondary | Amount of the Mirror logs waiting to be sent to the secondary server (from the primary server). | |
Data Drive | The drive where the data files of the DB are located. It is recommended to keep the data, log, and tempdb files on separate drives. | |
Log Drive | The drive where the log files of the DB are located. It is recommended to keep the data, log, and tempdb files on separate drives. |
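The Size and Unused data space rows above give rule-of-thumb thresholds (log above 60% of the database size, unused data space above 50% or below 10%). A minimal sketch of those checks follows; the function names are hypothetical, and treating "database size" as data plus log is an assumption rather than something the table states.

```python
def log_share_percent(data_mb: float, log_mb: float) -> float:
    """Percentage of the database size taken by the log.

    Assumption: database size = data file size + log file size.
    """
    total = data_mb + log_mb
    if total <= 0:
        raise ValueError("database size must be positive")
    return 100.0 * log_mb / total


def size_findings(data_mb: float, log_mb: float, unused_data_pct: float):
    """Return a list of findings based on the thresholds in the table."""
    findings = []
    if log_share_percent(data_mb, log_mb) > 60:
        findings.append("log takes more than 60% of the database size")
    if unused_data_pct > 50:
        findings.append("unused data space above 50%: consider a shrink")
    elif unused_data_pct < 10:
        findings.append("unused data space below the recommended 10%")
    return findings
```

For a database with 200 MB of data, 800 MB of log, and 55% unused data space, `size_findings(200, 800, 55)` returns both the log finding and the shrink finding.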
— Wait Stats on Azure MSSQL
Metric | Description | Investigate this alert |
Wait Type | The SQL Server wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there’s a wait for the specific resource, which can be caused by specific delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed/long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, a bottleneck in a specific resource can be caused by delayed/long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. |
Wait Type | Description |
Azure Limit | Waits commonly seen in Azure SQL Database environments, particularly in scenarios where resources are limited or throttling is in place to manage log generation rates. Common wait types included: LOG_RATE_GOVERNOR. |
Change Tracking | Waits related to modifications on tables where Change Tracking has been enabled. Common wait types included: COMMIT_TABLE. |
CPU | Waits related to a situation where a query or process waits for CPU resources to become available. Common wait types included: SOS_SCHEDULER_YIELD. |
DB Log | Waits related to the logging operations that SQL Server performs to ensure transaction durability (ACID compliance). Common wait types included: WRITELOG, LOGMGR_FLUSH, LOGBUFFER. |
DB Maintenance | Waits related to backup operations performed as a maintenance task. Common wait types included: BACKUPIO, BACKUPTHREAD. |
DB Mirror/Snapshot | Waits related to mirror or snapshot operations. Common wait types included: FCB_REPLICA_READ. |
Full Text | Happens when SQL Server is performing full-text indexing operations, such as a query involving a full-text search or updates to full-text indexes. Common wait types included: FT_IFTS_RWLOCK. |
I/O Complete | Waits related to disk I/O operations caused by disk bottlenecks or slow storage subsystems. Common wait types included: PAGEIOLATCH_EX, PAGEIOLATCH_SH, PAGELATCH_EX. |
Latch | Lightweight synchronization waits for in-memory data structures caused by buffer pool contention or heavy I/O operations. Common wait types included: LATCH_EX, LATCH_SH. |
Lock | Waits caused by contention over database objects – high transaction concurrency or blocking. Common wait types included: LCK_M_IX, LCK_M_S, LCK_M_IS. |
Low Memory | Specific waits triggered by low memory conditions during query compilation or execution. |
Memory | Waits involving memory allocation or resource semaphore management. Common wait types included: MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, RESOURCE_SEMAPHORE. |
Network I/O | Waits involving network communication for sending/receiving data. Common wait type included: ASYNC_NETWORK_IO. |
Parallelism | Waits caused by query parallelism, where multiple threads synchronize tasks. Common wait types included: CXPACKET, CXSYNC_PORT, EXECSYNC, CXCONSUMER. |
Remote Transaction | Waits involving distributed transactions across servers or slow network communication. Common wait types included: OLEDB, DTC. |
SQL CLR/Extended SP | Waits related to the SQL CLR (Common Language Runtime) or extended stored procedures. |
SQL Internal | Internal system waits used by SQL Server for background or maintenance tasks. Common wait types included: MISCELLANEOUS, SLEEP_TASK. |
SQL Trace | Waits related to SQL Server tracing mechanisms, such as Extended Events or SQL Profiler. Common wait types included: SQLTRACE_FILE_BUFFER. |
Thread Pool | Occurs when SQL Server cannot allocate worker threads to execute queries due to thread pool exhaustion. |
Transaction | Waits associated with transaction management, such as locks, log writing, or dependency resolution. Common wait types included: TRANSACTION_MUTEX. |
XTP In-Memory | These waits occur during operations involving memory-optimized tables and natively compiled stored procedures. Common wait types included: WAIT_XTP_RECOVERY. |
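The Wait (%) metric compares each wait against the others, and the table above groups wait types into categories. The sketch below illustrates one way such a percentage per category could be computed. The mapping holds only a few of the wait types listed above, the "Other" label and the function name are hypothetical, and the input is assumed to be total wait milliseconds per wait type.

```python
# Partial mapping taken from the wait type table above.
CATEGORY_BY_WAIT_TYPE = {
    "LOG_RATE_GOVERNOR": "Azure Limit",
    "SOS_SCHEDULER_YIELD": "CPU",
    "WRITELOG": "DB Log",
    "PAGEIOLATCH_SH": "I/O Complete",
    "LCK_M_S": "Lock",
    "ASYNC_NETWORK_IO": "Network I/O",
    "CXPACKET": "Parallelism",
}


def wait_percent_by_category(wait_ms_by_type):
    """Share of total wait time per category, in percent."""
    total = sum(wait_ms_by_type.values())
    if total == 0:
        return {}
    result = {}
    for wait_type, wait_ms in wait_ms_by_type.items():
        # "Other" is a placeholder for wait types missing from the mapping.
        category = CATEGORY_BY_WAIT_TYPE.get(wait_type, "Other")
        result[category] = result.get(category, 0.0) + 100.0 * wait_ms / total
    return result
```

A category whose share is well above its usual level points at the resource worth investigating first.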
-Amazon RDS
— Service
Metric | Description | Investigate this alert |
Cloud | The database name given in the AimBetter configuration. | |
CPU | The overall percentage of time the CPU spends executing non-idle tasks. | |
Memory Free | The amount of system memory currently available. | |
Free Storage Space | The free storage space in GB out of allocated storage. | |
Availability Zone | The datacenter location within a specific AWS Region. | |
Receive Byte/sec | The incoming data transfer rate in bytes per second. | |
Send Byte/sec | The outgoing data transfer rate in bytes per second. | |
Read IOPS | The number of read operations performed on a specific storage volume or disk during a one-second interval. | |
Write IOPS | The number of write operations performed on a specific storage volume or disk during a one-second interval. | |
Read Latency | The response time to a read request. | |
Write Latency | The response time to a write request. | |
Availability Zone | The location where the Amazon cloud computing resources are hosted for this Database. | |
Allocated Storage | The storage, in gibibytes, that is allocated for the DB instance. | |
DB Instance Class | Determines the computation and memory capacity of an Amazon RDS DB instance. | |
Storage Encrypted | On a database instance running with Amazon RDS encryption, data stored is encrypted, as are its automated backups, read replicas, and snapshots. | |
Storage Type | Amazon RDS provides three storage types: General Purpose SSD (also known as gp2 and gp3), Provisioned IOPS SSD (also known as io1), and magnetic (also known as standard). |
— MSSQL on Amazon RDS
Metric | Description | Investigate this alert |
Version | The SQL Server version installed on the server. | |
Instance | The SQL server instance name given in the installation. | |
Test connection | A check of the time to establish a connection to the SQL server in milliseconds. A high value indicates that there are network communication problems or a load on the SQL server. | Read more |
Last Restart | The last restart of the SQL server. | |
Collation | In SQL Server, a collation is a set of rules that determine how data is sorted and compared, for string based operations. SQL collations allow database administrators to define the appropriate rules for sorting and comparing strings based on the specific language and cultural context of the data being stored. | |
Edition | The installed SQL Server edition. There are numerous editions (for example Express, Developer, Enterprise), and each edition has two runtimes: 32-bit or 64-bit. | |
SP | The Service Pack, which includes cumulative updates of all the fixes and improvements from previous service packs and cumulative updates for a specific version of SQL Server. | |
Page life expectancy | The time, in seconds, that SQL Server keeps retrieved information in the server’s physical memory. Low values indicate that SQL Server is replacing the information held in physical memory at a high frequency and needs more physical memory in order to perform faster. | Read more |
User Connections | A connection established between a client application and a database server using SQL credentials is considered a single user connection. A large number may indicate a load on the system, a fault, or a security error. | Read more |
Connection reuse/sec | The total number of logins started from the connection pool per second. Apps tend to open and close connections repeatedly – this value indicates the amount of the connections’ reuse. | |
Batch requests/sec | The number of updates, retrievals, deletions, or saving operations in the SQL per second. This metric enables the user to detect abnormalities in the operations amount on the SQL server. | Read more |
Buffer cache hit ratio | The percentage of memory requests that are satisfied from the cache (physical memory of the SQL server). Values below 90% indicate multiple reads/writes from/to the main memory or disk storage. You should investigate whether there is a high physical memory consumption by different programs or processes and consider the need to add physical memory to the SQL server. | Read more |
Page reads/sec | The number of page reads (each page is 8 KB) from the disk per second. Many reads indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
Page writes/sec | The number of page writes (each page is 8 KB) to the disk per second. Many writes indicate that we should examine the SQL server’s integrity, indexing, and system query logic. | |
SP Compilation | The number of times per second that SQL Server compiles execution plans for queries. A large number of compilations along with a small number of Batch requests indicates heavy use of direct queries and sp_executesql, and little use of procedures with defined parameters. | |
SP Re Compilation | The number of times per second that SQL Server recompiles execution plans for queries. A large number of recompilations, combined with a small number of Batch requests, indicates that the amount of retrieved data has grown, statistics have been updated, or indexes have been rebuilt. Investigate the data volume and whether any of these operations have been performed. | |
Page Lookups | The number of times SQL Server looks up pages (each page is 8 KB) in physical memory. A ratio of (Page lookups/sec) / (Batch requests/sec) greater than 100 indicates that some queries are not running optimally. | |
Latches Times | The duration in seconds for which a thread holds exclusive access (a “latch”) to a shared resource (for example, a “latched table”). A high number of latches slows data retrieval from the latched tables. Investigate a change in the Update or Deletion method. | |
Page Splits/sec | The number of page splits per second that occur for allocation purposes when an index page does not have space. A rate higher than 20 per second requires a check of the index specifications. | |
Checkpoint Pages/sec | The data pages that are written per second to disk during a checkpoint operation. A checkpoint is a process in which the SQL Server ensures that all modified data pages in memory are flushed and written to disk to maintain data consistency and durability. | |
DB IO/sec | The number of reads and writes of the entire database per second. | |
Target Memory | The target RAM memory limit that the SQL Server is allowed to consume and utilize for its internal operations. | |
Memory | The amount of memory SQL Server is utilizing in MB. If SQL is not using the maximum memory amount specified, we should consider lowering this amount. | |
Memory Details | A pie chart showing how the SQL Server’s physical memory usage is divided between the database, internal needs, and free memory, in MB. | |
DB Memory | The memory used by the SQL Server instance to cache data and other objects related to specific databases. | |
Free Memory | The amount of physical memory not utilized by SQL Server in MB. A high value may indicate that the assigned memory to the SQL Server can be reduced. | |
Internal Memory | The amount of physical memory which the SQL Server is utilizing for internal operations, not including operations for the database, in MB. For example: buffer pool, execution plans, system tables, procedures cache, and management. | Read more |
Memory (min) | The minimum amount of assigned physical memory which the SQL can use in MB. | |
Memory(max) | The maximum amount of assigned physical memory which the SQL can use in MB. | |
Temp table creation/sec | Amount of temp table creations per second. | |
Uptime | How long the SQL Server instance has been up and running. It’s recommended to have as high an uptime value as possible on high-traffic instances. | |
Cluster active name | The name of the active cluster in clustered instances. | |
Cluster nodes down | The amount of cluster nodes that are down. The server may be one of the cluster nodes. | |
Transactions/sec | Number of open transactions per second. Values higher than usual can cause system slowness. | Read more |
Lazy writes/sec | Measures the process of flushing modified data pages from memory (buffer cache) to disk (data files) per second. By deferring the immediate disk writes and batching them together, lazy writes help improve system performance. It reduces the frequency of disk I/O operations, minimizes disk access latency, and allows the system to perform multiple writes in a more efficient manner. A high value indicates that the SQL server needs more memory and can affect other OS resources, such as disk IO and CPU usage. | |
Index Full scan/sec | The number of full index scans per second. This is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has a “NOT NULL” constraint. | |
Index page splits/sec | The number of index page splits per second, affected by fragmentation. A page split occurs when there is no dedicated space for updating or inserting a value in the table; the split frees space so the command can complete. | |
Logins/sec | The number of logins to the SQL server per second. A high value may point to security or application problems. | |
Logouts/sec | The number of logouts from the SQL server per second. A high value may point to connection problems. | |
Core available | The total number of cores in the server. | |
Core in use | The number of cores that are assigned for SQL Server use. The recommendation is that the SQL Server will use all cores. | |
Session Memory wait | The number of sessions that are waiting for free memory. Those queries don’t have enough RAM memory to start running, so they are “delayed.” | |
Create temp table/ variables | The number of created temporary tables/ variables available. A high value can indicate unnecessary open connections. | |
TempDB free space | Tempdb database unused data space in KB. A high value may indicate unusual data growth. | |
Session avg. wait for signal | The average time, in milliseconds, that SQL Server reports sessions spending in a wait. The threshold is based on past activity and behavior. When the value is higher than average, it can cause slow SQL Server performance. | Read more |
Session CPU wait | The number of queries that SQL Server reports as waiting for CPU availability. The threshold is based on past activity and behavior. When the value is higher than average, it’s recommended to investigate and look for those queries. | Read more |
Currently Active | The number of queries that are currently running (status is running; for Oracle, status is active). | |
Currently Blocked | The number of queries that are currently blocked (status is suspended). | Read more |
Currently Sleeping | The number of queries that are currently sleeping. A query that has been executed, and its results have been returned to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Currently Background | The number of queries that are running in the background. Background work is separated from the main execution flow of a program or application in order to prevent blocking. | |
Currently Open Transactions | The number of queries that have open transactions at the moment. | Read more |
Currently Killed | The number of queries that were killed in the last minute. | Read more |
Currently Avg Duration/sec | Measures the average time taken to execute a single query or command in seconds. | |
Number of Queries 0-9.99 | A count of queries running up to 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries running between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries running between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries running between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries running over 60 seconds. | Read more |
Subscriber High latency | The time it takes for changes made at the publisher to be replicated to the subscriber. | Read more |
Distributor High latency | The time it takes for transactional changes generated at the publisher to be delivered to the distributor for further replication processing. | Read more |
LogReader High latency | The time it takes for the LogReader agent to read the transaction log from the publisher and deliver the changes to the distributor for replication. | Read more |
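Several rows in the table above state numeric rules of thumb: a Buffer cache hit ratio below 90%, a Page lookups to Batch requests ratio above 100, and more than 20 Page Splits per second. A minimal sketch of how those checks might be combined is shown here; the function and parameter names are hypothetical and only the thresholds come from the table.

```python
def instance_findings(
    buffer_cache_hit_pct: float,
    page_lookups_per_sec: float,
    batch_requests_per_sec: float,
    page_splits_per_sec: float,
):
    """Apply the rule-of-thumb thresholds from the table to one sample."""
    findings = []
    if buffer_cache_hit_pct < 90:
        findings.append("buffer cache hit ratio below 90%")
    # Skip the ratio when there are no batch requests to divide by.
    if batch_requests_per_sec > 0:
        ratio = page_lookups_per_sec / batch_requests_per_sec
        if ratio > 100:
            findings.append("page lookups per batch request above 100")
    if page_splits_per_sec > 20:
        findings.append("page splits above 20 per second")
    return findings
```

A sample with a 99% hit ratio, 5000 lookups/sec, 100 batch requests/sec, and 5 splits/sec produces no findings, since the lookup ratio is only 50.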
— Oracle on Amazon RDS
Metric | Description | Investigate this alert |
Database | The Database name. | |
Edition | The Database edition. | |
32/64 | The Database runtime – 32bit or 64bit. | |
Version | The Database version. | |
Log Mode | Refers to the management of the database redo logs, used for data integrity or recovery in case of a disaster. There are several types: ● ARCHIVELOG mode: allows creating backups that capture changes made to the database since the last backup. ● NOARCHIVELOG mode: limits the ability to perform point-in-time recovery, since archived redo logs are not available. ● FORCE LOGGING mode: all data changes made to the database are logged to the redo log files, even for operations that would not typically generate redo logs. | |
National Language (NLS) | Refers to a set of features and settings that allow Oracle Database to handle multiple languages. | |
Patch Level | The version number of the Oracle software and the cumulative updates and releases. | |
Last Restart | Last restart in the format date:hour:minute | |
Test Connection | A check of the time to establish a connection to the Database in milliseconds. A high value indicates that there are network communication problems or a load on the Oracle Database. | Read more |
Session Limit | The number of user sessions currently connected to the database, as a percentage of the maximum sessions allowed. | |
Session (Max) | The maximum number of concurrent user sessions allowed to connect to the database. Each user session represents a connection with the database. | |
Processes (Max) | The maximum number of concurrent user processes allowed to connect to the database. | |
Default block size | The standard size of a data block used for storing data and managing database objects within the database’s data files. As of Oracle Database 12c, the default block size is typically 8192 bytes (8 KB) for a general-purpose database. | |
Open Cursors (max) | The maximum number of open cursors allowed in the database. A cursor is a programmatic handle or pointer used by the database to access or process the results of queries or DML statements. It is essential for developers to explicitly close cursors when they are no longer needed. | |
DR Last Sync Date | The last synchronization date and time for a Data Guard configuration (high-availability and disaster recovery solution that allows to maintain standby databases synchronized with the primary database). | Read more |
Physical Reads | The number of reads of the entire database, measured in blocks as defined in “Default Block Size” (default is 8 KB). | |
Physical Writes | The number of writes of the entire database, measured in blocks as defined in “Default Block Size” (default is 8 KB). | |
DR Full Backup | The last date and time of a full backup taken from the primary database and used to initialize or restore a standby database in a Data Guard configuration. | Read more |
Archive Log Backup | The last date and time of a backup operation that specifically targets the archived redo logs. | Read more |
CTL SP File Backup | The last date and time of a backup of the “control file and server parameter (SP) file.” It helps to ensure the recoverability of the database in case of disasters, media failures, or user errors. | Read more |
Avg. Threads | The average amount of concurrent executions of multiple tasks or processes, such as: Operating System threads, Java threads, Database sessions, or parallel query executions. Calculated as the amount of active sessions / CPU cores (in percentage). | |
Database CPU Time | The percentage of CPU time spent by the Oracle database processes on executing SQL statements and other database-related operations. Higher values mean fewer waits for CPU, which improves performance. Best practices include minimizing scans and waits in query executions. | |
Buffer Cache Hit | The percentage of a requested data block found in the database buffer cache, thereby avoiding the need to read the block from disk. | Read more |
PGA Cache Hit | The percentage of times process data requests are satisfied from the Program Global Area (PGA) cache allocation, without a need for additional memory or a read from disk. The higher this value, the more efficient this database is. | Read more |
Deadlocks | The number of deadlocks in the database server. | Read more |
Invalid Object | The number of database objects that are currently in an invalid state. | |
Redo Entries (rows update) | The number of records that capture changes made to the database, related to the redo log. When this value is higher than usual, it may indicate a possible cause of slow performance. | |
Query Execute | The number of queries being executed at the moment. | |
Avg. Sessions | The average number of sessions at the moment. | |
Avg. Active | The average number of active sessions at the moment. | |
Avg. Blocking | The average number of blocking sessions at the moment. | Read more |
Avg. Sleeping Blocking | The average number of sessions that are both sleeping and blocking at the moment. | |
Avg. Blocked | The average number of blocked sessions at the moment. | Read more |
Avg. Open Transaction | The average number of open transactions at the moment. | Read more |
Avg. Sleeping | The average number of sleeping sessions at the moment. A sleeping session has executed its query and returned the result to the client application, but the connection to the database is still open and waiting for further instructions or actions from the client. | |
Avg. Background | The average number of background sessions at the moment. Background work is separated from the main execution flow of a program or application in order to prevent blocking. | |
Avg. Duration | The average duration, in seconds, of all queries running at the moment in different sessions. | |
Number of Queries 0-9.99 | A count of queries running up to 10 seconds. | Read more |
Number of Queries 10-19.99 | A count of queries running between 10 and 20 seconds. | Read more |
Number of Queries 20-29.99 | A count of queries running between 20 and 30 seconds. | Read more |
Number of Queries 30-59.99 | A count of queries running between 30 and 60 seconds. | Read more |
Number of Queries over 60 | A count of queries running over 60 seconds. | Read more |
Archive Logs Retention | The retention period for archived redo log files. | |
Log Switch | The count of completed log file switches. A log switch occurs when the database stops writing redo log entries to one redo log group (also known as a redo log file) and starts writing to another. |
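Two rows in the table above are defined as percentages: Avg. Threads (active sessions divided by CPU cores) and Session Limit (current sessions out of the maximum allowed). A minimal sketch of both calculations follows; the function names are hypothetical and the inputs are assumed to be plain counts sampled at the same moment.

```python
def avg_threads_percent(active_sessions: float, cpu_cores: int) -> float:
    """Active sessions relative to CPU cores, in percent."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return 100.0 * active_sessions / cpu_cores


def session_limit_percent(current_sessions: int, max_sessions: int) -> float:
    """Connected sessions relative to the Session (Max) value, in percent."""
    if max_sessions <= 0:
        raise ValueError("max_sessions must be positive")
    return 100.0 * current_sessions / max_sessions
```

With 4 active sessions on an 8-core server, `avg_threads_percent(4, 8)` gives 50.0; a value above 100 would mean more active sessions than cores.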
— DB on Amazon RDS MSSQL
Metric | Description | Investigate this alert |
Status | Database Status: ● Online – the database is available ● Offline – the database is not in use ● Mirror Disconnect – the sync is disconnected. ● Mirror Principal – the principal sync of all updating of the database. ● Mirror – the database is synchronized. ● Restoring – the database is currently being restored ● Suspect – the database is defective | Read more |
Instance | The SQL server instance name given in the installation. | |
Database | The database name. | |
Recovery | The recovery model determines the restore options available for the database. It defines how the database transaction logs are managed and what data can be recovered in case of a failure. | |
Full Backup | The date of the last Full Backup performed on the database. Full Backup files are .bak files or snapshots. A Full Backup once a day is the general recommendation. | Read more |
Log Backup | The date of the last transaction log backup performed on the database. Log Backup files are .trn files. A log backup once an hour is the general recommendation, but if the recovery model is “simple” this value should be null. | Read more |
Memory | The amount of memory the database is taking up in the physical memory in MB. | |
Size | A pie chart showing the distribution of data and log file sizes occupying the disk storage, measured in MB. It is not recommended that the log take up more than 60% of the database size. If it does, investigate the process integrity of this database, such as transactions (containing recursion) and backups. Bandwidth relates to the speed at which data can be transmitted between devices. A higher bandwidth allows for faster and more extensive data transfer. | |
Disk IO/sec | The number of disk reads and writes per second. Usually, the main or biggest DBs will have a high value. Higher values than usual can indicate a performance problem in which some queries cause other queries to wait for free IO. | |
Data Growth | The rate of information growth in the database on the disk storage in MB, which includes all the filegroups that contain the primary data file (.mdf). A lack of space in the disk storage may indicate substantial data growth in the database. | Read more |
Log Growth | The rate of log growth in the database on the disk storage in MB. A lack of space in the disk storage may indicate substantial log growth in the database. | Read more |
In-Memory | Known as In-Memory OLTP, a feature in SQL Server that leverages memory-resident tables and natively compiled stored procedures for better performance in specific transactions. | |
Unused data space | Free space in the DB: allocated space that holds no data. A value higher than 50% indicates that a shrink should be considered. Keeping at least 10% of the DB as unused space, for indexes and other needs, is recommended. | |
Collation | The language and the manner of string comparison defined for the database. | |
Page Verify | Page Verify is a database option that defines the SQL Server mechanism of verifying page consistency when the page is written to disk and when it is read again from disk. The recommendation is CHECKSUM. | |
DBCC last success | The last successful database check. It checks the database’s integrity, tables, indexes, schema, etc. Running this check daily is very important for keeping the organization’s databases functioning properly. The check runs at both the physical and the logical level. | |
Compatibility | The compatibility level (compiler version) set at the database level. | |
Diff Backup | The date of the last differential backup performed on the database. In general, once a day is recommended, but it depends on the full backup frequency. | Read more |
Transactions | The number of transaction operations (UPDATE, INSERT, DELETE, BEGIN TRAN) executed per second. A high value (above average) may be the reason for slowness or log growth issues. | |
Log Flush | The time it takes to save the log held in physical memory to the disk storage. High values slow down transaction operations, updates, and saves to SQL Server. | |
File stream Growth | The storage volume used by file streams. File streams enable the storage of large amounts of data (more than 2 GB), such as large documents, images, or files. High values may cause storage problems on the data drive. | |
File stream Drive | The file stream’s drive. | |
IO | The number of read and write operations on the disk storage at the sampled time. A high value can cause slowness as a result of load on the disk storage. | |
Log size | The size assigned to the log files of the database in MB. | |
Log Use | The size of the log in use, in MB. | |
Log Flush | The process of writing the contents of the transaction log buffer to the physical transaction log file on the disk, measured in milliseconds. Longer flush times may increase the risk of data loss. | |
Log Reuse Wait | A condition in which the database cannot truncate its transaction log or reuse log space. NOTHING is a good value. REPLICATION appears for a database that takes part in replication. | |
Creation Date | The Database creation date. | |
Data Files | The number of .mdf files the database contains (filegroups). | |
Log Files | The number of .ldf files the database contains. | |
Disk Log IO/sec | The number of log read and write operations on the disk storage per second. | |
Data Read IO/sec | The number of reads from the disk storage per second. | |
Data Write IO/sec | The number of writes from the disk storage per second. | |
Open transactions | The number of open transactions per second. A high number of open transactions can cause the log to grow excessively. | Read more |
Log transactions | The log size in MB while there’s an open transaction. Log growth can increase when transactions don’t clean up after themselves while running. | |
Transaction Duration | The duration of the transaction. | |
AlwaysOn State | The DB’s AlwaysOn state: synchronized or not synchronized. Not synchronized means that there’s a problem with AlwaysOn. | Read more |
AlwaysOn Status | The AlwaysOn status: healthy or not healthy. Not healthy means that there’s a problem with AlwaysOn. | Read more |
AlwaysOn graph | The value is 1 if AlwaysOn is active and 0 if it is not. | |
AlwaysOn Log records not committed at Secondary | The amount of AlwaysOn log records that have not yet been committed from the primary to the secondary server. When these graphs are active, this is the secondary group; if there’s no data, it’s the primary group. | |
AlwaysOn Log records waiting to send to Secondary | The amount of AlwaysOn log records waiting to be sent from the primary server to the secondary server. When these graphs are active, this is the secondary group; if there’s no data, it’s the primary group. | |
AlwaysOn is Primary | 0 for secondary server databases, 1 for primary server databases. | |
AlwaysOn group name | The group that a DB in AlwaysOn belongs to. In the Enterprise edition, a group can hold several databases. | |
Mirror Status | The DB’s Mirror status: synchronized or not synchronized. Not synchronized means that there’s a problem with the Mirror. | Read more |
Mirror Status Graph | The value is 1 if the Mirror is active and 0 if it is not. | |
Mirror Mode | The DB’s mirror mode is “principal” on the primary server and “mirror” on the secondary server. | Read more |
Mirror Log records not committed at Secondary | The amount of Mirror log records that have not yet been committed from the primary to the secondary server. | Read more |
Mirror Log records waiting to send to Secondary | The amount of Mirror log records waiting to be sent from the primary server to the secondary server. | |
Data Drive | The drive where the data files of the DB are located. It’s recommended to keep the data, log, and tempdb files on separate drives. | |
Log Drive | The drive where the log files of the DB are located. It’s recommended to keep the data, log, and tempdb files on separate drives. |
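The Size and Unused data space recommendations above can be expressed as simple checks. The following is a minimal sketch, not part of the AimBetter platform: the function name and parameters are hypothetical, and it assumes that "database size" means data size plus log size.

```python
def size_notes(data_mb: float, log_mb: float, unused_data_pct: float) -> list[str]:
    """Return notes based on the 60% log and 50%/10% unused-space guidelines.

    Assumption: database size = data size + log size (not stated in the table).
    """
    notes = []
    total_mb = data_mb + log_mb
    # Guideline: the log should not take up more than 60% of the database size.
    if total_mb > 0 and log_mb / total_mb > 0.60:
        notes.append("log exceeds 60% of database size")
    # Guideline: above 50% unused data space, consider a shrink.
    if unused_data_pct > 50:
        notes.append("unused data space above 50%: consider a shrink")
    # Guideline: keep at least 10% of the database unused.
    elif unused_data_pct < 10:
        notes.append("unused data space below the recommended 10%")
    return notes
```

For example, a database with 400 MB of data, 700 MB of log, and 55% unused data space would receive both the log note and the shrink note.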
— Wait Stats on Amazon RDS MSSQL
Metric | Description | Investigate this alert |
Wait Type | The SQL Server wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there’s a wait for the specific resource, which can be caused by specific delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. |
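As an illustration of how Wait (%) relates to Wait (ms), the sketch below computes each wait type's share of the total wait time. It is an assumption about the calculation, not the platform's actual implementation, and the function name is hypothetical.

```python
def wait_percentages(wait_ms: dict[str, float]) -> dict[str, float]:
    """Map each wait type to its share (in %) of the total wait time."""
    total = sum(wait_ms.values())
    if total == 0:
        # No waits recorded: every share is zero.
        return {name: 0.0 for name in wait_ms}
    return {name: round(100 * ms / total, 1) for name, ms in wait_ms.items()}
```

With sample values of 300 ms, 100 ms, and 600 ms for three wait types, the shares are 30%, 10%, and 60%.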
Wait Type | Description |
Azure Limit | Waits commonly seen in Azure SQL Database environments, particularly in scenarios where resources are limited or throttling is in place to manage log generation rates. Common wait types include: LOG_RATE_GOVERNOR. |
Change Tracking | Waits related to modifications on tables where Change Tracking has been enabled. Common wait types include: COMMIT_TABLE. |
CPU | Waits related to a situation where a query or process waits for CPU resources to become available. Common wait types include: SOS_SCHEDULER_YIELD. |
DB Log | Waits related to the logging operations that SQL Server performs to ensure transaction durability (ACID compliance). Common wait types include: WRITELOG, LOGMGR_FLUSH, LOGBUFFER. |
DB Maintenance | Waits related to backup operations performed as a maintenance task. Common wait types include: BACKUPIO, BACKUPTHREAD. |
DB Mirror/Snapshot | Waits related to mirror or snapshot operations. Common wait types include: FCB_REPLICA_READ. |
Full Text | Waits that occur when SQL Server is performing full-text indexing operations, such as a query involving a full-text search or updates to full-text indexes. Common wait types include: FT_IFTS_RWLOCK. |
I/O Complete | Waits related to disk I/O operations caused by disk bottlenecks or slow storage subsystems. Common wait types include: PAGEIOLATCH_EX, PAGEIOLATCH_SH, PAGELATCH_EX. |
Latch | Lightweight synchronization waits for in-memory data structures caused by buffer pool contention or heavy I/O operations. Common wait types include: LATCH_EX, LATCH_SH. |
Lock | Waits caused by contention over database objects due to high transaction concurrency or blocking. Common wait types include: LCK_M_IX, LCK_M_S, LCK_M_IS. |
Low Memory | Specific waits triggered by low memory conditions during query compilation or execution. |
Memory | Waits involving memory allocation or resource semaphore management. Common wait types include: MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, RESOURCE_SEMAPHORE. |
Network I/O | Waits involving network communication for sending/receiving data. Common wait types include: ASYNC_NETWORK_IO. |
Parallelism | Waits caused by query parallelism, where multiple threads synchronize tasks. Common wait types include: CXPACKET, CXSYNC_PORT, EXECSYNC, CXCONSUMER. |
Remote Transaction | Waits involving distributed transactions across servers or slow network communication. Common wait types include: OLEDB, DTC. |
SQL CLR/Extended SP | Waits related to the SQL CLR (Common Language Runtime) or extended stored procedures. |
SQL Internal | Internal system waits used by SQL Server for background or maintenance tasks. Common wait types include: MISCELLANEOUS, SLEEP_TASK. |
SQL Trace | Waits related to SQL Server tracing mechanisms, such as Extended Events or SQL Profiler. Common wait types include: SQLTRACE_FILE_BUFFER. |
Thread Pool | Waits that occur when SQL Server cannot allocate worker threads to execute queries due to thread pool exhaustion. |
Transaction | Waits associated with transaction management, such as locks, log writing, or dependency resolution. Common wait types include: TRANSACTION_MUTEX. |
XTP In-Memory | Waits that occur during operations involving memory-optimized tables and natively compiled stored procedures. Common wait types include: WAIT_XTP_RECOVERY. |
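The category table above can be read as a lookup from wait type to category. The sketch below builds that lookup from only the example wait types listed in the table; the names `CATEGORY_EXAMPLES` and `categorize` are hypothetical, and the platform's real mapping is likely larger.

```python
# Example wait types per category, copied from the table above.
CATEGORY_EXAMPLES = {
    "Azure Limit": ["LOG_RATE_GOVERNOR"],
    "Change Tracking": ["COMMIT_TABLE"],
    "CPU": ["SOS_SCHEDULER_YIELD"],
    "DB Log": ["WRITELOG", "LOGMGR_FLUSH", "LOGBUFFER"],
    "DB Maintenance": ["BACKUPIO", "BACKUPTHREAD"],
    "DB Mirror/Snapshot": ["FCB_REPLICA_READ"],
    "Full Text": ["FT_IFTS_RWLOCK"],
    "I/O Complete": ["PAGEIOLATCH_EX", "PAGEIOLATCH_SH", "PAGELATCH_EX"],
    "Latch": ["LATCH_EX", "LATCH_SH"],
    "Lock": ["LCK_M_IX", "LCK_M_S", "LCK_M_IS"],
    "Memory": ["MEMORY_ALLOCATION_EXT", "RESERVED_MEMORY_ALLOCATION_EXT",
               "RESOURCE_SEMAPHORE"],
    "Network I/O": ["ASYNC_NETWORK_IO"],
    "Parallelism": ["CXPACKET", "CXSYNC_PORT", "EXECSYNC", "CXCONSUMER"],
    "Remote Transaction": ["OLEDB", "DTC"],
    "SQL Internal": ["MISCELLANEOUS", "SLEEP_TASK"],
    "SQL Trace": ["SQLTRACE_FILE_BUFFER"],
    "Transaction": ["TRANSACTION_MUTEX"],
    "XTP In-Memory": ["WAIT_XTP_RECOVERY"],
}

# Invert the table: wait type -> category.
WAIT_TO_CATEGORY = {
    wait_type: category
    for category, wait_types in CATEGORY_EXAMPLES.items()
    for wait_type in wait_types
}


def categorize(wait_type: str) -> str:
    """Return the category of a wait type, or "Unlisted" if the table has no example."""
    return WAIT_TO_CATEGORY.get(wait_type.upper(), "Unlisted")
```

A wait type that does not appear among the examples returns "Unlisted" rather than a guessed category.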
— Wait Stats on Amazon RDS Oracle
Metric | Description | Investigate this alert |
Wait Type | The Oracle wait type name. | |
Wait (%) | The percentage of the wait time compared to other waits. If the value is higher than average, there’s a wait for the specific resource, which can be caused by specific delayed or long-running queries. | |
Avg Wait (ms) | The average wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Wait (ms) | The wait time in milliseconds. If the value is higher than average, there may be a bottleneck in a specific resource, which can be caused by delayed or long-running queries. | Read more |
Tasks | The number of tasks waiting for the wait type at the moment. |
Queries Module
— Live/ History
Metric | Description | Investigate this alert |
Session | Displays the session ID (SPID). | |
Start Time | The time when the query started executing in the database. | |
Duration | How long the query has been running, shown in Hours:Minutes:Seconds format. | |
Max Duration | In the case of blocking queries, it displays the highest execution time from the group of blocked queries. It adds the time the blocked query ran before it was blocked to the duration of the blocking query. | |
Notes | Notes about problems: missing indexes, non-optimal query plan, OS resources issue, antivirus application running, and more. | |
Blocks | The number of queries that are currently blocked by the query. | |
Open Tran | The current number of open transactions for this query. | |
Client | The server from which the query originates, by application level. | |
DB | For MSSQL, it displays the name of the database on which the query is executing. For Oracle, it displays the database ID. | |
Instance ID | For Oracle, it displays the ID of the Oracle instance. | |
App | The application from which the query originates. | |
Process ID | For MSSQL, it is the process ID as identified in the client. | |
Login | The login name. | |
OS Login | For Oracle, it displays the client login from which the query is running. | |
SQL ID | For Oracle, it displays the query ID. | |
Total Rows | The cumulative row count of the same Query_ID (SQL_ID) since the query’s execution plan and statistics were last calculated. | |
Command | For a Live query, it is the command being executed at the moment. For a History query, it is the last command executed. For example: SELECT, CHECKPOINT, AWAITING COMMAND, INSERT, UPDATE, DELETE, EXECUTE. | |
Status | For a Live query, it is the current status. For a History query, it is the last query status executed. For MSSQL Server: suspended, running, runnable, active, inactive. For Oracle: active, inactive. | |
Wait Resource | A specific type of wait event that occurs when a query or transaction is waiting for access to a resource. For MSSQL, it indicates the specific data page that the query or transaction is waiting to access and is represented in the format database_id:file_id:page_id. For Oracle, it displays the instance name (spid) and the specific wait resource: cluster, network, user I/O, other. | |
Last Wait | The last wait stats identified. The options are based on the wait stats available in the Performance module. | |
Disk I/O | Summarizes all the Disk I/O consumed by the query since it started running (might be in Bit, KB, MB, GB, TB). | |
Cache Reads | Summarizes the total RAM memory consumed by the query (MB,GB,TB). | |
CPU | The query’s CPU usage time (sec, min). | |
tempdb | The tempdb data growth caused by the query. We recommend separating its file’s drive from other data files in order to avoid storage space pressure in the event of extensive data growth. | |
tempdb log | The tempdb log data growth caused by the query. | |
DB log | The log growth of the database originated by the query execution. A high value might cause a log over-sizing and log drive space pressure. | |
Plan Compile Time | The time taken by the system to generate and optimize the execution plan for an SQL query. If it exceeds a few milliseconds, you should consider improving the query’s execution plan. | |
Transaction Isolation | Displays the varying degrees of isolation and concurrency control. For example: read committed, read uncommitted, repeatable read, serializable. | |
Executions | The cumulative execution count of the same Query_ID (SQL_ID in Oracle) since the query’s execution plan and statistics were last calculated. | |
Avg Rows | The output of Total Rows / Executions. | |
Avg Sec | The output of Total Seconds / Executions. | |
Total Seconds | The cumulative seconds of the same Query_ID (SQL_ID) since the query’s execution plan and statistics were last calculated. | |
Host Process ID | The dedicated process or thread running on the server responsible for the query execution. | |
Client Process | The process path as recognized on the client and the client’s current CPU usage as a percentage. |
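Several of the columns above are derived from others. The sketch below illustrates the Avg Rows and Avg Sec formulas, the Hours:Minutes:Seconds duration format, and the MSSQL Wait Resource format database_id:file_id:page_id. The function names are hypothetical, and the zero-padding of the duration is an assumption about the display.

```python
def averages(total_rows: float, total_seconds: float, executions: int) -> tuple[float, float]:
    """Avg Rows = Total Rows / Executions; Avg Sec = Total Seconds / Executions."""
    if executions == 0:
        return (0.0, 0.0)
    return (total_rows / executions, total_seconds / executions)


def format_duration(seconds: int) -> str:
    """Format a duration in seconds as Hours:Minutes:Seconds (zero-padded by assumption)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def parse_wait_resource(resource: str) -> dict[str, int]:
    """Split an MSSQL page wait resource 'database_id:file_id:page_id' into its parts."""
    database_id, file_id, page_id = (int(part) for part in resource.split(":"))
    return {"database_id": database_id, "file_id": file_id, "page_id": page_id}
```

For instance, a query with 1200 total rows and 30 total seconds over 60 executions averages 20 rows and 0.5 seconds per execution.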
— Filters in Live / History
Filter | Description | Investigate this alert |
Long or Blocked Session | Displays only long sessions or blocked sessions. | |
Missing Index | Choose ‘exists’ to display sessions that have a ‘missing indexes’ recommendation. | |
Plan Improvement | Choose ‘exists’ to display sessions that have a ‘plan improvement’ recommendation for the query’s execution plan. | |
Max Duration | Set the time to display sessions longer or shorter than this time. | |
Query Hash/ID | Filters by a specific query ID. | |
Disk I/O | Set the Disk I/O volume to display sessions that transmitted more or less than this volume. | |
CPU | Display sessions that consumed more or less than a set CPU time. | |
Queries | Display queries that contain a specific string (text). The text should be less than 100 characters. | |
Wait Stats | Select the Wait Stats categories to display sessions that are included in these categories. | |
Query Alerts | Choose “exists” to display sessions that had some related alert. | |
OS User | For Oracle, filter by the OS user name. |
— QAnalyze
Metric | Description | Investigate this alert |
Max Query CPU | Calculates the query’s maximum CPU utilization percentage in a selected time range. | |
Execution Count | Counts the number of executions of a specific query code. | |
AVG. Duration | Calculates the average duration of a query in a selected time range. | |
AVG. Disk I/O | Calculates average Disk I/O of a query in a selected time range. | |
AVG. Cache Usage | Calculates the average cache reads of a query in a selected time range. | |
Total CPU Time | Summarizes the total CPU time of all the query’s executions in a selected time range. | |
Total Duration | Summarizes the total duration time of all the query’s executions in a selected time range. | |
Total Disk Usage | Summarizes the total Disk I/O of all the query’s executions in a selected time range. | |
Total Cache Usage | Summarizes the total cache reads from all the query’s executions in a selected time range. |
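The QAnalyze totals and averages above can be pictured as aggregations over a query's individual executions in the selected time range. The following is a minimal sketch under that assumption; the `Execution` record, its fields, and their units are hypothetical and do not describe the platform's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """One execution of a query (hypothetical record; units are assumptions)."""
    duration_sec: float
    cpu_sec: float
    disk_io_mb: float


def summarize(executions: list[Execution]) -> dict[str, float]:
    """Aggregate the executions of one query into totals and averages."""
    count = len(executions)
    total_duration = sum(e.duration_sec for e in executions)
    total_cpu = sum(e.cpu_sec for e in executions)
    total_disk = sum(e.disk_io_mb for e in executions)
    return {
        "execution_count": count,
        "total_duration": total_duration,
        "total_cpu_time": total_cpu,
        "total_disk_usage": total_disk,
        # Averages are totals divided by the execution count.
        "avg_duration": total_duration / count if count else 0.0,
        "avg_disk_io": total_disk / count if count else 0.0,
    }
```

Two executions lasting 2 and 4 seconds, for example, give a total duration of 6 seconds and an average duration of 3 seconds.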
— Filters in QAnalyze
Filter | Description | Investigate this alert |
Recommended Index | Choose ‘exists’ to display sessions that have queries with an index recommendation. | |
Plan Improve | Choose ‘exists’ to display sessions that have queries with a plan improvement recommendation. | |
Max Query CPU | Choose “more than” or “less than” a selected maximum CPU utilization in percentage. | |
Avg. SQL CPU | Choose “more than” or “less than” a selected average CPU utilization in percentage. | |
Execution Count | Choose “more than” or “less than” a selected number of executions. | |
Avg. Duration | Choose “more than” or “less than” a selected average query’s duration in seconds or minutes. | |
Avg. Cache Usage | Choose “more than” or “less than” a selected average cache reads in bytes, KB, MB or GB. |
Observer Module
-Windows
— Change Tracking
Change | Description | Investigate this alert |
Computer Name | Informs of a change in the computer’s name. | |
CPU Cores | Informs of a change in the number of cores. | |
CPU Specifications | Informs of a change in the CPU specifications: manufacturer/speed/model. | |
Firewall Profile | Informs of a change in the Windows Firewall general profile: Domain, Private, Public. | |
Last Restart | Informs of a restart (reboot) and its date. | |
Manufacturer | Informs of a change in the machine manufacturer. | |
Total Memory | Informs of a change in the total memory. | |
Operating System | Informs of a change in the operating system version. | |
SP | Informs of a change in the operating system’s service pack (SP). | |
Windows Update Date | Informs of a Windows Update and its date. | |
Software Installation Date | Informs of a software installation or update and its date. | |
Paging Max | Informs of a change in the maximum size set for a Pagefile. | |
Paging Min | Informs of a change in the minimum size set for a Pagefile. | |
Network Bandwidth | Informs of a change in the network bandwidth of a card. | |
Service Account Name | Informs of a change in the account name of a service. | |
Service Path | Informs of a change in the path of a service. | |
Service Start Mode | Informs of a change in the start mode of a service: Automatic, Automatic (Delayed Start), Manual, Disabled. | |
Service State | Informs of a change in the state of a service: Running, Paused, Stopped. | |
Total Disk | Informs of a change in a disk’s total capacity. |
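Change tracking of this kind can be thought of as comparing two snapshots of the same properties. The sketch below shows that idea only; it is not how the Observer Module is implemented, and the function name and snapshot keys are hypothetical.

```python
def detect_changes(previous: dict, current: dict) -> dict:
    """Return {property: (old_value, new_value)} for every property that differs."""
    changes = {}
    # Look at every property present in either snapshot.
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

Comparing a snapshot with 4 CPU cores against a later one with 8 CPU cores would report only the CPU Cores property, with its old and new values.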
-MSSQL on Windows
— Change Tracking
Change | Description | Investigate this alert |
Collation | Informs of a change in the SQL Server Collation. | |
Edition | Informs of a change in the SQL Server Edition. | |
Version | Informs of a change in the SQL Server Version. | |
Last Restart | Informs of a restart of the SQL Server instance and its date. | |
Cores Available | Informs of a change in the number of available logical (virtual) cores for SQL Server. | |
Cores In Use | Informs of a change in the number of cores in use by SQL Server. | |
Cluster Active Name | Informs of a change in the name of the active node in a clustered instance. | |
SP or CU | Informs of a change in the SQL Server service pack (SP) or cumulative update (CU). For each version, check the latest SP or CU recommended. | |
AlwaysOn Backup Preference | Informs of a change in a database AlwaysOn Backup preference. | |
AlwaysOn Group Name | Informs of a change in a database AlwaysOn group name. | |
AlwaysOn Health | Informs of a change in a database AlwaysOn health status. | |
AlwaysOn State | Informs of a change in a database AlwaysOn state: Not Synchronized, Synchronized. | |
Auto Close | Informs of a change in a database Auto Close status: enabled/disabled. | |
Auto Create Statistics | Informs of a change in a database Auto Create Statistics status: enabled/disabled. | |
Auto Update Statistics | Informs of a change in a database Auto Update Statistics status: enabled/disabled. | |
Auto Shrink | Informs of a change in a database Auto Shrink status: enabled/disabled. | |
Database Compatibility Level | Informs of a change in a database Compatibility level. | |
Database Creation Date | Informs of a new database or a change in a database creation date. | |
Database Data Drive | Informs of a change in a database data drive path. | |
Database File Stream Drive | Informs of a change in a database file stream drive path. | |
Database Log Drive | Informs of a change in a database log drive path. | |
Node Name | Informs of a change in the node name of an active cluster. | |
Cluster Nodes Down | Informs if a cluster node is down or unavailable. | |
Mirror Mode | Informs of a change in a database mirror mode: Principal/Mirror. | |
Mirror Safety | Informs of a change in a database mirror safety mode: Full/Off. It relates to the ensured level of transactional consistency and availability between the principal and mirror databases. | |
Mirror Status | Informs of a change in a database mirror status: Not Synchronized/Synchronized. | |
Stand-by | Informs of a change in a database stand-by status. | |
Read-only | Informs of a change in a database read-only status. | |
User Access | Informs of a change in a database user-access status. | |
Page Verify Option | Informs of a change in a database page verify option: NONE, TORN_PAGE_DETECTION, CHECKSUM. | |
Recovery | Informs of a change in a database recovery mode: Full/Simple/Bulk-Logged. |
-MSSQL on Linux
— Change Tracking
Change | Description | Investigate this alert |
Collation | Informs of a change in the SQL Server Collation. | |
Edition | Informs of a change in the SQL Server Edition. | |
Version | Informs of a change in the SQL Server Version. | |
Last Restart | Informs of a restart of the SQL Server instance and its date. | |
Cores Available | Informs of a change in the number of available logical (virtual) cores for SQL Server. | |
Cores In Use | Informs of a change in the number of cores in use by SQL Server. | |
Cluster Active Name | Informs of a change in the name of the active node in a clustered instance. | |
SP or CU | Informs of a change in the SQL Server service pack (SP) or cumulative update (CU). For each version, check the latest SP or CU recommended. | |
AlwaysOn Backup Preference | Informs of a change in a database AlwaysOn Backup preference. | |
AlwaysOn Group Name | Informs of a change in a database AlwaysOn group name. | |
AlwaysOn Health | Informs of a change in a database AlwaysOn health status. | |
AlwaysOn State | Informs of a change in a database AlwaysOn state: Not Synchronized, Synchronized. | |
Auto Close | Informs of a change in a database Auto Close status: enabled/disabled. | |
Auto Create Statistics | Informs of a change in a database Auto Create Statistics status: enabled/disabled. | |
Auto Update Statistics | Informs of a change in a database Auto Update Statistics status: enabled/disabled. | |
Auto Shrink | Informs of a change in a database Auto Shrink status: enabled/disabled. | |
Database Compatibility Level | Informs of a change in a database Compatibility level. | |
Database Creation Date | Informs of a new database or a change in a database creation date. | |
Database Data Drive | Informs of a change in a database data drive path. | |
Database File Stream Drive | Informs of a change in a database file stream drive path. | |
Database Log Drive | Informs of a change in a database log drive path. | |
Node Name | Informs of a change in the node name of an active cluster. | |
Cluster Nodes Down | Informs if a cluster node is down or unavailable. | |
Mirror Mode | Informs of a change in a database mirror mode: Principal/Mirror. | |
Mirror Safety | Informs of a change in a database mirror safety mode: Full/Off. It relates to the ensured level of transactional consistency and availability between the principal and mirror databases. | |
Mirror Status | Informs of a change in a database mirror status: Not Synchronized/Synchronized. | |
Stand-by | Informs of a change in a database stand-by status. | |
Read-only | Informs of a change in a database read-only status. | |
User Access | Informs of a change in a database user-access status. | |
Page Verify Option | Informs of a change in a database page verify option: NONE, TORN_PAGE_DETECTION, CHECKSUM. | |
Recovery | Informs of a change in a database recovery mode: Full/Simple/Bulk-Logged. |
-Azure
— Change Tracking
Change | Description | Investigate this alert |
Capacity | Informs of a change in the license model: DTU/VCore. | |
Virtual Core Count | Informs of a change in the number of cores on VCore licenses. |
-MSSQL on Azure
— Change Tracking
Change | Description | Investigate this alert |
Capacity | Informs of a change in the Azure SQL Database or Managed Instance capacity (related to the selected pricing model). | |
Virtual Core | Informs of a change in the number of virtual cores available in the Azure SQL Database or Managed Instance. | |
Collation | Informs of a change in the SQL Server Collation. | |
Edition | Informs of a change in the SQL Server Edition. | |
Version | Informs of a change in the SQL Server Version. | |
Last Restart | Informs of a restart of the SQL Server instance and its date. | |
Cores Available | Informs of a change in the number of available logical (virtual) cores for SQL Server. | |
Cores In Use | Informs of a change in the number of cores in use by SQL Server. | |
Cluster Active Name | Informs of a change in the name of the active node in a clustered instance. | |
SP or CU | Informs of a change in the SQL Server service pack (SP) or cumulative update (CU). For each version, check the latest SP or CU recommended. | |
AlwaysOn Backup Preference | Informs of a change in a database AlwaysOn Backup preference. | |
AlwaysOn Group Name | Informs of a change in a database AlwaysOn group name. | |
AlwaysOn Health | Informs of a change in a database AlwaysOn health status. | |
AlwaysOn State | Informs of a change in a database AlwaysOn state: Not Synchronized, Synchronized. | |
Auto Close | Informs of a change in a database Auto Close status: enabled/disabled. | |
Auto Create Statistics | Informs of a change in a database Auto Create Statistics status: enabled/disabled. | |
Auto Update Statistics | Informs of a change in a database Auto Update Statistics status: enabled/disabled. | |
Auto Shrink | Informs of a change in a database Auto Shrink status: enabled/disabled. | |
Database Compatibility Level | Informs of a change in a database Compatibility level. | |
Database Creation Date | Informs of a new database or a change in a database creation date. | |
Database Data Drive | Informs of a change in a database data drive path. | |
Database File Stream Drive | Informs of a change in a database file stream drive path. | |
Database Log Drive | Informs of a change in a database log drive path. | |
Node Name | Informs of a change in the node name of an active cluster. | |
Cluster Nodes Down | Informs if a cluster node is down or unavailable. | |
Mirror Mode | Informs of a change in a database mirror mode: Principal/Mirror. | |
Mirror Safety | Informs of a change in a database mirror safety mode: Full/Off. It relates to the ensured level of transactional consistency and availability between the principal and mirror databases. | |
Mirror Status | Informs of a change in a database mirror status: Not Synchronized/Synchronized. | |
Stand-by | Informs of a change in a database stand-by status. | |
Read-only | Informs of a change in a database read-only status. | |
User Access | Informs of a change in a database user-access status. | |
Page Verify Option | Informs of a change in a database page verify option: NONE, TORN_PAGE_DETECTION, CHECKSUM. | |
Recovery | Informs of a change in a database recovery mode: Full/Simple/Bulk-Logged. |
-MSSQL on Amazon RDS
— Change Tracking
Change | Description | Investigate this alert |
Collation | Informs of a change in the SQL Server Collation. | |
Edition | Informs of a change in the SQL Server Edition. | |
Version | Informs of a change in the SQL Server Version. | |
Last Restart | Informs of a restart of the SQL Server instance and its date. | |
Cores Available | Informs of a change in the number of available logical (virtual) cores for SQL Server. | |
Cores In Use | Informs of a change in the number of cores in use by SQL Server. | |
Cluster Active Name | Informs of a change in the name of the active node in a clustered instance. | |
SP or CU | Informs of a change in the SQL Server service pack (SP) or cumulative update (CU). For each version, check the latest SP or CU recommended. | |
AlwaysOn Backup Preference | Informs of a change in a database AlwaysOn Backup preference. | |
AlwaysOn Group Name | Informs of a change in a database AlwaysOn group name. | |
AlwaysOn Health | Informs of a change in a database AlwaysOn health status. | |
AlwaysOn State | Informs of a change in a database AlwaysOn state: Not Synchronized, Synchronized. | |
Auto Close | Informs of a change in a database Auto Close status: enabled/disabled. | |
Auto Create Statistics | Informs of a change in a database Auto Create Statistics status: enabled/disabled. | |
Auto Update Statistics | Informs of a change in a database Auto Update Statistics status: enabled/disabled. | |
Auto Shrink | Informs of a change in a database Auto Shrink status: enabled/disabled. | |
Database Compatibility Level | Informs of a change in a database Compatibility level. | |
Database Creation Date | Informs of a new database or a change in a database creation date. | |
Database Data Drive | Informs of a change in a database data drive path. | |
Database File Stream Drive | Informs of a change in a database file stream drive path. | |
Database Log Drive | Informs of a change in a database log drive path. | |
Node Name | Informs of a change in the node name of an active cluster. | |
Cluster Nodes Down | Informs if a cluster node is down or unavailable. | |
Mirror Mode | Informs of a change in a database mirror mode: Principal/Mirror. | |
Mirror Safety | Informs of a change in a database mirror safety mode: Full/Off. It relates to the ensured level of transactional consistency and availability between the principal and mirror databases. | |
Mirror Status | Informs of a change in a database mirror status: Not Synchronized/Synchronized. | |
Stand-by | Informs of a change in a database stand-by status. | |
Read-only | Informs of a change in a database read-only status. | |
User Access | Informs of a change in a database user-access status. | |
Page Verify Option | Informs of a change in a database page verify option: NONE, TORN_PAGE_DETECTION, CHECKSUM. | |
Recovery | Informs of a change in a database recovery mode: Full/Simple/Bulk-Logged. |
Web Module
Metric | Description | Investigate this alert |
Client IP | The machine or server from which the request was sent. | |
URL | The site the client is trying to access. In addition to the website name, parameters and error messages are displayed. | |
Method | The HTTP method of the sent request: GET or POST. | |
Start time | The time when the request was sent. | |
Duration | The time it took until a reply was received after sending a request. It doesn’t include the render time, which depends on other factors such as connectivity quality (not related to the web server itself). A duration above a few seconds indicates a slowness problem. | |
Status | The status of the sent request. The following statuses can be filtered: 200 OK, 304 Not Modified, 404 Not Found, and 500 Internal Server Error. Statuses 404 and 500 both indicate a serious problem. | |
Host | The server on which the software is installed. | |
Web Site Name | The site the request is trying to reach. In addition to the website name, parameters and error messages are displayed. | |
Server IP | Host IP. | |
Port | The port from which the request is sent. |
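The Duration and Status descriptions above imply a simple triage rule for each request. The sketch below illustrates it under stated assumptions: the function name `classify_request` is hypothetical, and the 3-second threshold is only a stand-in for "a few seconds", which the table does not define precisely.

```python
# Assumption: "a few seconds" is taken as 3 seconds for illustration only.
SLOW_THRESHOLD_SECONDS = 3.0

# Per the table, 404 and 500 both indicate a serious problem.
SERIOUS_STATUSES = {404, 500}


def classify_request(status_code: int, duration_seconds: float) -> list[str]:
    """Return the problem labels for one request (empty list if none)."""
    problems = []
    if status_code in SERIOUS_STATUSES:
        problems.append("serious-status")
    if duration_seconds > SLOW_THRESHOLD_SECONDS:
        problems.append("slow")
    return problems
```

For example, `classify_request(500, 5.0)` flags both a serious status and slowness, while a fast `200` request returns an empty list.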
Connections Module
— Files
Metric | Description | Investigate this alert |
Name | The file display name given in the AimBetter Configuration. | |
Path | The file path. It can be a local or remote path. | |
Type | File or Folder. | |
Duration (ms) | The time it takes to reach the file in milliseconds. | |
Size | The size of the file or folder in MB. | |
Folders | The number of folders (if any). | |
Files | The number of files (if any). | |
Notes | Describes the specific error with file/folder access. |
— Web
Metric | Description | Investigate this alert |
Name | The website display name given in the AimBetter Configuration. | |
URL | The URL of the website. | |
Status | The status of the website. Can be OK / Not Modified / Not Found / Internal Server Error. | |
Status Code | The status code of the website: 200, 304, 404, 500. | |
Round Trip (ms) | The time in milliseconds it takes to load the website (not including the render time). | |
Notes | Describes the specific error with website access. | |
SSL Expired Days | The number of days remaining until the SSL certificate expires. |
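The SSL Expired Days value above is a day count until the certificate's expiry date. A minimal sketch of that arithmetic follows, assuming the expiry is available as a `notAfter` string in the format Python's `ssl` module reports. The function name `days_until_expiry` is hypothetical, and this is not necessarily how AimBetter computes the metric.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and the certificate's notAfter time.

    `not_after` uses the certificate time format, e.g.
    "Jan  5 09:34:43 2018 GMT". A negative result means the
    certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


# Example with a fixed "now" so the result does not depend on the clock.
remaining = days_until_expiry(
    "Jan  5 09:34:43 2018 GMT",
    datetime(2018, 1, 1, tzinfo=timezone.utc),
)
```

Passing `datetime.now(timezone.utc)` as `now` would give the value for the current moment.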
— DB Connection
Metric | Description | Investigate this alert |
Name | The DB instance display name given in the AimBetter Configuration. | |
Duration (ms) | The time it takes to establish a connection to the DB in milliseconds. | |
Success | Whether the connection succeeded or not. | |
Notes | Describes the specific error with DB connection. |
— Ping
Metric | Description | Investigate this alert |
Name | The Ping display name given in the AimBetter Configuration. | |
Server’s IP | The IP of the server the Ping is being sent to. | |
Ping Lost Packets (0-12) | The number of unsuccessful communication integrity checks out of 12 attempts. | Read more |
Network Jitter | The variation in the delay of packet delivery during all 12 communication integrity checks. | Read more |
Network Latency | The time taken for a packet to travel from its source to its destination, including the time spent in transit and any processing delays along the way. | Read more |
Notes | Describes the specific error with the Ping. |
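The three Ping metrics above can all be derived from one series of round-trip samples. The sketch below shows one possible derivation, with assumptions stated plainly: latency is taken as the mean of the successful samples, and jitter as the mean absolute difference between consecutive successful samples. The table does not specify the exact formulas AimBetter uses, and `summarize_ping` is a hypothetical name.

```python
from statistics import mean


def summarize_ping(samples):
    """Summarize ping checks.

    `samples` holds round-trip times in milliseconds, with None for a
    lost packet. Jitter here is the mean absolute difference between
    consecutive successful samples (an assumed definition).
    """
    received = [s for s in samples if s is not None]
    lost = len(samples) - len(received)
    latency = mean(received) if received else None
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else None
    return {"lost": lost, "latency_ms": latency, "jitter_ms": jitter}


# Shortened example series; a real check would supply 12 samples.
summary = summarize_ping([10.0, 14.0, None, 12.0])
```

When every packet is lost, latency and jitter are returned as `None` rather than zero, so that a total failure is not mistaken for a perfect connection.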
— Query
Metric | Description | Investigate this alert |
Name | The Query display name given in the AimBetter Configuration. | |
Table/View | The name of the table or view that the query reads from. | |
Column | The column name in the query. | |
Value Num | The output of the query as a number of rows, which makes it possible to set an alert if it crosses a specific value. | |
Value | Which value the query belongs to. | |
Notes | Describes the specific exception with the execution of the query. |
Security Risk Alerts
The following alerts pose security risks.
— Windows
Risk | Description | Investigate this alert |
Windows Restart | A malicious party could initiate a Windows Server restart to exploit the brief downtime, potentially bypassing security controls or launching an attack while the system is rebooting and its defenses are not fully active. | |
Windows Update | A malicious party could exploit an update process to introduce vulnerabilities, potentially delaying critical patches or injecting malware, thereby gaining unauthorized access or compromising system integrity. | |
Change in Firewall Profile | May expose the network to unauthorized access or attacks by altering the security rules, potentially allowing malicious traffic through previously blocked ports or protocols. | |
Change in a Service Account Name | A change in the service account name can disrupt service operations, potentially causing authentication failures and creating opportunities for unauthorized access or privilege escalation if not properly managed. | |
Software Installation or Update | A software installation or update could introduce new vulnerabilities, compromise system stability, or inadvertently install malicious software, leaving the system exposed to attacks. | |
Remote Access | Remote access programs like AnyDesk, TeamViewer and others could allow unauthorized users to gain control of the server, potentially leading to data breaches, system manipulation, or further malicious activities. | Read more |
Suspicious Process | Suspicious programs, such as malicious software, can pose a risk to the company, potentially leading to data breaches, system manipulation, or further malicious activities. | Read more |
File Not Found | If a sensitive file is deleted or has its path changed, it could indicate unauthorized access or malicious activity, potentially leading to data breaches, loss of critical information, or system compromise. | Read more |
— MSSQL
Risk | Description | Investigate this alert |
MSSQL Restart | May temporarily expose the server to attacks or unauthorized access during the reboot process, especially if security mechanisms and connections are not fully restored or properly secured. | |
Change in MSSQL Node Name | Can create security risks by potentially disrupting access controls and authentication processes, which could lead to unauthorized access or compromise data integrity if not correctly managed. | |
Cluster Node Down | Reduces the redundancy and failover capability of the system, potentially making the server more vulnerable to attacks, data loss, or service disruption during the downtime. | |
Change in Database Status | The database may be temporarily inaccessible or vulnerable to unauthorized access, data corruption, or loss if the change is not properly managed and secured. | Read more |
Change in Database Mirror Status | Disrupts data synchronization and failover mechanisms, potentially leading to data loss, inconsistent backups, or exposure to unauthorized access if the mirrored database becomes vulnerable. | Read more |
Change in Database AlwaysOn Status | Compromises high availability and failover protection, potentially leading to data loss, downtime, or increased vulnerability to unauthorized access during the transition. | Read more |
Failed Login Attempts | Can indicate brute force attacks or unauthorized access attempts, potentially leading to account compromise if not properly monitored and mitigated. | Read more |
Permission Violation | Attempts by unauthorized users to access sensitive data could indicate malicious intent to perform actions that compromise the integrity, confidentiality, or availability of the database. | |
Object Not Found | A high number of object not found exceptions with different object names or identifiers can indicate a potential enumeration attack, where an attacker is systematically probing the database to identify existing objects for further exploitation. |
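The Object Not Found row above describes a pattern: many failed lookups with different object names from the same source. A minimal sketch of detecting that pattern follows. The event shape (`login`, `object_name`), the function name `find_enumeration_suspects`, and the threshold of 5 distinct names are all assumptions for illustration, not AimBetter settings.

```python
from collections import defaultdict

# Assumption: 5 distinct missing object names is treated as suspicious.
DISTINCT_OBJECT_THRESHOLD = 5


def find_enumeration_suspects(events, threshold=DISTINCT_OBJECT_THRESHOLD):
    """Return logins that hit `threshold` or more distinct missing objects.

    `events` is an iterable of (login, object_name) pairs, one per
    Object Not Found exception. Repeated misses on the same name count
    once, since a probing attack is marked by variety, not volume.
    """
    seen = defaultdict(set)
    for login, object_name in events:
        seen[login].add(object_name.lower())
    return sorted(
        login for login, names in seen.items() if len(names) >= threshold
    )


# "app" repeats one stale name; "guest" probes six different names.
events = [("app", "dbo.Orders")] * 10
events += [("guest", f"dbo.t{i}") for i in range(6)]
suspects = find_enumeration_suspects(events)
```

Counting distinct names rather than total exceptions keeps an application with one stale object reference from being flagged alongside a login that is genuinely probing.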