Threshold exceptions

mbudke
mbudke Member Posts: 132 ✭✭✭
edited December 2023 in Remote Monitoring

Hey all,

I am using the threshold configuration to monitor my computers and servers. I think thresholds are really valuable because they give me a chance to react before a real end-user-facing problem appears.

My problem is that there are situations in which I expect the threshold to be breached. Let me use the default Atera threshold as an example: if memory consumption stays above 95% for more than 9 minutes, a critical alarm is raised. This makes perfect sense during business hours.
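
A minimal sketch of the "above 95% for more than 9 minutes" rule described above. This is an illustration only, not Atera's implementation; the function name `sustained_breach` and the sample format are assumptions.

```python
from datetime import datetime, timedelta


def sustained_breach(samples, limit=95.0, duration=timedelta(minutes=9)):
    """Return True if readings stay above `limit` for longer than `duration`.

    `samples` is a list of (timestamp, percent) tuples sorted by time
    (a hypothetical format chosen for this sketch).
    """
    breach_start = None
    for ts, value in samples:
        if value > limit:
            # Remember when the current run of high readings began.
            if breach_start is None:
                breach_start = ts
            if ts - breach_start > duration:
                return True
        else:
            # Any reading at or below the limit resets the run.
            breach_start = None
    return False


base = datetime(2023, 12, 4, 10, 0)
steady = [(base + timedelta(minutes=i), 97.0) for i in range(11)]
dipped = [(ts, 50.0 if i == 5 else v) for i, (ts, v) in enumerate(steady)]
```

With one reading per minute, `steady` triggers the rule, while `dipped` does not because the single low reading resets the timer.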

On the weekend or during the night, however, specific maintenance or backup tasks are executed. These tasks must run with high priority and are allowed to use as many resources as needed to finish before the end users start working again.
Therefore it can be expected that this specific threshold is breached overnight or on the weekend.

All these alarms resolve themselves, so no action is required, but they spam my mail inbox and my alarm list.

Have you found a solution or workaround for this kind of situation?
Note that memory is just an example. The same could apply to disk size, CPU usage, specific Windows events, the uptime monitor, etc.

Matthias

Comments

  • mjones
    mjones Member Posts: 184 ✭✭✭✭

    I would really like to be able to apply multiple thresholds to sites\agents. I think it is pretty limited how you can only apply a single threshold with a single type and level for each item.

    You could probably do it with scripting and the API, but that is a can of worms for sure.

  • mbudke
    mbudke Member Posts: 132 ✭✭✭

    I totally agree about the limitation. Having more thresholds available might not solve my issue, but it would make life a lot easier.

  • kim
    kim Member Posts: 113 ✭✭✭

    I think adding multiple thresholds would be great in terms of getting better data for my KBRs for my clients when I'm trying to give a more holistic picture of their environment.

  • mbudke
    mbudke Member Posts: 132 ✭✭✭

    Being able to assign multiple Thresholds would be a game changer.

    You could easily create a threshold per use case and then add the thresholds to a computer/server when needed.
    Examples could be:

    • Monitor C:\ drive
    • Monitor D:\ drive
    • Monitor event logs for a specific application
    • Run scripts for a specific application

    Currently it is required to set up a threshold per server/computer per customer.

    If, for example, you monitor a backup solution and this backup solution adds a new event log entry, you would only have to add it to a single threshold, and it would apply to all systems that have this threshold assigned. Currently you need to add it to every threshold that includes monitoring of the backup solution.

    An option to set timeframes in which a threshold is active would then also cover my original request.

  • nina
    nina Internal Posts: 428 ✭✭✭✭✭

    Hi @Matthias - did you add this request to UserVoice?

  • mbudke
    mbudke Member Posts: 132 ✭✭✭

    Hi @nina

    I have not, as I was hoping there was an easy workaround or a configuration mistake on my side.
    Now that you mention it, I checked UserVoice and there are already many entries that go in the same direction.

    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/43724628-shared-threshold-library
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/44648814-multiple-threshold-profiles-on-device
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/44067594-override-threshold-profile-for-device
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/45320407-posible-to-have-individuel-monitoring-threshold-on
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/44012649-change-threshold-profiles-to-use-a-parent-child-re
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/44259168-possibility-of-setting-different-thresholds-for-vo
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/45104776-remove-item-or-add-new-item-to-multiple-or-all-thr
    https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/43981521-multiple-thresholds

    I am happy to create another one in that list if that is what is requested, but as mentioned in another post of mine I would prefer that UserVoice be maintained so that related requests are merged into one "full idea" instead of just adding smaller pieces. In the end it would benefit most people and the voting would give better results.
    I could vote right now, but which entry would be the correct one? I could vote for all of them, but that would just add to the confusion.

    While searching I found https://atera.uservoice.com/forums/936306-ideas-and-feedback/suggestions/45261343-forum which I think you can close now 😉

    Matthias

  • yasminproduct16
    yasminproduct16 Internal Posts: 16

    Hi, thanks for the feedback.
    We currently offer the option to pause alerts for up to 5 hours. It sounds like you are looking to do this on a recurring basis.
    Could I ask what would trigger this maintenance, is it an IT Automation profile via Atera?

  • mbudke
    mbudke Member Posts: 132 ✭✭✭

    Hi @Yasmin from Atera

    Thanks a lot for the reply, which is much appreciated!
    To answer your question: the trigger would be a given time on specific days of the week (e.g. Mo-Fr 3am-6am).
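
A rough sketch of how such a recurring window (Mo-Fr 3am-6am) could be checked before raising an alert. This only illustrates the requested behaviour and is not an existing Atera feature; `MaintenanceWindow` and `should_alert` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class MaintenanceWindow:
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time
    end: time  # must be later than `start`; windows crossing midnight are not handled

    def contains(self, moment: datetime) -> bool:
        """True if `moment` falls on a listed weekday between start and end."""
        return (moment.weekday() in self.weekdays
                and self.start <= moment.time() < self.end)


# Mo-Fr 3am-6am, as in the example above.
WEEKDAY_NIGHT = MaintenanceWindow(frozenset(range(5)), time(3, 0), time(6, 0))


def should_alert(breached: bool, moment: datetime, windows) -> bool:
    """Raise an alert only for breaches outside every maintenance window."""
    return breached and not any(w.contains(moment) for w in windows)
```

For example, a breach on Monday 2023-12-04 at 04:30 would be suppressed, while the same breach at 10:00 would still alert.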

    From all the comments in this thread I have the feeling that I am using the functionality wrong or that my expectations are too high.
    Maybe there is a better best practice, but let me explain what I wanted to achieve and what my personal best practice would have looked like:

    I work as an MSP delivering IT services to many different companies.
    To keep my own support effort as small as possible, I use one product per function at all my customers.
    Example:

    • Antivirus application
    • Backup solution

    This has the positive effect that I know these products quite well, both what they are good at and what they are not good at.
    Based on this knowledge I would like to create thresholds to capture potential problems and warn me.

    I would create one threshold per application, including the application-specific monitoring.
    If I find a new item to monitor, or an update requires changes to the monitoring, I would only need to correct a single threshold configuration and not each device.

    Then I could pick a device or group and assign the required thresholds to it. A group could be used for client PCs, while a direct assignment is probably better for servers, as each server has a different task.

    Here is an example

    Threshold "VM Backup software"

    • monitor if backup task failed
    • monitor if verification task failed

    Threshold "Monitor drive C:"

    • monitor if the drive C: has less than 10% free storage

    Threshold "Monitor drive D:"

    • monitor if the drive D: has less than 10% free storage

    Threshold "Monitor drive E:"

    • monitor if the drive E: has less than 10% free storage

    Threshold "Monitor RAM consumption"

    • monitor if RAM consumption is above 90% for more than 10 minutes

    To my group "Client PCs" I assign the thresholds:

    • "Monitor drive C:"
    • "Monitor drive D:"
    • "Monitor RAM consumption"
      "Monitor drive E:" is not assigned, as none of the client PCs has a drive E:. If I needed it, I could add this threshold to the specific client PC.

    To my "Server 1" I assign the thresholds:

    • "Monitor drive C:"
    • "Monitor drive D:"
    • "Monitor drive E:"
    • "VM Backup software"
    • "Monitor RAM consumption"
      I need to be able to define that "Monitor RAM consumption" is disabled every day between 3am and 6am, because I know it will be breached during that timeframe: the backup software runs its tasks then and is expected to use that much memory.

    To my "Server 2" I assign the thresholds:

    • "Monitor drive C:"
    • "Monitor drive D:"
      "Monitor drive E:" is not assigned, as the server has a local USB dongle. The dongle is always at 100% and would therefore always show as an error.

    That said, I need to add one specific addition regarding the assignment to groups.
    In my example above I assigned thresholds to the group "Client PCs". I would also need the option to assign thresholds to a device in the group, and I should be able to configure either "use group + device specific threshold" or "use device specific thresholds only".
    If there were an option to configure "use device specific thresholds and use group thresholds (please select which to exclude)", that would be awesome, as it contains the parent-child concept, but I understand if this is too complicated.
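
The three assignment options described above could be modelled roughly as follows. This is a sketch of the wished-for behaviour only; nothing like it exists in Atera today, and `Mode` and `effective_thresholds` are made-up names.

```python
from enum import Enum


class Mode(Enum):
    GROUP_AND_DEVICE = "use group + device specific threshold"
    DEVICE_ONLY = "use device specific thresholds only"


def effective_thresholds(group, device, mode=Mode.GROUP_AND_DEVICE, excluded=()):
    """Return the ordered list of threshold names that apply to one device.

    `group` and `device` are lists of threshold names; `excluded` lists
    group thresholds that should not be inherited by this device.
    """
    if mode is Mode.DEVICE_ONLY:
        return list(device)
    inherited = [name for name in group if name not in excluded]
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(inherited + list(device)))


CLIENT_PCS = ["Monitor drive C:", "Monitor drive D:", "Monitor RAM consumption"]
```

A client PC with an extra drive would then get `effective_thresholds(CLIENT_PCS, ["Monitor drive E:"])`, and passing `excluded=["Monitor RAM consumption"]` covers the "select which to exclude" variant.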

    Sorry for the long post, but any input on how others are dealing with thresholds, or on how I could meet my requirements, would be much appreciated.
    In case I totally misunderstood the threshold management, please let me know as well :)

    Thanks,
    Matthias

  • mjones
    mjones Member Posts: 184 ✭✭✭✭

    Fantastic write-up and examples.

    For me, I would be looking for something as simple as a setting when running a script or IT Automation Profile to set a pause on alerts.
    For example, a script to reboot a server: I would like to have a 30-60 minute pause on alerts, since we know the agent will go offline.

  • mbudke
    mbudke Member Posts: 132 ✭✭✭

    Hi Team Atera,

    Thanks a lot for opening the support case, which is much appreciated.
    Support explained to me in good detail what is currently possible with thresholds.

    I hope it is OK to share the content (without names) so the information can be distributed, as I think the support representative explained it in an easily understandable way:

    Hi Matthias,
    I hope my email finds you well.
    It has been brought to our attention that you have some queries associated with our Threshold configuration.
    While reviewing the scenario you have mentioned to some extent this can be achieved through Atera.
    First of all, as a limitation, as I'm sure you are aware currently only one threshold can be assigned to a device so in our current scenario multiple thresholds would need to be created.
    The second thing you will need to take into consideration is our inheritance feature in Atera.
    The threshold assigned at a customer level will be overwritten by the threshold assigned at a folder level which will be overwritten by the threshold assigned at a device level.
    If no threshold is assigned to a particular device, it will be automatically inherited from the customer.
    As a note, regarding the RAM scenario, at the moment to achieve this, the alerts associated with that particular device would need to be paused or that alert type would need to be snoozed.
    For reference:
    https://support.atera.com/hc/en-us/articles/215285168-Snooze-Alerts-
    https://support.atera.com/hc/en-us/articles/360016563639-Pause-alerts
    Given the inheritance that I have mentioned above, within certain scenarios, you could assign a general threshold to your customer, a particular threshold to your folders, and based on the scenario assign another threshold to a device.
    Of course, an organization based on folders will also be required based on the device type/scenario, etc.
    https://support.atera.com/hc/en-us/articles/360024934374-Organize-Devices-With-Folders
    I do understand what you would like to achieve, and indeed multiple drives can be added to the same threshold, although to this extent it cannot be achieved.
    Looking forward to hearing from you on this matter!

    Unfortunately this confirms that my current expectations cannot be met.
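
To illustrate the inheritance order from the support reply (device overrides folder, folder overrides customer), here is a small sketch. It only mirrors the description in the email; `resolve_threshold` is a hypothetical helper, not an Atera API call.

```python
def resolve_threshold(customer=None, folder=None, device=None):
    """Return the most specific threshold profile that is set.

    Device level wins over folder level, which wins over customer level.
    Returns None if no level has a profile assigned.
    """
    for profile in (device, folder, customer):
        if profile is not None:
            return profile
    return None
```

So `resolve_threshold(customer="General", folder="Servers")` yields `"Servers"`, and a device without its own or a folder assignment falls back to the customer profile.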

    I am more than happy to raise it via UserVoice, but @nina , @Yasmin from Atera can you please guide me on what the best practice would be? Do you wish me to vote, create a new poll or comment in existing polls?
    Please see this comment:

    Thanks a lot,
    Matthias

  • nina
    nina Internal Posts: 428 ✭✭✭✭✭

    Hi @Matthias - I have passed your feedback to our Product Team; all valid suggestions!