Infrastructure monitoring best practices guide

Want even longer periods of uninterrupted sleep? Here are eight best practices to make dynamic infrastructure and server monitoring even easier with New Relic's infrastructure monitoring.

1. Install the infrastructure agent across your entire environment

Our infrastructure monitoring solution was designed to help enterprise customers monitor their large and dynamically changing environments at scale. In order to facilitate this, the UI is completely driven by tags that let you visualize aggregated metrics, events, and inventory for a large number of servers. To really get the most out of infrastructure monitoring, we recommend installing it across your entire environment, preferably even across multiple regions and clusters. This will provide a more accurate picture of the health of your host ecosystem and the impact your infrastructure has on your applications.

Want to achieve faster Mean Time To Resolution (MTTR)? Install the infrastructure agent on database servers, web servers, and any other host that supports your applications. When deploying the agent, leverage custom attributes to tag your hosts so that you can use those for filtering the data presented in the UI and for setting alerts. This is in addition to any Amazon EC2 tags you may be using which will auto-import when you enable the EC2 integration. You may also prefer to keep the agent logs separate from the system logs, which you can do through the configuration.

How to do it

  1. Leverage our install modules for config management tools such as Chef, Puppet and Ansible to easily deploy your agent across all your infrastructure.
  2. Read the instructions in the github repo for your config management tool referenced in the link above and define the custom_attributes you want to use to tag your hosts.
  3. Set the log_file attribute to your preferred location for the infrastructure agent logs.

If you are installing the agent on a single host, the process should only take a few minutes and you can find detailed instructions in our documentation.

2. Configure the native EC2 integration

If you have an AWS environment, in addition to installing the infrastructure agent on your EC2 instances to monitor them, we also recommend configuring the EC2 integration so that New Relic can automatically import all the tags and metadata associated with your AWS instances.

This allows you to filter down to a part of your infrastructure using the same AWS tags (example, ECTag_Role='Kafka'), and slice-and-dice your data in multiple ways. Additionally, our ‘Alerts’ and ‘Saved Filter Sets’ are completely tag-driven and dynamic, so they automatically add/remove instances matching these tags to give our users the most real-time views that scale with your cloud infrastructure.

3. Activate the integrations

Monitoring your infrastructure extends beyond just CPU, memory, and storage utilization. That’s why New Relic has out-of-the-box integrations that allow you to monitor all the services that support your hosts as well. Activate any of our integrations, including AWS Billing, AWS ELB, Amazon S3, MySQL, NGINX, and more, to extend monitoring to your AWS or on-host applications, and access the pre-configured dashboards that appear for each of them.

4. Create filter sets

With New Relic, users can create filter sets to organize hosts, cluster roles, and other resources based on criteria that matter the most to users. This allows you to optimize your resources by using a focused view to monitor, detect, and resolve any problems proactively. The attributes for filtering are populated from the auto-imported EC2 tags or custom tags that may be applied to hosts. You can combine as many filters as you want in a filter set, and save them to share with other people in your account.

You’ll also be able to see the color-coded health status of each host inside the filter set, so you can quickly identify problematic areas of your infrastructure. Additionally, filter sets can be used in the health map to get an overview of your infrastructure performance at a glance based on the filters that matter to your teams.

5. Create alert conditions

With New Relic, you can create alert conditions directly within the context of what you are currently monitoring with New Relic. For example, if you are viewing a filter set comprised of a large number of hosts and notice a problem, you don’t need to create an individual alert condition for every host within. Instead, we recommend initiating the alert condition directly from the chart of the metric you are viewing and creating it based on the filter tags.

This will create an alert condition for any hosts that match those tags, allowing our infrastructure monitoring to automatically remove hosts that go offline and add new hosts to the alert condition if they match those tags. Alerts configured once for the appropriate tags will scale correctly across all future hosts. And know that you can also leverage existing alert policies for infrastructure alert conditions.

6. View infrastructure data alongside APM data

The integration between New Relic APM and infrastructure monitoring lets you see your APM data and infrastructure data side by side, so you can find the root cause of problems more quickly, no matter where they originate. This allows users to view the performance relationship of your hosts and the applications running on them, allowing for quicker diagnosis of the issue and impact on the business’ health.

Use health maps to quickly spot any issues or alerts related to the health of your applications and how that connects to the supporting infrastructure. The first boxes starting from the top left are those that require your attention.

7. Access Infrastructure data using the Data explorer

Teams that use multiple New Relic capabilities find it useful to create a single dashboard to visually correlate the infrastructure’s health with application, browser and synthetics metrics. That’s where New Relic data exploration features comes in. All the granular metrics and events collected by infrastructure monitoring are stored in New Relic and are accessible to you immediately. Having access to the raw metrics means you can run more custom queries using NRQL, and also create dashboards to share infrastructure metrics with your team.

8. Update your agents regularly

New Relic’s software engineering team is constantly pushing out improvements and new features to improve our customers’ overall monitoring experience. In order to take advantage of all the awesomeness they’re delivering, we recommend regularly updating to the latest version of the infrastructure agent.

Want more user tips?

If you need more help, check out these support and learning resources: