Infrastructure monitoring best practices guide

Want even longer periods of uninterrupted sleep? Here are eight best practices to make dynamic infrastructure and server monitoring even easier with New Relic's infrastructure monitoring.

Tip

If you're looking for a tutorial, check out our walkthrough for troubleshooting with infrastructure host data. The tutorial series walks you through how to find data in the infrastructure UI to resolve an incident and make a resource decision about your hosts.

1. Install the infrastructure agent across your entire environment

Our infrastructure monitoring solution was designed to help enterprise customers monitor their large and dynamically changing environments at scale. In order to facilitate this, the UI is completely driven by tags that let you visualize aggregated metrics, events, and inventory for a large number of servers. To really get the most out of infrastructure monitoring, we recommend installing it across your entire environment, preferably even across multiple regions and clusters. This will provide a more accurate picture of the health of your host ecosystem and the impact your infrastructure has on your applications.

Want to achieve faster Mean Time To Resolution (MTTR)? Install the infrastructure agent on database servers, web servers, and any other host that supports your applications. When deploying the agent, leverage custom attributes to tag your hosts so that you can use those for filtering the data presented in the UI and for setting alerts. This is in addition to any Amazon EC2 tags you may be using which will auto-import when you enable the EC2 integration. You may also prefer to keep the agent logs separate from the system logs, which you can do through the configuration.

How to do it

Leverage our install modules for config management tools such as Chef, Puppet and Ansible to easily deploy your agent across all your infrastructure.
Read the instructions in the github repo for your config management tool referenced in the link above and define the custom_attributes you want to use to tag your hosts.
Set the log_file attribute to your preferred location for the infrastructure agent logs.

Tip

If you are installing the agent on a single host, the process should only take a few minutes and you can find detailed instructions in our documentation.

2. Configure the native EC2 integration

If you have an AWS environment, in addition to installing the infrastructure agent on your EC2 instances to monitor them, we also recommend configuring the EC2 integration so that New Relic can automatically import all the tags and metadata associated with your AWS instances.

This allows you to filter down to a part of your infrastructure using the same AWS tags (example, ECTag_Role='Kafka'), and slice-and-dice your data in multiple ways. Additionally, our and entity filter views are tag-driven and dynamic, so they automatically add/remove instances matching these tags to give our users the most real-time views that scale with your cloud infrastructure.

3. Activate the integrations

Monitoring your infrastructure extends beyond just CPU, memory, and storage utilization. That's why New Relic has out-of-the-box integrations that allow you to monitor all the services that support your hosts as well. Activate any of our integrations, including AWS CloudWatch, AWS Billing, AWS ELB, MySQL, NGINX, and more, to extend monitoring to your AWS or on-host applications, and access the pre-configured that appear for each of them.

4. Create views of host groupings

You can use the entity filter to save helpful views of entities. This helps you optimize your resources by using a focused view to monitor, detect, and resolve problems proactively. You'll also be able to see the color-coded health status of each host inside that saved view, so you can quickly identify problematic areas of your infrastructure.

5. Create alert conditions

With New Relic, you can create alert conditions directly within the context of your monitored entities. For example, if you're viewing a saved view comprised of a large number of hosts and notice a problem, you don't need to create an individual alert condition for every host within. Instead, we recommend initiating the alert condition directly from the chart of the metric you're viewing, and create it based on the filter tags.

This will create an alert condition for any hosts that match those tags, allowing our infrastructure monitoring to automatically remove hosts that go offline and add new hosts to the alert condition if they match those tags. Alerts configured once for the appropriate tags will scale correctly across all future hosts. And know that you can also leverage existing alert policies for infrastructure alert conditions.

6. View infrastructure data alongside APM data

The integration between APM and infrastructure monitoring lets you see your APM data and infrastructure data side by side, so you can find the root cause of problems more quickly, no matter where they originate. This allows users to view the performance relationship of your hosts and the applications running on them, allowing for quicker diagnosis of the issue and impact on the business' health.

Use health maps to quickly spot any issues or alerts related to the health of your applications and how that connects to the supporting infrastructure. The first boxes starting from the top left are those that require your attention.

7. Access infrastructure data using metrics and events

Teams that use multiple New Relic capabilities find it useful to create a single dashboard to visually correlate the health of their infrastructure with application, browser and synthetics metrics. That's where our data exploration features comes in. All the granular data collected by infrastructure monitoring is stored in New Relic, and is accessible to you immediately. Having access to the raw metrics means you can run more custom queries using NRQL, and also create dashboards to share infrastructure metrics with your team.

8. Update your agents regularly

New Relic's software engineering team is constantly pushing out improvements and new features to improve your overall monitoring experience. In order to take advantage of all the awesomeness we're delivering, we recommend regularly updating to the latest version of the infrastructure agent.

Want more user tips?

View training videos at New Relic University.
Read the documentation.
Check out our tutorials.
Ask a question in the New Relic Support Forum.