Cloud Monitoring metrics get 10-second resolution

Higher resolution metrics are critical for monitoring dynamically changing environments and rapidly changing application metrics. Examples include high-volume e-commerce, live streaming, autoscaling bursty workloads on Kubernetes clusters, and more. Higher resolution custom, Prometheus, and agent metrics are now generally available and can be written at a granularity of 10 seconds. Previously, these metric types could only be written once every 60 seconds.

How to write Monitoring agent metrics at 10-second resolution

The Cloud Monitoring agent is a collectd-based daemon that collects system and application metrics from virtual machine instances and sends them to Cloud Monitoring. The agent collects disk, CPU, network, and process metrics. By default, agent metrics are written at 60-second granularity. To send metrics at 10-second granularity instead, change the Interval value to 10 in the Monitoring agent’s collectd.conf file.
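For example, the global Interval setting (the default config path on Linux is /etc/stackdriver/collectd.conf, though it may vary by install) would look roughly like this:

# Global collection interval, in seconds. 10 enables high-resolution metrics.
Interval 10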

After making this change, you will need to restart your agent (this may differ based on your operating system and distro):

sudo service stackdriver-agent restart
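
On systemd-based distributions, the equivalent is typically:

sudo systemctl restart stackdriver-agent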

Higher resolution agent metrics require Monitoring agent version 6.0.1 or later. See the Cloud Monitoring documentation for how to determine your agent version.
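
If the agent was installed from Google’s packages, one quick way to check the installed version (assuming the standard stackdriver-agent package name) is via your package manager:

# Debian/Ubuntu
dpkg -l stackdriver-agent

# RHEL/CentOS
rpm -q stackdriver-agent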

Now that your Monitoring agent is emitting metrics at 10-second granularity, you can view them in Metrics Explorer by searching for metrics with the prefix “agent.googleapis.com/agent/”.

[Image: agent metrics displayed in Metrics Explorer]

How to write custom metrics at 10-second resolution

Custom metrics allow you to define and collect metric data that built-in Google Cloud metrics cannot provide. These could be specific to your application, infrastructure, or business. For example: “Latency of the shopping cart service” or “Returning customer rate” in an e-commerce application.

Custom metrics can be written in a variety of ways: via the Monitoring API, the Cloud Monitoring client libraries, the OpenCensus/OpenTelemetry libraries, or the Cloud Monitoring agent.

We recommend using the OpenCensus libraries to write custom metrics for several reasons (a minimal sketch in Python follows this list):

  1. It is open source and supports a wide range of languages and frameworks.
  2. OpenCensus provides vendor-agnostic support for the collection of metric and trace data.
  3. OpenCensus provides optimized collection of points and batching of Monitoring API calls. It times API calls for 10-second resolution and other intervals so that the Monitoring API won’t reject points for being written too frequently, and it handles retries, exponential backoff, and more, helping to ensure that your metric points make it to the monitoring system.
  4. OpenCensus allows you to export the collected data to a variety of backend applications and monitoring services, including Cloud Monitoring.
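
Here is a minimal sketch in Python using the opencensus and opencensus-ext-stackdriver packages. It assumes Application Default Credentials and a default project are configured; the measure and view names are illustrative:

from opencensus.ext.stackdriver import stats_exporter
from opencensus.stats import aggregation, measure, stats as stats_module, view
from opencensus.tags import tag_map

# Illustrative measure and view; substitute your own names and units.
m_latency = measure.MeasureFloat("cart_latency", "Shopping cart latency", "ms")
latency_view = view.View("cart_latency_view", "Last recorded cart latency",
                         [], m_latency, aggregation.LastValueAggregation())

stats = stats_module.stats
stats.view_manager.register_view(latency_view)

# interval=10 exports batched points to Cloud Monitoring every 10 seconds.
exporter = stats_exporter.new_stats_exporter(interval=10)
stats.view_manager.register_exporter(exporter)

# Record a point; the exporter handles batching, timing, and retries.
mmap = stats.stats_recorder.new_measurement_map()
mmap.measure_float_put(m_latency, 27.3)
mmap.record(tag_map.TagMap())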

#google cloud platform #management tools #cloud #cloud computing

Adaline Kulas

Multi-cloud Spending: 8 Tips To Lower Cost

A multi-cloud approach means leveraging two or more cloud platforms to meet an enterprise’s various business requirements. A multi-cloud IT environment incorporates clouds from multiple vendors and eliminates dependence on a single public cloud service provider. Enterprises can thus choose specific services from multiple public clouds and reap the benefits of each.

Given its affordability and agility, most enterprises now opt for a multi-cloud approach. A 2018 survey on the public cloud services market points out that 81% of respondents use services from two or more providers. Subsequently, the cloud computing services market has reported incredible growth: the worldwide public cloud services market is set to reach $500 billion in the next four years, according to IDC.

By choosing multi-cloud solutions strategically, enterprises can optimize the benefits of cloud computing and gain key competitive advantages. They can avoid the lengthy and cumbersome processes involved in buying, installing, and testing high-priced systems. IaaS and PaaS solutions have become a windfall for enterprise budgets, as they do not incur huge up-front capital expenditure.

However, cost optimization remains a challenge in a multi-cloud environment, and many enterprises end up overpaying, whether they realize it or not. The tips below will help you ensure money is spent wisely on cloud computing services.

  • Deactivate underused or unattached resources

Most organizations get simple things wrong, and these turn out to be the root cause of needless spending and resource wastage. The first step to cost optimization in your cloud strategy is to identify underutilized resources that you have been paying for.

Enterprises often continue to pay for resources that were purchased earlier but are no longer useful. Identifying such unused and unattached resources and deactivating them on a regular basis brings you one step closer to cost optimization. If needed, you can deploy automated cloud management tools that provide the analytics needed to optimize cloud spending and cut costs on an ongoing basis.

  • Figure out idle instances

Another key cost optimization strategy is to identify idle computing instances and consolidate them into fewer instances. An idle computing instance may run at a CPU utilization of 1-5%, yet the service provider bills you for 100% of that instance.

Every enterprise has non-production instances like these that consume unnecessary storage and lead to overpaying. Regularly re-evaluating your resource allocations and removing unnecessary storage can save you significant money. Resource allocation is not only a matter of CPU and memory; it is also linked to storage, network, and various other factors.

  • Deploy monitoring mechanisms

The key to efficient cost reduction in cloud computing technology lies in proactive monitoring. A comprehensive view of the cloud usage helps enterprises to monitor and minimize unnecessary spending. You can make use of various mechanisms for monitoring computing demand.

For instance, you can use a heatmap to visually understand the highs and lows in computing demand. A heatmap indicates when servers start and stop, which helps you decide whether it is safe to shut them down on holidays or weekends, and that in turn reduces costs. You can also deploy automated tools that schedule instances to start and stop, as sketched below.
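
On Google Cloud, one way to do this is with instance schedules; the schedule name, times, and VM below are hypothetical, and other clouds offer equivalent schedulers:

# Create a schedule that stops VMs outside business hours (hypothetical values).
gcloud compute resource-policies create instance-schedule business-hours \
    --region=us-central1 \
    --vm-start-schedule="0 8 * * Mon-Fri" \
    --vm-stop-schedule="0 18 * * Mon-Fri" \
    --timezone="America/New_York"

# Attach the schedule to an existing instance.
gcloud compute instances add-resource-policies my-vm \
    --zone=us-central1-a \
    --resource-policies=business-hours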

#cloud computing services #all #hybrid cloud #cloud #multi-cloud strategy #cloud spend #multi-cloud spending #multi cloud adoption #why multi cloud #multi cloud trends #multi cloud companies #multi cloud research #multi cloud market

Adaline Kulas

What are the benefits of cloud migration? Reasons you should migrate

Moving applications, databases, and other business elements from a local server to a cloud server is called cloud migration. This article covers migration techniques, requirements, and the benefits of cloud migration.

In simple terms, moving from a local server to a public cloud server is cloud migration. Gartner reports 17.5% revenue growth in cloud migration and has published a forecast through 2022. [Image: Gartner forecast for cloud migration through 2022]

#cloud computing services #cloud migration #all #cloud #cloud migration strategy #enterprise cloud migration strategy #business benefits of cloud migration #key benefits of cloud migration #benefits of cloud migration #types of cloud migration

Carmen Grimes

How to Monitor Third Party API Integrations

Many enterprises and SaaS companies depend on a variety of external API integrations in order to build an awesome customer experience. Some integrations may outsource certain business functionality, such as handling payments or search, to companies like Stripe and Algolia. You may have integrated other partners who expand the functionality of your product offering. For example, if you want to add real-time alerts to an analytics tool, you might integrate the PagerDuty and Slack APIs into your application.

If you’re like most companies, though, you’ll soon realize you’re integrating hundreds of different vendors and partners into your app. Any one of them could have performance or functional issues impacting your customer experience. Worse yet, the reliability of an integration may be less visible than that of your own APIs and backend. If the login functionality is broken, you’ll have many customers complaining they cannot log into your website. However, if your Slack integration is broken, only the customers who added Slack to their account will be impacted. On top of that, since the integration is asynchronous, your customers may not realize it is broken until days later, when they notice they haven’t received any alerts for some time.

How do you ensure your API integrations are reliable and performant? After all, if you’re selling a real-time alerting feature, your alerts had better be real-time and have at-least-once guaranteed delivery. Dropping alerts because your Slack or PagerDuty integration failed is unacceptable from a customer experience perspective.

What to monitor

Latency

Specific API integrations with exceedingly high latency could be a signal that an integration is about to fail. Maybe your pagination scheme is incorrect, or the vendor has not indexed your data in a way that lets you query it efficiently.

Latency best practices

Average latency only tells you half the story. An API that consistently takes one second to complete is usually better than an API with high variance. For example, if an API takes 30 milliseconds on average but 1 out of 10 calls takes up to five seconds, you have high variance in your customer experience. This makes bugs much harder to track down and the experience harder to design around, which is why the 90th and 95th percentiles are important to look at.
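
As a quick sketch (using NumPy; the latency samples here are made up), percentiles are easy to compute from raw response times:

import numpy as np

# Hypothetical response times in milliseconds for one integration.
latencies_ms = np.array([28, 31, 30, 29, 33, 27, 30, 32, 31, 4900])

print("mean:", latencies_ms.mean())             # skewed by the one slow call
print("p90: ", np.percentile(latencies_ms, 90))
print("p95: ", np.percentile(latencies_ms, 95))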

Reliability

Reliability is a key metric to monitor, especially since you’re integrating APIs that you don’t control. What percent of API calls are failing? To track reliability, you should have a rigid definition of what constitutes a failure.

Reliability best practices

While any API call with a response status code in the 4xx or 5xx family may be considered an error, you might have specific business cases where the API appears to complete successfully, yet the call should still be considered a failure. For example, a data API integration that consistently returns no matches or no content could be considered failing even though the status code is always 200 OK. Another API could be returning bogus or incomplete data. Data validation is critical for measuring whether the data returned is correct and up to date.

Not every API provider and integration partner follows suggested status code mappings.
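
A hedged sketch of what such a failure definition might look like in code (the business rules here are purely illustrative):

def is_failure(status_code: int, payload: dict) -> bool:
    """Classify an API response as a failure, beyond just status codes."""
    if status_code >= 400:    # transport/protocol-level failure
        return True
    if not payload:           # 200 OK but empty body
        return True
    # Illustrative business rule: a search API returning zero matches
    # for known-good queries is treated as a data failure.
    if payload.get("matches") == []:
        return True
    return False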

Availability

While reliability is specific to errors and functional correctness, availability and uptime are pure infrastructure metrics that measure how often a service has an outage, even a temporary one. Availability is usually measured as a percentage of uptime per year, or a number of 9’s.

Availability %             | Downtime per year   | Downtime per month  | Downtime per week    | Downtime per day
90% (“one nine”)           | 36.53 days          | 73.05 hours         | 16.80 hours          | 2.40 hours
99% (“two nines”)          | 3.65 days           | 7.31 hours          | 1.68 hours           | 14.40 minutes
99.9% (“three nines”)      | 8.77 hours          | 43.83 minutes       | 10.08 minutes        | 1.44 minutes
99.99% (“four nines”)      | 52.60 minutes       | 4.38 minutes        | 1.01 minutes         | 8.64 seconds
99.999% (“five nines”)     | 5.26 minutes        | 26.30 seconds       | 6.05 seconds         | 864.00 milliseconds
99.9999% (“six nines”)     | 31.56 seconds       | 2.63 seconds        | 604.80 milliseconds  | 86.40 milliseconds
99.99999% (“seven nines”)  | 3.16 seconds        | 262.98 milliseconds | 60.48 milliseconds   | 8.64 milliseconds
99.999999% (“eight nines”) | 315.58 milliseconds | 26.30 milliseconds  | 6.05 milliseconds    | 864.00 microseconds
99.9999999% (“nine nines”) | 31.56 milliseconds  | 2.63 milliseconds   | 604.80 microseconds  | 86.40 microseconds
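
As a sanity check on these figures, the yearly numbers follow directly from the definition: 99.9% availability allows (1 − 0.999) × 365.25 days × 24 hours/day ≈ 8.77 hours of downtime per year.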

Usage

Many API providers price on API usage. Even if the API is free, the provider most likely has some form of rate limiting in place to ensure bad actors do not starve out good clients. This means tracking your API usage with each integration partner is critical to understanding when your current usage is approaching plan limits or rate limits.

Usage best practices

It’s recommended to tie usage back to your end users, even if the API integration is quite downstream from your customer experience. This enables measuring the direct ROI of specific integrations and finding trends. For example, say your product is a CRM, and you are paying Clearbit $199 a month to enrich up to 2,500 companies. That is a direct cost tied to your customers’ usage. If you have a free tier and free users consume most of your Clearbit quota, you may want to reconsider your pricing strategy; potentially, Clearbit enrichment should be on paid tiers only to reduce your own cost. A small sketch of per-user usage tracking follows.
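
A minimal sketch of attributing third-party API usage to end users (in-memory only; a real system would persist these counters, and the quota value is just the Clearbit example above):

from collections import Counter

QUOTA = 2500  # e.g., monthly enrichments included in the plan

usage_by_user = Counter()

def record_enrichment(user_id: str) -> None:
    """Count one third-party API call against the end user who triggered it."""
    usage_by_user[user_id] += 1

def quota_remaining() -> int:
    return QUOTA - sum(usage_by_user.values())

def top_consumers(n: int = 5):
    """Which users (or tiers) drive the spend?"""
    return usage_by_user.most_common(n)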

How to monitor API integrations

Monitoring API integrations seems like the correct remedy to stay on top of these issues. However, traditional Application Performance Monitoring (APM) tools like New Relic and AppDynamics focus more on monitoring the health of your own websites and infrastructure. This includes infrastructure metrics like memory usage and requests per minute, along with application-level health such as Apdex scores and latency. Of course, if you’re consuming an API that runs in someone else’s infrastructure, you can’t just ask your third-party providers to install an APM agent that you have access to. This means you need a way to monitor the third-party APIs indirectly or via some other instrumentation methodology.

#monitoring #api integration #api monitoring #monitoring and alerting #monitoring strategies #monitoring tools #api integrations #monitoring microservices

Google Cloud: Caching Cloud Storage content with Cloud CDN

In this Lab, we will configure Cloud Content Delivery Network (Cloud CDN) for a Cloud Storage bucket and verify caching of an image. Cloud CDN uses Google’s globally distributed edge points of presence to cache HTTP(S) load-balanced content close to our users. Caching content at the edges of Google’s network provides faster delivery of content to our users while reducing serving costs.

For an up-to-date list of Google’s Cloud CDN cache sites, see https://cloud.google.com/cdn/docs/locations.

Task 1. Create and populate a Cloud Storage bucket

Cloud CDN content can originate from different types of backends:

  • Compute Engine virtual machine (VM) instance groups
  • Zonal network endpoint groups (NEGs)
  • Internet network endpoint groups (NEGs), for endpoints that are outside of Google Cloud (also known as custom origins)
  • Google Cloud Storage buckets

In this lab, we will configure a Cloud Storage bucket as the backend.
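
As a rough sketch of the commands involved (the bucket name and image file are placeholders; gsutil is the Cloud Storage CLI):

# Create a regional bucket; the name must be globally unique.
gsutil mb -l us-east1 gs://my-cdn-origin-bucket

# Upload the image we will later request through Cloud CDN.
gsutil cp cached-image.jpg gs://my-cdn-origin-bucket/

# Grant public read access so the load balancer can serve the object.
gsutil iam ch allUsers:objectViewer gs://my-cdn-origin-bucket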

#google-cloud #google-cloud-platform #cloud #cloud storage #cloud cdn