1678197729
Azure Multi-Factor Authentication (MFA) is a security feature that Microsoft Azure provides to add an extra layer of protection to user accounts. Similarly, Google Cloud Platform (GCP) provides a comparable feature called "Google Cloud Identity-Aware Proxy (IAP)".
Here are some similarities and differences between Azure MFA and Google Cloud IAP.
Similarities between Azure MFA and Google Cloud IAP
Differences between Azure MFA and Google Cloud IAP
Both Azure Multi-Factor Authentication (MFA) and Google Cloud Identity-Aware Proxy (IAP) provide an extra layer of security for user accounts by requiring users to provide a second factor of authentication. Both services offer a range of authentication methods and can be configured to meet an organization's security requirements. However, there are some differences between the two services, such as the platforms on which they are provided, the types of resources they are used to protect, and the additional features they offer. Ultimately, organizations should carefully evaluate their security needs and choose the solution that best fits their requirements. Regardless of which solution is chosen, adding an extra layer of security with multi-factor authentication is an important step in protecting against unauthorized access and securing sensitive data.
Original article source: https://www.c-sharpcorner.com/
1678190281
Azure Multi-Factor Authentication (MFA) is a security feature provided by Microsoft Azure for adding an extra layer of protection to user accounts. Similarly, Google Cloud Platform (GCP) provides a comparable feature called "Google Cloud Identity-Aware Proxy (IAP)".
Here are some similarities and differences between Azure MFA and Google Cloud IAP:
Similarities between Azure MFA and Google Cloud IAP
Differences between Azure MFA and Google Cloud IAP
Both Azure Multi-Factor Authentication (MFA) and Google Cloud Identity-Aware Proxy (IAP) provide an extra layer of security for user accounts by requiring users to provide a second authentication factor. Both services offer a range of authentication methods and can be customized to meet an organization's security requirements. However, there are some differences between the two services, such as the platforms on which they are provided, the types of resources they are used to protect, and the additional features they offer. Ultimately, organizations should carefully evaluate their security needs and choose the solution that best fits their requirements. Whichever solution is chosen, adding an extra layer of security with multi-factor authentication is an important step in preventing unauthorized access and protecting sensitive data.
Original article source: https://www.c-sharpcorner.com/
1678190040
Azure Multi-Factor Authentication (MFA) is a security feature that Microsoft Azure provides for adding an extra layer of protection to user accounts. Google Cloud Platform (GCP) provides a comparable feature called "Google Cloud Identity-Aware Proxy (IAP)".
Here are some similarities and differences between Azure MFA and Google Cloud IAP:
Similarities between Azure MFA and Google Cloud IAP
Differences between Azure MFA and Google Cloud IAP
Both Azure Multi-Factor Authentication (MFA) and Google Cloud Identity-Aware Proxy (IAP) provide an extra layer of security to user accounts by requiring users to provide a second factor of authentication. Both services offer a range of authentication methods and can be customized to meet the security requirements of an organization. However, there are some differences between the two services, such as the platforms they are provided on, the types of resources they are used to secure, and the additional features they offer. Ultimately, organizations should carefully evaluate their security needs and choose the solution that best fits their requirements. Regardless of which solution is chosen, adding an extra layer of security with multi-factor authentication is an important step in protecting against unauthorized access and securing sensitive data.
Original article source at: https://www.c-sharpcorner.com/
1675535700
In this Terraform tutorial, we will learn how to deploy a function in Google Cloud with Terraform. Previously, we did it through GCP's command-line utility.
Now, we can create and run the same Cloud Function using Terraform.
When we deployed our function using Google's SDK directly, we had to use a command with several flags that could be grouped together in a deploy.sh script:
gcloud functions deploy $FN_NAME \
--entry-point=$FN_ENTRY_POINT \
--runtime=nodejs16 \
--region=us-central1 \
--trigger-http \
--allow-unauthenticated
In this script, we are specifying exactly how we want our cloud function to be. The flags specify the entry point, the runtime, the region, the trigger, and so on.
One could say we are describing how our infrastructure should look. That is exactly what we can do with infrastructure as code - in this case, using Terraform!
main.tf
The main.tf file is the starting point for Terraform to build and manage your infrastructure.
We can start by adding a provider. A provider is a plugin that lets you use the API operations of a specific cloud provider or service, such as AWS, Google Cloud, Azure etc.
provider "google" {
project = "project_name"
region = "us-central1"
}
But let's think about the following scenario: what if you wanted to create a generic template infrastructure that could be reused for different projects other than project_name?
Here comes the tfvars file: a file in which you can put all your environment variables:
google_project_name = "project_name"
And now you can use this variable in your main.tf (you also need to add a block telling Terraform you've declared a variable somewhere else):
variable "google_project_name" {}
provider "google" {
project = "${var.google_project_name}"
region = "us-central1"
}
Now, let's start to add the infrastructure specific to our project!
resources
A Terraform resource is a unit of Terraform configuration that represents a real-world infrastructure object, such as an EC2 instance, an S3 bucket, or a virtual network. In our case, we are going to represent a cloud function.
We define these resources in blocks, where we describe the desired state of the resource - including properties such as the type, name, and other configuration options.
Understanding how state works is important because, every time Terraform applies changes to the infrastructure of our projects, it updates resources to match the desired state defined in the Terraform configuration.
What is inside a resource?
Besides the definition previously mentioned, a Terraform resource is, syntactically, a block composed of three parts: the resource type, such as aws_instance or google_compute_instance; the resource name; and the configuration arguments inside the block.
Alright. We are getting there.
Let's then create the resource block for our Google Cloud Function!
Each resource block has its specific properties. You can find them in the docs of the Terraform provider you are using. For example, here is the docs for the cloud function we'll be creating:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloudfunctions_function
We can start by defining a few things, such as the name, description, and runtime:
resource "google_cloudfunctions_function" "my_function" {
name = "my_function"
description = "the function we are going to deploy"
runtime = "nodejs16"
}
Note: you may have noticed that we are repeating my_function twice here. This happens because we have to set a name for the resource - in this case, my_function, which translates to google_cloudfunctions_function.my_function in Terraform - and we also have to set the value of the name field of the block, which is going to be used by Google, not Terraform, to identify your function.
However, even though we know these basic properties of our function, where is the source code? In the previous tutorial, the Google SDK was able to look into our root directory to find our index.js file. But here, we only have a Terraform file that specifies our desired state, with no mention at all of where to find the source code for our function. Let's fix that.
From the docs, we know there are several ways to specify in our resource block where to find the source code of our function. Let's do it with a storage bucket.
resource "google_storage_bucket" "source_bucket" {
name = "function-bucket"
location = "us-central1"
}
Now we have a bucket, but we also need a bucket object that stores our source code.
resource "google_storage_bucket_object" "source_code" {
name = "object-name"
bucket = google_storage_bucket.source_bucket.name
source = "path/to/local/file"
}
Note the source field.
According to the docs, we need to use a .zip file to store the source code (as well as other files such as package.json). We can transform our directory into a zip file using a data "archive_file" block:
data "archive_file" "my_function_zip" {
type = "zip"
source_dir = "${path.module}/src"
output_path = "${path.module}/src.zip"
}
path.module is the filesystem path of the module where the expression is placed.
Therefore, our main.tf now looks like this:
variable "google_project_name" {}
provider "google" {
project = "${var.google_project_name}"
region = "us-central1"
}
data "archive_file" "my_function_zip" {
type = "zip"
source_dir = "${path.module}/src"
output_path = "${path.module}/src.zip"
}
resource "google_cloudfunctions_function" "my_function" {
name = "myFunction"
description = "the function we are going to deploy"
runtime = "nodejs16"
trigger_http = true
ingress_settings = "ALLOW_ALL"
source_archive_bucket = google_storage_bucket.function_source_bucket.name
source_archive_object = google_storage_bucket_object.function_source_bucket_object.name
}
resource "google_storage_bucket" "function_source_bucket" {
name = "function-bucket-1234"
location = "us-central1"
}
resource "google_storage_bucket_object" "function_source_bucket_object" {
name = "function-bucket-object"
bucket = google_storage_bucket.function_source_bucket.name
source = data.archive_file.my_function_zip.output_path
}
We can deploy! But... there are still some things missing.
Using the Google SDK we were able to get the URL of our function, since it has an HTTP trigger. It would be good to get this URL right away.
Also, we needed to set IAM policies to let everyone trigger our function. How do we do something similar in Terraform?
We can fix these things by adding two blocks: one for IAM policies and another to display the output - an output block.
In Terraform, an output block is used to define the desired values that should be displayed when Terraform applies changes to infrastructure.
If we run terraform plan right now, we can see some properties that will be known once the infrastructure is created. And https_trigger_url is exactly what we are looking for!
output "function_url_trigger" {
value = google_cloudfunctions_function.my_function.https_trigger_url
}
resource "google_cloudfunctions_function_iam_member" "my_second_fn_iam" {
cloud_function = google_cloudfunctions_function.my_function.name
member = "allUsers"
role = "roles/cloudfunctions.invoker"
}
Now, we can run terraform apply and get, as the output, the URL that triggers our function:
And finally, we can trigger it:
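For example, one quick way to trigger it from the shell is to read the output value with terraform output (the -raw flag assumes a reasonably recent Terraform release) and pass it to curl:
# print the URL, then call it
terraform output -raw function_url_trigger
curl "$(terraform output -raw function_url_trigger)"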
Still feel like you missed something? Take a look at the source code for this tutorial: https://github.com/wrongbyte-lab/tf-gcp-tutorial
Original article sourced at: https://dev.to
1675235760
Migrating data on Google Cloud BigQuery may seem like a straightforward task, until you run into having to match old data to tables with different schemas and data types. There are many approaches you can take to moving data, perhaps using SQL commands to transform the data to be compatible with the new schema.
However, SQL has limitations as a programming language: being a query-centric language for databases, it lacks the capabilities of more general-purpose languages such as Python and Java. What if we want more programmatic control over data transformation? Let's take a look at a simple approach using Apache Beam pipelines written in Python to be run on Google Cloud's Dataflow service.
Apache Beam is an open source, unified model for defining and executing data processing pipelines using batch processing or streaming. To put it simply, a pipeline is just reading in data from a source, then running it through a series of transformations and processes to an output. In our example, we are simply reading data from a BigQuery table, running each row through the pipeline to match our desired target schema, and writing it to the target BigQuery table.
Beam has SDKs available in Python, Go, Java, Scala, and SQL, and can be run on a variety of platforms such as Apache Spark, Apache Flink, Google Cloud Dataflow, and more. In this example we will be focusing on running our pipeline on Dataflow, using a pipeline written with the Apache Beam Python SDK.
Dataflow is a service provided by Google Cloud Platform for executing your data processing pipelines in a distributed workflow, along with a nice graphical interface to see the performance and execution stages of your pipeline in real time.
The approach for the pipeline will be fairly simple. We start with a source table in BigQuery with a certain schema and compare it to the desired schema for our target table. This pipeline will handle adding new columns, removing columns, and modifying data types of existing columns as well as setting default values where appropriate. With these in mind, we then determine the necessary changes to the old schema in order to match the new schema. These changes will be written to a configuration JSON that our pipeline will read and apply transformations to, and from there we write out the transformed data to our new table. If our transformations are correctly defined, we should have the old data in the new table with all our defined modifications.
The configuration JSON for defining transformations will be written in the format of a BigQuery JSON schema. With each column being a JSON object, we define only the columns we want to change/add/delete, with two additional fields: "mutation_type" and "default_value."
"mutation_type" will be either "add," "modify," or "delete." "default_value" will be the value set when adding a new field or modifying an old one, and it won't be necessary for fields we want to delete.
We will store this configuration JSON in a Google Cloud Storage bucket from which our pipeline will read. Also be sure to create the source and target tables in BigQuery, and insert some test data into your source table to migrate.
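As a rough sketch of that preparation step (the bucket, dataset, and file names here are placeholders, not values from this article), the upload and table creation could look like this:
# upload the transformation config to Cloud Storage
gsutil cp update_config.json gs://your-config-bucket/update_config.json
# create the source and target tables from their JSON schema files
bq mk --table your_project:your_dataset.source_table source_schema.json
bq mk --table your_project:your_dataset.target_table target_schema.json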
For a more in-depth dive, the source code is located at https://github.com/tanwesley/Dataflow-BigQuery-Migration
For this demo, we will keep things simple. The following JSON will be our source schema:
Source BigQuery table schema
[
{
"name": "first_name",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "last_name",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "phone_number",
"type": "INTEGER",
"mode": "NULLABLE"
}
]
The following here is the schema we are aiming for:
Target BigQuery table schema
[
{
"name": "last_name",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "phone_number",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "address",
"type": "STRING",
"mode": "NULLABLE"
}
]
Comparing them, we can see we need to:
- Remove the first_name column
- Change the type of phone_number from INTEGER to STRING
- Add a new nullable address column
Based on that, here is what our configuration will be:
[
{
"name": "first_name",
"type": "STRING",
"mode": "REQUIRED",
"mutation_type": "delete"
},
{
"name": "phone_number",
"type": "STRING",
"mode": "NULLABLE",
"mutation_type": "modify",
"default_value": null
},
{
"name": "address",
"type": "STRING",
"mode": "NULLABLE",
"mutation_type": "add",
"default_value": null
}
]
The following code snippet is our pipeline definition contained within the run() method, which lays out each stage of the transformation with comments:
p = beam.Pipeline(options=pipeline_options)
(p | 'Read old table' >> (beam.io.ReadFromBigQuery(table=known_args.input,
                                                   gcs_location=temp_location))
   | 'Convert to new schema' >> beam.ParDo(OldToNewSchema(update_config))
   | 'Write to BigQuery' >> (beam.io.WriteToBigQuery(table=known_args.output,
                                                     custom_gcs_temp_location=temp_location))
)
Now let’s take a look at the DoFn OldToNewSchema which does all the actual transformation in the pipeline:
class OldToNewSchema(beam.DoFn):
    def __init__(self, config):
        self.config = config

    def process(self, data, config=None):
        import logging
        logger = logging.getLogger(__name__)
        logger.setLevel(logging.INFO)
        if config is None:
            config = self.config
        for c in config:
            name = c.get('name')
            if c.get('mode') == 'REPEATED':
                logger.info(f"DATA: {data}")
                logger.info(f"NESTED FIELDS: {data.get(name)}")
                nested_fields = data.get(name)
                for f in nested_fields:
                    data.update({ name: self.process(data=f,
                                                     config=c.get('fields')) })
                logger.info(f"UPDATED DATA: {data}")
            else:
                mutation_type = c.get('mutation_type')
                if mutation_type == 'add':
                    value = c.get('default_value')
                    data.update({ name: value })
                elif mutation_type == 'modify':
                    value = self.data_conversion(data.get(name),
                                                 data.get('type'), c.get('type'))
                    data.update({ name: value })
                elif mutation_type == 'delete':
                    data.pop(name)
        logger.info(f"FINAL UPDATED DATA: {data}\n")
        return [data]

    def data_conversion(self, data, old_type, new_type):
        if new_type == 'STRING':
            converted = str(data)
        elif new_type == 'BOOLEAN':
            if data == 'true' or data == 'True':
                converted = True
            elif data == 'false' or data == 'False':
                converted = False
            else:
                converted = None
        elif new_type == 'INTEGER':
            try:
                converted = int(data)
            except (TypeError, ValueError):
                converted = None
        elif new_type == 'FLOAT':
            try:
                converted = float(data)
            except (TypeError, ValueError):
                converted = None
        return converted
In Apache Beam, a DoFn works like a function applied to each row of data that passes through the pipeline, except we define it as a Python class. In this case our class is OldToNewSchema, which takes in an update configuration in the form of a dictionary we create from our configuration JSON in GCS. We must implement the process() method of the DoFn class to define the logic of our transformations; this is where all the heavy lifting is done.
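The article does not show how update_config is built, but a minimal sketch of loading it from GCS before constructing the pipeline might look like the following (the bucket and object names are placeholders, and load_update_config is a hypothetical helper, not part of the original code):
import json
from google.cloud import storage

def load_update_config(bucket_name, blob_name):
    # download the JSON config from GCS and parse it into a list of column dicts
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return json.loads(blob.download_as_text())

update_config = load_update_config("your-config-bucket", "update_config.json")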
Now that we know the details of the pipeline’s workflow, all we need to do now is submit a job to Dataflow.
First we must define the arguments we need to submit a job. Open the Google Cloud Console Command Line and configure the following:
export PROJECT="The Google Cloud project ID where you are working from"
export INPUT="The BigQuery table from which you wish to migrate. Format: {project_id}.{dataset_name}.{table_name}"
export OUTPUT="The BigQuery table to which you wish to migrate data. Format: {project_id}.{dataset_name}.{table_name}"
export MIGRATE_CONFIG="The GCS path to the migration config JSON file which maps data to the new schema."
export TEMP_LOCATION="The path to a GCS location where data will be temporarily stored to write to BigQuery."
export REGION="Any available Google Cloud Platform service region of your choice"
Now simply execute the Python file with all the required flags we defined:
python data_migration_test.py --project=$PROJECT \
  --input=$INPUT \
  --migrate_config=$MIGRATE_CONFIG \
  --output=$OUTPUT \
  --temp_location=$TEMP_LOCATION \
  --runner=DataflowRunner \
  --region=$REGION
Navigate to Dataflow on the Google Cloud Console and if all goes well, we should see the job we submitted in the Jobs tab. Give it some time to run and click on the job to observe the execution of the pipeline in real time.
If all goes well, the Dataflow UI console on Google Cloud will show each stage of the pipeline completing and the job will be marked completed. Check your target BigQuery table to see if your old data has successfully migrated to the new schema.
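For a quick check from the command line (the table name is a placeholder), a query like the following also works:
bq query --use_legacy_sql=false 'SELECT * FROM `your_project.your_dataset.target_table` LIMIT 10'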
Source BigQuery table
Old data moved to new BigQuery table
Original article source at: https://blog.knoldus.com/
1673919035
This Google Cloud Platform full course will give you an introduction to Google Cloud Platform and will help you understand various important concepts that concern Cloud Computing and Google Cloud Platform with practical implementation.
This Edureka video on 'Google Cloud Platform Full Course' will give you an introduction to Google Cloud Platform and will help you understand various important concepts that concern Cloud Computing and Google Cloud Platform with practical implementation. Below are the topics covered in this Google Cloud Platform Tutorial:
#googlecloud #cloud #gcp #cloudcomputing
1670650800
Setup FTP on Google Cloud with VSFTP on Ubuntu 18.04. In this guide you are going to learn how to set up an FTP server and provide access to a particular directory as chroot for a user.
This setup is tested on a Google Compute Engine VM instance running Ubuntu 18.04 LTS. This post also works fine for an AWS EC2 instance, a DigitalOcean Droplet, Kamatera, Vultr, or any other cloud hosting server, VPS, or dedicated server.
If you are using Google Cloud Platform to set up FTP, you need to complete the following steps.
SSH to your VM instance and perform the steps listed below.
I assume you have your server setup and configured.
You can configure FTP on any port you wish; here you will configure it on the default port 21, so you need to create a firewall rule to provide access to these ports.
We also open ports 40000 – 50000 for passive mode connections.
Go to VPC Network >> Firewall rules and click Create Firewall rules.
In Name enter ftp
In Targets select All instances in the network
In Source filter select IP ranges
In Source IP ranges enter 0.0.0.0/0
In Protocols and ports check TCP and enter 20, 21, 990, 40000-50000.
Click Create.
If you are using UFW on your server, make sure to open these ports to allow connections; otherwise you cannot connect.
sudo ufw allow 20/tcp
sudo ufw allow 21/tcp
sudo ufw allow 990/tcp
sudo ufw allow 40000:50000/tcp
Now you can create a new user using the following command to test the FTP.
sudo useradd -m -c "Name, Role" -s /bin/bash username
Setup a password for that user.
sudo passwd username
VSFTP (Very Secure FTP Daemon) is an FTP server for Linux-based systems. By default, AWS and Google Cloud do not allow password-based authentication to Virtual Machine instances.
With VSFTP you can run your own FTP server and create users and assign them to any directory and prevent access to other directories using chroot also.
Now you can install VSFTP using the following command.
sudo apt install vsftpd
Once the installation is completed you can configure VSFTP.
Start by creating a backup of the original VSFTP configuration file.
sudo cp /etc/vsftpd.conf /etc/vsftpd.conf.orig
Edit the vsftpd.conf
file and make the following changes.
sudo nano /etc/vsftpd.conf
Modify the following directives.
listen=YES
listen_ipv6=NO
Uncomment the following directives.
write_enable=YES
local_umask=022
chroot_local_user=YES
Add these configurations to the last.
seccomp_sandbox=NO
allow_writeable_chroot=YES
userlist_enable=YES
userlist_file=/etc/vsftpd.userlist
userlist_deny=NO
tcp_wrappers=YES
user_sub_token=$USER
user_config_dir=/etc/vsftpd/user_config_dir
pasv_min_port=40000
pasv_max_port=50000
Here you have configured a userlist_file, which holds the list of FTP users, and a user_config_dir to hold the user-specific configurations.
Add the user you created before to the userlist file.
echo "username" | sudo tee -a /etc/vsftpd.userlist
This command creates a file named vsftpd.userlist, adds the user to it, and outputs the added user in the terminal.
Create a directory with the name user_config_dir to hold the user-specific configurations.
sudo mkdir -p /etc/vsftpd/user_config_dir
Create a new file with the same name as the username inside this directory.
sudo nano /etc/vsftpd/user_config_dir/username
Add the following line to that file.
local_root=/path/to/your/directory
Save the file and exit the editor.
Finally restart VSFTP.
sudo systemctl restart vsftpd
Now you need to prevent SSH access for the newly created user by adding the DenyUsers directive in your sshd_config.
sudo nano /etc/ssh/sshd_config
Add the following line to the bottom of the file.
DenyUsers username other-user
You can add multiple users separated by a space.
Restart SSH.
sudo systemctl restart ssh
Now open your FTP client and enter your server's external IP address as the hostname, 21 as the port, the username you created before, and the password.
Now you will be logged in to the server and you can only access the folder that is assigned to you.
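If you prefer a quick check from a terminal instead of a GUI client, the standard ftp client can be used as a sketch (replace EXTERNAL_IP with your server address; passive mode may need to be toggled depending on your client):
ftp EXTERNAL_IP 21
# enter the username and password created above when prompted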
Now you have learned how to set up FTP on your VM instance on Google Cloud Platform.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
1670646780
How to Install Nginx on CentOS 8 – Google Cloud or AWS. Nginx is a high-performance, lightweight HTTP and reverse proxy web server capable of handling large websites. This guide explains how to install Nginx on CentOS 8.
This tutorial is tested on a Google Compute Engine VM instance running CentOS 8. This setup will also work on other cloud services like AWS, DigitalOcean, etc., or on any VPS or dedicated servers.
If you are using Google Cloud then you can learn how to setup CentOS 8 on a Google Compute Engine.
Start by updating the packages to the latest available version using the following commands.
sudo yum update
sudo yum upgrade
Once all packages are up to date, you can proceed to install Nginx.
Nginx is available in the default CentOS repositories. So it is easy to install Nginx with just a single command.
sudo yum install nginx
Once the installation is completed, you need to start and enable Nginx to start on server boot.
sudo systemctl enable nginx
sudo systemctl start nginx
To verify the status of the Nginx service you can use the status command.
sudo systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2019-11-27 08:34:48 UTC; 1min 1s ago
Process: 823 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
Process: 821 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
Process: 820 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
Main PID: 825 (nginx)
This indicates Nginx is up and running on your CentOS server.
Upon installation, Nginx creates predefined rules to accept connections on port 80 for HTTP and 443 for HTTPS.
sudo firewall-cmd --permanent --zone=public --add-service=http
sudo firewall-cmd --permanent --zone=public --add-service=https
sudo firewall-cmd --reload
Now you can check your browser by pointing it to your public IP address (http://IP_Address).
You will see the Nginx welcome page.
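You can also confirm from the command line; a request like the one below should return an HTTP 200 response with an nginx Server header (replace IP_Address with your external IP):
curl -I http://IP_Address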
Important Nginx configuration and log file locations:
/etc/nginx: the Nginx configuration directory
/etc/nginx/nginx.conf: the main Nginx configuration file
/etc/nginx/conf.d: directory for additional configuration files
/var/www/html: the default document root
/var/log/nginx/error.log: the error log
/var/log/nginx/access.log: the access log
Now you have learned how to install Nginx on your CentOS 8 server on any hosting platform.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
1670609400
Before moving forward to building an analytics stack on Google Cloud Platform, let's take a look at what a stack is. While many of us might confuse it with the stack data structure, that is not the case here. A stack here refers to a collection of technologies. As technologies and the cloud evolve every second, we can integrate technologies and applications into our software or solution stack. Every business is refining its processes by incorporating refined software stacks.
Building a stack makes it easier to work with components, as it brings modularity and increases composability. Using the oldest trick in the playbook, divide and conquer, while building a stack we break down complex components into simpler pieces that can be enhanced by adding other technologies, just like the data structure stack!
At the most rudimentary level, it is the bridge between raw data and insight. It is the combination of coherent applications that combine and probe to realize the value of data. Consider an analogy: like water, data is necessary, and pipelines bring it to your reservoir. Just as a building needs good plumbing, an organization that envisions becoming data-driven and wants to tap into this unextracted wealth needs a well-maintained stack to have a competitive edge over others. As the most profitable businesses continue to set new benchmarks for productivity and innovation, their rivals, regardless of scale, must adopt analytics to stay competitive. Fortunately, the elements of an analytics stack are getting easier to set up, maintain, and scale at a lower cost.
With the amount of data flowing, and the idea of a stack adding the weight of maintaining it, a stack becomes crucial for any company that wants to extract real value from its data. However, after working out solutions from existing resources, companies often hit a roadblock where they discover they lack the infrastructure to use their data fruitfully. They might not have the skill set required to analyze the information and change with it effectively. Every module and every component of the stack requires a unique skill set. While there are many big sharks in the tank, let us talk about using Google to build an analytics stack.
Ingest (App Engine, Pub/Sub, Cloud Functions): while we build an analytics stack on GCP, we can group the components that help under different categories.
Listed below are the components that help with building it on Google Cloud Platform.
Google Cloud Functions is a serverless environment that enables you to build cloud applications. It is a lightweight compute solution that allows you to build stand-alone serverless applications without any overhead burden of managing the environment or servers. With Cloud Functions, you write simple, single-purpose applications that run in an event-driven architecture.
Pub/Sub is an asynchronous managed messaging service that enables you to build truly event-driven applications by decoupling applications from each other. It helps to ingest data at high speed and high availability in real time for streaming applications. Pub/Sub by Google generally helps with balancing workloads in network clusters, implementing asynchronous workflows, distributing event notifications, refreshing distributed caches, logging to multiple systems, streaming data from various processes or devices, and, most importantly, improving reliability.
App Engine is a container service on Google's infrastructure, available preconfigured with several runtimes. It enables you to build and deploy load-heavy applications with the ability to process large volumes of data. Applications run in their own independent containers, enabling multi-server access that is easy to scale, with no overhead burden of managing cloud applications.
Capturing, analyzing, and processing data in real time is a tedious task. In addition, the incoming data might be unstructured or semi-structured, difficult to process, and not in the format required by the dependent downstream applications. GCP provides a solution for this: Dataflow. Dataflow is a stream and batch processing service that is fast and cost-effective. It helps automate and scale the process quickly without the burden of managing clusters. It is based on a simple source-sink architecture to transform your data. It provides modularity and the Apache Beam SDK, with which pipelines can be developed in languages like Python and Java.
Analysts and data scientists often find that the data provided is not ready for immediate use and spend most of their time cleaning data. This is where Dataprep comes into play, accelerating the process and making the business more responsive and data-driven. It is a visual data cleaning service that lets you visualize, explore, and clean data, and prepare it for further use.
BigQuery serves as the data warehouse when building an analytics stack on Google Cloud Platform.
While we build a stack, the most crucial element is data. Storing and querying this data can be time-consuming. BigQuery is Google's data warehouse, which solves this problem by combining fast SQL queries with Google's reliable enterprise infrastructure.
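As a small illustration of the kind of SQL BigQuery runs (this query against a public dataset is only an example, not part of the stack described here):
bq query --use_legacy_sql=false 'SELECT name, SUM(number) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name ORDER BY total DESC LIMIT 10'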
Data Studio helps build dashboards for the analytics stack on GCP.
Data Studio is a free tool that makes data easy to visualize, informative, and shareable in the form of customizable dashboards. It allows connecting to various data sources and visualizing data with highly configurable charts and tables. Data Studio enables you to share informative insights with the team by speeding up the report creation process. In short, it allows you to narrate your data through a story.
Stackdriver helps with monitoring the analytics stack.
It provides powerful monitoring, analysis, and diagnostics on Google Cloud Platform. Stackdriver equips you with insights into applications' health and performance, enabling you to find and fix issues faster. It also reports pattern detections and exhaustion predictions.
To conclude, we can say GCP provides a variety of technologies to build an analytics stack on Google Cloud Platform without the overhead of managing clusters and configurations. You can build an analytics pipeline from scratch, from ingestion to ETL to data preparation to warehousing to monitoring. All of it can be done in one place, on enterprise-grade, reliable infrastructure.
Original article source at: https://www.xenonstack.com/
1670342880
How to Restrict a User to a Specific Directory on Linux – Google Cloud. It is sometimes necessary to limit a user to specific privileges by restricting SSH or allowing access only to a specific directory.
This guide shows in detail how to restrict users to a specific directory by modifying the SSH configuration file. This is also known as a chroot jail setup.
This guide is tested on Google Cloud Platform running an Ubuntu 20.04 Linux machine. This setup will also work on AWS, Azure, or any cloud, VPS, or dedicated servers running any Linux distribution.
Create a new group to add all users inside this group.
sudo groupadd restriction
Now you can create a new user or add an existing user to the new restriction group.
If you want to create a new user you can follow this command.
sudo useradd -g restriction username
-g restriction will add the user to the restriction group we created above. If you need to prevent shell access, use the -s flag with the value /bin/false, which also prevents SFTP access. If SFTP is blocked you cannot access the server with SSH keys; in this case you need to set up FTP. To install and configure VSFTP you can follow this setup.
Here, we won't block shell access.
If you need to add the existing user to the group you can use this command.
sudo usermod -g restriction username
You can use the same command to create unlimited users.
Once the user is created and assigned to the group, you can configure SSH to limit access to a specific directory.
Open the SSH configuration file /etc/ssh/sshd_config
sudo nano /etc/ssh/sshd_config
Go to the bottom of the file, find the line starting with Subsystem sftp /usr/lib/openssh/sftp-server, and replace it with the following.
Subsystem sftp internal-sftp
Finally, add the lines below to the bottom.
Match user username
ChrootDirectory /path/to/folder
ForceCommand internal-sftp
AllowTcpForwarding no
X11Forwarding no
Hit CTRL + X followed by Y and Enter to save and exit the file.
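Before restarting, you can optionally ask sshd to validate the configuration; it prints nothing if the syntax is fine:
sudo sshd -t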
Now restart the SSH service to apply the changes.
sudo systemctl restart ssh
For CentOS or Fedora you can use the following command to restart the SSH service.
sudo systemctl restart sshd
Once SSH is restarted, when you access your instance you will be allowed to view only the directory you configured.
If you don't have password-based authentication enabled, you can set up SFTP to access your instance or server and test your configuration using FileZilla, WinSCP, or CyberDuck.
If you have your password set up, you can use these commands to check.
Open an SFTP connection to your server with the sftp command.
sftp username@IP_ADDRESS
Enter the password you set up before when prompted.
Now you will be logged in to the server and can see the sftp> prompt.
Run the pwd command; if the configuration is working fine you will get the output /.
Output
sftp> pwd
Remote working directory: /
Now you have learned how to restrict a user to a specific directory in Linux.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
1670332931
Setup Apache Virtual Hosts on Ubuntu 18.04 – Google Cloud. Apache Virtual Hosts allow you to configure your domain name on the server with a specific document root, security policies, SSL, and much more.
You can also configure more than one website on a single server with multiple Virtual Host configurations.
This setup is tested on an instance running Ubuntu 18.04 with Apache 2 installed on Google Cloud, so this guide will be useful for configuring Virtual Hosts on any cloud service like AWS or Azure, and on any VPS or dedicated servers.
The document root is the directory where the website files for a domain name are stored. You can set the document root to any location you wish; in this guide you will use the following directory structure.
/var/www/html
├── example.com
│ └── public
├── domain2.com
│ └── public
├── domain3.com
│ └── public
We can create a separate directory for each domain we want to host on our server inside the /var/www/html directory. Within each of these directories, we will create a public directory that will store the source code of each website.
We shall use the example.com domain name as a demo for this setup.
sudo mkdir -p /var/www/html/example.com/public
Now create a new index.html file inside the document root, that is /var/www/html/example.com/public
Create a demo file using nano editor.
sudo nano /var/www/html/example.com/public/index.html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>Welcome to example.com</title>
</head>
<body>
<h1>Success! example.com home page!</h1>
</body>
</html>
As we are running the commands as a sudo user, the newly created files and directories are owned by the root user. So you need to change the ownership and permissions of the document root directory to the Apache user (www-data).
sudo chmod -R 755 /var/www/html/example.com
sudo chown -R www-data:www-data /var/www/html/example.com
By default on Ubuntu systems, Apache Virtual Hosts configuration files are stored in the /etc/apache2/sites-available directory and can be enabled by creating symbolic links to the /etc/apache2/sites-enabled directory.
Open your editor of choice and create the following basic Virtual Host configuration file
sudo nano /etc/apache2/sites-available/example.conf
<VirtualHost *:80>
ServerName example.com
ServerAlias www.example.com
ServerAdmin webmaster@example.com
DocumentRoot /var/www/html/example.com/public
<Directory /var/www/html/example.com/public>
Options -Indexes +FollowSymLinks
AllowOverride All
</Directory>
ErrorLog ${APACHE_LOG_DIR}/example.com-error.log
CustomLog ${APACHE_LOG_DIR}/example.com-access.log combined
</VirtualHost>
ServerName: This is your domain name.
ServerAlias: All other domains that should match for this virtual host as well, such as the www subdomain.
DocumentRoot: The directory from which Apache will serve the domain files.
Options: This directive controls which server features are available in the specific directory.
-Indexes: Prevents directory listings.
FollowSymLinks: This option tells your web server to follow the symbolic links.
AllowOverride: Allows configuring and overriding the settings using a .htaccess file.
ErrorLog, CustomLog: Specifies the location for log files.
To enable the new virtual host file we need to create a symbolic link from the virtual host file to the sites-enabled directory, which is read by apache2 during startup.
The easiest way to enable the virtual host is by using the a2ensite helper tool.
sudo a2ensite example.conf
Once the configuration is enabled, test the configuration for any syntax errors with the following command.
sudo apachectl configtest
If there are no errors you will see the following output.
Syntax OK
Restart the Apache for the changes to take effect.
sudo service apache2 restart
To verify that everything is working as expected, open http://example.com in your browser to view the output of the index.html file created above.
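If DNS for example.com does not point to this server yet, you can still test the virtual host from the command line by sending the Host header explicitly (SERVER_IP is a placeholder for your external IP):
curl -H "Host: example.com" http://SERVER_IP/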
Now you have learned how to set up virtual hosts for Apache on Ubuntu 18.04 on Google Cloud Platform.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
1670106900
In this Terraform article, we will learn about using Terraform with Google Cloud Platform. Terraform allows teams to create and maintain reproducible infrastructure using human-readable code. Learn how to use Terraform together with the Google Cloud Platform.
Provisioning Immutable Infrastructure in GCP with Terraform
Infrastructure as code (IaC) is the practice of declaratively deploying infrastructure components (network, virtual machines, load balancers, etc.) using the same DevOps principles you use to develop applications. The same code always generates the same binary: Similarly, the same IaC code always provisions the same infrastructure components—no matter what environment you run it in. Used in conjunction with continuous delivery, IaC is a key DevOps practice.
IaC evolved to avoid environmental drift between different releases. Prior to this, teams maintained the configuration of each environment separately. This caused drifts in the environments over time, leading to inconsistencies among different environments. In turn, these inconsistencies caused issues in deployments and added to the workload of running and maintaining the environments.
IaC tools are both idempotent and declarative, which allows them to provision consistent and immutable infrastructure components, ensuring repeatable deployments and no environmental drifts. Idempotence means that no matter which state you start in, you'll always end up in the same final state. A declarative approach means that you define what the environment should look like, and the IaC tools take care of how to do it. The declarative code is usually written in well-documented code formats, such as JSON or YAML, and follows the same release cycle as application code. If you need to make a change to the infrastructure, you should change the code, rather than the infrastructure components directly.
Terraform by HashiCorp is an IaC tool that allows you to write your infrastructure configuration in human-readable and declarative files. Terraform’s plugin-based architecture helps you manage infrastructure on multiple cloud platforms, and its state management allows you to track changes throughout the deployments.
In this article, you will learn how to provision immutable infrastructure using Terraform on Google Cloud Platform (GCP).
Let's deploy a Cloud Run instance using Terraform. But before getting started, you need to set up gcloud and terraform on your system.
To install gcloud, follow the instructions in the official documentation.
Once installed, authenticate using the command below, then continue to follow the instructions so that Terraform can use the credentials to authenticate.
$ gcloud auth application-default login
Next follow the instructions to install Terraform based on your platform. This demo uses Terraform v1.0.8.
Note: Terraform uses HashiCorp Configuration Language (HCL) to declare the infrastructure components. Terraform source code is written in files ending with a .tf extension.
Now you’re ready to get started.
First, create a folder for all of your Terraform source code files. Let's call it gcp-terraform-demo.
Create a plugins.tf file, where you will configure Terraform's GCP plugin.
provider "google" {
project = "YOUR-PROJECT-ID"
region = "europe-west3"
version = "3.65.0"
}
This plugin implements Terraform resources to provision infrastructure components in GCP. You need to configure the Project ID of your GCP project to get started.
Next, create a main.tf file, in which you will write resources that you want to provision. Start by provisioning a Google Cloud Storage bucket to store the state of your Terraform code.
Add the following resource to the main.tf file:
resource "google_storage_bucket" "state-bucket" {
name = "terraform-state-bucket-demo"
location = "EU"
versioning {
enabled = true
}
}
Note: The name of a bucket should be globally unique, so you won’t be able to use the same bucket name in your demo.
Now you’re ready to run your code.
First, initialize your code by running the following command:
$ terraform init
This will initialize the backend for state and download the plugins that are defined in the plugins.tf file.
You will see the following log lines:
Initializing the backend...
Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.65.0...
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Next, run the plan command. The plan command works by finding the current state of the infrastructure and figuring out what changes need to be applied to reach the desired state.
$ terraform plan -out planfile
You will then see the output below, which contains how many resources need to change. In this case, you only declared one bucket that doesn’t exist, so you see 1 to add.
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_storage_bucket.state-bucket will be created
+ resource "google_storage_bucket" "state-bucket" {
+ bucket_policy_only = (known after apply)
+ force_destroy = false
+ id = (known after apply)
+ location = "EU"
+ name = "terraform-state-bucket-demo"
+ project = (known after apply)
+ self_link = (known after apply)
+ storage_class = "STANDARD"
+ uniform_bucket_level_access = (known after apply)
+ url = (known after apply)
+ versioning {
+ enabled = true
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
This plan was saved to: planfile
To perform exactly these actions, run the following command to apply:
terraform apply "planfile"
You also stored this plan information in a file called planfile by providing the -out switch in the plan command. In the next step, this will allow you to apply the exact changes that your plan command showed you.
Apply these changes to provision your bucket.
$ terraform apply planfile
You’ll see the following output:
google_storage_bucket.state-bucket: Creating...
google_storage_bucket.state-bucket: Creation complete after 2s [id=terraform-state-bucket-demo]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path: terraform.tfstate
Now go to your Google Cloud Console. You’ll see your bucket there.
Once the changes are applied, go to the gcp-terraform-demo folder. There, you'll see a terraform.tfstate file that was created by applying the changes. This file stores the current state of your infrastructure components, but it's on your local machine.
If someone else tried to run this code from another machine, they wouldn’t have access to this state, so they’d try to provision the same bucket again. This would fail because a bucket with the same name already exists. You can see the problem...
That’s where Terraform’s remote state comes into play. If you store the state in a GCS bucket (which everyone in your team can access, no matter from where you run your Terraform code), you’ll always start from the same state.
Add a state.tf file with the following configuration:
terraform {
backend "gcs" {
bucket = "terraform-state-bucket-demo"
prefix = "demo/state"
}
}
Make sure the bucket name is the same as the one you provisioned in the main.tf file.
Initialize the module again using the terraform init command. This time, you'll be asked if a local state already exists, and whether you wish to copy the local state to the remote backend. Type yes. The rest of the initialization will be the same as when you ran the terraform init command to initialize the module.
Now go to the Google Cloud Console and navigate to the bucket you created. You'll see that the terraform.tfstate file has been copied from your local machine to the bucket.
If you check your Terraform code into the SCM repository, anyone with access can clone and run it. The benefit of remote state is that it can be shared, so you can collaborate with your team.
With remote state out of the way, let’s move towards provisioning Cloud Run resources.
Add the following resource blocks to the main.tf file:
resource "google_cloud_run_service" "nginx-service" {
name = "nginx-service"
location = "europe-west3"
template {
spec {
containers {
image = "marketplace.gcr.io/google/nginx1"
ports {
container_port = 80
}
}
}
}
traffic {
percent = 100
latest_revision = true
}
}
resource "google_cloud_run_service_iam_member" "member" {
location = google_cloud_run_service.nginx-service.location
project = google_cloud_run_service.nginx-service.project
service = google_cloud_run_service.nginx-service.name
role = "roles/run.invoker"
member = "allUsers"
}
Here you are adding two resource types: google_cloud_run_service and google_cloud_run_service_iam_member:
- google_cloud_run_service: Defines the Cloud Run service with the usual parameters, such as name, location, container image, and ports.
- google_cloud_run_service_iam_member: Adds a member to Cloud Run Identity and Access Management (IAM). This particular resource allows allUsers (everyone on the Internet) to access your NGINX Cloud Run service.
Note: This IAM configuration is sufficient for demo purposes. For production, make sure to narrow down access and only grant access to the services which need it.
Once you follow the plan and apply steps, you should see your nginx-service in your Cloud Run dashboard. Navigate to the Details page to get the URL of the service, then try to access it. You should see the NGINX default home page.
Now that you’ve seen how to provision infrastructure with Terraform, let’s look at how you can manage different environments using the same code base by using variables. Variables are placeholders for which you can provide the values at runtime. You can employ variables to use the same code with different variable values and provision infrastructure components in different environments.
Here, you'll modify your code to use two variables: project and environment. You'll then provision two different sets of Cloud Run services using the same code, but passing in different values.
First, add another file, called variables.tf, with the following content:
variable "environment" {
type = string
}
variable "project" {
type = string
}
Now update the google_cloud_run_service resource in main.tf to use these variables.
resource "google_cloud_run_service" "nginx-service" {
name = "${var.environment}-nginx-service"
location = "europe-west3"
template {
spec {
containers {
image = "marketplace.gcr.io/google/nginx1"
ports {
container_port = 80
}
}
}
}
traffic {
percent = 100
latest_revision = true
}
}
You should also update the plugins.tf file to use the project variable.
provider "google" {
project = var.project
region = "europe-west3"
version = "3.65.0"
}
Now run the plan command. It should prompt you to provide the values for these variables. You can also do this via the CLI or a variable file.
var.environment
Enter a value: development
var.project
Enter a value: coder-society
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
google_storage_bucket.state-bucket: Refreshing state... [id=terraform-state-bucket-demo]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_cloud_run_service.nginx-service will be created
+ resource "google_cloud_run_service" "nginx-service" {
+ autogenerate_revision_name = false
+ id = (known after apply)
+ location = "europe-west3"
+ name = "development-nginx-service"
+ project = (known after apply)
+ status = (known after apply)
+ metadata {
+ annotations = (known after apply)
+ generation = (known after apply)
+ labels = (known after apply)
+ namespace = (known after apply)
+ resource_version = (known after apply)
+ self_link = (known after apply)
+ uid = (known after apply)
}
+ template {
+ metadata {
+ annotations = (known after apply)
+ generation = (known after apply)
+ labels = (known after apply)
+ name = (known after apply)
+ namespace = (known after apply)
+ resource_version = (known after apply)
+ self_link = (known after apply)
+ uid = (known after apply)
}
+ spec {
+ container_concurrency = (known after apply)
+ serving_state = (known after apply)
+ timeout_seconds = (known after apply)
+ containers {
+ image = "marketplace.gcr.io/google/nginx1"
+ ports {
+ container_port = 80
}
+ resources {
+ limits = (known after apply)
+ requests = (known after apply)
}
}
}
}
+ traffic {
+ latest_revision = true
+ percent = 100
}
}
# google_cloud_run_service_iam_member.member will be created
+ resource "google_cloud_run_service_iam_member" "member" {
+ etag = (known after apply)
+ id = (known after apply)
+ location = "europe-west3"
+ member = "allUsers"
+ project = (known after apply)
+ role = "roles/run.invoker"
+ service = "development-nginx-service"
}
Plan: 2 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
Next, apply the changes. This will force Terraform to create/update/delete some of the resources to achieve the desired state.
Now you can run the same code with different variable values for environment and project to provision the same resources in different environments.
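Instead of typing values at the prompt, you can pass them on the command line or keep them in per-environment .tfvars files; for example (the values and file name below are placeholders):
$ terraform plan -var="environment=staging" -var="project=your-project-id" -out planfile
$ terraform apply planfile
# or keep the values in a file and reference it explicitly
$ terraform plan -var-file="staging.tfvars" -out planfile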
Once you are done with an environment, you can tear it down just as easily. Simply issue the following command:
$ terraform destroy
You will be asked for all the variable values. When prompted, type yes. It's important to note, however, that your state bucket won't get deleted, and you'll see an error message, as shown below.
Error: Error trying to delete bucket terraform-state-bucket-demo containing objects without `force_destroy` set to true
This is by design, since you don’t want someone to accidentally destroy the state bucket (that’s why you didn’t set force_destroy to true). All the other resources (Cloud Run service and IAM) will be successfully destroyed.
Now that your infrastructure components are defined via code, you’ll want to apply versioning practices to them—just like you do with software code. You can store your Terraform files in GIT and follow the same branching and versioning strategy that you used for your application code.
As shown earlier, if you add resources or modify the existing resources (in the code), Terraform will automatically detect the changes and do what’s needed to ensure that the final state of the infrastructure looks exactly the same as what was declared in the code.
Let's say that you already have a lot of resources manually deployed in your Google Cloud. Now, you want to use Terraform to provision any future resources and you want to follow IaC principles. In such cases, you can import your existing cloud resources (which were deployed previously) into Terraform's purview. Use tools like Terraformer to create the tf resource files for existing infrastructure resources and import their state.
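For example, a hypothetical import of a manually created bucket into the state used in this article might look like this (the exact import ID format for each resource is documented in the provider docs):
$ terraform import google_storage_bucket.state-bucket terraform-state-bucket-demo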
This tutorial shows you how to start implementing resources from scratch and follows best practices. However, you don't have to start from scratch all the time. Rather, you can use pre-defined Terraform modules that follow Google's best practices, available in the Cloud Foundation Toolkit Github repository. Using these modules will help you get started with Terraform more quickly.
IaC principles allow teams to provision repeatable and immutable infrastructure using DevOps practices. With Terraform and its human-readable configuration language, HCL, you can define the desired state of your infrastructure components and leave the rest up to the tool itself. The Terraform configuration files can be checked in to source control and can follow the same versioning strategy as your application code.
Original article sourced at: https://codersociety.com
1669723441
Cloud Pub/Sub is a message queuing service that allows you to exchange messages between applications and microservices. It’s a scalable, durable, and highly available message-passing system that helps you build event-driven architectures. In this tutorial, we will show you how to use the Google Cloud console to publish and receive messages in Cloud Pub/Sub. We will also provide some tips on how to use the service more effectively.
There are many benefits to using Pub/Sub, including the ability to:
- Easily scale your message processing
- Decouple your message producers from your message consumers
- Guarantee delivery of messages by using Google's global infrastructure
- Monitor the health of your message processing system with built-in metrics and logs
Pub/Sub is a flexible, reliable, and cost-effective way to process large volumes of messages. It is extremely efficient at moving data between databases and storage systems. A large queue of tasks can be easily distributed among multiple workers for a balanced workload. And because Pub/Sub decouples your message producers from your message consumers, you can add or remove consumers without affecting your producers.
Pub/Sub consists of two different services:
Pub/Sub and Pub/Sub Lite are both horizontally scalable, managed messaging services. As the name suggests, Pub/Sub Lite is less capable than Pub/Sub, and in general Pub/Sub is the default choice.
The following questions can help you choose the right Pub/Sub messaging service:
Please go through the official document to understand the differences between the two.
First, we need to set up a Google Cloud console project.
Create a Topic:
Subscribe to the Topic:
Publish a message on the topic:
Pull the data Published:
You'll be able to see the messages you just published; each message carries the data you sent and the time it was published (the equivalent gcloud commands are sketched below).
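Here is that hedged gcloud sketch of the same walkthrough from the command line (topic and subscription names are placeholders):
# Create a topic and a pull subscription attached to it
gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic=my-topic

# Publish a message on the topic
gcloud pubsub topics publish my-topic --message="hello from pub/sub"

# Pull the published data and acknowledge it
gcloud pubsub subscriptions pull my-sub --auto-ack --limit=5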
Google Cloud Pub/Sub is a great way to receive and publish messages in an asynchronous manner. It’s simple to use and can be easily integrated into your existing applications. In this article, we’ve shown you how to use the Google Cloud console to publish and receive messages in Pub/Sub. Give it a try and see how it can benefit your application development workflow.
Original article source at: https://blog.knoldus.com/
1666061064
This GCP Terraform tutorial will give you an overview of Terraform with Google Cloud Platform and will help you understand various important concepts that concern GCP Terraform, with practical implementation. GCP Terraform Tutorial | What Is Terraform | Terraform With Google Cloud Platform
INTRODUCTION
The purpose of this article is to show a full Google Cloud Platform (GCP) environment built using Terraform automation. I'll walk through the setup process for Google Cloud Platform and Terraform. I will be creating everything from scratch: a VPC network, four sub-networks (two in each region, labeled public and private), firewall rules allowing HTTP traffic and SSH access, and finally two virtual instances, one in each sub-network, running as web servers.
At the end of my deployment, I will have a Google Cloud Platform (GCP) environment setup with two web servers running in different regions as shown below:
GCP Environment and Terraform directory structure
Let’s get started with defining some terms and technology:
Terraform: a tool used to turn infrastructure development into code.
Google Cloud SDK: command line utility for managing Google Cloud Platform resources.
Google Cloud Platform: cloud-based infrastructure environment.
Google Compute Engine: resource that provides virtual systems to Google Cloud Platform customers.
You might be asking — Why use Terraform?
Terraform has become popular because it has a simple syntax that allows easy modularity and works across multiple clouds. One important reason people consider Terraform is to manage their infrastructure as code.
Installing Terraform:
It is easy to install if you haven't already. I am using Linux:
sudo yum install -y zip unzip (if these are not installed)
wget https://releases.hashicorp.com/terraform/0.X.X/terraform_0.X.X_linux_amd64.zip (replace x with your version)
unzip terraform_0.11.6_linux_amd64.zip
sudo mv terraform /usr/local/bin/
Confirm the terraform binary is accessible: terraform --version
Make sure Terraform works:
$ terraform -v
Terraform v0.11.6
Downloading and configuring Google Cloud SDK
Now that we have Terraform installed, we need to set up the command line utility to interact with our services on Google Cloud Platform. This will allow us to authenticate to our account on Google Cloud Platform and subsequently use Terraform to manage infrastructure.
Download and install Google Cloud SDK:
$ curl https://sdk.cloud.google.com | bash
Initialize the gcloud environment:
$ gcloud init
You’ll be able to connect your Google account with the gcloud environment by following the on-screen instructions in your browser. If you’re stuck, try checking out the official documentation.
Configuring our Service Account on Google Cloud Platform
Next, I will create a project, set up a service account, and set the correct permissions to manage the project's resources (a gcloud sketch follows the list).
· Create a project and name it whatever you’d like.
· Create a service account and specify the compute admin role.
· Download the generated JSON file and save it to your project’s directory.
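Here is that hedged gcloud sketch (project, service account, and file names are placeholders, not the exact names used in this article):
# Create a project
gcloud projects create my-terraform-demo --name="terraform-demo"

# Create a service account and grant it the compute admin role
gcloud iam service-accounts create terraform-sa --display-name="terraform"
gcloud projects add-iam-policy-binding my-terraform-demo \
  --member="serviceAccount:terraform-sa@my-terraform-demo.iam.gserviceaccount.com" \
  --role="roles/compute.admin"

# Generate and download the JSON key used by Terraform
gcloud iam service-accounts keys create ./credentials.json \
  --iam-account=terraform-sa@my-terraform-demo.iam.gserviceaccount.com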
TERRAFORM PROJECT FILE STRUCTURE
Terraform processes all the files inside the working directory, so it does not matter whether everything is contained in a single file or divided across many. Still, it is convenient to organize resources into logical groups and split them into different files. Let's look at how we can do this effectively:
Terraform File Structure
Root level: All tf files are contained in GCP folder
main.tf: This is where I execute Terraform from. It contains the following sections:
a) Provider section: defines Google as the provider
b) Module section: GCP resources that points to each module in module folder
c) Output section: Displays outputs after terraform apply
variable.tf: This is where I define all the variables that go into main.tf. Each module's variable.tf contains static values, such as regions, plus the other variables that I pass in through the main variables.tf.
Only the main variable.tf needs to be modified. I kept it simple so I don't have to modify every variable file under each module.
backend.tf: Captures and saves the tfstate in a Google Cloud Storage bucket so that I can share it with other developers.
Module folders: I am using three main modules here: global, ue1, and uc1.
* The global module has resources that are not region-specific, such as the VPC network and firewall rules
* The uc1 and ue1 modules contain region-based resources. Each module creates two sub-networks (one public and one private) in its region, four in total, plus one instance per region
Within my directory structure, I have packaged regional resources under one module per region and global resources in a separate module; that way I only have to define the variables for a given region once per module. IAM is another resource that you can define under the global module.
I am running terraform init, plan, and apply from the main folder where I have defined all GCP resources. I will post another article in the future dedicated to Terraform modules: when and why it is best to use modules, and which resources should be packaged in a module.
main.tf creates all GCP resources that are defined under each module folder. You can see that each module's source points to a relative path within my directory structure. You can also store modules in a VCS such as GitHub.
provider "google" {
project = "${var.var_project}"
}
module "vpc" {
source = "../modules/global"
env = "${var.var_env}"
company = "${var.var_company}"
var_uc1_public_subnet = "${var.uc1_public_subnet}"
var_uc1_private_subnet= "${var.uc1_private_subnet}"
var_ue1_public_subnet = "${var.ue1_public_subnet}"
var_ue1_private_subnet= "${var.ue1_private_subnet}"
}
module "uc1" {
source = "../modules/uc1"
network_self_link = "${module.vpc.out_vpc_self_link}"
subnetwork1 = "${module.uc1.uc1_out_public_subnet_name}"
env = "${var.var_env}"
company = "${var.var_company}"
var_uc1_public_subnet = "${var.uc1_public_subnet}"
var_uc1_private_subnet= "${var.uc1_private_subnet}"
}
module "ue1" {
source = "../modules/ue1"
network_self_link = "${module.vpc.out_vpc_self_link}"
subnetwork1 = "${module.ue1.ue1_out_public_subnet_name}"
env = "${var.var_env}"
company = "${var.var_company}"
var_ue1_public_subnet = "${var.ue1_public_subnet}"
var_ue1_private_subnet= "${var.ue1_private_subnet}"
}
######################################################################
# Display Output Public Instance
######################################################################
output "uc1_public_address" { value = "${module.uc1.uc1_pub_address}"}
output "uc1_private_address" { value = "${module.uc1.uc1_pri_address}"}
output "ue1_public_address" { value = "${module.ue1.ue1_pub_address}"}
output "ue1_private_address" { value = "${module.ue1.ue1_pri_address}"}
output "vpc_self_link" { value = "${module.vpc.out_vpc_self_link}"}
Variable.tf
I have used variables for the CIDR range of each sub-network and for the project name. I am also using variables to name GCP resources, so that I can easily identify which environment a resource belongs to. All variables are defined in the variables.tf file. Every variable is of type string.
variable "var_project" {
default = "project-name"
}
variable "var_env" {
default = "dev"
}
variable "var_company" {
default = "company-name"
}
variable "uc1_private_subnet" {
default = "10.26.1.0/24"
}
variable "uc1_public_subnet" {
default = "10.26.2.0/24"
}
variable "ue1_private_subnet" {
default = "10.26.3.0/24"
}
variable "ue1_public_subnet" {
default = "10.26.4.0/24"
}
VPC.tf
In the VPC file, I have configured the routing mode as global and disabled automatic sub-network creation, because GCP otherwise creates sub-networks in every region during VPC creation. I am also creating firewall rules attached to the VPC to allow ICMP, TCP, and UDP traffic within the internal network, HTTP to tagged web servers, and external SSH access to my bastion host.
resource "google_compute_network" "vpc" {
name = "${format("%s","${var.company}-${var.env}-vpc")}"
auto_create_subnetworks = "false"
routing_mode = "GLOBAL"
}
resource "google_compute_firewall" "allow-internal" {
name = "${var.company}-fw-allow-internal"
network = "${google_compute_network.vpc.name}"
allow {
protocol = "icmp"
}
allow {
protocol = "tcp"
ports = ["0-65535"]
}
allow {
protocol = "udp"
ports = ["0-65535"]
}
source_ranges = [
"${var.var_uc1_private_subnet}",
"${var.var_ue1_private_subnet}",
"${var.var_uc1_public_subnet}",
"${var.var_ue1_public_subnet}"
]
}
resource "google_compute_firewall" "allow-http" {
name = "${var.company}-fw-allow-http"
network = "${google_compute_network.vpc.name}"
allow {
protocol = "tcp"
ports = ["80"]
}
target_tags = ["http"]
}
resource "google_compute_firewall" "allow-bastion" {
name = "${var.company}-fw-allow-bastion"
network = "${google_compute_network.vpc.name}"
allow {
protocol = "tcp"
ports = ["22"]
}
target_tags = ["ssh"]
}
Network.tf
In the network.tf file, I set up the public and private sub-networks and attach each sub-network to my VPC. The values for the regions come from the variables.tf files defined within each sub-module folder (not shown here). I have two network.tf files, one in each module folder; the only difference between them is the region, us-east1 vs us-central1.
resource "google_compute_subnetwork" "public_subnet" {
name = "${format("%s","${var.company}-${var.env}-${var.region_map["${var.var_region_name}"]}-pub-net")}"
ip_cidr_range = "${var.var_uc1_public_subnet}"
network = "${var.network_self_link}"
region = "${var.var_region_name}"
}
resource "google_compute_subnetwork" "private_subnet" {
name = "${format("%s","${var.company}-${var.env}-${var.region_map["${var.var_region_name}"]}-pri-net")}"
ip_cidr_range = "${var.var_uc1_private_subnet}"
network = "${var.network_self_link}"
region = "${var.var_region_name}"
}
Instance.tf
Here, I am creating a virtual machine instance and a network interface within the sub-network, and then attaching the network interface to the instance. I am also running a userdata script that installs nginx as part of instance creation and boot (the script uses apt-get, so it expects a Debian-family image). I have two instance.tf files, one in each module folder; the only difference between them is the region, us-east1 vs us-central1.
resource "google_compute_instance" "default" {
name = "${format("%s","${var.company}-${var.env}-${var.region_map["${var.var_region_name}"]}-instance1")}"
machine_type = "n1-standard-1"
#zone = "${element(var.var_zones, count.index)}"
zone = "${format("%s","${var.var_region_name}-b")}"
tags = ["ssh","http"]
boot_disk {
initialize_params {
image = "centos-7-v20180129"
}
}
labels {
webserver = "true"
}
metadata {
startup-script = <<SCRIPT
apt-get -y update
apt-get -y install nginx
export HOSTNAME=$(hostname | tr -d '\n')
export PRIVATE_IP=$(curl -sf -H 'Metadata-Flavor:Google' http://metadata/computeMetadata/v1/instance/network-interfaces/0/ip | tr -d '\n')
echo "Welcome to $HOSTNAME - $PRIVATE_IP" > /usr/share/nginx/www/index.html
service nginx start
SCRIPT
}
network_interface {
subnetwork = "${google_compute_subnetwork.public_subnet.name}"
access_config {
// Ephemeral IP
}
}
}
$ terraform init
Initializing modules...
- module.vpc
- module.uc1
- module.ue1
Initializing provider plugins...
The following providers do not have any version constraints in configuration,
so the latest version was installed.
To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.
* provider.google: version = "~> 1.20"
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Terraform Plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ module.uc1.google_compute_instance.default
id: <computed>
boot_disk.#: "1"
boot_disk.0.auto_delete: "true"
boot_disk.0.device_name: <computed>
boot_disk.0.disk_encryption_key_sha256: <computed>
boot_disk.0.initialize_params.#: "1"
boot_disk.0.initialize_params.0.image: "debian-9-stretch-v20180227"
boot_disk.0.initialize_params.0.size: <computed>
boot_disk.0.initialize_params.0.type: <computed>
can_ip_forward: "false"
cpu_platform: <computed>
create_timeout: "4"
deletion_protection: "false"
guest_accelerator.#: <computed>
instance_id: <computed>
label_fingerprint: <computed>
labels.%: "1"
labels.webserver: "true"
machine_type: "n1-standard-1"
metadata_fingerprint: <computed>
name: "company-dev-uc1-instance1"
network_interface.#: "1"
network_interface.0.access_config.#: "1"
network_interface.0.access_config.0.assigned_nat_ip: <computed>
network_interface.0.access_config.0.nat_ip: <computed>
network_interface.0.access_config.0.network_tier: <computed>
network_interface.0.address: <computed>
network_interface.0.name: <computed>
network_interface.0.network_ip: <computed>
network_interface.0.subnetwork: "company-dev-uc1-pub-net"
network_interface.0.subnetwork_project: <computed>
project: <computed>
scheduling.#: <computed>
self_link: <computed>
tags.#: "2"
tags.2541227442: "http"
tags.4002270276: "ssh"
tags_fingerprint: <computed>
zone: "us-central1-a"
+ module.uc1.google_compute_subnetwork.private_subnet
id: <computed>
creation_timestamp: <computed>
fingerprint: <computed>
gateway_address: <computed>
ip_cidr_range: "10.26.1.0/24"
name: "company-dev-uc1-pri-net"
network: "${var.network_self_link}"
project: <computed>
region: "us-central1"
secondary_ip_range.#: <computed>
self_link: <computed>
+ module.uc1.google_compute_subnetwork.public_subnet
id: <computed>
creation_timestamp: <computed>
fingerprint: <computed>
gateway_address: <computed>
ip_cidr_range: "10.26.2.0/24"
name: "company-dev-uc1-pub-net"
network: "${var.network_self_link}"
project: <computed>
region: "us-central1"
secondary_ip_range.#: <computed>
self_link: <computed>
+ module.ue1.google_compute_instance.default
id: <computed>
boot_disk.#: "1"
boot_disk.0.auto_delete: "true"
boot_disk.0.device_name: <computed>
boot_disk.0.disk_encryption_key_sha256: <computed>
boot_disk.0.initialize_params.#: "1"
boot_disk.0.initialize_params.0.image: "centos-7-v20180129"
boot_disk.0.initialize_params.0.size: <computed>
boot_disk.0.initialize_params.0.type: <computed>
can_ip_forward: "false"
cpu_platform: <computed>
create_timeout: "4"
deletion_protection: "false"
guest_accelerator.#: <computed>
instance_id: <computed>
label_fingerprint: <computed>
labels.%: "1"
labels.webserver: "true"
machine_type: "n1-standard-1"
metadata_fingerprint: <computed>
name: "company-dev-ue1-instance1"
network_interface.#: "1"
network_interface.0.access_config.#: "1"
network_interface.0.access_config.0.assigned_nat_ip: <computed>
network_interface.0.access_config.0.nat_ip: <computed>
network_interface.0.access_config.0.network_tier: <computed>
network_interface.0.address: <computed>
network_interface.0.name: <computed>
network_interface.0.network_ip: <computed>
network_interface.0.subnetwork: "company-dev-ue1-pub-net"
network_interface.0.subnetwork_project: <computed>
project: <computed>
scheduling.#: <computed>
self_link: <computed>
tags.#: "2"
tags.2541227442: "http"
tags.4002270276: "ssh"
tags_fingerprint: <computed>
zone: "us-east1-b"
+ module.ue1.google_compute_subnetwork.private_subnet
id: <computed>
creation_timestamp: <computed>
fingerprint: <computed>
gateway_address: <computed>
ip_cidr_range: "10.26.3.0/24"
name: "company-dev-ue1-pri-net"
network: "${var.network_self_link}"
project: <computed>
region: "us-east1"
secondary_ip_range.#: <computed>
self_link: <computed>
+ module.ue1.google_compute_subnetwork.public_subnet
id: <computed>
creation_timestamp: <computed>
fingerprint: <computed>
gateway_address: <computed>
ip_cidr_range: "10.26.4.0/24"
name: "company-dev-ue1-pub-net"
network: "${var.network_self_link}"
project: <computed>
region: "us-east1"
secondary_ip_range.#: <computed>
self_link: <computed>
+ module.vpc.google_compute_firewall.allow-bastion
id: <computed>
allow.#: "1"
allow.803338340.ports.#: "1"
allow.803338340.ports.0: "22"
allow.803338340.protocol: "tcp"
creation_timestamp: <computed>
destination_ranges.#: <computed>
direction: <computed>
name: "company-fw-allow-bastion"
network: "company-dev-vpc"
priority: "1000"
project: <computed>
self_link: <computed>
source_ranges.#: <computed>
target_tags.#: "1"
target_tags.4002270276: "ssh"
+ module.vpc.google_compute_firewall.allow-http
id: <computed>
allow.#: "1"
allow.272637744.ports.#: "1"
allow.272637744.ports.0: "80"
allow.272637744.protocol: "tcp"
creation_timestamp: <computed>
destination_ranges.#: <computed>
direction: <computed>
name: "company-fw-allow-http"
network: "company-dev-vpc"
priority: "1000"
project: <computed>
self_link: <computed>
source_ranges.#: <computed>
target_tags.#: "1"
target_tags.2541227442: "http"
+ module.vpc.google_compute_firewall.allow-internal
id: <computed>
allow.#: "3"
allow.1367131964.ports.#: "0"
allow.1367131964.protocol: "icmp"
allow.2250996047.ports.#: "1"
allow.2250996047.ports.0: "0-65535"
allow.2250996047.protocol: "tcp"
allow.884285603.ports.#: "1"
allow.884285603.ports.0: "0-65535"
allow.884285603.protocol: "udp"
creation_timestamp: <computed>
destination_ranges.#: <computed>
direction: <computed>
name: "company-fw-allow-internal"
network: "company-dev-vpc"
priority: "1000"
project: <computed>
self_link: <computed>
source_ranges.#: "4"
source_ranges.1778211439: "10.26.2.0/24"
source_ranges.2728495562: "10.26.3.0/24"
source_ranges.3215243634: "10.26.4.0/24"
source_ranges.4016646337: "10.26.1.0/24"
+ module.vpc.google_compute_network.vpc
id: <computed>
auto_create_subnetworks: "false"
gateway_ipv4: <computed>
name: "company-dev-vpc"
project: <computed>
routing_mode: "GLOBAL"
self_link: <computed>
Plan: 10 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
Terraform apply Outputs
Output from Terraform apply
Google Console Output Screenshots:
GCP Network
GCP Instance dashboard
NGINX installed using metadata
Terraform destroy output
Terraform is great because of its vibrant open source community, its simple module paradigm, and the fact that it's cloud agnostic. However, there are limitations to the open source tool.
Terraform Enterprise (TFE) provides a host of additional features and functionality that address those limitations and enable enterprises to effectively scale Terraform implementations across the organization, unlocking infrastructure bottlenecks and freeing up developers to innovate rather than configure servers.
#terraform #gcp #googlecloud #cloudcomputing
1666060302
This Google Cloud Platform tutorial will give you an introduction to Google Cloud Platform and will help you understand various important concepts that concern cloud computing and Google Cloud Platform, with practical implementation. Google Cloud Platform Tutorial | What is Google Cloud Platform | GCP Training
Following pointers are covered in this Google Cloud Platform Tutorial:
Do you have the knowledge and skills to design a mobile gaming analytics platform that collects, stores, and analyzes large amounts of bulk and real-time data?
Well, after reading this article, you will.
I aim to take you from zero to hero in Google Cloud Platform (GCP) in just one article. I will show you how to:
Once I have explained all the topics in this list, I will share with you a solution to the system I described.
If you do not understand some parts of it, you can go back to the relevant sections. And if that is not enough, visit the links to the documentation that I have provided.
Are you up for a challenge? I have selected a few questions from old GCP Professional Certification exams. They will test your understanding of the concepts explained in this article.
I recommend trying to solve both the design and the questions on your own, going back to the guide if necessary. Once you have an answer, compare it to the proposed solution.
Try to go beyond what you are reading and ask yourself what would happen if requirement X changed:
And any other scenarios you can think of.
At the end of the day, you are not paid just for what you know but for your thought process and the decisions you make. That is why it is vitally important that you exercise this skill.
At the end of the article, I'll provide more resources and next steps if you want to continue learning about GCP.
GCP currently offers a 3 month free trial with $300 US dollars of free credit. You can use it to get started, play around with GCP, and run experiments to decide if it is the right option for you.
You will NOT be charged at the end of your trial. You will be notified and your services will stop running unless you decide to upgrade your plan.
I strongly recommend using this trial to practice. To learn, you have to try things on your own, face problems, break things, and fix them. It doesn't matter how good this guide is (or the official documentation for that matter) if you do not try things out.
Consuming resources from GCP, like storage or computing power, provides the following benefits:
GCP makes it easy to experiment and use the resources you need in an economical way.
In general, you will only be charged for the time your instances are running. Google will not charge you for stopped instances. However, if they consume resources, like disks or reserved IPs, you might incur charges.
Here are some ways you can optimize the cost of running your applications in GCP.
GCP provides different machine families with predefined amounts of RAM and CPUs:
Besides, you can create your custom machine with the amount of RAM and CPUs you need.
You can use preemptible virtual machines to save up to 80% of your costs. They are ideal for fault-tolerant, non-critical applications. You can save the progress of your job in a persistent disk using a shut-down script to continue where you left off.
Google may stop your instances at any time (with a 30-second warning) and will always stop them after 24 hours.
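A minimal sketch of creating a preemptible instance with gcloud (the instance name, zone, and machine type are placeholders):
gcloud compute instances create batch-worker-1 \
  --zone=us-central1-a \
  --machine-type=n1-standard-1 \
  --preemptible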
To reduce the chances of getting your VMs shut down, Google recommends:
Note: Startup and shutdown scripts apply to non-preemptible VMs as well. You can use them to control the behavior of your machine when it starts or stops, for instance to install software, download data, or back up logs.
There are two options to define these scripts: pass the script directly in the instance metadata, or store it in Cloud Storage and reference its URL.
The latter is preferred because it makes it easier to create many instances and to manage the script.
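A hedged sketch of both options with gcloud (instance names, the local script path, and the bucket are placeholders):
# Option 1: pass the script directly as instance metadata
gcloud compute instances create web-1 \
  --zone=us-central1-a \
  --metadata-from-file=startup-script=./startup.sh

# Option 2 (preferred): store the script in Cloud Storage and reference it
gcloud compute instances create web-2 \
  --zone=us-central1-a \
  --metadata=startup-script-url=gs://my-demo-bucket/startup.sh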
The longer you use your virtual machines (and Cloud SQL instances), the higher the discount - up to 30%. Google does this automatically for you.
You can get up to 57% discount if you commit to a certain amount of CPU and RAM resources for a period of 1 to 3 years.
To estimate your costs, use the Price Calculator. This helps prevent any surprises with your bills and create budget alerts.
In this section, I will explain how you can manage and administer your Google Cloud resources.
There are four types of resources that can be managed through Resource Manager:
There are quotas that limit the maximum number of resources you can create to prevent unexpected spikes in billing. However, most quotas can be increased by opening a support ticket.
Resources in GCP follow a hierarchy via a parent/child relationship, similar to a traditional file system, where:
This hierarchical organization helps you manage common aspects of your resources, such as access control and configuration settings.
You can create super admin accounts that have access to every resource in your organization. Since they are very powerful, make sure you follow Google's best practices.
Labels are key-value pairs you can use to organize your resources in GCP. Once you attach a label to a resource (for instance, to a virtual machine), you can filter based on that label. This is useful also to break down your bills by labels.
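For instance, a hedged sketch of attaching labels to an instance and then filtering by them (instance name, zone, and label values are placeholders):
# Attach labels to an existing instance
gcloud compute instances add-labels web-1 \
  --zone=us-central1-a \
  --labels=env=dev,team=payments

# List only the instances carrying a given label
gcloud compute instances list --filter="labels.env=dev"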
Some common use cases:
These two similar concepts seem to generate some confusion. I have summarized the differences in this table:
LABELS | NETWORK TAGS
--- | ---
Applied to any GCP resource | Applied only for VPC resources
Just organize resources | Affect how resources work (ex: through application of firewall rules)
Simply put, Cloud IAM controls who can do what on which resource. A resource can be a virtual machine, a database instance, a user, and so on.
It is important to notice that permissions are not directly assigned to users. Instead, they are bundled into roles, which are assigned to members. A policy is a collection of one or more bindings of a set of members to a role.
In a GCP project, identities are represented by Google accounts, created outside of GCP, and defined by an email address (not necessarily @gmail.com). There are different types:
Regarding service accounts, some of Google's best practices include:
A role is a collection of permissions. There are three types of roles:
When assigning roles, follow the principle of least privilege, too. In general, prefer predefined over primitive roles.
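For example, a minimal sketch of binding a member to a predefined role at the project level (the project ID, user, and role are placeholders):
gcloud projects add-iam-policy-binding my-demo-project \
  --member="user:jane@example.com" \
  --role="roles/storage.objectViewer"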
Cloud Deployment Manager automates repeatable tasks like provisioning, configuration, and deployments for any number of machines.
It is Google's Infrastructure as Code service, similar to Terraform - although you can deploy only GCP resources. It is used by GCP Marketplace to create pre-configured deployments.
You define your configuration in YAML files, listing the resources (created through API calls) you want to create and their properties. Resources are defined by their name (VM-1, disk-1), type (compute.v1.disk, compute.v1.instance) and properties (zone:europe-west4, boot:false).
To increase performance, resources are deployed in parallel. Therefore you need to specify any dependencies using references. For instance, do not create virtual machine VM-1 until the persistent disk disk-1 has been created. In contrast, Terraform would figure out the dependencies on its own.
You can modularize your configuration files using templates so that they can be independently updated and shared. Templates can be defined in Python or Jinja2. The contents of your templates will be inlined in the configuration file that references them.
Deployment Manager will create a manifest containing your original configuration, any templates you have imported, and the expanded list of all the resources you want to create.
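A hedged sketch of the deployment lifecycle with gcloud, assuming the configuration described above lives in a file called config.yaml (deployment and file names are placeholders):
# Create the deployment from the configuration file
gcloud deployment-manager deployments create my-deployment --config=config.yaml

# Apply changes after editing the configuration
gcloud deployment-manager deployments update my-deployment --config=config.yaml

# Tear everything down
gcloud deployment-manager deployments delete my-deployment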
Operations provide a set of tools for monitoring, logging, debugging, error reporting, profiling, and tracing of resources in GCP (AWS and even on-premise).
Cloud Logging is GCP's centralized solution for real-time log management. For each of your projects, it allows you to store, search, analyze, monitor, and alert on logging data:
A log is a named collection of log entries. A log entry records a status or an event and includes the name of its log, for example, compute.googleapis.com/activity. There are two main types of logs:
First, User Logs:
Second, Security logs, divided into:
VPC flow logs are specific to VPC networks (which I will introduce later). They record a sample of network flows sent from and received by VM instances, which can later be accessed in Cloud Logging.
They can be used to monitor network performance and usage, for forensics and real-time security analysis, and for expense optimization.
Note: you may want to log your billing data for analysis. In this case, you do not create a sink. You can directly export your reports to BigQuery.
Cloud Monitoring lets you monitor the performance of your applications and infrastructure, visualize it in dashboards, create uptime checks to detect resources that are down and alert you based on these checks so that you can fix problems in your environment. You can monitor resources in GCP, AWS, and even on-premise.
It is recommended to create a separate project for Cloud Monitoring since it can keep track of resources across multiple projects.
Also, it is recommended to install a monitoring agent in your virtual machines to send application metrics (including many third-party applications) to Cloud Monitoring. Otherwise, Cloud Monitoring will only display CPU, disk traffic, network traffic, and uptime metrics.
To receive alerts, you must declare an alerting policy. An alerting policy defines the conditions under which a service is considered unhealthy. When the conditions are met, a new incident will be created and notifications will be sent (via email, Slack, SMS, PagerDuty, etc).
A policy belongs to an individual workspace, which can contain a maximum of 500 policies.
Trace helps find bottlenecks in your services. You can use this service to figure out how long it takes to handle a request, which microservice takes the longest to respond, where to focus to reduce the overall latency, and so on.
It is enabled by default for applications running on Google App Engine (GAE) - Standard environment - but can be used for applications running on GCE, GKE, and Google App Engine Flexible.
Error Reporting will aggregate and display errors produced in services written in Go, Java, Node.js, PHP, Python, Ruby, or .NET running on GCE, GKE, GAE, Cloud Functions, or Cloud Run.
Debug lets you inspect the application's state without stopping your service. Currently supported for Java, Go, Node.js and Python. It is automatically integrated with GAE but can be used on GCE, GKE, and Cloud Run.
Profiler continuously gathers CPU usage and memory-allocation information from your applications. To use it, you need to install a profiling agent.
In this section, I will cover both Google Cloud Storage (for any type of data, including files, images, video, and so on), the different database services available in GCP, and how to decide which storage option works best for you.
GCS is Google's storage service for unstructured data: pictures, videos, files, scripts, database backups, and so on.
Objects are placed in buckets, from which they inherit permissions and storage classes.
Storage classes provide different SLAs for storing your data to minimize costs for your use case. A bucket's storage class can be changed (under some restrictions), but it will affect new objects added to the bucket only.
In addition to Google's console, you can interact with GCS from the command line using gsutil, as sketched below.
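A hedged sketch of common gsutil operations (bucket, object, and location names are placeholders):
# Create a bucket with an explicit location and storage class
gsutil mb -l europe-west1 -c standard gs://my-demo-bucket

# Upload and list objects
gsutil cp ./backup.sql gs://my-demo-bucket/backups/
gsutil ls gs://my-demo-bucket/backups/

# Change the storage class of an existing object
gsutil rewrite -s nearline gs://my-demo-bucket/backups/backup.sql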
Another option to upload files to GCS is Storage Transfer Service (STS), a service that imports data to a GCS bucket from:
If you need to upload huge amounts of data (from hundreds of terabytes up to one petabyte) consider Data Transfer Appliance: ship your data to a Google facility. Once they have uploaded the data to GCS, the process of data rehydration reconstitutes the files so that they can be accessed again.
You can define rules that determine what will happen to an object (will it be archived or deleted) when a certain condition is met.
For example, you could define a policy to automatically change the storage class of an object from Standard to Nearline after 30 days and to delete it after 180 days.
This is the way a rule can be defined:
{
  "lifecycle": {
    "rule": [
      {
        "action": { "type": "Delete" },
        "condition": { "age": 30, "isLive": true }
      },
      {
        "action": { "type": "Delete" },
        "condition": { "numNewerVersions": 2 }
      },
      {
        "action": { "type": "Delete" },
        "condition": { "age": 180, "isLive": false }
      }
    ]
  }
}
It can be applied through gsutil or a REST API call. Rules can also be created through the Google Console.
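For example, a minimal sketch assuming the JSON above is saved as lifecycle.json (the bucket name is a placeholder):
gsutil lifecycle set lifecycle.json gs://my-demo-bucket
# Inspect the policy currently attached to the bucket
gsutil lifecycle get gs://my-demo-bucket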
In addition to IAM roles, you can use Access Control Lists (ACLs) to manage access to the resources in a bucket.
Use IAM roles when possible, but remember that ACLs grant access to buckets and individual objects, while IAM roles are project or bucket wide permissions. Both methods work in tandem.
To grant temporary access to users outside of GCP, use Signed URLs.
Bucket locks allow you to enforce a minimum retention period for objects in a bucket. You may need this for auditing or legal reasons.
Once a bucket is locked, it cannot be unlocked. To remove the bucket, you first need to remove all objects in it, which you can only do after they have all reached the retention period specified by the retention policy. Only then can you delete the bucket.
You can include the retention policy when you are creating the bucket or add a retention policy to an existing bucket (it retroactively applies to existing objects in the bucket too).
Fun fact: the maximum retention period is 100 years.
Cloud SQL and Cloud Spanner are two managed database services available in GCP. If you do not want to deal with all the work necessary to maintain a database online, they are a great option. You can always spin a virtual machine and manage your own database.
Cloud SQL provides access to a managed MySQL or PostgreSQL database instance in GCP. Each instance is limited to a single region and has a maximum capacity of 30 TB.
Google will take care of the installation, backups, scaling, monitoring, failover, and read replicas. For availability reasons, replicas must be defined in the same region but a different zone from the primary instances.
Data can be easily imported (by first uploading it to Google Cloud Storage and then into the instance) and exported using SQL dumps or CSV files. Data can be compressed to reduce costs (you can directly import .gz files). For "lift and shift" migrations, this is a great option.
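A hedged sketch of an export/import round trip through Cloud Storage with gcloud (instance, bucket, and database names are placeholders):
# Export a database to a compressed SQL dump in a bucket
gcloud sql export sql my-instance gs://my-demo-bucket/dump.sql.gz --database=mydb

# Import the dump into another (or the same) instance
gcloud sql import sql my-new-instance gs://my-demo-bucket/dump.sql.gz --database=mydb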
If you need global availability or more capacity, consider using Cloud Spanner.
Cloud Spanner is globally available and can scale (horizontally) very well.
These two features make it capable of supporting different use cases than Cloud SQL and more expensive too. Cloud Spanner is not an option for lift and shift migrations.
Similarly, GCP provides two managed NoSQL databases, Bigtable and Datastore, as well as an in-memory database service, Memorystore.
Datastore is a completely no-ops, highly-scalable document database ideal for web and mobile applications: game states, product catalogs, real-time inventory, and so on. It's great for:
By default, Datastore has a built-in index that improves performance on simple queries. You can create your own indices, called composite indexes, defined in YAML format.
If you need extreme throughput (huge number of reads/writes per second), use Bigtable instead.
Bigtable is a NoSQL database ideal for analytical workloads where you can expect a very high volume of writes, reads in the milliseconds, and the ability to store terabytes to petabytes of information. It's great for:
Bigtable requires the creation and configuration of your nodes (as opposed to the fully-managed Datastore or BigQuery). You can add or remove nodes to your cluster with zero downtime. The simplest way to interact with Bigtable is the command-line tool cbt.
Bigtable's performance will depend on the design of your database schema.
Since this topic is worth an article on its own, I recommend you read the documentation.
It provides a managed version of Redis and Memcache (in-memory databases), resulting in very fast performance. Instances are regional, like Cloud SQL, and have a capacity of up to 300 GB.
Google loves decision trees. This one will help you choose the right database for your projects. For unstructured data, consider GCS, or process it using Dataflow (discussed later).
You can use the same network infrastructure that Google uses to run its services: YouTube, Search, Maps, Gmail, Drive, and so on.
Google infrastructure is divided into:
GCP infrastructure is designed in a way that all traffic between regions travels through a global private network, resulting in better security and performance.
On top of this infrastructure, you can build networks for your resources, Virtual Private Clouds. They are software-defined networks, where all the traditional network concepts apply:
You can create hybrid networks connecting your on-premise infrastructure to your VPC.
When you create a project, a default network will be created with subnets in each region (auto mode). You can delete this network, but you need to create at least one network to be able to create virtual machines.
You can also create your custom networks, where no subnets are created by default and you have full control over subnet creation (custom mode).
The main goal of a VPC is the separation of network resources. A GCP project is a way to organize resources and manage permissions.
Users of project A need permissions to access resources in project B. All users can access any VPC defined in any project to which they belong. Within the same VPC, resources in subnet 1 need to be granted access to resources in subnet 2.
In terms of IAM roles, there is a distinction between who can create network resources (Network admin, to create subnets, virtual machines, and so on) and who is responsible for the security of the resources (Security Admin, to create firewall rules, SSL certificates, and so on).
The Compute Instance Admin role combines both roles.
As usual, there are quotas and limits to what you can do in a VPC, amongst them:
Shared VPCs are a way to share resources between different projects within the same organization. This allows you to control billing and manage access to the resources in different projects, following the principle of least privilege. Otherwise you'd have to put all the resources in a single project.
To design a shared VPC, projects fall under three categories:
You will only be able to communicate between resources created after you define your host and service projects. Any existing resources before this will not be part of the shared VPC.
Shared VPCs can be used when all the projects belong to the same organization. However, if:
VPC Network peering is the right solution.
In the next section, I will discuss how to connect your VPC(s) with networks outside of GCP.
There are three options to connect your on-premise infrastructure to GCP:
Each of them with different capabilities, use cases, and prices that I will describe in the following sections.
With Cloud VPN, your traffic travels through the public internet over an encrypted tunnel. Each tunnel has a maximum capacity of 3 Gb per second and you can use a maximum of 8 for better performance. These two characteristics make VPN the cheapest option.
You can define two types of routes between your VPC and your on-premise networks:
Your traffic gets encrypted and decrypted by VPN Gateways (in GCP, they are regional resources).
To have a more robust connection, consider using multiple VPN gateways and tunnels. In case of failure, this redundancy guarantees that traffic will still flow.
With Cloud VPN, traffic travels through the public internet. With Cloud Interconnect, there is a direct physical connection between your on-premises network and your VPC. This option will be more expensive but will provide the best performance.
There are two types of interconnect available, depending on how you want your connection to GCP to materialize:
Cloud peering is not a GCP service, but you can use it to connect your network to Google's network and access services like Youtube, Drive, or GCP services.
A common use case is when you need to connect to Google but don't want to do it over the public internet.
In GCP, load balancers are pieces of software that distribute user requests among a group of instances.
A load balancer may have multiple backends associated with it, having rules to decide the appropriate backend for a given request.
There are different types of load balancers. They differ in the type of traffic (HTTP vs TCP/UDP - Layer 7 or Layer 4), whether they handle external or internal traffic, and whether their scope is regional or global:
For the visual learners:
Cloud DNS is Google's managed Domain Name System (DNS) host, both for internal and external (public) traffic. It will map URLs like https://www.freecodecamp.org/ to an IP address. It is the only service in GCP with 100% SLA - it is available 100% of the time.
Cloud CDN is Google's Content Delivery Network. If you have data that does not change often (images, videos, CSS, etc.), it makes sense to cache it close to your users. Cloud CDN provides 90 Edge Points of Presence (POPs) to cache the data close to your end-users.
After the first request, static data can be stored in a POP, usually much closer to your user than your main servers. Thus, in subsequent requests, you can retrieve the data faster from the POP and reduce the load on your backend servers.
I will present 4 places where your code can run in GCP:
Note: there is a 5th option: Firebase is Google's mobile platform that helps you quickly develop apps.
Compute engine allows you to spin up virtual machines in GCP. This section will be longer since GCE provides the infrastructure where GKE and GAE run.
In the introduction, I talked about the different types of VMs you can create in GCE. Now, I will cover where to store the data, how to back it up, and how to create instances with all the data and configuration you need.
Your data can be stored in Persistent disks, Local SSDs, or in Cloud Storage.
Persistent disks provide durable and reliable block storage. They are not local to the machine. Rather, they are networked attached, which has its pros and cons:
Every instance will need one boot disk and it must be of this type.
Local SSDs are attached to a VM to which they provide high-performance ephemeral storage. As of now, you can attach up to eight 375GB local SSDs to the same instance. However, this data will be lost if the VM is killed.
Local SSDs can only be attached to a machine when it is created, but you can attach both local SSDs and persistent disks to the same machine.
Both types of disks are zonal resources.
Cloud Storage
We have extensively covered GCS in a previous section. GCS is not a filesystem, but you can use GCS-Fuse to mount GCS buckets as filesystems in Linux or macOS systems. You can also let apps download and upload data to GCS using standard filesystem semantics.
Snapshots are backups of your disks. To reduce space, they are created incrementally:
This is enough to restore the state of your disk.
Even though snapshots can be taken without stopping the instance, it is best practice to at least reduce its activity, stop writing data to disk, and flush buffers. This helps you make sure you get an accurate representation of the content of the disk.
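A minimal sketch of taking a snapshot of a persistent disk with gcloud (disk, snapshot, and zone names are placeholders):
gcloud compute disks snapshot my-data-disk \
  --zone=us-central1-a \
  --snapshot-names=my-data-disk-snap-1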
Images refer to the operating system images needed to create boot disks for your instances. There are two types of images:
You might be asking yourself what is the difference between an image and a snapshot. Mainly, their purpose. Snapshots are taken as incremental backups of a disk while images are created to spin up new virtual machines and configure instance templates.
Note on images vs startup scripts:
For simple setups, startup scripts are also an option. They can be used to test changes quickly, but the VMs will take longer to be ready compared to using an image where all the needed software is installed, configured, and so on.
Instance groups let you treat a group of instances as a single unit and they come in two flavors:
To create a MIG, you need to define an instance template, specifying your machine type, zone, OS image, startup and shutdown scripts, and so on. Instance templates are immutable.
To update a MIG, you need to create a new template and use the Managed Instance Group Updater to deploy the new version to every machine in the group.
This functionality can be used to create canary tests, deploying your changes to a small fraction of your machines first.
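A hedged sketch of such a canary rollout on a managed instance group (group, template, and zone names are placeholders):
# Keep most of the group on the current template, send 10% to the new one
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --zone=us-central1-a \
  --version=template=app-template-v1 \
  --canary-version=template=app-template-v2,target-size=10%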
Visit this link to know more about Google's recommendations to ensure an application deployed via a managed instance group can handle the load even if an entire zone fails.
To increase the security of your infrastructure in GCE, have a look at:
App Engine is a great choice when you want to focus on the code and let Google handle your infrastructure. You just need to choose the region where your app will be deployed (this cannot be changed once it is set). Amongst its main use cases are websites, mobile apps, and game backends.
You can easily update the version of your app that is running via the command line or the Google Console.
Also, if you need to deploy a risky update to your application, you can split the traffic between the old and the risky versions for a canary deployment. Once you are happy with the results, you can route all the traffic to the new version.
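A hedged sketch of such a traffic split between two App Engine versions (service and version names are placeholders):
# Deploy the new version without routing traffic to it
gcloud app deploy --version=v2 --no-promote

# Send 10% of traffic to the new version, keep 90% on the old one
gcloud app services set-traffic default --splits=v1=0.9,v2=0.1

# Once you are happy with the results, route everything to v2
gcloud app services set-traffic default --splits=v2=1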
There are two App Engine environments:
Regardless of the environment, there are no up-front costs and you only pay for what you use (billed per second).
Memcache is built into App Engine, giving you the possibility to choose between a shared cache (the default, free option) or a dedicated cache for better performance.
Visit this link to know more about the best practices you should follow to maximize the performance of your app.
Kubernetes is an open-source container orchestration system, developed by Google.
Kubernetes is a very extensive topic in itself and I will not cover here. You just need to know that GKE makes it easy to run and manage your Kubernetes clusters on GCP.
Google also provides Container Registry to store your container images - think of it as your private Docker Hub.
Note: You can use Cloud Build to run your builds in GCP and, among other things, produce Docker images and store them in Container Registry. Cloud Build can import your code from Google Cloud Storage, Cloud Source Repository, GitHub, or Bitbucket.
Cloud Functions are the equivalent of Lambda functions in AWS. Cloud functions are serverless. They let you focus on the code and not worry about the infrastructure where it is going to run.
With Cloud Functions it is easy to respond to events such as uploads to a GCS bucket or messages in a Pub/Sub topic. You are only charged for the time your function is running in response to an event.
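For example, a hedged sketch of deploying two event-driven functions with gcloud (function, bucket, and topic names, plus the runtime, are placeholders):
# Triggered whenever an object is uploaded to a bucket
gcloud functions deploy process_upload \
  --runtime=python39 \
  --trigger-bucket=my-demo-bucket

# Triggered whenever a message arrives on a Pub/Sub topic
gcloud functions deploy handle_message \
  --runtime=python39 \
  --trigger-topic=my-topic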
BigQuery is Google's serverless data warehouse and provides analytics capabilities for petabyte-scale databases.
BigQuery automatically backs up your tables, but you can always export them to GCS to be on the safe side - incurring extra costs.
Data can be ingested in batches (for instance, from a GCS bucket) or from a stream in multiple formats: CSV, JSON, Parquet, or Avro (most performant). Also, you can query data that resides in external sources, called federated sources, for example, GCS buckets.
You can interact with your data in BigQuery using SQL via the bq command-line tool, for example:
bq query 'SELECT field FROM ...'
User-Defined Functions allow you to combine SQL queries with JavaScript functions to create complex operations.
BigQuery is a columnar data store: records are stored in columns. Tables are collections of columns and datasets are collections of tables.
Jobs are actions to load, export, query, or copy data that BigQuery runs on your behalf.
Views are virtual tables defined by a SQL query. They are useful for sharing data with others when you want to control exactly what they have access to.
Two important concepts related to tables are:
Using IAM roles, you can control access at a project, dataset, or view level, but not at the table level. Roles are complex for BigQuery, so I recommend checking the documentation.
For instance, the jobUser role only lets you run jobs while the user role lets you run jobs and create datasets (but not tables).
Your costs depend on how much data you store and stream into BigQuery and how much data you query. To reduce costs, BigQuery automatically caches previous queries (per user). This behavior can be disabled.
When you don't edit data for 90 days, it automatically moves to a cheaper storage class. You pay for what you use, but it is possible to opt for a flat rate (only if you need more than the 2000 slots that are allocated by default).
Check these links to see how to optimize your performance and costs.
Pub/Sub is Google's fully-managed message queue, allowing you to decouple publishers (adding messages to the queue) and subscribers (consuming messages from the queue).
Although it is similar to Kafka, Pub/Sub is not a direct substitute. They can be combined in the same pipeline (Kafka deployed on-premise or even in GKE). There are open-source plugins to connect Kafka to GCP, like Kafka Connect.
Pub/Sub guarantees that every message will be delivered at least once but it does not guarantee that messages will be processed in order. It is usually connected to Dataflow to process the data, ensure that the messages are processed in order, and so on.
Pub/Sub support both push and pull modes:
Cloud Tasks is another fully-managed service to execute tasks asynchronously and manage messages between services. However, there are differences between Cloud Tasks and Pub/Sub:
For more details, check out this link.
Cloud Dataflow is Google's managed service for stream and batch data processing, based on Apache Beam.
You can define pipelines that will transform your data, for example before it is ingested in another service like BigQuery, BigTable, or Cloud ML. The same pipeline can process both stream and batch data.
A common pattern is to stream data into Pub/Sub, let's say from IoT devices, process it in Dataflow, and store it for analysis in BigQuery.
But Pub/Sub does not guarantee that the order in which messages are pushed to the topics will be the order in which the messages are consumed. However, this can be done with Dataflow.
Cloud Dataproc is Google's managed Hadoop and Spark ecosystem. It lets you create and manage your clusters easily and turn them off when you are not using them, to reduce costs.
Dataproc can only be used to process batch data, while Dataflow can handle also streaming data.
Google recommends using Dataproc for a lift and leverage migration of your on-premise Hadoop clusters to the cloud.
Otherwise, you should choose Cloud Dataflow.
Cloud Dataprep provides you with a web-based interface to clean and prepare your data before processing. The input and output formats include, among others, CSV, JSON, and Avro.
After defining the transformations, a Dataflow job will run. The transformed data can be exported to GCS, BigQuery, etc.
Cloud Composer is Google's fully-managed Apache Airflow service to create, schedule, monitor, and manage workflows. It handles all the infrastructure for you so that you can concentrate on combining the services I have described above to create your own workflows.
Under the hood, a GKE cluster will be created with Airflow in it and GCS will be used to store files.
Covering the basics of machine learning would take another article. So here, I assume you are familiar with it and will show you how to train and deploy your models in GCP.
We'll also look at what APIs are available to leverage Google's machine learning capabilities in your services, even if you are not an expert in this area.
AI Platform provides you with a fully-managed platform to use machine learning libraries like Tensorflow. You just need to focus on your model and Google will handle all the infrastructure needed to train it.
After your model is trained, you can use it to get online and batch predictions.
Google lets you use your data to train their models. You can leverage models to build applications that are based on natural language processing (for example, document classification or sentiment analysis applications), speech processing, machine translation, or video processing (video classification or object detection).
Data Studio lets you create visualizations and dashboards based on data that resides in Google services (YouTube Analytics, Sheets, AdWords, local upload), Google Cloud Platform (BigQuery, Cloud SQL, GCS, Spanner), and many third-party services, storing your reports in Google Drive.
Data Studio is not part of GCP, but G-Suite, thus its permissions are not managed using IAM.
There are no additional costs for using Data Studio, other than the storage of the data, queries in BigQuery, and so on. Caching can be used to improve performance and reduce costs.
Datalab lets you explore, analyze, and visualize data in BigQuery, ML Engine, Compute Engine, Cloud Storage, and Stackdriver.
It is based on Jupyter notebooks and supports Python, SQL, and Javascript code. Your notebooks can be shared via the Cloud Source Repository.
Cloud Datalab itself is free of charge, but it will create a virtual machine in GCE for which you will be billed.
Google Cloud encrypts data both at rest (data stored on disk) and in transit (data traveling in the network), using AES implemented via Boring SSL.
You can manage the encryption keys yourself (both storing them in GCP or on-premise) or let Google handle them.
GCP encrypts data stored at rest by default. Your data will be divided into chunks. Each chunk is distributed across different machines and encrypted with a unique key, called a data encryption key (DEK).
Keys are generated and managed by Google but you can also manage the keys yourself, as we will see later in this guide.
To add an extra security layer, all communications between two GCP services or from your infrastructure to GCP are encrypted at one or more network layers. Your data would not be compromised if your messages were to be intercepted.
As I mentioned earlier, you can let Google manage the keys for you or you can manage them yourself.
Google KMS is the service that allows you to manage your encryption keys. You can create, rotate, and destroy symmetric encryption keys. All key-related activity is registered in logs. These keys are referred to as customer-managed encryption keys.
In GCS, these keys are used to encrypt the object's data, while Google uses server-side keys to handle the rest of the metadata, including the object's name.
The DEKs used to encrypt your data are also encrypted using key encryption keys (KEKs), in a process called envelope encryption. By default, KEKs are rotated every 90 days.
It is important to note that KMS does not store secrets. KMS is a central repository for KEKs: it only holds the keys that GCP needs to encrypt secrets that are stored somewhere else, for instance in a secrets-management service.
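To make envelope encryption more concrete, here is a conceptual sketch assuming a hypothetical project, key ring, and key: the data is encrypted locally with a DEK, and only the DEK is sent to Cloud KMS to be wrapped by a KEK. This illustrates the pattern, not the exact mechanism GCP uses internally.

```python
# A conceptual sketch of envelope encryption: encrypt the data locally with a
# data encryption key (DEK), then wrap only the DEK with a key encryption key
# (KEK) stored in Cloud KMS. Project, key ring, and key names are hypothetical.
from cryptography.fernet import Fernet
from google.cloud import kms

# 1. Generate a local DEK and encrypt the payload with it.
dek = Fernet.generate_key()
ciphertext = Fernet(dek).encrypt(b"sensitive payload")

# 2. Wrap the DEK with a KEK managed by Cloud KMS.
client = kms.KeyManagementServiceClient()
kek_name = client.crypto_key_path("my-project", "global", "my-key-ring", "my-kek")
wrapped_dek = client.encrypt(request={"name": kek_name, "plaintext": dek}).ciphertext

# Store `ciphertext` and `wrapped_dek` together. To read the data back, call
# client.decrypt(...) to unwrap the DEK, then Fernet(dek).decrypt(ciphertext).
```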
Note: For GCE and GCS, you have the option of keeping your keys on-premise and letting Google retrieve them only to encrypt and decrypt your data. These are known as customer-supplied encryption keys.
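As a hedged example, this is roughly what using a customer-supplied encryption key with the Cloud Storage Python client looks like; the bucket, object, and file names are hypothetical.

```python
# A hedged sketch of a customer-supplied encryption key (CSEK) with the Cloud
# Storage Python client. Bucket, object, and file names are hypothetical; the
# key stays on your side and is only sent along with each request.
import os

from google.cloud import storage

csek = os.urandom(32)  # a 256-bit AES key that you generate and manage yourself

client = storage.Client()
bucket = client.bucket("my-bucket")
blob = bucket.blob("reports/2023.csv", encryption_key=csek)

blob.upload_from_filename("2023.csv")   # object is encrypted with your key
blob.download_to_filename("copy.csv")   # downloading requires the same key
```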
Identity-Aware Proxy allows you to control access to GCP applications over HTTPS without installing any VPN software or adding extra code to your application to handle login.
Your applications are visible to the public internet, but only accessible to authorized users, implementing a zero-trust security access model.
Furthermore, with TCP forwarding you can prevent services like SSH from being exposed to the public internet.
Cloud Armor protects your infrastructure from distributed denial-of-service (DDoS) attacks. You define rules (for example, to allow or deny certain IP addresses or CIDR ranges) that make up security policies, which are enforced at the Point of Presence level (closer to the source of the attack).
Cloud Armor gives you the option of previewing the effects of your policies before activating them.
Data Loss Prevention is a fully-managed service designed to help you discover, classify, and protect sensitive data, such as personally identifiable information (PII).
DLP is integrated with GCS, BigQuery, and Datastore. The source of the data can also be outside of GCP.
You can specify what types of data you're interested in, called info types, define your own types (based on dictionaries of words and phrases or on regular expressions), or let Google apply the default set of info types, which can be time-consuming for large amounts of data.
For each result, DLP returns the likelihood that the piece of data matches a certain info type: LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
After detecting a piece of PII, DLP can transform it so that it cannot be mapped back to the user. DLP uses multiple techniques to de-identify your sensitive data like tokenization, bucketing, and date shifting. DLP can detect and redact sensitive data in images too.
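To give a feel for the API, here is a minimal sketch of inspecting a piece of text with the DLP Python client for two built-in info types; the project ID and sample text are hypothetical.

```python
# A minimal sketch of inspecting text with the DLP API for two built-in info
# types. The project ID and sample text are hypothetical.
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # hypothetical project

response = client.inspect_content(
    request={
        "parent": parent,
        "item": {"value": "Call me at 555-123-4567, card 4111-1111-1111-1111."},
        "inspect_config": {
            "info_types": [{"name": "PHONE_NUMBER"}, {"name": "CREDIT_CARD_NUMBER"}],
            "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
            "include_quote": True,
        },
    }
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood.name, finding.quote)
```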
VPC Service Controls helps prevent data exfiltration. It allows you to define a perimeter around the resources you want to protect, specifying which services can access them and from which networks.
Cloud Web Security Scanner scans applications running in Compute Engine, GKE, and App Engine for common vulnerabilities such as passwords in plain text, invalid headers, outdated libraries, and cross-site scripting attacks. It simulates a real user trying to click on your buttons, input text in your text fields, and so on.
It is part of Cloud Security Command Center.
If you're interested in learning more about GCP, I recommend checking the free practice exams for the different certifications. Whether you are preparing for a GCP certification or not, you can use them to find gaps in your knowledge:
Note: Some questions are based on case studies. Links to the case studies will be provided in the exams so that you have the full context to properly understand and answer the question.
I've extracted 10 questions from some of the exams above. Some of them are pretty straightforward. Others require deeper thought and deciding which option is best when more than one is viable.
Your customer is moving their corporate applications to Google Cloud. The security team wants detailed visibility of all resources in the organization. You use the Resource Manager to set yourself up as the Organization Administrator.
Which Cloud Identity and Access Management (Cloud IAM) roles should you give to the security team while following Google's recommended practices?
A. Organization viewer, Project owner
B. Organization viewer, Project viewer
C. Organization administrator, Project browser
D. Project owner, Network administrator
Your company wants to try out the cloud with low risk. They want to archive approximately 100 TB of their log data to the cloud and test the serverless analytics features available to them there, while also retaining that data as a long-term disaster recovery backup.
Which two steps should they take? (Choose two)
A. Load logs into BigQuery.
B. Load logs into Cloud SQL.
C. Import logs into Cloud Logging.
D. Insert logs into Cloud Bigtable.
E. Upload log files into Cloud Storage.
Your company wants to track whether someone is present in a meeting room reserved for a scheduled meeting.
There are 1000 meeting rooms across 5 offices on 3 continents. Each room is equipped with a motion sensor that reports its status every second.
You want to support the data ingestion needs of this sensor network. The receiving infrastructure needs to account for the possibility that the devices may have inconsistent connectivity.
Which solution should you design?
A. Have each device create a persistent connection to a Compute Engine instance and write messages to a custom application.
B. Have devices poll for connectivity to Cloud SQL and insert the latest messages on a regular interval to a device-specific table.
C. Have devices poll for connectivity to Cloud Pub/Sub and publish the latest messages on a regular interval to a shared topic for all devices.
D. Have devices create a persistent connection to an App Engine application fronted by Cloud Endpoints, which ingest messages and write them to Cloud Datastore.
To reduce costs, the Director of Engineering has required all developers to move their development infrastructure resources from on-premises virtual machines (VMs) to Google Cloud.
These resources go through multiple start/stop events during the day and require the state to persist.
You have been asked to design the process of running a development environment in Google Cloud while providing cost visibility to the finance department.
Which two steps should you take? (Choose two)
A. Use persistent disks to store the state. Start and stop the VM as needed.
B. Use the --auto-delete flag on all persistent disks before stopping the VM.
C. Apply the VM CPU utilization label and include it in the BigQuery billing export.
D. Use BigQuery billing export and labels to relate cost to groups.
E. Store all state in a Local SSD, snapshot the persistent disks and terminate the VM.
The database administration team has asked you to help them improve the performance of their new database server running on Compute Engine.
The database is used for importing and normalizing the company’s performance statistics. It is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD zonal persistent disk which they can't restart until the next maintenance event.
What should they change to get better performance from this system as soon as possible and in a cost-effective manner?
A. Increase the virtual machine’s memory to 64 GB.
B. Create a new virtual machine running PostgreSQL.
C. Dynamically resize the SSD persistent disk to 500 GB.
D. Migrate their performance metrics warehouse to BigQuery.
Your organization has a 3-tier web application deployed in the same Google Cloud Virtual Private Cloud (VPC).
Each tier (web, API, and database) scales independently of the others. Network traffic should flow through the web to the API tier, and then on to the database tier. Traffic should not flow between the web and the database tier.
How should you configure the network with minimal steps?
A. Add each tier to a different subnetwork.
B. Set up software-based firewalls on individual VMs.
C. Add tags to each tier and set up routes to allow the desired traffic flow.
D. Add tags to each tier and set up firewall rules to allow the desired traffic flow.
You are developing an application on Google Cloud that will label famous landmarks in users’ photos. You are under competitive pressure to develop a predictive model quickly. You need to keep service costs low.
What should you do?
A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to supply the image labels.
B. Build an application that calls the Cloud Vision API. Pass client image locations as base64-encoded strings.
C. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Pass client image locations as base64-encoded strings.
D. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Inspect the generated MID values to supply the image labels.
You set up an autoscaling managed instance group to serve web traffic for an upcoming launch.
After configuring the instance group as a backend service to an HTTP(S) load balancer, you notice that virtual machine (VM) instances are being terminated and re-launched every minute. The instances do not have a public IP address.
You have verified that the appropriate web response is coming from each instance using the curl command. You want to ensure that the backend is configured correctly.
What should you do?
A. Ensure that a firewall rule exists to allow source traffic on HTTP/HTTPS to reach the load balancer.
B. Assign a public IP to each instance and configure a firewall rule to allow the load balancer to reach the instance public IP.
C. Ensure that a firewall rule exists to allow load balancer health checks to reach the instances in the instance group.
D. Create a tag on each instance with the name of the load balancer. Configure a firewall rule with the name of the load balancer as the source and the instance tag as the destination.
You created a job that runs daily to import highly sensitive data from an on-premises location to Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running on a Compute Engine instance.
You need to encrypt the data at rest and supply your own encryption key. Your key should not be stored in the Google Cloud.
What should you do?
A. Create a dedicated service account and use encryption at rest to reference your data stored in Cloud Storage and Compute Engine data as part of your API service calls.
B. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in Cloud Storage. Use your uploaded encryption key and reference it as part of your API service calls to encrypt your data in the Kafka node hosted on Compute Engine.
C. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in your Kafka node hosted on Compute Engine.
D. Supply your own encryption key, and reference it as part of your API service calls to encrypt your data in Cloud Storage and your Kafka node hosted on Compute Engine.
You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably.
What should you do?
A. Use Cloud Spanner for storage. Monitor storage usage and increase node count if more than 70% utilized.
B. Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
C. Use Cloud Bigtable for storage. Monitor data stored and increase node count if more than 70% is utilized.
D. Use Cloud Bigtable for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
At the beginning of this article, I said you'd learn how to design a mobile gaming analytics platform that collects, stores, and analyzes vast amounts of player telemetry, both from batches of data and from real-time events.
So, do you think you can do it?
Take a pen and a piece of paper and try to come up with your own solution based on the services I have described here. If you get stuck, ask yourself how the data will be ingested, stored, processed, and visualized.
I have purposely defined the problem in a very vague way. This is what you can expect when you are facing this sort of challenge: uncertainty. It is part of your job to gather requirements and document your assumptions.
Do not worry if your solution does not look like Google's. This is just one possible solution. Learning to design complex systems is a skill that takes a lifetime to master. Luckily, you're headed in the right direction.
This guide will help you get started on GCP and give you a broad perspective of what you can do with it.
By no means will you be an expert after finishing this guide, or any other guide for that matter. The only way to really learn is by practicing.
#googlecloud #cloud #gcp #cloudcomputing