Amya Gleichner


Cloud Foundations for Data Scientists


Nowadays, more and more companies are developing and deploying applications in cloud-based environments. One of the main motivations for cloud computing is that it removes the burden of setting up and managing hardware and software. This is done by remotely renting computing resources in data centres maintained by a cloud provider.

In this way, companies and individuals can remotely use the hardware, software setups and workflows offered by different cloud providers without having to buy equipment, set up different environments and maintain them over time. This approach therefore lets users focus only on developing and deploying their services (e.g. websites, databases, apps, data analytics) without worrying about infrastructure overhead. Ultimately, this can lead to faster, continuous development and improved customer satisfaction, in line with common DevOps (Development-Operations) principles.

The main idea behind cloud services closely resembles other utility and subscription-based systems developed in the past. For example, each of us uses utilities such as electricity, water and gas without having to worry about how these systems work or how they are delivered to us. What we care about is that our providers give us a reliable service and that we pay only for what we actually use.

Some of the main cloud providers in use are:

Each of these providers uses its own terminology for the services it offers, but they all share the principles outlined in this article.

Cloud Concepts

Cloud providers offer a wide variety of services and applications, which can be categorised into four main types:

  • Computing: virtual machines, containers, etc.
  • Networking: services that let different resources communicate with each other securely.
  • Storage: space to store any type of file, or databases for applications.
  • Analytics: data processing and visualization tools to analyse the stored data and create insights from it.

To keep rental costs low and make the best use of the available hardware, cloud services rely heavily on virtualization techniques such as Virtual Machines and Containers.

Virtual Machines (VMs)

Virtual Machines are used to emulate computer hardware (Figure 1). Several virtual machines can run simultaneously on a single computer, each with its own operating system, memory and storage allocation. Once created, a virtual machine can be accessed remotely using a desktop client and used like any other physical computer (e.g. to install programs, run applications, etc.). In this way, hardware usage is maximised by using a single computer as a collection of different computers.


Figure 1: Virtual Machines Infrastructure

Another great advantage of using VMs in the cloud is that our system becomes easily scalable. If we need more resources than planned to complete a project, the cloud provider can supply extra memory and storage on the fly (reducing any possible downtime in the deployment).

Additionally, to make sure that our service is always running, we can create backup VMs that take over automatically if a problem arises with the original one. Running several VMs is also useful when adding new features to services such as websites or applications: we can first release the newer version to just a subset of the audience and collect feedback on whether the change has a positive impact on the service (A/B testing).
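To send only a subset of the audience to the new version, users are often assigned to a group deterministically, so that the same person keeps seeing the same version across visits. The sketch below illustrates hash-based bucketing in Python; the `assign_variant` function, the string user IDs and the 10% rollout are hypothetical choices, and the code is not tied to any cloud provider's traffic-splitting feature.

```python
import hashlib


def assign_variant(user_id: str, rollout_fraction: float = 0.1) -> str:
    """Route roughly `rollout_fraction` of users to the new version.

    Hashing the user ID (instead of picking at random on every request)
    keeps each user on the same version across visits.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits (32 bits) into a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "new" if bucket < rollout_fraction else "current"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share_new = sum(assign_variant(u) == "new" for u in users) / len(users)
    print(f"Share of users on the new version: {share_new:.1%}")  # close to 10%
```

Raising `rollout_fraction` gradually moves more users to the new version while everyone already on it stays there, because their bucket value never changes.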


One problem with Virtual Machines is that each virtual machine has its own operating system. For example, if we have a complex application that needs Windows for some operations and Linux for others, we would need to run two separate Virtual Machines and make them communicate with each other. This leads to a resource overhead that could be avoided. Additionally, when a team develops an application, package dependency conflicts can arise when moving between different environments (e.g. “It worked on my machine”).

Containers were created to avoid these types of problems. Containers package applications together with their dependencies and deploy them to a container host (which acts as a service that abstracts away the operating system). In this way, the memory overhead of running multiple operating systems is avoided, and a single machine can run more Containers than it could run Virtual Machines. Additionally, applications can now move easily between different machines and environments without us having to manage the underlying dependencies.

Virtual Machines virtualise the hardware, while Containers virtualise the operating system [1].
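A rough calculation helps show why dropping the per-instance operating system matters. Every sizing figure below (a 64 GB host, 2 GB for each guest operating system, 0.25 GB of overhead per container) is an assumption chosen for illustration, and `max_instances` is a hypothetical helper; real overheads depend heavily on the workload and the platform.

```python
def max_instances(host_memory_gb: float, reserved_gb: float, per_instance_gb: float) -> int:
    """Number of identical instances that fit in the memory left after the host reserve."""
    return int((host_memory_gb - reserved_gb) // per_instance_gb)


# Hypothetical sizing figures, for illustration only.
HOST_MEMORY_GB = 64.0
HOST_RESERVED_GB = 4.0         # host OS plus hypervisor or container runtime
APP_MEMORY_GB = 1.0            # memory the application itself needs
GUEST_OS_GB = 2.0              # extra per VM: every VM boots its own operating system
CONTAINER_OVERHEAD_GB = 0.25   # extra per container: containers share the host kernel

vms = max_instances(HOST_MEMORY_GB, HOST_RESERVED_GB, APP_MEMORY_GB + GUEST_OS_GB)
containers = max_instances(HOST_MEMORY_GB, HOST_RESERVED_GB, APP_MEMORY_GB + CONTAINER_OVERHEAD_GB)
print(f"VMs: {vms}, containers: {containers}")  # VMs: 20, containers: 48
```

With these assumed figures the same host fits 20 VMs but 48 containers. Only memory is considered here, so the ratio is an illustration rather than a benchmark.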

One of the most common tools for creating containers is Docker (Figure 2).


Figure 2: Containers Infrastructure

One of the most efficient ways to build complex systems is to divide them into different Container applications, each with a distinct role. For example, if we are working on a website, it could be a good idea to divide it into three main containers: front-end, back-end and database. Every time we want to introduce new features, we can then update the Container of interest without touching the others. As we add more components to our application and its complexity increases, more and more Containers might be needed. Services such as Kubernetes have been created to maintain and organise Containers efficiently.
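Updating one container without touching the others can be modelled by giving each container its own version history. The `ContainerService` class and the image tags below are hypothetical; this is a toy in-memory model of the idea, not the Docker or Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerService:
    """One independently deployable part of an application (hypothetical model)."""
    name: str
    history: list[str] = field(default_factory=list)  # image tags, oldest first

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, image_tag: str) -> None:
        """Roll out a new image for this container only."""
        self.history.append(image_tag)

    def rollback(self) -> str:
        """Return to the previously deployed image."""
        if len(self.history) < 2:
            raise RuntimeError(f"{self.name}: no earlier version to roll back to")
        self.history.pop()
        return self.current


# A website split into three containers, each starting at version 1.0.
website = {name: ContainerService(name, [f"{name}:1.0"])
           for name in ("front-end", "back-end", "database")}

# Ship a new feature by updating the front-end container alone.
website["front-end"].deploy("front-end:1.1")
print({name: svc.current for name, svc in website.items()})

# If the feature misbehaves, roll back just the front-end.
website["front-end"].rollback()
print({name: svc.current for name, svc in website.items()})
```

Because each container keeps its own history, a problematic front-end release can be undone without redeploying the back-end or the database.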

Kubernetes was developed to make Container-based applications easy to scale in cloud-based environments. Some examples of services Kubernetes can provide are:

  • Balancing the workload between Containers and scaling the resources used based on demand.
  • Replacing a faulty Container by automatically creating a new instance.
  • Rolling back to the previous version of the application if a problem arises after a new feature is launched.
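The first two services in the list can be sketched as a small control loop. `desired_replicas` follows the scaling rule documented for the Kubernetes HorizontalPodAutoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to a minimum and maximum; the real controller also applies a tolerance and stabilization windows that are omitted here. The `reconcile` function and the container IDs are hypothetical and only mimic the idea of continuously driving the actual state towards the desired one.

```python
import itertools
import math

_new_ids = itertools.count(1)  # source of IDs for freshly started containers


def desired_replicas(current_replicas: int, current_cpu: float, target_cpu: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """HPA-style rule: ceil(current_replicas * current_cpu / target_cpu),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, desired))


def reconcile(containers: dict[str, str], desired: int) -> dict[str, str]:
    """Toy control loop: drop failed containers, then start or stop
    containers until exactly `desired` are running."""
    running = [cid for cid, status in containers.items() if status == "running"]
    running = running[:desired]              # scale down if there are too many
    while len(running) < desired:            # replace failures and scale up
        running.append(f"replica-{next(_new_ids)}")
    return {cid: "running" for cid in running}


# Three running replicas average 90% CPU against a 60% target: ceil(3 * 90 / 60) = 5.
target = desired_replicas(current_replicas=3, current_cpu=90.0, target_cpu=60.0)
cluster = {"web-a": "running", "web-b": "failed", "web-c": "running", "web-d": "running"}
cluster = reconcile(cluster, target)
print(target, sorted(cluster))
```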

Using Kubernetes to orchestrate our applications can also help us follow DevOps practices such as Continuous Integration (CI) and Continuous Delivery (CD). The main focus of Continuous Integration is to make sure that code changes integrate smoothly with the existing code base, while Continuous Delivery focuses on keeping the code base always ready for deployment (able to pass any automated build and test checks in place).
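The gating idea behind CI/CD can be sketched in a few lines: every change runs through the same automated checks, and only a change that passes all of them becomes a deployable release. The stage names, `run_pipeline` and the `publish_release` callback are hypothetical; in practice this logic usually lives in a CI service's pipeline configuration rather than in application code.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]],
                 publish_release: Callable[[], None]) -> bool:
    """Run each check in order and stop at the first failure.

    Only a change that passes every stage is published as a deployable
    release, so the main code base always stays ready to ship.
    """
    for name, check in stages:
        if not check():
            print(f"{name}: FAILED, change not released")
            return False
        print(f"{name}: passed")
    publish_release()
    return True


if __name__ == "__main__":
    releases: list[str] = []
    ok = run_pipeline(
        [("build", lambda: True), ("unit tests", lambda: True), ("integration tests", lambda: True)],
        publish_release=lambda: releases.append("app:2.0"),
    )
    print(ok, releases)
```

Stopping at the first failing stage keeps feedback fast and guarantees that a broken change never reaches the release step.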

#data-science #artificial-intelligence #cloud #aws #developer

Adaline Kulas


Multi-cloud Spending: 8 Tips To Lower Cost

A multi-cloud approach means leveraging two or more cloud platforms to meet the various business requirements of an enterprise. A multi-cloud IT environment incorporates clouds from multiple vendors and removes the dependence on a single public cloud service provider. Enterprises can thus choose specific services from multiple public clouds and reap the benefits of each.

Given its affordability and agility, most enterprises now opt for a multi-cloud approach. A 2018 survey of the public cloud services market points out that 81% of the respondents use services from two or more providers. At the same time, the cloud computing services market has reported incredible growth in recent years: according to IDC, the worldwide public cloud services market is set to reach $500 billion in the next four years.

By choosing multi-cloud solutions strategically, enterprises can optimize the benefits of cloud computing and gain key competitive advantages. They can avoid the lengthy and cumbersome processes involved in buying, installing and testing high-priced systems. IaaS and PaaS solutions have become a windfall for enterprise budgets, as they do not require huge up-front capital expenditure.

However, cost optimization remains a challenge in a multi-cloud environment, and many enterprises end up overpaying, whether or not they realize it. The tips below will help you make sure your money is spent wisely on cloud computing services.

  • Deactivate underused or unattached resources

Most organizations tend to get simple things wrong, and these turn out to be the root cause of needless spending and resource wastage. The first step towards cost optimization in your cloud strategy is to identify the underutilized resources you have been paying for.

Enterprises often continue to pay for resources that were purchased earlier but are no longer useful. Identifying such unused and unattached resources and deactivating them regularly brings you one step closer to cost optimization. If needed, you can deploy automated cloud management tools that provide the analytics needed to optimize cloud spending and cut costs on an ongoing basis.
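Finding unattached resources usually comes down to filtering an inventory exported from the provider. The sketch below assumes a hypothetical list of storage volumes, each recording the instance it is attached to (or `None`), and an assumed flat price per GB-month; it does not call any real cloud API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    volume_id: str
    size_gb: int
    attached_to: Optional[str]  # instance ID, or None if the volume is unattached


def unattached(volumes: list[Volume]) -> list[Volume]:
    """Volumes that are still billed but not attached to any instance."""
    return [v for v in volumes if v.attached_to is None]


def monthly_waste(volumes: list[Volume], price_per_gb_month: float) -> float:
    """Estimated monthly cost of the unattached volumes."""
    return sum(v.size_gb for v in unattached(volumes)) * price_per_gb_month


# Hypothetical inventory and price, for illustration only.
inventory = [
    Volume("vol-1", 100, attached_to="i-web"),
    Volume("vol-2", 500, attached_to=None),
    Volume("vol-3", 50, attached_to=None),
]
print([v.volume_id for v in unattached(inventory)])
print(f"Estimated waste: ${monthly_waste(inventory, 0.10):.2f} per month")
```

Running a check like this on a schedule turns the "deactivate on a regular basis" advice into a routine report.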

  • Figure out idle instances

Another key cost optimization strategy is to identify idle computing instances and consolidate them into fewer instances. An idle computing instance may have a CPU utilization of only 1-5%, yet the service provider may bill you for 100% of that instance.

Every enterprise is likely to have such non-production instances, which take up unnecessary storage space and lead to overpaying. Re-evaluating your resource allocations regularly and removing unnecessary storage can help you save money significantly. Resource allocation is not only a matter of CPU and memory; it is also linked to storage, network and various other factors.
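One way to plan the consolidation is to flag instances below a CPU threshold and then pack their workloads onto as few instances as possible. The sketch uses the first-fit decreasing bin-packing heuristic; the utilization figures, the 5% idle threshold and the 70% capacity cap are assumptions, and a real plan would also need to consider memory, peak load and storage.

```python
def find_idle(avg_cpu: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Instances whose average CPU utilization is at or below `threshold` percent."""
    return sorted(name for name, cpu in avg_cpu.items() if cpu <= threshold)


def consolidate(loads: dict[str, float], capacity: float) -> list[list[str]]:
    """First-fit decreasing: group workloads so each group fits on one instance
    without exceeding `capacity` percent CPU."""
    used: list[float] = []         # CPU already placed on each target instance
    groups: list[list[str]] = []   # workloads assigned to each target instance
    for name, load in sorted(loads.items(), key=lambda item: item[1], reverse=True):
        for i, already in enumerate(used):
            if already + load <= capacity:
                used[i] += load
                groups[i].append(name)
                break
        else:  # no existing instance has room: open a new one
            used.append(load)
            groups.append([name])
    return groups


# Hypothetical 30-day average CPU utilization (%) per instance.
avg_cpu = {"i-a": 3.0, "i-b": 2.0, "i-c": 45.0, "i-d": 4.0, "i-e": 1.0}
idle = find_idle(avg_cpu)
plan = consolidate({name: avg_cpu[name] for name in idle}, capacity=70.0)
print(f"{len(idle)} idle instances can be consolidated into {len(plan)}: {plan}")
```

Placing the largest workloads first is what makes first-fit decreasing usually pack tighter than filling instances in arbitrary order.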

  • Deploy monitoring mechanisms

The key to efficient cost reduction in cloud computing lies in proactive monitoring. A comprehensive view of cloud usage helps enterprises monitor and minimize unnecessary spending. You can use various mechanisms to monitor computing demand.

For instance, you can use a heatmap to visualize the highs and lows in computing demand. The heatmap indicates when instances can be started and stopped, which in turn leads to reduced costs. You can also deploy automated tools that schedule instances to start and stop. By following a heatmap, you can understand whether it is safe to shut down servers on holidays or weekends.
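The data behind such a heatmap is simply average utilization grouped by weekday and hour. The sketch below builds that grid from hypothetical hourly CPU samples and lists the slots whose average stays under an assumed 5% threshold as shutdown candidates; the sample data and function names are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def usage_heatmap(samples: list[tuple[datetime, float]]) -> dict[tuple[str, int], float]:
    """Average CPU % for each (weekday, hour) cell: the data behind a usage heatmap."""
    cells: dict[tuple[str, int], list[float]] = defaultdict(list)
    for ts, cpu in samples:
        cells[(DAY_NAMES[ts.weekday()], ts.hour)].append(cpu)
    return {cell: mean(values) for cell, values in cells.items()}


def shutdown_candidates(heatmap: dict[tuple[str, int], float],
                        threshold: float = 5.0) -> list[tuple[str, int]]:
    """Cells whose average load is low enough to consider stopping instances."""
    return sorted(cell for cell, avg in heatmap.items() if avg < threshold)


# Hypothetical two weeks of hourly samples: busy on weekdays 09:00-18:00, near idle otherwise.
start = datetime(2024, 1, 1)  # a Monday
samples = []
for hour in range(14 * 24):
    ts = start + timedelta(hours=hour)
    busy = ts.weekday() < 5 and 9 <= ts.hour < 18
    samples.append((ts, 60.0 if busy else 2.0))

heatmap = usage_heatmap(samples)
quiet = shutdown_candidates(heatmap)
print(f"{len(quiet)} of {len(heatmap)} weekly hour slots are shutdown candidates")
```

With this synthetic workload, 123 of the 168 weekly hour slots fall below the threshold, which is the nights-and-weekends pattern a heatmap makes visible.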

#cloud computing services #all #hybrid cloud #cloud #multi-cloud strategy #cloud spend #multi-cloud spending #multi cloud adoption #why multi cloud #multi cloud trends #multi cloud companies #multi cloud research #multi cloud market

iOS App Dev


Your Data Architecture: Simple Best Practices for Your Data Strategy


If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

Sid Schuppe


Benefits of Hybrid Cloud for Data Warehouse

In today’s market reliable data is worth its weight in gold, and having a single source of truth for business-related queries is a must-have for organizations of all sizes. For decades companies have turned to data warehouses to consolidate operational and transactional information, but many existing data warehouses are no longer able to keep up with the data demands of the current business climate. They are hard to scale, inflexible, and simply incapable of handling the large volumes of data and increasingly complex queries.

These days organizations need a faster, more efficient, and modern data warehouse that is robust enough to handle large amounts of data and multiple users while simultaneously delivering real-time query results. And that is where hybrid cloud comes in. As increasing volumes of data are being generated and stored in the cloud, enterprises are rethinking their strategies for data warehousing and analytics. Hybrid cloud data warehouses allow you to utilize existing resources and architectures while streamlining your data and cloud goals.

#cloud #data analytics #business intelligence #hybrid cloud #data warehouse #data storage #data management solutions #master data management #data warehouse architecture #data warehouses

Ian Robinson


Data Science: Advice for Aspiring Data Scientists | Experfy Insights

Around once a month, a student of some type emails me asking how to get into Data Science. I’ve answered this enough times that I decided to write it out here so I can link people to it. So if you’re one of those students, welcome!

I’ll split this into basic advice, which you can find quite easily by googling ‘how to get into data science’, and less common advice that I’ve found very useful over the years. I’ll start with the latter and then move on to the basic advice. Obviously, take this with a grain of salt, as all advice comes with a bit of survivorship bias.

Less Basic Advice:

1. Find a solid community

2. Apply Data Science to Things you Enjoy

3. Minimize the ‘Clicks to Proof of Competence’

4. Learn Through Research or Entry Level Jobs

#big data & cloud #data science #data scientist #statistics #aspiring data scientist #advice for aspiring data scientists

Java Questions


50 Data Science Jobs That Opened Just Last Week

Our latest survey report suggests that as the overall Data Science and Analytics market evolves to adapt to constantly changing economic and business environments, data scientists and AI practitioners should be aware of the skills and tools the broader community is working on. A good grip on these skills will help data science enthusiasts land the best jobs that various industries are offering in their data science functions.

In this article, we list 50 of the latest data science job openings, all of which opened just last week.

(The jobs are sorted according to the years of experience r

1| Data Scientist at IBM

Location: Bangalore

Skills Required: Real-time anomaly detection solutions, NLP, text analytics, log analysis, cloud migration, AI planning, etc.

Apply here.

2| Associate Data Scientist at PayPal

Location: Chennai

Skills Required: Data mining experience in Python, R, H2O and/or SAS, cross-functional, highly complex data science projects, SQL or SQL-like tools, among others.

Apply here.

3| Data Scientist at Citrix

Location: Bangalore

Skills Required: Data modelling, database architecture, database design, database programming such as SQL, Python, etc., forecasting algorithms, cloud platforms, designing and developing ETL and ELT processes, etc.

Apply here.

4| Data Scientist at PayPal

Location: Bangalore

Skills Required: SQL and querying relational databases, statistical programming language (SAS, R, Python), data visualisation tool (Tableau, Qlikview), project management, etc.

Apply here.

5| Data Science at Accenture

Location: Bibinagar, Telangana

Skills Required: Data science frameworks such as Jupyter Notebook and AWS SageMaker, querying databases and using statistical computer languages (R, Python, SQL), statistical and data mining techniques, distributed data/computing tools such as Map/Reduce, Flume, Drill, Hadoop, Hive, Spark, Gurobi, MySQL, among others.

#careers #data science #data science career #data science jobs #data science news #data scientist #data scientists #data scientists india