One day after Big Data player Pivotal Software Inc. changed its business model by open sourcing core technologies, Microsoft today announced related product updates with a definite open source slant.
The “new and enhanced” data services include an Azure HDInsight preview that runs on Linux, and the general availability of Storm on HDInsight, Azure Machine Learning, and Informatica technology on the Microsoft Azure cloud.
“Just about every interesting innovation that’s going on – in data today, in machine learning and other areas – has its roots in an open source ecosystem,” Pivotal CEO Paul Maritz said yesterday at a live streaming event.
Perhaps an exaggeration, but the underlying meaning was grokked by Microsoft years ago, and the company is in the middle of a swing to openness and interoperability, led by new CEO Satya Nadella and top lieutenants such as T. K. “Ranga” Rengarajan, head of the data platform.
“Azure Machine Learning reflects our support for open source,” stated a blog post today authored by Rengarajan and machine learning exec Joseph Sirosh. “The Python programming language is a first-class citizen in Azure Machine Learning Studio, along with R, the popular language of statisticians.” Microsoft announced its acquisition of Revolution Analytics, a commercial provider of R software and services, earlier this year.
Data developers can now use the Machine Learning Marketplace to discover appropriate APIs and prebuilt services for common concerns such as recommendation engines, detecting anomalies and forecasting.
The open source story continues with Storm for Azure HDInsight. Azure HDInsight is Microsoft’s cloud service based on 100 percent Apache Hadoop technology, an open source project stewarded by the Apache Software Foundation.
“Storm is an open source stream analytics platform that can process millions of data ‘events’ in real time as they are generated by sensors and devices,” Microsoft said. “Using Storm with HDInsight, customers can deploy and manage applications for real-time analytics and Internet-of-Things (IoT) scenarios in a few minutes with just a few clicks. We are also making Storm available for both .NET and Java and the ability to develop, deploy and debug real-time Storm applications directly in Visual Studio. That helps developers to be productive in the environments they know best.” Microsoft first previewed Storm on HDInsight last fall.
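For readers unfamiliar with stream analytics, the idea of processing events "as they are generated" can be illustrated with a minimal Python sketch. It does not use Storm's API: the `sensor_events` source (standing in for what Storm calls a spout), the `windowed_counts` step (standing in for a bolt), the event shape and the five-second window are all hypothetical, chosen only to show a per-device count being updated as each event arrives.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, device id).
Event = Tuple[float, str]


def sensor_events() -> Iterator[Event]:
    """Hypothetical event source, standing in for a stream of device readings."""
    yield from [(0.0, "a"), (1.0, "b"), (2.0, "a"), (6.0, "a"), (7.0, "b")]


def windowed_counts(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """After each event, emit per-device counts over the trailing time window."""
    window: Deque[Event] = deque()
    counts: Counter = Counter()
    for ts, device in events:
        window.append((ts, device))
        counts[device] += 1
        # Evict events that are window_seconds old or older.
        while window and window[0][0] <= ts - window_seconds:
            _, old_device = window.popleft()
            counts[old_device] -= 1
            if counts[old_device] == 0:
                del counts[old_device]
        yield ts, dict(counts)


results = list(windowed_counts(sensor_events(), window_seconds=5.0))
for ts, snapshot in results:
    print(ts, snapshot)
```

Because results are emitted per event rather than after a batch completes, a downstream consumer sees updated counts immediately, which is the property the Microsoft statement emphasizes for IoT scenarios.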
Of course, there’s nothing more open source than the Linux OS, and Azure HDInsight is now available as a preview project running on Ubuntu clusters. Ubuntu is a popular Linux distribution, described by Microsoft as “the leading scale-out Linux.”
Adding Linux support in addition to Windows “is particularly compelling for people that already use Hadoop on Linux on-premises like on Hortonworks Data Platform, because they can use common Linux tools, documentation, and templates and extend their deployment to Azure with hybrid cloud connections,” Microsoft said.
Also, to increase customer options for leveraging technology from Microsoft partners, the Redmond software giant announced that Informatica data integration technology will be available in the Azure Marketplace.
“Today, Informatica is announcing the availability of its Cloud Integration Secure Agent on Microsoft Azure and Linux Virtual Machines as well as an Informatica Cloud Connector for Microsoft Azure Storage,” Informatica exec Ronen Schwartz said in a blog post today. “Users of Azure data services such as Azure HDInsight, Azure Machine Learning and Azure Data Factory can make their data work with access to the broadest set of data sources including on-premises applications, databases, cloud applications and social data.”
All the Microsoft news comes during the Strata + Hadoop World conference underway in San Jose, Calif.
“These new services are part of our continued investment in a broad portfolio of solutions to unlock insights from data,” Microsoft said. “They can help businesses dramatically improve their performance, enable governments to better serve their citizenry, or accelerate new advancements in science. Our goal is to make Big Data technology simpler and more accessible to the greatest number of people possible: Big Data pros, data scientists and app developers, but also everyday businesspeople and IT managers.”
An extensively researched list of top Microsoft big data analytics and solutions companies, with ratings and reviews, to help you find the best Microsoft big data solutions development companies around the world.
This exclusive list of Microsoft Big Data consulting and solution providers was compiled by examining expert big data analytics firms against a range of factors and selecting those with proven strength in data analytics. Drawing on all of an organization’s data has become necessary for business growth and enterprise acceleration, so we bring you the most trustworthy Microsoft Big Data consultants and solution providers.
Let’s take a look at the list of the best Microsoft big data solutions companies.
If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining as customer-centric as possible, and streamlining processes to get precise outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.
In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.
The International Open Data Charter (1) defines open data as digital data that are made available with the technical and legal characteristics necessary so that they can be freely used, reused and redistributed by anyone, at any time and anywhere.
But what conditions must data meet to comply with this definition? The International Open Data Charter sets them out as principles:
- Open by default:
There must be free access to government data.
Governments must adopt strategies for the creation, use, exchange, and harmonization of open data.
Open data must not violate the right to privacy.
- Timely and comprehensive:
The prioritization of what data to open should be done in consultation with current and potential users.
The data must be comprehensive, accurate, and high quality.
- Accessible and usable:
Open data helps improve decision making.
There should be no bureaucratic and/or administrative barriers to accessing the data.
The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.
This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.
As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).
This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.
This keynote from Sunil Kamath explores the Azure Data mission, how open source fits into the Azure Data team’s efforts, and why there’s never been a better time to work with data. Sunil covers how the Azure Data team is working to help developers and customers, and how we’re working with partners and customers to make an impact, especially in this era of COVID-19. Sunil then highlights several recent projects: healthcare bots across the globe, online learning in Korea, emergency response and essential services in India, and help for financial institutions supporting small businesses in need. He also covers support for the growing number of people working from home and collaborating over Microsoft Teams due to worldwide quarantine and shelter-in-place orders.
Sunil also discusses open source efforts in the Azure Data team, including our investment in PostgreSQL committers; acquiring Postgres company Citus Data (Citus is an open source extension that scales out Postgres horizontally); what the Windows telemetry team is doing with Postgres and Citus on Azure; Azure Data Studio; and how Systems Imagination is using Azure SQL Database and Big Data Clusters in the fight against cancer.
0:33 My journey with open source
4:52 Azure Data efforts to help during COVID-19
7:58 Our team’s contribution to Work from Home
9:58 Microsoft loves Open Source
12:18 How the Windows telemetry team uses Postgres & Citus
Sunil Kamath has been working on Open Source projects since 2002, particularly with open source databases to help developers fuel innovation. As Director for Open Source Databases on the Azure cloud platform, Sunil creates the vision, strategy, and long-range product plans for the Postgres, MySQL, and MariaDB open source database services running on Azure.