NumPy

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
米 萱萱

1680070980

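Using Singular Value Decomposition in Python and NumPy (linalg.svd)

In this Python NumPy tutorial, we will learn about NumPy's linalg.svd: singular value decomposition in Python. In mathematics, the singular value decomposition (SVD) of a matrix is its factorization into three separate matrices. It generalizes the eigenvalue decomposition of square matrices to matrices of any shape, and it is closely related to the polar decomposition.

In Python, it is easy to compute the singular value decomposition of a real or complex matrix with the NumPy (Numerical Python) library. NumPy provides a range of linear algebra functions, including one for computing the singular value decomposition of a matrix.

In machine learning, singular value decomposition is widely used when training models, including neural networks, where it can help improve accuracy and reduce noise in data. An M×N matrix maps N-dimensional vectors to M-dimensional vectors, so the input and output need not have the same dimension; the SVD breaks that map into two unitary changes of basis and a scaling, which makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.

Syntax of the NumPy linalg.svd() function

The function that computes the singular value decomposition of a matrix lives in NumPy's linalg submodule and is called linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

You can set the boolean flags to True or False depending on your requirements.

The parameters of the function are:

  • a -> array_like: The matrix whose singular value decomposition is computed. It can be real or complex and must have at least 2 dimensions.
  • full_matrices -> bool (optional): If True (the default), u and vh are square, with shapes (M, M) and (N, N) for an M×N input. If False, their shapes are (M, K) and (K, N), where K = min(M, N).
  • compute_uv -> bool (optional): Whether to compute u and vh in addition to the singular values s.
  • hermitian -> bool (optional): If True, a is assumed to be Hermitian (symmetric if real-valued), which allows a more efficient method of computation.

With the default arguments, the function returns three arrays, in the order u, s, vh:

  • u -> ndarray: Unitary array whose columns are the left singular vectors. Only returned when compute_uv is True.
  • s -> ndarray: Vector containing the singular values in descending order. Its length is K = min(M, N); it does not have the shape of the original matrix.
  • vh -> ndarray: Unitary array whose rows are the right singular vectors (the conjugate transpose of V). Only returned when compute_uv is True.

It raises a LinAlgError if the SVD computation does not converge.

Prerequisites for setup

Before we dive into the examples, make sure the numpy package is installed on your system. It is required for the linear algebra functions discussed in this article. Run the following command in your terminal.

pip install numpy

That is all you need for now. Let's look at how to implement the code in the next section.

To compute the singular value decomposition (SVD) in Python, use NumPy's linalg.svd() function. Its syntax is numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False), where a is the matrix whose SVD is computed. It returns three arrays: u, s, and vh.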

Example 1: Computing the singular value decomposition of a 3×3 matrix

In this first example, we take a 3×3 matrix and compute its singular value decomposition as follows:

# import the numpy module
import numpy as np
# create the input matrix with numpy.array()
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]
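One way to sanity-check a decomposition like the one in Example 1 is to multiply the three factors back together. The sketch below reuses the same 3×3 matrix; the names vh and A_rebuilt are illustrative choices, not part of the original example.

```python
import numpy as np

# Same matrix as in Example 1
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])

# svd returns u, the singular values s, and vh (the conjugate transpose of V)
u, s, vh = np.linalg.svd(A)

# Rebuild A as u @ diag(s) @ vh; diag(s) turns the vector s into a matrix
A_rebuilt = u @ np.diag(s) @ vh

# The product matches A up to floating-point rounding
print(np.allclose(A, A_rebuilt))
```

Because the third returned array is already V^H, no extra transpose is needed in the product.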

Example 2: Computing the singular value decomposition of a random matrix

In this example, we use the numpy.random.randint() function to create a random matrix. Let's get started!

# import the numpy module
import numpy as np
# create a random 3×3 integer matrix with numpy.random.randint()
A = np.random.randint(5, 200, size=(3, 3))
# display the created matrix
print("The input matrix is=", A)
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will look like the following (the values differ on each run because the matrix is random):

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]

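Suggested: NumPy linalg.eigvalsh: A Guide to Eigenvalue Computation

Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to compute it with Python's numpy module. We used the linalg.svd() function to compute the singular value decomposition of both a fixed matrix and a random matrix. NumPy provides efficient, easy-to-use functions for linear algebra, which makes it valuable in machine learning, neural networks, and regression analysis. Keep exploring the other linear algebra functions in numpy to extend your mathematical toolset in Python.

Article source: https://www.askpython.com

#python #numpy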

Using Singular Value Decomposition in Python and NumPy (linalg.svd)

In this Python NumPy tutorial, we will learn about NumPy's linalg.svd: singular value decomposition in Python. In mathematics, the singular value decomposition (SVD) of a matrix is its factorization into three separate matrices. It generalizes the eigenvalue decomposition of square matrices to matrices of any shape, and it is closely related to the polar decomposition.

In Python, it is easy to compute the singular value decomposition of a real or complex matrix with the NumPy (Numerical Python) library. NumPy provides a range of linear algebra functions, including one for computing the singular value decomposition of a matrix.

In machine learning, singular value decomposition is widely used when training models, including neural networks, where it can help improve accuracy and reduce noise in data. An M×N matrix maps N-dimensional vectors to M-dimensional vectors, so the input and output need not have the same dimension; the SVD breaks that map into two unitary changes of basis and a scaling, which makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.

Syntax of the NumPy linalg.svd() function

The function that computes the singular value decomposition of a matrix lives in NumPy's linalg submodule and is called linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

You can set the boolean flags to True or False depending on your requirements.

The parameters of the function are:

  • a -> array_like: The matrix whose singular value decomposition is computed. It can be real or complex and must have at least 2 dimensions.
  • full_matrices -> bool (optional): If True (the default), u and vh are square, with shapes (M, M) and (N, N) for an M×N input. If False, their shapes are (M, K) and (K, N), where K = min(M, N).
  • compute_uv -> bool (optional): Whether to compute u and vh in addition to the singular values s.
  • hermitian -> bool (optional): If True, a is assumed to be Hermitian (symmetric if real-valued), which allows a more efficient method of computation.

With the default arguments, the function returns three arrays, in the order u, s, vh:

  • u -> ndarray: Unitary array whose columns are the left singular vectors. Only returned when compute_uv is True.
  • s -> ndarray: Vector containing the singular values in descending order. Its length is K = min(M, N); it does not have the shape of the original matrix.
  • vh -> ndarray: Unitary array whose rows are the right singular vectors (the conjugate transpose of V). Only returned when compute_uv is True.

It raises a LinAlgError if the SVD computation does not converge.
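When only the singular values are needed, compute_uv=False skips the two unitary factors. A minimal sketch, assuming the same 3×3 matrix used in the examples of this article; the name s_only is illustrative.

```python
import numpy as np

A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])

# With compute_uv=False the function returns a single array of singular values
s_only = np.linalg.svd(A, compute_uv=False)

# For comparison, the default call returns the tuple (u, s, vh)
u, s, vh = np.linalg.svd(A)

print(s_only.shape)            # one value per min(M, N)
print(np.allclose(s_only, s))  # same singular values either way
```

Note that the return type changes with the flag, so code that unpacks three values must not pass compute_uv=False.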

Prerequisites for setup

Before we dive into the examples, make sure the numpy package is installed on your system. It is required for the linear algebra functions discussed in this article. Run the following command in your terminal.

pip install numpy

That is all you need for now. Let's look at how to implement the code in the next section.

To compute the singular value decomposition (SVD) in Python, use NumPy's linalg.svd() function. Its syntax is numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False), where a is the matrix whose SVD is computed. It returns three arrays: u, s, and vh.

Example 1: Computing the singular value decomposition of a 3×3 matrix

In this first example, we take a 3×3 matrix and compute its singular value decomposition as follows:

# import the numpy module
import numpy as np
# create the input matrix with numpy.array()
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]

Example 2: Computing the singular value decomposition of a random matrix

In this example, we use the numpy.random.randint() function to create a random matrix. Let's dive in!

# import the numpy module
import numpy as np
# create a random 3×3 integer matrix with numpy.random.randint()
A = np.random.randint(5, 200, size=(3, 3))
# display the created matrix
print("The input matrix is=", A)
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will look like the following (the values differ on each run because the matrix is random):

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]

Suggested: NumPy linalg.eigvalsh: A Guide to Eigenvalue Computation

Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to compute it with Python's numpy module. We used the linalg.svd() function to compute the singular value decomposition of both a fixed matrix and a random matrix. NumPy provides efficient, easy-to-use functions for linear algebra, which makes it valuable in machine learning, neural networks, and regression analysis. Keep exploring the other linear algebra functions in numpy to extend your mathematical toolset in Python.

Article source: https://www.askpython.com

#python #numpy
Mélanie Faria

1680061020

Using Singular Value Decomposition in Python and NumPy (linalg.svd)

In this Python NumPy tutorial, we will learn about NumPy's linalg.svd: singular value decomposition in Python. In mathematics, the singular value decomposition (SVD) of a matrix is its factorization into three separate matrices. It generalizes the eigenvalue decomposition of square matrices to matrices of any shape, and it is closely related to the polar decomposition.

In Python, it is easy to compute the singular value decomposition of a real or complex matrix with the NumPy (Numerical Python) library. NumPy provides a range of linear algebra functions, including one for computing the singular value decomposition of a matrix.

In machine learning, singular value decomposition is widely used when training models, including neural networks, where it can help improve accuracy and reduce noise in data. An M×N matrix maps N-dimensional vectors to M-dimensional vectors, so the input and output need not have the same dimension; the SVD breaks that map into two unitary changes of basis and a scaling, which makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.

Syntax of the NumPy linalg.svd() function

The function that computes the singular value decomposition of a matrix lives in NumPy's linalg submodule and is called linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

You can set the boolean flags to True or False depending on your requirements.

The parameters of the function are:

  • a -> array_like: The matrix whose singular value decomposition is computed. It can be real or complex and must have at least 2 dimensions.
  • full_matrices -> bool (optional): If True (the default), u and vh are square, with shapes (M, M) and (N, N) for an M×N input. If False, their shapes are (M, K) and (K, N), where K = min(M, N).
  • compute_uv -> bool (optional): Whether to compute u and vh in addition to the singular values s.
  • hermitian -> bool (optional): If True, a is assumed to be Hermitian (symmetric if real-valued), which allows a more efficient method of computation.

With the default arguments, the function returns three arrays, in the order u, s, vh:

  • u -> ndarray: Unitary array whose columns are the left singular vectors. Only returned when compute_uv is True.
  • s -> ndarray: Vector containing the singular values in descending order. Its length is K = min(M, N); it does not have the shape of the original matrix.
  • vh -> ndarray: Unitary array whose rows are the right singular vectors (the conjugate transpose of V). Only returned when compute_uv is True.

It raises a LinAlgError if the SVD computation does not converge.
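The effect of full_matrices is easiest to see on a non-square input. The sketch below assumes an arbitrary 4×2 matrix built with np.arange purely for illustration; the variable names are not from the article.

```python
import numpy as np

# An arbitrary 4x2 matrix (M = 4, N = 2, so K = min(M, N) = 2)
A = np.arange(8.0).reshape(4, 2)

# Default: full_matrices=True gives square u (M, M) and vh (N, N)
u_full, s_full, vh_full = np.linalg.svd(A, full_matrices=True)

# Reduced form: u is (M, K) and vh is (K, N)
u_red, s_red, vh_red = np.linalg.svd(A, full_matrices=False)

print(u_full.shape, s_full.shape, vh_full.shape)
print(u_red.shape, s_red.shape, vh_red.shape)

# The reduced factors multiply back to A directly;
# u_red * s_red scales each column of u_red by its singular value
A_rebuilt = (u_red * s_red) @ vh_red
print(np.allclose(A, A_rebuilt))
```

The reduced form is usually the more convenient one for reconstruction, because the shapes line up without padding s into an M×N matrix.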

Prerequisites for setup

Before we dive into the examples, make sure the numpy package is installed on your system. It is required for the linear algebra functions discussed in this article. Run the following command in your terminal.

pip install numpy

That is all you need for now. Let's look at how to implement the code in the next section.

To compute the singular value decomposition (SVD) in Python, use NumPy's linalg.svd() function. Its syntax is numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False), where a is the matrix whose SVD is computed. It returns three arrays: u, s, and vh.

Example 1: Computing the singular value decomposition of a 3×3 matrix

In this first example, we take a 3×3 matrix and compute its singular value decomposition as follows:

# import the numpy module
import numpy as np
# create the input matrix with numpy.array()
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]

Example 2: Computing the singular value decomposition of a random matrix

In this example, we use the numpy.random.randint() function to create a random matrix. Let's get into it!

# import the numpy module
import numpy as np
# create a random 3×3 integer matrix with numpy.random.randint()
A = np.random.randint(5, 200, size=(3, 3))
# display the created matrix
print("The input matrix is=", A)
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will look like the following (the values differ on each run because the matrix is random):

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]

Suggested: NumPy linalg.eigvalsh: A Guide to Eigenvalue Computation

Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to compute it with Python's numpy module. We used the linalg.svd() function to compute the singular value decomposition of both a fixed matrix and a random matrix. NumPy provides efficient, easy-to-use functions for linear algebra, which makes it valuable in machine learning, neural networks, and regression analysis. Keep exploring the other linear algebra functions in numpy to extend your mathematical toolset in Python.

Article source: https://www.askpython.com

#python #numpy
Phung Dang

1679997240

Using Singular Value Decomposition in Python and NumPy (linalg.svd)

In this Python NumPy tutorial, we will learn about NumPy's linalg.svd: singular value decomposition in Python. In mathematics, the singular value decomposition (SVD) of a matrix is its factorization into three separate matrices. It generalizes the eigenvalue decomposition of square matrices to matrices of any shape, and it is closely related to the polar decomposition.

In Python, it is easy to compute the singular value decomposition of a real or complex matrix with the NumPy (Numerical Python) library. NumPy provides a range of linear algebra functions, including one for computing the singular value decomposition of a matrix.

In machine learning, singular value decomposition is widely used when training models, including neural networks, where it can help improve accuracy and reduce noise in data. An M×N matrix maps N-dimensional vectors to M-dimensional vectors, so the input and output need not have the same dimension; the SVD breaks that map into two unitary changes of basis and a scaling, which makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.
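Noise reduction with the SVD is commonly done by keeping only the largest singular values and discarding the rest. The following is a minimal sketch under stated assumptions: the 6×4 random matrix, the seed, and the choice k = 2 are all illustrative and not taken from the article.

```python
import numpy as np

# Illustrative data: a random 6x4 matrix with a fixed seed
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

# Reduced SVD: u is (6, 4), s has 4 values, vh is (4, 4)
u, s, vh = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and their vectors
k = 2
A_k = (u[:, :k] * s[:k]) @ vh[:k, :]

# A_k has rank k, and its spectral-norm error equals the first
# discarded singular value (the Eckart-Young theorem)
err = np.linalg.norm(A - A_k, ord=2)
print(np.linalg.matrix_rank(A_k))
print(np.isclose(err, s[k]))
```

Choosing k is a modelling decision; a common heuristic is to look for a sharp drop in the sorted singular values.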

Syntax of the NumPy linalg.svd() function

The function that computes the singular value decomposition of a matrix lives in NumPy's linalg submodule and is called linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

You can set the boolean flags to True or False depending on your requirements.

The parameters of the function are:

  • a -> array_like: The matrix whose singular value decomposition is computed. It can be real or complex and must have at least 2 dimensions.
  • full_matrices -> bool (optional): If True (the default), u and vh are square, with shapes (M, M) and (N, N) for an M×N input. If False, their shapes are (M, K) and (K, N), where K = min(M, N).
  • compute_uv -> bool (optional): Whether to compute u and vh in addition to the singular values s.
  • hermitian -> bool (optional): If True, a is assumed to be Hermitian (symmetric if real-valued), which allows a more efficient method of computation.

With the default arguments, the function returns three arrays, in the order u, s, vh:

  • u -> ndarray: Unitary array whose columns are the left singular vectors. Only returned when compute_uv is True.
  • s -> ndarray: Vector containing the singular values in descending order. Its length is K = min(M, N); it does not have the shape of the original matrix.
  • vh -> ndarray: Unitary array whose rows are the right singular vectors (the conjugate transpose of V). Only returned when compute_uv is True.

It raises a LinAlgError if the SVD computation does not converge.

Prerequisites for setup

Before we dive into the examples, make sure the numpy package is installed on your system. It is required for the linear algebra functions discussed in this article. Run the following command in your terminal.

pip install numpy

That is all you need for now. Let's look at how to implement the code in the next section.

To compute the singular value decomposition (SVD) in Python, use NumPy's linalg.svd() function. Its syntax is numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False), where a is the matrix whose SVD is computed. It returns three arrays: u, s, and vh.

Example 1: Computing the singular value decomposition of a 3×3 matrix

In this first example, we take a 3×3 matrix and compute its singular value decomposition as follows:

# import the numpy module
import numpy as np
# create the input matrix with numpy.array()
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]

Example 2: Computing the singular value decomposition of a random matrix

In this example, we use the numpy.random.randint() function to create a random matrix. Let's get into it!

# import the numpy module
import numpy as np
# create a random 3×3 integer matrix with numpy.random.randint()
A = np.random.randint(5, 200, size=(3, 3))
# display the created matrix
print("The input matrix is=", A)
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will look like the following (the values differ on each run because the matrix is random):

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]

Suggested: NumPy linalg.eigvalsh: A Guide to Eigenvalue Computation

Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to compute it with Python's numpy module. We used the linalg.svd() function to compute the singular value decomposition of both a fixed matrix and a random matrix. NumPy provides efficient, easy-to-use functions for linear algebra, which makes it valuable in machine learning, neural networks, and regression analysis. Keep exploring the other linear algebra functions in numpy to extend your mathematical toolset in Python.

Article source: https://www.askpython.com

#python #numpy

Using Singular Value Decomposition in Python and NumPy (linalg.svd)

In this Python NumPy tutorial, we will learn about NumPy's linalg.svd: singular value decomposition in Python. In mathematics, the singular value decomposition (SVD) of a matrix is its factorization into three separate matrices. It generalizes the eigenvalue decomposition of square matrices to matrices of any shape, and it is closely related to the polar decomposition.

In Python, it is easy to compute the singular value decomposition of a real or complex matrix with the NumPy (Numerical Python) library. NumPy provides a range of linear algebra functions, including one for computing the singular value decomposition of a matrix.

In machine learning, singular value decomposition is widely used when training models, including neural networks, where it can help improve accuracy and reduce noise in data. An M×N matrix maps N-dimensional vectors to M-dimensional vectors, so the input and output need not have the same dimension; the SVD breaks that map into two unitary changes of basis and a scaling, which makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.
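As one illustration of the regression use, a least-squares fit can be computed directly from the SVD of the design matrix. This is a minimal sketch, assuming made-up data points that lie exactly on y = 1 + 2x; the names X, y, and coef are illustrative.

```python
import numpy as np

# Design matrix with an intercept column and one feature x = 0, 1, 2, 3
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
# Targets generated from y = 1 + 2x (illustrative data)
y = np.array([1.0, 3.0, 5.0, 7.0])

# Reduced SVD of the design matrix
u, s, vh = np.linalg.svd(X, full_matrices=False)

# Least-squares solution: coef = V @ diag(1/s) @ U^T @ y
coef = vh.T @ ((u.T @ y) / s)
print(coef)

# Cross-check against NumPy's own least-squares solver
coef_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(coef, coef_ref))
```

Dividing by s assumes no singular value is zero; for rank-deficient data the tiny singular values would need to be dropped first.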

Syntax of the NumPy linalg.svd() function

The function that computes the singular value decomposition of a matrix lives in NumPy's linalg submodule and is called linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

You can set the boolean flags to True or False depending on your requirements.

The parameters of the function are given below:

  • a -> array_like: The matrix whose singular value decomposition is computed. It can be real or complex and must have at least 2 dimensions.
  • full_matrices -> bool (optional): If True (the default), u and vh are square, with shapes (M, M) and (N, N) for an M×N input. If False, their shapes are (M, K) and (K, N), where K = min(M, N).
  • compute_uv -> bool (optional): Whether to compute u and vh in addition to the singular values s.
  • hermitian -> bool (optional): If True, a is assumed to be Hermitian (symmetric if real-valued), which allows a more efficient method of computation.

With the default arguments, the function returns three arrays, in the order u, s, vh:

  • u -> ndarray: Unitary array whose columns are the left singular vectors. Only returned when compute_uv is True.
  • s -> ndarray: Vector containing the singular values in descending order. Its length is K = min(M, N); it does not have the shape of the original matrix.
  • vh -> ndarray: Unitary array whose rows are the right singular vectors (the conjugate transpose of V). Only returned when compute_uv is True.

It raises a LinAlgError if the SVD computation does not converge.
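The hermitian flag can be tried on a small symmetric matrix. The sketch below assumes a NumPy version that supports the hermitian keyword (it is part of the signature shown above) and uses an arbitrary 2×2 symmetric matrix H chosen for illustration.

```python
import numpy as np

# An arbitrary real symmetric (hence Hermitian) matrix
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# General-purpose path
s_general = np.linalg.svd(H, compute_uv=False)

# Path that assumes the input is Hermitian
s_herm = np.linalg.svd(H, compute_uv=False, hermitian=True)

# Both give the same singular values, in descending order
print(np.allclose(s_general, s_herm))
```

Passing hermitian=True for a matrix that is not actually Hermitian is not checked by this sketch and should be expected to give wrong results.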

Prerequisites for setup

Before we dive into the examples, make sure the numpy package is installed on your system. It is required for the linear algebra functions discussed in this article. Run the following command in your terminal.

pip install numpy

That is all you need for now. Let's look at how to implement the code in the next section.

To compute the singular value decomposition (SVD) in Python, use NumPy's linalg.svd() function. Its syntax is numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False), where a is the matrix whose SVD is computed. It returns three arrays: u, s, and vh.

Example 1: Computing the singular value decomposition of a 3×3 matrix

In this first example, we take a 3×3 matrix and compute its singular value decomposition as follows:

# import the numpy module
import numpy as np
# create the input matrix with numpy.array()
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]
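The smallest singular value printed above is on the order of 1e-16, which is zero up to floating-point rounding and indicates that the matrix has rank 2 rather than 3. A minimal sketch of that reasoning follows; the tolerance formula is one common convention, stated here as an assumption, and the names tol and rank are illustrative.

```python
import numpy as np

A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])

# Only the singular values are needed to estimate the rank
s = np.linalg.svd(A, compute_uv=False)

# Treat singular values below a small relative tolerance as zero
tol = s.max() * max(A.shape) * np.finfo(float).eps
rank = int((s > tol).sum())

print(rank)
print(rank == np.linalg.matrix_rank(A))
```

The exact digits of the near-zero singular value can vary between platforms, which is why a tolerance is used instead of comparing with 0.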

Example 2: Calculating the Singular Value Decomposition of a Random Matrix

In this example, we will be using the numpy.random.randint() function to create a random matrix. Let’s get into it!

# import the numpy module
import numpy as np
# create a random 3×3 integer matrix with numpy.random.randint()
A = np.random.randint(5, 200, size=(3, 3))
# display the created matrix
print("The input matrix is=", A)
# compute all three factors with numpy.linalg.svd()
# note: the third array is V^H (the conjugate transpose of V), not V itself
u, s, v = np.linalg.svd(A, compute_uv=True)
# display the result
print("the output is=")
print('s(the singular value) = ', s)
print('u = ', u)
print('v = ', v)

The output will look like the following (the values differ on each run because the matrix is random):

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]

Suggested: Numpy linalg.eigvalsh: A Guide to Eigenvalue Computation.

Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to compute it with Python's numpy module. We used the linalg.svd() function to compute the singular value decomposition of both a fixed matrix and a random matrix. NumPy provides efficient, easy-to-use functions for linear algebra, which makes it valuable in machine learning, neural networks, and regression analysis. Keep exploring the other linear algebra functions in numpy to extend your mathematical toolset in Python.

Article source at: https://www.askpython.com

#python #numpy 

How to Sort NumPy Arrays in Python

Many popular Python libraries use NumPy as the foundation of their infrastructure. Beyond slicing, dicing, and manipulating arrays, the NumPy library offers various functions for sorting the elements of an array.

Sorting an array is useful in many computer science applications.

It lets you arrange data in an ordered form, look up elements quickly, and store data in a space-efficient way.

After installing the package, import it with the following statement:

import numpy

NumPy Sorting Algorithms

The numpy.sort() function lets you sort an array using different sorting algorithms. You select the algorithm by setting the kind parameter.

The default is 'quicksort'. The other values NumPy accepts are 'mergesort', 'heapsort', and 'stable'; under the hood, 'quicksort' is implemented as an introsort.

If you set kind to 'stable', the function automatically chooses the best stable sorting algorithm for the array's data type.

In general, 'mergesort' and 'stable' are both mapped to timsort or radix sort under the hood, depending on the data type.

Sorting algorithms can be characterized by their average speed, their space complexity, and their worst-case performance.

In addition, a stable sorting algorithm keeps elements with equal keys in their original relative order. Here is a summary of the properties of NumPy's sorting algorithms.

Algorithm    Average speed (1 = fastest)    Worst case       Work space    Stable
quicksort    1                              O(n^2)           0             no
mergesort    2                              O(n*log(n))      ~n/2          yes
timsort      2                              O(n*log(n))      ~n/2          yes
heapsort     3                              O(n*log(n))      0             no
It is worth noting that numpy.sort() always returns a sorted copy of the array. Internally, every algorithm also makes temporary copies of the data when sorting along any axis other than the last one.

As a result, sorting along the last axis is faster and needs less space than sorting along any other axis.

Let's create an array of numbers and sort it with an algorithm of our choice. The numpy.sort() function takes the kind argument to select the algorithm.

a = [1,2,8,9,6,1,3,6]
numpy.sort(a, kind='quicksort')

Sorting in Ascending Order

By default, NumPy sorts arrays in ascending order. You can simply pass your array to numpy.sort(), which accepts any array-like object as its argument.

The function returns a sorted copy of the array rather than sorting it in place. To sort in place, you need an ndarray object, which you can create with numpy.array().

Sorting in Place

First, let's create an ndarray object.

a = numpy.array([1,2,1,3])

To sort the array in place, we can use the sort method of the ndarray class:

a.sort(axis=-1, kind=None, order=None)

Sorting by Creating a Copy of the Array

With numpy.sort() you can sort any array-like object without first creating an ndarray. It returns a sorted copy with the same data type and shape as the input.

a = [1,2,1,3]
numpy.sort(a)

Сортировать по убыванию

Если вы хотите отсортировать массив в порядке убывания, вы можете использовать ту же функцию numpy.sort(). Использование синтаксиса массива array[::-1] позволяет перевернуть массив.

Сортировка на месте

Чтобы отсортировать ndarray на месте, вызовите numpy.ndarray.sort().

a = numpy.array([1,2,1,3])
a[::-1].sort()
print(a)

Сортировка путем создания копии массива

В качестве альтернативы вы можете использовать numpy.sort(array)[::-1] для создания копии обратного массива, отсортированного от наибольшего к наименьшему значению.

a = [1,2,1,3]
print(numpy.sort(a)[::-1])

Sort 2D Array

In the previous examples, our array was a 1D object. numpy.sort() takes an optional parameter ‘axis’ that specifies the axis along which to sort the array.

This is used when working with multidimensional arrays. It takes an integer as an argument. If no argument is passed, it uses the default value of -1.

This returns an array that is sorted along the last axis. Alternatively, you can specify the axis along which to sort by setting this parameter to the corresponding integer value.

Before specifying the axis, you need to understand how NumPy axes work.

NumPy Axes

In NumPy, 2D arrays are analogous to matrices in math. Their axes are similar to the axes of a Cartesian coordinate system.

In a 2D NumPy array, the two axes can be pictured as a 2-dimensional coordinate system.

Axis 0 is the row axis; it runs downwards, across the rows. Axis 1 is the column axis; it runs horizontally, across the columns.

Sorting with axis=0 therefore sorts the values within each column, while axis=1 sorts the values within each row.

Let’s begin by creating a 2D NumPy array:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=1, kind=None, order=None)

Sort 3D Array

Sorting a 3D array is quite similar to sorting a 2D array. We worked with a 2D array in the previous example. If we create a 3D array, we will have 3 axes.

In that case, the first axis is 0, the second axis is 1, and the third axis is 2.

Let’s create a 3D NumPy array.

a = numpy.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]], [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]], [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])

Next, we can set axis=2 to sort along the third axis.

numpy.sort(a, axis=2, kind=None, order=None)

Sort by Column

There are various ways to sort a NumPy array by a column. You can set the ‘axis’ parameter or the ‘order’ parameter in the numpy.sort() function.

In the above example, we sorted the values within every row by setting the ‘axis’ parameter to 1 (axis=0 would sort within every column). To sort records by one particular column instead, we can use the ‘order’ parameter on a structured array.

Sort Using Order

You can sort a NumPy array by a field or a sequence of fields, provided that the array’s dtype defines those fields.

This is especially useful for spreadsheet-like data where you wish to sort the table by a specific column.

numpy.sort() lets you do this easily. You pass the field name as a string in the ‘order’ parameter.

numpy.sort(a, axis=-1, kind=None, order=None)

Let’s create an array with fields defined as ‘name’, ‘age’, and ‘score’.

dtype = [('name', 'S10'), ('age', int), ('score', float)]
values =  [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]
a = numpy.array(values, dtype=dtype)

You can then specify which field to sort by, passing its name as a string to the ‘order’ parameter.

numpy.sort(a, order='score')

Sort by Multiple Columns

If you wish to sort the array by more than one field, pass a list of field names as the ‘order’ parameter.

The fields are compared in the order listed. It is not necessary to specify all fields: NumPy still uses the unspecified fields, in the order in which they appear in the dtype, to break ties.

numpy.sort(a, order=['score', 'name'])

Sort by Row

Just as axis=1 sorts the values within each row, setting the axis parameter to 0 sorts along the row axis, that is, the values within each column. Using the same example as above:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=0, kind=None, order=None)

This sorts every column independently. If you want to reorder the array according to one specific row instead, you will need to index that row.

The numpy.argsort() function comes in handy in such cases. It performs an indirect sort along the specified axis and returns an array of indices in sorted order.

Note that the function doesn’t return the sorted array. Rather, it returns an array of the same shape that contains the indices that would sort it.

You can then use those indices to index the original array and rearrange its columns.

Using the same array as above:

a = numpy.array([[10, 11, 13, 22],  [23, 7, 20, 14],  [31, 11, 33, 17]])

Let’s sort it by the 3rd row, i.e. the row at index position 2.

indices = numpy.argsort(a[2])

We can pass the result to our array to retrieve an array whose columns are ordered by the 3rd row.

sorted_a = a[:, indices]
print(sorted_a)

Sort by Column till Specified Row or from Specific Row

You can sort only up to a specified row, or from a specific row, rather than sorting the whole array. This is easy to do by slicing with the [] operator.

For instance, consider the following array.

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])

If you only wish to sort the first 2 rows of the array, you can pass a sliced array to numpy.sort(). With the default axis=-1, the values within each selected row are sorted.

index = 2
numpy.sort(a[:index])

This returns a sorted copy of the slice; the original array is unchanged.

Similarly, if you wish to sort only the 2nd and 3rd rows of the array, you can do it as follows:

numpy.sort(a[1:3])

Now, if you want to sort a column of the array using only a range of rows, you can use the same [] operator to slice the column.

Using the same array as above, if we wish to sort the first 3 rows of the 2nd column, we can slice the array as:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])
sort_array = a[0:3, 1]
numpy.sort(sort_array)

Sort by Datetime

If you’re working with data that has an element of time, you may want to sort it by date or time.

Python’s datetime module makes time data easy to work with. You can then sort the data using numpy.sort().

First, let’s import the datetime module.

import datetime

Next, we can create a NumPy array that stores datetime objects.

a = numpy.array([datetime.datetime(2021, 1, 1, 12, 0), datetime.datetime(2021, 9, 1, 12, 0), datetime.datetime(2021, 5, 1, 12, 0)])

To sort the array, we can pass it to numpy.sort().

numpy.sort(a)

Sort with Lambda

In Python, you can create an anonymous function using the ‘lambda’ keyword. Such functions are useful when you only need them temporarily in your code.

numpy.sort() does not accept a key function, but a lambda can be used to prepare the data before sorting, for example together with Python’s built-in filter().

Consider a case where we want to retrieve the even elements from an array. Furthermore, we want to sort the resulting even values.

We can use a lambda function to first filter the values and then pass the result to numpy.sort().

Let’s begin by creating an array.

a = [2,3,6,4,2,8,9,5,2,0,1,9]
even = list(filter(lambda x: x%2==0, a))
numpy.sort(even)

Sort with NaN Values

By default, NumPy sorts an array so that NaN values are placed last. NaN values also cause problems when you want to retrieve the index of the minimum or maximum element in the array.

For instance, take a look at the following code snippet:

a = numpy.array([35, 55, 33, 17])

If we want the index of the smallest element in the array, we can use the numpy.argmin() function. But if the array contains NaN values, numpy.argmin() returns the index of the first NaN instead.

a = numpy.array([35, numpy.nan, 33, 17])
numpy.argmin(a)

Similarly, when you want the index of the largest element, numpy.argmax() also returns the index of the NaN value.

numpy.argmax(a)

When dealing with NaN values in an array, we should use numpy.nanargmin() and numpy.nanargmax() instead. These functions return the indices of the minimum and maximum values along the specified axis, while ignoring all NaN values.

Here, the functions return the correct indices of the minimum and maximum values in the above array.

numpy.nanargmin(a)
numpy.nanargmax(a)

Sort NumPy Array Containing Floats

NumPy handles float data seamlessly, and sorting it does not require any extra work. You can pass a float array the same way as you pass any other array.

a = numpy.array([[10.3, 11.42, 10.002, 22.2], [7.08, 7.089, 10.20, 12.2], [7.4, 8.09, 3.6, 17]])
numpy.sort(a)

Conclusion

NumPy’s wide range of sorting functions makes it easy to sort arrays for any task. Whether you’re working with a 1-D array or a multidimensional array, NumPy sorts it efficiently and with concise code.

Here, we have discussed just a few capabilities of NumPy’s sort functions.

Original article source at: https://likegeeks.com/

#python #numpy #arrays 

津田  淳

津田 淳

1679771400

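How to Sort NumPy Arrays in Python

Many of Python’s popular libraries use NumPy under the hood as a fundamental pillar of their infrastructure. Beyond slicing, dicing, and manipulating arrays, the NumPy library offers various functions that allow you to sort elements in an array.

Sorting an array is useful in many applications of computer science.

It lets you organize data in ordered form, look up elements quickly, and store data in a space-efficient manner.

Once you’ve installed the package, import it by running the following command:

import numpy

NumPy Sort Algorithms

The numpy.sort() function allows you to sort an array using various sorting algorithms. You can specify the kind of algorithm to use by setting the ‘kind’ parameter.

The default is ‘quicksort’. The other accepted values are ‘mergesort’, ‘heapsort’, and ‘stable’; introsort is not a separate option but the algorithm that current NumPy versions actually run for ‘quicksort’.

If you set the kind parameter to ‘stable’, the function automatically chooses the best stable sorting algorithm based upon the array data type.

In general, ‘mergesort’ and ‘stable’ are both mapped to timsort or radix sort under the hood, depending on the data type.

The sorting algorithms can be characterized by their average running speed, space complexity, and worst-case performance.

Moreover, a stable sorting algorithm keeps items with equal keys in their original relative order. Here is a summary of the properties of NumPy’s sorting algorithms.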

Kind of Algorithm   Average Speed (rank, 1 = fastest)   Worst Case     Worst Space   Stable
quicksort           1                                   O(n^2)         0             no
mergesort           2                                   O(n*log(n))    ~n/2          yes
timsort             2                                   O(n*log(n))    ~n/2          yes
heapsort            3                                   O(n*log(n))    0             no

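It is worth noting that numpy.sort() always returns a sorted copy of the array. Internally, the algorithms also make temporary copies of the data when sorting along any axis other than the last one.

As a result, sorting along the last axis is faster and requires less space than sorting along any other axis.

Let’s create an array of numbers and sort it using our choice of algorithm. The numpy.sort() function takes a ‘kind’ argument that selects the algorithm.

a = [1,2,8,9,6,1,3,6]
numpy.sort(a, kind='quicksort')

Sort in Ascending Order

By default, NumPy sorts arrays in ascending order. You can simply pass your array to numpy.sort(), which accepts any array-like object as an argument.

The function returns a sorted copy of the array rather than sorting it in-place. To sort in-place, you need an ndarray object, which you can create with numpy.array(), and its sort() method.

Sort in-place

First, let’s construct an ndarray object.

a = numpy.array([1,2,1,3])

To sort an array in-place, we can use the sort method of the ndarray class:

a.sort(axis=-1, kind=None, order=None)

Sort by making a copy of the array

With numpy.sort(), you can sort any array-like object without first creating an ndarray object. It returns a sorted ndarray of the same shape as the input and leaves the input unchanged.

a = [1,2,1,3]
numpy.sort(a)

Sort in Descending Order

numpy.sort() has no option for descending order, but you can combine it with the slice syntax array[::-1], which reverses an array.

Sort in-place

To sort an ndarray in-place in descending order, call sort() on the reversed view a[::-1]. The view shares memory with a, so sorting the view in ascending order leaves a itself in descending order.

a = numpy.array([1,2,1,3])
a[::-1].sort()
print(a)

Sort by making a copy of the array

Alternatively, numpy.sort(array)[::-1] sorts a copy in ascending order and then reverses it, giving the values from largest to smallest.

a = [1,2,1,3]
print(numpy.sort(a)[::-1])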

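Sort 2D Array

In the previous examples, our array was a 1D object. numpy.sort() takes an optional parameter ‘axis’ that specifies the axis along which to sort the array.

This is used when working with multidimensional arrays. It takes an integer as an argument. If no argument is passed, it uses the default value of -1.

This returns an array that is sorted along the last axis. Alternatively, you can specify the axis along which to sort by setting this parameter to the corresponding integer value.

Before specifying the axis, you need to understand how NumPy axes work.

NumPy Axes

In NumPy, 2D arrays are analogous to matrices in math. Their axes are similar to the axes of a Cartesian coordinate system.

In a 2D NumPy array, the two axes can be pictured as a 2-dimensional coordinate system.

Axis 0 is the row axis; it runs downwards, across the rows. Axis 1 is the column axis; it runs horizontally, across the columns.

Sorting with axis=0 therefore sorts the values within each column, while axis=1 sorts the values within each row.

Let’s begin by creating a 2D NumPy array:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=1, kind=None, order=None)

Sort 3D Array

Sorting a 3D array is quite similar to sorting a 2D array. We worked with a 2D array in the previous example. If we create a 3D array, we will have 3 axes.

In that case, the first axis is 0, the second axis is 1, and the third axis is 2.

Let’s create a 3D NumPy array.

a = numpy.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]], [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]], [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])

Next, we can set axis=2 to sort along the third axis.

numpy.sort(a, axis=2, kind=None, order=None)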

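Sort by Column

There are various ways to sort a NumPy array by a column. You can set the ‘axis’ parameter or the ‘order’ parameter in the numpy.sort() function.

In the above example, we sorted the values within every row by setting the ‘axis’ parameter to 1 (axis=0 would sort within every column). To sort records by one particular column instead, we can use the ‘order’ parameter on a structured array.

Sort Using Order

You can sort a NumPy array by a field or a sequence of fields, provided that the array’s dtype defines those fields.

This is especially useful for spreadsheet-like data where you wish to sort the table by a specific column.

numpy.sort() lets you do this easily. You pass the field name as a string in the ‘order’ parameter.

numpy.sort(a, axis=-1, kind=None, order=None)

Let’s create an array with fields defined as ‘name’, ‘age’, and ‘score’.

dtype = [('name', 'S10'), ('age', int), ('score', float)]
values =  [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]
a = numpy.array(values, dtype=dtype)

You can then specify which field to sort by, passing its name as a string to the ‘order’ parameter.

numpy.sort(a, order='score')

Sort by Multiple Columns

If you wish to sort the array by more than one field, pass a list of field names as the ‘order’ parameter.

The fields are compared in the order listed. It is not necessary to specify all fields: NumPy still uses the unspecified fields, in the order in which they appear in the dtype, to break ties.

numpy.sort(a, order=['score', 'name'])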

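Sort by Row

Just as axis=1 sorts the values within each row, setting the axis parameter to 0 sorts along the row axis, that is, the values within each column. Using the same example as above:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=0, kind=None, order=None)

This sorts every column independently. If you want to reorder the array according to one specific row instead, you will need to index that row.

The numpy.argsort() function comes in handy in such cases. It performs an indirect sort along the specified axis and returns an array of indices in sorted order.

Note that the function doesn’t return the sorted array. Rather, it returns an array of the same shape that contains the indices that would sort it.

You can then use those indices to index the original array and rearrange its columns.

Using the same array as above:

a = numpy.array([[10, 11, 13, 22],  [23, 7, 20, 14],  [31, 11, 33, 17]])

Let’s sort it by the 3rd row, i.e. the row at index position 2.

indices = numpy.argsort(a[2])

We can pass the result to our array to retrieve an array whose columns are ordered by the 3rd row.

sorted_a = a[:, indices]
print(sorted_a)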

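Sort by Column till Specified Row or from Specific Row

You can sort only up to a specified row, or from a specific row, rather than sorting the whole array. This is easy to do by slicing with the [] operator.

For instance, consider the following array.

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])

If you only wish to sort the first 2 rows of the array, you can pass a sliced array to numpy.sort(). With the default axis=-1, the values within each selected row are sorted.

index = 2
numpy.sort(a[:index])

This returns a sorted copy of the slice; the original array is unchanged.

Similarly, if you wish to sort only the 2nd and 3rd rows of the array, you can do it as follows:

numpy.sort(a[1:3])

Now, if you want to sort a column of the array using only a range of rows, you can use the same [] operator to slice the column.

Using the same array as above, if we wish to sort the first 3 rows of the 2nd column, we can slice the array as:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])
sort_array = a[0:3, 1]
numpy.sort(sort_array)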

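Sort by Datetime

If you’re working with data that has an element of time, you may want to sort it by date or time.

Python’s datetime module makes time data easy to work with. You can then sort the data using numpy.sort().

First, let’s import the datetime module.

import datetime

Next, we can create a NumPy array that stores datetime objects.

a = numpy.array([datetime.datetime(2021, 1, 1, 12, 0), datetime.datetime(2021, 9, 1, 12, 0), datetime.datetime(2021, 5, 1, 12, 0)])

To sort the array, we can pass it to numpy.sort().

numpy.sort(a)

Sort with Lambda

In Python, you can create an anonymous function using the ‘lambda’ keyword. Such functions are useful when you only need them temporarily in your code.

numpy.sort() does not accept a key function, but a lambda can be used to prepare the data before sorting, for example together with Python’s built-in filter().

Consider a case where we want to retrieve the even elements from an array. Furthermore, we want to sort the resulting even values.

We can use a lambda function to first filter the values and then pass the result to numpy.sort().

Let’s begin by creating an array.

a = [2,3,6,4,2,8,9,5,2,0,1,9]
even = list(filter(lambda x: x%2==0, a))
numpy.sort(even)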

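Sort with NaN Values

By default, NumPy sorts an array so that NaN values are placed last. NaN values also cause problems when you want to retrieve the index of the minimum or maximum element in the array.

For instance, take a look at the following code snippet:

a = numpy.array([35, 55, 33, 17])

If we want the index of the smallest element in the array, we can use the numpy.argmin() function. But if the array contains NaN values, numpy.argmin() returns the index of the first NaN instead.

a = numpy.array([35, numpy.nan, 33, 17])
numpy.argmin(a)

Similarly, when you want the index of the largest element, numpy.argmax() also returns the index of the NaN value.

numpy.argmax(a)

When dealing with NaN values in an array, we should use numpy.nanargmin() and numpy.nanargmax() instead. These functions return the indices of the minimum and maximum values along the specified axis, while ignoring all NaN values.

Here, the functions return the correct indices of the minimum and maximum values in the above array.

numpy.nanargmin(a)
numpy.nanargmax(a)

Sort NumPy Array Containing Floats

NumPy handles float data seamlessly, and sorting it does not require any extra work. You can pass a float array the same way as you pass any other array.

a = numpy.array([[10.3, 11.42, 10.002, 22.2], [7.08, 7.089, 10.20, 12.2], [7.4, 8.09, 3.6, 17]])
numpy.sort(a)

Conclusion

NumPy’s wide range of sorting functions makes it easy to sort arrays for any task. Whether you’re working with a 1-D array or a multidimensional array, NumPy sorts it efficiently and with concise code.

Here, we have discussed just a few capabilities of NumPy’s sort functions.

Original article source at: https://likegeeks.com/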

#python #numpy #arrays 


How to Sort NumPy Arrays in Python

Many of Python’s popular libraries use NumPy under the hood as a fundamental pillar of their infrastructure. Beyond slicing, dicing, and manipulating arrays, the NumPy library offers various functions that allow you to sort elements in an array.

Sorting an array is useful in many applications of computer science.

It lets you organize data in ordered form, look up elements quickly, and store data in a space-efficient manner.

Once you’ve installed the package, import it by running the following command:

import numpy

NumPy Sort Algorithms

The numpy.sort() function allows you to sort an array using various sorting algorithms. You can specify the kind of algorithm to use by setting the ‘kind’ parameter.

The default is ‘quicksort’. The other accepted values are ‘mergesort’, ‘heapsort’, and ‘stable’; introsort is not a separate option but the algorithm that current NumPy versions actually run for ‘quicksort’.

If you set the kind parameter to ‘stable’, the function automatically chooses the best stable sorting algorithm based upon the array data type.

In general, ‘mergesort’ and ‘stable’ are both mapped to timsort or radix sort under the hood, depending on the data type.

The sorting algorithms can be characterized by their average running speed, space complexity, and worst-case performance.

Moreover, a stable sorting algorithm keeps the items in their relative order, even when they have the same keys. Here is a summary of the properties of NumPy’s sorting algorithms.

Kind of Algorithm   Average Speed (rank, 1 = fastest)   Worst Case     Worst Space   Stable
quicksort           1                                   O(n^2)         0             no
mergesort           2                                   O(n*log(n))    ~n/2          yes
timsort             2                                   O(n*log(n))    ~n/2          yes
heapsort            3                                   O(n*log(n))    0             no
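To make the ‘Stable’ column concrete, here is a minimal sketch of what stability means in practice. The keys are made-up sample values, and numpy.argsort() is used only so that the original positions of equal keys stay visible.

```python
import numpy as np

# Made-up keys with ties, chosen only for illustration.
keys = np.array([2, 1, 2, 1, 2])

# A stable sort keeps equal keys in their original relative order,
# so the indices of equal keys come out in ascending order.
stable_idx = np.argsort(keys, kind='stable')
print(stable_idx)        # [1 3 0 2 4]
print(keys[stable_idx])  # [1 1 2 2 2]
```

With a non-stable kind such as ‘heapsort’, the sorted keys would be the same, but the order of the indices among equal keys is not guaranteed.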

It is worth noting that numpy.sort() always returns a sorted copy of the array. Internally, the algorithms also make temporary copies of the data when sorting along any axis other than the last one.

As a result, sorting along the last axis is faster and requires less space than sorting along any other axis.

Let’s create an array of numbers and sort it using our choice of algorithm. The numpy.sort() function takes a ‘kind’ argument that selects the algorithm.

a = [1,2,8,9,6,1,3,6]
numpy.sort(a, kind='quicksort')

Sort in Ascending Order

By default, NumPy sorts arrays in ascending order. You can simply pass your array to numpy.sort(), which accepts any array-like object as an argument.

The function returns a sorted copy of the array rather than sorting it in-place. To sort in-place, you need an ndarray object, which you can create with numpy.array(), and its sort() method.

Sort in-place

First, let’s construct an ndarray object.

a = numpy.array([1,2,1,3])

To sort an array in-place, we can use the sort method from the ndarray class:

a.sort(axis= -1, kind=None, order=None)

Sort by making a copy of the array

With numpy.sort(), you can sort any array-like object without first creating an ndarray object. It returns a sorted ndarray of the same shape as the input and leaves the input unchanged.

a = [1,2,1,3]
numpy.sort(a)
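The difference between the two approaches can be checked directly. The following is a small sketch with hypothetical sample data; it shows that numpy.sort() leaves its input alone, while ndarray.sort() modifies the array and returns None.

```python
import numpy as np

original = np.array([1, 2, 1, 3])  # hypothetical sample data

# numpy.sort returns a new sorted array and leaves its input unchanged.
sorted_copy = np.sort(original)
print(sorted_copy)  # [1 1 2 3]
print(original)     # [1 2 1 3]

# ndarray.sort sorts in-place and returns None.
result = original.sort()
print(result)       # None
print(original)     # [1 1 2 3]
```

A common mistake is to write a = a.sort(), which replaces the array with None.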

Sort in Descending Order

numpy.sort() has no option for descending order, but you can combine it with the slice syntax array[::-1], which reverses an array.

Sort in-place

To sort an ndarray in-place in descending order, call sort() on the reversed view a[::-1]. The view shares memory with a, so sorting the view in ascending order leaves a itself in descending order.

a = numpy.array([1,2,1,3])
a[::-1].sort()
print(a)

Sort by making a copy of the array

Alternatively, numpy.sort(array)[::-1] sorts a copy in ascending order and then reverses it, giving the values from largest to smallest.

a = [1,2,1,3]
print(numpy.sort(a)[::-1])
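As a quick check of both descending-order idioms, here is a short sketch. The 2D array m and the variable names are hypothetical, added only to show that the same reversal works per row.

```python
import numpy as np

a = np.array([1, 2, 1, 3])

# Copy: sort ascending, then read the result back to front.
desc_copy = np.sort(a)[::-1]
print(desc_copy)  # [3 2 1 1]
print(a)          # [1 2 1 3]  (unchanged)

# In-place: sorting the reversed view ascending leaves `a` descending.
a[::-1].sort()
print(a)          # [3 2 1 1]

# For a 2D array (hypothetical data), reverse along the sorted axis.
m = np.array([[3, 1, 2], [9, 7, 8]])
desc_rows = np.sort(m, axis=1)[:, ::-1]
print(desc_rows.tolist())  # [[3, 2, 1], [9, 8, 7]]
```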

Sort 2D Array

In the previous examples, our array was a 1D object. numpy.sort() takes an optional parameter ‘axis’ that specifies the axis along which to sort the array.

This is used when working with multidimensional arrays. It takes an integer as an argument. If no argument is passed, it uses the default value that is set to -1.

This returns an array that is sorted along the last axis. Alternatively, you can specify the axis along which to sort by setting this parameter to the corresponding integer value.

Before specifying the axis, you need to understand how NumPy axes work.

NumPy Axes

In NumPy, 2D arrays are analogous to matrices in math. Their axes are similar to the axes of a Cartesian coordinate system.

In a 2D NumPy array, the two axes can be pictured as a 2-dimensional coordinate system.

Axis 0 is the row axis; it runs downwards, across the rows. Axis 1 is the column axis; it runs horizontally, across the columns.

Sorting with axis=0 therefore sorts the values within each column, while axis=1 sorts the values within each row.

Let’s begin by creating a 2D NumPy array:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis= 1, kind=None, order=None) 
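To see what each axis value does to this array, the sketch below compares axis=1, axis=0, and axis=None side by side. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])

# axis=1: sort the values within each row.
within_rows = np.sort(a, axis=1)
print(within_rows.tolist())
# [[10, 11, 13, 22], [7, 14, 20, 23], [11, 17, 31, 33]]

# axis=0: sort the values within each column.
within_cols = np.sort(a, axis=0)
print(within_cols.tolist())
# [[10, 7, 13, 14], [23, 11, 20, 17], [31, 11, 33, 22]]

# axis=None: flatten the array first, then sort all values together.
flat = np.sort(a, axis=None)
print(flat.tolist())
# [7, 10, 11, 11, 13, 14, 17, 20, 22, 23, 31, 33]
```

Note that sorting along an axis treats every row (or column) independently, so values that started in the same column (or row) do not stay together.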

Sort 3D Array

Sorting a 3D array is quite similar to sorting a 2D array. We worked with a 2D array in the previous example. If we create a 3D array, we will have 3 axes.

In that case, the first axis is 0, the second axis is 1, and the third axis is 2.

Let’s create a 3D NumPy array.

a = numpy.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]], [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]], [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])

Next, we can set the axis=2 to sort along the third axis.

numpy.sort(a, axis= 2, kind=None, order=None) 

Sort by Column

There are various ways to sort a NumPy array by a column. You can set the ‘axis’ parameter or the ‘order’ parameter in the numpy.sort() function.

In the above example, we sorted the values within every row by setting the ‘axis’ parameter to 1 (axis=0 would sort within every column). To sort records by one particular column instead, we can use the ‘order’ parameter on a structured array.

Sort Using Order

You can sort a NumPy array by a field or a sequence of fields, provided that the array’s dtype defines those fields.

This is especially useful for spreadsheet-like data where you wish to sort the table by a specific column.

numpy.sort() lets you do this easily. You pass the field name as a string in the ‘order’ parameter.

numpy.sort(a, axis=- 1, kind=None, order=None) 

Let’s create an array with fields defined as ‘name’, ‘age’, and ‘score’.

dtype = [('name', 'S10'), ('age', int), ('score', float)]
values =  [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]
a = numpy.array(values, dtype=dtype)

You can then specify which field to sort by passing it as a string to the ‘order’ parameter.

numpy.sort(a, order='score')

Sort by Multiple Columns

If you wish to sort the array by more than one field, pass a list of field names as the ‘order’ parameter.

The fields are compared in the order listed. It is not necessary to specify all fields: NumPy still uses the unspecified fields, in the order in which they appear in the dtype, to break ties.

numpy.sort(a, order=['score', 'name'])
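The tie-breaking rule is easiest to see with records that share a score. The sketch below uses made-up values in which Alice and Bob tie, and a 'U10' (Unicode) name field instead of 'S10' so that the names print as plain strings; both choices are assumptions for illustration.

```python
import numpy as np

# Made-up records: Alice and Bob share the same score.
dtype = [('name', 'U10'), ('age', int), ('score', float)]
values = [('Alice', 19, 80.0), ('Bob', 18, 80.0), ('James', 17, 78.0)]
a = np.array(values, dtype=dtype)

# Only 'score' is given: the tie is broken by the remaining fields
# in dtype order, so 'name' decides and Alice comes before Bob.
by_score = np.sort(a, order='score')['name'].tolist()
print(by_score)      # ['James', 'Alice', 'Bob']

# 'age' is listed as the second key: Bob (18) now precedes Alice (19).
by_score_age = np.sort(a, order=['score', 'age'])['name'].tolist()
print(by_score_age)  # ['James', 'Bob', 'Alice']
```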

Sort by Row

Just as axis=1 sorts the values within each row, setting the axis parameter to 0 sorts along the row axis, that is, the values within each column. Using the same example as above:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis= 0, kind=None, order=None) 

This sorts every column independently. If you want to reorder the array according to one specific row instead, you will need to index that row.

The numpy.argsort() function comes in handy in such cases. It performs an indirect sort along the specified axis and returns an array of indices in sorted order.

Note that the function doesn’t return the sorted array. Rather, it returns an array of the same shape that contains the indices that would sort it.

You can then use those indices to index the original array and rearrange its columns.

Using the same array as above:

a = numpy.array([[10, 11, 13, 22],  [23, 7, 20, 14],  [31, 11, 33, 17]])

Let’s sort it by the 3rd row, i.e. the row at index position 2.

indices = numpy.argsort(a[2])

We can pass the result to our array to retrieve an array whose columns are ordered by the 3rd row.

sorted_a = a[:, indices]
print(sorted_a)
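The same argsort idea works in both directions. As a sketch with illustrative variable names, the code below reorders the columns by one row and, separately, reorders whole rows by one column; kind='stable' is used so that the tied values in column 1 keep their original order.

```python
import numpy as np

a = np.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])

# Reorder the columns so that the row at index 2 is ascending.
col_order = np.argsort(a[2])
print(col_order.tolist())        # [1, 3, 0, 2]
by_row = a[:, col_order]
print(by_row.tolist())
# [[11, 22, 10, 13], [7, 14, 23, 20], [11, 17, 31, 33]]

# Reorder whole rows so that the column at index 1 is ascending.
row_order = np.argsort(a[:, 1], kind='stable')
print(row_order.tolist())        # [1, 0, 2]
by_col = a[row_order]
print(by_col.tolist())
# [[23, 7, 20, 14], [10, 11, 13, 22], [31, 11, 33, 17]]
```

Unlike numpy.sort() with an axis argument, this keeps each column (or row) intact and only changes its position.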

Sort by Column till Specified Row or from Specific Row

You can sort an array till a specified row or from a specific row rather than sorting the whole array. This is easy to do with the [] operator.

For instance, consider the following array.

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])

If you only wish to sort the first 2 rows of the array, you can pass a sliced array to numpy.sort(). With the default axis=-1, the values within each selected row are sorted.

index = 2
numpy.sort(a[:index])

This returns a sorted copy of the slice; the original array is unchanged.

Similarly, if you wish to sort only the 2nd and 3rd rows of the array, you can do it as follows:

numpy.sort(a[1:3])

Now, if you want to sort a column of the array only using a range of rows, you can use the same [] operator to slice the column.

Using the same array as above, if we wish to sort the first 3 rows of the 2nd column, we can slice the array as:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])
sort_array = a[0:3, 1]
numpy.sort(sort_array)
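Because numpy.sort() works on a copy, the original array still holds the unsorted column afterwards. If the goal is to update the array itself, one option (sketched here with illustrative names) is to call the in-place sort() method on the slice, since basic slicing returns a view of the same memory.

```python
import numpy as np

a = np.array([[10, 11, 13, 22], [23, 7, 20, 14],
              [31, 11, 33, 17], [17, 12, 33, 16]])

# Copy: the sorted values are returned, `a` is untouched.
part = np.sort(a[0:3, 1])
print(part.tolist())     # [7, 11, 11]
print(a[:, 1].tolist())  # [11, 7, 11, 12]

# In-place: the slice is a view, so sorting it changes `a`.
a[0:3, 1].sort()
print(a[:, 1].tolist())  # [7, 11, 11, 12]
```

Only the first three entries of the second column are rearranged; the fourth row and all other columns keep their values.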

Sort by Datetime

If you’re working with data that has an element of time, you may want to sort it based upon the date or time.

Python’s datetime module makes time data easy to work with. You can then sort the data using numpy.sort().

First, let’s import the datetime module.

import datetime

Next, we can create a NumPy array that stores datetime objects.

a = numpy.array([datetime.datetime(2021, 1, 1, 12, 0), datetime.datetime(2021, 9, 1, 12, 0), datetime.datetime(2021, 5, 1, 12, 0)])

To sort the array, we can pass it to numpy.sort().

numpy.sort(a)
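An array of datetime.datetime objects has dtype object. As an alternative sketch, not part of the original example, the same timestamps can be stored in NumPy's native datetime64 type, which sorts the same way; the variable names here are illustrative.

```python
import numpy as np

# The same three timestamps, stored as datetime64 with minute precision.
stamps = np.array(
    ['2021-01-01T12:00', '2021-09-01T12:00', '2021-05-01T12:00'],
    dtype='datetime64[m]',
)

# Ascending order: earliest first.
ordered = np.sort(stamps)
print(ordered)

# Descending order: latest first.
latest_first = ordered[::-1]
print(latest_first)
```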

Sort with Lambda

In Python, you can create an anonymous function using the ‘lambda’ keyword. Such functions are useful when you only need to use them temporarily in your code.

numpy.sort() does not accept a key function, but a lambda can be used to prepare the data before sorting, for example together with Python’s built-in filter().

Consider a case where we want to retrieve the even elements from an array. Furthermore, we want to sort the resulting even values.

We can use a lambda function to first filter the values and then pass the result to numpy.sort().

Let’s begin by creating an array.

a = [2,3,6,4,2,8,9,5,2,0,1,9]
even = list(filter(lambda x: x%2==0, a))
numpy.sort(even)
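The same result can also be obtained without a lambda. The sketch below is an alternative to the filter() approach: it uses a boolean mask for the filtering and numpy.argsort() to emulate a sort key. The array v and the choice of absolute value as the key are made up for illustration.

```python
import numpy as np

a = np.array([2, 3, 6, 4, 2, 8, 9, 5, 2, 0, 1, 9])

# Boolean mask: keep the even values, then sort them.
even_sorted = np.sort(a[a % 2 == 0])
print(even_sorted.tolist())  # [0, 2, 2, 2, 4, 6, 8]

# Emulating a sort key: order `v` by absolute value.
v = np.array([-5, 2, -1, 4])
by_abs = v[np.argsort(np.abs(v))]
print(by_abs.tolist())       # [-1, 2, 4, -5]
```

The mask version stays inside NumPy, which is usually preferable for large arrays.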

Sort with NaN Values

By default, NumPy sorts an array so that NaN values are placed last. NaN values also cause problems when you want to retrieve the index of the minimum or maximum element in the array.

For instance, take a look at the following code snippet:

a = numpy.array([35, 55, 33, 17])

If we want the index of the smallest element in the array, we can use the numpy.argmin() function. But if the array contains NaN values, numpy.argmin() returns the index of the first NaN instead.

a = numpy.array([35, numpy.nan, 33, 17])
numpy.argmin(a)

Similarly, when you want the index of the largest element, numpy.argmax() also returns the index of the NaN value.

numpy.argmax(a)

When dealing with NaN values in an array, we should use numpy.nanargmin() and numpy.nanargmax() instead. These functions return the indices of the minimum and maximum values in the specified axis, while ignoring all NaN values.

Here, the functions will return the correct index of the minimum and maximum values in the above array.

numpy.nanargmin(a)
numpy.nanargmax(a)
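Putting the NaN behaviour together, the following sketch runs each call on the array above and also shows one way to drop the NaN values before sorting. The variable names are illustrative.

```python
import numpy as np

a = np.array([35, np.nan, 33, 17])

# Sorting places NaN at the end.
ordered = np.sort(a)
print(ordered)           # [17. 33. 35. nan]

# argmin/argmax both point at the NaN (index 1).
print(np.argmin(a))      # 1
print(np.argmax(a))      # 1

# The nan-aware variants ignore it.
print(np.nanargmin(a))   # 3  (value 17)
print(np.nanargmax(a))   # 0  (value 35)

# Dropping NaN values before sorting.
clean = np.sort(a[~np.isnan(a)])
print(clean.tolist())    # [17.0, 33.0, 35.0]
```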

Sort NumPy Array Containing Floats

NumPy handles float data seamlessly, and sorting it does not require any extra work. You can pass a float array the same way as you pass any other array.

a = numpy.array([[10.3, 11.42, 10.002, 22.2], [7.08, 7.089, 10.20, 12.2], [7.4, 8.09, 3.6, 17]])
numpy.sort(a)

Conclusion

NumPy’s wide range of sorting functions makes it easy to sort arrays for any task. Whether you’re working with a 1-D array or a multidimensional array, NumPy sorts it efficiently and with concise code.

Here, we have discussed just a few capabilities of NumPy’s sort functions.

Original article source at: https://likegeeks.com/

#python #numpy #arrays 

Rui  Silva

Rui Silva

1679661960

How to Normalize an Array in NumPy in Python?

In this NumPy normalization tutorial, we will learn how to normalize an array using Python’s NumPy library. But before we get into that, let’s first try to understand the definition and meaning of NumPy and normalization.

Normalization

Generally, normalization is a process used to rescale the actual values of a numeric attribute into a range from 0 to 1. Normalization helps organize the data so that it looks similar across all fields and records. Data normalization has several advantages, such as reduced redundancy, reduced complexity, clarity, and higher-quality data.

Data normalization is widely used in machine learning. Normalization helps make model training less sensitive to the scale of the features. When using data to train a model, we need to scale it so that all numeric values are in the same range and large values do not overwhelm smaller ones. This lets models find better weights, which in turn results in a more accurate model. In simple terms, normalization helps the model predict its outputs more accurately.

Now, the next question is how to perform data normalization. One way is to use Python. The NumPy library provides the function “linalg.norm()”, which computes the norm of an array; dividing the array by its norm gives a normalized array, and for non-negative data the resulting values lie between 0 and 1. We will see this in detail shortly. But before that, let’s understand the meaning and applications of NumPy.

NumPy

NumPy, as the name suggests, stands for Numerical Python. NumPy is a third-party Python library used for working with arrays. Since we can already create an array-like sequence in Python using lists, why do we need NumPy? Well, NumPy provides a faster way of working with arrays than traditional lists.

To use NumPy on your system, you need to install the library using pip. Below is the command used to install NumPy:

pip install numpy

After installation, we need to import the library into our application/program to use its functions. Below is the syntax for importing the numpy library in Python:

import numpy

Now let’s see an example of how to create a one-dimensional array using the numpy library:

import numpy as np           # importing numpy library

my_array = np.array([10, 30, 50, 70, 90])    # defining the input array

print("This is my array - ", my_array)         # printing the array

The output of the above program will be as follows:

This is my array -  [10 30 50 70 90]

Let’s see an example of how to create a two-dimensional array using the NumPy library:

import numpy as np # importing numpy library as np

two_d_array = np.array([[10, 30, 50, 70, 90], [20, 40, 60, 80, 100]])  # defining the 2D array

print("This is a two dimensional array - ",  two_d_array)  # printing the array

The output of the above program will be as follows:

This is a two dimensional array -  [[ 10  30  50  70  90]
 [ 20  40  60  80 100]]

NumPy functions

The NumPy library contains many functions that simplify work with arrays, linear algebra, polynomials, and Fourier transforms. Some of them are listed below:

Add – numpy.add() adds two arrays element-wise.

Subtract – numpy.subtract() subtracts one array from another element-wise.

Multiply – numpy.multiply() multiplies two arrays element-wise.

Divide – numpy.divide() divides one array by another element-wise.

Min – numpy.min() finds the minimum value of an array.

Max – numpy.max() finds the maximum value of an array.

Mean – numpy.mean() calculates the mean of an array.

Var – numpy.var() calculates the variance of an array.

Std – numpy.std() calculates the standard deviation of an array.

Dot – numpy.dot() computes the dot product of two arrays.

Cross – numpy.cross() computes the cross product of two arrays.

Inner – numpy.inner() computes the inner product of two arrays.

Outer – numpy.outer() computes the outer product of two arrays.

Transpose – numpy.transpose() returns the transpose of an array.

Concatenate – numpy.concatenate() joins two or more arrays.
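As a quick illustration of a few of the functions listed above, here is a minimal sketch. The arrays a and b are made-up sample values chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

total = np.add(a, b)              # element-wise sum: [5. 7. 9.]
difference = np.subtract(b, a)    # element-wise difference: [3. 3. 3.]
product = np.multiply(a, b)       # element-wise product: [ 4. 10. 18.]
dot = np.dot(a, b)                # 1*4 + 2*5 + 3*6 = 32.0
cross = np.cross(a, b)            # [-3.  6. -3.]
mean = np.mean(a)                 # 2.0
joined = np.concatenate([a, b])   # [1. 2. 3. 4. 5. 6.]
```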

In addition to the functions above, NumPy contains functions for linear algebra calculations. These live in the linalg submodule, whose name is short for Linear Algebra, and they are used to solve a variety of linear algebra problems. Some of the functions in linalg are listed below:

Det – numpy.linalg.det() calculates the determinant of a matrix.

Inv – numpy.linalg.inv() calculates the inverse of a matrix.

Eig – numpy.linalg.eig() calculates the eigenvalues and eigenvectors of a square matrix.

Norm – numpy.linalg.norm() calculates the norm of a vector or matrix. This is the function we will use to perform normalization with NumPy. It takes an array or matrix as its argument and returns the norm of that array.
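The four linalg functions above can be tried on a small matrix. This is only a sketch: the diagonal matrix m is an arbitrary example picked so the expected values are obvious.

```python
import numpy as np

m = np.array([[2.0, 0.0],
              [0.0, 3.0]])

det = np.linalg.det(m)                         # 2 * 3 = 6 for a diagonal matrix
inv = np.linalg.inv(m)                         # diagonal entries become 1/2 and 1/3
eigenvalues, eigenvectors = np.linalg.eig(m)   # eigenvalues of a diagonal matrix are its diagonal
norm = np.linalg.norm(m)                       # Frobenius norm: sqrt(2**2 + 3**2)
```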

Now that we know which function to use, let's go through the theory behind normalizing an array. After that, we will write complete normalization programs for a one-dimensional array and for a two-dimensional array.

The norm used in our code is the Euclidean norm. For a 2D array, linalg.norm() computes the Frobenius norm by default, which is the Euclidean norm of all the entries taken together. The normalized array is obtained by dividing the input array by this norm:

$$\hat{v} = \frac{v}{\lVert v \rVert}$$

Where,

$\hat{v}$ – represents the normalized array or matrix.

$v$ – represents the input array or matrix.

$\lVert v \rVert$ – represents the Euclidean norm of the input (the Frobenius norm for a matrix).
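A small worked instance of the formula may help. The vector (3, 4) is an illustrative choice because its Euclidean norm is exactly 5.

```python
import numpy as np

v = np.array([3.0, 4.0])
norm = np.linalg.norm(v)              # sqrt(3**2 + 4**2) = 5.0
v_hat = v / norm                      # [0.6 0.8]
unit_length = np.linalg.norm(v_hat)   # the normalized vector has norm 1
```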

We now have all the terms and functions needed for our NumPy normalization programs. Let's look at the implementation through the examples below.

1. Normalization of a one-dimensional (1D) array

a.) Normalization of a predefined 1D array

import numpy as np                               # import the NumPy library as np
pre_one_array = np.array([10, 20, 30, 40, 50])   # define a 1D array
print(pre_one_array)                             # print the array
norm = np.linalg.norm(pre_one_array)             # compute the Euclidean norm of the array
print(norm)                                      # print the value of the norm
normalized_array = pre_one_array / norm          # divide by the norm to normalize
print(normalized_array)                          # print the normalized array

The output of the program above is:

[10 20 30 40 50]
74.161984871
[0.13483997 0.26967994 0.40451992 0.53935989 0.67419986]

Every value in the output lies between 0 and 1. That is expected here because all inputs are non-negative and no single element can exceed the norm. The output has unit Euclidean norm, so the predefined 1D array was normalized successfully.
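The 0-to-1 range observed above depends on the input being non-negative. As a hedged illustration (the array mixed and the variable names are made up for this sketch), dividing an array that contains negative entries by its norm gives values in [-1, 1]. If a strict [0, 1] range is required, min-max scaling is a different, commonly used technique:

```python
import numpy as np

mixed = np.array([-10.0, 0.0, 10.0, 20.0])

# Unit-norm scaling: the result has Euclidean norm 1 but keeps the signs.
unit_norm = mixed / np.linalg.norm(mixed)

# Min-max scaling: maps the smallest value to 0 and the largest to 1.
min_max = (mixed - mixed.min()) / (mixed.max() - mixed.min())
```

Which of the two is appropriate depends on the application; the rest of this tutorial uses unit-norm scaling.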

b.) Normalization of a random 1D array

To normalize a 1D array of random values, use the following approach:

import numpy as np                        # import the NumPy library as np
ran_one_array = np.random.rand(5) * 10    # 5 random values; rand() samples from [0, 1), so multiplying by 10 gives values in [0, 10)
print(ran_one_array)                      # print the array
norm = np.linalg.norm(ran_one_array)      # compute the Euclidean norm of the array
print(norm)                               # print the value of the norm
normalized_array = ran_one_array / norm   # divide by the norm to normalize
print(normalized_array)                   # print the normalized array

The output of the program above will look like the following (the exact numbers differ on every run):

[ 2.66782852 6.70146289 5.38289872 0.52054369 9.62171167]
13.1852498544
[ 0.20233432 0.50825452 0.40825155 0.03947924 0.72973298]

Again, every value in the output lies between 0 and 1, because the random inputs are non-negative. The random 1D array was normalized successfully.

2. Normalization of a two-dimensional (2D) array

a.) Normalization of a predefined 2D array

import numpy as np                        # import the NumPy library as np
pre_two_array = np.array([[10, 30, 50, 70, 90], [20, 40, 60, 80, 100], [5, 15, 25, 35, 45], [55, 65, 75, 85, 95], [11, 22, 33, 44, 55]])    # define a 2D array with 5 rows and 5 columns
print(pre_two_array)                      # print the array
norm = np.linalg.norm(pre_two_array)      # compute the Frobenius norm of the array
print(norm)                               # print the value of the norm
normalized_array = pre_two_array / norm   # divide by the norm to normalize
print(normalized_array)                   # print the normalized array

The output of the program above is:

[[ 10  30  50  70  90]
 [ 20  40  60  80 100]
 [  5  15  25  35  45]
 [ 55  65  75  85  95]
 [ 11  22  33  44  55]]
280.008928429
[[0.03571315 0.10713944 0.17856573 0.24999203 0.32141832]
 [0.07142629 0.14285259 0.21427888 0.28570518 0.35713147]
 [0.01785657 0.05356972 0.08928287 0.12499601 0.16070916]
 [0.19642231 0.23213545 0.2678486  0.30356175 0.3392749 ]
 [0.03928446 0.07856892 0.11785338 0.15713785 0.19642231]]

Every value in the output lies between 0 and 1, as expected for non-negative inputs. Note that the whole matrix is divided by a single norm, so it is the matrix as a whole, not each row, that has unit norm. The predefined 2D array was normalized successfully.
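The example above divides the whole matrix by one Frobenius norm. If each row should instead have unit length on its own, linalg.norm() accepts axis and keepdims arguments. The following is a minimal sketch with made-up data, not a variant taken from the original tutorial:

```python
import numpy as np

data = np.array([[3.0, 4.0],
                 [6.0, 8.0],
                 [5.0, 12.0]])

# One norm per row; keepdims=True keeps shape (3, 1) so the division broadcasts row by row.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
row_normalized = data / row_norms
```

Here row_norms holds 5, 10, and 13, and every row of row_normalized has Euclidean norm 1.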

b.) Normalization of a random 2D array

To normalize a 2D array of random values, use the following approach:

import numpy as np                         # import the NumPy library as np
ran_two_array = np.random.rand(5, 5) * 10  # 5x5 random values; rand() samples from [0, 1), so multiplying by 10 gives values in [0, 10)
print(ran_two_array)                       # print the array
norm = np.linalg.norm(ran_two_array)       # compute the Frobenius norm of the array
print(norm)                                # print the value of the norm
normalized_array = ran_two_array / norm    # divide by the norm to normalize
print(normalized_array)                    # print the normalized array

The output of the program above will look like the following (the exact numbers differ on every run):

[[4.57411295 8.65220668 9.63324979 1.9971668 3.23869927]
 [0.84966168 5.90483284 0.47779068 3.28578339 2.45708816]
 [5.85465399 4.49030481 9.12849734 9.05088372 2.16890579]
 [1.24442784 3.31225636 5.72207596 3.9220778 1.45400695]
 [5.49354678 3.63828521 3.66439748 3.75588512 4.4547876 ]]
25.1725603225
[[0.18171028 0.3437158 0.38268852 0.07933904 0.12865991]
 [0.03375349 0.23457419 0.01898062 0.13053036 0.09760978]
 [0.23258079 0.17838093 0.36263682 0.35955356 0.08616151]
 [0.04943589 0.13158202 0.22731402 0.15580766 0.05776158]
 [0.21823552 0.14453378 0.14557111 0.14920553 0.17696998]]

Every value in the output lies between 0 and 1, because the random inputs are non-negative. The random 2D array was normalized successfully.
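Because the random examples print different numbers on every run, it can help to seed the generator when reproducing results. The sketch below uses NumPy's Generator interface (np.random.default_rng) rather than the np.random.rand call from the tutorial; the seed value 0 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # arbitrary seed so the run is repeatable
ran_two_array = rng.random((5, 5)) * 10    # 5x5 values in [0, 10)

norm = np.linalg.norm(ran_two_array)       # Frobenius norm of the whole matrix
normalized_array = ran_two_array / norm    # the result has Frobenius norm 1
```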

This brings us to the end of this NumPy normalization tutorial. We covered the definition of normalization, its advantages, and its applications, followed by the NumPy library and a selection of its functions. We then went through the theory and formula behind normalization. Finally, we implemented normalization for one-dimensional and two-dimensional arrays with NumPy and checked the output in each case.


Original article source: https://www.mygreatlearning.com

#numpy  #python  

How to Normalize an Array in NumPy in Python?

Royce Reinger

1678499040

Quaternion: Add Built-in Support for Quaternions to Numpy

Quaternions in numpy

This Python module adds a quaternion dtype to NumPy.

The code was originally based on code by Martin Ling (which he wrote with help from Mark Wiebe), but has been rewritten with ideas from rational to work with both python 2.x and 3.x (and to fix a few bugs), and greatly expands the applications of quaternions.

See also the pure-python package quaternionic.

Quickstart

conda install -c conda-forge quaternion

or

python -m pip install --upgrade --force-reinstall numpy-quaternion

Optionally add --user after install in the second command if you're not using a python environment — though you should start.

Dependencies

The basic requirements for this code are reasonably current versions of python and numpy. In particular, python versions 3.8 through 3.10 are routinely tested. Earlier python versions, including 2.7, will work with older versions of this package; they might still work with more recent versions of this package, but even numpy no longer supports python previous to 3.8, so your mileage may vary. Also, any numpy version greater than 1.13.0 should work, but the tests are run on the most recent release at the time of the test.

However, certain advanced functions in this package (including squad, mean_rotor_in_intrinsic_metric, integrate_angular_velocity, and related functions) require scipy and can automatically use numba. Scipy is a standard python package for scientific computation, and implements interfaces to C and Fortran codes for optimization (among other things) needed for finding mean and optimal rotors. Numba uses LLVM to compile python code to machine code, accelerating many numerical functions by factors of anywhere from 2 to 2000. It is possible to run all the code without numba, but these particular functions can be anywhere from 4 to 400 times slower without it.

Both scipy and numba can be installed with pip or conda. However, because conda is specifically geared toward scientific python, it is generally more robust for these more complicated packages. In fact, the main anaconda package comes with both numba and scipy. If you prefer the smaller download size of miniconda (which comes with minimal extras), you'll also have to run this command:

conda install numpy scipy numba

Installation

Assuming you use conda to manage your python installation (which is currently the preferred choice for science and engineering with python), you can install this package simply as

conda install -c conda-forge quaternion

If you prefer to use pip, you can instead do

python -m pip install --upgrade --force-reinstall numpy-quaternion

(See here for a veteran python core contributor's explanation of why you should always use python -m pip instead of just pip or pip3.) The --upgrade --force-reinstall options are not always necessary, but will ensure that pip will update numpy if it has to.

If you refuse to use conda, you might want to install inside your home directory without root privileges. (Conda does this by default anyway.) This is done by adding --user to the above command:

python -m pip install --user --upgrade --force-reinstall numpy-quaternion

Note that pip will attempt to compile the code — which requires a working C compiler.

Finally, there's also the fully manual option of just downloading the code, changing to the code directory, and running

python -m pip install --upgrade --force-reinstall .

This should work regardless of the installation method, as long as you have a compiler hanging around.

Basic usage

The full documentation can be found on Read the Docs, and most functions have docstrings that should explain the relevant points. The following are mostly for the purposes of example.

>>> import numpy as np
>>> import quaternion
>>> np.quaternion(1,0,0,0)
quaternion(1, 0, 0, 0)
>>> q1 = np.quaternion(1,2,3,4)
>>> q2 = np.quaternion(5,6,7,8)
>>> q1 * q2
quaternion(-60, 12, 30, 24)
>>> a = np.array([q1, q2])
>>> a
array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion)
>>> np.exp(a)
array([quaternion(1.69392, -0.78956, -1.18434, -1.57912),
       quaternion(138.909, -25.6861, -29.9671, -34.2481)], dtype=quaternion)

Note that this package represents a quaternion as a scalar, followed by the x component of the vector part, followed by y, followed by z. These components can be accessed directly:

>>> q1.w, q1.x, q1.y, q1.z
(1.0, 2.0, 3.0, 4.0)
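Using the (w, x, y, z) ordering described above, the q1 * q2 result shown earlier can be checked by hand. The following is a plain-NumPy sketch of the standard Hamilton product; the function name hamilton_product is hypothetical and this is not the package's own implementation.

```python
import numpy as np

def hamilton_product(p, q):
    """Multiply two quaternions given as (w, x, y, z) sequences."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,   # scalar part
        pw * qx + px * qw + py * qz - pz * qy,   # x component
        pw * qy - px * qz + py * qw + pz * qx,   # y component
        pw * qz + px * qy - py * qx + pz * qw,   # z component
    ])

# Same values as q1 and q2 in the session above.
result = hamilton_product([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
```

The components of result are -60, 12, 30, and 24, matching the quaternion(-60, 12, 30, 24) output above.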

However, this only works on an individual quaternion; for arrays it is better to use "vectorized" operations like as_float_array.

The following ufuncs are implemented (which means they run fast on numpy arrays):

add, subtract, multiply, divide, log, exp, power, negative, conjugate,
copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, absolute

Quaternion components are stored as double-precision floating point numbers — floats, in python language, or float64 in more precise numpy language. Numpy arrays with dtype=quaternion can be accessed as arrays of doubles without any (slow, memory-consuming) copying of data; rather, a view of the exact same memory space can be created within a microsecond, regardless of the shape or size of the quaternion array.

Comparison operations follow the same lexicographic ordering as tuples.

The unary tests isnan and isinf return true if they would return true for any individual component; isfinite returns true if it would return true for all components.
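The comparison and test semantics described above can be mimicked with tuples and plain float arrays. This is only an illustrative sketch: it operates on rows of (w, x, y, z) components, not on the quaternion dtype itself, and the sample values are made up.

```python
import numpy as np

# Lexicographic ordering, as with tuples: the first differing component decides.
first_less = (1.0, 2.0, 3.0, 4.0) < (1.0, 2.0, 3.5, 0.0)

# Components of two quaternions stored as rows of (w, x, y, z).
components = np.array([[1.0, 2.0, 3.0, 4.0],
                       [0.0, np.nan, 0.0, np.inf]])

has_nan = np.isnan(components).any(axis=1)         # true if any component is NaN
has_inf = np.isinf(components).any(axis=1)         # true if any component is infinite
all_finite = np.isfinite(components).all(axis=1)   # true only if every component is finite
```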

Real types may be cast to quaternions, giving quaternions with zero for all three imaginary components. Complex types may also be cast to quaternions, with their single imaginary component becoming the first imaginary component of the quaternion. Quaternions may not be cast to real or complex types.

Several array-conversion functions are also included. For example, to convert an Nx4 array of floats to an array of N quaternions, use as_quat_array:

>>> import numpy as np
>>> import quaternion
>>> a = np.random.rand(7, 4)
>>> a
array([[ 0.93138726,  0.46972279,  0.18706385,  0.86605021],
       [ 0.70633523,  0.69982741,  0.93303559,  0.61440879],
       [ 0.79334456,  0.65912598,  0.0711557 ,  0.46622885],
       [ 0.88185987,  0.9391296 ,  0.73670503,  0.27115149],
       [ 0.49176628,  0.56688076,  0.13216632,  0.33309146],
       [ 0.11951624,  0.86804078,  0.77968826,  0.37229404],
       [ 0.33187593,  0.53391165,  0.8577846 ,  0.18336855]])
>>> qs = quaternion.as_quat_array(a)
>>> qs
array([ quaternion(0.931387262880247, 0.469722787598354, 0.187063852060487, 0.866050210100621),
       quaternion(0.706335233363319, 0.69982740767353, 0.933035590130247, 0.614408786768725),
       quaternion(0.793344561317281, 0.659125976566815, 0.0711557025000925, 0.466228847713644),
       quaternion(0.881859869074069, 0.939129602918467, 0.736705031709562, 0.271151494174001),
       quaternion(0.491766284854505, 0.566880763189927, 0.132166320200012, 0.333091463422536),
       quaternion(0.119516238634238, 0.86804077992676, 0.779688263524229, 0.372294043850009),
       quaternion(0.331875925159073, 0.533911652483908, 0.857784598617977, 0.183368547490701)], dtype=quaternion)

[Note that quaternions are printed with full precision, unlike floats, which is why you see extra digits above. But the actual data is identical in the two cases.] To convert an array of N quaternions to an Nx4 array of floats, use as_float_array:

>>> b = quaternion.as_float_array(qs)
>>> b
array([[ 0.93138726,  0.46972279,  0.18706385,  0.86605021],
       [ 0.70633523,  0.69982741,  0.93303559,  0.61440879],
       [ 0.79334456,  0.65912598,  0.0711557 ,  0.46622885],
       [ 0.88185987,  0.9391296 ,  0.73670503,  0.27115149],
       [ 0.49176628,  0.56688076,  0.13216632,  0.33309146],
       [ 0.11951624,  0.86804078,  0.77968826,  0.37229404],
       [ 0.33187593,  0.53391165,  0.8577846 ,  0.18336855]])

It is also possible to convert a quaternion to or from a 3x3 array of floats representing a rotation matrix, or an array of N quaternions to or from an Nx3x3 array of floats representing N rotation matrices, using as_rotation_matrix and from_rotation_matrix. Similar conversions are possible for rotation vectors using as_rotation_vector and from_rotation_vector, and for spherical coordinates using as_spherical_coords and from_spherical_coords. Finally, it is possible to derive the Euler angles from a quaternion using as_euler_angles, or create a quaternion from Euler angles using from_euler_angles, though be aware that Euler angles are basically the worst things ever (see footnote 1). Before you complain about those functions using something other than your favorite conventions, please read this page.
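To make the quaternion-to-rotation-matrix conversion concrete, here is a plain-NumPy sketch of the standard textbook formula for a unit quaternion in (w, x, y, z) order. The function name rotation_matrix_from_quaternion is hypothetical, and this is not claimed to be how as_rotation_matrix is implemented.

```python
import numpy as np

def rotation_matrix_from_quaternion(q):
    """Return the 3x3 rotation matrix for a quaternion given as (w, x, y, z)."""
    # Normalize first so that any non-zero quaternion yields a proper rotation.
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# A 90-degree rotation about the z axis uses a half angle of 45 degrees.
half_angle = np.pi / 4
q = [np.cos(half_angle), 0.0, 0.0, np.sin(half_angle)]
matrix = rotation_matrix_from_quaternion(q)
rotated = matrix @ np.array([1.0, 0.0, 0.0])   # the x axis is carried onto the y axis
```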

Bug reports and feature requests

Bug reports and feature requests are entirely welcome (with very few exceptions). The best way to do this is to open an issue on this code's github page. For bug reports, please try to include a minimal working example demonstrating the problem.

Pull requests are also entirely welcome, of course, if you have an idea where the code is going wrong, or have an idea for a new feature that you know how to implement.

This code is routinely tested on recent versions of both python (3.8 through 3.10) and numpy (>=1.13). But the test coverage is not necessarily as complete as it could be, so bugs may certainly be present, especially in the higher-level functions like mean_rotor_....

Acknowledgments

This code is, of course, hosted on github. Because it is an open-source project, the hosting is free, and all the wonderful features of github are available, including free wiki space and web page hosting, pull requests, a nice interface to the git logs, etc. Github user Hannes Ovrén (hovren) pointed out some errors in a previous version of this code and suggested some nice utility functions for rotation matrices, etc. Github user Stijn van Drongelen (rhymoid) contributed some code that makes compilation work with MSVC++. Github user Jon Long (longjon) has provided some elegant contributions to substantially improve several tricky parts of this code. Rebecca Turner (9999years) and Leo Stein (duetosymmetry) did all the work in getting the documentation onto Read the Docs.

Every change in this code is automatically tested on Travis-CI. This service integrates beautifully with github, detecting each commit and automatically re-running the tests. The code is downloaded and installed fresh each time, and then tested, on each of the five different versions of python. This ensures that no change I make to the code breaks either installation or any of the features that I have written tests for. Travis-CI also automatically builds the conda and pip versions of the code hosted on anaconda.org and pypi respectively. These are all free services for open-source projects like this one.

The work of creating this code was supported in part by the Sherman Fairchild Foundation and by NSF Grants No. PHY-1306125 and AST-1333129.


1 Euler angles are awful

Euler angles are pretty much the worst things ever and it makes me feel bad even supporting them. Quaternions are faster, more accurate, basically free of singularities, more intuitive, and generally easier to understand. You can work entirely without Euler angles (I certainly do). You absolutely never need them. But if you really can't give them up, they are mildly supported.


Download Details:

Author: Moble
Source Code: https://github.com/moble/quaternion 
License: MIT license

#machinelearning #python #math #robotics #physics #numpy 


Neural Network From Scratch In Python | Implement a complete multi-layer neural network

We'll learn the theory of neural networks, then use Python and NumPy to implement a complete multi-layer neural network. We'll cover the forward pass, loss functions, the backward pass (backpropagation and gradient descent), and the training loop. At the end, we'll use our neural network to predict the weather.
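The steps named above (forward pass, loss, backward pass, training loop) can be sketched in a few lines of NumPy. This is not the code from the video: the toy data (a straight line rather than weather data), the layer sizes, the learning rate, and the seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): learn y = 2x + 1 on a handful of points.
x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * x + 1.0

# One hidden layer of 8 units with ReLU activation.
w1 = rng.normal(scale=0.5, size=(1, 8))
b1 = np.zeros((1, 8))
w2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))
learning_rate = 0.05

losses = []
for epoch in range(500):
    # Forward pass.
    hidden_in = x @ w1 + b1
    hidden = np.maximum(hidden_in, 0.0)
    prediction = hidden @ w2 + b2

    # Mean squared error loss.
    error = prediction - y
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: apply the chain rule layer by layer.
    grad_prediction = 2.0 * error / len(x)
    grad_w2 = hidden.T @ grad_prediction
    grad_b2 = grad_prediction.sum(axis=0, keepdims=True)
    grad_hidden = grad_prediction @ w2.T
    grad_hidden_in = grad_hidden * (hidden_in > 0)
    grad_w1 = x.T @ grad_hidden_in
    grad_b1 = grad_hidden_in.sum(axis=0, keepdims=True)

    # Gradient descent update.
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2
```

The list losses records the loss at each epoch, so comparing its first and last entries shows whether training made progress.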

Chapters

00:00:00 Neural network introduction
00:10:05 Activation functions
00:12:10 Multiple layers
00:15:18 Multiple hidden units
00:23:52 The forward pass
00:32:46 The backward pass
00:48:08 Layer 1 gradients
00:56:24 Network training algorithm
01:00:13 Full network implementation
01:06:44 Training loop

You can find the text version of this lesson here - https://github.com/VikParuchuri/zero_to_gpt/blob/master/explanations/dense.ipynb 

And the complete lesson list for the zero to gpt series here - https://github.com/VikParuchuri/zero_to_gpt 

Subscribe: https://www.youtube.com/@Dataquestio/featured 

#machinelearning #python #numpy 


Learn The Theory of Classification | Classification With Neural Networks

We'll use a neural network for classification.  In classification, we categorize data, and use the neural network to predict which category each example is in.

You'll learn the theory of classification, including the negative log likelihood loss function, and the sigmoid and softmax activation functions.  Then you'll implement a classifier in NumPy that can predict whether a telescope saw a star, galaxy, or quasar.
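As a rough sketch of the pieces mentioned above, the sigmoid, softmax, and negative log likelihood functions can be written in NumPy as follows. The function names, logits, and labels are illustrative assumptions and are not taken from the lesson.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def negative_log_likelihood(probabilities, labels):
    """Average of -log(probability assigned to the correct class)."""
    picked = probabilities[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Two examples with three classes each (made-up scores and labels).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probabilities = softmax(logits)
loss = negative_log_likelihood(probabilities, labels)
```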

Chapters

00:00 - Classification intro
04:15 - Sigmoid activation
08:27 - Binary NLL
14:38 - Binary classification
26:40 - Multiclass encoding
30:05 - Softmax function
35:46 - Multiclass NLL
41:11 - Multiclass classification

You can read the full lesson here - https://github.com/VikParuchuri/zero_to_gpt/blob/master/explanations/classification.ipynb  .

And see the previous lessons in this series here - https://github.com/VikParuchuri/zero_to_gpt 

Subscribe: https://www.youtube.com/@Dataquestio/featured 

#machinelearning #numpy 


Royce Reinger

1677556560

TensorboardX: Tensorboard for Pytorch (and Chainer, Mxnet, Numpy, ...)

TensorboardX   

Write TensorBoard events with a simple function call.

The current release (v2.5) is tested on anaconda3, with PyTorch 1.11.0 / torchvision 0.12 / tensorboard 2.9.0.

Supports scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve, mesh, hyper-parameters and video summaries.

FAQ

Install

pip install tensorboardX

or build from source:

pip install 'git+https://github.com/lanpa/tensorboardX'

You can optionally install crc32c to speed up.

pip install crc32c

Starting from tensorboardX 2.1, you need to install soundfile for the add_audio() function (200x speedup).

pip install soundfile

Example

# demo.py

import torch
import torchvision.utils as vutils
import numpy as np
import torchvision.models as models
from torchvision import datasets
from tensorboardX import SummaryWriter

resnet18 = models.resnet18(False)
writer = SummaryWriter()
sample_rate = 44100
freqs = [262, 294, 330, 349, 392, 440, 440, 440, 440, 440, 440]

for n_iter in range(100):

    dummy_s1 = torch.rand(1)
    dummy_s2 = torch.rand(1)
    # data grouping by `slash`
    writer.add_scalar('data/scalar1', dummy_s1[0], n_iter)
    writer.add_scalar('data/scalar2', dummy_s2[0], n_iter)

    writer.add_scalars('data/scalar_group', {'xsinx': n_iter * np.sin(n_iter),
                                             'xcosx': n_iter * np.cos(n_iter),
                                             'arctanx': np.arctan(n_iter)}, n_iter)

    dummy_img = torch.rand(32, 3, 64, 64)  # output from network
    if n_iter % 10 == 0:
        x = vutils.make_grid(dummy_img, normalize=True, scale_each=True)
        writer.add_image('Image', x, n_iter)

        dummy_audio = torch.zeros(sample_rate * 2)
        for i in range(x.size(0)):
            # amplitude of sound should be in [-1, 1]
            dummy_audio[i] = np.cos(freqs[n_iter // 10] * np.pi * float(i) / float(sample_rate))
        writer.add_audio('myAudio', dummy_audio, n_iter, sample_rate=sample_rate)

        writer.add_text('Text', 'text logged at step:' + str(n_iter), n_iter)

        for name, param in resnet18.named_parameters():
            writer.add_histogram(name, param.clone().cpu().data.numpy(), n_iter)

        # needs tensorboard 0.4RC or later
        writer.add_pr_curve('xoxo', np.random.randint(2, size=100), np.random.rand(100), n_iter)

dataset = datasets.MNIST('mnist', train=False, download=True)
images = dataset.test_data[:100].float()
label = dataset.test_labels[:100]

features = images.view(100, 784)
writer.add_embedding(features, metadata=label, label_img=images.unsqueeze(1))

# export scalar data to JSON for external processing
writer.export_scalars_to_json("./all_scalars.json")
writer.close()
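
The comment in the demo notes that audio amplitudes should stay within [-1, 1]. As a sketch in plain NumPy (the 440 Hz frequency and two-second duration are arbitrary choices, and nothing here is part of the tensorboardX API), a tone satisfying that constraint can be generated without a Python loop:

```python
import numpy as np

sample_rate = 44100       # samples per second, as in the demo
duration_seconds = 2
frequency_hz = 440.0      # arbitrary pitch for this sketch

# Time stamp of every sample, then a cosine wave evaluated at those times.
t = np.arange(sample_rate * duration_seconds) / sample_rate
tone = np.cos(2.0 * np.pi * frequency_hz * t)   # a cosine never leaves [-1, 1]
```

The demo passes a torch tensor to add_audio; whether a NumPy array like tone needs converting first is best checked against the tensorboardX documentation.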

Screenshots

Demo.gif

Using TensorboardX with Comet

TensorboardX now supports logging directly to Comet. Comet is a free cloud based solution that allows you to automatically track, compare and explain your experiments. It adds a lot of functionality on top of tensorboard such as dataset management, diffing experiments, seeing the code that generated the results and more.

This works out of the box and requires just one additional line of code. See a full code example in this Colab Notebook

comet.gif

Tweaks

To add more ticks for the slider (show more image history), check https://github.com/lanpa/tensorboardX/issues/44 or https://github.com/tensorflow/tensorboard/pull/1138

Reference


Download Details:

Author: lanpa
Source Code: https://github.com/lanpa/tensorboardX 
License: MIT license

#machinelearning #python #visualization #numpy #pytorch


Royce Reinger

1677039180

A Scalable General Purpose Micro-framework for Defining Dataflows

Welcome to Hamilton's Github Repository

Hamilton

The general purpose micro-framework for creating dataflows from python functions!

Specifically, Hamilton defines a novel paradigm, that allows you to specify a flow of (delayed) execution, that forms a Directed Acyclic Graph (DAG). It was originally built to solve creating wide (1000+) column dataframes. Core to the design of Hamilton is a clear mapping of function name to dataflow output. That is, Hamilton forces a certain paradigm with writing functions, and aims for DAG clarity, easy modifications, with always unit testable and naturally documentable code.

Getting Started

Here's a quick getting started guide to get you up and running in less than 15 minutes. If you need help join our slack community to chat/ask Qs/etc. For the latest updates, follow us on twitter!

Installation

Requirements:

  • Python 3.7+

To get started, first you need to install hamilton. It is published to pypi under sf-hamilton:

pip install sf-hamilton

Note: to use the DAG visualization functionality, you should instead do:

pip install "sf-hamilton[visualization]"

While it is installing we encourage you to start on the next section.

Note: the content (i.e. names, function bodies) of our example code snippets are for illustrative purposes only, and don't reflect what we actually do internally.

Hamilton in <15 minutes

Hamilton is a new paradigm when it comes to creating, um, dataframes (let's use dataframes as an example, otherwise you can create ANY python object). Rather than thinking about manipulating a central dataframe, as is normal in some data engineering/data science work, you instead think about the column(s) you want to create, and what inputs are required. There is no need for you to think about maintaining this dataframe, meaning you do not need to think about any "glue" code; this is all taken care of by the Hamilton framework.

For example rather than writing the following to manipulate a central dataframe object df:

df['col_c'] = df['col_a'] + df['col_b']

you write

def col_c(col_a: pd.Series, col_b: pd.Series) -> pd.Series:
    """Creating column c from summing column a and column b."""
    return col_a + col_b

In diagram form: example 

The Hamilton framework will then be able to build a DAG from this function definition.
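To give a feel for how a DAG can be derived from function definitions, here is a toy sketch that reads parameter names with the standard inspect module. It is not Hamilton's implementation: the helper compute, the extra function col_d, and the use of plain lists instead of pandas Series are all assumptions made for illustration.

```python
import inspect

def col_c(col_a, col_b):
    """Column c is the sum of columns a and b."""
    return [a + b for a, b in zip(col_a, col_b)]

def col_d(col_c):
    """Column d doubles column c."""
    return [2 * value for value in col_c]

functions = {"col_c": col_c, "col_d": col_d}

# Each parameter name is an edge from that node to the function's own node.
dependencies = {
    name: list(inspect.signature(fn).parameters)
    for name, fn in functions.items()
}

def compute(name, inputs, cache=None):
    """Resolve a node by recursively computing the nodes it depends on."""
    cache = {} if cache is None else cache
    if name in inputs:
        return inputs[name]
    if name not in cache:
        args = [compute(dep, inputs, cache) for dep in dependencies[name]]
        cache[name] = functions[name](*args)
    return cache[name]

result = compute("col_d", {"col_a": [1, 2], "col_b": [10, 20]})
```

Requesting col_d triggers col_c first because col_d names it as a parameter; col_a and col_b are not functions, so they must be supplied as inputs.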

So let's create a "Hello World" and start using Hamilton!

Your first hello world.

By now, you should have installed Hamilton, so let's write some code.

  • Create a file my_functions.py and add the following functions:
import pandas as pd

def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling 3 week average spend."""
    return spend.rolling(3).mean()

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """The cost per signup in relation to spend."""
    return spend / signups

The astute observer will notice we have not defined spend or signups as functions. That is okay, this just means these need to be provided as input when we come to actually wanting to create a dataframe.

Note: functions can take or create scalar values, in addition to any python object type.

  • Create a my_script.py which is where code will live to tell Hamilton what to do:
import sys
import logging
import importlib

import pandas as pd
from hamilton import driver

logging.basicConfig(stream=sys.stdout)
initial_columns = {  # load from actuals or wherever -- this is our initial data we use as input.
    # Note: these do not have to be all series, they could be scalar inputs.
    'signups': pd.Series([1, 10, 50, 100, 200, 400]),
    'spend': pd.Series([10, 10, 20, 40, 40, 50]),
}
# we need to tell hamilton where to load function definitions from
module_name = 'my_functions'
module = importlib.import_module(module_name) # or we could just do `import my_functions`
dr = driver.Driver(initial_columns, module)  # can pass in multiple modules
# we need to specify what we want in the final dataframe.
output_columns = [
    'spend',  # or module.spend
    'signups',  # or module.signups
    'avg_3wk_spend',  # or module.avg_3wk_spend
    'spend_per_signup',  # or module.spend_per_signup
]
# let's create the dataframe!
# if you only did `pip install sf-hamilton` earlier:
df = dr.execute(output_columns)
# else if you did `pip install "sf-hamilton[visualization]"` earlier:
# dr.visualize_execution(output_columns, './my-dag.dot', {})
print(df)
  • Run my_script.py

python my_script.py

You should see the following output:

   spend  signups  avg_3wk_spend  spend_per_signup
0     10        1            NaN            10.000
1     10       10            NaN             1.000
2     20       50      13.333333             0.400
3     40      100      23.333333             0.400
4     40      200      33.333333             0.200
5     50      400      43.333333             0.125
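The numbers in this table can be checked by hand. As a sanity check, the sketch below recomputes the same columns with plain pandas, without Hamilton; the variable name expected is an assumption for illustration. The first two rows of avg_3wk_spend are NaN because a rolling window of 3 needs three observations before it can produce a mean.

```python
import pandas as pd

# Same input data as in my_script.py.
spend = pd.Series([10, 10, 20, 40, 40, 50])
signups = pd.Series([1, 10, 50, 100, 200, 400])

# Recompute each derived column directly from its definition.
expected = pd.DataFrame(
    {
        "spend": spend,
        "signups": signups,
        "avg_3wk_spend": spend.rolling(3).mean(),
        "spend_per_signup": spend / signups,
    }
)
print(expected)
```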

You should see the following image if you ran dr.visualize_execution(output_columns, './my-dag.dot', {}):

hello_world_image

Congratulations - you just created your Hamilton dataflow that created a dataframe!

Example Hamilton Dataflows

We have a growing list of examples showcasing how one might use Hamilton. You can find them all under the examples/ directory.

Slack Community

We have a small but active community on slack. Come join us!

Used internally by:

To add your company, make a pull request to add it here.

Contributing

We take contributions, large and small. We operate via a Code of Conduct and expect anyone contributing to do the same.

To see how you can contribute, please read our contributing guidelines and then our developer setup guide.

Blog Posts

Videos of talks

Watch the video

Citing Hamilton

We'd appreciate you citing Hamilton by referencing one of the following:

@inproceedings{DBLP:conf/vldb/KrawczykI22,
  author    = {Stefan Krawczyk and Elijah ben Izzy},
  editor    = {Satyanarayana R. Valluri and Mohamed Za{\"{\i}}t},
  title     = {Hamilton: a modular open source declarative paradigm for high level
               modeling of dataflows},
  booktitle = {1st International Workshop on Composable Data Management Systems,
               CDMS@VLDB 2022, Sydney, Australia, September 9, 2022},
  year      = {2022},
  url       = {https://cdmsworkshop.github.io/2022/Proceedings/ShortPapers/Paper6\_StefanKrawczyk.pdf},
  timestamp = {Wed, 19 Oct 2022 16:20:48 +0200},
  biburl    = {https://dblp.org/rec/conf/vldb/KrawczykI22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@inproceedings{CEURWS:conf/vldb/KrawczykIQ22,
  author    = {Stefan Krawczyk and Elijah ben Izzy and Danielle Quinn},
  editor    = {Cinzia Cappiello and Sandra Geisler and Maria-Esther Vidal},
  title     = {Hamilton: enabling software engineering best practices for data transformations via generalized dataflow graphs},
  booktitle = {1st International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022)},
  pages     = {41--50},
  url       = {https://ceur-ws.org/Vol-3306/paper5.pdf},
  year      = {2022}
}

Prescribed Development Workflow

In general we prescribe the following:

  1. Ensure you understand Hamilton Basics.
  2. Familiarize yourself with some of the Hamilton decorators. They will help keep your code DRY.
  3. Start creating Hamilton Functions that represent your work. We suggest grouping them in modules where it makes sense.
  4. Write a simple script so that you can easily run things end to end.
  5. Join our Slack community to chat/ask Qs/etc.

For the backstory on Hamilton, we invite you to watch a roughly 9-minute lightning talk that we gave at the apply conference: video, slides.

PyCharm Tips

If you're using Hamilton, it's likely that you'll need to migrate some code. Here are some useful tricks we found to speed up that process.

Live templates

Live templates are a cool feature and allow you to type in a name which expands into some code.

For example, we wrote one to make it quick to stub out Hamilton functions: typing graphfunc expands into:

def _(_: pd.Series) -> pd.Series:
   """"""
   return _

The blanks (_) are where you can tab with the cursor and fill things in. See your PyCharm preferences for setting this up.

Multiple Cursors

If you are doing a lot of repetitive work, consider using multiple cursors, which let you edit multiple lines at once.

To use them, hold option and click to create multiple cursors. Press Esc to return to normal mode.

Usage analytics & data privacy

By default, Hamilton collects anonymous usage data to help improve the library and to show where development effort is best applied.

We capture three types of events: one when the Driver object is instantiated, one when the execute() call on the Driver object completes, and one for most Driver object function invocations. No user data or potentially sensitive information is or ever will be collected. The captured data is limited to:

  • Operating System and Python version
  • A persistent UUID to identify the session, stored in ~/.hamilton.conf.
  • Error stack trace limited to Hamilton code, if one occurs.
  • Information on what features you're using from Hamilton: decorators, adapters, result builders.
  • How Hamilton is being used: number of final nodes in DAG, number of modules, size of objects passed to execute(), the name of the Driver function being invoked.

If you're worried, see telemetry.py for details.

If you do not wish to participate, you can opt out with one of the following methods:

  • Set it to false programmatically in your code before creating a Hamilton driver:
from hamilton import telemetry
telemetry.disable_telemetry()
  • Set the key telemetry_enabled to false in ~/.hamilton.conf under the DEFAULT section:
[DEFAULT]
telemetry_enabled = False
  • Set HAMILTON_TELEMETRY_ENABLED=false as an environment variable, either by setting it for your shell session:
export HAMILTON_TELEMETRY_ENABLED=false
  • or by passing it as part of the run command:
HAMILTON_TELEMETRY_ENABLED=false python NAME_OF_MY_DRIVER.py
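If you prefer to script the config-file option rather than edit ~/.hamilton.conf by hand, the following is a minimal sketch using Python's standard configparser. The helper name disable_telemetry_in_config and the existing_key entry are hypothetical; the sketch writes to a temporary file so that it does not touch your real configuration, and it keeps whatever keys are already present.

```python
import configparser
import tempfile
from pathlib import Path


def disable_telemetry_in_config(path: Path) -> None:
    """Hypothetical helper: set telemetry_enabled = False under [DEFAULT]."""
    config = configparser.ConfigParser()
    config.read(path)  # a missing file is silently skipped
    config["DEFAULT"]["telemetry_enabled"] = "False"
    with open(path, "w") as handle:
        config.write(handle)


# Demonstrate on a temporary file instead of the real ~/.hamilton.conf.
with tempfile.TemporaryDirectory() as tmp:
    conf_path = Path(tmp) / ".hamilton.conf"
    conf_path.write_text("[DEFAULT]\nexisting_key = keep-me\n")  # placeholder content
    disable_telemetry_in_config(conf_path)
    written = conf_path.read_text()

print(written)
```

To apply this to the real file, you would pass Path.home() / ".hamilton.conf" instead of the temporary path.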

Contributors

Code Contributors

  • Stefan Krawczyk (@skrawcz)
  • Elijah ben Izzy (@elijahbenizzy)
  • Danielle Quinn (@danfisher-sf)
  • Rachel Insoft (@rinsoft-sf)
  • Shelly Jang (@shellyjang)
  • Vincent Chu (@vslchusf)
  • Christopher Prohm (@chmp)
  • James Lamb (@jameslamb)
  • Avnish Pal (@bovem)
  • Sarah Haskins (@frenchfrywpepper)
  • Thierry Jean (@zilto)

Bug Hunters/Special Mentions

  • Nils Olsson (@nilsso)
  • Michał Siedlaczek (@elshize)
  • Alaa Abedrabbo (@AAbedrabbo)
  • Shreya Datar (@datarshreya)
  • Baldo Faieta (@baldofaieta)
  • Anwar Brini (@AnwarBrini)

For the backstory on how Hamilton came about, see our blog post!


Download Details:

Author: Stitchfix
Source Code: https://github.com/stitchfix/hamilton 
License: BSD-3-Clause-Clear license

#machinelearning #python #datascience #numpy 

Royce Reinger

1676717940

Xarray: N-D Labeled Arrays and Datasets in Python

Xarray: N-D labeled arrays and datasets

xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.

Xarray was inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray's data model, and integrates tightly with dask for parallel computing.

Why xarray?

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called "tensors") are an essential part of computational science. They are encountered in a wide range of fields, including physics, astronomy, geoscience, bioinformatics, engineering, finance, and deep learning. In Python, NumPy provides the fundamental data structure and API for working with raw ND arrays. However, real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc.

Xarray doesn't just keep track of labels on arrays -- it uses them to provide a powerful and concise interface. For example:

  • Apply operations over dimensions by name: x.sum('time').
  • Select values by label instead of integer location: x.loc['2014-01-01'] or x.sel(time='2014-01-01').
  • Mathematical operations (e.g., x - y) vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape.
  • Flexible split-apply-combine operations with groupby: x.groupby('time.dayofyear').mean().
  • Database-like alignment based on coordinate labels that smoothly handles missing values: x, y = xr.align(x, y, join='outer').
  • Keep track of arbitrary metadata in the form of a Python dictionary: x.attrs.
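The features listed above can be tried on a small array. The following is a minimal sketch, assuming xarray, NumPy and pandas are installed; the dimension names (time, station), the coordinate values and the units attribute are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A 4-day x 2-station array with labelled dimensions and coordinates.
x = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("time", "station"),
    coords={
        "time": pd.date_range("2014-01-01", periods=4, freq="D"),
        "station": ["a", "b"],
    },
    attrs={"units": "mm"},  # arbitrary metadata
)
y = xr.DataArray([10.0, 20.0], dims=("station",), coords={"station": ["a", "b"]})
z = xr.DataArray([1.0, 2.0], dims=("station",), coords={"station": ["b", "c"]})

total = x.sum("time")                            # reduce over a dimension by name
first_day = x.sel(time="2014-01-01")             # select by label, not position
diff = x - y                                     # broadcasts on the "station" name
daily_mean = x.groupby("time.dayofyear").mean()  # split-apply-combine
y_aligned, z_aligned = xr.align(y, z, join="outer")  # missing labels become NaN

print(total.values)
print(diff.dims)
print(y_aligned.values, z_aligned.values)
```

Note that y has only a station dimension, yet x - y works without any reshaping because the operation is matched on dimension names rather than on array shape.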

Documentation

Learn more about xarray in its official documentation at https://docs.xarray.dev/.

Try out an interactive Jupyter notebook.

Contributing

You can find information about contributing to xarray at our Contributing page.

Get in touch

  • Ask usage questions ("How do I?") on GitHub Discussions.
  • Report bugs, suggest features or view the source code on GitHub.
  • For less well-defined questions or ideas, or to announce other projects of interest to xarray users, use the mailing list.

NumFOCUS

Xarray is a fiscally sponsored project of NumFOCUS, a nonprofit dedicated to supporting the open source scientific computing community. If you like Xarray and want to support our mission, please consider making a donation to support our efforts.

History

Xarray is an evolution of an internal tool developed at The Climate Corporation. It was originally written by Climate Corp researchers Stephan Hoyer, Alex Kleeman and Eugene Brevdo and was released as open source in May 2014. The project was renamed from "xray" in January 2016. Xarray became a fiscally sponsored project of NumFOCUS in August 2018.

Download Details:

Author: Pydata
Source Code: https://github.com/pydata/xarray 
License: Apache-2.0 license

#machinelearning #python #numpy #pandas 
