
🗂️ LlamaIndex 🦙 (GPT Index)

LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.

🚀 Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation (linked below) for the latest updates!

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning.
  • A big limitation of LLMs is context size (e.g. Davinci's limit is 4096 tokens. Large, but not infinite).
  • The ability to feed "knowledge" to LLMs is restricted to this limited prompt size and model weights.

Proposed Solution

At its core, LlamaIndex contains a toolkit designed to easily connect LLMs with your external data. LlamaIndex helps to provide the following:

  • A set of data structures that allow you to index your data for various LLM tasks, and remove concerns over prompt size limitations.
  • Data connectors to your common data sources (Google Docs, Slack, etc.).
  • Cost transparency + tools that reduce cost while increasing performance.

Each data structure offers distinct use cases and a variety of customizable parameters. These indices can then be queried in a general purpose manner, in order to achieve any task that you would typically achieve with an LLM:

  • Question-Answering
  • Summarization
  • Text Generation (Stories, TODO's, emails, etc.)
  • and more!

💡 Contributing

Interested in contributing? See our Contribution Guide for more details.

📄 Documentation

Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

💻 Example Usage

pip install llama-index

Examples are in the examples folder. Indices are in the indices folder (see list of indices below).

To build a simple vector store index:

import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents)

To save to and load from disk:

# save to disk
index.save_to_disk('index.json')
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')

To query:

index.query("<question_text>?")

🔧 Dependencies

The main third-party package requirements are tiktoken, openai, and langchain.

All requirements should be contained within the setup.py file. To run the package locally without building the wheel, simply run pip install -r requirements.txt.

📖 Citation

Reference to cite if you use LlamaIndex in a paper:

@software{Liu_LlamaIndex_2022,
  author = {Liu, Jerry},
  doi = {10.5281/zenodo.1234},
  month = {11},
  title = {{LlamaIndex}},
  url = {https://github.com/jerryjliu/gpt_index},
  year = {2022}
}

⚠️ NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out this transition gradually.

2/25/2023: By default, our docs/notebooks/instructions now reference "LlamaIndex" instead of "GPT Index".

2/19/2023: By default, our docs/notebooks/instructions now use the llama-index package. However the gpt-index package still exists as a duplicate!

2/16/2023: We have a duplicate llama-index pip package. Simply replace all imports of gpt_index with llama_index if you choose to pip install llama-index.


PyPi:

Documentation: https://gpt-index.readthedocs.io/en/latest/.

Twitter: https://twitter.com/gpt_index.

Discord: https://discord.gg/dGcwcsnxhU.

LlamaHub (community library of data loaders): https://llamahub.ai


Download Details:

Author: jerryjliu
Source Code: https://github.com/jerryjliu/gpt_index 
License: MIT license


How to Create Index in Oracle

You can use the CREATE INDEX statement to create an index in Oracle. Here's the basic syntax:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

In this syntax, index_name is the name of the index you want to create, and table_name is the name of the table on which you want to create the index. You can also specify one or more column names in parentheses to indicate which columns you want to include in the index.

For example, let's say you have a table called "employees" with columns "employee_id", "last_name", and "first_name", and you want to create an index on the "last_name" column. You can do this with the following SQL statement:

CREATE INDEX emp_last_name_idx
ON employees (last_name);

This will create an index called "emp_last_name_idx" on the "last_name" column of the "employees" table. You can then use this index to improve the performance of queries that filter or sort by the "last_name" column.
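For example, a query like the following could now use the index when filtering on the "last_name" column (reusing the hypothetical employees table from above):

SELECT employee_id, first_name, last_name
FROM employees
WHERE last_name = 'Smith';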

Original article source at: https://www.c-sharpcorner.com/


Fix The Issue Of Unavailable Values in Kibana for Index Fields

Hello Readers!! We are back with another interesting topic in this blog. While using Kibana, some of you may have faced the issue of unavailable values for index fields with dot notations. In this blog, we will see why we face this problem and what we can do to resolve it.

Why do we not get data in Fields with Dot Notations?

If you are seeing empty values in Kibana for index field names containing dots, the issue is caused by Kibana's treatment of dots in field names: we get the data in Elasticsearch, but not in Kibana. In Kibana, dots are used as separators in field names, which can result in empty values in visualizations if the dots are not properly escaped. This issue occurs because Elasticsearch and Kibana handle field names with dots differently.


What are scripted fields in Kibana?

Scripted fields in Kibana are calculated fields that are generated using a script. They are used to derive new values based on existing data and to manipulate the data before it is displayed in visualizations and dashboards. Scripted fields can be created using either Painless or Lucene expressions, and they can be used in conjunction with other fields to provide additional insights into your data.

Some common use cases for scripted fields include:

  • Deriving new values based on existing data, such as calculating the difference between two fields (see the sketch after this list).
  • Formatting data, such as converting timestamps into human-readable dates.
  • Aggregating data, such as counting the number of unique values in a field.
  • Transforming data
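As a minimal sketch of that first use case, a scripted field uses a Painless expression like the one below; the numeric fields bytes_sent and bytes_received here are hypothetical stand-ins for your own fields:

// Painless: derive the difference between two (hypothetical) numeric fields
return doc['bytes_sent'].value - doc['bytes_received'].value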

How to fix this using scripted fields in Kibana?

This issue occurs because Elasticsearch and Kibana handle field names with dots differently, and there are a number of ways to solve it. Here we will use a Kibana scripted field. Follow these steps in Kibana:

Step 1: Go to the “Management” section in Kibana and select “Index Patterns.” Find the index pattern that contains the fields with dots and click on it.

This is my index pattern; you can see the index fields containing dot notations.

Step 2: Move to the "Scripted Fields" tab and click "Add Scripted Field."


Give the scripted field a name without dots, for example "field_without_dots," and select the type matching your field.


Step 3: In the script field, enter the following code:

return doc['field_with_dots'].value


Replace “field_with_dots” with the actual name of the field that contains dots.

Now click "Create field" to save the scripted field.


This scripted field can now be used in Discover, visualizations, and dashboards, just like all other index fields.

Yes, we are all done now!! I hope this will help you somewhere.

Conclusion:

Thank you for sticking with me to the end. In this blog, we have learned how to fix the issue of unavailable values in Kibana for index fields with dot notations. I hope this blog helped you. Please share if you liked it, and kindly reach out to me with any related queries.

HAPPY LEARNING! 

Original article source at: https://blog.knoldus.com/


How to Load and Index Data in MarkLogic

If the data to bring into MarkLogic is not already structured as JSON or XML (for example, if it currently lives in a relational database), there are various ways to export or transform it from the source.

For example, many relational databases provide an option to export relational data in XML or JSON format, or a SQL script could be written to fetch the data from the database and output it in an XML or JSON structure. Or, using MarkLogic, rows from a .csv file can be imported as XML and JSON documents.

In any case, it is normal to first denormalize the data being exported from the relational database, to put the content back together in its original state. Denormalization, which naturally occurs when working with documents in their original form, greatly reduces the need for joins and accelerates performance.

Schema Agnostic

As we know, a schema is a set of rules for a particular structure of the database. Schemas are helpful for data quality, since quality matters a lot for the reliability and usability of a database.

Schema-agnostic means the database is not bound by any schema, but it is aware of structure. Schemas are optional in MarkLogic; data is loaded in its original form. To address a group of documents within a database, directories, collections, and the internal structure of documents can be used. MarkLogic easily supports data from disparate systems, all in the same database.

Required Document Size and Structure

When loading documents, it is best to have one document per entity. MarkLogic is most performant with many small documents rather than one large document. The target document size is 1KB to 100KB, but documents can be larger.

For example, rather than loading a bunch of students all as one document, make each student its own document.

When defining a document, remember to use meaningful XML element and attribute names or JSON property names. Make names human-readable and avoid generic names; this convention helps keep indexes efficient.

<items>
  <item>
    <product>Mouse</product>
    <price>1000</price>
    <quantity>3</quantity>
  </item>
  <item>
    <product>Keyboard</product>
    <price>2000</price>
    <quantity>2</quantity>
  </item>
</items>

Indexing Documents

As documents are loaded, all the words in each document, as well as the structure of each document, are indexed, so documents are immediately searchable.

Documents can be loaded into MarkLogic in many ways:

  • MarkLogic Content Pump (MLCP)
  • Data Movement SDK
  • REST APIs
  • Java API or Node.js API
  • XQuery
  • JavaScript functions

Reading a Document

To read a document, the URI of the document is used.

XQuery example: fn:doc("college/course-101.json")

JavaScript example: fn:doc("account/order-202.json")

REST API example: curl --anyauth --user admin:admin -X GET "http://localhost:8055/v1/documents?uri=/accounting/order-10072.json"

Splitting feature of MLCP

MLCP has a splitting feature for long XML documents, where each occurrence of a designated element becomes an individual XML document in the database. This is useful when multiple records are all contained within one large XML file, such as a list of students, courses, details, etc.

The -input_file_type aggregates option is used to split a large document into individual documents. The -aggregate_record_element option designates the element that begins a new document. The -uri_id option is used to create a URI for each document.
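A typical MLCP import command combining these options might look like the following sketch (the host, port, credentials, file path, and element names here are hypothetical):

mlcp.sh import -host localhost -port 8000 \
    -username admin -password admin \
    -input_file_path /data/students.xml \
    -input_file_type aggregates \
    -aggregate_record_element student \
    -uri_id student_id

Each designated student element in students.xml would then become its own document, with a URI derived from the student_id value.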

While it is fine to have a mix of XML and JSON documents in the same database, it is also possible to transform content from one format to the other. You can transform files as in the following example.

xquery version "1.0-ml";

import module namespace json = "http://marklogic.com/xdmp/json" at "/MarkLogic/json/json.xqy";

json:transform-to-json(fn:doc("doc-01.xml"), json:config("custom"))

The MarkLogic Content Pump can be used to import the rows from a .csv file into a MarkLogic database. We are able to transform the data during the import process or afterward in the database. Ways to modify content once it is already in the database include using the Data Movement SDK, XQuery, JavaScript, etc.

Conclusion

As we have seen, MarkLogic is a database that facilitates many things: we can load data, index data, transform data, and split data.


Original article source at: https://blog.knoldus.com/


What Is R Programming Language? Introduction & Basics

In this R article, we will learn what the R programming language is: an introduction and the basics. R is a programming language developed by Ross Ihaka and Robert Gentleman in 1993. R possesses an extensive catalog of statistical and graphical methods. It includes machine learning algorithms, linear regression, time series, and statistical inference, to name a few. Most of the R libraries are written in R, but for heavy computational tasks C, C++, and Fortran code is preferred.

Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling, and communicating the results.

  • Program: R is a clear and accessible programming tool
  • Transform: R is made up of a collection of libraries designed specifically for data science
  • Discover: Investigate the data, refine your hypothesis and analyze them
  • Model: R provides a wide array of tools to capture the right model for your data
  • Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build Shiny apps to share with the world.

What is R used for?

  • Statistical inference
  • Data analysis
  • Machine learning algorithm

In conclusion, R is the world's most widely used statistics programming language. It's the first choice of data scientists, supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission-critical business applications.

R-environment setup

Windows Installation – We can download the Windows installer version of R from R-3.2.2 for Windows (32/64 bit).

As it is a Windows installer (.exe) with the name "R-version-win.exe", you can just double-click and run the installer, accepting the default settings. If your Windows is a 32-bit version, it installs the 32-bit version; if your Windows is 64-bit, it installs both the 32-bit and 64-bit versions.

After installation, you can locate the icon to run the program in the directory structure "R\R-3.2.2\bin\i386\Rgui.exe" under Windows Program Files. Clicking this icon brings up the R GUI, which is the R console for R programming.

R basic Syntax

R is a very popular programming language that is broadly used in data analysis, and the way we write its code is quite simple. "Hello World!" is the basic program for all languages, so we will look at the syntax of R programming with the "Hello World" program. We can write our code either in the command prompt or in an R script file.

R command prompt

Once you have the R environment set up, it's easy to start the R command prompt by just typing the following command at your shell prompt −

$ R

This will launch the R interpreter, and you will get a prompt > where you can start typing your program as follows −

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

Here the first statement defines a string variable myString, to which we assign the string "Hello, World!"; the next statement uses print() to print the value stored in myString.

R data-types

While doing programming in any programming language, you need to use various variables to store various information. Variables are nothing but reserved memory locations to store values. This means that when you create a variable you reserve some space in memory.

In contrast to other programming languages like C and Java, variables in R are not declared with a data type. Variables are assigned R objects, and the data type of the R object becomes the data type of the variable. There are many types of R objects. The frequently used ones are −

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

Vectors

#create a vector and find the elements which are >5
v<-c(1,2,3,4,5,6,5,8)
v[v>5]

#subset
subset(v,v>5)

#position in the vector created in which square of the numbers of v is >10 holds good
which(v*v>10)

#to know the values 
v[v*v>10]

Output:
[1] 6 8          # v[v>5]
[1] 6 8          # subset(v,v>5)
[1] 4 5 6 7 8    # which(v*v>10)
[1] 4 5 6 5 8    # v[v*v>10]

Matrices

A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

#matrices: a vector with two dimensional attributes
mat<-matrix(c(1,2,3,4))
 
mat1<-matrix(c(1,2,3,4),nrow=2)
mat1

Output:
     [,1] [,2]
[1,]    1    3
[2,]    2    4

mat2<-matrix(c(1,2,3,4),ncol=2,byrow=T)
mat2

Output:
     [,1] [,2]
[1,]    1    2
[2,]    3    4

mat3<-matrix(c(1,2,3,4),byrow=T)
mat3

#transpose of matrix
mattrans<-t(mat)
mattrans

#create a character matrix called fruits with elements apple, orange, pear, grapes
fruits<-matrix(c("apple","orange","pear","grapes"),2)
#create 3×4 matrix of marks obtained in each quarterly exams for 4 different subjects 
X<-matrix(c(50,70,40,90,60, 80,50, 90,100, 50,30, 70),nrow=3)
X

#give row names and column names
rownames(X)<-paste(prefix="Test.",1:3)
subs<-c("Maths", "English", "Science", "History")
colnames(X)<-subs
X

Output:
> mat3
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4

> mattrans
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4

> X
     [,1] [,2] [,3] [,4]
[1,]   50   90   50   50
[2,]   70   60   90   30
[3,]   40   80  100   70

> X  # with row and column names
        Maths English Science History
Test. 1    50      90      50      50
Test. 2    70      60      90      30
Test. 3    40      80     100      70

Arrays

While matrices are confined to two dimensions, arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the below example we create an array with two elements, each a 3×4 matrix.

#Arrays
arr<-array(1:24,dim=c(3,4,2))
arr

#create an array using alphabets with dimensions 3 rows, 2 columns and 3 arrays
arr1<-array(letters[1:18],dim=c(3,2,3))

#select only 1st two matrix of an array
arr1[,,c(1:2)]

#LIST
X<-list(u=2, n='abc')
X
X$u

Dataframes

Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain different modes of data: the first column can be numeric while the second column is character and the third column logical. A data frame is a list of vectors of equal length.

#Dataframes
students<-c("J","L","M","K","I","F","R","S")
Subjects<-rep(c("science","maths"),each=2)
marks<-c(55,70,66,85,88,90,56,78)
data<-data.frame(students,Subjects,marks)
#Accessing dataframes
data[[1]]

data$Subjects
data[,1]

Output:
> data[[1]]
[1] J L M K I F R S
Levels: F I J K L M R S

> data$Subjects
[1] science science maths   maths   science science maths   maths
Levels: maths science

Factors

Factors are R objects created from a vector. A factor stores the vector along with the distinct values of its elements as labels. The labels are always characters, irrespective of whether the input vector is numeric, character, or Boolean. Factors are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

#Factors
x<-c(1,2,3)
factor(x)

#apply function
data1<-data.frame(age=c(55,34,42,66,77),bmi=c(26,25,21,30,22))
d<-apply(data1,2,mean)
d

#create two vectors age and gender and find mean age with respect to gender
age<-c(33,34,55,54)
gender<-factor(c("m","f","m","f"))
tapply(age,gender,mean)

Output:
> factor(x)
[1] 1 2 3
Levels: 1 2 3

> d
 age  bmi
54.8 24.8

> tapply(age,gender,mean)
 f  m
44 44

R Variables

A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers, and the dot or underscore characters.

Rules for writing Identifiers in R

  1. Identifiers can be a combination of letters, digits, period (.), and underscore (_).
  2. It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.
  3. Reserved words in R cannot be used as identifiers.

Valid identifiers in R

total, sum, .fine.with.dot, this_is_acceptable, Number5

Invalid identifiers in R

tot@l, 5um, _fine, TRUE, .0ne

Best Practices

Earlier versions of R used underscore (_) as an assignment operator, so the period (.) was used extensively in variable names having multiple words. Current versions of R support underscore as a valid identifier, but it is good practice to use a period as a word separator.
For example, a.variable.name is preferred over a_variable_name or alternatively we could use camel case as aVariableName.

Constants in R

Constants, as the name suggests, are entities whose value cannot be altered. Basic types of constant are numeric constants and character constants.

Numeric Constants

All numbers fall under this category. They can be of type integer, double, or complex, which can be checked with the typeof() function.
Numeric constants followed by L are regarded as integers, and those followed by i are regarded as complex.

> typeof(5)
[1] "double"
> typeof(5L)
[1] "integer"
> typeof(5i)
[1] "complex"

Character Constants

Character constants can be represented using either single quotes (‘) or double quotes (“) as delimiters.

> 'example'
> typeof("5")

[1] "example" [1] "character"

R Operators

Operators – Arithmetic, Relational, Logical, Assignment, and some of the Miscellaneous Operators that R programming language provides. 

There are five main categories of operators in the R programming language.

  1. Arithmetic Operators
  2. Relational Operators
  3. Logical Operators
  4. Assignment Operators
  5. Miscellaneous Operators

x <- 35
y <- 10

> x+y
[1] 45
> x-y
[1] 25
> x*y
[1] 350
> x/y
[1] 3.5
> x%/%y
[1] 3
> x%%y
[1] 5
> x^y
[1] 2.758547e+15

Logical Operators

R provides the element-wise logical operators & and |, which produce a result having the length of the longer operand, and the operators && and ||, which examine only the first element of the operands, resulting in a length-one logical vector.

a <- c(TRUE,TRUE,FALSE,0,6,7)
b <- c(FALSE,TRUE,FALSE,TRUE,TRUE,TRUE)
> a&b
[1] FALSE  TRUE FALSE FALSE  TRUE  TRUE
> a&&b
[1] FALSE
> a|b
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
> a||b
[1] TRUE
> !a
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE
> !b
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE

R functions

Functions are defined using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class "function". Here's a simple function that takes no arguments and simply prints "Hi statistics".

#define the function
f <- function() {
print("Hi statistics!!!")
}
#Call the function
f()

Output: [1] "Hi statistics!!!"

Now let’s define a function called standardize, and the function has a single argument x which is used in the body of a function.

#Define the function that will calculate standardized score.
standardize = function(x) {
  m = mean(x)
  sd = sd(x)
  result = (x - m) / sd
  result
}
input<- c(40:50) #Take input for what we want to calculate a standardized score.
standardize(input) #Call the function

Output:
> standardize(input)
 [1] -1.5075567 -1.2060454 -0.9045340 -0.6030227 -0.3015113  0.0000000
 [7]  0.3015113  0.6030227  0.9045340  1.2060454  1.5075567

Loop Functions

R has some very useful functions which implement looping in a compact form to make life easier. This rich and powerful family of apply functions is made of intrinsically vectorized functions. These functions allow you to apply a function to a series of objects (e.g. vectors, matrices, data frames, or files). They include:

  1. lapply(): Loop over a list and evaluate a function on each element
  2. sapply(): Same as lapply but try to simplify the result
  3. apply(): Apply a function over the margins of an array
  4. tapply(): Apply a function over subsets of a vector
  5. mapply(): Multivariate version of lapply

There is another function called split() which is also useful, particularly in conjunction with lapply; see the sketch below.
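As a quick sketch of a few of these functions (the data here is illustrative):

# lapply returns a list; sapply simplifies the result where possible
nums <- list(a = 1:5, b = 10:20)
lapply(nums, mean)   # list with elements a = 3, b = 15
sapply(nums, mean)   # named vector: a = 3, b = 15

# split() divides data by a factor; combine it with lapply
age <- c(33, 34, 55, 54)
gender <- factor(c("m", "f", "m", "f"))
by_gender <- split(age, gender)  # list with elements f and m
lapply(by_gender, mean)          # mean age per gender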

R Vectors

A vector is a sequence of data elements of the same basic type. Members in a vector are officially called components. Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character, and raw.

The c() function can be used to create vectors of objects by concatenating things together. 
x <- c(1,2,3,4,5) #double
x #If you use only x auto-printing occurs
l <- c(TRUE, FALSE) #logical
l <- c(T, F) ## logical
c <- c("a", "b", "c", "d") ## character
i <- 1:20 ## integer
cm <- c(2+2i, 3+3i) ## complex
print(l)
print(c)
print(i)
print(cm)

You can see the type of each vector using typeof() function in R.
typeof(x)
typeof(l)
typeof(c)
typeof(i)
typeof(cm)

Output:
> print(l)
[1]  TRUE FALSE
> print(c)
[1] "a" "b" "c" "d"
> print(i)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> print(cm)
[1] 2+2i 3+3i

> typeof(x)
[1] "double"
> typeof(l)
[1] "logical"
> typeof(c)
[1] "character"
> typeof(i)
[1] "integer"
> typeof(cm)
[1] "complex"

Creating a vector using seq() function:

We can use the seq() function to create a vector within an interval by specifying step size or specifying the length of the vector. 

seq(1:10) #By default it will be incremented by 1
seq(1, 20, length.out=5) # specify length of the vector
seq(1, 20, by=2) # specify step size

Output:
> seq(1:10)  # by default incremented by 1
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(1, 20, length.out=5)  # specify length of the vector
[1]  1.00  5.75 10.50 15.25 20.00
> seq(1, 20, by=2)  # specify step size
 [1]  1  3  5  7  9 11 13 15 17 19

Extract Elements from a Vector:

Elements of a vector can be accessed using indexing. The vector indexing can be logical, integer, or character. The [ ] brackets are used for indexing. Indexing starts with position 1, unlike most programming languages where indexing starts from 0.

Extract Using Integer as Index:

We can use integers as an index to access specific elements. We can also use negative integers to return all elements except that specific element.

x<- 101:110
x[1]   #access the first element
x[c(2,3,4,5)] #Extract 2nd, 3rd, 4th, and 5th elements
x[5:10]        #Extract all elements from 5th to 10th
x[c(-5,-10)] #Extract all elements except 5th and 10th
x[-c(5:10)] #Extract all elements except from 5th to 10th 

Output:
> x[1]          # extract the first element
[1] 101
> x[c(2,3,4,5)] # extract 2nd, 3rd, 4th, and 5th elements
[1] 102 103 104 105
> x[5:10]       # extract all elements from 5th to 10th
[1] 105 106 107 108 109 110
> x[c(-5,-10)]  # extract all elements except 5th and 10th
[1] 101 102 103 104 106 107 108 109
> x[-c(5:10)]   # extract all elements except from 5th to 10th
[1] 101 102 103 104

Extract Using Logical Vector as Index:

If you use a logical vector for indexing, the position where the logical vector is TRUE will be returned.

x[x < 105]
x[x>=104]

Output:
> x[x < 105]
[1] 101 102 103 104
> x[x >= 104]
[1] 104 105 106 107 108 109 110

Modify a Vector in R:

We can modify a vector and assign a new value to it. You can truncate a vector by using reassignments. Check the below example. 

x<- 10:12
x[1]<- 101 #Modify the first element
x
x[2]<-102 #Modify the 2nd element
x
x<- x[1:2] #Truncate the last element
x 

Output:
> x
[1] 101  11  12
> x[2] <- 102  # modify the 2nd element
> x
[1] 101 102  12
> x <- x[1:2]  # truncate the last element
> x
[1] 101 102

Arithmetic Operations on Vectors:

We can use arithmetic operations on two vectors of the same length. They can be added, subtracted, multiplied, or divided. Check the output of the below code.

# Create two vectors.
v1 <- c(1:10)
v2 <- c(101:110)

# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v2-v1
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v2/v1
print(divi.result)

Output:
> print(add.result)
 [1] 102 104 106 108 110 112 114 116 118 120
> print(sub.result)
 [1] 100 100 100 100 100 100 100 100 100 100
> print(multi.result)
 [1]  101  204  309  416  525  636  749  864  981 1100
> print(divi.result)
 [1] 101.00000  51.00000  34.33333  26.00000  21.00000  17.66667  15.28571
 [8]  13.50000  12.11111  11.00000

Find Minimum and Maximum in a Vector:

The minimum and the maximum of a vector can be found using the min() or the max() function. range() is also available which returns the minimum and maximum in a vector.

x<- 1001:1010
max(x) # Find the maximum
min(x) # Find the minimum
range(x) #Find the range

Output:
> max(x)
[1] 1010
> min(x)
[1] 1001
> range(x)
[1] 1001 1010

R Lists

A list is a data structure having elements of mixed data types. A vector having all elements of the same type is called an atomic vector, but a vector having elements of different types is called a list.
We can check the type with the typeof() or class() function and find the length using the length() function.

x <- list("stat",5.1, TRUE, 1 + 4i)
x
class(x)
typeof(x)
length(x)

Output:
> x
[[1]]
[1] "stat"

[[2]]
[1] 5.1

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

> class(x)
[1] "list"
> typeof(x)
[1] "list"
> length(x)
[1] 4

You can create an empty list of a prespecified length with the vector() function.

x <- vector("list", length = 10)
x

Output:
> x
[[1]]
NULL

[[2]]
NULL

...

[[10]]
NULL

How to extract elements from a list?

Lists can be subset using two syntaxes: the $ operator and square brackets []. The $ operator returns a named element of a list. The [] syntax returns a list, while [[]] returns an element of a list.

# the list l used below, as implied by the outputs that follow
l <- list(1:4, FALSE, "Hello Statistics!",
          d = function(arg = 42) print("Hello World!"),
          e = diag(10))

# subsetting
l$e
l["e"]
l[1:2]
l[c(1:2)]            # index using an integer vector
l[-c(3:length(l))]   # negative index to exclude elements from 3rd up to last
l[c(T,F,F,F,F)]      # logical index to access elements

Output:
> l$e
      [,1] [,2] [,3] ... [,10]
 [1,]    1    0    0 ...     0
 [2,]    0    1    0 ...     0
 ...
[10,]    0    0    0 ...     1
(a 10 x 10 identity matrix)

> l["e"]
$e
(the same 10 x 10 identity matrix)

> l[1:2]
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[c(1:2)]
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[-c(3:length(l))]
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[c(T,F,F,F,F)]
[[1]]
[1] 1 2 3 4

Modifying a List in R:

We can change components of a list through reassignment.

l[["name"]] <- "Kalyan Nandi"
l

Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

[[3]]
[1] "Hello Statistics!"

$d
function (arg = 42) {
  print("Hello World!")
}

$name
[1] "Kalyan Nandi"

R Matrices

In R programming, a matrix is a two-dimensional data structure. Matrices contain elements of the same atomic type and can be created using the matrix() function. R can also be used for matrix calculations. Matrices have rows and columns containing a single data type, and the order of rows and columns is important. Dimensions can be checked directly with the dim() function, and all attributes of an object can be checked with the attributes() function. Check the below example.

Creating a matrix in R

m <- matrix(nrow = 2, ncol = 3)
dim(m)
attributes(m)
m <- matrix(1:20, nrow = 4, ncol = 5)
m

Output:
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3

> m <- matrix(1:20, nrow = 4, ncol = 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

x<-1:3
y<-10:12
z<-30:32
cbind(x,y,z)
rbind(x,y,z)

Output:
> cbind(x,y,z)
     x  y  z
[1,] 1 10 30
[2,] 2 11 31
[3,] 3 12 32
> rbind(x,y,z)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
z   30   31   32

By default, the matrix function reorders a vector into columns, but we can also tell R to use rows instead.

x <-1:9
matrix(x, nrow = 3, ncol = 3)
matrix(x, nrow = 3, ncol = 3, byrow = TRUE)

Output:
> matrix(x, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix(x, nrow = 3, ncol = 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

R Arrays

In R, arrays are data types that can store data in more than two dimensions. An array can be created using the array() function, which takes vectors as input and uses the values in the dim parameter to create the array. If you create an array of dimensions (2, 3, 4), it creates 4 rectangular matrices, each with 2 rows and 3 columns. Arrays can store only one data type.

Give a Name to Columns and Rows:

We can give names to the rows, columns, and matrices in the array by setting the dimnames parameter.

v1 <- c(1,2,3)
v2 <- 100:110
col.names <- c("Col1","Col2","Col3","Col4","Col5","Col6","Col7")
row.names <- c("Row1","Row2")
matrix.names <- c("Matrix1","Matrix2")
arr4 <- array(c(v1,v2), dim=c(2,7,2), dimnames = list(row.names,col.names, matrix.names))
arr4

Output:
, , Matrix1

     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

, , Matrix2

     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

Accessing/Extracting Array Elements:

# Print the 2nd row of the 1st matrix of the array.
print(arr4[2,,1])
# Print the element in the 2nd row and 4th column of the 2nd matrix.
print(arr4[2,4,2])
# Print the 2nd Matrix.
print(arr4[,,2])

Output:
> print(arr4[2,,1])
Col1 Col2 Col3 Col4 Col5 Col6 Col7
   2  100  102  104  106  108  110

> print(arr4[2,4,2])
[1] 104

> print(arr4[,,2])
     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

R Factors

Factors are used to represent categorical data and can be unordered or ordered. An example might be “Male” and “Female” if we consider gender. Factor objects can be created with the factor() function.

x <- factor(c("male", "female", "male", "male", "female"))
x
table(x)

Output:
> x
[1] male   female male   male   female
Levels: female male

> table(x)
x
female   male
     2      3

By default, levels are put in alphabetical order. If you print the above code you will get the levels as female and male. But if you want your levels in a particular order, set the levels parameter like this.

x <- factor(c("male", "female", "male", "male", "female"), levels=c("male", "female"))
x
table(x)

Output:
> x
[1] male   female male   male   female
Levels: male female

> table(x)
x
  male female
     3      2

R Dataframes

Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications. Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows. Unlike matrices, data frames can store different classes of objects in each column. Matrices must have every element be the same class (e.g. all integers or all numeric).

Creating a Data Frame:

Data frames can be created explicitly with the data.frame() function.

employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)

Output:
> employ_data
  employee salary  startdate
1      Ram  21000 2016-11-01
2     Sham  23400 2015-03-25
3     Jadu  26800 2017-03-14

Get the Structure of the Data Frame:

If you look at the structure of the data frame now, you will see the class of each variable, as shown in the following output:

str(employ_data)

Output:
> str(employ_data)
'data.frame':   3 obs. of  3 variables:
 $ employee : Factor w/ 3 levels "Jadu","Ram","Sham": 2 3 1
 $ salary   : num  21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"

Note that the first column, employee, is of type factor instead of character vector. By default, the data.frame() function converts character vectors into factors. To suppress this behavior, we can pass the argument stringsAsFactors = FALSE.

employ_data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
str(employ_data)

Output:
'data.frame':   3 obs. of  3 variables:
 $ employee : chr  "Ram" "Sham" "Jadu"
 $ salary   : num  21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"

R Packages

The primary location for obtaining R packages is CRAN.

You can obtain information about the available packages on CRAN with the available.packages() function:

a <- available.packages()
head(rownames(a), 30)  # Show the names of the first 30 packages

Packages can be installed with the install.packages() function in R. To install a single package, pass the name of the package to install.packages() as the first argument. The following code installs the ggplot2 package from CRAN:

install.packages("ggplot2")

You can install multiple R packages at once with a single call to install.packages(); place the names of the R packages in a character vector:

install.packages(c("caret", "ggplot2", "dplyr"))

Loading packages

Installing a package does not make it immediately available to you in R; you must load the package. The library() function is used to load packages into R. The following code loads the ggplot2 package (do not put the package name in quotes):

library(ggplot2)

If you have installed your packages without root access using install.packages("ggplot2", lib="/data/Rpackages/"), then load them with:

library(ggplot2, lib.loc="/data/Rpackages/")

After loading a package, the functions exported by that package will be attached to the top of the search() list (after the workspace):

library(ggplot2)

search()

R – CSV Files

In R, we can read data from files stored outside the R environment. We can also write data into files that will be stored and accessed by the operating system. R can read and write into various file formats like CSV, Excel, XML, etc.

Getting and Setting the Working Directory

We can check which directory the R workspace is pointing to using the getwd() function. You can also set a new working directory using the setwd() function.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())

Output: [1] "/web/com/1441086124_2016" [1] "/web/com"

Input as CSV File

The CSV file is a text file in which the values in the columns are separated by a comma. Let’s consider the following data present in the file named input.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using the Save As option with "All Files (*.*)" in Notepad.
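For illustration, input.csv might contain something like the following (these rows are made up; only the column layout matters):

id,name,salary,start_date,dept
1,Rick,623.30,2012-01-01,IT
2,Dan,515.20,2013-09-23,Operations
3,Michelle,611.00,2014-11-15,IT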

Reading a CSV File

Following is a simple example of read.csv() function to read a CSV file available in your current working directory −

data <- read.csv("input.csv")
print(data)
The printed data frame will show the columns id, name, salary, start_date, and dept.

R- Charts and Graphs

R- Pie Charts

Pie charts are created with the function pie(x, labels=), where x is a non-negative numeric vector indicating the area of each slice and labels= takes a character vector of names for the slices.

Syntax

The basic syntax for creating a pie-chart using the R is −

pie(x, labels, radius, main, col, clockwise)

Following is the description of the parameters used −

  • x is a vector containing the numeric values used in the pie chart.
  • labels are used to give a description of the slices.
  • radius indicates the radius of the circle of the pie chart (a value between −1 and +1).
  • main indicates the title of the chart.
  • col indicates the color palette.
  • clockwise is a logical value indicating if the slices are drawn clockwise or anti-clockwise.

Simple Pie chart

# Simple Pie Chart
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")

 

3-D pie chart

The pie3D( ) function in the plotrix package provides 3D exploded pie charts.

# 3D Exploded Pie Chart
library(plotrix)
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie3D(slices,labels=lbls,explode=0.1,
   main="Pie Chart of Countries ")

R -Bar Charts

A bar chart represents data as rectangular bars whose lengths are proportional to the values of the variable. R uses the function barplot() to create bar charts, and can draw both vertical and horizontal bars. Each bar in the chart can be given a different color.

Let us suppose we have a vector of maximum temperatures (in degrees Celsius) for seven days, as follows.

max.temp <- c(22, 27, 26, 24, 23, 26, 28)
barplot(max.temp)

Some of the frequently used arguments are "main" to give the title, "xlab" and "ylab" to provide labels for the axes, "names.arg" for naming each bar, "col" to define color, etc.

We can also plot bars horizontally by providing the argument horiz=TRUE.

# barchart with added parameters
barplot(max.temp,
main = "Maximum Temperatures in a Week",
xlab = "Degree Celsius",
ylab = "Day",
names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
col = "darkred",
horiz = TRUE)

Suppose we have a vector age with the ages of 10 students. Simply doing barplot(age) will not give us the required plot: it will plot 10 bars with heights equal to the students' ages, but we want to know the number of students in each age category.

This count can be quickly found using the table() function, as shown below.

> age <- c(16, 17, 17, 18, 18, 18, 18, 18, 18, 19)  # example ages consistent with the counts below
> table(age)
age
16 17 18 19 
 1  2  6  1 

Now plotting this data will give our required bar plot. Note below, that we define the argument “density” to shade the bars.

barplot(table(age),
main="Age Count of 10 Students",
xlab="Age",
ylab="Count",
border="red",
col="blue",
density=10
)

 

A histogram represents the frequencies of values of a variable bucketed into ranges. A histogram is similar to a bar chart, but it groups the values into continuous ranges. Each bar in a histogram represents the number of values present in that range.

R creates histograms using the hist() function. This function takes a vector as input and uses additional parameters to control the plot.

Syntax

The basic syntax for creating a histogram using R is −

hist(v,main,xlab,xlim,ylim,breaks,col,border)

Following is the description of the parameters used −

  • v is a vector containing numeric values used in the histogram.
  • main indicates the title of the chart.
  • col is used to set the color of the bars.
  • border is used to set the border color of each bar.
  • xlab is used to give a description of the x-axis.
  • xlim is used to specify the range of values on the x-axis.
  • ylim is used to specify the range of values on the y-axis.
  • breaks is used to specify the number of bars, which determines the width of each bar.

Example

A simple histogram is created using input vector, label, col, and border parameters.

The script given below will create and save the histogram in the current R working directory.

# Create data for the graph.
v <-  c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")

# Save the file.
dev.off()

 

Range of X and Y values

To specify the range of values allowed in X axis and Y axis, we can use the xlim and ylim parameters.

The width of each bar can be decided by using breaks.

# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram_lim_breaks.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "green",border = "red", xlim = c(0,40), ylim = c(0,5),
   breaks = 5)

# Save the file.
dev.off()

R vs SAS – Which Tool is Better?

The debate around data analytics tools has been going on forever. Each time a new one comes out, comparisons transpire. Although many aspects of the tool remain subjective, beginners want to know which tool is better to start with.
The most popular and widely used tools for data analytics are R and SAS. Both of them have been around for a long time and are often pitted against each other. So, let’s compare them based on the most relevant factors.

  1. Availability and Cost: SAS is widely used in most private organizations as it is a commercial software. It is more expensive than any other data analytics tool available. It might thus be a bit difficult buying the software if you are an individual professional or a student starting out. On the other hand, R is an open source software and is completely free to use. Anyone can begin using it right away without having to spend a penny. So, regarding availability and cost, R is hands down the better tool.
  2. Ease of learning: Since SAS is a commercial software, it has a whole lot of online resources available. Also, those who already know SQL might find it easier to adapt to SAS as it comes with PROC SQL option. The tool has a user-friendly GUI. It comes with an extensive documentation and tutorial base which can help early learners get started seamlessly. Whereas, the learning curve for R is quite steep. You need to learn to code at the root level and carrying out simple tasks demand a lot of time and effort with R. However, several forums and online communities post religiously about its usage.
  3. Data Handling Capabilities: When it comes to data handling, both SAS and R perform well, but there are some caveats for the latter. While SAS can even churn through terabytes of data with ease, R might be constrained as it makes use of the available RAM in the machine. This can be a hassle for 32-bit systems with low RAM capacity. Due to this, R can at times become unresponsive or give an ‘out of memory’ error. Both of them can run parallel computations, support integrations for Hadoop, Spark, Cloudera and Apache Pig among others. Also, the availability of devices with better RAM capacity might negate the disadvantages of R.
  4. Graphical Capabilities: Graphical capabilities, or data visualization, is the strongest forte of R. This is where SAS lags behind in a major way. R has access to packages like GGPlot, RGIS, Lattice, and GGVIS, among others, which provide superior graphical competency. In comparison, Base SAS is struggling hard to catch up with the advancements in graphics and visualization in data analytics. Even the graphics packages available in SAS are poorly documented, which makes them difficult to use.
  5. Advancements in Tool: Advancements in the industry give way to advancements in tools, and both SAS and R hold up pretty well in this regard. SAS, being a corporate software, rolls out new features and technologies frequently with new versions of its software. However, the updates are not as fast as R since it is open source software and has many contributors throughout the world. Alternatively, the latest updates in SAS are pushed out after thorough testing, making them much more stable, and reliable than R. Both the tools come with a fair share of pros & cons.
  6. Job Scenario: Currently, large corporations insist on using SAS, but SMEs and start-ups are increasingly opting for R, given that it’s free. The current job trend seems to show that while SAS is losing its momentum, R is gaining potential. The job scenario is on the cusp of change, and both the tools seem strong, but since R is on an uphill path, it can probably witness more jobs in the future, albeit not in huge corporates.
  7. Deep Learning Support: While SAS has just begun work on adding deep learning support, R has added support for a few packages which enable deep learning capabilities in the tool. You can use the kerasR and keras packages in R, which are interfaces to the original Keras package built on Python. Although neither tool is an excellent facilitator of deep learning, R has seen some recent active development on this front.
  8. Customer Service Support and Community: As one would expect from full-fledged commercial software, SAS offers excellent customer service support as well as the backing of a helpful community. Since R is free open-source software, expecting customer support will be hard to justify. However, it has a vast online community that can help you with almost everything. On the other hand, no matter what problem you face with SAS, you can immediately reach out to their customer support and get it solved without any hassles.

Final Verdict
As per estimations by the Economic Times, the analytics industry in India will grow to $16 billion by 2025. If you wish to venture into this domain, there can't be a better time. Just start learning the tool you think is better, based on the comparison points above.


Original article source at: https://www.mygreatlearning.com


How You Can Fix the Undefined Index Notice in PHP

The undefined index notice in PHP appears when you try to access an array variable with a key that doesn't exist.

For example, suppose you have an associative array named $user with the following values:

<?php
$user = [
    "name" => "Nathan",
    "age" => 28,
    "hobby" => "programming",
];

Suppose you try to access the $user variable with the key user_id.

Because the $user variable doesn't have a user_id key, PHP will respond with the undefined index notice:

<?php
print $user["user_id"];

The code above will produce the following output:

Notice: Undefined index: user_id in ... on line ...

The notice above means PHP doesn't know what you mean by the user_id index in the code.

To solve this issue, you need to make sure that the array key exists by calling the isset() function:

<?php
$user = [
    "name" => "Nathan",
    "age" => 28,
    "hobby" => "programming",
];

if (isset($user["user_id"])) {
    print $user["user_id"];
} else {
    print "user_id does not exists";
}

A fellow once asked me, “isn’t it enough to put the variable inside an if statement without isset()?”

Without the isset() function, PHP will still emit the “undefined index” notice.

You need both the isset() function and the if statement to remove the notice.

This issue frequently appears when you are accessing data from the $_POST or $_GET variable.

The solution also works for these global variables:

// 👇 check if the name variable exists in $_POST
if (isset($_POST["name"])) {
    print $_POST["name"];
}

// 👇 check if the query variable exists in $_GET
if (isset($_GET["query"])) {
    print $_GET["query"];
}

If you are assigning the value to a variable, you can use the ternary operator to assign a default value to that variable.

Consider the example below:

// assign name from $_POST into $name
// otherwise, put User0 to $name
$name = isset($_POST["name"]) ? $_POST["name"] : "User0";

The ternary operator allows you to write a shorter code for the if..else check.
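On PHP 7 and newer, the null coalescing operator ?? expresses the same isset() check even more concisely:

<?php
// same effect as the ternary isset() check above
$name = $_POST["name"] ?? "User0";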

Now you've learned how to fix the undefined index notice in PHP. Good work! 👍

Original article source at: https://sebhastian.com/ 


TwoBasedIndexing.jl: Two-based Indexing

TwoBasedIndexing

This package implements two-based indexing in Julia. Two-based indexing affects only your code. Functions from other packages/modules will still function properly, but when you index into the arrays they return, the indices will start at 2 instead of 1. This makes it easy to gradually transition your codebase from obsolete one-based indexing to proper two-based indexing.

Usage

julia> using TwoBasedIndexing

julia> twobased() # enable two-based indexing in current module

julia> x = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> for i = 2:4 println(x[i]) end
1
2
3

julia> x[2] = 2
2

julia> x
3-element Array{Int64,1}:
 2
 2
 3

TODO

  • Don't rewrite non-numeric indexes or numeric indexes into Associatives
  • Rewrite BoundsErrors

Download Details:

Author: Simonster
Source Code: https://github.com/simonster/TwoBasedIndexing.jl 
License: View license


Package to Calculate Leaf Area Index From Hemispherical Images

LeafAreaIndex

Tools to work with hemispherical pictures for the determination of Leaf Area Index (LAI).

Quick introduction

Install the package through

Pkg.clone("https://github.com/ETC-UA/LeafAreaIndex.jl")

The basic type used by this package is a PolarImage. You construct a PolarImage from a CameraLens type and an Image (or in general, an AbstractMatrix). Note that for LAI calculations typically only the blue channel of the image is used.

You can load the image, e.g. with the Images package:

using Images
img = imread("image.jpg")
imgblue = blue(img) #take the blue channel

or in case you have the raw image from the camera, we provide a more accurate, dedicated function to extract the pixels from the blue channel (using dcraw under the hood):

using LeafAreaIndex
imgblue = rawblueread("image.NEF")

Because the mapping of pixels on the image to coordinates in the scene depends on your camera setup, you must construct a configuration object with this information. A CameraLens type is constructed given an image size, the coordinates of the lens center, and the (inverse) projection function. The projection function maps the polar distance ρ [in pixels] on the image to the zenith angle θ [in radians] of the scene and is usually not linear. This projection function depends on the specific (fish-eye) lens used and is usually approximated by a polynomial up to 2nd order as f(ρ/ρmax) = a₁θ + a₂θ², with ρmax the maximum visible radius. More generally, you can submit a vector A with the polynomial coefficients. The maximum radius ρmax and the lens center depend on the combination of camera and lens (and the image size obviously depends on the camera).

using LeafAreaIndex
mycameralens = CameraLens( (height, width), (centeri, centerj), ρmax, A)

The basic PolarImage type is then constructed:

polarimg = PolarImage(imgblue, mycameralens)

The first processing step is automatic thresholding (default method Ridler Calvard):

thresh = threshold(polarimg)

In the second step the (effective) LAI is estimated through the inversion model. The default method assumes an ellipsoidal leave angle distribution and uses a non-linear optimization method.

LAIe = inverse(polarimg, thresh)

Finally, the clumping factor can be estimated with the method of Lang Xiang (default with 45ᵒ segments in full view angle):

clump = langxiang(polarimg, thresh)

With clumping correction we obtain LAI = LAIe / clump.

Further methods

For images taken (always vertically upwards) on a domain with a slope of e.g. 10ᵒ, sloping downward to the East, you must include this information in your PolarImage with the SlopeParams(inclination, direction) function:

myslope = SlopeParams(10/180*pi, pi/2)
polarimg = PolarImage(imgblue, mycameralens, myslope)

For downward-taken (crop) images, create a mask to cut out the photographer's shoes and use the RedMax() method instead of thresholding to separate soil from (green) plant material:

mymask = MaskParams(pi/3, -2*pi/3, -pi/3)
polarimg = PolarImage(imgblue, mycameralens, mymask)
LAIe = inverse(polarimg, RedMax())

Besides the default Ridler–Calvard method, two more automatic thresholding methods, Edge Detection and Minimum, can be used:

thresh  = threshold(polarimg, RidlerCalvard())
thresh2 = threshold(polarimg, EdgeDetection())
thresh3 = threshold(polarimg, MinimumThreshold())

Further LAI estimation methods for the inversion model are available:

  • EllipsLUT also assumes an ellipsoidal leaf angle distribution, but uses a lookup-table approach instead of an optimization approach.
  • Zenith57 uses a ring around the view angle of 57° (1 rad), where the influence of the ALIA (average leaf inclination angle) is minimal.
  • Miller integrates several zenith rings assuming a constant leaf angle.
  • Lang uses a first-order regression on the Miller method.

LAI  = inverse(polarimg, thresh, EllipsOpt())
LAI2 = inverse(polarimg, thresh, EllipsLUT())
LAI3 = inverse(polarimg, thresh, Zenith57())
LAI4 = inverse(polarimg, thresh, Miller())
LAI5 = inverse(polarimg, thresh, Lang())

For the clumping factor, besides the method from Lang & Xiang, the (experimental) method from Chen & Cihlar is also available:

clump2 = chencihlar(polarimg, thresh, 0, pi/2)

Lower level methods

Under the hood, several lower-level methods are used to access pixels and calculate gap fractions. We suggest looking at the code for their definition and usage.

To access the pixels in a particular zenith range, pixels(polarimg, pi/6, pi/3) will quickly return a vector of pixels, sorted by increasing ρ (and then by polar angle ϕ for identical ρ). The shortcut pixels(polarimg) is translated to pixels(polarimg, 0, pi/2).

The segments function can further split these ring pixels into n segments (e.g. for clumping calculations). It returns a vector with n elements, each again a vector with the segment pixels.
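
A short sketch of these accessors (the exact segments signature is an assumption here; check the source for the definitive form):

ringpix = pixels(polarimg, pi/6, pi/3)       # pixels with zenith angle between 30° and 60°
allpix  = pixels(polarimg)                   # shorthand for pixels(polarimg, 0, pi/2)
segs    = segments(polarimg, pi/6, pi/3, 8)  # assumed: the same ring split into 8 segments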

For the gap fraction, we suggest (see the online documentation) using the contact frequencies $K(\theta_V) = -\ln[T(\theta_V)] \cos\theta_V$ for LAI inversion calculations, with $T$ the gap fraction and $\theta_V$ the view angle. The input N determines the number of rings between view angles θ1 and θ2 for a polar image with a certain threshold. The function returns a vector with the angle edges of the rings, the weighted average midpoint angle for each ring, and the contact frequency for each ring.

θedges, θmid, K = contactfreqs(polarimg, θ1, θ2, N, thresh)
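
As a quick numerical check of the contact-frequency formula (the values are illustrative):

# K(θ_V) = -ln(T(θ_V)) · cos(θ_V)
T = 0.3               # gap fraction at the chosen view angle
θ = pi/6              # 30° view angle
K = -log(T) * cos(θ)  # ≈ 1.04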

In case of problems or suggestions, don't hesitate to submit an issue through the issue tracker, or code contributions through a pull request.

Documentation

View the full documentation at https://etc-ua.github.io/LeafAreaIndex.jl.

Download Details:

Author: ETC-UA
Source Code: https://github.com/ETC-UA/LeafAreaIndex.jl 
License: View license

#julia #index #images 

Package to Calculate Leaf Area Index From Hemispherical Images
Anissa  Barrows

Anissa Barrows

1659685980

ShouldaMatchers: Simple one-liner Tests for Common Rails Functionality

Shoulda Matchers

Shoulda Matchers provides RSpec- and Minitest-compatible one-liners to test common Rails functionality that, if written by hand, would be much longer, more complex, and error-prone.

Quick links

📖 Read the documentation for the latest version.
📢 See what's changed in recent versions.

Getting started

RSpec

Start by including shoulda-matchers in your Gemfile:

group :test do
  gem 'shoulda-matchers', '~> 5.0'
end

Then run bundle install.

Now you need to configure the gem by telling it:

  • which matchers you want to use in your tests
  • that you're using RSpec so that it can make those matchers available in your example groups

Rails apps

If you're working on a Rails app, simply place this at the bottom of spec/rails_helper.rb (or in a support file if you so choose):

Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :rspec
    with.library :rails
  end
end

Non-Rails apps

If you're not working on a Rails app, but you still make use of ActiveRecord or ActiveModel in your project, you can still use this gem too! In that case, you'll want to place the following configuration at the bottom of spec/spec_helper.rb:

Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :rspec

    # Keep as many of these lines as are necessary:
    with.library :active_record
    with.library :active_model
  end
end

Minitest

If you're using our umbrella gem Shoulda, then make sure that you're using the latest version:

group :test do
  gem 'shoulda', '~> 4.0'
end

Otherwise, add shoulda-matchers to your Gemfile:

group :test do
  gem 'shoulda-matchers', '~> 4.0'
end

Then run bundle install.

Now you need to configure the gem by telling it:

  • which matchers you want to use in your tests
  • that you're using Minitest so that it can make those matchers available in your test case classes

Rails apps

If you're working on a Rails app, simply place this at the bottom of test/test_helper.rb:

Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :minitest
    with.library :rails
  end
end

Non-Rails apps

If you're not working on a Rails app, but you still make use of ActiveRecord or ActiveModel in your project, you can still use this gem too! In that case, you'll want to place the following configuration at the bottom of test/test_helper.rb:

Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :minitest

    # Keep as many of these lines as are necessary:
    with.library :active_record
    with.library :active_model
  end
end

Usage

Most of the matchers provided by this gem are useful in a Rails context, and as such can be used for different parts of a Rails app.

As the name of the gem indicates, most matchers are designed to be used in "one-liner" form using the should macro, a special directive available in both RSpec and Shoulda. For instance, a model test case may look something like:

# RSpec
RSpec.describe MenuItem, type: :model do
  describe 'associations' do
    it { should belong_to(:category).class_name('MenuCategory') }
  end

  describe 'validations' do
    it { should validate_presence_of(:name) }
    it { should validate_uniqueness_of(:name).scoped_to(:category_id) }
  end
end

# Minitest (Shoulda)
class MenuItemTest < ActiveSupport::TestCase
  context 'associations' do
    should belong_to(:category).class_name('MenuCategory')
  end

  context 'validations' do
    should validate_presence_of(:name)
    should validate_uniqueness_of(:name).scoped_to(:category_id)
  end
end

See below for the full set of matchers that you can use.

On the subject of subject

For both RSpec and Shoulda, the subject is an implicit reference to the object under test, and through the use of should as demonstrated above, all of the matchers make use of subject internally when they are run. A subject is always set automatically by your test framework in any given test case; however, in certain cases it can be advantageous to override it. For instance, when testing validations in a model, it is customary to provide a valid model instead of a fresh one:

# RSpec
RSpec.describe Post, type: :model do
  describe 'validations' do
    # Here we're using FactoryBot, but you could use anything
    subject { build(:post) }

    it { should validate_presence_of(:title) }
  end
end

# Minitest (Shoulda)
class PostTest < ActiveSupport::TestCase
  context 'validations' do
    subject { build(:post) }

    should validate_presence_of(:title)
  end
end

When overriding the subject in this manner, it's important to provide the correct object. When in doubt, provide an instance of the class under test. This is particularly necessary for controller tests, where it is easy to accidentally write something like:

RSpec.describe PostsController, type: :controller do
  describe 'GET #index' do
    subject { get :index }

    # This may work...
    it { should have_http_status(:success) }
    # ...but this will not!
    it { should permit(:title, :body).for(:post) }
  end
end

In this case, you would want to use before rather than subject:

RSpec.describe PostsController, type: :controller do
  describe 'GET #index' do
    before { get :index }

    # Notice that we have to assert have_http_status on the response here...
    it { expect(response).to have_http_status(:success) }
    # ...but we do not have to provide a subject for render_template
    it { should render_template('index') }
  end
end

Availability of RSpec matchers in example groups

Rails projects

If you're using RSpec, then you're probably familiar with the concept of example groups. Example groups can be assigned tags in order to assign different behavior to different kinds of example groups. This comes into play especially when using rspec-rails, where, for instance, controller example groups, tagged with type: :controller, are written differently than request example groups, tagged with type: :request. This difference in writing style arises because rspec-rails mixes different behavior and methods into controller example groups vs. request example groups.

Relying on this behavior, Shoulda Matchers automatically makes certain matchers available in certain kinds of example groups:

  • ActiveRecord and ActiveModel matchers are available only in model example groups, i.e., those tagged with type: :model or in files located under spec/models.
  • ActionController matchers are available only in controller example groups, i.e., those tagged with type: :controller or in files located under spec/controllers.
  • The route matcher is available in routing example groups, i.e., those tagged with type: :routing or in files located under spec/routing.
  • Independent matchers are available in all example groups.

As long as you're using Rails, you don't need to worry about these details — everything should "just work".

Non-Rails projects

What if you are using ActiveModel or ActiveRecord outside of Rails, however, and you want to use model matchers in a certain example group? Then you'll need to manually include the module that holds those matchers into that example group. For instance, you might have to say:

RSpec.describe MySpecialModel do
  include Shoulda::Matchers::ActiveModel
  include Shoulda::Matchers::ActiveRecord
end

If you have a lot of similar example groups in which you need to do this, then you might find it more helpful to tag your example groups appropriately, then instruct RSpec to mix these modules into any example groups that have that tag. For instance, you could add this to your rails_helper.rb:

RSpec.configure do |config|
  config.include(Shoulda::Matchers::ActiveModel, type: :model)
  config.include(Shoulda::Matchers::ActiveRecord, type: :model)
end

And from then on, you could say:

RSpec.describe MySpecialModel, type: :model do
  # ...
end

should vs is_expected.to

In this README and throughout the documentation, you'll notice that we use the should form of RSpec's one-liner syntax over is_expected.to. Besides being the namesake of the gem itself, this is our preferred syntax as it's short and sweet. But if you prefer to use is_expected.to, you can do that too:

RSpec.describe Person, type: :model do
  it { is_expected.to validate_presence_of(:name) }
end

Matchers

Here is the full list of matchers that ship with this gem. If you need details about any of them, make sure to consult the documentation!

ActiveModel matchers

ActiveRecord matchers

ActionController matchers

  • filter_param tests parameter filtering configuration.
  • permit tests that an action places a restriction on the params hash.
  • redirect_to tests that an action redirects to a certain location.
  • render_template tests that an action renders a template.
  • render_with_layout tests that an action is rendered with a certain layout.
  • rescue_from tests usage of the rescue_from macro.
  • respond_with tests that an action responds with a certain status code.
  • route tests your routes.
  • set_session makes assertions on the session hash.
  • set_flash makes assertions on the flash hash.
  • use_after_action tests that an after_action callback is defined in your controller.
  • use_around_action tests that an around_action callback is defined in your controller.
  • use_before_action tests that a before_action callback is defined in your controller.
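
As a hedged illustration (the controller and callback names are hypothetical), a controller spec combining a few of these matchers might read:

# RSpec
RSpec.describe UsersController, type: :controller do
  it { should use_before_action(:authenticate_user) }
  it { should rescue_from(ActiveRecord::RecordNotFound).with(:render_not_found) }
end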

Routing matchers

Independent matchers

  • delegate_method tests that an object forwards messages to other, internal objects by way of delegation.
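
For example (Courier and its post_office collaborator are hypothetical):

RSpec.describe Courier, type: :model do
  it { should delegate_method(:deliver).to(:post_office) }
end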

Extensions

Over time our community has created extensions to Shoulda Matchers. If you've created something that you want to share, please let us know!

Contributing

Have a fix for a problem you've been running into or an idea for a new feature you think would be useful? Take a look at the Contributing document for instructions on setting up the repo on your machine, understanding the codebase, and creating a good pull request.

Compatibility

Shoulda Matchers is tested and supported against Ruby 2.6+, Rails 5.2+, RSpec 3.x, and Minitest 5.x.

  • For Ruby < 2.4 and Rails < 4.1 compatibility, please use v3.1.3.
  • For Ruby < 3.0 and Rails < 6.1 compatibility, please use v4.5.1.

Versioning

Shoulda Matchers follows Semantic Versioning 2.0 as defined at https://semver.org.

Team

Shoulda Matchers is maintained by Elliot Winkler and Gui Albuk.

Copyright/License

Shoulda Matchers is copyright © 2006-2021 Tammer Saleh and thoughtbot, inc. It is free and open source software and may be redistributed under the terms specified in the LICENSE file.

About thoughtbot


The names and logos for thoughtbot are trademarks of thoughtbot, inc.

We are passionate about open source software. See our other projects. We are available for hire.


Author: thoughtbot
Source code: https://github.com/thoughtbot/shoulda-matchers
License: MIT license

#ruby   #ruby-on-rails 
 

ShouldaMatchers: Simple one-liner Tests for Common Rails Functionality
Dexter  Goodwin

Dexter Goodwin

1657321920

Search-index: Persistent, Network Resilient, Full Text Search Library

search-index

A network resilient, persistent full-text search library for the browser and Node.js    

Quick start

const si = require('search-index')

// initialize an index
const { PUT, QUERY } = await si()

// add documents to the index
await PUT( /* objects */ )

// read documents from the index
const results = await QUERY( /* query */ )
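
Putting the quick start together, a minimal runnable sketch might look like this (the index name and the query shape are illustrative; see the documentation for the full query language):

const si = require('search-index')

const run = async () => {
  // initialize a named index
  const { PUT, QUERY } = await si({ name: 'my-index' })

  // index a couple of documents
  await PUT([
    { _id: '1', body: 'the quick brown fox' },
    { _id: '2', body: 'lazy dogs sleep all day' }
  ])

  // simple single-token search
  const results = await QUERY({ SEARCH: ['fox'] })
  console.log(JSON.stringify(results, null, 2))
}

run()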

Documentation

Download Details:

Author: Fergiemcdowall
Source Code: https://github.com/fergiemcdowall/search-index 
License: MIT license

#javascript #node #browser #index #search 

Search-index: Persistent, Network Resilient, Full Text Search Library
Dexter  Goodwin

Dexter Goodwin

1657265460

Changes-index: Create indexes From A Leveldb Changes Feed

changes-index

create indexes from a leveldb changes feed

This package provides a way to create a materialized view on top of an append-only log.

To update an index, just change the index code and delete the indexed data.

example

Create a change feed and set up an index. Here we'll create an index for keys that start with the prefix 'user!' on the name and hackerspace properties.

var level = require('level');
var sublevel = require('subleveldown');
var changes = require('changes-feed');
var changesdown = require('changesdown');
var chi = require('changes-index');
var through = require('through2'); // stream helper used in userExists below

var argv = require('minimist')(process.argv.slice(2));

var up = level('/tmp/test.db', { valueEncoding: 'json' });
var feed = changes(sublevel(up, 'feed'));
var db = changesdown(sublevel(up, 'db'), feed, { valueEncoding: 'json' });

var indexes = chi({
    ixdb: level('/tmp/index.db', { valueEncoding: 'json' }),
    chdb: db,
    feed: feed
});
indexes.add(function (row, cb) {
    if (/^user!/.test(row.key)) {
        cb(null, {
            'user.name': row.value.name,
            'user.space': row.value.hackerspace
        });
    }
    else cb()
});

now we can create users:

if (argv._[0] === 'create') {
    var id = require('crypto').randomBytes(16).toString('hex');
    var name = argv._[1], space = argv._[2];
    var value = { name: name, hackerspace: space };
    
    userExists(name, function (err, ex) {
        if (err) return console.error(err);
        if (ex) return console.error('name in use');
        
        db.put('user!' + id, value, function (err) {
            if (err) console.error(err);
        });
    });
    
    function userExists (name, cb) {
        indexes.createReadStream('user.name', { gte: name, lte: name })
            .pipe(through.obj(write, end))
        ;
        function write (row, enc, next) { cb(null, true) }
        function end () { cb(null, false) }
    }
}

or clear (and implicitly regenerate) an existing index:

else if (argv._[0] === 'clear') {
    indexes.clear(argv._[1], function (err) {
        if (err) console.error(err);
    });
}

With these indexes we can list users by name and space (argv from minimist is passed straight through as the options object, so CLI flags such as --gte and --lte can bound the scan):

else if (argv._[0] === 'by-name') {
    indexes.createReadStream('user.name', argv)
        .on('data', console.log)
    ;
}
else if (argv._[0] === 'by-space') {
    indexes.createReadStream('user.space', argv)
        .on('data', console.log)
    ;
}

methods

var chi = require('changes-index')

var indexes = chi(opts)

You must provide:

  • opts.ixdb - levelup database to use for indexing
  • opts.chdb - wrapped changesdown levelup database to lookup primary records
  • opts.feed - changes-feed handle wired up to the chdb

indexes.add(fn)

Create an index from a function fn(row, cb) that will be called for each put and delete. Your function fn must call cb(err, ix) with ix, an object mapping index names to values. The values from ix will be sorted according to the algorithm from bytewise.

indexes.createReadStream(name, opts)

Create a readable object-mode stream of the primary documents inserted into chdb based on the index given by name.

The stream will produce row objects with:

  • row.key - the key name put into changesdown
  • row.value - the value put into changesdown
  • row.index - the index value generated by the index function
  • row.exists - whether the key existed prior to this operation
  • row.prev - the previous set of indexes or null if the key was created
  • row.change - the monotonically increasing change sequence from changes-feed

This read stream can be bounded by all the usual levelup options:

  • opts.lt
  • opts.lte
  • opts.gt
  • opts.gte
  • opts.limit
  • opts.reverse

plus:

  • opts.eq

which is the same as setting opts.gte and opts.lte to the same value. This isn't common in ordinary levelup but is very common when dealing with indexes that map to other keys.
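
For example, these two calls scan the same index range (the name value is illustrative):

indexes.createReadStream('user.name', { eq: 'substack' })
// is equivalent to:
indexes.createReadStream('user.name', { gte: 'substack', lte: 'substack' })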

indexes.clear(name, cb)

Delete the index for name, calling cb(err) when finished.

versioning

The internals of this module may change between patch releases, which may affect how data is stored on disk.

When you upgrade this package over existing data, you should delete the indexes first.

Author: Substack
Source Code: https://github.com/substack/changes-index 
License: View license

#javascript #index #leveldb 

Changes-index: Create indexes From A Leveldb Changes Feed
Dexter  Goodwin

Dexter Goodwin

1657258020

Level-tree-index: Tree indexer for Leveldb / Levelup

A streaming tree structure index for leveldb.

Reference every value in your leveldb to its parent, e.g. by setting value.parentKey to the key of the parent, then level-tree-index will keep track of the full path for each value, allow you to look up parents and children, stream the entire tree or a part thereof and even perform streaming search queries on the tree.

This is useful for implementing e.g. nested comments.

level-tree-index works for all keyEncodings. It works for the json valueEncoding automatically and works for other valueEncodings if you provide custom functions for the opts.pathProp and opts.parentProp options. level-tree-index works equally well with string and buffer paths.

level-tree-index automatically keeps the tree updated as you add, change or delete from the database.

Usage

// db contains your data and idb is used to store the index
// (requires and database paths added here to make the sketch runnable)
var level = require('level');
var treeIndexer = require('level-tree-index');

var db = level('/tmp/data.db', { valueEncoding: 'json' });
var idb = level('/tmp/index.db');
var tree = treeIndexer(db, idb);

db.put('1', {name: "foo"}, function(err) {
  if(err) throw err;

  db.put('2', {parentKey: '1', name: "bar"}, function(err) {
    if(err) throw err;

    db.put('3', {parentKey: '2', name: "baz"}, function(err) {
      if(err) throw err;

      // wait for index to finish building
      setTimeout(function() {

        // stream child-paths of 'foo' recursively
        var s = tree.stream('foo');

        s.on('data', function(data) {
          console.log(data.path, data.key, data.value);
        });

      }, 500);
    });
  });
});

Read the unit tests in tests/ for more.

API

treeIndexer(db, idb, [opts]) [constructor]

  • db: Your database to be indexed
  • idb: Database to use for storing the tree index

opts:

pathProp: 'name', // property used to construct the path
parentProp: 'parentKey', // property that references key of parent
sep: 0x1f, // path separator. can be a string or unicode/ascii character code
pathArray: false, // for functions that output paths, output paths as arrays
ignore: false, // set to a function to selectively ignore keys/values
listen: true, // listen for changes on db and update index automatically
uniquefy: false, // add uniqProp to end of pathProp to ensure uniqueness
uniqProp: 'unique', // property used for uniqueness
uniqSep: 0x1e, // like `sep` but separates pathProp from uniqProp
levelup: false, // if true, returns a levelup instance instead
orphanPath: 'orphans' // parent path of orphans

Both pathProp and parentProp can be either a string, a buffer or a function.

If a function is used then the function will be passed a value from your database as the only argument. The pathProp function is expected to return a string or buffer that will be used to construct the path by joining multiple returned pathProp values with the opts.sep value as separator. The parentProp function is expected to return the key in db of the parent.

opts.sep can be a buffer or a string and is used as a separator to construct the path to each node in the tree.

opts.ignore can be set to a function which will receive the key and value for each change; if it returns something truthy then that value will be ignored by the tree indexer, e.g.:

// ignore items where the .name property starts with an underscore
ignore: function(key, value) {
  if(typeof value === 'object') {
    if(typeof value.name === 'string') {
      if(value.name[0] === '_') {
        return true;
      }
    }
  }
  return false;
}

Setting orphanPath to a string, buffer or array will cause all orphaned rows to have orphanPath as their parent path. Setting orphanPath to null will cause orphaned rows to be ignored (not indexed). An orphan is defined as a row with its parentProp set to a non-falsy value but where the referenced parent does not exist in the database. This can happen e.g. if a parent is deleted but its children are left in the database.

If opts.listen is true then level-tree-index will listen to operations on db and automatically update the index. Otherwise the index will only be updated when .put/.del/batch is called directly on the level-tree-index instance. This option is ignored when opts.levelup is true.

If opts.levelup is true then instead of a level-tree-index instance a levelup instance will be returned with all of the standard levelup API + the level-tree-index API. All calls to .put, .del or .batch will operate on the database given as the db argument and only call their callbacks once the tree index has been updated.

Limitations when using levelup:true:

  • Chained batch mode is not implemented.
  • It is currently not possible to skip waiting for the tree index to update, so it will take longer before the .put, .del and .batch callbacks are called.
  • Key and value encoding happens before the data gets to level-tree-index so opts.pathProp and opts.parentProp must be set to functions and if you're using valueEncoding:'json' then those functions will receive the stringified json data.

See tests/levelup.js for how to use the levelup:true mode.

.getRoot(cb)

Get the path and key of the root element. E.g:

tree.getRoot(function(err, path, key) {
  console.log("Path of root element:", path);
  console.log("Key of root element:", key);
});

.stream([parentPath], [opts])

Recursively stream descendants starting from parentPath. If parentPath is falsy then the entire tree will be streamed to the specified depth.

Opts:

depth: 0, // how many (grand)children deep to go. 0 means infinite
paths: true, // output the path for each child
keys: true, // output the key for each child
values: true, // output the value for each child
pathArray: undefined, // output paths as arrays
ignore: false, // optional function that returns true for values to ignore
match: null, // Stream only matching elements. A string, buffer or function.
matchAncestors: false, // include ancestors of matches if true
gt: undefined, // specify gt directly, must then also specify lt or lte
gte: undefined, // specify gte directly, must then also specify lt or lte
lt: undefined, // specify lt directly, must then also specify gt or gte
lte: undefined // specify lte directly, must then also specify gt or gte

If parentPath is not specified then .gt/.gte and .lt/.lte must be specified.

opts.depth is currently not usable at the same time as opts.match.

If more than one of opts.paths, opts.keys and opts.values is true then the stream will output objects with these as properties.

opts.ignore can be set to a function. This function will receive whatever the stream is about to output (which depends on opts.paths, opts.keys and opts.values) and if the function returns true then those values will not be emitted by the stream.

opts.match allows for streaming search queries on the tree. If set to a string or buffer it will match any path that contains that string or buffer. If set to a RegEx then it will run a .match on the path with that RegEx (only works for string paths). If set to a function then that function will be called with the path as first argument and with the second argument depending on the values of opts.paths, opts.keys and opts.values, e.g:

match: function(path, o) {
  if(o.value.name.match("cattens")) {
   return true;
  }
  return false;
}

Setting opts.matchAncestors to true modifies the behaviour of opts.match to also match all ancestors of matched elements. Ancestors of matched elements will then be streamed in the correct order before the matched element. This requires some buffering so may slow down matches on very large tree indexes.

When using opts.lt/opts.lte you can use the convenience function .lteKey(key). E.g. to stream all paths that begin with 'foo.bar' you could run:

levelTree.stream({
  gte: 'foo.bar',
  lte: levelTree.lteKey('foo.bar')
});

Keep in mind that the above example would also return paths like 'foo.barbar'.

.lteKey(key)

Convenience function that, according to leveldb alphabetical ordering, returns the last possible string or buffer that begins with the specified string or buffer.

.parentStream(path, [opts])

Stream tree index ancestor paths starting from path. Like .stream() but traverses ancestors instead of descendants.

Opts:

height: 0, // how many (grand)parents up to go. 0 means infinite
includeCurrent: true, // include the node specified by path in the stream 
paths: true, // output the path for each child
keys: true, // output the key for each child
values: true, // output the value for each child
pathArray: undefined, // output paths as arrays

.parents(path, [opts], cb)

Same as .parentStream but calls back with the results as an array.

.getFromPath(path, cb)

Get key and value from path.

Callback: cb(err, key, value)

.path(key, [opts], cb)

Get tree path given a key.

opts.pathArray: undefined // if true, split path into array 

Callback: cb(err, path)

.parentFromValue(value, cb)

Get parent value given a value.

Callback: cb(err, parentValue)

.parentPath(key, [opts], cb)

Get parent path given a key.

opts.pathArray: undefined // if true, split path into array

Callback: cb(err, parentPath)

.parentPathFromValue(value, [opts], cb)

Get parent path given a value.

opts.pathArray: undefined // if true, split path into array

Callback: cb(err, parentPath)

.parentFromPath(path, cb)

Get parent value given a path.

Callback: cb(err, parentValue)

.parentPathFromPath(path, [opts], cb)

Get parent path given a path.

opts.pathArray: undefined // if true, split path into array

Note: this function can be called synchronously

Callback: cb(err, parentPath)

.children(path, [opts], cb)

Get array of children given a path.

Same usage as .stream but this version isn't streaming.

Callback: cb(err, childArray)

.childrenFromKey(key, [opts], cb)

Same as .children but takes a key as input.

Same usage as .stream but this version isn't streaming.

Callback: cb(err, childArray)

.pathStream(parentPath, [opts])

Same as .stream with only opts.paths set to true.

.keyStream(parentPath, [opts])

Same as .stream with only opts.keys set to true.

.valueStream(parentPath, [opts])

Same as .stream with only opts.values set to true.

.clear(cb)

Clear the index. Deletes all of the index's data in the index db.

.build(cb)

Build the index from scratch.

Note: You will likely want to .clear the index first or call .rebuild instead.

.rebuild(cb)

Clear and then build the index.

.put(key, value, [opts], cb)

If you need to wait for the tree index to update after a .put operation then you can use .put directly on the level-tree-index instance and give it a callback. Calling .put this way is much less efficient so if you are planning to use this feature most of the time then you should look into using level-tree-index with the levelup:true option instead.
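
For example (a sketch building on the usage example above; the key and value are arbitrary):

tree.put('4', { parentKey: '1', name: 'qux' }, function(err) {
  if(err) throw err;
  // the callback fires only after the tree index is up to date,
  // so an immediate lookup is safe
  tree.path('4', function(err, path) {
    if(err) throw err;
    console.log(path); // full path of key '4'
  });
});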

.del(key, [opts], cb)

Allows you to wait for the tree index to finish building using a callback. Same as .put above but for deletion.

Uniqueness

The way level-tree-index works requires that each indexed database entry has a globally unique path. In other words no two siblings can share the same pathProp.

You might get into a situation where you really need multiple siblings with an identical pathProp. Then you might wonder if you could just append e.g. a random string to each pathProp before passing it to level-tree-index and then strip it away again before e.g. showing the data to users.

Well, level-tree-index provides helpers for exactly that. You can set opts.uniquefy to true in the constructor. You will then need each database entry to have a property that, combined with its pathProp, makes it unique. This can be as simple as a long randomly generated string. As with pathProp, you will have to inform level-tree-index about this property with uniqProp.

You will then run into the problem that you no longer know the actual path names since they have the uniqueness added. You can either get the actual path name using the synchronous function .getPathName(val) where val is the value from the key-value pair for which you want the path. Or you can call .put or .batch directly on your level-tree-index instance and they will pass your callback a second argument which for .put is the actual path name and for .batch is an array of path names for all put operations.

When uniquefy is turned on, any functions returning paths will return paths with the uniqueness data appended. You can use the convenience function .nicify(path) to convert these paths into normal paths without the uniqueness data. For .stream and any functions described as "same as .stream but ...", you can set opts.nicePaths to true and you will receive the nicified paths back with each result.
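
A minimal sketch of the uniquefy options (the property values are illustrative):

var tree = treeIndexer(db, idb, {
  uniquefy: true,
  uniqProp: 'unique' // combined with pathProp to keep every path unique
});

// two siblings can now share a name as long as 'unique' differs
db.put('a1', { name: 'foo', unique: 'x7f3q' }, function(err) {
  if(err) throw err;
  db.put('a2', { name: 'foo', unique: 'p9d2k' }, function(err) {
    if(err) throw err;
    // strip the uniqueness data from a path returned by the index:
    // var nicePath = tree.nicify(somePath);
  });
});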

Async quirks

Note that when you call .put, .del or .batch on your database level-tree-index will not be able to delay the callback so you cannot expect the tree index to be up to date when the callback is called. That is why you see the setTimeout used in the usage example above. You can instead call .put, .del or .batch directly on the level-tree-index instance and your callback will not be called until the index has finished building. This works but if opts.listen is set to true then an inefficient and inelegant workaround is used (in order to prevent the change listener from attempting to update the already updated index) which could potentially slow things down.

If you want to wait for the index to update most of the time then you should probably either set opts.listen to false or use the levelup mode by calling the constructor with opts.levelup set to true, though that has its own drawbacks, especially if using valueEncoding:'json'. See the constructor API documentation for more.

Technical explanation

In normal operation (opts.levelup == false) level-tree-index will listen for any changes on your database and update its index every time a change occurs. This is implemented using levelup change event listeners, which run after the database operation has already completed.

When running .put or .del directly on level-tree-index, the operation is performed on the underlying database, then the tree index is updated, and then the callback is called. Since we can't turn off the change event listeners for a specific operation, level-tree-index has to remember the operations performed directly through .put or .del on the level-tree-index instance so that the change event listener can ignore them, preventing the tree-index update operation from being called twice. This is done by hashing the entire operation, saving the hash, and then checking the hash of each operation picked up by the change event listeners against the saved hash. This is obviously inefficient. If this feature is never used then nothing is ever hashed nor compared, so performance will not be impacted.

ToDo

Before version 1.0

  • Get opts.depth working with opts.match.

Author: Biobricks
Source Code: https://github.com/biobricks/level-tree-index 
License: AGPLv3

#javascript #tree #index #leveldb 

Level-tree-index: Tree indexer for Leveldb / Levelup
Dexter  Goodwin

Dexter Goodwin

1657243080

Level-match-index: Index & Filter LevelDB Databases & Watch

level-match-index

Index and filter LevelDB databases and watch for future changes.

Example

Set up the view indexes and filters:


var Index = require('level-match-index')
var level = require('level')
var sub = require('level-sublevel')

var db = sub(level('database', {valueEncoding: 'json'}))

var views = {
  
  post: Index(db, {
    match: { type: 'post' },
    index: [ 'id' ],
    single: true
  }),

  postsByTag: Index(db, {
    match: { type: 'post' },
    index: [ {many: 'tags'} ] // index each tag in array separately
  }),

  commentsByPost: Index(db, {
    match: { type: 'comment' },
    index: [ 'postId' ]
  })

}

Add some data:

var post1 = {
  id: 'post-1', // used for matching as specified above
  type: 'post', //
  title: 'Typical Blog Post Example',
  tags: [ 'test post', 'long winded' ],
  body: 'etc...',
  date: Date.now()
}

var post2 = {
  id: 'post-2',
  type: 'post',
  title: 'Typical Blog Post Example',
  tags: [ 'test post', 'exciting' ],
  body: 'etc...',
  date: Date.now()
}

var comment1 = {
  id: 'comment-1',
  type: 'comment', // used for matching as specified above
  postId: post1.id, //
  name: 'Matt McKegg',
  body: 'cool story bro',
  date: Date.now()
}

var comment2 = {
  id: 'comment-2',
  type: 'comment', 
  postId: post1.id, 
  name: 'Joe Blogs',
  body: 'I do not understand!',
  date: Date.now()
}

db.batch([
  {key: post1.id, value: post1, type: 'put'},
  {key: post2.id, value: post2, type: 'put'},
  {key: comment1.id, value: comment1, type: 'put'},
  {key: comment2.id, value: comment2, type: 'put'}
])

Now query the views:

var result = {post: null, comments: []}

views.post(post1.id).read().on('data', function(data){
  result.post = data.value
}).on('end', getComments)

function getComments(){
  views.commentsByPost(post1.id).read().on('data', function(data){
    result.comments.push(data.value)
  }).on('end', finish)
}

function finish(){
  t.deepEqual(result, {
    post: post1,
    comments: [ comment1, comment2 ]
  })
}

Or by tags:

var posts = []
views.postsByTag('long winded').read().on('data', function(data){
  posts.push(data.value)
}).on('end', finish)

function finish(){
  t.deepEqual(posts, [ post1 ])
}

Watch for future changes:

var comment3 = {
  id: 'comment-3',
  type: 'comment', // used for matching as specified above
  postId: post1.id, //
  name: 'Bobby',
  body: 'Done yet?',
  date: Date.now()
}

var remove = views.commentsByPost(post1.id).watch(function(ch){
  // function is called with each change
  t.deepEqual(ch.value, comment3)
})

db.put(comment3.id, comment3)

// remove the watcher hook if no longer needed
remove()

Query params

Same example as above, but instead of specifying the postId for the comments index, pull it out using a query:

var result = {post: null, comments: []}

views.post(post1.id).read().on('data', function(data){
  result.post = data.value
}).on('end', getComments)

function getComments(){
  // specify a value to extract as query and specify where to get it from as read option
  views.commentsByPost({ $query: 'post.id' }).read({ 
    data: result 
  }).on('data', function(data){
    result.comments.push(data.value)
  }).on('end', finish)
}

function finish(){
  t.deepEqual(result, {
    post: post1,
    comments: [ comment1, comment2 ]
  })
}

Author: mmckegg
Source Code: https://github.com/mmckegg/level-match-index 
License: 

#javascript #match #index 

Level-match-index: Index & Filter LevelDB Databases & Watch
Lawrence  Lesch

Lawrence Lesch

1657232640

Level-secondary: Secondary indexes for Leveldb

level-secondary

Secondary indexes for leveldb.

Example

Create 2 indexes on top of a posts database.

var level = require('level');
var Secondary = require('level-secondary');
var sub = require('level-sublevel');

var db = sub(level(__dirname + '/db', {
  valueEncoding: 'json'
}));

var posts = db.sublevel('posts');

// add a title index
posts.byTitle = Secondary(posts, 'title');

// add a length index
// append the post.id for unique indexes with possibly overlapping values
posts.byLength = Secondary(posts, 'length', function(post){
  return post.body.length + '!' + post.id;
});

posts.put('1337', {
  id: '1337',
  title: 'a title',
  body: 'lorem ipsum'
}, function(err) {
  if (err) throw err;

  posts.byTitle.get('a title', function(err, post) {
    if (err) throw err;
    console.log('get', post);
    // => get: { id: '1337', title: 'a title', body: 'lorem ipsum' }

    posts.del('1337', function(err) {
      if (err) throw err;
      posts.byTitle.get('a title', function(err) {
        console.log(err.name);
        // => NotFoundError
      });
    });
  });

  posts.byLength.createReadStream({
    start: 10,
    end: 20
  }).on('data', console.log.bind(console, 'read'));
  // => read { key: '1337', value: { id: '1337', title: 'a title', body: 'lorem ipsum' } }

  posts.byLength.createKeyStream({
    start: 10,
    end: 20
  }).on('data', console.log.bind(console, 'key'));
  // => key 1337

  posts.byLength.createValueStream({
    start: 10,
    end: 20
  }).on('data', console.log.bind(console, 'value'));
  // => value { id: '1337', title: 'a title', body: 'lorem ipsum' }
});

API

Secondary(db, name[, reduce])

Return a secondary index that either indexes property name or uses a custom reduce function to map values to indexes.

Secondary#get(key, opts[, cb])

Get the value that has been indexed with key.

Secondary#create{Key,Value,Read}Stream(opts)

Create a readable stream that has indexes as keys and indexed data as values.

Secondary#manifest

A level manifest that you can pass to multilevel.

Breaking changes

1.0.0

What used to be

db = Secondary('name', db);

is now

db.byName = Secondary(db, 'name');

Also hooks are used, so it works perfectly with batches across multiple sublevels.

Installation

With npm do:

npm install level-secondary

Author: juliangruber
Source Code: https://github.com/juliangruber/level-secondary 
License: MIT

#javascript #index #leveldb 

Level-secondary: Secondary indexes for Leveldb
Dexter  Goodwin

Dexter Goodwin

1657228320

Level-indexer: Generic Property indexer for Leveldb

level-indexer

Generic indexer for leveldb. Only stores document keys for space efficiency.

npm install level-indexer

Usage

var indexer = require('level-indexer')

// create an index (by country)
var country = indexer(db, ['country']) // index by country

country.add({
  key: 'mafintosh',
  name: 'mathias',
  country: 'denmark'
})

country.add({
  key: 'maxogden',
  name: 'max',
  country: 'united states'
})

var stream = country.find({
  gte:{country:'denmark'},
  lte:{country:'denmark'}
})

// or using the shorthand syntax

var stream = country.find('denmark')

stream.on('data', function(key) {
  console.log(key) // will print mafintosh
})

The stored index is prefixed with the index key names, which means you can use the same levelup instance to store multiple indexes.

API

index = indexer(db, [prop1, prop2, ...], [options])

Creates a new index using the given properties. Options include:

{
  map: function(key, cb) {
    // map find results to another value
    db.get(key, cb)
  }
}

index.add(doc, [key], [cb])

Add a document to the index. The document needs to have a key property, or you can provide one as the second argument. Only the key will be stored in the index.

index.remove(doc, [key], [cb])

Remove a document from the index.

index.key(doc, [key])

Returns the used leveldb key. Useful if you want to batch multiple index updates together yourself:

var batch = [{type:'put', key:index.key(doc), value:doc.key}, ...]

stream = index.find(options, [cb])

Search the index. Use options.{gt,gte,lt,lte} to scope your search.

// find everyone in the age range 20-50 in denmark

var index = indexer(db, ['country', 'age'])

...
var stream = index.find({
  gt: {
    country: 'denmark',
    age: 20
  },
  lt: {
    country: 'denmark',
    age: 50
  }
})

Optionally you can specify the ranges using arrays

var stream = index.find({
  gt: ['denmark', 20],
  lt: ['denmark', 50]
})

Or if you do not care about ranges

var stream = index.find(['denmark', 20])

// equivalent to

var stream = index.find({
  gte: ['denmark', 20],
  lte: ['denmark', 20]
})

The stream will contain the keys of the documents that were found in the index. Use options.map to map them to the document values.

Options also include the regular levelup db.createReadStream options.

If you set cb the stream will be buffered and passed as an array.
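
For example, the country index from the usage section could be queried with a callback instead of consuming the stream:

country.find('denmark', function(err, keys) {
  if (err) throw err
  console.log(keys) // ['mafintosh']
})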

index.findOne(options, cb)

Only find the first match in the index and pass that to the callback.

Author: Mafintosh
Source Code: https://github.com/mafintosh/level-indexer 
License: MIT license

#javascript #index #leveldb 

Level-indexer: Generic Property indexer for Leveldb