
# Deep Learning With TensorFlow 2.0

So, let’s begin 😈…

## Linear Algebra

Linear algebra is the branch of mathematics concerning linear equations and linear functions and their representations through matrices and vector spaces.

Machine Learning relies heavily on Linear Algebra, so it is essential to understand what vectors and matrices are, what operations you can perform with them, and how they can be useful.

If you are already familiar with linear algebra, feel free to skip this chapter, but note that the implementation of certain functions differs between TensorFlow 1.0 and TensorFlow 2.0, so you should at least skim through the code.

If you have had no exposure at all to linear algebra, this chapter will teach you enough to read this book.

# Installs
!pip install -q tensorflow==2.0.0-alpha0

# Imports
import tensorflow as tf
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

"""
If you are running this notebook in Google Colab, make sure to upload the
helpers.py file to your session before running it. If you are running this
in Binder, you don't have to worry about it. The helpers.py file is in the
notebook folder on GitHub.

"""
from helpers import vector_plot, plot_transform



## Scalars, Vectors, Matrices and Tensors

Scalars: A scalar is just a single number, for example temperature, which is represented by a single value.

Vectors: A vector is an array of numbers arranged in order, where we can identify each individual number by its index in that ordering. We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis. In simple terms, a vector is an arrow representing a quantity that has both magnitude and direction: the length of the arrow represents the magnitude and the orientation tells you the direction. For example, wind has both a direction and a magnitude.

Matrices: A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one. If a real-valued matrix A has a height of m and a width of n, we say that A∈Rm×n. We identify the elements of the matrix as Am,n, where m represents the row and n represents the column.

Tensors: In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We identify the element of a tensor A at coordinates (i, j, k) by writing Ai,j,k. To truly understand tensors, though, we need to expand the way we think of vectors beyond arrows with a magnitude and direction. Remember that a vector can be represented by three components, namely the x, y and z components (basis vectors). If you have a pen and paper, let’s do a small experiment: place the pen vertically on the paper, slant it by some angle, and shine a light from the top so that the shadow of the pen falls on the paper. That shadow represents the x component of the vector “pen”, and the height from the paper to the tip of the pen is the y component. Now let’s use these components to describe tensors. Imagine you are Indiana Jones or a treasure hunter trapped in a cube, with three arrows flying towards you from three faces of the cube (representing the x, y and z axes) 😬. I know this would be the last thing on your mind in such a situation, but you can think of those three arrows as vectors pointing towards you from the three faces of the cube, and each of those vectors (arrows) can be represented by its x, y and z components. That collection is a rank 2 tensor (matrix) with 9 components. Remember that this is a very, very simplified explanation of tensors. The following is a representation of a tensor:

We can add matrices to each other as long as they have the same shape, just by adding their corresponding elements:

In TensorFlow, a:

• Rank 0 Tensor is a Scalar
• Rank 1 Tensor is a Vector
• Rank 2 Tensor is a Matrix
• Rank 3 Tensor is a 3-Tensor
• Rank n Tensor is an n-Tensor
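As a quick sketch of the ranks listed above (the variable names here are just illustrative), tf.rank reports the number of axes of a tensor:

```python
import tensorflow as tf

# one tensor of each rank from the list above
scalar = tf.constant(7.)           # rank 0: a single number
vector = tf.constant([1., 2., 3.]) # rank 1: an array of numbers
matrix = tf.ones([2, 3])           # rank 2: a 2-D array
tensor3 = tf.zeros([2, 3, 4])      # rank 3: a 3-Tensor

for t in [scalar, vector, matrix, tensor3]:
    print("shape: {} rank: {}".format(t.shape, tf.rank(t).numpy()))
```

Note that the rank is the number of axes (the length of the shape), not the number of elements.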
# let's create a ones 3x3 rank 2 tensor
rank_2_tensor_A = tf.ones([3, 3], name='MatrixA')
print("3x3 Rank 2 Tensor A: \n{}\n".format(rank_2_tensor_A))

# let's manually create a 3x3 rank two tensor and specify the data type as float
rank_2_tensor_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name='MatrixB', dtype=tf.float32)
print("3x3 Rank 2 Tensor B: \n{}\n".format(rank_2_tensor_B))

# addition of the two tensors
rank_2_tensor_C = tf.add(rank_2_tensor_A, rank_2_tensor_B, name='MatrixC')
print("Rank 2 Tensor C with shape={} and elements: \n{}".format(rank_2_tensor_C.shape, rank_2_tensor_C))


# Let's see what happens if the shapes are not the same
two_by_three = tf.ones([2, 3])
try:
    incompatible_tensor = tf.add(two_by_three, rank_2_tensor_B)
except:
    print("""Incompatible shapes to add with two_by_three of shape {0} and 3x3 Rank 2 Tensor B of shape {1}
    """.format(two_by_three.shape, rank_2_tensor_B.shape))



We can also add a scalar to a matrix or multiply a matrix by a scalar, just by performing that operation on each element of a matrix:

# Create scalar a, c and Matrix B
rank_0_tensor_a = tf.constant(2, name="scalar_a", dtype=tf.float32)
rank_2_tensor_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name='MatrixB', dtype=tf.float32)
rank_0_tensor_c = tf.constant(3, name="scalar_c", dtype=tf.float32)

# multiplying aB
multiply_scalar = tf.multiply(rank_0_tensor_a, rank_2_tensor_B)
# adding scalar c to aB to get D = aB + c
rank_2_tensor_D = tf.add(multiply_scalar, rank_0_tensor_c, name="MatrixD")

print("""Original Rank 2 Tensor B: \n{0} \n\nScalar a: {1}
Rank 2 Tensor for aB: \n{2} \n\nScalar c: {3} \nRank 2 Tensor D = aB + c: \n{4}
""".format(rank_2_tensor_B, rank_0_tensor_a, multiply_scalar, rank_0_tensor_c, rank_2_tensor_D))



One important operation on matrices is the transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal. We denote the transpose of a matrix A as A⊤, and it is defined such that (A⊤)i,j=Aj,i

# Creating a Matrix E
rank_2_tensor_E = tf.constant([[1, 2, 3], [4, 5, 6]])
# Transposing Matrix E
transpose_E = tf.transpose(rank_2_tensor_E, name="transposeE")

print("""Rank 2 Tensor E of shape: {0} and elements: \n{1}\n
Transpose of Rank 2 Tensor E of shape: {2} and elements: \n{3}""".format(rank_2_tensor_E.shape, rank_2_tensor_E, transpose_E.shape, transpose_E))



In deep learning we allow the addition of a matrix and a vector, yielding another matrix where Ci,j=Ai,j+bj. In other words, the vector b is added to each row of the matrix. This implicit copying of b to many locations is called broadcasting.

# Creating a vector b
rank_1_tensor_b = tf.constant([[4.], [5.], [6.]])
# Broadcasting a vector b to a matrix A such that it yields a matrix F = A + b
rank_2_tensor_F = tf.add(rank_2_tensor_A, rank_1_tensor_b, name="MatrixF")

print("""Rank 2 tensor A: \n{0}\n \nRank 1 Tensor b: \n{1}
\nRank 2 tensor F = A + b:\n{2}""".format(rank_2_tensor_A, rank_1_tensor_b, rank_2_tensor_F))



### Multiplying Matrices and Vectors

To define the matrix product of matrices A and B, A must have the same number of columns as B has rows. If A is of shape m x n and B is of shape n x p, then the product C is of shape m x p.

If you do not recall how matrix multiplication is performed, remember the row-by-column rule: each entry Ci,j is the dot product of row i of A with column j of B.

# Matrix A and B with shapes (2, 3) and (3, 4)
mmv_matrix_A = tf.ones([2, 3], name="matrix_A")
mmv_matrix_B = tf.constant([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]], name="matrix_B", dtype=tf.float32)

# Matrix Multiplication: C = AB with C shape (2, 4)
matrix_multiply_C = tf.matmul(mmv_matrix_A, mmv_matrix_B, name="matrix_multiply_C")

print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix B: shape {2} \nelements: \n{3}
\nMatrix C: shape {4} \nelements: \n{5}""".format(mmv_matrix_A.shape, mmv_matrix_A, mmv_matrix_B.shape, mmv_matrix_B, matrix_multiply_C.shape, matrix_multiply_C))



To get a matrix containing the product of the individual elements, we use the element-wise product, or Hadamard product, denoted A⊙B.

"""
Note that we use multiply to do element wise matrix multiplication and matmul
to do matrix multiplication
"""
# Creating new Matrix A and B with shapes (3, 3)
element_matrix_A = tf.ones([3, 3], name="element_matrix_A")
element_matrix_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name="element_matrix_B", dtype=tf.float32)

# Element wise multiplication of Matrix A and B
element_wise_C = tf.multiply(element_matrix_A, element_matrix_B, name="element_wise_C")

print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix A: shape {2} \nelements: \n{3}\n
Matrix C: shape {4} \nelements: \n{5}""".format(element_matrix_A.shape, element_matrix_A, element_matrix_B.shape, element_matrix_B, element_wise_C.shape, element_wise_C))



To compute the dot product between A and B we compute Ci,j as the dot product between row i of A and column j of B.

# Creating Matrix A and B with shapes (3, 3)
dot_matrix_A = tf.ones([3, 3], name="dot_matrix_A")
dot_matrix_B = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], name="dot_matrix_B", dtype=tf.float32)

# Dot product of A and B
dot_product_C = tf.tensordot(dot_matrix_A, dot_matrix_B, axes=1, name="dot_product_C")

print("""Matrix A: shape {0} \nelements: \n{1} \n\nMatrix B: shape {2} \nelements: \n{3}\n
Matrix C: shape {4} \nelements: \n{5}""".format(dot_matrix_A.shape, dot_matrix_A, dot_matrix_B.shape, dot_matrix_B, dot_product_C.shape, dot_product_C))


Some properties of matrix multiplication (Distributive Property):

# Common Matrices to check all the matrix Properties
matrix_A = tf.constant([[1, 2], [3, 4]], name="matrix_a")
matrix_B = tf.constant([[5, 6], [7, 8]], name="matrix_b")
matrix_C = tf.constant([[9, 1], [2, 3]], name="matrix_c")

# Distributive Property
print("Matrix A: \n{} \n\nMatrix B: \n{} \n\nMatrix C: \n{}\n".format(matrix_A, matrix_B, matrix_C))

# AB + AC
distributive_RHS = tf.add(tf.matmul(matrix_A, matrix_B), tf.matmul(matrix_A, matrix_C), name="RHS")

# A(B+C)
distributive_LHS = tf.matmul(matrix_A, (tf.add(matrix_B, matrix_C)), name="LHS")

"""
Following is another way a conditional statement can be implemented in TensorFlow.
This might not seem very useful now, but I want to introduce it here so you can
figure out how it works for a simple example.
"""
# To compare each element in the matrix, you need to reduce it first and check if it's equal
predictor = tf.reduce_all(tf.equal(distributive_RHS, distributive_LHS))

# condition to act on if predictor is True
def true_print(): print("""Distributive property is valid
RHS: AB + AC: \n{} \n\nLHS: A(B+C): \n{}""".format(distributive_RHS, distributive_LHS))

# condition to act on if predictor is False
def false_print(): print("""You Broke the Distributive Property of Matrix
RHS: AB + AC: \n{} \n\nis NOT Equal to LHS: A(B+C): \n{}""".format(distributive_RHS, distributive_LHS))

tf.cond(predictor, true_print, false_print)


Some properties of matrix multiplication (Associative property):

# Associative property
print("Matrix A: \n{} \n\nMatrix B: \n{} \n\nMatrix C: \n{}\n".format(matrix_A, matrix_B, matrix_C))

# (AB)C
associative_RHS = tf.matmul(tf.matmul(matrix_A, matrix_B), matrix_C)

# A(BC)
associative_LHS = tf.matmul(matrix_A, tf.matmul(matrix_B, matrix_C))

# To compare each element in the matrix, you need to reduce it first and check if it's equal
predictor = tf.reduce_all(tf.equal(associative_RHS, associative_LHS))

# condition to act on if predictor is True
def true_print(): print("""Associative property is valid
RHS: (AB)C: \n{} \n\nLHS: A(BC): \n{}""".format(associative_RHS, associative_LHS))

# condition to act on if predictor is False
def false_print(): print("""You Broke the Associative Property of Matrix
RHS: (AB)C: \n{} \n\nLHS: A(BC): \n{}""".format(associative_RHS, associative_LHS))

tf.cond(predictor, true_print, false_print)


Some properties of matrix multiplication (Matrix multiplication is not commutative):

# Matrix multiplication is not commutative
print("Matrix A: \n{} \n\nMatrix B: \n{}\n".format(matrix_A, matrix_B))

# Matrix A times B
commutative_RHS = tf.matmul(matrix_A, matrix_B)

# Matrix B times A
commutative_LHS = tf.matmul(matrix_B, matrix_A)

predictor = tf.logical_not(tf.reduce_all(tf.equal(commutative_RHS, commutative_LHS)))
def true_print(): print("""Matrix Multiplication is not commutative
RHS: (AB): \n{} \n\nLHS: (BA): \n{}""".format(commutative_RHS, commutative_LHS))

def false_print(): print("""You made Matrix Multiplication commutative
RHS: (AB): \n{} \n\nLHS: (BA): \n{}""".format(commutative_RHS, commutative_LHS))

tf.cond(predictor, true_print, false_print)


Some properties of matrix multiplication (Transpose):

# Transpose of a matrix
print("Matrix A: \n{} \n\nMatrix B: \n{}\n".format(matrix_A, matrix_B))

# Tensorflow transpose function
transpose_RHS = tf.transpose(tf.matmul(matrix_A, matrix_B))

# If you are doing matrix multiplication, tf.matmul has parameters to take the transpose and then matrix multiply
transpose_LHS = tf.matmul(matrix_B, matrix_A, transpose_a=True, transpose_b=True)

predictor = tf.reduce_all(tf.equal(transpose_RHS, transpose_LHS))
def true_print(): print("""Transpose property is valid
RHS: (AB):^T \n{} \n\nLHS: (B^T A^T): \n{}""".format(transpose_RHS, transpose_LHS))

def false_print(): print("""You Broke the Transpose Property of Matrix
RHS: (AB):^T \n{} \n\nLHS: (B^T A^T): \n{}""".format(transpose_RHS, transpose_LHS))

tf.cond(predictor, true_print, false_print)


### Identity and Inverse Matrices

Linear algebra offers a powerful tool called matrix inversion that enables us to analytically solve Ax=b for many values of A. To describe matrix inversion, we first need to define the concept of an identity matrix. An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix.

Such that: Inx = x, ∀x∈Rn

The structure of the identity matrix is simple: all the entries along the main diagonal are 1, while all the other entries are zero.

# let's create a identity matrix I
identity_matrix_I = tf.eye(3, 3, dtype=tf.float32, name='IdentityMatrixI')
print("Identity matrix I: \n{}\n".format(identity_matrix_I))

# let's create a 3x1 vector x
iim_vector_x = tf.constant([[4], [5], [6]], name='Vector_x', dtype=tf.float32)
print("Vector x: \n{}\n".format(iim_vector_x))

# Ix will result in x
iim_matrix_C = tf.matmul(identity_matrix_I, iim_vector_x, name='MatrixC')
print("Matrix C from Ix: \n{}".format(iim_matrix_C))


The matrix inverse of A is denoted A−1, and it is defined as the matrix such that: A−1A = In

iim_matrix_A = tf.constant([[2, 3], [2, 2]], name='MatrixA', dtype=tf.float32)

try:
    # TensorFlow function to take the inverse
    inverse_matrix_A = tf.linalg.inv(iim_matrix_A)

    # Creating an identity matrix using tf.eye
    identity_matrix = tf.eye(2, 2, dtype=tf.float32, name="identity")

    iim_RHS = identity_matrix
    iim_LHS = tf.matmul(inverse_matrix_A, iim_matrix_A, name="LHS")

    predictor = tf.reduce_all(tf.equal(iim_RHS, iim_LHS))
    def true_print(): print("""A^(-1) times A equals the Identity Matrix
    Matrix A: \n{0} \n\nInverse of Matrix A: \n{1} \n\nRHS: I: \n{2} \n\nLHS: A^(-1) A: \n{3}""".format(iim_matrix_A, inverse_matrix_A, iim_RHS, iim_LHS))
    def false_print(): print("Condition Failed")
    tf.cond(predictor, true_print, false_print)

except:
    print("A^(-1) doesn't exist for Matrix A: \n{}".format(iim_matrix_A))



If you try different values for Matrix A, you will see that not every A has an inverse; we will discuss the conditions for the existence of A−1 in the following section.

We can then solve the equation Ax=b as: x = A−1b

This process depends on it being possible to find A−1.

We can calculate the inverse of a matrix from its adjugate and determinant: A−1 = adj(A) / det(A)
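As a sketch of that formula for the 2x2 case (for A = [[a, b], [c, d]], the adjugate is [[d, −b], [−c, a]] and the determinant is ad − bc; the values below are just illustrative):

```python
import tensorflow as tf

# A = [[a, b], [c, d]] with its inverse computed by hand from adj(A) / det(A)
a, b, c, d = 2., 3., 2., 2.
A = tf.constant([[a, b], [c, d]])

det = a * d - b * c  # determinant, must be nonzero for the inverse to exist
manual_inverse = (1. / det) * tf.constant([[d, -b], [-c, a]])

# compare against TensorFlow's built-in inverse
tf_inverse = tf.linalg.inv(A)
print("Manual inverse: \n{} \n\ntf.linalg.inv: \n{}".format(manual_inverse, tf_inverse))
```

For larger matrices the adjugate formula becomes impractical, which is why in practice we let tf.linalg.inv (or, better, a linear solver) do the work.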

Let's see how we can solve a simple system of linear equations: 2x + 3y = 6 and 4x + 9y = 15

# The above system of equation can be written in the matrix format as:
sys_matrix_A = tf.constant([[2, 3], [4, 9]], dtype=tf.float32)
sys_vector_B = tf.constant([[6], [15]], dtype=tf.float32)
print("Matrix A: \n{} \n\nVector B: \n{}\n".format(sys_matrix_A, sys_vector_B))

# now to solve for x: x = A^(-1)b
sys_x = tf.matmul(tf.linalg.inv(sys_matrix_A), sys_vector_B)
print("Vector x is: \n{} \nWhere x = {} and y = {}".format(sys_x, sys_x[0], sys_x[1]))


## Linear Dependence and Span

For A−1 to exist, Ax=b must have exactly one solution for every value of b. It is also possible for the system of equations to have no solutions or infinitely many solutions for some values of b. This is because we are dealing with linear systems, and two lines can't cross more than once. So they can either cross once, never cross, or cross infinitely many times, meaning the two lines are superimposed.

Hence if both x and y are solutions then:

z=αx+(1−α)y is also a solution for any real α
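We can sanity-check this claim with a hand-picked singular system that has infinitely many solutions (the matrix and solutions below are illustrative):

```python
import tensorflow as tf

# A singular system: the second equation is just twice the first (x + 2y = 3)
A = tf.constant([[1., 2.], [2., 4.]])
b = tf.constant([[3.], [6.]])

# two distinct solutions of Ax = b, picked by hand
x = tf.constant([[1.], [1.]])
y = tf.constant([[3.], [0.]])

# any convex-style combination z = alpha*x + (1 - alpha)*y is also a solution
alpha = 0.3
z = alpha * x + (1 - alpha) * y
print("A z = \n{}".format(tf.matmul(A, z)))  # equals b, so z solves the system too
```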

The span of a set of vectors is the set of all linear combinations of the vectors. Formally, a linear combination of some set of vectors v1,⋯,vn is given by multiplying each vector v(i) by a corresponding scalar coefficient and adding the results:

Determining whether Ax=b has a solution thus amounts to testing whether b is in the span of the columns of A. This particular span is known as the column space, or the range, of A.

In order for the system Ax=b to have a solution for all values of b∈Rm, we require that the column space of A be all of Rm.

A set of vectors v1,⋯,vn is linearly independent if the only solution to the vector equation λ1v1+⋯λnvn=0 is λi=0 ∀i. If a set of vectors is not linearly independent, then it is linearly dependent.

For the matrix to have an inverse, the matrix must be square, that is, we require that m = n and that all the columns be linearly independent. A square matrix with linearly dependent columns is known as singular.

If A is not square or is square but singular, solving the equation is still possible, but we cannot use the method of matrix inversion to find the solution.
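One quick way to detect a singular matrix is through its determinant: linearly dependent columns force the determinant to zero. A small sketch (the matrices are illustrative):

```python
import tensorflow as tf

# the second column is twice the first, so the columns are linearly dependent
singular_A = tf.constant([[1., 2.], [2., 4.]])

# a matrix with linearly independent columns
invertible_A = tf.constant([[3., 1.], [1., 2.]])

# a zero determinant signals linear dependence, so no inverse exists
print("det of singular_A: {}".format(tf.linalg.det(singular_A)))
print("det of invertible_A: {}".format(tf.linalg.det(invertible_A)))
```

Trying tf.linalg.inv on singular_A would fail, while invertible_A inverts without trouble.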

So far we have discussed matrix inverses as being multiplied on the left. It is also possible to define an inverse that is multiplied on the right. For square matrices, the left inverse and right inverse are equal.

# Let's start by computing b = Ax for some Matrix A and vector x
lds_matrix_A = tf.constant([[3, 1], [1, 2]], name='MatrixA', dtype=tf.float32)
lds_vector_x = tf.constant([[2], [3]], name='vectorX', dtype=tf.float32)
lds_b = tf.matmul(lds_matrix_A, lds_vector_x, name="b")

# Now let's see if an inverse for Matrix A exists
try:
inverse_A = tf.linalg.inv(lds_matrix_A)
print("Matrix A is successfully inverted: \n{}".format(inverse_A))
except:
print("Inverse of Matrix A: \n{} \ndoesn't exist. ".format(lds_matrix_A))

# Let's find the value of x using x = A^(-1)b
verify_x = tf.matmul(inverse_A, lds_b, name="verifyX")
# compare with a small tolerance to allow for floating point rounding
predictor = tf.reduce_all(tf.abs(lds_vector_x - verify_x) < 1e-5)

def true_print(): print("""\nThe two x values match, so we verified that if a matrix A is invertible,
then x = A^(-1) b, \nwhere x: \n{}, \n\nA^(-1): \n{}, \n\nb: \n{}""".format(lds_vector_x, inverse_A, lds_b))

def false_print(): print("""\nThe two x values don't match.
Vector x: {} \n\nA^(-1)b: \n{}""".format(lds_vector_x, verify_x))

tf.cond(predictor, true_print, false_print)


Note that finding inverses by hand can be a challenging process, but using TensorFlow or any other library, you can easily check whether the inverse of a matrix exists. If you know the conditions and know how to solve matrix equations using TensorFlow, you should be good, but for the reader who wants to go deeper, check Linear Dependence and Span for further examples and definitions.

### Norms

In machine learning, when we need to measure the size of vectors, we use a function called a norm; a norm is also what is generally used to evaluate the error of a model. Formally, the Lp norm is given by:

∥x∥p = (∑i |xi|^p)^(1/p)

for p∈R, p≥1. On an intuitive level, the norm of a vector x measures the distance from the origin to the point x.

More rigorously, a norm is any function f that satisfies the following properties:

• f(x) = 0 ⇒ x = 0
• f(x + y) ≤ f(x) + f(y) (the triangle inequality)
• ∀α∈R, f(αx) = |α| f(x)

The L2 norm, with p=2, is known as the Euclidean norm, which is simply the Euclidean distance from the origin to the point identified by x. It is also common to measure the size of a vector using the squared L2 norm, which can be calculated simply as x⊤x.

# Euclidean distance sqrt(3^2 + 4^2), calculated by setting ord='euclidean'
dist_euclidean = tf.norm([3., 4.], ord='euclidean')
print("Euclidean Distance: {}".format(dist_euclidean))

# Squared L2 norm of the vector [3., 4.], calculated as x transpose x
vector_size = tf.tensordot([3., 4.], [3., 4.], axes=1)
print("Squared L2 norm: {}".format(vector_size))


In many contexts, the squared L2 norm may be undesirable, because it increases very slowly near the origin. In many machine learning applications, it is important to discriminate between elements that are exactly zero and elements that are small but nonzero. In these cases, we turn to a function that grows at the same rate in all locations, but retains mathematical simplicity: the L1 norm, which simplifies to: ∥x∥1 = ∑i |xi|
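Before plotting the penalty curves below, here is a direct sketch of the L1 norm with tf.norm (the vector values are illustrative):

```python
import tensorflow as tf

# L1 norm: the sum of absolute values, |3| + |-4| = 7
l1 = tf.norm([3., -4.], ord=1)
print("L1 norm: {}".format(l1))

# compare with the L2 norm of the same vector, sqrt(3^2 + 4^2) = 5
l2 = tf.norm([3., -4.], ord=2)
print("L2 norm: {}".format(l2))
```

Notice that the L1 norm is larger: it keeps growing linearly for each nonzero component, which is exactly why it is better at distinguishing small-but-nonzero entries from exact zeros.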

# Helper functions: squared error plus L1 / L2 penalty terms for the plot below
def SE(x, y, intc, beta):
    return (1. / len(x)) * 0.5 * sum((y - beta * x - intc)**2)

def L1(intc, beta, lam):
    return lam * (tf.abs(intc) + tf.abs(beta))

def L2(intc, beta, lam):
    return lam * (intc**2 + beta**2)

N = 100
x = np.random.randn(N)
y = 2 * x + np.random.randn(N)

beta_N = 100
beta = tf.linspace(-100., 100., beta_N)
intc = 0.0

SE_array = np.array([SE(x,y,intc,i) for i in beta])
L1_array = np.array([L1(intc,i,lam=30) for i in beta])
L2_array = np.array([L2(intc,i,lam=1) for i in beta])

fig1 = plt.figure()
ax1 = fig1.add_subplot(1, 1, 1)
ax1.plot(beta, SE_array, label='Squared L2 Norm')
ax1.plot(beta, L1_array, label='L1 norm')
ax1.plot(beta, L2_array, label='L2 norm')
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.title('The graph of each of the norms', color='w')
plt.legend()
plt.show()


One other norm that commonly arises in machine learning is the L∞ norm, also known as the max norm. This norm simplifies to the absolute value of the element with the largest magnitude in the vector: ∥x∥∞ = maxi |xi|
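A minimal sketch of the max norm, computed by passing ord=np.inf to tf.norm (the vector is illustrative):

```python
import numpy as np
import tensorflow as tf

# max norm: the absolute value of the largest-magnitude element, here |-7| = 7
max_norm = tf.norm([3., -7., 2.], ord=np.inf)
print("Max norm: {}".format(max_norm))
```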

If we wish to measure the size of a matrix, in the context of deep learning the most common way to do this is with the Frobenius norm: ∥A∥F = √(∑i,j A²i,j)

n_matrix_A = tf.constant([[2, -1, 5], [0, 2, 1], [3, 1, 1]], name="matrix_A", dtype=tf.float32)

# Frobenius norm for matrix calculated by setting ord='fro'
frobenius_norm = tf.norm(n_matrix_A, ord='fro', axis=(0, 1))
print("Frobenius norm: {}".format(frobenius_norm))


The dot product of two vectors can be rewritten in terms of norms as:

x⊤y = ∥x∥2 ∥y∥2 cosθ

where θ is the angle between x and y.

# for x = (0, 2) and y = (2, 2), the angle theta between them is 45 degrees
n_vector_x = tf.constant([[0], [2]], dtype=tf.float32, name="vectorX")
n_vector_y = tf.constant([[2], [2]], dtype=tf.float32, name="vectorY")

# Since pi is irrational, we won't get an exact value, so we round our final value
prod_RHS = tf.round(tf.multiply(tf.multiply(tf.norm(n_vector_x), tf.norm(n_vector_y)), tf.cos(np.pi/4)))
prod_LHS = tf.tensordot(tf.transpose(n_vector_x), n_vector_y, axes=1, name="LHS")

predictor = tf.reduce_all(tf.equal(prod_RHS, prod_LHS))
def true_print(): print("""Dot Product can be rewritten in terms of norms, where \n
RHS: {} \nLHS: {}""".format(prod_RHS, prod_LHS))

def false_print(): print("""Dot Product can not be rewritten in terms of norms, where \n
RHS: {} \nLHS: {}""".format(prod_RHS, prod_LHS))

tf.cond(predictor, true_print, false_print)

origin=[0,0]
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.xlim(-2, 10)
plt.ylim(-1, 10)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(-1, 2, r'$\vec{x}$', size=18)
plt.text(2, 1.5, r'$\vec{y}$', size=18)
plt.quiver(*origin, n_vector_x, n_vector_y, color=['#FF9A13','#1190FF'], scale=8)
plt.show()


## Special Kinds of Matrices and Vectors

Diagonal matrices consist mostly of zeros and have nonzero entries only along the main diagonal. The identity matrix is an example of a diagonal matrix. We write diag(v) to denote a square diagonal matrix whose diagonal entries are given by the entries of the vector v. To compute diag(v)x, we only need to scale each element xi by vi. In other words: diag(v)x = v⊙x

# create vector v and x
sp_vector_v = tf.random.uniform([5], minval=0, maxval=10, dtype = tf.int32, seed = 0, name="vector_v")
sp_vector_x = tf.random.uniform([5], minval=0, maxval=10, dtype = tf.int32, seed = 0, name="vector_x")
print("Vector v: {} \nVector x: {}\n".format(sp_vector_v, sp_vector_x))

# RHS diagonal vector v dot diagonal vector x. The linalg.diag converts a vector to a diagonal matrix
sp_RHS = tf.tensordot(tf.linalg.diag(sp_vector_v), tf.linalg.diag(sp_vector_x), axes=1)

# LHS diag(v)x
sp_LHS = tf.multiply(tf.linalg.diag(sp_vector_v), sp_vector_x)

predictor = tf.reduce_all(tf.equal(sp_RHS, sp_LHS))
def true_print(): print("Diagonal of v times x: \n{} \n\nis equal to vector v dot vector x: \n{}".format(sp_RHS, sp_LHS))
def false_print(): print("Diagonal of v times x: \n{} \n\nis NOT equal to vector v dot vector x: \n{}".format(sp_RHS, sp_LHS))

tf.cond(predictor, true_print, false_print)


Inverting a square diagonal matrix is also efficient. The inverse exists only if every diagonal entry is nonzero, and in that case: diag(v)−1 = diag([1/v1, ⋯, 1/vn]⊤)

try:
    # try creating a vector_v with zero elements and see what happens
    d_vector_v = tf.random.uniform([5], minval=1, maxval=10, dtype=tf.float32, seed=0, name="vector_v")
    print("Vector v: {}".format(d_vector_v))

    # linalg.diag converts a vector to a diagonal matrix
    diag_RHS = tf.linalg.diag(1. / d_vector_v)

    # we convert the vector to a diagonal matrix and take its inverse
    inv_LHS = tf.linalg.inv(tf.linalg.diag(d_vector_v))

    predictor = tf.reduce_all(tf.equal(diag_RHS, inv_LHS))
    def true_print(): print("The inverse of LHS: \n{} \n\nMatches the inverse of RHS: \n{}".format(diag_RHS, inv_LHS))
    def false_print(): print("The inverse of LHS: \n{} \n\nDoes not match the inverse of RHS: \n{}".format(diag_RHS, inv_LHS))
    tf.cond(predictor, true_print, false_print)

except:
    print("The inverse exists only if every diagonal entry is nonzero, your vector looks like: \n{}".format(d_vector_v))


Not all diagonal matrices need be square. It is possible to construct a rectangular diagonal matrix. Nonsquare diagonal matrices do not have inverses, but we can still multiply by them cheaply. For a nonsquare diagonal matrix D, the product Dx will involve scaling each element of x and either concatenating some zeros to the result, if D is taller than it is wide, or discarding some of the last elements of the vector, if D is wider than it is tall.
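A minimal sketch of both rectangular cases (the matrices are hand-built for illustration, since tf.linalg.diag produces square matrices from a vector):

```python
import tensorflow as tf

# a tall rectangular diagonal matrix: Dx scales x and concatenates a zero
D_tall = tf.constant([[2., 0.], [0., 3.], [0., 0.]])
x = tf.constant([[4.], [5.]])
print("Tall D times x: \n{}\n".format(tf.matmul(D_tall, x)))   # [[8.], [15.], [0.]]

# a wide rectangular diagonal matrix: Dy scales y and discards its last element
D_wide = tf.constant([[2., 0., 0.], [0., 3., 0.]])
y = tf.constant([[4.], [5.], [6.]])
print("Wide D times y: \n{}".format(tf.matmul(D_wide, y)))     # [[8.], [15.]]
```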

A symmetric matrix is any matrix that is equal to its own transpose: A=A⊤

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments. For example, if A is a matrix of distance measurements, with Ai,j giving the distance from point i to point j, then Ai,j=Aj,i because distance functions are symmetric.

# create a symmetric matrix
sp_matrix_A = tf.constant([[0, 1, 3], [1, 2, 4], [3, 4, 5]], name="matrix_a", dtype=tf.int32)

# get the transpose of matrix A
sp_transpose_a = tf.transpose(sp_matrix_A)

predictor = tf.reduce_all(tf.equal(sp_matrix_A, sp_transpose_a))
def true_print(): print("Matrix A: \n{} \n\nMatches the the transpose of Matrix A: \n{}".format(sp_matrix_A, sp_transpose_a))
def false_print(): print("Matrix A: \n{} \n\nDoes Not match the the transpose of Matrix A: \n{}".format(sp_matrix_A, sp_transpose_a))

tf.cond(predictor, true_print, false_print)


A vector x and a vector y are orthogonal to each other if x⊤y=0. If both vectors have nonzero norm, this means that they are at a 90 degree angle to each other.

# Lets create two vectors
ortho_vector_x = tf.constant([2, 2], dtype=tf.float32, name="vector_x")
ortho_vector_y = tf.constant([2, -2], dtype=tf.float32, name="vector_y")
print("Vector x: {} \nVector y: {}\n".format(ortho_vector_x, ortho_vector_y))

# lets verify if x transpose dot y is zero
ortho_LHS = tf.tensordot(tf.transpose(ortho_vector_x), ortho_vector_y, axes=1)
print("X transpose times y = {}\n".format(ortho_LHS))

# let's see what their norms are
ortho_norm_x = tf.norm(ortho_vector_x)
ortho_norm_y = tf.norm(ortho_vector_y)
print("Norm x: {} \nNorm y: {}\n".format(ortho_norm_x, ortho_norm_y))

# If they have nonzero norm, let's see what angle they are to each other
if tf.logical_and(ortho_norm_x > 0, ortho_norm_y > 0):
    # from the equation cos theta = (x dot y) / (norm of x times norm of y)
    cosine_angle = tf.divide(tf.tensordot(ortho_vector_x, ortho_vector_y, axes=1), tf.multiply(ortho_norm_x, ortho_norm_y))
    print("Angle between vector x and vector y is: {} degrees".format(tf.acos(cosine_angle) * 180 / np.pi))

origin=[0,0]
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.xlim(-1, 10)
plt.ylim(-10, 10)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(1, 4, r'$\vec{x}$', size=18)
plt.text(1, -6, r'$\vec{y}$', size=18)
plt.quiver(*origin, ortho_vector_x, ortho_vector_y, color=['#FF9A13','#1190FF'], scale=8)
plt.show()


A unit vector is a vector with a unit norm: ∥x∥2=1.

If two vectors are not only orthogonal but also have unit norm, we call them orthonormal.
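As a small sketch of both definitions (the vectors below are hand-picked for illustration):

```python
import tensorflow as tf

# two vectors that are orthogonal AND have unit norm, hence orthonormal
unit_u = tf.constant([0.6, 0.8])
unit_w = tf.constant([-0.8, 0.6])
print("Norm of u: {} \nNorm of w: {}".format(tf.norm(unit_u), tf.norm(unit_w)))
print("u dot w: {}".format(tf.tensordot(unit_u, unit_w, axes=1)))  # zero, so orthogonal

# any nonzero vector can be turned into a unit vector by dividing by its norm
v = tf.constant([3., 4.])
unit_v = v / tf.norm(v)
print("Unit vector from v: {}".format(unit_v))
```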

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

A⊤A = AA⊤ = I

which implies A−1 = A⊤, so orthogonal matrices are of interest because their inverse is very cheap to compute.

# Lets use sine and cosine to create orthogonal matrix
ortho_matrix_A = tf.Variable([[tf.cos(.5), -tf.sin(.5)], [tf.sin(.5), tf.cos(.5)]], name="matrixA")
print("Matrix A: \n{}\n".format(ortho_matrix_A))

# extract columns from the matrix to verify if they are orthogonal
col_0 = tf.reshape(ortho_matrix_A[:, 0], [2, 1])
col_1 = tf.reshape(ortho_matrix_A[:, 1], [2, 1])
row_0 = tf.reshape(ortho_matrix_A[0, :], [2, 1])
row_1 = tf.reshape(ortho_matrix_A[1, :], [2, 1])

# Verifying if the columns are orthogonal
ortho_column = tf.tensordot(tf.transpose(col_0), col_1, axes=1)
print("Columns are orthogonal: {}".format(ortho_column))
plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
origin = [0, 0]
plt.xlim(-2, 2)
plt.ylim(-3, 4)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)
plt.text(0, 2, r'$\vec{col_0}$', size=18)
plt.text(0.5, -2, r'$\vec{col_1}$', size=18)
plt.quiver(*origin, col_0, col_1, color=['#FF9A13','#FF9A13'], scale=3)

# Verifying if the rows are orthogonal
ortho_row = tf.tensordot(tf.transpose(row_0), row_1, axes=1)
print("Rows are orthogonal: {}\n".format(ortho_row))
plt.text(-1, 2, r'$\vec{row_0}$', size=18)
plt.text(1, 0.5, r'$\vec{row_1}$', size=18)
plt.quiver(*origin, row_0, row_1, color=['r','r'], scale=3)
plt.show()

# inverse of matrix A
ortho_inverse_A = tf.linalg.inv(ortho_matrix_A)

# Transpose of matrix A
ortho_transpose_A = tf.transpose(ortho_matrix_A)

predictor = tf.reduce_all(tf.equal(ortho_inverse_A, ortho_transpose_A))
def true_print(): print("Inverse of Matrix A: \n{} \n\nEquals the transpose of Matrix A: \n{}".format(ortho_inverse_A, ortho_transpose_A))
def false_print(): print("Inverse of Matrix A: \n{} \n\nDoes not equal the transpose of Matrix A: \n{}".format(ortho_inverse_A, ortho_transpose_A))

tf.cond(predictor, true_print, false_print)


### Eigendecomposition

We can represent a number, for example 12, as 12 = 2 x 2 x 3. The representation will change depending on whether we write it in base ten or in binary, but the above factorization will always be true, and from it we can conclude that 12 is not divisible by 5 and that any integer multiple of 12 will be divisible by 3.

Similarly, we can also decompose matrices in ways that show us information about their functional properties that is not obvious from the representation of the matrix as an array of elements. One of the most widely used kinds of matrix decomposition is called eigen decomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues.

An eigenvector of a square matrix A is a nonzero vector v such that multiplication by A alters only the scale of v, in short this is a special vector that doesn’t change the direction of the matrix when applied to it :

The scalar λ is known as the eigenvalue corresponding to this eigenvector.

# Let's see how we can compute the eigen vectors and values from a matrix
e_matrix_A = tf.random.uniform([2, 2], minval=3, maxval=10, dtype=tf.float32, name="matrixA")
print("Matrix A: \n{}\n\n".format(e_matrix_A))

# Calculating the eigen values and vectors using tf.linalg.eigh; if you only want the values you can use eigvalsh
# Note: eigh assumes a self-adjoint (symmetric) matrix and only reads its lower-triangular part
eigen_values_A, eigen_vectors_A = tf.linalg.eigh(e_matrix_A)
print("Eigen Vectors: \n{} \n\nEigen Values: \n{}\n".format(eigen_vectors_A, eigen_values_A))

# Now let's plot our Matrix with the Eigen vector and see how it looks
Av = tf.tensordot(e_matrix_A, eigen_vectors_A, axes=1)
vector_plot([tf.reshape(Av, [-1]), tf.reshape(eigen_vectors_A, [-1])], 10, 10)


If v is an eigenvector of A, then so is any rescaled vector sv for s∈R,s≠0.

# Let us multiply our eigen vector by a scalar s = 5 and plot the above graph again to see the rescaling
sv = tf.multiply(5, eigen_vectors_A)
vector_plot([tf.reshape(Av, [-1]), tf.reshape(sv, [-1])], 10, 10)


Suppose that a matrix A has n linearly independent eigenvectors v(1),⋯,v(n) with corresponding eigenvalues λ(1),⋯,λ(n). We may concatenate all the eigenvectors to form a matrix V with one eigenvector per column: V=[v(1),⋯,v(n)]. Likewise, we can concatenate the eigenvalues to form a vector λ=[λ(1),⋯,λ(n)]⊤. The eigendecomposition of A is then given by

A=V diag(λ)V−1

# Creating a matrix A to find its decomposition
eig_matrix_A = tf.constant([[5, 1], [3, 3]], dtype=tf.float32)
new_eigen_values_A, new_eigen_vectors_A = tf.linalg.eigh(eig_matrix_A)

print("Eigen Values of Matrix A: {} \n\nEigen Vector of Matrix A: \n{}\n".format(new_eigen_values_A, new_eigen_vectors_A))

# calculate diag(lambda)
diag_lambda = tf.linalg.diag(new_eigen_values_A)
print("Diagonal of Lambda: \n{}\n".format(diag_lambda))

# Find the eigendecomposition of matrix A
decomp_A = tf.tensordot(tf.tensordot(new_eigen_vectors_A, diag_lambda, axes=1), tf.linalg.inv(new_eigen_vectors_A), axes=1)

print("The decomposition Matrix A: \n{}".format(decomp_A))


Not every matrix can be decomposed into eigenvalues and eigenvectors. In some cases, the decomposition exists but involves complex rather than real numbers.
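A 90-degree rotation matrix is the classic example: no nonzero real vector keeps its direction under it, so its eigenvalues are complex. A quick sketch of this with NumPy (already imported above), since `np.linalg.eig` handles the general, non-symmetric case:

```python
import numpy as np

# a 90-degree rotation matrix: every nonzero real vector changes
# direction under it, so no real eigenvectors exist
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])

eigvals, eigvecs = np.linalg.eig(rotation)
print("Eigenvalues:", eigvals)            # a purely imaginary pair
print("Complex?", np.iscomplexobj(eigvals))
```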

In this book, we usually need to decompose only a specific class of matrices that have a simple decomposition. Specifically, every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

A=QΛQ⊤

where Q is an orthogonal matrix composed of eigenvectors of A and Λ is a diagonal matrix. The eigenvalue Λi,i is associated with the eigenvector in column i of Q, denoted as Q:,i. Because Q is an orthogonal matrix, we can think of A as scaling space by Λi,i in the direction of Q:,i.

# In section 2.6 we manually created a matrix to verify if it is symmetric, but what if we don't know the exact values and want to create a random symmetric matrix
new_matrix_A = tf.Variable(tf.random.uniform([2,2], minval=1, maxval=10, dtype=tf.float32))

# to create an upper triangular matrix from a square one
X_upper = tf.linalg.band_part(new_matrix_A, 0, -1)
sym_matrix_A = tf.multiply(0.5, (X_upper + tf.transpose(X_upper)))
print("Symmetric Matrix A: \n{}\n".format(sym_matrix_A))

# create orthogonal matrix Q from eigen vectors of A
eigen_values_Q, eigen_vectors_Q = tf.linalg.eigh(sym_matrix_A)
print("Matrix Q: \n{}\n".format(eigen_vectors_Q))

# putting eigen values in a diagonal matrix
new_diag_lambda = tf.linalg.diag(eigen_values_Q)
print("Matrix Lambda: \n{}\n".format(new_diag_lambda))

sym_RHS = tf.tensordot(tf.tensordot(eigen_vectors_Q, new_diag_lambda, axes=1), tf.transpose(eigen_vectors_Q), axes=1)

predictor = tf.reduce_all(tf.equal(tf.round(sym_RHS), tf.round(sym_matrix_A)))
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(sym_RHS, sym_matrix_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(sym_RHS, sym_matrix_A))

tf.cond(predictor, true_print, false_print)


The eigendecomposition of a matrix tells us many useful facts about the matrix. The matrix is singular if and only if any of the eigenvalues are zero. The eigendecomposition of a real symmetric matrix can also be used to optimize quadratic expressions of the form f(x)=x⊤Ax subject to ∥x∥2=1.

The above equation can be solved as follows: we know that if x is an eigenvector of A and λ is the corresponding eigenvalue, then Ax=λx, therefore f(x)=x⊤Ax=x⊤λx=λx⊤x. Since ∥x∥2=1 implies x⊤x=1, the equation boils down to f(x)=λ. Whenever x is equal to an eigenvector of A, f takes on the value of the corresponding eigenvalue; its maximum value within the constraint region is the maximum eigenvalue and its minimum value within the constraint region is the minimum eigenvalue.

A matrix whose eigenvalues are all positive is called positive definite. A matrix whose eigenvalues are all positive or zero valued is called positive semidefinite. Likewise, if all eigenvalues are negative, the matrix is negative definite, and if all eigenvalues are negative or zero valued, it is negative semidefinite. Positive semidefinite matrices are interesting because they guarantee that ∀x, x⊤Ax≥0. Positive definite matrices additionally guarantee that x⊤Ax=0⟹x=0.
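The claims about f(x)=x⊤Ax are easy to verify numerically. The sketch below (plain NumPy, independent of the notebook's variables) samples random unit vectors and checks that f never leaves the interval between the smallest and largest eigenvalue of a symmetric matrix:

```python
import numpy as np

np.random.seed(0)

# a random symmetric matrix: A = 0.5 * (B + B^T)
B = np.random.uniform(1, 10, size=(3, 3))
A = 0.5 * (B + B.T)

lam = np.linalg.eigvalsh(A)            # eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]

# sample unit vectors and evaluate f(x) = x^T A x
for _ in range(1000):
    x = np.random.randn(3)
    x /= np.linalg.norm(x)             # enforce the constraint ||x||_2 = 1
    f = x @ A @ x
    assert lam_min - 1e-9 <= f <= lam_max + 1e-9

# the lower bound is attained at the corresponding eigenvector
v_min = np.linalg.eigh(A)[1][:, 0]
print("f(v_min) =", v_min @ A @ v_min, "vs smallest eigenvalue", lam_min)
```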

### Singular Value Decomposition

The singular value decomposition (SVD) provides another way to factorize a matrix into singular vectors and singular values. The SVD enables us to discover some of the same kind of information as the eigendecomposition reveals, however, the SVD is more generally applicable. Every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition. SVD can be written as:

A=UDV⊤

Suppose A is an m x n matrix. Then U is defined to be an m x m rotation matrix, D an m x n scaling and projecting matrix, and V an n x n rotation matrix. Each of these matrices has a special structure: U and V are both orthogonal matrices (U⊤=U−1 and V⊤=V−1), and D is a diagonal matrix (not necessarily square).

The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors.

# mxn matrix A
svd_matrix_A = tf.constant([[2, 3], [4, 5], [6, 7]], dtype=tf.float32)
print("Matrix A: \n{}\n".format(svd_matrix_A))

# Using tf.linalg.svd to calculate the singular value decomposition, where d holds the singular values, u is Matrix U and v is Matrix V
# Note: tf.linalg.svd returns v itself (not its transpose), so A = U * diag(d) * V^T
d, u, v = tf.linalg.svd(svd_matrix_A, full_matrices=True, compute_uv=True)
print("Diagonal D: \n{} \n\nMatrix U: \n{} \n\nMatrix V: \n{}".format(d, u, v))

# Let's see if we can bring back the original matrix from the values we have

# mxm orthogonal matrix U
svd_matrix_U = tf.constant([[0.30449855, -0.86058956, 0.40824753], [0.54340035, -0.19506174, -0.81649673], [0.78230214, 0.47046405, 0.40824872]])
print("Orthogonal Matrix U: \n{}\n".format(svd_matrix_U))

# mxn diagonal matrix D
svd_matrix_D = tf.constant([[11.782492, 0], [0, 0.41578525], [0, 0]], dtype=tf.float32)
print("Diagonal Matrix D: \n{}\n".format(svd_matrix_D))

# nxn transpose of matrix V
svd_matrix_V_trans = tf.constant([[0.63453555, 0.7728936], [0.7728936, -0.63453555]], dtype=tf.float32)
print("Transpose Matrix V: \n{}\n".format(svd_matrix_V_trans))

# UDV(^T)
svd_RHS = tf.tensordot(tf.tensordot(svd_matrix_U, svd_matrix_D, axes=1), svd_matrix_V_trans, axes=1)

predictor = tf.reduce_all(tf.equal(tf.round(svd_RHS), svd_matrix_A))
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(tf.round(svd_RHS), svd_matrix_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(tf.round(svd_RHS), svd_matrix_A))

tf.cond(predictor, true_print, false_print)


Matrix A can be seen as a linear transformation. This transformation can be decomposed in to three sub-transformations:

1. Rotation,
2. Re-scaling and projecting,
3. Rotation.

These three steps correspond to the three matrices U,D and V

Let’s see how these transformations are taking place in order

# Let's define a unit square
svd_square = tf.constant([[0, 0, 1, 1],[0, 1, 1, 0]], dtype=tf.float32)

# a new 2x2 matrix
svd_new_matrix = tf.constant([[1, 1.5], [0, 1]])

# SVD for the new matrix
new_d, new_u, new_v = tf.linalg.svd(svd_new_matrix, full_matrices=True, compute_uv=True)

# let's change d into a diagonal matrix
new_d_marix = tf.linalg.diag(new_d)

# Rotation: V^T for a unit square
plot_transform(svd_square, tf.tensordot(tf.transpose(new_v), svd_square, axes=1), "$Square$", "$V^T \cdot Square$", "Rotation", axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()

# Scaling and Projecting: DV^(T)
plot_transform(tf.tensordot(tf.transpose(new_v), svd_square, axes=1), tf.tensordot(new_d_marix, tf.tensordot(tf.transpose(new_v), svd_square, axes=1), axes=1), "$V^T \cdot Square$", "$D \cdot V^T \cdot Square$", "Scaling and Projecting", axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()

# Second Rotation: UDV^(T)
trans_1 = tf.tensordot(tf.tensordot(new_d_marix, tf.transpose(new_v), axes=1), svd_square, axes=1)
trans_2 = tf.tensordot(tf.tensordot(tf.tensordot(new_u, new_d_marix, axes=1), tf.transpose(new_v), axes=1), svd_square, axes=1)
plot_transform(trans_1, trans_2, "$D \cdot V^T \cdot Square$", "$U \cdot D \cdot V^T \cdot Square$", "Second Rotation", color=['#1190FF', '#FF9A13'], axis=[-0.5, 3.5 , -1.5, 1.5])
plt.show()


The above sub transformations can be found for each matrix as follows:

• U corresponds to the eigenvectors of AA⊤
• V corresponds to the eigenvectors of A⊤A
• D corresponds to the square roots of the eigenvalues of AA⊤ (or A⊤A, whose nonzero eigenvalues are the same).

As an exercise try proving this is the case.
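As a head start on that exercise, here is a numerical check (a NumPy sketch with a hypothetical random matrix) that the squared singular values of A match the eigenvalues of A⊤A:

```python
import numpy as np

np.random.seed(1)
A = np.random.uniform(0, 10, size=(3, 2))

# singular values of A
singular_values = np.linalg.svd(A, compute_uv=False)

# eigenvalues of the symmetric matrix A^T A
eig_AtA = np.linalg.eigvalsh(A.T @ A)

# the squared singular values equal the eigenvalues of A^T A
print("singular values squared:", np.sort(singular_values ** 2))
print("eigenvalues of A^T A:   ", np.sort(eig_AtA))
```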

Perhaps the most useful feature of the SVD is that we can use it to partially generalize matrix inversion to nonsquare matrices, as we will see in the next section.

## The Moore-Penrose Pseudoinverse

Matrix inversion is not defined for matrices that are not square. Suppose we want to make a left-inverse B of a matrix A so that we can solve a linear equation Ax=y by left-multiplying each side to obtain x=By.

Depending on the structure of the problem, it may not be possible to design a unique mapping from A to B.

The Moore-Penrose pseudoinverse enables us to make some headway in these cases. The pseudoinverse of A is defined as a matrix:

A+=limα→0(A⊤A+αI)−1A⊤

Practical algorithms for computing the pseudoinverse are based not on this definition, but rather on the formula:

A+=VD+U⊤

where U, D and V are the singular value decomposition of A, and the pseudoinverse D+ of a diagonal matrix D is obtained by taking the reciprocal of its nonzero elements and then taking the transpose of the resulting matrix.

# Matrix A
mpp_matrix_A = tf.random.uniform([3, 2], minval=1, maxval=10, dtype=tf.float32)
print("Matrix A: \n{}\n".format(mpp_matrix_A))

# Singular Value decomposition of matrix A
mpp_d, mpp_u, mpp_v = tf.linalg.svd(mpp_matrix_A, full_matrices=True, compute_uv=True)
print("Matrix U: \n{} \n\nMatrix V: \n{}\n".format(mpp_u, mpp_v))

# pseudo inverse of matrix D
d_plus = tf.concat([tf.transpose(tf.linalg.diag(tf.math.reciprocal(mpp_d))), tf.zeros([2, 1])], axis=1)
print("D plus: \n{}\n".format(d_plus))

# moore-penrose pseudoinverse of matrix A
matrix_A_star = tf.matmul(tf.matmul(mpp_v, d_plus), mpp_u, transpose_b=True)

print("The Moore-Penrose pseudoinverse of Matrix A: \n{}".format(matrix_A_star))


When A has more columns than rows, then solving a linear equation using the pseudoinverse provides one of the many possible solutions. Specifically, it provides the solution x=A+y with minimal Euclidean norm ∥x∥2 among all possible solutions.

mpp_vector_y = tf.constant([[2], [3], [4]], dtype=tf.float32)
print("Vector y: \n{}\n".format(mpp_vector_y))

mpp_vector_x = tf.matmul(matrix_A_star, mpp_vector_y)
print("Vector x: \n{}".format(mpp_vector_x))


When A has more rows than columns, it is possible for there to be no solution. In this case, using the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of the Euclidean norm ∥Ax−y∥2.
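NumPy ships its own pseudoinverse, np.linalg.pinv, which internally uses the same SVD-based formula, so we can cross-check the least-squares property for a tall matrix. This is a standalone sketch with hypothetical data, not part of the notebook's TensorFlow code:

```python
import numpy as np

np.random.seed(2)
A = np.random.uniform(1, 10, size=(3, 2))   # more rows than columns
y = np.array([2.0, 3.0, 4.0])

# pseudoinverse via NumPy's SVD-based implementation
A_plus = np.linalg.pinv(A)
x = A_plus @ y

# for a tall full-rank A, A+ y is the least-squares solution,
# i.e. it minimizes ||Ax - y||_2
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print("pinv solution: ", x)
print("lstsq solution:", x_lstsq)
```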

### The Trace Operator

The trace operator gives the sum of all the diagonal entries of a matrix:

Tr(A)=∑iAi,i

# random 3x3 matrix A
to_matrix_A = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)

# trace of matrix A using tf.linalg.trace
trace_matrix_A = tf.linalg.trace(to_matrix_A)

print("Trace of Matrix A: \n{} \nis: {}".format(to_matrix_A, trace_matrix_A))


The trace operator is useful for a variety of reasons. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator. For example, the trace operator provides an alternative way of writing the Frobenius norm of a matrix:

∥A∥F=√(Tr(AA⊤))

# Frobenius Norm of A
frobenius_A = tf.norm(to_matrix_A)

# sqrt(Tr(A times A^T))
trace_rhs = tf.sqrt(tf.linalg.trace(tf.matmul(to_matrix_A, to_matrix_A, transpose_b=True)))

predictor = tf.equal(tf.round(frobenius_A), tf.round(trace_rhs))
def true_print(): print("It WORKS. \nLHS: {} \nRHS: {}".format(frobenius_A, trace_rhs))
def false_print(): print("Condition FAILED. \nLHS: {} \nRHS: {}".format(frobenius_A, trace_rhs))

tf.cond(predictor, true_print, false_print)


Writing an expression in terms of the trace operator opens up opportunities to manipulate the expression using many useful identities. For example, the trace operator is invariant to the transpose operator:

Tr(A)=Tr(A⊤)

# Transpose of Matrix A
trans_matrix_A = tf.transpose(to_matrix_A)

#Trace of the transpose Matrix A
trace_trans_A = tf.linalg.trace(trans_matrix_A)

predictor = tf.equal(trace_matrix_A, trace_trans_A)
def true_print(): print("It WORKS. \nRHS: {} \nLHS: {}".format(trace_matrix_A, trace_trans_A))
def false_print(): print("Condition FAILED. \nRHS: {} \nLHS: {}".format(trace_matrix_A, trace_trans_A))

tf.cond(predictor, true_print, false_print)


The trace of a square matrix composed of many factors is also invariant to moving the last factor into the first position, if the shapes of the corresponding matrices allow the resulting product to be defined:

Tr(ABC)=Tr(CAB)=Tr(BCA)

# random 3x3 matrix B and matrix C
to_matrix_B = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)
to_matrix_C = tf.random.uniform([3, 3], minval=0, maxval=10, dtype=tf.float32)

# ABC
abc = tf.tensordot((tf.tensordot(to_matrix_A, to_matrix_B, axes=1)), to_matrix_C, axes=1)

# CAB
cab = tf.tensordot((tf.tensordot(to_matrix_C, to_matrix_A, axes=1)), to_matrix_B, axes=1)

# BCA
bca = tf.tensordot((tf.tensordot(to_matrix_B, to_matrix_C, axes=1)), to_matrix_A, axes=1)

# trace of matrix ABC, CAB and matrix BCA
trace_matrix_abc = tf.linalg.trace(abc)
trace_matrix_cab = tf.linalg.trace(cab)
trace_matrix_bca = tf.linalg.trace(bca)

predictor = tf.logical_and(tf.equal(tf.round(trace_matrix_abc), tf.round(trace_matrix_cab)), tf.equal(tf.round(trace_matrix_cab), tf.round(trace_matrix_bca)))
def true_print(): print("It WORKS. \nABC: {} \nCAB: {} \nBCA: {}".format(trace_matrix_abc, trace_matrix_cab, trace_matrix_bca))
def false_print(): print("Condition FAILED. \nABC: {} \nCAB: {} \nBCA: {}".format(trace_matrix_abc, trace_matrix_cab, trace_matrix_bca))

tf.cond(predictor, true_print, false_print)


This invariance to cyclic permutation holds even if the resulting product has a different shape. For example, for A∈Rm×n and B∈Rn×m, we have Tr(AB)=Tr(BA) even though AB∈Rm×m and BA∈Rn×n.

# mxn matrix A
to_new_matrix_A = tf.random.uniform([3, 2], minval=0, maxval=10, dtype=tf.float32)
print(" 3x2 Matrix A: \n{}\n".format(to_new_matrix_A))

# nxm matrix B
to_new_matrix_B = tf.random.uniform([2, 3], minval=0, maxval=10, dtype=tf.float32)
print(" 2x3 Matrix B: \n{}\n".format(to_new_matrix_B))

# trace of matrix AB and BA
ab = tf.linalg.trace(tf.matmul(to_new_matrix_A, to_new_matrix_B))
ba = tf.linalg.trace(tf.matmul(to_new_matrix_B, to_new_matrix_A))

predictor = tf.equal(tf.round(ab), tf.round(ba))
def true_print(): print("It WORKS. \nAB: {} \nBA: {}".format(ab, ba))
def false_print(): print("Condition FAILED. \nAB: {} \nBA: {}".format(ab, ba))

tf.cond(predictor, true_print, false_print)


### The Determinant

The determinant of a square matrix, denoted det(A), is a function that maps matrices to real scalars. You can calculate the determinant of a 2 x 2 matrix A = [[a, b], [c, d]] as:

det(A) = ad−bc

For a 3 x 3 matrix A = [[a, b, c], [d, e, f], [g, h, i]]:

det(A) = a(ei−fh) − b(di−fg) + c(dh−eg)

# calculate det of a matrix
det_matrix_A = tf.constant([[3,1], [0,3]], dtype=tf.float32)
det_A = tf.linalg.det(det_matrix_A)
print("Matrix A: \n{} \nDeterminant of Matrix A: \n{}".format(det_matrix_A, det_A))

vector_plot(det_matrix_A, 5, 5)
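We can also check the 2 x 2 cofactor formula by hand against a library routine (a NumPy sketch using the same matrix values as above):

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [0.0, 3.0]])

# 2x2 formula: det([[a, b], [c, d]]) = a*d - b*c
manual_det = M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]

print("Manual det:   ", manual_det)            # 9.0
print("np.linalg.det:", np.linalg.det(M))
```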


The determinant is equal to the product of all the eigenvalues of the matrix.

# Let's find the eigen values of matrix A
# Note: eigvalsh assumes a symmetric matrix and only reads its lower-triangular part
d_eigen_values = tf.linalg.eigvalsh(det_matrix_A)
eigvalsh_product = tf.reduce_prod(d_eigen_values)

# lets validate if the product of the eigen values is equal to the determinant
predictor = tf.equal(eigvalsh_product, det_A)
def true_print(): print("It WORKS. \nRHS: \n{} \n\nLHS: \n{}".format(eigvalsh_product, det_A))
def false_print(): print("Condition FAILED. \nRHS: \n{} \n\nLHS: \n{}".format(eigvalsh_product, det_A))

tf.cond(predictor, true_print, false_print)


The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space. If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all its volume. If the determinant is 1, then the transformation preserves volume.
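We can make the volume interpretation concrete: the unit square spanned by the standard basis maps to the parallelogram spanned by the columns of the matrix, and that parallelogram's area equals |det(A)|. A small NumPy sketch:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 3.0]])

# the unit square maps to the parallelogram spanned by the columns of A;
# the area of the parallelogram spanned by u and v is |u_x*v_y - u_y*v_x|
u, v = A[:, 0], A[:, 1]
parallelogram_area = abs(u[0] * v[1] - u[1] * v[0])

print("Area of transformed unit square:", parallelogram_area)
print("|det(A)|:                       ", abs(np.linalg.det(A)))
```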

# In the following plot, you can see how the vectors are expanded

vector_plot(tf.multiply(tf.abs(det_A), det_matrix_A), 50, 50)


## Example: Principal Components Analysis

PCA is a complexity reduction technique that tries to reduce a set of variables down to a smaller set of components that represent most of the information in the variables. For a collection of data points, this can be thought of as applying lossy compression, meaning storing the points in a way that requires less memory by trading away some precision. At a conceptual level, PCA works by identifying sets of variables that share variance, and creating a component to represent that variance.

Earlier, when we were computing the transpose or the matrix inverse, we relied on Tensorflow's built-in functions, but for PCA there is no such function, except for one in TensorFlow Transform (tft).

There are multiple ways you can implement a PCA in Tensorflow but since this algorithm is such an important one in the machine learning world, we will take the long route.

The reason for placing PCA under Linear Algebra is to show that PCA can be implemented using the theorems we studied in this chapter.

# To start working with PCA, let's start by creating a 2D data set

x_data = tf.multiply(5, tf.random.uniform([100], minval=0, maxval=100, dtype = tf.float32, seed = 0))
y_data = tf.multiply(2, x_data) + 1 + tf.random.uniform([100], minval=0, maxval=100, dtype = tf.float32, seed = 0)

X = tf.stack([x_data, y_data], axis=1)

plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'red'})
plt.plot(X[:,0], X[:,1], '+', color='b')
plt.grid()


We start by centering the data, i.e. subtracting the mean. Even though the data we created is already on a single scale, it is always good practice to standardize your data, because most of the time the data you will be working with will be on different scales.

def normalize(data):
    # creates a copy of data
    X = tf.identity(data)
    # subtracts the column-wise mean
    X -= tf.reduce_mean(data, axis=0)
    return X

normalized_data = normalize(X)
plt.plot(normalized_data[:,0], normalized_data[:,1], '+', color='b')
plt.grid()


Recall that PCA can be thought of as applying lossy compression to a collection of data points x. The way we minimize the loss of precision is by finding an encoding function f(x)=c, producing a code vector c, together with a decoding function g(c)≈x.

PCA is defined by our choice of this decoding function. Specifically, to make the decoder very simple, we choose to use matrix multiplication to map c back and define g(c)=Dc. Our goal is to minimize the L2 distance between the input point x and its reconstruction, which boils down to the encoding function c=D⊤x.

Finally, to reconstruct the data we use the same matrix D to decode all the points, and to solve this optimization problem we use eigendecomposition. Please note that the following equation is the final result of a number of matrix transformations. I don't provide the derivation because the goal is to focus on the mathematical implementation rather than the derivation itself.

To find D we can calculate the eigenvectors of X⊤X.

# Finding the Eigen values and vectors for the data
eigen_values, eigen_vectors = tf.linalg.eigh(tf.tensordot(tf.transpose(normalized_data), normalized_data, axes=1))

print("Eigen Vectors: \n{} \nEigen Values: \n{}".format(eigen_vectors, eigen_values))


The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude.

Now, let's use these eigenvectors to rotate our data. The goal of the rotation is to end up with a new coordinate system where the data is uncorrelated, and where the basis axes gather all the variance, thereby allowing us to reduce the dimension.

Recall our encoding function c=D⊤x , where D is the matrix containing the eigenvectors that we have calculated before.

X_new = tf.tensordot(tf.transpose(eigen_vectors), tf.transpose(normalized_data), axes=1)

plt.plot(X_new[0, :], X_new[1, :], '+', color='b')
plt.xlim(-500, 500)
plt.ylim(-700, 700)
plt.grid()
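Because we kept both components, this rotation is lossless: decoding with x=Dc recovers the centered data exactly. Here is a standalone NumPy sketch of that encode/decode round trip on a small synthetic dataset (hypothetical data, independent of the variables above):

```python
import numpy as np

np.random.seed(0)

# small centered 2D dataset with correlated columns
X = np.random.randn(100, 2) @ np.array([[2.0, 1.0],
                                        [0.0, 1.0]])
X -= X.mean(axis=0)

# principal directions: eigenvectors of X^T X (columns of D)
eigvals, D = np.linalg.eigh(X.T @ X)

# encode every point: c = D^T x, then decode: x_hat = D c
C = X @ D          # rows are the code vectors c
X_hat = C @ D.T    # reconstruction

# D is orthogonal, so keeping all components loses nothing
print("Max reconstruction error:", np.abs(X - X_hat).max())
```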


That is the transformed data and that’s it folks for our chapter on Linear Algebra 😉.

## Congratulations

You have successfully completed Chapter 2 Linear algebra of Deep Learning with Tensorflow 2.0. To recap, we went through the following concepts:

• Scalars, Vectors, Matrices and Tensors
• Multiplying Matrices and Vectors
• Identity and Inverse Matrices
• Linear Dependence and Span
• Norms
• Special Kinds of Matrices and Vectors
• Eigendecomposition
• Singular Value Decomposition
• The Moore-Penrose Pseudoinverse
• The Trace Operator
• The Determinant
• Example: Principal Components Analysis

We covered a lot of content in one notebook. As I mentioned in the beginning, this is not meant to be an absolute beginner's or a comprehensive chapter on Linear Algebra; our focus is Deep Learning with Tensorflow, so we only went through the material needed to understand Deep Learning.

🔥 You can access the Code for this notebook in GitHub or launch an executable version of this notebook using Google Colab. 🔥

If you liked this post, share it with all of your programming buddies!

#deep-learning #tensorflow #python #data-science

## Buddha Community

1645213813

Hi,

This is a Python tutorial that walks through, step by step, to detect objects in images and real time video.

The link for the video : https://youtu.be/40_NC2Ahs_8

I also shared the Python code in the video description .

Enjoy

Eran

#Python #openCV #TensorFlow

1653475560

## A Pure PHP Implementation Of The MessagePack Serialization Format

msgpack.php

A pure PHP implementation of the MessagePack serialization format.

## Installation

The recommended way to install the library is through Composer:

composer require rybakit/msgpack


## Usage

### Packing

To pack values you can either use an instance of a Packer:

$packer = new Packer();$packed = $packer->pack($value);


or call a static method on the MessagePack class:

$packed = MessagePack::pack($value);


In the examples above, the method pack automatically packs a value depending on its type. However, not all PHP types can be uniquely translated to MessagePack types. For example, the MessagePack format defines map and array types, which are represented by a single array type in PHP. By default, the packer will pack a PHP array as a MessagePack array if it has sequential numeric keys, starting from 0 and as a MessagePack map otherwise:

$mpArr1 =$packer->pack([1, 2]);               // MP array [1, 2]
$mpArr2 =$packer->pack([0 => 1, 1 => 2]);     // MP array [1, 2]
$mpMap1 =$packer->pack([0 => 1, 2 => 3]);     // MP map {0: 1, 2: 3}
$mpMap2 =$packer->pack([1 => 2, 2 => 3]);     // MP map {1: 2, 2: 3}
$mpMap3 =$packer->pack(['a' => 1, 'b' => 2]); // MP map {a: 1, b: 2}


However, sometimes you need to pack a sequential array as a MessagePack map. To do this, use the packMap method:

$mpMap =$packer->packMap([1, 2]); // {0: 1, 1: 2}


Here is a list of type-specific packing methods:

$packer->packNil(); // MP nil$packer->packBool(true);      // MP bool
$packer->packInt(42); // MP int$packer->packFloat(M_PI);     // MP float (32 or 64)
$packer->packFloat32(M_PI); // MP float 32$packer->packFloat64(M_PI);   // MP float 64
$packer->packStr('foo'); // MP str$packer->packBin("\x80");     // MP bin
$packer->packArray([1, 2]); // MP array$packer->packMap(['a' => 1]); // MP map
$packer->packExt(1, "\xaa"); // MP ext  Check the "Custom types" section below on how to pack custom types. #### Packing options The Packer object supports a number of bitmask-based options for fine-tuning the packing process (defaults are in bold): The type detection mode (DETECT_STR_BIN/DETECT_ARR_MAP) adds some overhead which can be noticed when you pack large (16- and 32-bit) arrays or strings. However, if you know the value type in advance (for example, you only work with UTF-8 strings or/and associative arrays), you can eliminate this overhead by forcing the packer to use the appropriate type, which will save it from running the auto-detection routine. Another option is to explicitly specify the value type. The library provides 2 auxiliary classes for this, Map and Bin. Check the "Custom types" section below for details. Examples: // detect str/bin type and pack PHP 64-bit floats (doubles) to MP 32-bit floats$packer = new Packer(PackOptions::DETECT_STR_BIN | PackOptions::FORCE_FLOAT32);

// these will throw MessagePack\Exception\InvalidOptionException
$packer = new Packer(PackOptions::FORCE_STR | PackOptions::FORCE_BIN);$packer = new Packer(PackOptions::FORCE_FLOAT32 | PackOptions::FORCE_FLOAT64);


### Unpacking

To unpack data you can either use an instance of a BufferUnpacker:

$unpacker = new BufferUnpacker();$unpacker->reset($packed);$value = $unpacker->unpack();  or call a static method on the MessagePack class: $value = MessagePack::unpack($packed);  If the packed data is received in chunks (e.g. when reading from a stream), use the tryUnpack method, which attempts to unpack data and returns an array of unpacked messages (if any) instead of throwing an InsufficientDataException: while ($chunk = ...) {
$unpacker->append($chunk);
if ($messages =$unpacker->tryUnpack()) {
return $messages; } }  If you want to unpack from a specific position in a buffer, use seek: $unpacker->seek(42); // set position equal to 42 bytes
$unpacker->seek(-8); // set position to 8 bytes before the end of the buffer  To skip bytes from the current position, use skip: $unpacker->skip(10); // set position to 10 bytes ahead of the current position


To get the number of remaining (unread) bytes in the buffer:

$unreadBytesCount =$unpacker->getRemainingCount();


To check whether the buffer has unread data:

$hasUnreadBytes =$unpacker->hasRemaining();


If needed, you can remove already read data from the buffer by calling:

$releasedBytesCount =$unpacker->release();


With the read method you can read raw (packed) data:

$packedData =$unpacker->read(2); // read 2 bytes


Besides the above methods BufferUnpacker provides type-specific unpacking methods, namely:

$unpacker->unpackNil(); // PHP null$unpacker->unpackBool();  // PHP bool
$unpacker->unpackInt(); // PHP int$unpacker->unpackFloat(); // PHP float
$unpacker->unpackStr(); // PHP UTF-8 string$unpacker->unpackBin();   // PHP binary string
$unpacker->unpackArray(); // PHP sequential array$unpacker->unpackMap();   // PHP associative array
$unpacker->unpackExt(); // PHP MessagePack\Type\Ext object  #### Unpacking options The BufferUnpacker object supports a number of bitmask-based options for fine-tuning the unpacking process (defaults are in bold): 1. The binary MessagePack format has unsigned 64-bit as its largest integer data type, but PHP does not support such integers, which means that an overflow can occur during unpacking. 2. Make sure the GMP extension is enabled. 3. Make sure the Decimal extension is enabled. Examples: $packedUint64 = "\xcf"."\xff\xff\xff\xff"."\xff\xff\xff\xff";

$unpacker = new BufferUnpacker($packedUint64);
var_dump($unpacker->unpack()); // string(20) "18446744073709551615"$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_GMP); var_dump($unpacker->unpack()); // object(GMP) {...}

$unpacker = new BufferUnpacker($packedUint64, UnpackOptions::BIGINT_AS_DEC);
var_dump($unpacker->unpack()); // object(Decimal\Decimal) {...}  ### Custom types In addition to the basic types, the library provides functionality to serialize and deserialize arbitrary types. This can be done in several ways, depending on your use case. Let's take a look at them. #### Type objects If you need to serialize an instance of one of your classes into one of the basic MessagePack types, the best way to do this is to implement the CanBePacked interface in the class. A good example of such a class is the Map type class that comes with the library. This type is useful when you want to explicitly specify that a given PHP array should be packed as a MessagePack map without triggering an automatic type detection routine: $packer = new Packer();

$packedMap =$packer->pack(new Map([1, 2, 3]));
$packedArray =$packer->pack([1, 2, 3]);


More type examples can be found in the src/Type directory.

#### Type transformers

As with type objects, type transformers are only responsible for serializing values. They should be used when you need to serialize a value that does not implement the CanBePacked interface. Examples of such values could be instances of built-in or third-party classes that you don't own, or non-objects such as resources.

A transformer class must implement the CanPack interface. To use a transformer, it must first be registered in the packer. Here is an example of how to serialize PHP streams into the MessagePack bin format type using one of the supplied transformers, StreamTransformer:

$packer = new Packer(null, [new StreamTransformer()]);$packedBin = $packer->pack(fopen('/path/to/file', 'r+'));  More type transformer examples can be found in the src/TypeTransformer directory. #### Extensions In contrast to the cases described above, extensions are intended to handle extension types and are responsible for both serialization and deserialization of values (types). An extension class must implement the Extension interface. To use an extension, it must first be registered in the packer and the unpacker. The MessagePack specification divides extension types into two groups: predefined and application-specific. Currently, there is only one predefined type in the specification, Timestamp. Timestamp The Timestamp extension type is a predefined type. Support for this type in the library is done through the TimestampExtension class. This class is responsible for handling Timestamp objects, which represent the number of seconds and optional adjustment in nanoseconds: $timestampExtension = new TimestampExtension();

$packer = new Packer();$packer = $packer->extendWith($timestampExtension);

$unpacker = new BufferUnpacker();$unpacker = $unpacker->extendWith($timestampExtension);

$packedTimestamp =$packer->pack(Timestamp::now());
$timestamp =$unpacker->reset($packedTimestamp)->unpack();$seconds = $timestamp->getSeconds();$nanoseconds = $timestamp->getNanoseconds();  When using the MessagePack class, the Timestamp extension is already registered: $packedTimestamp = MessagePack::pack(Timestamp::now());
$timestamp = MessagePack::unpack($packedTimestamp);


Application-specific extensions

In addition, the format can be extended with your own types. For example, to make the built-in PHP DateTime objects first-class citizens in your code, you can create a corresponding extension, as shown in the example. Please note, that custom extensions have to be registered with a unique extension ID (an integer from 0 to 127).

More extension examples can be found in the examples/MessagePack directory.

## Exceptions

If an error occurs during packing/unpacking, a PackingFailedException or an UnpackingFailedException will be thrown, respectively. In addition, an InsufficientDataException can be thrown during unpacking.

An InvalidOptionException will be thrown in case an invalid option (or a combination of mutually exclusive options) is used.
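As a sketch of how the unpacking exceptions might be handled when reading from a streaming source: the exception class names come from the text above, while the `MessagePack\Exception` namespace and the `$bytesFromSocket` variable are assumptions made for illustration.

```php
<?php

use MessagePack\BufferUnpacker;
use MessagePack\Exception\InsufficientDataException;
use MessagePack\Exception\UnpackingFailedException;

// Sketch: distinguish "need more bytes" from "malformed input"
// while unpacking messages that arrive in chunks.
$unpacker = new BufferUnpacker();
$unpacker->append($bytesFromSocket); // $bytesFromSocket: raw data you received

try {
    $message = $unpacker->unpack();
    // ... process $message ...
} catch (InsufficientDataException $e) {
    // Only part of a message has arrived; keep the buffer
    // and retry after appending more data.
} catch (UnpackingFailedException $e) {
    // The input is genuinely malformed; discard it and report the error.
    error_log($e->getMessage());
}
```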

## Tests

Run tests as follows:

```sh
vendor/bin/phpunit
```


Also, if you already have Docker installed, you can run the tests in a docker container. First, create a container:

```sh
./dockerfile.sh | docker build -t msgpack -
```


The command above will create a container named msgpack with PHP 8.1 runtime. You may change the default runtime by defining the PHP_IMAGE environment variable:

```sh
PHP_IMAGE='php:8.0-cli' ./dockerfile.sh | docker build -t msgpack -
```


See a list of various images here.

Then run the unit tests:

```sh
docker run --rm -v $PWD:/msgpack -w /msgpack msgpack
```

#### Fuzzing

To ensure that the unpacking works correctly with malformed/semi-malformed data, you can use a testing technique called Fuzzing. The library ships with a help file (target) for PHP-Fuzzer and can be used as follows:

```sh
php-fuzzer fuzz tests/fuzz_buffer_unpacker.php
```

#### Performance

To check performance, run:

```sh
php -n -dzend_extension=opcache.so \
    -dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
    tests/bench.php
```

Example output:

```
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target           Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0030 ........ 0.0139
false ................ 0.0037 ........ 0.0144
true ................. 0.0040 ........ 0.0137
7-bit uint #1 ........ 0.0052 ........ 0.0120
7-bit uint #2 ........ 0.0059 ........ 0.0114
7-bit uint #3 ........ 0.0061 ........ 0.0119
5-bit sint #1 ........ 0.0067 ........ 0.0126
5-bit sint #2 ........ 0.0064 ........ 0.0132
5-bit sint #3 ........ 0.0066 ........ 0.0135
8-bit uint #1 ........ 0.0078 ........ 0.0200
8-bit uint #2 ........ 0.0077 ........ 0.0212
8-bit uint #3 ........ 0.0086 ........ 0.0203
16-bit uint #1 ....... 0.0111 ........ 0.0271
16-bit uint #2 ....... 0.0115 ........ 0.0260
16-bit uint #3 ....... 0.0103 ........ 0.0273
32-bit uint #1 ....... 0.0116 ........ 0.0326
32-bit uint #2 ....... 0.0118 ........ 0.0332
32-bit uint #3 ....... 0.0127 ........ 0.0325
64-bit uint #1 ....... 0.0140 ........ 0.0277
64-bit uint #2 ....... 0.0134 ........ 0.0294
64-bit uint #3 ....... 0.0134 ........ 0.0281
8-bit int #1 ......... 0.0086 ........ 0.0241
8-bit int #2 ......... 0.0089 ........ 0.0225
8-bit int #3 ......... 0.0085 ........ 0.0229
16-bit int #1 ........ 0.0118 ........ 0.0280
16-bit int #2 ........ 0.0121 ........ 0.0270
16-bit int #3 ........ 0.0109 ........ 0.0274
32-bit int #1 ........ 0.0128 ........ 0.0346
32-bit int #2 ........ 0.0118 ........ 0.0339
32-bit int #3 ........ 0.0135 ........ 0.0368
64-bit int #1 ........ 0.0138 ........ 0.0276
64-bit int #2 ........ 0.0132 ........ 0.0286
64-bit int #3 ........ 0.0137 ........ 0.0274
64-bit int #4 ........ 0.0180 ........ 0.0285
64-bit float #1 ...... 0.0134 ........ 0.0284
64-bit float #2 ...... 0.0125 ........ 0.0275
64-bit float #3 ...... 0.0126 ........ 0.0283
fix string #1 ........ 0.0035 ........ 0.0133
fix string #2 ........ 0.0094 ........ 0.0216
fix string #3 ........ 0.0094 ........ 0.0222
fix string #4 ........ 0.0091 ........ 0.0241
8-bit string #1 ...... 0.0122 ........ 0.0301
8-bit string #2 ...... 0.0118 ........ 0.0304
8-bit string #3 ...... 0.0119 ........ 0.0315
16-bit string #1 ..... 0.0150 ........ 0.0388
16-bit string #2 ..... 0.1545 ........ 0.1665
32-bit string ........ 0.1570 ........ 0.1756
wide char string #1 .. 0.0091 ........ 0.0236
wide char string #2 .. 0.0122 ........ 0.0313
8-bit binary #1 ...... 0.0100 ........ 0.0302
8-bit binary #2 ...... 0.0123 ........ 0.0324
8-bit binary #3 ...... 0.0126 ........ 0.0327
16-bit binary ........ 0.0168 ........ 0.0372
32-bit binary ........ 0.1588 ........ 0.1754
fix array #1 ......... 0.0042 ........ 0.0131
fix array #2 ......... 0.0294 ........ 0.0367
fix array #3 ......... 0.0412 ........ 0.0472
16-bit array #1 ...... 0.1378 ........ 0.1596
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.1865 ........ 0.2283
fix map #1 ........... 0.0725 ........ 0.1048
fix map #2 ........... 0.0319 ........ 0.0405
fix map #3 ........... 0.0356 ........ 0.0665
fix map #4 ........... 0.0465 ........ 0.0497
16-bit map #1 ........ 0.2540 ........ 0.3028
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.2372 ........ 0.2710
fixext 1 ............. 0.0283 ........ 0.0358
fixext 2 ............. 0.0291 ........ 0.0371
fixext 4 ............. 0.0302 ........ 0.0355
fixext 8 ............. 0.0288 ........ 0.0384
fixext 16 ............ 0.0293 ........ 0.0359
8-bit ext ............ 0.0302 ........ 0.0439
16-bit ext ........... 0.0334 ........ 0.0499
32-bit ext ........... 0.1845 ........ 0.1888
32-bit timestamp #1 .. 0.0337 ........ 0.0547
32-bit timestamp #2 .. 0.0335 ........ 0.0560
64-bit timestamp #1 .. 0.0371 ........ 0.0575
64-bit timestamp #2 .. 0.0374 ........ 0.0542
64-bit timestamp #3 .. 0.0356 ........ 0.0533
96-bit timestamp #1 .. 0.0362 ........ 0.0699
96-bit timestamp #2 .. 0.0381 ........ 0.0701
96-bit timestamp #3 .. 0.0367 ........ 0.0687
=============================================
Total                  2.7618          4.0820
Skipped                     4               4
Failed                      0               0
Ignored                     0               0
```

With JIT:

```sh
php -n -dzend_extension=opcache.so \
    -dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
    tests/bench.php
```

Example output:

```
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

=============================================
Test/Target           Packer  BufferUnpacker
---------------------------------------------
nil .................. 0.0005 ........ 0.0054
false ................ 0.0004 ........ 0.0059
true ................. 0.0004 ........ 0.0059
7-bit uint #1 ........ 0.0010 ........ 0.0047
7-bit uint #2 ........ 0.0010 ........ 0.0046
7-bit uint #3 ........ 0.0010 ........ 0.0046
5-bit sint #1 ........ 0.0025 ........ 0.0046
5-bit sint #2 ........ 0.0023 ........ 0.0046
5-bit sint #3 ........ 0.0024 ........ 0.0045
8-bit uint #1 ........ 0.0043 ........ 0.0081
8-bit uint #2 ........ 0.0043 ........ 0.0079
8-bit uint #3 ........ 0.0041 ........ 0.0080
16-bit uint #1 ....... 0.0064 ........ 0.0095
16-bit uint #2 ....... 0.0064 ........ 0.0091
16-bit uint #3 ....... 0.0064 ........ 0.0094
32-bit uint #1 ....... 0.0085 ........ 0.0114
32-bit uint #2 ....... 0.0077 ........ 0.0122
32-bit uint #3 ....... 0.0077 ........ 0.0120
64-bit uint #1 ....... 0.0085 ........ 0.0159
64-bit uint #2 ....... 0.0086 ........ 0.0157
64-bit uint #3 ....... 0.0086 ........ 0.0158
8-bit int #1 ......... 0.0042 ........ 0.0080
8-bit int #2 ......... 0.0042 ........ 0.0080
8-bit int #3 ......... 0.0042 ........ 0.0081
16-bit int #1 ........ 0.0065 ........ 0.0095
16-bit int #2 ........ 0.0065 ........ 0.0090
16-bit int #3 ........ 0.0056 ........ 0.0085
32-bit int #1 ........ 0.0067 ........ 0.0107
32-bit int #2 ........ 0.0066 ........ 0.0106
32-bit int #3 ........ 0.0063 ........ 0.0104
64-bit int #1 ........ 0.0072 ........ 0.0162
64-bit int #2 ........ 0.0073 ........ 0.0174
64-bit int #3 ........ 0.0072 ........ 0.0164
64-bit int #4 ........ 0.0077 ........ 0.0161
64-bit float #1 ...... 0.0053 ........ 0.0135
64-bit float #2 ...... 0.0053 ........ 0.0135
64-bit float #3 ...... 0.0052 ........ 0.0135
fix string #1 ....... -0.0002 ........ 0.0044
fix string #2 ........ 0.0035 ........ 0.0067
fix string #3 ........ 0.0035 ........ 0.0077
fix string #4 ........ 0.0033 ........ 0.0078
8-bit string #1 ...... 0.0059 ........ 0.0110
8-bit string #2 ...... 0.0063 ........ 0.0121
8-bit string #3 ...... 0.0064 ........ 0.0124
16-bit string #1 ..... 0.0099 ........ 0.0146
16-bit string #2 ..... 0.1522 ........ 0.1474
32-bit string ........ 0.1511 ........ 0.1483
wide char string #1 .. 0.0039 ........ 0.0084
wide char string #2 .. 0.0073 ........ 0.0123
8-bit binary #1 ...... 0.0040 ........ 0.0112
8-bit binary #2 ...... 0.0075 ........ 0.0123
8-bit binary #3 ...... 0.0077 ........ 0.0129
16-bit binary ........ 0.0096 ........ 0.0145
32-bit binary ........ 0.1535 ........ 0.1479
fix array #1 ......... 0.0008 ........ 0.0061
fix array #2 ......... 0.0121 ........ 0.0165
fix array #3 ......... 0.0193 ........ 0.0222
16-bit array #1 ...... 0.0607 ........ 0.0479
16-bit array #2 ........... S ............. S
32-bit array .............. S ............. S
complex array ........ 0.0749 ........ 0.0824
fix map #1 ........... 0.0329 ........ 0.0431
fix map #2 ........... 0.0161 ........ 0.0189
fix map #3 ........... 0.0205 ........ 0.0262
fix map #4 ........... 0.0252 ........ 0.0205
16-bit map #1 ........ 0.1016 ........ 0.0927
16-bit map #2 ............. S ............. S
32-bit map ................ S ............. S
complex map .......... 0.1096 ........ 0.1030
fixext 1 ............. 0.0157 ........ 0.0161
fixext 2 ............. 0.0175 ........ 0.0183
fixext 4 ............. 0.0156 ........ 0.0185
fixext 8 ............. 0.0163 ........ 0.0184
fixext 16 ............ 0.0164 ........ 0.0182
8-bit ext ............ 0.0158 ........ 0.0207
16-bit ext ........... 0.0203 ........ 0.0219
32-bit ext ........... 0.1614 ........ 0.1539
32-bit timestamp #1 .. 0.0195 ........ 0.0249
32-bit timestamp #2 .. 0.0188 ........ 0.0260
64-bit timestamp #1 .. 0.0207 ........ 0.0281
64-bit timestamp #2 .. 0.0212 ........ 0.0291
64-bit timestamp #3 .. 0.0207 ........ 0.0295
96-bit timestamp #1 .. 0.0222 ........ 0.0358
96-bit timestamp #2 .. 0.0228 ........ 0.0353
96-bit timestamp #3 .. 0.0210 ........ 0.0319
=============================================
Total                  1.6432          1.9674
Skipped                     4               4
Failed                      0               0
Ignored                     0               0
```

You may change the default benchmark settings by defining the corresponding environment variables. For example:

```sh
export MP_BENCH_TARGETS=pure_p
export MP_BENCH_ITERATIONS=1000000
export MP_BENCH_ROUNDS=5
# a comma separated list of test names
export MP_BENCH_TESTS='complex array, complex map'
# or a group name
# export MP_BENCH_TESTS='-@slow' // @pecl_comp
# or a regexp
# export MP_BENCH_TESTS='/complex (array|map)/'
```

Another example, benchmarking both the library and the PECL extension:

```sh
MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
    -dpcre.jit=1 -dopcache.enable=1 -dopcache.enable_cli=1 \
    tests/bench.php
```

Example output:

```
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target           Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0031 ........ 0.0141 ...... 0.0055 ........ 0.0064
false ................ 0.0039 ........ 0.0154 ...... 0.0056 ........ 0.0053
true ................. 0.0038 ........ 0.0139 ...... 0.0056 ........ 0.0044
7-bit uint #1 ........ 0.0061 ........ 0.0110 ...... 0.0059 ........ 0.0046
7-bit uint #2 ........ 0.0065 ........ 0.0119 ...... 0.0042 ........ 0.0029
7-bit uint #3 ........ 0.0054 ........ 0.0117 ...... 0.0045 ........ 0.0025
5-bit sint #1 ........ 0.0047 ........ 0.0103 ...... 0.0038 ........ 0.0022
5-bit sint #2 ........ 0.0048 ........ 0.0117 ...... 0.0038 ........ 0.0022
5-bit sint #3 ........ 0.0046 ........ 0.0102 ...... 0.0038 ........ 0.0023
8-bit uint #1 ........ 0.0063 ........ 0.0174 ...... 0.0039 ........ 0.0031
8-bit uint #2 ........ 0.0063 ........ 0.0167 ...... 0.0040 ........ 0.0029
8-bit uint #3 ........ 0.0063 ........ 0.0168 ...... 0.0039 ........ 0.0030
16-bit uint #1 ....... 0.0092 ........ 0.0222 ...... 0.0049 ........ 0.0030
16-bit uint #2 ....... 0.0096 ........ 0.0227 ...... 0.0042 ........ 0.0046
16-bit uint #3 ....... 0.0123 ........ 0.0274 ...... 0.0059 ........ 0.0051
32-bit uint #1 ....... 0.0136 ........ 0.0331 ...... 0.0060 ........ 0.0048
32-bit uint #2 ....... 0.0130 ........ 0.0336 ...... 0.0070 ........ 0.0048
32-bit uint #3 ....... 0.0127 ........ 0.0329 ...... 0.0051 ........ 0.0048
64-bit uint #1 ....... 0.0126 ........ 0.0268 ...... 0.0055 ........ 0.0049
64-bit uint #2 ....... 0.0135 ........ 0.0281 ...... 0.0052 ........ 0.0046
64-bit uint #3 ....... 0.0131 ........ 0.0274 ...... 0.0069 ........ 0.0044
8-bit int #1 ......... 0.0077 ........ 0.0236 ...... 0.0058 ........ 0.0044
8-bit int #2 ......... 0.0087 ........ 0.0244 ...... 0.0058 ........ 0.0048
8-bit int #3 ......... 0.0084 ........ 0.0241 ...... 0.0055 ........ 0.0049
16-bit int #1 ........ 0.0112 ........ 0.0271 ...... 0.0048 ........ 0.0045
16-bit int #2 ........ 0.0124 ........ 0.0292 ...... 0.0057 ........ 0.0049
16-bit int #3 ........ 0.0118 ........ 0.0270 ...... 0.0058 ........ 0.0050
32-bit int #1 ........ 0.0137 ........ 0.0366 ...... 0.0058 ........ 0.0051
32-bit int #2 ........ 0.0133 ........ 0.0366 ...... 0.0056 ........ 0.0049
32-bit int #3 ........ 0.0129 ........ 0.0350 ...... 0.0052 ........ 0.0048
64-bit int #1 ........ 0.0145 ........ 0.0254 ...... 0.0034 ........ 0.0025
64-bit int #2 ........ 0.0097 ........ 0.0214 ...... 0.0034 ........ 0.0025
64-bit int #3 ........ 0.0096 ........ 0.0287 ...... 0.0059 ........ 0.0050
64-bit int #4 ........ 0.0143 ........ 0.0277 ...... 0.0059 ........ 0.0046
64-bit float #1 ...... 0.0134 ........ 0.0281 ...... 0.0057 ........ 0.0052
64-bit float #2 ...... 0.0141 ........ 0.0281 ...... 0.0057 ........ 0.0050
64-bit float #3 ...... 0.0144 ........ 0.0282 ...... 0.0057 ........ 0.0050
fix string #1 ........ 0.0036 ........ 0.0143 ...... 0.0066 ........ 0.0053
fix string #2 ........ 0.0107 ........ 0.0222 ...... 0.0065 ........ 0.0068
fix string #3 ........ 0.0116 ........ 0.0245 ...... 0.0063 ........ 0.0069
fix string #4 ........ 0.0105 ........ 0.0253 ...... 0.0083 ........ 0.0077
8-bit string #1 ...... 0.0126 ........ 0.0318 ...... 0.0075 ........ 0.0088
8-bit string #2 ...... 0.0121 ........ 0.0295 ...... 0.0076 ........ 0.0086
8-bit string #3 ...... 0.0125 ........ 0.0293 ...... 0.0130 ........ 0.0093
16-bit string #1 ..... 0.0159 ........ 0.0368 ...... 0.0117 ........ 0.0086
16-bit string #2 ..... 0.1547 ........ 0.1686 ...... 0.1516 ........ 0.1373
32-bit string ........ 0.1558 ........ 0.1729 ...... 0.1511 ........ 0.1396
wide char string #1 .. 0.0098 ........ 0.0237 ...... 0.0066 ........ 0.0065
wide char string #2 .. 0.0128 ........ 0.0291 ...... 0.0061 ........ 0.0082
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0040 ........ 0.0129 ...... 0.0120 ........ 0.0058
fix array #2 ......... 0.0279 ........ 0.0390 ...... 0.0143 ........ 0.0165
fix array #3 ......... 0.0415 ........ 0.0463 ...... 0.0162 ........ 0.0187
16-bit array #1 ...... 0.1349 ........ 0.1628 ...... 0.0334 ........ 0.0341
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0345 ........ 0.0391 ...... 0.0143 ........ 0.0168
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0459 ........ 0.0473 ...... 0.0151 ........ 0.0163
16-bit map #1 ........ 0.2518 ........ 0.2962 ...... 0.0400 ........ 0.0490
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.2380 ........ 0.2682 ...... 0.0545 ........ 0.0579
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  1.5625          2.3866        0.7735          0.7243
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7
```

With JIT:

```sh
MP_BENCH_TARGETS=pure_p,pure_u,pecl_p,pecl_u \
php -n -dextension=msgpack.so -dzend_extension=opcache.so \
    -dpcre.jit=1 -dopcache.jit_buffer_size=64M -dopcache.jit=tracing -dopcache.enable=1 -dopcache.enable_cli=1 \
    tests/bench.php
```

Example output:

```
Filter: MessagePack\Tests\Perf\Filter\ListFilter
Rounds: 3
Iterations: 100000

===========================================================================
Test/Target           Packer  BufferUnpacker  msgpack_pack  msgpack_unpack
---------------------------------------------------------------------------
nil .................. 0.0001 ........ 0.0052 ...... 0.0053 ........ 0.0042
false ................ 0.0007 ........ 0.0060 ...... 0.0057 ........ 0.0043
true ................. 0.0008 ........ 0.0060 ...... 0.0056 ........ 0.0041
7-bit uint #1 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0041
7-bit uint #2 ........ 0.0021 ........ 0.0043 ...... 0.0062 ........ 0.0041
7-bit uint #3 ........ 0.0022 ........ 0.0044 ...... 0.0061 ........ 0.0040
5-bit sint #1 ........ 0.0030 ........ 0.0048 ...... 0.0062 ........ 0.0040
5-bit sint #2 ........ 0.0032 ........ 0.0046 ...... 0.0062 ........ 0.0040
5-bit sint #3 ........ 0.0031 ........ 0.0046 ...... 0.0062 ........ 0.0040
8-bit uint #1 ........ 0.0054 ........ 0.0079 ...... 0.0062 ........ 0.0050
8-bit uint #2 ........ 0.0051 ........ 0.0079 ...... 0.0064 ........ 0.0044
8-bit uint #3 ........ 0.0051 ........ 0.0082 ...... 0.0062 ........ 0.0044
16-bit uint #1 ....... 0.0077 ........ 0.0094 ...... 0.0065 ........ 0.0045
16-bit uint #2 ....... 0.0077 ........ 0.0094 ...... 0.0063 ........ 0.0045
16-bit uint #3 ....... 0.0077 ........ 0.0095 ...... 0.0064 ........ 0.0047
32-bit uint #1 ....... 0.0088 ........ 0.0119 ...... 0.0063 ........ 0.0043
32-bit uint #2 ....... 0.0089 ........ 0.0117 ...... 0.0062 ........ 0.0039
32-bit uint #3 ....... 0.0089 ........ 0.0118 ...... 0.0063 ........ 0.0044
64-bit uint #1 ....... 0.0097 ........ 0.0155 ...... 0.0063 ........ 0.0045
64-bit uint #2 ....... 0.0095 ........ 0.0153 ...... 0.0061 ........ 0.0045
64-bit uint #3 ....... 0.0096 ........ 0.0156 ...... 0.0063 ........ 0.0047
8-bit int #1 ......... 0.0053 ........ 0.0083 ...... 0.0062 ........ 0.0044
8-bit int #2 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0044
8-bit int #3 ......... 0.0052 ........ 0.0080 ...... 0.0062 ........ 0.0043
16-bit int #1 ........ 0.0089 ........ 0.0097 ...... 0.0069 ........ 0.0046
16-bit int #2 ........ 0.0075 ........ 0.0093 ...... 0.0063 ........ 0.0043
16-bit int #3 ........ 0.0075 ........ 0.0094 ...... 0.0062 ........ 0.0046
32-bit int #1 ........ 0.0086 ........ 0.0122 ...... 0.0063 ........ 0.0044
32-bit int #2 ........ 0.0087 ........ 0.0120 ...... 0.0066 ........ 0.0046
32-bit int #3 ........ 0.0086 ........ 0.0121 ...... 0.0060 ........ 0.0044
64-bit int #1 ........ 0.0096 ........ 0.0149 ...... 0.0060 ........ 0.0045
64-bit int #2 ........ 0.0096 ........ 0.0157 ...... 0.0062 ........ 0.0044
64-bit int #3 ........ 0.0096 ........ 0.0160 ...... 0.0063 ........ 0.0046
64-bit int #4 ........ 0.0097 ........ 0.0157 ...... 0.0061 ........ 0.0044
64-bit float #1 ...... 0.0079 ........ 0.0153 ...... 0.0056 ........ 0.0044
64-bit float #2 ...... 0.0079 ........ 0.0152 ...... 0.0057 ........ 0.0045
64-bit float #3 ...... 0.0079 ........ 0.0155 ...... 0.0057 ........ 0.0044
fix string #1 ........ 0.0010 ........ 0.0045 ...... 0.0071 ........ 0.0044
fix string #2 ........ 0.0048 ........ 0.0075 ...... 0.0070 ........ 0.0060
fix string #3 ........ 0.0048 ........ 0.0086 ...... 0.0068 ........ 0.0060
fix string #4 ........ 0.0050 ........ 0.0088 ...... 0.0070 ........ 0.0059
8-bit string #1 ...... 0.0081 ........ 0.0129 ...... 0.0069 ........ 0.0062
8-bit string #2 ...... 0.0086 ........ 0.0128 ...... 0.0069 ........ 0.0065
8-bit string #3 ...... 0.0086 ........ 0.0126 ...... 0.0115 ........ 0.0065
16-bit string #1 ..... 0.0105 ........ 0.0137 ...... 0.0128 ........ 0.0068
16-bit string #2 ..... 0.1510 ........ 0.1486 ...... 0.1526 ........ 0.1391
32-bit string ........ 0.1517 ........ 0.1475 ...... 0.1504 ........ 0.1370
wide char string #1 .. 0.0044 ........ 0.0085 ...... 0.0067 ........ 0.0057
wide char string #2 .. 0.0081 ........ 0.0125 ...... 0.0069 ........ 0.0063
8-bit binary #1 ........... I ............. I ........... F ............. I
8-bit binary #2 ........... I ............. I ........... F ............. I
8-bit binary #3 ........... I ............. I ........... F ............. I
16-bit binary ............. I ............. I ........... F ............. I
32-bit binary ............. I ............. I ........... F ............. I
fix array #1 ......... 0.0014 ........ 0.0059 ...... 0.0132 ........ 0.0055
fix array #2 ......... 0.0146 ........ 0.0156 ...... 0.0155 ........ 0.0148
fix array #3 ......... 0.0211 ........ 0.0229 ...... 0.0179 ........ 0.0180
16-bit array #1 ...... 0.0673 ........ 0.0498 ...... 0.0343 ........ 0.0388
16-bit array #2 ........... S ............. S ........... S ............. S
32-bit array .............. S ............. S ........... S ............. S
complex array ............. I ............. I ........... F ............. F
fix map #1 ................ I ............. I ........... F ............. I
fix map #2 ........... 0.0148 ........ 0.0180 ...... 0.0156 ........ 0.0179
fix map #3 ................ I ............. I ........... F ............. I
fix map #4 ........... 0.0252 ........ 0.0201 ...... 0.0214 ........ 0.0167
16-bit map #1 ........ 0.1027 ........ 0.0836 ...... 0.0388 ........ 0.0510
16-bit map #2 ............. S ............. S ........... S ............. S
32-bit map ................ S ............. S ........... S ............. S
complex map .......... 0.1104 ........ 0.1010 ...... 0.0556 ........ 0.0602
fixext 1 .................. I ............. I ........... F ............. F
fixext 2 .................. I ............. I ........... F ............. F
fixext 4 .................. I ............. I ........... F ............. F
fixext 8 .................. I ............. I ........... F ............. F
fixext 16 ................. I ............. I ........... F ............. F
8-bit ext ................. I ............. I ........... F ............. F
16-bit ext ................ I ............. I ........... F ............. F
32-bit ext ................ I ............. I ........... F ............. F
32-bit timestamp #1 ....... I ............. I ........... F ............. F
32-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #1 ....... I ............. I ........... F ............. F
64-bit timestamp #2 ....... I ............. I ........... F ............. F
64-bit timestamp #3 ....... I ............. I ........... F ............. F
96-bit timestamp #1 ....... I ............. I ........... F ............. F
96-bit timestamp #2 ....... I ............. I ........... F ............. F
96-bit timestamp #3 ....... I ............. I ........... F ............. F
===========================================================================
Total                  0.9642          1.0909        0.8224          0.7213
Skipped                     4               4             4               4
Failed                      0               0            24              17
Ignored                    24              24             0               7
```

Note that the msgpack extension (v2.1.2) doesn't support the ext, bin and UTF-8 str types.

## License

The library is released under the MIT License. See the bundled LICENSE file for details.

Author: rybakit
Source Code: https://github.com/rybakit/msgpack.php
License: MIT License

## Plpgsql Check: An Extension for Checking PL/pgSQL Source Code

## plpgsql_check

I started this project because I wanted to publish the code I wrote over the last two years, while trying to add enhanced checking to PostgreSQL upstream.
That attempt was not fully successful - integration into upstream requires some larger plpgsql refactoring, which probably will not be done in the next few years (as of Dec 2013). But the written code is fully functional and can be used in production (and it is used in production). So I created this extension to make it available to all plpgsql developers. If you like it and would like to join the development of this extension, register yourself at the postgresql extension hacking google group.

## Features

- checks fields of referenced database objects and types inside embedded SQL
- validates that correct types are used for function parameters
- detects unused variables and function arguments, and unmodified OUT arguments
- partial detection of dead code (code after a RETURN command)
- detects a missing RETURN command in a function
- tries to identify unwanted hidden casts, which can be a performance issue (e.g. unused indexes)
- can collect the relations and functions used by a function
- can check EXECUTE statements against SQL injection vulnerabilities

I invite any ideas, patches and bug reports.

plpgsql_check is the next generation of plpgsql_lint. It allows you to check source code by explicitly calling plpgsql_check_function.

PostgreSQL 10, 11, 12, 13 and 14 are supported.

The SQL statements inside PL/pgSQL functions are checked by the validator for semantic errors. These errors can be found by plpgsql_check_function:

## Active mode

```
postgres=# CREATE EXTENSION plpgsql_check;
LOAD
postgres=# CREATE TABLE t1(a int, b int);
CREATE TABLE

postgres=# CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE r record;
BEGIN
  FOR r IN SELECT * FROM t1
  LOOP
    RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
  END LOOP;
END;
$function$;
CREATE FUNCTION

postgres=# select f1(); -- execution doesn't find a bug due to empty table t1
 f1
────
(1 row)

postgres=# \x
Expanded display is on.
```
```
postgres=# select * from plpgsql_check_function_tb('f1()');
─[ RECORD 1 ]───────────────────────────
functionid │ f1
lineno     │ 6
statement  │ RAISE
sqlstate   │ 42703
message    │ record "r" has no field "c"
detail     │ [null]
hint       │ [null]
level      │ error
position   │ 0
query      │ [null]

postgres=# \sf+ f1
CREATE OR REPLACE FUNCTION public.f1()
 RETURNS void
 LANGUAGE plpgsql
1       AS $function$
2       DECLARE r record;
3       BEGIN
4         FOR r IN SELECT * FROM t1
5         LOOP
6           RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
7         END LOOP;
8       END;
9       $function$
```

The function plpgsql_check_function() has three possible output formats: text, json or xml.

```
select * from plpgsql_check_function('f1()', fatal_errors := false);
                         plpgsql_check_function
------------------------------------------------------------------------
 error:42703:4:SQL statement:column "c" of relation "t1" does not exist
 Query: update t1 set c = 30
 --                   ^
 error:42P01:7:RAISE:missing FROM-clause entry for table "r"
 Query: SELECT r.c
 --            ^
 error:42601:7:RAISE:too few parameters specified for RAISE
(7 rows)
```

```
postgres=# select * from plpgsql_check_function('fx()', format:='xml');
                     plpgsql_check_function
────────────────────────────────────────────────────────────────
 <Function oid="16400">                                        ↵
   <Issue>                                                     ↵
     <Level>error</level>                                      ↵
     <Sqlstate>42P01</Sqlstate>                                ↵
     <Message>relation "foo111" does not exist</Message>      ↵
     <Stmt lineno="3">RETURN</Stmt>                            ↵
     <Query position="23">SELECT (select a from foo111)</Query>↵
   </Issue>                                                    ↵
 </Function>
(1 row)
```

## Arguments

You can set the level of warnings via the function's parameters:

### Mandatory arguments

- function name or function signature - these functions require a function specification. Any function in PostgreSQL can be specified by Oid, by name or by signature. When you know the oid or the complete function signature, you can use a regprocedure type parameter like 'fx()'::regprocedure or 16799::regprocedure. A possible alternative is using the name only, when the function's name is unique - like 'fx'.
When the name is not unique or the function doesn't exist, it raises an error.

### Optional arguments

- `relid DEFAULT 0` - oid of the relation assigned to the trigger function. It is necessary for checking any trigger function.
- `fatal_errors boolean DEFAULT true` - stop on the first error
- `other_warnings boolean DEFAULT true` - show warnings like a different number of attributes on the left and right side of an assignment, a variable overlapping a function parameter, unused variables, unwanted casting, ...
- `extra_warnings boolean DEFAULT true` - show warnings like missing RETURN, shadowed variables, dead code, never read (unused) function parameters, unmodified variables, modified auto variables, ...
- `performance_warnings boolean DEFAULT false` - performance related warnings like a declared type with a type modifier, casting, implicit casts in a WHERE clause (which can be the reason an index is not used), ...
- `security_warnings boolean DEFAULT false` - security related checks like SQL injection vulnerability detection
- `anyelementtype regtype DEFAULT 'int'` - the real type used instead of the anyelement type
- `anyenumtype regtype DEFAULT '-'` - the real type used instead of the anyenum type
- `anyrangetype regtype DEFAULT 'int4range'` - the real type used instead of the anyrange type
- `anycompatibletype DEFAULT 'int'` - the real type used instead of the anycompatible type
- `anycompatiblerangetype DEFAULT 'int4range'` - the real type used instead of the anycompatible range type
- `without_warnings DEFAULT false` - disable all warnings
- `all_warnings DEFAULT false` - enable all warnings
- `newtable DEFAULT NULL, oldtable DEFAULT NULL` - the names of the NEW or OLD transition tables. These parameters are required when transition tables are used.
## Triggers

When you want to check a trigger function, you have to provide the relation that is used together with the trigger function:

```
CREATE TABLE bar(a int, b int);

postgres=# \sf+ foo_trg
CREATE OR REPLACE FUNCTION public.foo_trg()
 RETURNS trigger
 LANGUAGE plpgsql
1       AS $function$
2       BEGIN
3         NEW.c := NEW.a + NEW.b;
4         RETURN NEW;
5       END;
6       $function$
```

Missing relation specification:

```
postgres=# select * from plpgsql_check_function('foo_trg()');
ERROR:  missing trigger relation
HINT:  Trigger relation oid must be valid
```

Correct trigger checking (with the specified relation):

```
postgres=# select * from plpgsql_check_function('foo_trg()', 'bar');
         plpgsql_check_function
--------------------------------------------------------
 error:42703:3:assignment:record "new" has no field "c"
(1 row)
```

For triggers with transition tables you can set the oldtable or newtable parameters:

```sql
create or replace function footab_trig_func()
returns trigger as $$
declare x int;
begin
  if false then
    -- should be ok;
    select count(*) from newtab into x;

    -- should fail;
    select count(*) from newtab where d = 10 into x;
  end if;
  return null;
end;
$$ language plpgsql;

select * from plpgsql_check_function('footab_trig_func', 'footab', newtable := 'newtab');
```

## Mass check

You can use plpgsql_check_function to mass check functions and mass check triggers.
Please test the following queries:

-- check all nontrigger plpgsql functions
SELECT p.oid, p.proname, plpgsql_check_function(p.oid)
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
JOIN pg_catalog.pg_language l ON p.prolang = l.oid
WHERE l.lanname = 'plpgsql' AND p.prorettype <> 2279;

or

SELECT p.proname, tgrelid::regclass, cf.*
FROM pg_proc p
JOIN pg_trigger t ON t.tgfoid = p.oid
JOIN pg_language l ON p.prolang = l.oid
JOIN pg_namespace n ON p.pronamespace = n.oid,
LATERAL plpgsql_check_function(p.oid, t.tgrelid) cf
WHERE n.nspname = 'public' and l.lanname = 'plpgsql'

or

-- check all plpgsql functions (functions or trigger functions with defined triggers)
SELECT
    (pcf).functionid::regprocedure, (pcf).lineno, (pcf).statement,
    (pcf).sqlstate, (pcf).message, (pcf).detail, (pcf).hint, (pcf).level,
    (pcf)."position", (pcf).query, (pcf).context
FROM
(
    SELECT plpgsql_check_function_tb(pg_proc.oid, COALESCE(pg_trigger.tgrelid, 0)) AS pcf
    FROM pg_proc
    LEFT JOIN pg_trigger ON (pg_trigger.tgfoid = pg_proc.oid)
    WHERE
        prolang = (SELECT lang.oid FROM pg_language lang WHERE lang.lanname = 'plpgsql') AND
        pronamespace <> (SELECT nsp.oid FROM pg_namespace nsp WHERE nsp.nspname = 'pg_catalog') AND
        -- ignore unused triggers
        (pg_proc.prorettype <> (SELECT typ.oid FROM pg_type typ WHERE typ.typname = 'trigger') OR
         pg_trigger.tgfoid IS NOT NULL)
    OFFSET 0
) ss
ORDER BY (pcf).functionid::regprocedure::text, (pcf).lineno

## Passive mode

Functions should be checked on start - the plpgsql_check module must be loaded.

## Configuration

plpgsql_check.mode = [ disabled | by_function | fresh_start | every_start ]
plpgsql_check.fatal_errors = [ yes | no ]

plpgsql_check.show_nonperformance_warnings = false
plpgsql_check.show_performance_warnings = false

The default mode is by_function, which means that the enhanced check is done only in active mode - by plpgsql_check_function. fresh_start means a cold start.
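
A server-wide passive-mode setup can be sketched with ALTER SYSTEM (assumes superuser rights and that plpgsql_check is already installed so its GUCs are recognized; changing shared_preload_libraries requires a server restart):

```sql
-- plpgsql must come before plpgsql_check (see the Profiler section)
ALTER SYSTEM SET shared_preload_libraries = 'plpgsql,plpgsql_check';
-- check every function before its first execution in a session
ALTER SYSTEM SET plpgsql_check.mode = 'every_start';
-- report all issues instead of stopping at the first error
ALTER SYSTEM SET plpgsql_check.fatal_errors = 'no';
```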
You can enable passive mode by:

load 'plpgsql'; -- 1.1 and higher doesn't need it
load 'plpgsql_check';
set plpgsql_check.mode = 'every_start';

SELECT fx(10); -- run the function - it is checked before the runtime starts it

## Limits

plpgsql_check should find almost all errors in really static code. When a developer uses some of PLpgSQL's dynamic features, like dynamic SQL or the record data type, false positives are possible. These should be rare - in well written code - and then the affected function should be redesigned, or plpgsql_check should be disabled for this function:

CREATE OR REPLACE FUNCTION f1()
RETURNS void AS $$
DECLARE r record;
BEGIN
  FOR r IN EXECUTE 'SELECT * FROM t1'
  LOOP
    RAISE NOTICE '%', r.c;
  END LOOP;
END;
$$ LANGUAGE plpgsql SET plpgsql.enable_check TO false;

Using plpgsql_check adds a small overhead (in enabled passive mode), so you should use it only in development or preproduction environments.

## Dynamic SQL

This module doesn't check queries that are assembled at runtime. It is not possible to identify the results of dynamic queries, so plpgsql_check cannot set the correct type for record variables and cannot check dependent SQL statements and expressions.

When the type of a record variable is not known, you can assign it explicitly with the pragma type:

DECLARE r record;
BEGIN
  EXECUTE format('SELECT * FROM %I', _tablename) INTO r;
  PERFORM plpgsql_check_pragma('type: r (id int, processed bool)');
  IF NOT r.processed THEN
  ...

Attention: The SQL injection check can detect only some SQL injection vulnerabilities. This tool cannot be used for a security audit! Some issues may not be detected. This check can raise false alarms too - typically when a variable is sanitized by another command, or when the value is of some composite type.

## Refcursors

plpgsql_check cannot detect the structure of referenced cursors. A reference to a cursor in PLpgSQL is implemented as the name of a global cursor. At check time, the name is not known (not in all cases), and the global cursor doesn't exist.
This is a significant obstacle for any static analysis. PLpgSQL cannot set the correct type for record variables and cannot check dependent SQL statements and expressions. The solution is the same as for dynamic SQL: don't use a record variable as a target when you use the refcursor type, or disable plpgsql_check for these functions.

CREATE OR REPLACE FUNCTION foo(refcur_var refcursor)
RETURNS void AS $$
DECLARE
  rec_var record;
BEGIN
  FETCH refcur_var INTO rec_var; -- this is a STOP for plpgsql_check
  RAISE NOTICE '%', rec_var; -- record rec_var is not assigned yet error

In this case a record type should not be used (use a known rowtype instead):

CREATE OR REPLACE FUNCTION foo(refcur_var refcursor)
RETURNS void AS $$
DECLARE
  rec_var some_rowtype;
BEGIN
  FETCH refcur_var INTO rec_var;
  RAISE NOTICE '%', rec_var;

## Temporary tables

plpgsql_check cannot verify queries over temporary tables that are created at function runtime. For this use case it is necessary to create a fake temp table or disable plpgsql_check for the function.

In reality, temp tables are stored in their own (per user) schema with higher priority than persistent tables.
So you can safely use the following trick:

CREATE OR REPLACE FUNCTION public.disable_dml()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  RAISE EXCEPTION SQLSTATE '42P01'
     USING message = format('this instance of %I table doesn''t allow any DML operation', TG_TABLE_NAME),
           hint = format('you should to run "CREATE TEMP TABLE %1$I(LIKE %1$I INCLUDING ALL);" statement',
                         TG_TABLE_NAME);
  RETURN NULL;
END;
$function$;

CREATE TABLE foo(a int, b int); -- doesn't ever hold data

CREATE TRIGGER foo_disable_dml
   BEFORE INSERT OR UPDATE OR DELETE ON foo
   EXECUTE PROCEDURE disable_dml();

postgres=# INSERT INTO foo VALUES(10,20);
ERROR:  this instance of foo table doesn't allow any DML operation
HINT:  you should to run "CREATE TEMP TABLE foo(LIKE foo INCLUDING ALL);" statement

postgres=# CREATE TEMP TABLE foo(LIKE foo INCLUDING ALL);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(10,20);
INSERT 0 1

This trick partially emulates GLOBAL TEMP tables and it allows static validation. Another possibility is using a template foreign data wrapper (https://github.com/okbob/template_fdw).

You can also use the pragma table and create an ephemeral table:

BEGIN
  CREATE TEMP TABLE xxx(a int);
  PERFORM plpgsql_check_pragma('table: xxx(a int)');
  INSERT INTO xxx VALUES(10);

## Dependency list

The function plpgsql_show_dependency_tb can show all functions, operators and relations used inside the processed function:

postgres=# select * from plpgsql_show_dependency_tb('testfunc(int,float)');
┌──────────┬───────┬────────┬─────────┬────────────────────────────┐
│   type   │  oid  │ schema │  name   │           params           │
╞══════════╪═══════╪════════╪═════════╪════════════════════════════╡
│ FUNCTION │ 36008 │ public │ myfunc1 │ (integer,double precision) │
│ FUNCTION │ 35999 │ public │ myfunc2 │ (integer,double precision) │
│ OPERATOR │ 36007 │ public │ **      │ (integer,integer)          │
│ RELATION │ 36005 │ public │ myview  │                            │
│ RELATION │ 36002 │ public │ mytable │                            │
└──────────┴───────┴────────┴─────────┴────────────────────────────┘
(5 rows)

## Profiler

plpgsql_check contains a simple
profiler of plpgsql functions and procedures. It can work with or without access to shared memory, depending on the shared_preload_libraries config. When plpgsql_check is initialized by shared_preload_libraries, it can allocate shared memory, and function profiles are stored there. When plpgsql_check cannot allocate shared memory, the profile is stored in session memory.

Due to dependencies, shared_preload_libraries should contain plpgsql first:

postgres=# show shared_preload_libraries ;
┌──────────────────────────┐
│ shared_preload_libraries │
╞══════════════════════════╡
│ plpgsql,plpgsql_check    │
└──────────────────────────┘
(1 row)

The profiler is active when the GUC plpgsql_check.profiler is on. The profiler doesn't require shared memory, but without it the profile is limited to the active session.

When plpgsql_check is initialized by shared_preload_libraries, another GUC is available to configure the amount of shared memory used by the profiler: plpgsql_check.profiler_max_shared_chunks. This defines the maximum number of statement chunks that can be stored in shared memory. For each plpgsql function (or procedure), the whole content is split into chunks of 30 statements. If needed, multiple chunks can be used to store the whole content of a single function. A single chunk is 1704 bytes. The default value for this GUC is 15000, which should be enough for big projects containing hundreds of thousands of statements in plpgsql, and will consume about 24MB of memory. If your project doesn't require that many chunks, you can set this parameter to a smaller number to decrease the memory usage. The minimum value is 50 (which should consume about 83kB of memory), and the maximum value is 100000 (which should consume about 163MB of memory). Changing this parameter requires a PostgreSQL restart.

The profiler will also retrieve the query identifier for each instruction that contains an expression or optimizable statement.
Note that this requires pg_stat_statements (or another similar third-party extension) to be installed. There are some limitations to query identifier retrieval:

• if a plpgsql expression contains underlying statements, only the top level query identifier will be retrieved
• the profiler doesn't compute the query identifier by itself but relies on an external extension, such as pg_stat_statements. This means that, depending on the external extension's behavior, you may not be able to see a query identifier for some statements. That's for instance the case with DDL statements, as pg_stat_statements doesn't expose the query identifier for such queries.
• a query identifier is retrieved only for instructions containing expressions. This means that the plpgsql_profiler_function_tb() function can report fewer query identifiers than instructions on a single line.

Attention: An update of shared profiles can decrease performance on servers under higher load.

The profile can be displayed by the function plpgsql_profiler_function_tb:

postgres=# select lineno, avg_time, source from plpgsql_profiler_function_tb('fx(int)');
┌────────┬──────────┬───────────────────────────────────────────────────────────────────┐
│ lineno │ avg_time │                              source                               │
╞════════╪══════════╪═══════════════════════════════════════════════════════════════════╡
│      1 │          │                                                                   │
│      2 │          │ declare result int = 0;                                           │
│      3 │    0.075 │ begin                                                             │
│      4 │    0.202 │   for i in 1..$1 loop                                             │
│      5 │    0.005 │     select result + i into result; select result + i into result; │
│      6 │          │   end loop;                                                       │
│      7 │        0 │   return result;                                                  │
│      8 │          │ end;                                                              │
└────────┴──────────┴───────────────────────────────────────────────────────────────────┘
(9 rows)


The profile per statement (not per line) can be displayed by the function plpgsql_profiler_function_statements_tb:

        CREATE OR REPLACE FUNCTION public.fx1(a integer)
RETURNS integer
LANGUAGE plpgsql
1       AS $function$
2       begin
3         if a > 10 then
4           raise notice 'ahoj';
5           return -1;
6         else
7           raise notice 'nazdar';
8           return 1;
9         end if;
10      end;
11      $function$

postgres=# select stmtid, parent_stmtid, parent_note, lineno, exec_stmts, stmtname
from plpgsql_profiler_function_statements_tb('fx1');
┌────────┬───────────────┬─────────────┬────────┬────────────┬─────────────────┐
│ stmtid │ parent_stmtid │ parent_note │ lineno │ exec_stmts │    stmtname     │
╞════════╪═══════════════╪═════════════╪════════╪════════════╪═════════════════╡
│      0 │             ∅ │ ∅           │      2 │          0 │ statement block │
│      1 │             0 │ body        │      3 │          0 │ IF              │
│      2 │             1 │ then body   │      4 │          0 │ RAISE           │
│      3 │             1 │ then body   │      5 │          0 │ RETURN          │
│      4 │             1 │ else body   │      7 │          0 │ RAISE           │
│      5 │             1 │ else body   │      8 │          0 │ RETURN          │
└────────┴───────────────┴─────────────┴────────┴────────────┴─────────────────┘
(6 rows)


All stored profiles can be displayed by calling function plpgsql_profiler_functions_all:

postgres=# select * from plpgsql_profiler_functions_all();
┌───────────────────────┬────────────┬────────────┬──────────┬─────────────┬──────────┬──────────┐
│        funcoid        │ exec_count │ total_time │ avg_time │ stddev_time │ min_time │ max_time │
╞═══════════════════════╪════════════╪════════════╪══════════╪═════════════╪══════════╪══════════╡
│ fxx(double precision) │          1 │       0.01 │     0.01 │        0.00 │     0.01 │     0.01 │
└───────────────────────┴────────────┴────────────┴──────────┴─────────────┴──────────┴──────────┘
(1 row)


There are two functions for cleaning stored profiles: plpgsql_profiler_reset_all() and plpgsql_profiler_reset(regprocedure).
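
As a sketch of clearing profiles (assuming the profiled function `fx(int)` from above):

```sql
-- drop the stored profile of a single function
SELECT plpgsql_profiler_reset('fx(int)'::regprocedure);

-- drop all stored profiles
SELECT plpgsql_profiler_reset_all();
```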

## Coverage metrics

plpgsql_check provides two functions:

• plpgsql_coverage_statements(name)
• plpgsql_coverage_branches(name)
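
A usage sketch, assuming the function `fx(int)` has already been executed with plpgsql_check.profiler enabled (coverage is computed from the collected profile):

```sql
-- fraction of statements executed at least once
SELECT plpgsql_coverage_statements('fx(int)');
-- fraction of branches executed at least once
SELECT plpgsql_coverage_branches('fx(int)');
```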

## Note

There is another very good PLpgSQL profiler - https://bitbucket.org/openscg/plprofiler

My extension is designed to be simple to use and practical. Nothing more, nothing less.

plprofiler is more complex. It builds call graphs, and from this graph it can create a flame graph of execution times.

Both extensions can be used together with PostgreSQL's built-in feature - function tracking:

set track_functions to 'pl';
...
select * from pg_stat_user_functions;


## Tracer

plpgsql_check provides a tracing facility - in this mode you can see notices on entry to and exit from functions (terse and default verbosity) and on entry to and exit from statements (verbose verbosity). For default and verbose verbosity the content of function arguments is displayed. The content of related variables is displayed when verbosity is verbose.

postgres=# do $$begin perform fx(10,null, 'now', e'stěhule'); end;$$;
NOTICE:  #0 ->> start of inline_code_block (Oid=0)
NOTICE:  #2   ->> start of function fx(integer,integer,date,text) (Oid=16405)
NOTICE:  #2        call by inline_code_block line 1 at PERFORM
NOTICE:  #2       "a" => '10', "b" => null, "c" => '2020-08-03', "d" => 'stěhule'
NOTICE:  #4     ->> start of function fx(integer) (Oid=16404)
NOTICE:  #4          call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE:  #4         "a" => '10'
NOTICE:  #4     <<- end of function fx (elapsed time=0.098 ms)
NOTICE:  #2   <<- end of function fx (elapsed time=0.399 ms)
NOTICE:  #0 <<- end of block (elapsed time=0.754 ms)


The number after # is an execution frame counter (this number is related to the depth of the error context stack). It allows you to pair the start and end of a function.

Tracing is enabled by setting plpgsql_check.tracer to on. Attention - enabling this behaviour has a significant negative impact on performance (unlike the profiler). You can set the output level used by the tracer with plpgsql_check.tracer_errlevel (default is notice). The output content is limited to the length specified by the plpgsql_check.tracer_variable_max_length configuration variable.
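
A session-level setup could look like this (a sketch; the length value is illustrative, and plpgsql_check.enable_tracer needs superuser rights):

```sql
-- tracer must first be allowed by a superuser
SET plpgsql_check.enable_tracer TO on;
-- then enable and tune it for the session
SET plpgsql_check.tracer TO on;
SET plpgsql_check.tracer_errlevel TO notice;
SET plpgsql_check.tracer_variable_max_length TO 1024;
```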

With terse verbosity the output is reduced:

postgres=# set plpgsql_check.tracer_verbosity TO terse;
SET
postgres=# do $$begin perform fx(10,null, 'now', e'stěhule'); end;$$;
NOTICE:  #0 start of inline code block (oid=0)
NOTICE:  #2 start of fx (oid=16405)
NOTICE:  #4 start of fx (oid=16404)
NOTICE:  #4 end of fx
NOTICE:  #2 end of fx
NOTICE:  #0 end of inline code block


In verbose mode the output is extended with statement details:

postgres=# do $$begin perform fx(10,null, 'now', e'stěhule'); end;$$;
NOTICE:  #0            ->> start of block inline_code_block (oid=0)
NOTICE:  #0.1       1  --> start of PERFORM
NOTICE:  #2              ->> start of function fx(integer,integer,date,text) (oid=16405)
NOTICE:  #2                   call by inline_code_block line 1 at PERFORM
NOTICE:  #2                  "a" => '10', "b" => null, "c" => '2020-08-04', "d" => 'stěhule'
NOTICE:  #2.1       1    --> start of PERFORM
NOTICE:  #2.1                "a" => '10'
NOTICE:  #4                ->> start of function fx(integer) (oid=16404)
NOTICE:  #4                     call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE:  #4                    "a" => '10'
NOTICE:  #4.1       6      --> start of assignment
NOTICE:  #4.1                  "a" => '10', "b" => '20'
NOTICE:  #4.1              <-- end of assignment (elapsed time=0.076 ms)
NOTICE:  #4.1                  "res" => '130'
NOTICE:  #4.2       7      --> start of RETURN
NOTICE:  #4.2                  "res" => '130'
NOTICE:  #4.2              <-- end of RETURN (elapsed time=0.054 ms)
NOTICE:  #4                <<- end of function fx (elapsed time=0.373 ms)
NOTICE:  #2.1            <-- end of PERFORM (elapsed time=0.589 ms)
NOTICE:  #2              <<- end of function fx (elapsed time=0.727 ms)
NOTICE:  #0.1          <-- end of PERFORM (elapsed time=1.147 ms)
NOTICE:  #0            <<- end of block (elapsed time=1.286 ms)


A special feature of the tracer is tracing of the ASSERT statement when plpgsql_check.trace_assert is on. When plpgsql_check.trace_assert_verbosity is DEFAULT, all of the function's or procedure's variables are displayed when the assert expression is false. When this configuration is VERBOSE, all variables from all plpgsql frames are displayed. This behaviour is independent of the plpgsql.check_asserts value, so it can be used even when assertions are disabled in the plpgsql runtime.

postgres=# set plpgsql_check.tracer to off;
postgres=# set plpgsql_check.trace_assert_verbosity TO verbose;

postgres=# do $$begin perform fx(10,null, 'now', e'stěhule'); end;$$;
NOTICE:  #4 PLpgSQL assert expression (false) on line 12 of fx(integer) is false
NOTICE:   "a" => '10', "res" => null, "b" => '20'
NOTICE:  #2 PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
NOTICE:   "a" => '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE:  #0 PL/pgSQL function inline_code_block line 1 at PERFORM
ERROR:  assertion failed
CONTEXT:  PL/pgSQL function fx(integer) line 12 at ASSERT
SQL statement "SELECT fx(a)"
PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
SQL statement "SELECT fx(10,null, 'now', e'stěhule')"
PL/pgSQL function inline_code_block line 1 at PERFORM

postgres=# set plpgsql.check_asserts to off;
SET
postgres=# do $$begin perform fx(10,null, 'now', e'stěhule'); end;$$;
NOTICE:  #4 PLpgSQL assert expression (false) on line 12 of fx(integer) is false
NOTICE:   "a" => '10', "res" => null, "b" => '20'
NOTICE:  #2 PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
NOTICE:   "a" => '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE:  #0 PL/pgSQL function inline_code_block line 1 at PERFORM
DO


## Attention - SECURITY

The tracer prints the content of variables and function arguments. For a security definer function, this content can hold security sensitive data. This is the reason why the tracer is disabled by default and can be enabled only with superuser rights via plpgsql_check.enable_tracer.

## Pragma

You can configure plpgsql_check's behaviour inside a checked function with the "pragma" function. This is an analogy of the PRAGMA feature of the PL/SQL or Ada languages. PLpgSQL doesn't support PRAGMA, but plpgsql_check detects a function named plpgsql_check_pragma and reads options from the parameters of this function. These plpgsql_check options are valid until the end of the enclosing group of statements.

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
BEGIN
  ...
  -- disable check for the following statements
  PERFORM plpgsql_check_pragma('disable:check');
  ...
  -- enable check again
  PERFORM plpgsql_check_pragma('enable:check');
  ...
END;
$$ LANGUAGE plpgsql;


The function plpgsql_check_pragma is an immutable function that returns 1. It is defined by the plpgsql_check extension. You can declare an alternative plpgsql_check_pragma function like:

CREATE OR REPLACE FUNCTION plpgsql_check_pragma(VARIADIC args text[])
RETURNS int AS $$SELECT 1$$ LANGUAGE sql IMMUTABLE;


Using the pragma function in the declaration part of the top block sets options at the function level too.

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
DECLARE
  aux int := plpgsql_check_pragma('disable:extra_warnings');
...

A shorter syntax for pragmas is supported too:

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
DECLARE r record;
BEGIN
PERFORM 'PRAGMA:TYPE:r (a int, b int)';
PERFORM 'PRAGMA:TABLE: x (like pg_class)';
...


## Supported pragmas

• echo:str - print string (for testing)
• status:check, status:tracer, status:other_warnings, status:performance_warnings, status:extra_warnings, status:security_warnings
• enable:check, enable:tracer, enable:other_warnings, enable:performance_warnings, enable:extra_warnings, enable:security_warnings
• disable:check, disable:tracer, disable:other_warnings, disable:performance_warnings, disable:extra_warnings, disable:security_warnings
• type:varname typename or type:varname (fieldname type, ...) - set the type of a record variable
• table: name (column_name type, ...) or table: name (like tablename) - create an ephemeral table

The pragmas enable:tracer and disable:tracer are active for Postgres 12 and higher.
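
Combining the pragmas above, a dynamic-SQL function could be annotated like this (a sketch; `preview_row` and its column list are hypothetical):

```sql
CREATE OR REPLACE FUNCTION preview_row(_tablename text)
RETURNS void AS $$
DECLARE r record;
BEGIN
  EXECUTE format('SELECT * FROM %I LIMIT 1', _tablename) INTO r;
  -- tell plpgsql_check the shape of r (hypothetical columns)
  PERFORM plpgsql_check_pragma('type: r (id int, payload text)');
  RAISE NOTICE '%', r.payload;
END;
$$ LANGUAGE plpgsql;
```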

## Compilation

You need a development environment for PostgreSQL extensions:

make clean
make install


result:

[pavel@localhost plpgsql_check]$ make USE_PGXS=1 clean
rm -f plpgsql_check.so libplpgsql_check.a libplpgsql_check.pc
rm -f plpgsql_check.o
rm -rf results/ regression.diffs regression.out tmp_check/ log/
[pavel@localhost plpgsql_check]$ make USE_PGXS=1 all
clang -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -I/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -I. -I./ -I/usr/local/pgsql/include/server -I/usr/local/pgsql/include/internal -D_GNU_SOURCE   -c -o plpgsql_check.o plpgsql_check.c
clang -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -I/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -shared -o plpgsql_check.so plpgsql_check.o -L/usr/local/pgsql/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
[pavel@localhost plpgsql_check]$ su root
Password: *******
[root@localhost plpgsql_check]# make USE_PGXS=1 install
/usr/bin/mkdir -p '/usr/local/pgsql/lib'
/usr/bin/mkdir -p '/usr/local/pgsql/share/extension'
/usr/bin/mkdir -p '/usr/local/pgsql/share/extension'
/usr/bin/install -c -m 755 plpgsql_check.so '/usr/local/pgsql/lib/plpgsql_check.so'
/usr/bin/install -c -m 644 plpgsql_check.control '/usr/local/pgsql/share/extension/'
/usr/bin/install -c -m 644 plpgsql_check--0.9.sql '/usr/local/pgsql/share/extension/'
[root@localhost plpgsql_check]# exit
[pavel@localhost plpgsql_check]$ make USE_PGXS=1 installcheck
/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --psqldir='/usr/local/pgsql/bin'    --dbname=pl_regression --load-language=plpgsql --dbname=contrib_regression plpgsql_check_passive plpgsql_check_active plpgsql_check_active-9.5
(using postmaster on Unix socket, default port)
============== dropping database "contrib_regression" ==============
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql                     ==============
CREATE LANGUAGE
============== running regression test queries        ==============
test plpgsql_check_passive    ... ok
test plpgsql_check_active     ... ok
test plpgsql_check_active-9.5 ... ok

=====================
All 3 tests passed.
=====================


## Compilation on Ubuntu

Sometimes a successful compilation can require the libicu-dev package (PostgreSQL 10 and higher, when PostgreSQL was compiled with ICU support):

sudo apt install libicu-dev


## Compilation plpgsql_check on Windows

You can check precompiled dll libraries http://okbob.blogspot.cz/2015/02/plpgsqlcheck-is-available-for-microsoft.html

or compile it yourself:

4. Build plpgsql_check.dll
5. Install plugin
6. copy plpgsql_check.dll to PostgreSQL\14\lib
7. copy plpgsql_check.control and plpgsql_check--2.1.sql to PostgreSQL\14\share\extension

## Checked on

• gcc on Linux (against all supported PostgreSQL)
• clang 3.4 on Linux (against PostgreSQL 10)
• PostgreSQL 10 or higher is required for successful regression tests

Compilation against PostgreSQL 10 requires libICU!

## Licence

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Note

If you like it, send a postcard to address

Pavel Stehule
Skalice 12
256 01 Benesov u Prahy
Czech Republic


I welcome any questions, comments, bug reports, and patches at pavel.stehule@gmail.com

Author: okbob
Source Code: https://github.com/okbob/plpgsql_check


## plpgsql_check

I founded this project because I wanted to publish the code I wrote over the last two years while trying to write enhanced checking for PostgreSQL upstream. That was not fully successful - integration into upstream requires some larger plpgsql refactoring, which probably will not be done in the next few years (as of Dec 2013). But the written code is fully functional and can be used in production (and it is used in production). So I created this extension to make it available to all plpgsql developers.

If you like it and would like to join the development of this extension, register yourself in the postgresql extension hacking Google group.

## Features

• checks fields of referenced database objects and types inside embedded SQL
• validates the types of function parameters
• detects unused variables and function arguments, and unmodified OUT arguments
• partial detection of dead code (after a RETURN command)
• detection of a missing RETURN command in a function
• tries to identify unwanted hidden casts, which can be a performance issue like unused indexes
• ability to collect relations and functions used by a function
• ability to check EXECUTE statements against SQL injection vulnerability

Any ideas, patches and bug reports are welcome.

plpgsql_check is the next generation of plpgsql_lint. It allows you to check source code by explicitly calling plpgsql_check_function.

PostgreSQL 10, 11, 12, 13 and 14 are supported.

The SQL statements inside PL/pgSQL functions are checked by a validator for semantic errors. These errors can be found by plpgsql_check_function:

## Active mode

postgres=# CREATE EXTENSION plpgsql_check;
postgres=# CREATE TABLE t1(a int, b int);
CREATE TABLE

postgres=#
CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE r record;
BEGIN
FOR r IN SELECT * FROM t1
LOOP
RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
END LOOP;
END;
$function$;

CREATE FUNCTION

postgres=# select f1(); -- execution doesn't find a bug due to empty table t1
f1
────

(1 row)

postgres=# \x
Expanded display is on.
postgres=# select * from plpgsql_check_function_tb('f1()');
─[ RECORD 1 ]───────────────────────────
functionid │ f1
lineno     │ 6
statement  │ RAISE
sqlstate   │ 42703
message    │ record "r" has no field "c"
detail     │ [null]
hint       │ [null]
level      │ error
position   │ 0
query      │ [null]

postgres=# \sf+ f1
CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
1       AS $function$
2       DECLARE r record;
3       BEGIN
4         FOR r IN SELECT * FROM t1
5         LOOP
6           RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
7         END LOOP;
8       END;
9       $function$


The function plpgsql_check_function() has three possible output formats: text, json or xml.

select * from plpgsql_check_function('f1()', fatal_errors := false);
plpgsql_check_function
------------------------------------------------------------------------
error:42703:4:SQL statement:column "c" of relation "t1" does not exist
Query: update t1 set c = 30
--                   ^
error:42P01:7:RAISE:missing FROM-clause entry for table "r"
Query: SELECT r.c
--            ^
error:42601:7:RAISE:too few parameters specified for RAISE
(7 rows)

postgres=# select * from plpgsql_check_function('fx()', format:='xml');
plpgsql_check_function
────────────────────────────────────────────────────────────────
<Function oid="16400">                                        ↵
<Issue>                                                     ↵
<Level>error</Level>                                      ↵
<Sqlstate>42P01</Sqlstate>                                ↵
<Message>relation "foo111" does not exist</Message>       ↵
<Stmt lineno="3">RETURN</Stmt>                            ↵
<Query position="23">SELECT (select a from foo111)</Query>↵
</Issue>                                                    ↵
</Function>
(1 row)



## Triggers

When you want to check a trigger, you have to supply the relation that will be used together with the trigger function:

CREATE TABLE bar(a int, b int);

postgres=# \sf+ foo_trg
CREATE OR REPLACE FUNCTION public.foo_trg()
RETURNS trigger
LANGUAGE plpgsql
1       AS $function$
2       BEGIN
3         NEW.c := NEW.a + NEW.b;
4         RETURN NEW;
5       END;
6       $function$


Missing relation specification

postgres=# select * from plpgsql_check_function('foo_trg()');
ERROR:  missing trigger relation
HINT:  Trigger relation oid must be valid


Correct trigger checking (with specified relation)

postgres=# select * from plpgsql_check_function('foo_trg()', 'bar');
plpgsql_check_function
--------------------------------------------------------
error:42703:3:assignment:record "new" has no field "c"
(1 row)


For triggers with transition tables you can set the oldtable or newtable parameters:

create or replace function footab_trig_func()
returns trigger as $$
declare x int;
begin
  if false then
    -- should be ok;
    select count(*) from newtab into x;
    -- should fail;
    select count(*) from newtab where d = 10 into x;
  end if;
  return null;
end;
$$ language plpgsql;

select * from plpgsql_check_function('footab_trig_func','footab', newtable := 'newtab');


## Mass check

You can use plpgsql_check_function to mass-check functions and mass-check triggers. Try the following queries:

-- check all nontrigger plpgsql functions
SELECT p.oid, p.proname, plpgsql_check_function(p.oid)
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
JOIN pg_catalog.pg_language l ON p.prolang = l.oid
WHERE l.lanname = 'plpgsql' AND p.prorettype <> 2279;


or

SELECT p.proname, tgrelid::regclass, cf.*
FROM pg_proc p
JOIN pg_trigger t ON t.tgfoid = p.oid
JOIN pg_language l ON p.prolang = l.oid
JOIN pg_namespace n ON p.pronamespace = n.oid,
LATERAL plpgsql_check_function(p.oid, t.tgrelid) cf
WHERE n.nspname = 'public' and l.lanname = 'plpgsql'


or

-- check all plpgsql functions (functions or trigger functions with defined triggers)
SELECT
(pcf).functionid::regprocedure, (pcf).lineno, (pcf).statement,
(pcf).sqlstate, (pcf).message, (pcf).detail, (pcf).hint, (pcf).level,
(pcf)."position", (pcf).query, (pcf).context
FROM
(
SELECT
plpgsql_check_function_tb(pg_proc.oid, COALESCE(pg_trigger.tgrelid, 0)) AS pcf
FROM pg_proc
LEFT JOIN pg_trigger
ON (pg_trigger.tgfoid = pg_proc.oid)
WHERE
prolang = (SELECT lang.oid FROM pg_language lang WHERE lang.lanname = 'plpgsql') AND
pronamespace <> (SELECT nsp.oid FROM pg_namespace nsp WHERE nsp.nspname = 'pg_catalog') AND
-- ignore unused triggers
(pg_proc.prorettype <> (SELECT typ.oid FROM pg_type typ WHERE typ.typname = 'trigger') OR
pg_trigger.tgfoid IS NOT NULL)
OFFSET 0
) ss
ORDER BY (pcf).functionid::regprocedure::text, (pcf).lineno


## Passive mode

In passive mode, functions are checked when they are started - the plpgsql_check module must be loaded.

## Configuration

plpgsql_check.mode = [ disabled | by_function | fresh_start | every_start ]
plpgsql_check.fatal_errors = [ yes | no ]

plpgsql_check.show_nonperformance_warnings = false
plpgsql_check.show_performance_warnings = false


The default mode is by_function, which means that the enhanced check is done only in active mode - by plpgsql_check_function. fresh_start means a cold start.

You can enable passive mode by

load 'plpgsql'; -- 1.1 and higher doesn't need it
set plpgsql_check.mode = 'every_start';

SELECT fx(10); -- run functions - function is checked before runtime starts it


## Limits

plpgsql_check should find almost all errors in really static code. When a developer uses some of PL/pgSQL's dynamic features, like dynamic SQL or the record data type, false positives are possible. These should be rare in well written code - when they occur, the affected function should be redesigned or plpgsql_check should be disabled for that function.

CREATE OR REPLACE FUNCTION f1()
RETURNS void AS $$
DECLARE r record;
BEGIN
  FOR r IN EXECUTE 'SELECT * FROM t1'
  LOOP
    RAISE NOTICE '%', r.c;
  END LOOP;
END;
$$ LANGUAGE plpgsql SET plpgsql.enable_check TO false;


Using plpgsql_check adds a small overhead (when passive mode is enabled), so you should use it only in development or preproduction environments.

## Dynamic SQL

This module doesn't check queries that are assembled at runtime. It is not possible to identify the results of dynamic queries, so plpgsql_check cannot set the correct type for record variables and cannot check dependent SQL statements and expressions.

When the type of a record variable is not known, you can assign it explicitly with the type pragma:

DECLARE r record;
BEGIN
EXECUTE format('SELECT * FROM %I', _tablename) INTO r;
PERFORM plpgsql_check_pragma('type: r (id int, processed bool)');
IF NOT r.processed THEN
...


Attention: The SQL injection check can detect only some SQL injection vulnerabilities. This tool cannot be used for a security audit! Some issues may not be detected. This check can also raise false alarms - for example when a variable is sanitized by another command or when the value is of some composite type.

## Refcursors

plpgsql_check cannot detect the structure of referenced cursors. A reference to a cursor in PL/pgSQL is implemented as the name of a global cursor. At check time the name is not known (not in all cases), and the global cursor doesn't exist. This is a significant obstacle for any static analysis. PL/pgSQL cannot set the correct type for record variables and cannot check dependent SQL statements and expressions. The solution is the same as for dynamic SQL: don't use a record variable as a target when you use the refcursor type, or disable plpgsql_check for these functions.

CREATE OR REPLACE FUNCTION foo(refcur_var refcursor)
RETURNS void AS $$
DECLARE
  rec_var record;
BEGIN
  FETCH refcur_var INTO rec_var; -- this is STOP for plpgsql_check
  RAISE NOTICE '%', rec_var; -- record rec_var is not assigned yet error

In this case a record type should not be used (use a known rowtype instead):

CREATE OR REPLACE FUNCTION foo(refcur_var refcursor)
RETURNS void AS $$
DECLARE
  rec_var some_rowtype;
BEGIN
  FETCH refcur_var INTO rec_var;
  RAISE NOTICE '%', rec_var;


## Temporary tables

plpgsql_check cannot verify queries over temporary tables that are created in plpgsql's function runtime. For this use case it is necessary to create a fake temp table or disable plpgsql_check for this function.

In reality, temp tables are stored in their own (per user) schema with higher priority than persistent tables. So you can safely use the following trick:

CREATE OR REPLACE FUNCTION public.disable_dml()
RETURNS trigger
LANGUAGE plpgsql AS $function$
BEGIN
RAISE EXCEPTION SQLSTATE '42P01'
USING message = format('this instance of %I table doesn''t allow any DML operation', TG_TABLE_NAME),
hint = format('you should to run "CREATE TEMP TABLE %1$I(LIKE %1$I INCLUDING ALL);" statement',
TG_TABLE_NAME);
RETURN NULL;
END;
$function$;

CREATE TABLE foo(a int, b int); -- doesn't hold data ever
CREATE TRIGGER foo_disable_dml
BEFORE INSERT OR UPDATE OR DELETE ON foo
EXECUTE PROCEDURE disable_dml();

postgres=# INSERT INTO  foo VALUES(10,20);
ERROR:  this instance of foo table doesn't allow any DML operation
HINT:  you should to run "CREATE TEMP TABLE foo(LIKE foo INCLUDING ALL);" statement
postgres=# CREATE TEMP TABLE foo(LIKE foo INCLUDING ALL);
CREATE TABLE
postgres=# INSERT INTO  foo VALUES(10,20);
INSERT 0 1


This trick partially emulates GLOBAL TEMP tables and allows static validation. Another possibility is using a [template foreign data wrapper](https://github.com/okbob/template_fdw).

You can also use the table pragma and create an ephemeral table:

BEGIN
CREATE TEMP TABLE xxx(a int);
PERFORM plpgsql_check_pragma('table: xxx(a int)');
INSERT INTO xxx VALUES(10);


## Dependency list

The function plpgsql_show_dependency_tb can show all functions, operators and relations used inside the processed function:

postgres=# select * from plpgsql_show_dependency_tb('testfunc(int,float)');
┌──────────┬───────┬────────┬─────────┬────────────────────────────┐
│   type   │  oid  │ schema │  name   │           params           │
╞══════════╪═══════╪════════╪═════════╪════════════════════════════╡
│ FUNCTION │ 36008 │ public │ myfunc1 │ (integer,double precision) │
│ FUNCTION │ 35999 │ public │ myfunc2 │ (integer,double precision) │
│ OPERATOR │ 36007 │ public │ **      │ (integer,integer)          │
│ RELATION │ 36005 │ public │ myview  │                            │
│ RELATION │ 36002 │ public │ mytable │                            │
└──────────┴───────┴────────┴─────────┴────────────────────────────┘
(5 rows)


## Profiler

plpgsql_check contains a simple profiler of plpgsql functions and procedures. It can work with or without access to shared memory, depending on the shared_preload_libraries config. When plpgsql_check is initialized by shared_preload_libraries, it can allocate shared memory, and function profiles are stored there. When plpgsql_check cannot allocate shared memory, the profile is stored in session memory.

Due to dependencies, shared_preload_libraries should contain plpgsql first:

postgres=# show shared_preload_libraries ;
┌──────────────────────────┐
│ shared_preload_libraries │
╞══════════════════════════╡
│ plpgsql,plpgsql_check    │
└──────────────────────────┘
(1 row)


The profiler is active when the GUC plpgsql_check.profiler is on. The profiler doesn't require shared memory, but if there is no shared memory, then the profile is limited just to the active session.

When plpgsql_check is initialized by shared_preload_libraries, another GUC is available to configure the amount of shared memory used by the profiler: plpgsql_check.profiler_max_shared_chunks. This defines the maximum number of statement chunks that can be stored in shared memory. For each plpgsql function (or procedure), the whole content is split into chunks of 30 statements. If needed, multiple chunks can be used to store the whole content of a single function. A single chunk is 1704 bytes. The default value for this GUC is 15000, which should be enough for big projects containing hundreds of thousands of statements in plpgsql, and will consume about 24MB of memory. If your project doesn't require that many chunks, you can set this parameter to a smaller number in order to decrease the memory usage. The minimum value is 50 (which should consume about 83kB of memory), and the maximum value is 100000 (which should consume about 163MB of memory). Changing this parameter requires a PostgreSQL restart.
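As a sketch, a postgresql.conf fragment enabling the shared profiler with a reduced chunk count could look like this (the chosen values are illustrative, taken from the limits described above):

```ini
# load plpgsql first, then plpgsql_check
shared_preload_libraries = 'plpgsql,plpgsql_check'

# enable the profiler
plpgsql_check.profiler = on

# minimum allowed value; about 83kB of shared memory for profiles
plpgsql_check.profiler_max_shared_chunks = 50
```

Remember that changing profiler_max_shared_chunks takes effect only after a PostgreSQL restart.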

The profiler will also retrieve the query identifier for each instruction that contains an expression or optimizable statement. Note that this requires pg_stat_statements (or another similar third-party extension) to be installed. There are some limitations to the query identifier retrieval:

• if a plpgsql expression contains underlying statements, only the top level query identifier will be retrieved
• the profiler doesn't compute the query identifier by itself but relies on an external extension, such as pg_stat_statements, for that. This means that depending on the external extension's behavior, you may not be able to see a query identifier for some statements. That's for instance the case with DDL statements, as pg_stat_statements doesn't expose the query identifier for such queries.
• a query identifier is retrieved only for instructions containing expressions. This means that the plpgsql_profiler_function_tb() function can report fewer query identifiers than instructions on a single line.

Attention: An update of shared profiles can decrease performance on servers under higher load.

The profile can be displayed by function plpgsql_profiler_function_tb:

postgres=# select lineno, avg_time, source from plpgsql_profiler_function_tb('fx(int)');
┌────────┬──────────┬───────────────────────────────────────────────────────────────────┐
│ lineno │ avg_time │                              source                               │
╞════════╪══════════╪═══════════════════════════════════════════════════════════════════╡
│      1 │          │                                                                   │
│      2 │          │ declare result int = 0;                                           │
│      3 │    0.075 │ begin                                                             │
│      4 │    0.202 │   for i in 1..$1 loop                                             │
│      5 │    0.005 │     select result + i into result; select result + i into result; │
│      6 │          │   end loop;                                                       │
│      7 │        0 │   return result;                                                  │
│      8 │          │ end;                                                              │
└────────┴──────────┴───────────────────────────────────────────────────────────────────┘
(9 rows)


The profile per statements (not per line) can be displayed by the function plpgsql_profiler_function_statements_tb:

CREATE OR REPLACE FUNCTION public.fx1(a integer)
RETURNS integer
LANGUAGE plpgsql
1       AS $function$
2       begin
3         if a > 10 then
4           raise notice 'ahoj';
5           return -1;
6         else
7           raise notice 'nazdar';
8           return 1;
9         end if;
10      end;
11      $function$

postgres=# select stmtid, parent_stmtid, parent_note, lineno, exec_stmts, stmtname from plpgsql_profiler_function_statements_tb('fx1');
┌────────┬───────────────┬─────────────┬────────┬────────────┬─────────────────┐
│ stmtid │ parent_stmtid │ parent_note │ lineno │ exec_stmts │    stmtname     │
╞════════╪═══════════════╪═════════════╪════════╪════════════╪═════════════════╡
│      0 │             ∅ │ ∅           │      2 │          0 │ statement block │
│      1 │             0 │ body        │      3 │          0 │ IF              │
│      2 │             1 │ then body   │      4 │          0 │ RAISE           │
│      3 │             1 │ then body   │      5 │          0 │ RETURN          │
│      4 │             1 │ else body   │      7 │          0 │ RAISE           │
│      5 │             1 │ else body   │      8 │          0 │ RETURN          │
└────────┴───────────────┴─────────────┴────────┴────────────┴─────────────────┘
(6 rows)


All stored profiles can be displayed by calling the function plpgsql_profiler_functions_all:

postgres=# select * from plpgsql_profiler_functions_all();
┌───────────────────────┬────────────┬────────────┬──────────┬─────────────┬──────────┬──────────┐
│        funcoid        │ exec_count │ total_time │ avg_time │ stddev_time │ min_time │ max_time │
╞═══════════════════════╪════════════╪════════════╪══════════╪═════════════╪══════════╪══════════╡
│ fxx(double precision) │          1 │       0.01 │     0.01 │        0.00 │     0.01 │     0.01 │
└───────────────────────┴────────────┴────────────┴──────────┴─────────────┴──────────┴──────────┘
(1 row)


There are two functions for cleaning stored profiles:
plpgsql_profiler_reset_all() and plpgsql_profiler_reset(regprocedure).

## Coverage metrics

plpgsql_check provides two functions:

• plpgsql_coverage_statements(name)
• plpgsql_coverage_branches(name)

## Note

There is another very good PL/pgSQL profiler - https://bitbucket.org/openscg/plprofiler

My extension is designed to be simple to use and practical. Nothing more or less.

plprofiler is more complex. It builds call graphs, and from this graph it can create flame graphs of execution times.

Both extensions can be used together with the built-in PostgreSQL feature - function tracking:

set track_functions to 'pl';
...
select * from pg_stat_user_functions;

## Tracer

plpgsql_check provides a tracing possibility - in this mode you can see notices on the start or end of functions (terse and default verbosity) and the start or end of statements (verbose verbosity). For default and verbose verbosity the content of function arguments is displayed. The content of related variables is displayed when verbosity is verbose.

postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #0 ->> start of inline_code_block (Oid=0)
NOTICE: #2 ->> start of function fx(integer,integer,date,text) (Oid=16405)
NOTICE: #2 call by inline_code_block line 1 at PERFORM
NOTICE: #2 "a" => '10', "b" => null, "c" => '2020-08-03', "d" => 'stěhule'
NOTICE: #4 ->> start of function fx(integer) (Oid=16404)
NOTICE: #4 call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: #4 "a" => '10'
NOTICE: #4 <<- end of function fx (elapsed time=0.098 ms)
NOTICE: #2 <<- end of function fx (elapsed time=0.399 ms)
NOTICE: #0 <<- end of block (elapsed time=0.754 ms)


The number after # is an execution frame counter (this number is related to the depth of the error context stack). It allows pairing the start and end of a function.

Tracing is enabled by setting plpgsql_check.tracer to on. Attention - enabling this behaviour has a significant negative impact on performance (unlike the profiler).
You can set the level for output used by the tracer with plpgsql_check.tracer_errlevel (default is notice). The output content is limited to the length specified by the plpgsql_check.tracer_variable_max_length configuration variable.

In terse verbosity mode the output is reduced:

postgres=# set plpgsql_check.tracer_verbosity TO terse;
SET
postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #0 start of inline code block (oid=0)
NOTICE: #2 start of fx (oid=16405)
NOTICE: #4 start of fx (oid=16404)
NOTICE: #4 end of fx
NOTICE: #2 end of fx
NOTICE: #0 end of inline code block


In verbose mode the output is extended with statement details:

postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #0 ->> start of block inline_code_block (oid=0)
NOTICE: #0.1 1 --> start of PERFORM
NOTICE: #2 ->> start of function fx(integer,integer,date,text) (oid=16405)
NOTICE: #2 call by inline_code_block line 1 at PERFORM
NOTICE: #2 "a" => '10', "b" => null, "c" => '2020-08-04', "d" => 'stěhule'
NOTICE: #2.1 1 --> start of PERFORM
NOTICE: #2.1 "a" => '10'
NOTICE: #4 ->> start of function fx(integer) (oid=16404)
NOTICE: #4 call by fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: #4 "a" => '10'
NOTICE: #4.1 6 --> start of assignment
NOTICE: #4.1 "a" => '10', "b" => '20'
NOTICE: #4.1 <-- end of assignment (elapsed time=0.076 ms)
NOTICE: #4.1 "res" => '130'
NOTICE: #4.2 7 --> start of RETURN
NOTICE: #4.2 "res" => '130'
NOTICE: #4.2 <-- end of RETURN (elapsed time=0.054 ms)
NOTICE: #4 <<- end of function fx (elapsed time=0.373 ms)
NOTICE: #2.1 <-- end of PERFORM (elapsed time=0.589 ms)
NOTICE: #2 <<- end of function fx (elapsed time=0.727 ms)
NOTICE: #0.1 <-- end of PERFORM (elapsed time=1.147 ms)
NOTICE: #0 <<- end of block (elapsed time=1.286 ms)


A special feature of the tracer is tracing of the ASSERT statement when plpgsql_check.trace_assert is on.
When plpgsql_check.trace_assert_verbosity is DEFAULT, all of the function's or procedure's variables are displayed when the assert expression is false. When this configuration is VERBOSE, all variables from all plpgsql frames are displayed. This behaviour is independent of the plpgsql.check_asserts value, so it can be used even when assertions are disabled in the plpgsql runtime.

postgres=# set plpgsql_check.tracer to off;
postgres=# set plpgsql_check.trace_assert_verbosity TO verbose;

postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #4 PLpgSQL assert expression (false) on line 12 of fx(integer) is false
NOTICE: "a" => '10', "res" => null, "b" => '20'
NOTICE: #2 PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: "a" => '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE: #0 PL/pgSQL function inline_code_block line 1 at PERFORM
ERROR:  assertion failed
CONTEXT:  PL/pgSQL function fx(integer) line 12 at ASSERT
SQL statement "SELECT fx(a)"
PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
SQL statement "SELECT fx(10,null, 'now', e'stěhule')"
PL/pgSQL function inline_code_block line 1 at PERFORM

postgres=# set plpgsql.check_asserts to off;
SET
postgres=# do $$ begin perform fx(10,null, 'now', e'stěhule'); end; $$;
NOTICE: #4 PLpgSQL assert expression (false) on line 12 of fx(integer) is false
NOTICE: "a" => '10', "res" => null, "b" => '20'
NOTICE: #2 PL/pgSQL function fx(integer,integer,date,text) line 1 at PERFORM
NOTICE: "a" => '10', "b" => null, "c" => '2020-08-05', "d" => 'stěhule'
NOTICE: #0 PL/pgSQL function inline_code_block line 1 at PERFORM
DO


## Attention - SECURITY

The tracer prints the content of variables or function arguments. For a security definer function, this content can hold security sensitive data. This is the reason why the tracer is disabled by default and should be enabled only with superuser rights via plpgsql_check.enable_tracer.
## Pragma

You can configure plpgsql_check behaviour inside a checked function with the "pragma" function. This is an analogy of the PRAGMA feature of the PL/SQL or Ada languages. PL/pgSQL doesn't support PRAGMA, but plpgsql_check detects a function named plpgsql_check_pragma and reads options from the parameters of this function. These plpgsql_check options are valid until the end of the group of statements.

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
BEGIN
  ...
  -- for following statements disable check
  PERFORM plpgsql_check_pragma('disable:check');
  ...
  -- enable check again
  PERFORM plpgsql_check_pragma('enable:check');
  ...
END;
$$ LANGUAGE plpgsql;


The function plpgsql_check_pragma is an immutable function that returns one. It is defined by the plpgsql_check extension. You can declare an alternative plpgsql_check_pragma function like:

CREATE OR REPLACE FUNCTION plpgsql_check_pragma(VARIADIC args text[])
RETURNS int AS $$
SELECT 1
$$ LANGUAGE sql IMMUTABLE;


Using the pragma function in the declaration part of the top block sets options on the function level too.

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
DECLARE
  aux int := plpgsql_check_pragma('disable:extra_warnings');
  ...


Shorter syntax for pragma is supported too:

CREATE OR REPLACE FUNCTION test()
RETURNS void AS $$
DECLARE r record;
BEGIN
  PERFORM 'PRAGMA:TYPE:r (a int, b int)';
  PERFORM 'PRAGMA:TABLE: x (like pg_class)';
  ...


## Supported pragmas

echo:str - print string (for testing)

status:check, status:tracer, status:other_warnings, status:performance_warnings, status:extra_warnings, status:security_warnings

enable:check, enable:tracer, enable:other_warnings, enable:performance_warnings, enable:extra_warnings, enable:security_warnings

disable:check, disable:tracer, disable:other_warnings, disable:performance_warnings, disable:extra_warnings, disable:security_warnings

type:varname typename or type:varname (fieldname type, ...) - set the type of a record-typed variable

table: name (column_name type, ...)
or table: name (like tablename) - create an ephemeral table

Pragmas enable:tracer and disable:tracer are active for Postgres 12 and higher.

## Compilation

You need a development environment for PostgreSQL extensions:

make clean
make install

result:

[pavel@localhost plpgsql_check]$ make USE_PGXS=1 clean
rm -f plpgsql_check.so   libplpgsql_check.a  libplpgsql_check.pc
rm -f plpgsql_check.o
rm -rf results/ regression.diffs regression.out tmp_check/ log/
[pavel@localhost plpgsql_check]$ make USE_PGXS=1 all
clang -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -I/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -I. -I./ -I/usr/local/pgsql/include/server -I/usr/local/pgsql/include/internal -D_GNU_SOURCE -c -o plpgsql_check.o plpgsql_check.c
clang -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -I/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/pl/plpgsql/src -shared -o plpgsql_check.so plpgsql_check.o -L/usr/local/pgsql/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
[pavel@localhost plpgsql_check]$ su root
[root@localhost plpgsql_check]# make USE_PGXS=1 install
/usr/bin/mkdir -p '/usr/local/pgsql/lib'
/usr/bin/mkdir -p '/usr/local/pgsql/share/extension'
/usr/bin/mkdir -p '/usr/local/pgsql/share/extension'
/usr/bin/install -c -m 755  plpgsql_check.so '/usr/local/pgsql/lib/plpgsql_check.so'
/usr/bin/install -c -m 644 plpgsql_check.control '/usr/local/pgsql/share/extension/'
/usr/bin/install -c -m 644 plpgsql_check--0.9.sql '/usr/local/pgsql/share/extension/'
[root@localhost plpgsql_check]# exit
[pavel@localhost plpgsql_check]$ make USE_PGXS=1 installcheck
/usr/local/pgsql/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --psqldir='/usr/local/pgsql/bin' --dbname=pl_regression --load-language=plpgsql --dbname=contrib_regression plpgsql_check_passive plpgsql_check_active plpgsql_check_active-9.5
(using postmaster on Unix socket, default port)
============== dropping database "contrib_regression" ==============
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql ==============
CREATE LANGUAGE
============== running regression test queries ==============
test plpgsql_check_passive ... ok
test plpgsql_check_active ... ok
test plpgsql_check_active-9.5 ... ok

=====================
All 3 tests passed.
=====================


## Compilation on Ubuntu

Sometimes successful compilation can require the libicu-dev package (PostgreSQL 10 and higher - when pg was compiled with ICU support):

sudo apt install libicu-dev

## Compilation of plpgsql_check on Windows

You can check precompiled dll libraries http://okbob.blogspot.cz/2015/02/plpgsqlcheck-is-available-for-microsoft.html

or compile it yourself:

1. Download and install PostgreSQL for Win32 from http://www.enterprisedb.com
2. Download and install Microsoft Visual C++ Express
3. Learn the tutorial http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows
4. Build plpgsql_check.dll
5. Install the plugin
6. copy plpgsql_check.dll to PostgreSQL\14\lib
7. copy plpgsql_check.control and plpgsql_check--2.1.sql to PostgreSQL\14\share\extension

## Checked on

• gcc on Linux (against all supported PostgreSQL)
• clang 3.4 on Linux (against PostgreSQL 10)
• for successful regress tests, PostgreSQL 10 or higher is required

Compilation against PostgreSQL 10 requires libICU!
## Licence

Copyright (c) Pavel Stehule (pavel.stehule@gmail.com)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Note

If you like it, send a postcard to address

Pavel Stehule
Skalice 12
256 01 Benesov u Prahy
Czech Republic

I invite any questions, comments, bug reports, patches on mail address pavel.stehule@gmail.com

Author: okbob
Source Code: https://github.com/okbob/plpgsql_check
License: View license

## Substrate Parachain Template: A New Cumulus-based Substrate Node

## Substrate Cumulus Parachain Template

A new Cumulus-based Substrate node, ready for hacking :cloud:

This project is a fork of the Substrate Node Template modified to include dependencies required for registering this node as a parathread or parachain to an established relay chain.

👉 Learn more about parachains here, and parathreads here.
## Build & Run

Follow these steps to prepare a local Substrate development environment :hammer_and_wrench:

### Setup of Machine

If necessary, refer to the setup instructions at the Substrate Developer Hub.

### Build

Once the development environment is set up, build the Cumulus Parachain Template. This command will build the Wasm Runtime and native code:

cargo build --release

## Relay Chain

NOTE: In the following two sections, we document how to manually start a few relay chain nodes, start a parachain node (collator), and register the parachain with the relay chain.

We also have the polkadot-launch CLI tool that automates the following steps and helps you easily launch relay chains and parachains. However, it is still good to go through the following procedures once to understand the mechanism for running and registering a parachain.

To operate a parathread or parachain, you must connect to a relay chain. Typically you would test on a local Rococo development network, then move to the testnet, and finally launch on the mainnet. Keep in mind you need to configure the specific relay chain you will connect to in your collator chain_spec.rs. In the following examples, we will use rococo-local as the relay network.

### Build Relay Chain

Clone and build Polkadot (beware of the version tag we used):

# Get a fresh clone, or cd to where you have polkadot already:
git clone -b v0.9.7 --depth 1 https://github.com/paritytech/polkadot.git
cd polkadot
cargo build --release

### Generate the Relay Chain Chainspec

First, we create the chain specification file (chainspec). Note the chainspec file must be generated on a single node and then shared among all nodes!

👉 Learn more about chain specification here.

./target/release/polkadot build-spec \
--chain rococo-local \
--raw \
--disable-default-bootnode \
> rococo_local.json

### Start Relay Chain

We need n + 1 full validator nodes running on a relay chain to accept n parachain / parathread connections.
Here we will start two relay chain nodes so we can have one parachain node connecting in later.

From the Polkadot working directory:

# Start Relay Alice node
./target/release/polkadot \
--chain ./rococo_local.json \
-d /tmp/relay/alice \
--validator \
--alice \
--port 50555

Open a new terminal, same directory:

# Start Relay Bob node
./target/release/polkadot \
--chain ./rococo_local.json \
-d /tmp/relay/bob \
--validator \
--bob \
--port 50556

Add more nodes as needed, with non-conflicting ports, DB directories, and validator keys (--charlie, --dave, etc.).

### Reserve a ParaID

To connect to a relay chain, you must first reserve a ParaId for your parathread that will become a parachain. To do this, you will need a sufficient amount of currency on the network account to reserve the ID.

In this example, we will use the Charlie development account, where we have funds available. Once you submit this extrinsic successfully, you can start your collators.

The easiest way to reserve your ParaId is via the Polkadot Apps UI under the Parachains -> Parathreads tab, using the + ParaID button.

## Parachain

### Select the Correct Relay Chain

To operate your parachain, you need to specify the correct relay chain you will connect to in your collator chain_spec.rs. Specifically, you pass the command for the network you need in the Extensions of your ChainSpec::from_genesis() in the code.

Extensions {
    relay_chain: "rococo-local".into(), // You MUST set this to the correct network!
    para_id: id.into(),
},

You can choose from any pre-set runtime chainspec in the Polkadot repo by referring to the cli/src/command.rs and node/service/src/chain_spec.rs files, or generate your own and use that. See the Cumulus Workshop for how.

In the following examples, we will use the rococo-local relay network we set up in the last section.

### Export the Parachain Genesis and Runtime

We first generate the genesis state and genesis wasm needed for the parachain registration.
# Build the parachain node (from its top level dir)
cd substrate-parachain-template
cargo build --release

# Folder to store resource files needed for parachain registration
mkdir -p resources

# Build the chainspec
./target/release/parachain-collator build-spec \
--disable-default-bootnode > ./resources/template-local-plain.json

# Build the raw chainspec file
./target/release/parachain-collator build-spec \
--chain=./resources/template-local-plain.json \
--raw --disable-default-bootnode > ./resources/template-local-raw.json

# Export genesis state to ./resources, using 2000 as the ParaId
./target/release/parachain-collator export-genesis-state --parachain-id 2000 > ./resources/para-2000-genesis

# Export the genesis wasm
./target/release/parachain-collator export-genesis-wasm > ./resources/para-2000-wasm

NOTE: we have set the ParaId to be 2000 here. This must be unique for all parathreads/chains on the relay chain you register with. You must reserve this first on the relay chain for the testnet or mainnet.

### Start a Parachain Node (Collator)

From the parachain template working directory:

# NOTE: this command assumes the chain spec is in a directory named polkadot
# that is at the same level of the template working directory. Change as needed.
#
# It also assumes a ParaId of 2000. Change as needed.
./target/release/parachain-collator \ -d /tmp/parachain/alice \ --collator \ --alice \ --force-authoring \ --ws-port 9945 \ --parachain-id 2000 \ -- \ --execution wasm \ --chain ../polkadot/rococo_local.json  Output: 2021-05-30 16:57:39 Parachain Collator Template 2021-05-30 16:57:39 ✌️ version 3.0.0-acce183-x86_64-linux-gnu 2021-05-30 16:57:39 ❤️ by Anonymous, 2017-2021 2021-05-30 16:57:39 📋 Chain specification: Local Testnet 2021-05-30 16:57:39 🏷 Node name: Alice 2021-05-30 16:57:39 👤 Role: AUTHORITY 2021-05-30 16:57:39 💾 Database: RocksDb at /tmp/parachain/alice/chains/local_testnet/db 2021-05-30 16:57:39 ⛓ Native runtime: template-parachain-1 (template-parachain-0.tx1.au1) 2021-05-30 16:57:41 Parachain id: Id(2000) 2021-05-30 16:57:41 Parachain Account: 5Ec4AhPUwPeyTFyuhGuBbD224mY85LKLMSqSSo33JYWCazU4 2021-05-30 16:57:41 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000000a96f42b5cb798190e5f679bb16970905087a9a9fc612fb5ca6b982b85783c0d03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400 2021-05-30 16:57:41 Is collating: yes 2021-05-30 16:57:41 [Parachain] 🔨 Initializing Genesis block/state (state: 0x0a96…3c0d, header-hash: 0xd42b…f271) 2021-05-30 16:57:41 [Parachain] ⏱ Loaded block-time = 12s from block 0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271 2021-05-30 16:57:43 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xace1…1b62, header-hash: 0xfa68…cf58) 2021-05-30 16:57:43 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup. 2021-05-30 16:57:44 [Relaychain] ⏱ Loaded block-time = 6s from block 0xfa68f5abd2a80394b87c9bd07e0f4eee781b8c696d0a22c8e5ba38ae10e1cf58 2021-05-30 16:57:44 [Relaychain] 👶 Creating empty BABE epoch changes on what appears to be first startup. 
2021-05-30 16:57:44 [Relaychain] 🏷 Local node identity is: 12D3KooWBjYK2W4dsBfsrFA9tZCStb5ogPb6STQqi2AK9awXfXyG 2021-05-30 16:57:44 [Relaychain] 📦 Highest known block at #0 2021-05-30 16:57:44 [Relaychain] 〽️ Prometheus server started at 127.0.0.1:9616 2021-05-30 16:57:44 [Relaychain] Listening for new connections on 127.0.0.1:9945. 2021-05-30 16:57:44 [Parachain] Using default protocol ID "sup" because none is configured in the chain specs 2021-05-30 16:57:44 [Parachain] 🏷 Local node identity is: 12D3KooWADBSC58of6ng2M29YTDkmWCGehHoUZhsy9LGkHgYscBw 2021-05-30 16:57:44 [Parachain] 📦 Highest known block at #0 2021-05-30 16:57:44 [Parachain] Unable to listen on 127.0.0.1:9945 2021-05-30 16:57:44 [Parachain] Unable to bind RPC server to 127.0.0.1:9945. Trying random port. 2021-05-30 16:57:44 [Parachain] Listening for new connections on 127.0.0.1:45141. 2021-05-30 16:57:45 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.42.204/tcp/30334/ws/p2p/12D3KooWBjYK2W4dsBfsrFA9tZCStb5ogPb6STQqi2AK9awXfXyG 2021-05-30 16:57:45 [Parachain] 🔍 Discovered new external address for our node: /ip4/192.168.42.204/tcp/30333/p2p/12D3KooWADBSC58of6ng2M29YTDkmWCGehHoUZhsy9LGkHgYscBw 2021-05-30 16:57:48 [Relaychain] ✨ Imported #8 (0xe60b…9b0a) 2021-05-30 16:57:49 [Relaychain] 💤 Idle (2 peers), best: #8 (0xe60b…9b0a), finalized #5 (0x1e6f…567c), ⬇ 4.5kiB/s ⬆ 2.2kiB/s 2021-05-30 16:57:49 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 2.0kiB/s ⬆ 1.7kiB/s 2021-05-30 16:57:54 [Relaychain] ✨ Imported #9 (0x1af9…c9be) 2021-05-30 16:57:54 [Relaychain] ✨ Imported #9 (0x6ed8…fdf6) 2021-05-30 16:57:54 [Relaychain] 💤 Idle (2 peers), best: #9 (0x1af9…c9be), finalized #6 (0x3319…69a2), ⬇ 1.8kiB/s ⬆ 0.5kiB/s 2021-05-30 16:57:54 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0.2kiB/s ⬆ 0.2kiB/s 2021-05-30 16:57:59 [Relaychain] 💤 Idle (2 peers), best: #9 (0x1af9…c9be), finalized #7 (0x5b50…1e5b), ⬇ 0.6kiB/s ⬆ 
0.4kiB/s 2021-05-30 16:57:59 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 16:58:00 [Relaychain] ✨ Imported #10 (0xc9c9…1ca3)  You see messages are from both a relaychain node and a parachain node. This is because a relay chain light client is also run next to the parachain collator. ### Parachain Registration Now that you have two relay chain nodes, and a parachain node accompanied with a relay chain light client running, the next step is to register the parachain in the relay chain with the following steps (for detail, refer to the Substrate Cumulus Worship): • Goto Polkadot Apps UI, connecting to your relay chain. • Execute a sudo extrinsic on the relay chain by going to Developer -> sudo page. • Pick paraSudoWrapper -> sudoScheduleParaInitialize(id, genesis) as the extrinsic type, shown below. • Set the id: ParaId to 2,000 (or whatever ParaId you used above), and set the parachain: Bool option to Yes. • For the genesisHead, drag the genesis state file exported above, para-2000-genesis, in. • For the validationCode, drag the genesis wasm file exported above, para-2000-wasm, in. Note: When registering to the public Rococo testnet, ensure you set a unique paraId larger than 1,000. Values below 1,000 are reserved exclusively for system parachains. ### Restart the Parachain (Collator) The collator node may need to be restarted to get it functioning as expected. After a new epoch starts on the relay chain, your parachain will come online. Once this happens, you should see the collator start reporting parachain blocks: # Notice the relay epoch change! Only then do we start parachain collating! 
# 2021-05-30 17:00:04 [Relaychain] 💤 Idle (2 peers), best: #30 (0xfc02…2a2a), finalized #28 (0x10ff…6539), ⬇ 1.0kiB/s ⬆ 0.3kiB/s 2021-05-30 17:00:04 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:06 [Relaychain] 👶 New epoch 3 launching at block 0x68bc…0605 (block slot 270402601 >= start slot 270402601). 2021-05-30 17:00:06 [Relaychain] 👶 Next epoch starts at slot 270402611 2021-05-30 17:00:06 [Relaychain] ✨ Imported #31 (0x68bc…0605) 2021-05-30 17:00:06 [Parachain] Starting collation. relay_parent=0x68bcc93d24a31a2c89800a56c7a2b275fe9ca7bd63f829b64588ae0d99280605 at=0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271 2021-05-30 17:00:06 [Parachain] 🙌 Starting consensus session on top of parent 0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271 2021-05-30 17:00:06 [Parachain] 🎁 Prepared block for proposing at 1 [hash: 0xf6507812bf60bf53af1311f775aac03869be870df6b0406b2969784d0935cb92; parent_hash: 0xd42b…f271; extrinsics (2): [0x1bf5…1d76, 0x7c9b…4e23]] 2021-05-30 17:00:06 [Parachain] 🔖 Pre-sealed block for proposal at 1. Hash now 0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae, previously 0xf6507812bf60bf53af1311f775aac03869be870df6b0406b2969784d0935cb92. 2021-05-30 17:00:06 [Parachain] ✨ Imported #1 (0x80fc…ccae) 2021-05-30 17:00:06 [Parachain] Produced proof-of-validity candidate. 
block_hash=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae 2021-05-30 17:00:09 [Relaychain] 💤 Idle (2 peers), best: #31 (0x68bc…0605), finalized #29 (0xa6fa…9e16), ⬇ 1.2kiB/s ⬆ 129.9kiB/s 2021-05-30 17:00:09 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:12 [Relaychain] ✨ Imported #32 (0x5e92…ba30) 2021-05-30 17:00:12 [Relaychain] Moving approval window from session 0..=2 to 0..=3 2021-05-30 17:00:12 [Relaychain] ✨ Imported #32 (0x8144…74eb) 2021-05-30 17:00:14 [Relaychain] 💤 Idle (2 peers), best: #32 (0x5e92…ba30), finalized #29 (0xa6fa…9e16), ⬇ 1.4kiB/s ⬆ 0.2kiB/s 2021-05-30 17:00:14 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:18 [Relaychain] ✨ Imported #33 (0x8c30…9ccd) 2021-05-30 17:00:18 [Parachain] Starting collation. relay_parent=0x8c30ce9e6e9867824eb2aff40148ac1ed64cf464f51c5f2574013b44b20f9ccd at=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae 2021-05-30 17:00:19 [Relaychain] 💤 Idle (2 peers), best: #33 (0x8c30…9ccd), finalized #30 (0xfc02…2a2a), ⬇ 0.7kiB/s ⬆ 0.4kiB/s 2021-05-30 17:00:19 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:22 [Relaychain] 👴 Applying authority set change scheduled at block #31 2021-05-30 17:00:22 [Relaychain] 👴 Applying GRANDPA set change to new set [(Public(88dc3417d5058ec4b4503e0c12ea1a0a89be200fe98922423d4334014fa6b0ee (5FA9nQDV...)), 1), (Public(d17c2d7823ebf260fd138f2d7e27d114c0145d968b5ff5006125f2414fadae69 (5GoNkf6W...)), 1)] 2021-05-30 17:00:22 [Relaychain] 👴 Imported justification for block #31 that triggers command Changing authorities, signaling voter. 2021-05-30 17:00:24 [Relaychain] ✨ Imported #34 (0x211b…febf) 2021-05-30 17:00:24 [Parachain] Starting collation. 
relay_parent=0x211b3c53bebeff8af05e8f283d59fe171b7f91a5bf9c4669d88943f5a42bfebf at=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae 2021-05-30 17:00:24 [Parachain] 🙌 Starting consensus session on top of parent 0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae 2021-05-30 17:00:24 [Parachain] 🎁 Prepared block for proposing at 2 [hash: 0x10fcb3180e966729c842d1b0c4d8d2c4028cfa8bef02b909af5ef787e6a6a694; parent_hash: 0x80fc…ccae; extrinsics (2): [0x4a6c…1fc6, 0x6b84…7cea]] 2021-05-30 17:00:24 [Parachain] 🔖 Pre-sealed block for proposal at 2. Hash now 0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0, previously 0x10fcb3180e966729c842d1b0c4d8d2c4028cfa8bef02b909af5ef787e6a6a694. 2021-05-30 17:00:24 [Parachain] ✨ Imported #2 (0x5087…b5a0) 2021-05-30 17:00:24 [Parachain] Produced proof-of-validity candidate. block_hash=0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0 2021-05-30 17:00:24 [Relaychain] 💤 Idle (2 peers), best: #34 (0x211b…febf), finalized #31 (0x68bc…0605), ⬇ 1.0kiB/s ⬆ 130.1kiB/s 2021-05-30 17:00:24 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:29 [Relaychain] 💤 Idle (2 peers), best: #34 (0x211b…febf), finalized #32 (0x5e92…ba30), ⬇ 0.2kiB/s ⬆ 0.1kiB/s 2021-05-30 17:00:29 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0 2021-05-30 17:00:30 [Relaychain] ✨ Imported #35 (0xee07…38a0) 2021-05-30 17:00:34 [Relaychain] 💤 Idle (2 peers), best: #35 (0xee07…38a0), finalized #33 (0x8c30…9ccd), ⬇ 0.9kiB/s ⬆ 0.3kiB/s 2021-05-30 17:00:34 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #1 (0x80fc…ccae), ⬇ 0 ⬆ 0 2021-05-30 17:00:36 [Relaychain] ✨ Imported #36 (0xe8ce…4af6) 2021-05-30 17:00:36 [Parachain] Starting collation. 
relay_parent=0xe8cec8015c0c7bf508bf3f2f82b1696e9cca078e814b0f6671f0b0d5dfe84af6 at=0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0 2021-05-30 17:00:39 [Relaychain] 💤 Idle (2 peers), best: #36 (0xe8ce…4af6), finalized #33 (0x8c30…9ccd), ⬇ 0.6kiB/s ⬆ 0.1kiB/s 2021-05-30 17:00:39 [Parachain] 💤 Idle (0 peers), best: #2 (0x5087…b5a0), finalized #1 (0x80fc…ccae), ⬇ 0 ⬆ 0  Note the delay here! It may take some time for your relay chain to enter a new epoch. ## Rococo & Westend Relay Chain Testnets Is this Cumulus Parachain Template Rococo & Westend testnets compatible? Yes! • Rococo is the testnet of Kusama (join the Rococo Faucet to get testing funds). • Westend is the testnet of Polkadot (join the Westend Faucet to get testing funds). See the Cumulus Workshop for the latest instructions to register a parathread/parachain on a relay chain. NOTE: When running the relay chain and parachain, you must use the same tagged version of Polkadot and Cumulus so the collator would register successfully to the relay chain. You should test locally registering your parachain successfully before attempting to connect to any running relay chain network! Find chainspec files to connect to live networks here. You want to be sure to use the correct git release tag in these files, as they change from time to time and must match the live network! These networks are under constant development - so please follow the progress and update of your parachains in lock step with the testnet changes if you wish to connect to the network. Do join the Parachain Technical matrix chat room to ask questions and connect with the parachain building teams. ## Learn More • More detailed instructions to use Cumulus parachains are found in the Cumulus Workshop. • Refer to the upstream Substrate Node Template to learn more about the structure of this project, the capabilities it encapsulates and the way in which those capabilities are implemented. 
• Learn more about how a parachain block is added to a finalized chain here.

Download Details:

Author: aresprotocols
Source Code: https://github.com/aresprotocols/substrate-parachain-template
License: Unlicense License

## Learning-v8: Project for Learning V8 internals

## Learning Google V8

The sole purpose of this project is to aid me in learning Google's V8 JavaScript engine

### Isolate

An Isolate is an independent copy of the V8 runtime which includes its own heap. Two different Isolates can run in parallel and can be seen as entirely different sandboxed instances of a V8 runtime.

### Context

To allow separate JavaScript applications to run in the same isolate, a context must be specified for each one. This is to avoid them interfering with each other, for example by changing the builtin objects provided.

### Template

This is the super class of both ObjectTemplate and FunctionTemplate. Remember that in JavaScript a function can have fields just like objects.

```
class V8_EXPORT Template : public Data {
 public:
  void Set(Local<Name> name, Local<Data> value,
           PropertyAttribute attributes = None);
  void SetPrivate(Local<Private> name, Local<Data> value,
                  PropertyAttribute attributes = None);
  V8_INLINE void Set(Isolate* isolate, const char* name, Local<Data> value);

  void SetAccessorProperty(
      Local<Name> name,
      Local<FunctionTemplate> getter = Local<FunctionTemplate>(),
      Local<FunctionTemplate> setter = Local<FunctionTemplate>(),
      PropertyAttribute attribute = None,
      AccessControl settings = DEFAULT);
```

The Set function can be used to have a name and a value set on an instance created from this template. SetAccessorProperty is for properties that are get/set using functions.

```
enum PropertyAttribute {
  /** None. **/
  None = 0,
  /** ReadOnly, i.e., not writable. **/
  ReadOnly = 1 << 0,
  /** DontEnum, i.e., not enumerable. **/
  DontEnum = 1 << 1,
  /** DontDelete, i.e., not configurable. **/
  DontDelete = 1 << 2
};

enum AccessControl {
  DEFAULT = 0,
  ALL_CAN_READ = 1,
  ALL_CAN_WRITE = 1 << 1,
  PROHIBITS_OVERWRITING = 1 << 2
};
```

### ObjectTemplate

These allow you to create JavaScript objects without a dedicated constructor. When an instance is created using an ObjectTemplate, the new instance will have the properties and functions configured on the ObjectTemplate. This would be something like:

```
const obj = {};
```

This class is declared in include/v8.h and extends Template:

```
class V8_EXPORT ObjectTemplate : public Template { ... }
class V8_EXPORT Template : public Data { ... }
class V8_EXPORT Data {
 private:
  Data();
};
```

We create an instance of ObjectTemplate and we can add properties to it that all instances created using this ObjectTemplate instance will have. This is done by calling Set, which is a member of the Template class. You specify a Local<Name> for the property. Name is a superclass of Symbol and String, which can both be used as names for a property. The implementation of Set can be found in src/api/api.cc:

```
void Template::Set(v8::Local<Name> name, v8::Local<Data> value,
                   v8::PropertyAttribute attribute) {
  ...
  i::ApiNatives::AddDataProperty(isolate, templ, Utils::OpenHandle(*name),
                                 value_obj,
                                 static_cast<i::PropertyAttributes>(attribute));
}
```

There is an example in objecttemplate_test.cc

### FunctionTemplate

Is a template that is used to create functions, and like ObjectTemplate it inherits from Template:

```
class V8_EXPORT FunctionTemplate : public Template {
}
```

Remember that a function in JavaScript can have properties just like an object.
There is an example in functiontemplate_test.cc

An instance of a function template can be created using:

```
Local<FunctionTemplate> ft = FunctionTemplate::New(isolate_, function_callback, data);
Local<Function> function = ft->GetFunction(context).ToLocalChecked();
```

And the function can be called using:

```
MaybeLocal<Value> ret = function->Call(context, recv, 0, nullptr);
```

Function::Call can be found in src/api/api.cc:

```
bool has_pending_exception = false;
auto self = Utils::OpenHandle(this);
i::Handle<i::Object> recv_obj = Utils::OpenHandle(*recv);
i::Handle<i::Object>* args = reinterpret_cast<i::Handle<i::Object>*>(argv);
Local<Value> result;
has_pending_exception = !ToLocal<Value>(
    i::Execution::Call(isolate, self, recv_obj, argc, args), &result);
```

Notice that the return value of Call, which is a MaybeHandle<Object>, will be passed to ToLocal, which is defined in api.h:

```
template <class T>
inline bool ToLocal(v8::internal::MaybeHandle<v8::internal::Object> maybe,
                    Local<T>* local) {
  v8::internal::Handle<v8::internal::Object> handle;
  if (maybe.ToHandle(&handle)) {
    *local = Utils::Convert<v8::internal::Object, T>(handle);
    return true;
  }
  return false;
}
```

So let's take a look at Execution::Call, which can be found in execution/execution.cc, and it calls:

```
return Invoke(isolate, InvokeParams::SetUpForCall(isolate, callable, receiver,
                                                  argc, argv));
```

SetUpForCall will return an InvokeParams. TODO: Take a closer look at InvokeParams.

```
V8_WARN_UNUSED_RESULT MaybeHandle<Object> Invoke(Isolate* isolate,
                                                 const InvokeParams& params) {
```

```
Handle<Object> receiver = params.is_construct
                              ? isolate->factory()->the_hole_value()
                              : params.receiver;
```

In our case is_construct is false, as we are not using new, and the receiver, the this in the function, should be set to the receiver that we passed in. After that we have Builtins::InvokeApiFunction:

```
auto value = Builtins::InvokeApiFunction(
    isolate, params.is_construct, function, receiver, params.argc, params.argv,
    Handle<HeapObject>::cast(params.new_target));
```

```
result = HandleApiCallHelper<false>(isolate, function, new_target, fun_data,
                                    receiver, arguments);
```

api-arguments-inl.h has:

```
FunctionCallbackArguments::Call(CallHandlerInfo handler) {
  ...
  ExternalCallbackScope call_scope(isolate, FUNCTION_ADDR(f));
  FunctionCallbackInfo<v8::Value> info(values_, argv_, argc_);
  f(info);
  return GetReturnValue<Object>(isolate);
}
```

The call to f(info) is what invokes the callback, which is just a normal function call.

Back in HandleApiCallHelper we have:

```
Handle<Object> result = custom.Call(call_data);
RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION(isolate, Object);
```

RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION expands to:

```
Handle<Object> result = custom.Call(call_data);
do {
  Isolate* __isolate__ = (isolate);
  ((void) 0);
  if (__isolate__->has_scheduled_exception()) {
    __isolate__->PromoteScheduledException();
    return MaybeHandle<Object>();
  }
} while (false);
```

Notice that if there was an exception an empty object is returned. Later in Invoke in execution.cc:

```
auto value = Builtins::InvokeApiFunction(
    isolate, params.is_construct, function, receiver, params.argc, params.argv,
    Handle<HeapObject>::cast(params.new_target));
bool has_exception = value.is_null();
if (has_exception) {
  if (params.message_handling == Execution::MessageHandling::kReport) {
    isolate->ReportPendingMessages();
  }
  return MaybeHandle<Object>();
} else {
  isolate->clear_pending_message();
}
return value;
```

Looking at this, it looks like passing back an empty object is what causes an exception to be propagated.

### Address

Address can be found in include/v8-internal.h:

```
typedef uintptr_t Address;
```

uintptr_t is an optional type specified in cstdint and is capable of storing a data pointer. It is an unsigned integer type such that any valid pointer to void can be converted to this type (and back).
### TaggedImpl

This class is declared in src/objects/tagged-impl.h and has a single private member:

```
 public:
  constexpr StorageType ptr() const { return ptr_; }
 private:
  StorageType ptr_;
```

An instance can be created using:

```
i::TaggedImpl<i::HeapObjectReferenceType::STRONG, i::Address> tagged{};
```

The storage type can also be Tagged_t, which is defined in globals.h:

```
using Tagged_t = uint32_t;
```

It looks like it can be a different type when using pointer compression. See tagged_test.cc for an example.

### Object

This class extends TaggedImpl:

```
class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
```

An Object can be created using the default constructor, or by passing in an Address, which will delegate to the TaggedImpl constructors. Object itself does not have any members (apart from ptr_, which is inherited from TaggedImpl). So if we create an Object on the stack, this is like a pointer/reference to an object:

```
+------+
|Object|
|------|
|ptr_  |----> +------+
+------+
```

Now, ptr_ is a StorageType so it could be a Smi, in which case it would just contain the value directly, for example a small integer:

```
+------+
|Object|
|------|
|  18  |
+------+
```

See object_test.cc for an example.

### ObjectSlot

```
i::Object obj{18};
i::FullObjectSlot slot{&obj};
```

```
+----------+      +---------+
|ObjectSlot|      | Object  |
|----------|      |---------|
| address  | ---> |   18    |
+----------+      +---------+
```

See objectslot_test.cc for an example.

### Maybe

A Maybe is like an optional which can either hold a value or nothing.

```
template <class T>
class Maybe {
 public:
  V8_INLINE bool IsNothing() const { return !has_value_; }
  V8_INLINE bool IsJust() const { return has_value_; }
  ...
 private:
  bool has_value_;
  T value_;
}
```

I first thought that the name Just was a little confusing, but if you read it like:

```
bool cond = true;
Maybe<int> maybe = cond ? Just<int>(10) : Nothing<int>();
```

I think it makes more sense. There are functions that check if the Maybe is nothing and crash the process if so. You can also check and return the value by using FromJust.

The usage of Maybe is where API calls can fail, and returning Nothing is a way of signaling this.

See maybe_test.cc for an example.

### MaybeLocal

```
template <class T>
class MaybeLocal {
 public:
  V8_INLINE MaybeLocal() : val_(nullptr) {}
  V8_INLINE Local<T> ToLocalChecked();
  V8_INLINE bool IsEmpty() const { return val_ == nullptr; }
  template <class S>
  V8_WARN_UNUSED_RESULT V8_INLINE bool ToLocal(Local<S>* out) const {
    out->val_ = IsEmpty() ? nullptr : this->val_;
    return !IsEmpty();
  }
 private:
  T* val_;
};
```

ToLocalChecked will crash the process if val_ is a nullptr. If you want to avoid a crash, use ToLocal instead. See maybelocal_test.cc for an example.

### Data

Is the super class of all objects that can exist on the V8 heap:

```
class V8_EXPORT Data {
 private:
  Data();
};
```

### Value

Value extends Data and adds a number of methods that check if a Value is of a certain type, like IsUndefined(), IsNull, IsNumber etc. It also has useful methods to convert to a Local<T>, for example:

```
V8_WARN_UNUSED_RESULT MaybeLocal<Number> ToNumber(Local<Context> context) const;
V8_WARN_UNUSED_RESULT MaybeLocal<String> ToString(Local<Context> context) const;
...
```

### Handle

A Handle is similar to an Object and an ObjectSlot in that it also contains an Address member (called location_ and declared in HandleBase), but the difference is that Handles act as a layer of abstraction and can be relocated by the garbage collector. It can be found in src/handles/handles.h.

```
class HandleBase {
 ...
 protected:
  Address* location_;
}

template <typename T>
class Handle final : public HandleBase {
  ...
}
```

```
+----------+                  +--------+        +---------+
| Handle   |                  | Object |        |   int   |
|----------|      +-----+     |--------|        |---------|
|*location_| ---> |&ptr_| --> | ptr_   | -----> |    5    |
+----------+      +-----+     +--------+        +---------+
```

(gdb) p handle
$8 = {<v8::internal::HandleBase> = {location_ = 0x7ffdf81d60c0}, <No data fields>}


Notice that location_ contains a pointer:

(gdb) p /x *(int*)0x7ffdf81d60c0
$9 = 0xa9d330

And this is the same as the value in obj:

(gdb) p /x obj.ptr_
$14 = 0xa9d330


And we can access the int using any of the pointers:

(gdb) p /x *value
$16 = 0x5
(gdb) p /x *obj.ptr_
$17 = 0x5
(gdb) p /x *(int*)0x7ffdf81d60c0
$18 = 0xa9d330
(gdb) p /x *(*(int*)0x7ffdf81d60c0)
$19 = 0x5


See handle_test.cc for an example.

### HandleScope

Contains a number of Locals/Handles (think pointers to objects, but managed by V8) and will take care of deleting the Locals/Handles for us. HandleScopes are stack allocated.

When ~HandleScope is called, all handles created within that scope are removed from the stack maintained by the HandleScope, which makes the objects to which the handles point eligible for deletion from the heap by the GC.

A HandleScope only has three members:

  internal::Isolate* isolate_;
  internal::Address* prev_next_;
  internal::Address* prev_limit_;


Let's take a closer look at what happens when we construct a HandleScope:

  v8::HandleScope handle_scope{isolate_};


The constructor call will end up in src/api/api.cc and the constructor simply delegates to Initialize:

HandleScope::HandleScope(Isolate* isolate) { Initialize(isolate); }

void HandleScope::Initialize(Isolate* isolate) {
i::Isolate* internal_isolate = reinterpret_cast<i::Isolate*>(isolate);
...
i::HandleScopeData* current = internal_isolate->handle_scope_data();
isolate_ = internal_isolate;
prev_next_ = current->next;
prev_limit_ = current->limit;
current->level++;
}


Every v8::internal::Isolate has a member of type HandleScopeData:

HandleScopeData* handle_scope_data() { return &handle_scope_data_; }
HandleScopeData handle_scope_data_;


HandleScopeData is a struct defined in src/handles/handles.h:

struct HandleScopeData final {
  Address* next;
  Address* limit;
  int level;
  int sealed_level;
  CanonicalHandleScope* canonical_scope;

  void Initialize() {
    next = limit = nullptr;
    sealed_level = level = 0;
    canonical_scope = nullptr;
  }
};


Notice that there are two pointers (Address*), next and limit. When a HandleScope is initialized, the current handle_scope_data will be retrieved from the internal isolate. The HandleScope instance that is getting created stores the next/limit pointers of the current isolate so that they can be restored when this HandleScope is closed (see CloseScope).

So with a HandleScope created, how does a Local interact with this instance?

When a Local is created this will/might go through FactoryBase::NewStruct which will allocate a new Map and then create a Handle for the InstanceType being created:

Handle<Struct> str = handle(Struct::cast(result), isolate());


This will land in the Handle<T> constructor in src/handles/handles-inl.h:

template <typename T>
Handle<T>::Handle(T object, Isolate* isolate)
    : HandleBase(object.ptr(), isolate) {}

HandleBase::HandleBase(Address object, Isolate* isolate)
    : location_(HandleScope::GetHandle(isolate, object)) {}


Notice that object.ptr() is used to pass the Address to HandleBase. And also notice that HandleBase sets its location_ to the result of HandleScope::GetHandle.

Address* HandleScope::GetHandle(Isolate* isolate, Address value) {
DCHECK(AllowHandleAllocation::IsAllowed());
HandleScopeData* data = isolate->handle_scope_data();
CanonicalHandleScope* canonical = data->canonical_scope;
return canonical ? canonical->Lookup(value) : CreateHandle(isolate, value);
}


Which will call CreateHandle in this case and this function will retrieve the current isolate's handle_scope_data:

  HandleScopeData* data = isolate->handle_scope_data();
  Address* result = data->next;
  if (result == data->limit) {
    result = Extend(isolate);
  }


In this case both next and limit will be 0x0, so Extend will be called. Extend will also get the isolate's handle_scope_data and check the current level, and after that get the isolate's HandleScopeImplementer:

  HandleScopeImplementer* impl = isolate->handle_scope_implementer();


HandleScopeImplementer is declared in src/api/api.h

HandleScope::CreateHandle will get the handle_scope_data from the isolate:

Address* HandleScope::CreateHandle(Isolate* isolate, Address value) {
  HandleScopeData* data = isolate->handle_scope_data();
  Address* result = data->next;
  if (result == data->limit) {
    result = Extend(isolate);
  }
  // Update the current next field, set the value in the created handle,
  // and return the result.
  data->next = reinterpret_cast<Address*>(reinterpret_cast<Address>(result) +
                                          sizeof(Address));
  *result = value;
  return result;
}


Notice that data->next is set to the address of the newly created handle plus the size of an Address, i.e. it is bumped to the next free slot.

The destructor for HandleScope will call CloseScope. See handlescope_test.cc for an example.

### EscapableHandleScope

Local handles are located on the stack and are deleted when the appropriate destructor is called. If there is a local HandleScope, then it will take care of this when the scope returns. When there are no references left to a handle, it can be garbage collected. This means that if a function has a HandleScope and wants to return a handle/local, that handle will not be available after the function returns. This is what EscapableHandleScope is for: it enables the value to be placed in the enclosing handle scope, allowing it to survive. When the enclosing HandleScope goes out of scope, it will be cleaned up.

class V8_EXPORT EscapableHandleScope : public HandleScope {
public:
explicit EscapableHandleScope(Isolate* isolate);
V8_INLINE ~EscapableHandleScope() = default;
template <class T>
  V8_INLINE Local<T> Escape(Local<T> value) {
    internal::Address* slot =
        Escape(reinterpret_cast<internal::Address*>(*value));
    return Local<T>(reinterpret_cast<T*>(slot));
  }

template <class T>
V8_INLINE MaybeLocal<T> EscapeMaybe(MaybeLocal<T> value) {
return Escape(value.FromMaybe(Local<T>()));
}

private:
...
};


From api.cc

EscapableHandleScope::EscapableHandleScope(Isolate* v8_isolate) {
  i::Isolate* isolate = reinterpret_cast<i::Isolate*>(v8_isolate);
  escape_slot_ =
      CreateHandle(isolate, i::ReadOnlyRoots(isolate).the_hole_value().ptr());
  Initialize(v8_isolate);
}


So when an EscapableHandleScope is created, it creates a handle holding the hole value and stores it in escape_slot_, which is of type Address*. This handle is created in the current (enclosing) HandleScope, and the EscapableHandleScope can later set a value into that slot for the value it wants to escape. When that enclosing HandleScope goes out of scope, the slot will be cleaned up. It then calls Initialize just like a normal HandleScope would.

i::Address* HandleScope::CreateHandle(i::Isolate* isolate, i::Address value) {
return i::HandleScope::CreateHandle(isolate, value);
}


From handles-inl.h:

Address* HandleScope::CreateHandle(Isolate* isolate, Address value) {
  DCHECK(AllowHandleAllocation::IsAllowed());
  HandleScopeData* data = isolate->handle_scope_data();
  Address* result = data->next;
  if (result == data->limit) {
    result = Extend(isolate);
  }
  // Update the current next field, set the value in the created handle,
  // and return the result.
  data->next = reinterpret_cast<Address*>(reinterpret_cast<Address>(result) +
                                          sizeof(Address));
  *result = value;
  return result;
}


When Escape is called the following happens (v8.h):

template <class T>
V8_INLINE Local<T> Escape(Local<T> value) {
  internal::Address* slot =
      Escape(reinterpret_cast<internal::Address*>(*value));
  return Local<T>(reinterpret_cast<T*>(slot));
}


And EscapableHandleScope::Escape (api.cc):

i::Address* EscapableHandleScope::Escape(i::Address* escape_value) {
  i::Heap* heap = reinterpret_cast<i::Isolate*>(GetIsolate())->heap();
  Utils::ApiCheck(i::Object(*escape_slot_).IsTheHole(heap->isolate()),
                  "EscapableHandleScope::Escape", "Escape value set twice");
  if (escape_value == nullptr) {
    *escape_slot_ = i::ReadOnlyRoots(heap->isolate()).undefined_value().ptr();
    return nullptr;
  }
  *escape_slot_ = *escape_value;
  return escape_slot_;
}


If the escape_value is null, the escape_slot_, which is a pointer into the parent HandleScope, is set to the undefined_value() instead of the hole value it held previously, and nullptr is returned. Otherwise the escaped value is copied into the slot, and the returned address/pointer is cast to T* and handed back. Next, we take a look at what happens when the EscapableHandleScope goes out of scope. This will call HandleScope::~HandleScope, which makes sense, as any other Local handles should be cleaned up.

Escape copies the value of its argument into the enclosing scope, deletes all its local handles, and then gives back the new handle copy, which can safely be returned.

TODO:

### Local

Has a single member val_ which is of type pointer to T:

template <class T>
class Local {
  ...
 private:
  T* val_;
}


Notice that this is a pointer to T. We could create a local using:

  v8::Local<v8::Value> empty_value;


So a Local contains a pointer to type T. We can access this pointer using operator-> and operator*.

We can cast from a subtype to a supertype using Local::Cast:

v8::Local<v8::Number> nr = v8::Local<v8::Number>(v8::Number::New(isolate_, 12));
v8::Local<v8::Value> val = v8::Local<v8::Value>::Cast(nr);


And there is also the As member function:

v8::Local<v8::Value> val2 = nr.As<v8::Value>();


See local_test.cc for an example.

### PrintObject

Using _v8_internal_Print_Object from c++:

$ nm -C libv8_monolith.a | grep Print_Object
0000000000000000 T _v8_internal_Print_Object(void*)

Notice that this function does not have a namespace. We can use it as:

extern void _v8_internal_Print_Object(void* object);

_v8_internal_Print_Object(*((v8::internal::Object**)(*global)));

Lets take a closer look at the above:

v8::internal::Object** gl = ((v8::internal::Object**)(*global));

We use the dereference operator to get the value of the Local (*global), which is just of type T*, a pointer to the type of the Local:

template <class T> class Local {
  ...
 private:
  T* val_;
}

We are then casting that to be of type pointer-to-pointer to Object:

gl          Object*        Object
+-----+     +------+       +-------+
|     |---->|      |------>|       |
+-----+     +------+       +-------+

An instance of v8::internal::Object only has a single data member, which is a field named ptr_ of type Address (src/objects/objects.h):

class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
 public:
  constexpr Object() : TaggedImpl(kNullAddress) {}
  explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}

#define IS_TYPE_FUNCTION_DECL(Type) \
  V8_INLINE bool Is##Type() const;  \
  V8_INLINE bool Is##Type(const Isolate* isolate) const;
  OBJECT_TYPE_LIST(IS_TYPE_FUNCTION_DECL)
  HEAP_OBJECT_TYPE_LIST(IS_TYPE_FUNCTION_DECL)
  IS_TYPE_FUNCTION_DECL(HashTableBase)
  IS_TYPE_FUNCTION_DECL(SmallOrderedHashTable)
#undef IS_TYPE_FUNCTION_DECL

  V8_INLINE bool IsNumber(ReadOnlyRoots roots) const;
}

Lets take a look at one of these functions and see how it is implemented.
For example, in the OBJECT_TYPE_LIST we have:

#define OBJECT_TYPE_LIST(V) \
  V(LayoutDescriptor)       \
  V(Primitive)              \
  V(Number)                 \
  V(Numeric)

So the Object class will have functions that look like:

inline bool IsNumber() const;
inline bool IsNumber(const Isolate* isolate) const;

And in src/objects/objects-inl.h we will have the implementations:

bool Object::IsNumber() const {
  return IsHeapObject() && HeapObject::cast(*this).IsNumber();
}

IsHeapObject is defined in TaggedImpl:

constexpr inline bool IsHeapObject() const { return IsStrong(); }

constexpr inline bool IsStrong() const {
#if V8_HAS_CXX14_CONSTEXPR
  DCHECK_IMPLIES(!kCanBeWeak, !IsSmi() == HAS_STRONG_HEAP_OBJECT_TAG(ptr_));
#endif
  return kCanBeWeak ? HAS_STRONG_HEAP_OBJECT_TAG(ptr_) : !IsSmi();
}

The macro can be found in src/common/globals.h:

#define HAS_STRONG_HEAP_OBJECT_TAG(value)                          \
  (((static_cast<i::Tagged_t>(value) & ::i::kHeapObjectTagMask) == \
    ::i::kHeapObjectTag))

So we are casting ptr_, which is of type Address, into type Tagged_t, which is defined in src/common/globals.h and can be different depending on whether compressed pointers are used or not. If they are not used it is the same as Address:

using Tagged_t = Address;

src/objects/tagged-impl.h:

template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {
  StorageType ptr_;
}

The HeapObjectReferenceType can be either WEAK or STRONG. And the storage type is Address in this case. So Object itself only has one member, inherited from its only super class, and this is ptr_.

So the following is telling the compiler to treat the value of our Local, *global, as a pointer (which it already is) to a pointer that points to a memory location that adheres to the layout of a v8::internal::Object type, which we now know has a ptr_ member. And we want to dereference it and pass it into the function.
_v8_internal_Print_Object(*((v8::internal::Object**)(*global)));

### ObjectTemplate

But I'm still missing the connection between ObjectTemplate and Object. When we create it we use:

Local<ObjectTemplate> global = ObjectTemplate::New(isolate);

In src/api/api.cc we have:

static Local<ObjectTemplate> ObjectTemplateNew(
    i::Isolate* isolate, v8::Local<FunctionTemplate> constructor,
    bool do_not_cache) {
  i::Handle<i::Struct> struct_obj = isolate->factory()->NewStruct(
      i::OBJECT_TEMPLATE_INFO_TYPE, i::AllocationType::kOld);
  i::Handle<i::ObjectTemplateInfo> obj =
      i::Handle<i::ObjectTemplateInfo>::cast(struct_obj);
  InitializeTemplate(obj, Consts::OBJECT_TEMPLATE);
  int next_serial_number = 0;
  if (!constructor.IsEmpty())
    obj->set_constructor(*Utils::OpenHandle(*constructor));
  obj->set_data(i::Smi::zero());
  return Utils::ToLocal(obj);
}

What is a Struct in this context?

src/objects/struct.h:

#include "torque-generated/class-definitions-tq.h"

class Struct : public TorqueGeneratedStruct<Struct, HeapObject> {
 public:
  inline void InitializeBody(int object_size);
  void BriefPrintDetails(std::ostream& os);
  TQ_OBJECT_CONSTRUCTORS(Struct)
}

Notice that the include is specifying a torque-generated include which can be found in out/x64.release_gcc/gen/torque-generated/class-definitions-tq. So, somewhere there must be a call to the torque executable which generates the Code Stub Assembler C++ headers and sources before compiling the main source files. There is, and there is a section about this in Building V8.

The macro TQ_OBJECT_CONSTRUCTORS can be found in src/objects/object-macros.h and expands to:

constexpr Struct() = default;

 protected:
  template <typename TFieldType, int kFieldOffset>
  friend class TaggedField;

  inline explicit Struct(Address ptr);

So what does TorqueGeneratedStruct look like?

template <class D, class P>
class TorqueGeneratedStruct : public P {
 public:

Where D is Struct and P is HeapObject in this case.
But the above is the declaration of the type; what we have in the .h file is what was generated. This type is defined in src/objects/struct.tq:

@abstract
@generatePrint
@generateCppClass
extern class Struct extends HeapObject {
}

NewStruct can be found in src/heap/factory-base.cc:

template <typename Impl>
HandleFor<Impl, Struct> FactoryBase<Impl>::NewStruct(
    InstanceType type, AllocationType allocation) {
  Map map = Map::GetStructMap(read_only_roots(), type);
  int size = map.instance_size();
  HeapObject result = AllocateRawWithImmortalMap(size, allocation, map);
  HandleFor<Impl, Struct> str = handle(Struct::cast(result), isolate());
  str->InitializeBody(size);
  return str;
}

Every object that is stored on the v8 heap has a Map (src/objects/map.h) that describes the structure of the object being stored:

class Map : public HeapObject {

1725        return Utils::ToLocal(obj);
(gdb) p obj
$6 = {<v8::internal::HandleBase> = {location_ = 0x30b5160}, <No data fields>}


So this is the connection: what we see as a Local is a HandleBase. TODO: dig into this some more when I have time.

(lldb) expr gl
(v8::internal::Object **) $0 = 0x00000000020ee160
(lldb) memory read -f x -s 8 -c 1 gl
0x020ee160: 0x00000aee081c0121
(lldb) memory read -f x -s 8 -c 1 *gl
0xaee081c0121: 0x0200000002080433

You can reload .lldbinit using the following command:

(lldb) command source ~/.lldbinit

This can be useful when debugging a lldb command. You can set a breakpoint, break at that location, make updates to the command, and reload without having to restart lldb.

Currently, the lldb-commands.py that ships with v8 contains an extra operation on the parameter passed to ptr_arg_cmd:

def ptr_arg_cmd(debugger, name, param, cmd):
  if not param:
    print("'{}' requires an argument".format(name))
    return
  param = '(void*)({})'.format(param)
  no_arg_cmd(debugger, cmd.format(param))

Notice that param is the object that we want to print; for example, lets say it is a local named obj:

param = "(void*)(obj)"

This will then be "passed"/formatted into the command string:

"_v8_internal_Print_Object(*(v8::internal::Object**)(*(void*)(obj)))"

#### Threads

V8 is single threaded (the execution of the functions on the stack) but there are supporting threads used for garbage collection and profiling (IC, and perhaps other things). Lets see what threads there are:

$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release_gcc/ lldb ./hello-world
(lldb) br s -n main
(lldb) r
thread #1: tid = 0x2efca6, 0x0000000100001e16 hello-world`main(argc=1, argv=0x00007fff5fbfee98) + 38 at hello-world.cc:40, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1


So at startup there is only one thread which is what we expected. Lets skip ahead to where we create the platform:

Platform* platform = platform::CreateDefaultPlatform();
...
DefaultPlatform* platform = new DefaultPlatform(idle_task_support, tracing_controller);



Next there is a check: if the given size is 0, the number of processors minus 1 is used as the size of the thread pool:

(lldb) fr v thread_pool_size


This is all that SetThreadPoolSize does. After this we have:

platform->EnsureInitialized();

for (int i = 0; i < thread_pool_size_; ++i)


new WorkerThread will create a new pthread (on my system which is MacOSX):

result = pthread_create(&data_->thread_, &attr, ThreadEntry, this);


ThreadEntry can be found in src/base/platform/platform-posix.

### International Component for Unicode (ICU)

International Components for Unicode (ICU) deals with internationalization (i18n). ICU provides support for locale-sensitive string comparisons, date/time/number/currency formatting, etc.

There is an optional API called ECMAScript 402 which V8 supports and which is enabled by default. i18n-support says that even if your application does not use ICU you still need to call InitializeICU:

V8::InitializeICU();


### Local

Local<String> script_name = ...;


So what is script_name? Well, it is an object reference that is managed by the v8 GC. The GC needs to be able to move things (pointers) around and also track whether things should be GC'd. Local handles, as opposed to persistent handles, are lightweight and mostly used for local operations. These handles are managed by HandleScopes, so you must have a HandleScope on the stack, and the Local is only valid as long as the HandleScope is valid. This uses Resource Acquisition Is Initialization (RAII), so when the HandleScope instance goes out of scope it will remove all the Local instances.

The Local class (in include/v8.h) only has one member which is of type pointer to the type T. So for the above example it would be:

  String* val_;


You can find the available operations for a Local in include/v8.h.

(lldb) p script_name.IsEmpty()
(bool) $12 = false

A Local has overloaded a number of operators, for example ->:

(lldb) p script_name->Length()
(int) $14 = 7


Where Length is a method on the v8 String class.

The handle stack is not part of the C++ call stack, but the handle scopes are embedded in the C++ stack. Handle scopes can only be stack-allocated, not allocated with new.

### Persistent

https://v8.dev/docs/embed: Persistent handles provide a reference to a heap-allocated JavaScript Object, just like a local handle. There are two flavors, which differ in the lifetime management of the reference they handle. Use a persistent handle when you need to keep a reference to an object for more than one function call, or when handle lifetimes do not correspond to C++ scopes. Google Chrome, for example, uses persistent handles to refer to Document Object Model (DOM) nodes.

A persistent handle can be made weak, using PersistentBase::SetWeak, to trigger a callback from the garbage collector when the only references to an object are from weak persistent handles.

A UniquePersistent handle relies on C++ constructors and destructors to manage the lifetime of the underlying object. A Persistent can be constructed with its constructor, but must be explicitly cleared with Persistent::Reset.

So how is a persistent object created?
Let's write a test and find out (test/persistent-object_test.cc):

$ make test/persistent-object_test
$ ./test/persistent-object_test --gtest_filter=PersistentTest.value


Now, to create an instance of Persistent we need a Local instance or the Persistent instance will just be empty.

Local<Object> o = Local<Object>::New(isolate_, Object::New(isolate_));


v8::Object::New can be found in src/api/api.cc:

Local<v8::Object> v8::Object::New(Isolate* isolate) {
  i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
  LOG_API(i_isolate, Object, New);
  ENTER_V8_NO_SCRIPT_NO_EXCEPTION(i_isolate);
  i::Handle<i::JSObject> obj =
      i_isolate->factory()->NewJSObject(i_isolate->object_function());
  return Utils::ToLocal(obj);
}


The first thing that happens is that the public Isolate pointer is cast to a pointer to the internal Isolate type. LOG_API is a macro in the same source file (src/api/api.cc):

#define LOG_API(isolate, class_name, function_name)                           \
i::RuntimeCallTimerScope _runtime_timer(                                    \
isolate, i::RuntimeCallCounterId::kAPI_##class_name##_##function_name); \
LOG(isolate, ApiEntryCall("v8::" #class_name "::" #function_name))


In our case the preprocessor would expand that to:

i::RuntimeCallTimerScope _runtime_timer(
    isolate, i::RuntimeCallCounterId::kAPI_Object_New);
LOG(isolate, ApiEntryCall("v8::Object::New"))


LOG is a macro that can be found in src/log.h:

#define LOG(isolate, Call)                              \
  do {                                                  \
    v8::internal::Logger* logger = (isolate)->logger(); \
    if (logger->is_logging()) logger->Call;             \
  } while (false)


And this would expand to:

  v8::internal::Logger* logger = isolate->logger();
if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");


So with the LOG_API macro expanded we have:

Local<v8::Object> v8::Object::New(Isolate* isolate) {
  i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
  i::RuntimeCallTimerScope _runtime_timer(
      isolate, i::RuntimeCallCounterId::kAPI_Object_New);
  v8::internal::Logger* logger = isolate->logger();
  if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");

  ENTER_V8_NO_SCRIPT_NO_EXCEPTION(i_isolate);
  i::Handle<i::JSObject> obj =
      i_isolate->factory()->NewJSObject(i_isolate->object_function());
  return Utils::ToLocal(obj);
}


Next we have ENTER_V8_NO_SCRIPT_NO_EXCEPTION:

#define ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate)                    \
i::VMState<v8::OTHER> __state__((isolate));                       \
i::DisallowJavascriptExecutionDebugOnly __no_script__((isolate)); \
i::DisallowExceptions __no_exceptions__((isolate))


So with the macros expanded we have:

Local<v8::Object> v8::Object::New(Isolate* isolate) {
  i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
  i::RuntimeCallTimerScope _runtime_timer(
      isolate, i::RuntimeCallCounterId::kAPI_Object_New);
  v8::internal::Logger* logger = isolate->logger();
  if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");

  i::VMState<v8::OTHER> __state__(i_isolate);
  i::DisallowJavascriptExecutionDebugOnly __no_script__(i_isolate);
  i::DisallowExceptions __no_exceptions__(i_isolate);

  i::Handle<i::JSObject> obj =
      i_isolate->factory()->NewJSObject(i_isolate->object_function());

  return Utils::ToLocal(obj);
}


TODO: Look closer at VMState.

First, i_isolate->object_function() is called and the result passed to NewJSObject. object_function is generated by a macro named NATIVE_CONTEXT_FIELDS:

#define NATIVE_CONTEXT_FIELD_ACCESSOR(index, type, name)     \
  Handle<type> Isolate::name() {                             \
    return Handle<type>(raw_native_context()->name(), this); \
  }                                                          \
  bool Isolate::is_##name(type* value) {                     \
    return raw_native_context()->is_##name(value);           \
  }
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSOR)


NATIVE_CONTEXT_FIELDS is a macro in src/contexts.h and it contains entries like:

#define NATIVE_CONTEXT_FIELDS(V)                                               \
...                                                                            \
V(OBJECT_FUNCTION_INDEX, JSFunction, object_function)                        \

Handle<JSFunction> Isolate::object_function() {
  return Handle<JSFunction>(raw_native_context()->object_function(), this);
}

bool Isolate::is_object_function(JSFunction* value) {
  return raw_native_context()->is_object_function(value);
}


I'm not clear on the different types of context, there is a native context, a "normal/public" context. In src/contexts-inl.h we have the native_context function:

Context* Context::native_context() const {
  Object* result = get(NATIVE_CONTEXT_INDEX);
  DCHECK(IsBootstrappingOrNativeContext(this->GetIsolate(), result));
  return reinterpret_cast<Context*>(result);
}


Context extends FixedArray so the get function is the get function of FixedArray and NATIVE_CONTEXT_INDEX is the index into the array where the native context is stored.

Now, lets take a closer look at NewJSObject. If you search for NewJSObject in src/heap/factory.cc:

Handle<JSObject> Factory::NewJSObject(Handle<JSFunction> constructor,
                                      PretenureFlag pretenure) {
  JSFunction::EnsureHasInitialMap(constructor);
  Handle<Map> map(constructor->initial_map(), isolate());
  return NewJSObjectFromMap(map, pretenure);
}


NewJSObjectFromMap

...
HeapObject* obj = AllocateRawWithAllocationSite(map, pretenure, allocation_site);


So we have created a new map.

### Map

So an HeapObject contains a pointer to a Map, or rather has a function that returns a pointer to Map. I can't see any member map in the HeapObject class.

Lets take a look at when a map is created.

(lldb) br s -f map_test.cc -l 63

Handle<Map> Factory::NewMap(InstanceType type,
                            int instance_size,
                            ElementsKind elements_kind,
                            int inobject_properties) {
  HeapObject* result =
      isolate()->heap()->AllocateRawWithRetryOrFail(Map::kSize, MAP_SPACE);
  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
  return handle(InitializeMap(Map::cast(result), type, instance_size,
                              elements_kind, inobject_properties),
                isolate());
}


We can see that the above is calling AllocateRawWithRetryOrFail on the heap instance passing a size of 88 and specifying the MAP_SPACE:

HeapObject* Heap::AllocateRawWithRetryOrFail(int size, AllocationSpace space,
                                             AllocationAlignment alignment) {
  AllocationResult alloc;
  HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
  if (result) return result;

  isolate()->counters()->gc_last_resort_from_handles()->Increment();
  CollectAllAvailableGarbage(GarbageCollectionReason::kLastResort);
  {
    AlwaysAllocateScope scope(isolate());
    alloc = AllocateRaw(size, space, alignment);
  }
  if (alloc.To(&result)) {
    DCHECK(result != exception());
    return result;
  }
  // TODO(1181417): Fix this.
  FatalProcessOutOfMemory("CALL_AND_RETRY_LAST");
  return nullptr;
}


The default value for alignment is kWordAligned. Reading the docs in the header it says that this function will try to perform an allocation of size 88 in the MAP_SPACE and if it fails a full GC will be performed and the allocation retried. Lets take a look at AllocateRawWithLigthRetry:

  AllocationResult alloc = AllocateRaw(size, space, alignment);


AllocateRaw can be found in src/heap/heap-inl.h. There are different paths that will be taken depending on the space parameter. Since it is MAP_SPACE in our case we will focus on that path:

AllocationResult Heap::AllocateRaw(int size_in_bytes, AllocationSpace space,
                                   AllocationAlignment alignment) {
  ...
  HeapObject* object = nullptr;
  AllocationResult allocation;
  if (OLD_SPACE == space) {
    ...
  } else if (MAP_SPACE == space) {
    allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
  }
  ...
}


map_space_ is a private member of Heap (src/heap/heap.h):

MapSpace* map_space_;


AllocateRawUnaligned can be found in src/heap/spaces-inl.h:

AllocationResult PagedSpace::AllocateRawUnaligned(
    int size_in_bytes, UpdateSkipList update_skip_list) {
  if (!EnsureLinearAllocationArea(size_in_bytes)) {
    return AllocationResult::Retry(identity());
  }

  HeapObject* object = AllocateLinearly(size_in_bytes);
  return object;
}


The default value for update_skip_list is UPDATE_SKIP_LIST. So lets take a look at AllocateLinearly:

HeapObject* PagedSpace::AllocateLinearly(int size_in_bytes) {
  Address current_top = allocation_info_.top();
  Address new_top = current_top + size_in_bytes;
  allocation_info_.set_top(new_top);
  return HeapObject::FromAddress(current_top);
}


Recall that size_in_bytes in our case is 88.

(lldb) expr current_top
(v8::internal::Address) $5 = 24847457492680
(lldb) expr new_top
(v8::internal::Address) $6 = 24847457492768
(lldb) expr new_top - current_top
(unsigned long) $7 = 88

Notice that first the top is set to the new_top and then the current_top is returned, and that will be a pointer to the start of the object in memory (which in this case is of v8::internal::Map, which is also of type HeapObject).

I've been wondering why Map (and other HeapObject types) don't have any member fields and only/mostly getters/setters for the various fields that make up an object. Well, the answer is that pointers to instances of, for example, Map point to the first memory location of the instance, and the getter/setter functions use indexes to read/write to memory locations. The indexes are mostly in the form of enum fields that define the memory layout of the type.

Next, in AllocateRawUnaligned we have the MSAN_ALLOCATED_UNINITIALIZED_MEMORY macro:

MSAN_ALLOCATED_UNINITIALIZED_MEMORY(object->address(), size_in_bytes);

MSAN_ALLOCATED_UNINITIALIZED_MEMORY can be found in src/msan.h; msan stands for Memory Sanitizer and it would only be used if V8_USE_MEMORY_SANITIZER is defined. The returned object will be used to construct an AllocationResult when returned.

Back in AllocateRaw we have:

if (allocation.To(&object)) {
  ...
  OnAllocationEvent(object, size_in_bytes);
}
return allocation;

This will return us in AllocateRawWithLightRetry:

AllocationResult alloc = AllocateRaw(size, space, alignment);
if (alloc.To(&result)) {
  DCHECK(result != exception());
  return result;
}

This will return us back in AllocateRawWithRetryOrFail:

HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
if (result) return result;

And that return will return to NewMap in src/heap/factory.cc:

result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
return handle(InitializeMap(Map::cast(result), type, instance_size,
                            elements_kind, inobject_properties),
              isolate());

InitializeMap:

map->set_instance_type(type);
map->set_prototype(*null_value(), SKIP_WRITE_BARRIER);
map->set_constructor_or_backpointer(*null_value(), SKIP_WRITE_BARRIER);
map->set_instance_size(instance_size);
if (map->IsJSObjectMap()) {
  DCHECK(!isolate()->heap()->InReadOnlySpace(map));
  map->SetInObjectPropertiesStartInWords(instance_size / kPointerSize -
                                         inobject_properties);
  DCHECK_EQ(map->GetInObjectProperties(), inobject_properties);
  map->set_prototype_validity_cell(*invalid_prototype_validity_cell());
} else {
  DCHECK_EQ(inobject_properties, 0);
  map->set_inobject_properties_start_or_constructor_function_index(0);
  map->set_prototype_validity_cell(Smi::FromInt(Map::kPrototypeChainValid));
}
map->set_dependent_code(DependentCode::cast(*empty_fixed_array()),
                        SKIP_WRITE_BARRIER);
map->set_weak_cell_cache(Smi::kZero);
map->set_raw_transitions(MaybeObject::FromSmi(Smi::kZero));
map->SetInObjectUnusedPropertyFields(inobject_properties);
map->set_instance_descriptors(*empty_descriptor_array());
map->set_visitor_id(Map::GetVisitorId(map));
map->set_bit_field(0);
map->set_bit_field2(Map::IsExtensibleBit::kMask);
int bit_field3 = Map::EnumLengthBits::encode(kInvalidEnumCacheSentinel) |
                 Map::OwnsDescriptorsBit::encode(true) |
                 Map::ConstructionCounterBits::encode(Map::kNoSlackTracking);
map->set_bit_field3(bit_field3);
map->set_elements_kind(elements_kind); // HOLEY_ELEMENTS
map->set_new_target_is_base(true);
isolate()->counters()->maps_created()->Increment();
if (FLAG_trace_maps) LOG(isolate(), MapCreate(map));
return map;

Creating a new map (map_test.cc):

i::Handle<i::Map> map = i::Map::Create(asInternal(isolate_), 10);
std::cout << map->instance_type() << '\n';

Map::Create can be found in objects.cc:

Handle<Map> Map::Create(Isolate* isolate, int inobject_properties) {
  Handle<Map> copy =
      Copy(handle(isolate->object_function()->initial_map()), "MapCreate");

So, the first thing that will happen is isolate->object_function() will be called. This is a function that is generated by the preprocessor:

// from src/context.h
#define NATIVE_CONTEXT_FIELDS(V)                        \
  ...                                                   \
  V(OBJECT_FUNCTION_INDEX, JSFunction, object_function) \

// from src/isolate.h
#define NATIVE_CONTEXT_FIELD_ACCESSOR(index, type, name)     \
  Handle<type> Isolate::name() {                             \
    return Handle<type>(raw_native_context()->name(), this); \
  }                                                          \
  bool Isolate::is_##name(type* value) {                     \
    return raw_native_context()->is_##name(value);           \
  }
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSOR)

object_function() will become:

Handle<JSFunction> Isolate::object_function() {
  return Handle<JSFunction>(raw_native_context()->object_function(), this);
}

Lets look closer at JSFunction::initial_map() in objects-inl.h:

Map* JSFunction::initial_map() {
  return Map::cast(prototype_or_initial_map());
}

prototype_or_initial_map is generated by a macro:

ACCESSORS_CHECKED(JSFunction, prototype_or_initial_map, Object,
                  kPrototypeOrInitialMapOffset, map()->has_prototype_slot())

ACCESSORS_CHECKED can be found in src/objects/object-macros.h:

#define ACCESSORS_CHECKED(holder, name, type, offset, condition) \
  ACCESSORS_CHECKED2(holder, name, type, offset, condition, condition)

#define ACCESSORS_CHECKED2(holder, name, type, offset, get_condition, \
                           set_condition)                             \
  type* holder::name() const {                                        \
    type* value = type::cast(READ_FIELD(this, offset));               \
    DCHECK(get_condition);                                            \
    return value;                                                     \
  }                                                                   \
  void holder::set_##name(type* value, WriteBarrierMode mode) {       \
    DCHECK(set_condition);                                            \
    WRITE_FIELD(this, offset, value);                                 \
    CONDITIONAL_WRITE_BARRIER(GetHeap(), this, offset, value, mode);  \
  }

#define FIELD_ADDR(p, offset) \
  (reinterpret_cast<Address>(p) + offset - kHeapObjectTag)

#define READ_FIELD(p, offset) \
  (*reinterpret_cast<Object* const*>(FIELD_ADDR(p, offset)))

The preprocessor will expand prototype_or_initial_map to:

Object* JSFunction::prototype_or_initial_map() const {
  Object* value = Object::cast(
      (*reinterpret_cast<Object* const*>(
          (reinterpret_cast<Address>(this) +
           kPrototypeOrInitialMapOffset - kHeapObjectTag))));
  DCHECK(map()->has_prototype_slot());
  return value;
}

Notice that map()->has_prototype_slot() will be called, which looks like this:

Map* HeapObject::map() const {
  return map_word().ToMap();
}

TODO: Add notes about MapWord

MapWord HeapObject::map_word() const {
  return MapWord(
      reinterpret_cast<uintptr_t>(RELAXED_READ_FIELD(this, kMapOffset)));
}

First thing that will happen is RELAXED_READ_FIELD(this, kMapOffset):

#define RELAXED_READ_FIELD(p, offset)           \
  reinterpret_cast<Object*>(base::Relaxed_Load( \
      reinterpret_cast<const base::AtomicWord*>(FIELD_ADDR(p, offset))))

#define FIELD_ADDR(p, offset) \
  (reinterpret_cast<Address>(p) + offset - kHeapObjectTag)

This will get expanded by the preprocessor to:

reinterpret_cast<Object*>(base::Relaxed_Load(
    reinterpret_cast<const base::AtomicWord*>(
        (reinterpret_cast<Address>(this) + kMapOffset - kHeapObjectTag))))

src/base/atomicops_internals_portable.h:

inline Atomic8 Relaxed_Load(volatile const Atomic8* ptr) {
  return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}

So this will do an atomic load of the ptr with the memory order __ATOMIC_RELAXED.
ACCESSORS_CHECKED also generates a set_prototype_or_initial_map:

void JSFunction::set_prototype_or_initial_map(Object* value,
                                              WriteBarrierMode mode) {
  DCHECK(map()->has_prototype_slot());
  WRITE_FIELD(this, kPrototypeOrInitialMapOffset, value);
  CONDITIONAL_WRITE_BARRIER(GetHeap(), this, kPrototypeOrInitialMapOffset,
                            value, mode);
}

What does WRITE_FIELD do?

#define WRITE_FIELD(p, offset, value)                             \
  base::Relaxed_Store(                                            \
      reinterpret_cast<base::AtomicWord*>(FIELD_ADDR(p, offset)), \
      reinterpret_cast<base::AtomicWord>(value));

Which would expand into:

base::Relaxed_Store(
    reinterpret_cast<base::AtomicWord*>(
        (reinterpret_cast<Address>(this) +
         kPrototypeOrInitialMapOffset - kHeapObjectTag)),
    reinterpret_cast<base::AtomicWord>(value));

Lets take a look at what instance_type does:

InstanceType Map::instance_type() const {
  return static_cast<InstanceType>(READ_UINT16_FIELD(this, kInstanceTypeOffset));
}

To see what the above is doing we can do the same thing in the debugger.
Note that I got 11 below from map->kInstanceTypeOffset - i::kHeapObjectTag:

(lldb) memory read -f u -c 1 -s 8 *map + 11
0x6d4e6609ed4: 585472345729139745
(lldb) expr static_cast<InstanceType>(585472345729139745)
(v8::internal::InstanceType) $34 = JS_OBJECT_TYPE


Take map->has_non_instance_prototype():

(lldb) br s -n has_non_instance_prototype
(lldb) expr -i 0 -- map->has_non_instance_prototype()


The above command will break in src/objects/map-inl.h:

BIT_FIELD_ACCESSORS(Map, bit_field, has_non_instance_prototype, Map::HasNonInstancePrototypeBit)

// src/objects/object-macros.h
#define BIT_FIELD_ACCESSORS(holder, field, name, BitField)      \
typename BitField::FieldType holder::name() const {           \
return BitField::decode(field());                           \
}                                                             \
void holder::set_##name(typename BitField::FieldType value) { \
set_##field(BitField::update(field(), value));              \
}


The preprocessor will expand that to:

typename Map::HasNonInstancePrototypeBit::FieldType Map::has_non_instance_prototype() const {
  return Map::HasNonInstancePrototypeBit::decode(bit_field());
}
void Map::set_has_non_instance_prototype(typename Map::HasNonInstancePrototypeBit::FieldType value) {
  set_bit_field(Map::HasNonInstancePrototypeBit::update(bit_field(), value));
}


So where can we find Map::HasNonInstancePrototypeBit?
It is generated by a macro in src/objects/map.h:

// Bit positions for |bit_field|.
#define MAP_BIT_FIELD_FIELDS(V, _)          \
V(HasNonInstancePrototypeBit, bool, 1, _) \
...
DEFINE_BIT_FIELDS(MAP_BIT_FIELD_FIELDS)
#undef MAP_BIT_FIELD_FIELDS

#define DEFINE_BIT_FIELDS(LIST_MACRO) \
DEFINE_BIT_RANGES(LIST_MACRO)       \
LIST_MACRO(DEFINE_BIT_FIELD_TYPE, LIST_MACRO##_Ranges)

#define DEFINE_BIT_RANGES(LIST_MACRO)                               \
struct LIST_MACRO##_Ranges {                                      \
enum { LIST_MACRO(DEFINE_BIT_FIELD_RANGE_TYPE, _) kBitsCount }; \
};

#define DEFINE_BIT_FIELD_RANGE_TYPE(Name, Type, Size, _) \
k##Name##Start, k##Name##End = k##Name##Start + Size - 1,


Alright, lets see what preprocessor expands that to:

struct MAP_BIT_FIELD_FIELDS_Ranges {
  enum {
    kHasNonInstancePrototypeBitStart,
    kHasNonInstancePrototypeBitEnd = kHasNonInstancePrototypeBitStart + 1 - 1,
    ... // not showing the rest of the entries.
    kBitsCount
  };
};


So this would create a struct with an enum and it could be accessed using: i::Map::MAP_BIT_FIELD_FIELDS_Ranges::kHasNonInstancePrototypeBitStart The next part of the macro is

  LIST_MACRO(DEFINE_BIT_FIELD_TYPE, LIST_MACRO##_Ranges)

#define DEFINE_BIT_FIELD_TYPE(Name, Type, Size, RangesName) \
typedef BitField<Type, RangesName::k##Name##Start, Size> Name;


Which will get expanded to:

  typedef BitField<bool, MAP_BIT_FIELD_FIELDS_Ranges::kHasNonInstancePrototypeBitStart, 1> HasNonInstancePrototypeBit;


So this is how HasNonInstancePrototypeBit is declared and notice that it is of type BitField which can be found in src/utils.h:

template<class T, int shift, int size>
class BitField : public BitFieldBase<T, shift, size, uint32_t> { };

template<class T, int shift, int size, class U>
class BitFieldBase {
public:
typedef T FieldType;


In Map::HasNonInstancePrototypeBit::decode(bit_field()), bit_field is called first:

byte Map::bit_field() const { return READ_BYTE_FIELD(this, kBitFieldOffset); }


And the result of that is passed to Map::HasNonInstancePrototypeBit::decode:

(lldb) br s -n bit_field
(lldb) expr -i 0 --  map->bit_field()

byte Map::bit_field() const { return READ_BYTE_FIELD(this, kBitFieldOffset); }


So, this is the current Map instance, and we are going to read from.

#define READ_BYTE_FIELD(p, offset) \



Which will get expanded to:

byte Map::bit_field() const {
  return *reinterpret_cast<const byte*>(
      reinterpret_cast<Address>(this) + (kBitFieldOffset - kHeapObjectTag));
}


The instance_size is the instance_size_in_words << kPointerSizeLog2 (3 on my machine):

(lldb) memory read -f x -s 1 -c 1 *map+8
0x24d1cd509ed1: 0x03
(lldb) expr 0x03 << 3
(int) $2 = 24
(lldb) expr map->instance_size()
(int) $3 = 24


i::HeapObject::kHeaderSize is 8 on my system which is used in the DEFINE_FIELD_OFFSET_CONSTANTS:

#define MAP_FIELDS(V)                                                     \
  V(kInstanceSizeInWordsOffset, kUInt8Size)                               \
  V(kInObjectPropertiesStartOrConstructorFunctionIndexOffset, kUInt8Size) \
...


So we can use this information to read the inobject_properties_start_or_constructor_function_index directly from memory using:

(lldb) expr map->inobject_properties_start_or_constructor_function_index()
(lldb) memory read -f x -s 1 -c 1 map+9
error: address expression "map+9" evaluation failed
(lldb) memory read -f x -s 1 -c 1 *map+9
0x17b027209ed2: 0x03


Inspect the visitor_id (which is the last of the first byte):

(lldb) memory read -f x -s 1 -c 1 *map+10
0x17b027209ed3: 0x15
(lldb) expr (int) 0x15
(int) $8 = 21
(lldb) expr map->visitor_id()
(v8::internal::VisitorId) $11 = kVisitJSObjectFast
(lldb) expr (int) $11
(int) $12 = 21


Inspect the instance_type (which is part of the second byte):

(lldb) expr map->instance_type()
(v8::internal::InstanceType) $41 = JS_OBJECT_TYPE
(lldb) expr v8::internal::InstanceType::JS_OBJECT_TYPE
(uint16_t) $35 = 1057
(lldb) memory read -f x -s 2 -c 1 *map+11
0x17b027209ed4: 0x0421
(lldb) expr (int)0x0421
(int) $40 = 1057


Notice that instance_type is a short, so it takes up 2 bytes.

(lldb) expr map->has_non_instance_prototype()
(bool) $60 = false
(lldb) expr map->is_callable()
(bool) $46 = false
(lldb) expr map->has_named_interceptor()
(bool) $51 = false
(lldb) expr map->has_indexed_interceptor()
(bool) $55 = false
(lldb) expr map->is_undetectable()
(bool) $56 = false
(lldb) expr map->is_access_check_needed()
(bool) $57 = false
(lldb) expr map->is_constructor()
(bool) $58 = false
(lldb) expr map->has_prototype_slot()
(bool) $59 = false


Verify that the above is correct:

(lldb) expr map->has_non_instance_prototype()
(bool) $44 = false
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x00

(lldb) expr map->set_has_non_instance_prototype(true)
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x01

(lldb) expr map->set_has_prototype_slot(true)
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x81


Inspect second int field (bit_field2):

(lldb) memory read -f x -s 1 -c 1 *map+14
0x17b027209ed7: 0x19
(lldb) expr map->is_extensible()
(bool) $78 = true
(lldb) expr -- 0x19 & (1 << 0)
(bool) $90 = 1

(lldb) expr map->is_prototype_map()
(bool) $79 = false
(lldb) expr map->is_in_retained_map_list()
(bool) $80 = false

(lldb) expr map->elements_kind()
(v8::internal::ElementsKind) $81 = HOLEY_ELEMENTS
(lldb) expr v8::internal::ElementsKind::HOLEY_ELEMENTS
(int) $133 = 3
(lldb) expr 0x19 >> 3
(int) $134 = 3


Inspect the third int field (bit_field3):

(lldb) memory read -f b -s 4 -c 1 *map+15
0x17b027209ed8: 0b00001000001000000000001111111111
(lldb) memory read -f x -s 4 -c 1 *map+15
0x17b027209ed8: 0x082003ff


So we know that a Map instance is a pointer allocated by the Heap with a specific size. Fields are accessed using indexes (remember there are no member fields in the Map class). We also know that all HeapObjects have a Map. The Map is sometimes referred to as the HiddenClass and sometimes as the shape of an object. If two objects have the same properties they share the same Map. This makes sense, and I've seen blog posts that show this, but I'd like to verify it to fully understand it. I'm going to try to match https://v8project.blogspot.com/2017/08/fast-properties.html with the code.

So, let's take a look at adding a property to a JSObject. We start by creating a new Map and then use it to create a new JSObject:

  i::Handle<i::Map> map = factory->NewMap(i::JS_OBJECT_TYPE, 32);
  i::Handle<i::JSObject> js_object = factory->NewJSObjectFromMap(map);
  i::Handle<i::String> prop_name = factory->InternalizeUtf8String("prop_name");
  i::Handle<i::String> prop_value = factory->InternalizeUtf8String("prop_value");
  i::JSObject::AddProperty(js_object, prop_name, prop_value, i::NONE);


Let's take a closer look at AddProperty and how it interacts with the Map. This function can be found in src/objects.cc:

void JSObject::AddProperty(Handle<JSObject> object, Handle<Name> name,
                           Handle<Object> value, PropertyAttributes attributes) {
  LookupIterator it(object, name, object, LookupIterator::OWN_SKIP_INTERCEPTOR);
  CHECK_NE(LookupIterator::ACCESS_CHECK, it.state());


First we have the LookupIterator constructor (src/lookup.h), but since this is a new property, which we know does not exist, it will not find any property.

  CHECK(AddDataProperty(&it, value, attributes, kThrowOnError,
                        CERTAINLY_NOT_STORE_FROM_KEYED)
            .IsJust());

  Handle<JSReceiver> receiver = it->GetStoreTarget<JSReceiver>();
  ...
  it->UpdateProtector();
  // Migrate to the most up-to-date map that will be able to store |value|
  // under it->name() with |attributes|.
  it->PrepareTransitionToDataProperty(receiver, value, attributes, store_mode);
  DCHECK_EQ(LookupIterator::TRANSITION, it->state());
  it->ApplyTransitionToDataProperty(receiver);

  // Write the property value.
  it->WriteDataValue(value, true);


PrepareTransitionToDataProperty:

  Representation representation = value->OptimalRepresentation();
  Handle<FieldType> type = value->OptimalType(isolate, representation);
  maybe_map = Map::CopyWithField(map, name, type, attributes, constness,
                                 representation, flag);


Map::CopyWithField:

  Descriptor d = Descriptor::DataField(name, index, attributes, constness,
                                       representation, wrapped_type);


Let's take a closer look at the Descriptor, which can be found in src/property.cc:

Descriptor Descriptor::DataField(Handle<Name> key, int field_index,
                                 PropertyAttributes attributes,
                                 PropertyConstness constness,
                                 Representation representation,
                                 MaybeObjectHandle wrapped_field_type) {
  DCHECK(wrapped_field_type->IsSmi() || wrapped_field_type->IsWeakHeapObject());
  PropertyDetails details(kData, attributes, kField, constness, representation,
                          field_index);
  return Descriptor(key, wrapped_field_type, details);
}


Descriptor is declared in src/property.h and describes an element in the instance descriptors array. These are returned when calling map->instance_descriptors(). Let's check some of the arguments:

(lldb) job *key
#prop_name
(lldb) expr attributes
(v8::internal::PropertyAttributes) $27 = NONE
(lldb) expr constness
(v8::internal::PropertyConstness) $28 = kMutable
(lldb) expr representation
(v8::internal::Representation) $29 = (kind_ = '\b')


The Descriptor class contains three members:

 private:
Handle<Name> key_;
MaybeObjectHandle value_;
PropertyDetails details_;


Let's take a closer look at PropertyDetails, which only has a single member, named value_:

  uint32_t value_;


It also declares a number of classes that extend BitField, for example:

class KindField : public BitField<PropertyKind, 0, 1> {};
class LocationField : public BitField<PropertyLocation, KindField::kNext, 1> {};
class ConstnessField : public BitField<PropertyConstness, LocationField::kNext, 1> {};
class AttributesField : public BitField<PropertyAttributes, ConstnessField::kNext, 3> {};
class PropertyCellTypeField : public BitField<PropertyCellType, AttributesField::kNext, 2> {};
class DictionaryStorageField : public BitField<uint32_t, PropertyCellTypeField::kNext, 23> {};

// Bit fields for fast objects.
class RepresentationField : public BitField<uint32_t, AttributesField::kNext, 4> {};
class DescriptorPointer : public BitField<uint32_t, RepresentationField::kNext, kDescriptorIndexBitCount> {};
class FieldIndexField : public BitField<uint32_t, DescriptorPointer::kNext, kDescriptorIndexBitCount> {};

enum PropertyKind { kData = 0, kAccessor = 1 };
enum PropertyLocation { kField = 0, kDescriptor = 1 };
enum class PropertyConstness { kMutable = 0, kConst = 1 };
enum PropertyAttributes {
NONE = ::v8::None,
DONT_ENUM = ::v8::DontEnum,
DONT_DELETE = ::v8::DontDelete,
SEALED = DONT_DELETE,
ABSENT = 64,  // Used in runtime to indicate a property is absent.
// ABSENT can never be stored in or returned from a descriptor's attributes
// bitfield.  It is only used as a return value meaning the attributes of
// a non-existent property.
};
enum class PropertyCellType {
// Meaningful when a property cell does not contain the hole.
kUndefined,     // The PREMONOMORPHIC of property cells.
kConstant,      // Cell has been assigned only once.
kConstantType,  // Cell has been assigned only one type.
kMutable,       // Cell will no longer be tracked as constant.
// Meaningful when a property cell contains the hole.
kUninitialized = kUndefined,  // Cell has never been initialized.
kInvalidated = kConstant,     // Cell has been deleted, invalidated or never
// existed.
// For dictionaries not holding cells.
kNoCell = kMutable,
};

template<class T, int shift, int size>
class BitField : public BitFieldBase<T, shift, size, uint32_t> { };


The type T of KindField will be PropertyKind, the shift will be 0, and the size 1. Notice that LocationField is using KindField::kNext as its shift. This is a static class constant of type uint32_t and is defined as:

static const U kNext = kShift + kSize;


So LocationField would get the value from KindField which should be:

class LocationField : public BitField<PropertyLocation, 1, 1> {};


The constructor for PropertyDetails looks like this:

PropertyDetails(PropertyKind kind, PropertyAttributes attributes,
                PropertyCellType cell_type, int dictionary_index = 0) {
  value_ = KindField::encode(kind) | LocationField::encode(kField) |
           AttributesField::encode(attributes) |
           DictionaryStorageField::encode(dictionary_index) |
           PropertyCellTypeField::encode(cell_type);
}


So what does KindField::encode(kind) actually do then?

(lldb) expr static_cast<uint32_t>(kind())
(uint32_t) $36 = 0
(lldb) expr static_cast<uint32_t>(kind()) << 0
(uint32_t) $37 = 0


This value is later returned by calling kind():

PropertyKind kind() const { return KindField::decode(value_); }


So we have all this information about this property, its type (Representation), constness, if it is read-only, enumerable, deletable, sealed, frozen. After that little detour we are back in Descriptor::DataField:

  return Descriptor(key, wrapped_field_type, details);


Here we are using the key (name of the property), the wrapped_field_type, and PropertyDetails we created. What is wrapped_field_type again?
If we back up a few frames into Map::TransitionToDataProperty, we can see that the type passed in is taken from the following code:

  Representation representation = value->OptimalRepresentation();
Handle<FieldType> type = value->OptimalType(isolate, representation);


So this is only taking the type of the field:

(lldb) expr representation.kind()
(v8::internal::Representation::Kind) $51 = kHeapObject


This makes sense as the map only deals with the shape of the property and not its value.

Next, in Map::CopyWithField we have:

  Handle<Map> new_map = Map::CopyAddDescriptor(map, &d, flag);


CopyAddDescriptor does:

  Handle<DescriptorArray> descriptors(map->instance_descriptors());
  int nof = map->NumberOfOwnDescriptors();
  Handle<DescriptorArray> new_descriptors =
      DescriptorArray::CopyUpTo(descriptors, nof, 1);
  new_descriptors->Append(descriptor);
  Handle<LayoutDescriptor> new_layout_descriptor =
      FLAG_unbox_double_fields
          ? LayoutDescriptor::New(map, new_descriptors, nof + 1)
          : handle(LayoutDescriptor::FastPointerLayout(), map->GetIsolate());
  return CopyReplaceDescriptors(map, new_descriptors, new_layout_descriptor,
                                flag, descriptor->GetKey(), "CopyAddDescriptor",
                                SIMPLE_PROPERTY_TRANSITION);


Let's take a closer look at the LayoutDescriptor:

(lldb) expr new_layout_descriptor->Print()
Layout descriptor: <all tagged>


TODO: Take a closer look at LayoutDescriptor.

Later, when actually adding the value in Object::AddDataProperty:

  it->WriteDataValue(value, true);


This call will end up in src/lookup.cc, and in our case the path will be the following call:

  JSObject::cast(*holder)->WriteToField(descriptor_number(), property_details_, *value);


TODO: Take a closer look at LookupIterator.

WriteToField can be found in src/objects-inl.h:

  FieldIndex index = FieldIndex::ForDescriptor(map(), descriptor);


FieldIndex::ForDescriptor can be found in src/field-index-inl.h:

inline FieldIndex FieldIndex::ForDescriptor(const Map* map, int descriptor_index) {
  PropertyDetails details =
      map->instance_descriptors()->GetDetails(descriptor_index);
  int field_index = details.field_index();
  return ForPropertyIndex(map, field_index, details.representation());
}


Notice that this is calling instance_descriptors() on the passed-in map. As we recall from earlier, this returns a DescriptorArray (which is a type of WeakFixedArray).
#### DescriptorArray

Our DescriptorArray only has one entry:

(lldb) expr map->instance_descriptors()->number_of_descriptors()
(int) $6 = 1
(lldb) expr map->instance_descriptors()->GetKey(0)->Print()
#prop_name
(lldb) expr map->instance_descriptors()->GetFieldIndex(0)
(int) $11 = 0


We can also use Print on the DescriptorArray:

(lldb) expr map->instance_descriptors()->Print()
[0]: #prop_name (data field 0:h, p: 0, attrs: [WEC]) @ Any


In our case we are accessing the PropertyDetails and then getting the field_index, which I think tells us where in the object the value for this property is stored. The last call in ForDescriptor is ForPropertyIndex:

inline FieldIndex FieldIndex::ForPropertyIndex(const Map* map, int property_index,
                                               Representation representation) {
  int inobject_properties = map->GetInObjectProperties();
  bool is_inobject = property_index < inobject_properties;
  int first_inobject_offset;
  int offset;
  if (is_inobject) {
    first_inobject_offset = map->GetInObjectPropertyOffset(0);
    offset = map->GetInObjectPropertyOffset(property_index);
  } else {
    first_inobject_offset = FixedArray::kHeaderSize;
    property_index -= inobject_properties;
    offset = FixedArray::kHeaderSize + property_index * kPointerSize;
  }
  Encoding encoding = FieldEncoding(representation);
  return FieldIndex(is_inobject, offset, encoding, inobject_properties,
                    first_inobject_offset);
}


I was expecting inobject_properties to be 1 here but it is 0:

(lldb) expr inobject_properties
(int) $14 = 0


Why is that, what am I missing?
These in-object properties are stored directly on the object instance and do not use the properties array. I'll get back to an example of this later to clarify it. TODO: Add an in-object properties example.

Back in JSObject::WriteToField:

  RawFastPropertyAtPut(index, value);

void JSObject::RawFastPropertyAtPut(FieldIndex index, Object* value) {
if (index.is_inobject()) {
int offset = index.offset();
WRITE_FIELD(this, offset, value);
WRITE_BARRIER(GetHeap(), this, offset, value);
} else {
property_array()->set(index.outobject_array_index(), value);
}
}


In our case we know that the index is not inobject()

(lldb) expr index.is_inobject()
(bool) $18 = false


So, property_array()->set() will be called.

(lldb) expr this
(v8::internal::JSObject *) $21 = 0x00002c31c6a88b59


JSObject inherits from JSReceiver which is where the property_array() function is declared.

  inline PropertyArray* property_array() const;

(lldb) expr property_array()->Print()
0x2c31c6a88bb1: [PropertyArray]
- map: 0x2c31f5603e21 <Map>
- length: 3
- hash: 0
0: 0x2c31f56025a1 <Odd Oddball: uninitialized>
1-2: 0x2c31f56026f1 <undefined>
(lldb) expr index.outobject_array_index()
(int) $26 = 0
(lldb) expr value->Print()
#prop_value


Looking at the values printed above, we should see the property written to entry 0:

(lldb) expr property_array()->get(0)->Print()
#uninitialized

// after the call to set
(lldb) expr property_array()->get(0)->Print()
#prop_value

(lldb) expr map->instance_descriptors()
(v8::internal::DescriptorArray *) $4 = 0x000039a927082339


So a Map has a pointer to an instance of DescriptorArray.

(lldb) expr map->GetInObjectProperties()
(int) $19 = 1


Each Map has an int that tells us the number of properties it has. This is the number specified when creating a new Map, for example:

  i::Handle<i::Map> map = i::Map::Create(asInternal(isolate_), 1);


But at this stage we don't really have any properties. The value for a property is associated with the actual instance of the object. What the Map specifies is the index of the value for a particular property.

#### Creating a Map instance

Let's take a look at when a Map is created:

(lldb) br s -f map_test.cc -l 63

Handle<Map> Factory::NewMap(InstanceType type, int instance_size,
                            ElementsKind elements_kind, int inobject_properties) {
  HeapObject* result =
      isolate()->heap()->AllocateRawWithRetryOrFail(Map::kSize, MAP_SPACE);
  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
  return handle(InitializeMap(Map::cast(result), type, instance_size,
                              elements_kind, inobject_properties),
                isolate());
}


We can see that the above is calling AllocateRawWithRetryOrFail on the heap instance, passing a size of 88 and specifying MAP_SPACE:

HeapObject* Heap::AllocateRawWithRetryOrFail(int size, AllocationSpace space,
                                             AllocationAlignment alignment) {
  AllocationResult alloc;
  HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
  if (result) return result;

  isolate()->counters()->gc_last_resort_from_handles()->Increment();
  CollectAllAvailableGarbage(GarbageCollectionReason::kLastResort);
  {
    AlwaysAllocateScope scope(isolate());
    alloc = AllocateRaw(size, space, alignment);
  }
  if (alloc.To(&result)) {
    DCHECK(result != exception());
    return result;
  }
  // TODO(1181417): Fix this.
  FatalProcessOutOfMemory("CALL_AND_RETRY_LAST");
  return nullptr;
}


The default value for alignment is kWordAligned. Reading the docs in the header, this function will try to perform an allocation of size 88 in MAP_SPACE, and if that fails a full GC will be performed and the allocation retried.

Let's take a look at AllocateRawWithLigthRetry:

  AllocationResult alloc = AllocateRaw(size, space, alignment);


AllocateRaw can be found in src/heap/heap-inl.h. There are different paths that will be taken depending on the space parameter. Since it is MAP_SPACE in our case we will focus on that path:

AllocationResult Heap::AllocateRaw(int size_in_bytes, AllocationSpace space,
                                   AllocationAlignment alignment) {
  ...
  HeapObject* object = nullptr;
  AllocationResult allocation;
  if (OLD_SPACE == space) {
    ...
  } else if (MAP_SPACE == space) {
    allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
  }
  ...
}


map_space_ is a private member of Heap (src/heap/heap.h):

MapSpace* map_space_;


AllocateRawUnaligned can be found in src/heap/spaces-inl.h:

AllocationResult PagedSpace::AllocateRawUnaligned(
    int size_in_bytes, UpdateSkipList update_skip_list) {
  if (!EnsureLinearAllocationArea(size_in_bytes)) {
    return AllocationResult::Retry(identity());
  }
  HeapObject* object = AllocateLinearly(size_in_bytes);
  MSAN_ALLOCATED_UNINITIALIZED_MEMORY(object->address(), size_in_bytes);
  return object;
}


The default value for update_skip_list is UPDATE_SKIP_LIST. So let's take a look at AllocateLinearly:

HeapObject* PagedSpace::AllocateLinearly(int size_in_bytes) {
  Address current_top = allocation_info_.top();
  Address new_top = current_top + size_in_bytes;
  allocation_info_.set_top(new_top);
  return HeapObject::FromAddress(current_top);
}


Recall that size_in_bytes in our case is 88:

(lldb) expr current_top
(v8::internal::Address) $5 = 24847457492680
(lldb) expr new_top
(v8::internal::Address) $6 = 24847457492768
(lldb) expr new_top - current_top
(unsigned long) $7 = 88


Notice that first the top is set to new_top, and then current_top is returned, which will be a pointer to the start of the object in memory (in this case an object of type v8::internal::Map, which is also a HeapObject). I've been wondering why Map (and other HeapObjects) don't have any member fields and only/mostly getters/setters for the various fields that make up an object. The answer is that pointers to instances of, for example, Map point to the first memory location of the instance, and the getter/setter functions use indexes to read/write to memory locations. The indexes are mostly in the form of enum fields that define the memory layout of the type.

Next, in AllocateRawUnaligned we have the MSAN_ALLOCATED_UNINITIALIZED_MEMORY macro:

  MSAN_ALLOCATED_UNINITIALIZED_MEMORY(object->address(), size_in_bytes);


MSAN_ALLOCATED_UNINITIALIZED_MEMORY can be found in src/msan.h; msan stands for MemorySanitizer, and it is only used if V8_USE_MEMORY_SANITIZER is defined. The returned object will be used to construct an AllocationResult when returned. Back in AllocateRaw we have:

if (allocation.To(&object)) {
...
OnAllocationEvent(object, size_in_bytes);
}

return allocation;


This will return us to AllocateRawWithLightRetry:

AllocationResult alloc = AllocateRaw(size, space, alignment);
if (alloc.To(&result)) {
DCHECK(result != exception());
return result;
}


This will return us back to AllocateRawWithRetryOrFail:

  HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
if (result) return result;


And that return will return to NewMap in src/heap/factory.cc:

  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
return handle(InitializeMap(Map::cast(result), type, instance_size,
elements_kind, inobject_properties),
isolate());


InitializeMap:

  map->set_instance_type(type);
map->set_prototype(*null_value(), SKIP_WRITE_BARRIER);
map->set_constructor_or_backpointer(*null_value(), SKIP_WRITE_BARRIER);
map->set_instance_size(instance_size);
if (map->IsJSObjectMap()) {
map->SetInObjectPropertiesStartInWords(instance_size / kPointerSize - inobject_properties);
DCHECK_EQ(map->GetInObjectProperties(), inobject_properties);
map->set_prototype_validity_cell(*invalid_prototype_validity_cell());
} else {
DCHECK_EQ(inobject_properties, 0);
map->set_inobject_properties_start_or_constructor_function_index(0);
map->set_prototype_validity_cell(Smi::FromInt(Map::kPrototypeChainValid));
}
map->set_dependent_code(DependentCode::cast(*empty_fixed_array()), SKIP_WRITE_BARRIER);
map->set_weak_cell_cache(Smi::kZero);
map->set_raw_transitions(MaybeObject::FromSmi(Smi::kZero));
map->SetInObjectUnusedPropertyFields(inobject_properties);
map->set_instance_descriptors(*empty_descriptor_array());

map->set_visitor_id(Map::GetVisitorId(map));
map->set_bit_field(0);
int bit_field3 = Map::EnumLengthBits::encode(kInvalidEnumCacheSentinel) |
Map::OwnsDescriptorsBit::encode(true) |
Map::ConstructionCounterBits::encode(Map::kNoSlackTracking);
map->set_bit_field3(bit_field3);
map->set_elements_kind(elements_kind); //HOLEY_ELEMENTS
map->set_new_target_is_base(true);
isolate()->counters()->maps_created()->Increment();
if (FLAG_trace_maps) LOG(isolate(), MapCreate(map));
return map;


### Context

Context extends FixedArray (src/context.h), so an instance of Context is a FixedArray and we can use Get(index) etc. to get entries in the array.

### V8_EXPORT

This can be found in quite a few places in v8 source code. For example:

class V8_EXPORT ArrayBuffer : public Object {


What is this?
It is a preprocessor macro which looks like this:

#if V8_HAS_ATTRIBUTE_VISIBILITY && defined(V8_SHARED)
# ifdef BUILDING_V8_SHARED
#  define V8_EXPORT __attribute__ ((visibility("default")))
# else
#  define V8_EXPORT
# endif
#else
# define V8_EXPORT
#endif


So we can see that if V8_HAS_ATTRIBUTE_VISIBILITY and V8_SHARED are defined, and BUILDING_V8_SHARED is also defined, then V8_EXPORT is set to __attribute__((visibility("default"))). In all other cases V8_EXPORT is empty and the preprocessor does not insert anything (nothing will be there come compile time). But what is __attribute__((visibility("default"))) anyway?

In the GNU compiler collection (GCC) environment, the term that is used for exporting is visibility. As it applies to functions and variables in a shared object, visibility refers to the ability of other shared objects to call a C/C++ function. Functions with default visibility have a global scope and can be called from other shared objects. Functions with hidden visibility have a local scope and cannot be called from other shared objects.

Visibility can be controlled by using either compiler options or visibility attributes. In your header files, wherever you want an interface or API made public outside the current Dynamic Shared Object (DSO) , place __attribute__ ((visibility ("default"))) in struct, class and function declarations you wish to make public. With -fvisibility=hidden, you are telling GCC that every declaration not explicitly marked with a visibility attribute has a hidden visibility. There is such a flag in build/common.gypi

### ToLocalChecked()

You'll see a few of these calls in the hello_world example:

  Local<String> source = String::NewFromUtf8(isolate, js, NewStringType::kNormal).ToLocalChecked();


NewFromUtf8 actually returns a Local wrapped in a MaybeLocal, which forces a check to see if the Local<> is empty before using it. NewStringType is an enum which can be kNormal (k for constant) or kInternalized.

The following is after running the preprocessor (clang -E src/api.cc):

# 5961 "src/api.cc"
Local<String> String::NewFromUtf8(Isolate* isolate,
const char* data,
NewStringType type,
int length) {
MaybeLocal<String> result;
if (length == 0) {
result = String::Empty(isolate);
} else if (length > i::String::kMaxLength) {
result = MaybeLocal<String>();
} else {
i::Isolate* i_isolate = reinterpret_cast<internal::Isolate*>(isolate);
i::VMState<v8::OTHER> __state__((i_isolate));
i::RuntimeCallTimerScope _runtime_timer( i_isolate, &i::RuntimeCallStats::API_String_NewFromUtf8);
LOG(i_isolate, ApiEntryCall("v8::" "String" "::" "NewFromUtf8"));
if (length < 0) length = StringLength(data);
i::Handle<i::String> handle_result = NewString(i_isolate->factory(), static_cast<v8::NewStringType>(type), i::Vector<const char>(data, length)) .ToHandleChecked();
result = Utils::ToLocal(handle_result);
};
return result.FromMaybe(Local<String>());;
}


I was wondering where the Utils::ToLocal was defined but could not find it until I found:

MAKE_TO_LOCAL(ToLocal, String, String)

#define MAKE_TO_LOCAL(Name, From, To)                                       \
Local<v8::To> Utils::Name(v8::internal::Handle<v8::internal::From> obj) {   \
return Convert<v8::internal::From, v8::To>(obj);                          \
}


The above can be found in src/api.h. The same goes for Local<Object>, Local<String> etc.

### Small Integers

Reading through v8.h I came across the comment // Tag information for Smi. Smi stands for small integer.

A pointer is really just an integer that is treated like a memory address. We can use that memory address to get the start of the data located in that memory slot. But we could also just store a normal value like 18 in it directly. There might be cases where it does not make sense to store a small integer somewhere in the heap and have a pointer to it, but instead store the value directly in the pointer itself. But that only works for small integers, so there needs to be a way to know if the value we want is stored in the pointer or if we should follow the pointer to the heap to get the value.

A word on a 64 bit machine is 8 bytes (64 bits) and all of the pointers need to be aligned to multiples of 8. So a pointer could be:

1000       = 8
10000      = 16
11000      = 24
100000     = 32
1000000000 = 512


Remember that we are talking about the pointers and not the values stored at the memory locations they point to. We can see that there are always three bits that are zero in the pointers, so we can use them for something else and just mask them out when using them as pointers.

Tagging involves borrowing one bit of the 32-bit word, making the payload 31 bits and having the leftover bit represent the tag. If the tag is zero then this is a plain small integer, but if the tag is one then the value is a pointer that must be followed. This does not only have to be for numbers; it is also used for objects (I think).

When a number does not fit in the 31-bit payload, it is instead stored on the heap and represented by a tagged pointer to that heap number. V8 needs to know if a value stored in memory is a small integer it can use directly, or really a pointer, in which case it has to follow the pointer to get the complete value. This is where the concept of tagging comes in.

### Properties/Elements

Take the following object:

{ firstname: "Jon", lastname: "Doe" }


The above object has two named properties. Named properties differ from integer indexed which is what you have when you are working with arrays.

Memory layout of JavaScript Object:

Properties                  JavaScript Object               Elements
+-----------+              +-----------------+         +----------------+
|property1  |<------+      | HiddenClass     |  +----->|                |
+-----------+       |      +-----------------+  |      +----------------+
|...        |       +------| Properties      |  |      | element1       |<------+
+-----------+              +-----------------+  |      +----------------+       |
|...        |              | Elements        |--+      | ...            |       |
+-----------+              +-----------------+         +----------------+       |
|propertyN  | <---------------------+                  | elementN       |       |
+-----------+                       |                  +----------------+       |
                                    |                                           |
                                    |                                           |
                                    |                                           |
Named properties: { firstname: "Jon", lastname: "Doe" }
Indexed properties: { 1: "Jon", 2: "Doe" }


We can see that properties and elements are stored in different data structures. Elements are usually implemented as a plain array, and the indexes can be used for fast access to the elements. But for the properties this is not the case. Instead there is a mapping between the property names and the index into the properties.

In src/objects/objects.h we can find JSObject:

class JSObject: public JSReceiver {
...
DECL_ACCESSORS(elements, FixedArrayBase)


And looking at the DECL_ACCESSORS macro:

#define DECL_ACCESSORS(name, type)    \
inline type* name() const;          \
inline void set_##name(type* value, \
WriteBarrierMode mode = UPDATE_WRITE_BARRIER);

inline FixedArrayBase* elements() const;
inline void set_elements(FixedArrayBase* value, WriteBarrierMode mode = UPDATE_WRITE_BARRIER);


Notice that JSObject extends JSReceiver, which is extended by all types that can have properties defined on them. I think this includes all JSObjects and JSProxy. It is in JSReceiver that we find the properties array:

DECL_ACCESSORS(raw_properties_or_hash, Object)


Now, properties (named properties, not elements) can be of different kinds internally. These work just like simple dictionaries from the outside, but a dictionary is only used in certain circumstances at runtime.

Properties                  JSObject                    HiddenClass (Map)
+-----------+              +-----------------+         +----------------+
|property1  |<------+      | HiddenClass     |-------->| bit field1     |
+-----------+       |      +-----------------+         +----------------+
|...        |       +------| Properties      |         | bit field2     |
+-----------+              +-----------------+         +----------------+
|...        |              | Elements        |         | bit field3     |
+-----------+              +-----------------+         +----------------+
|propertyN  |              | property1       |
+-----------+              +-----------------+
                           | property2       |
                           +-----------------+
                           | ...             |
                           +-----------------+


#### JSObject

Each JSObject has as its first field a pointer to the generated hidden class. A hidden class contains mappings from property names to indices into the properties data structure. When an instance of JSObject is created, a Map is passed in. As mentioned earlier, JSObject inherits from JSReceiver, which inherits from HeapObject.

For example, in jsobject_test.cc we first create a new Map using the internal Isolate Factory:

v8::internal::Handle<v8::internal::Map> map = factory->NewMap(v8::internal::JS_OBJECT_TYPE, 24);
v8::internal::Handle<v8::internal::JSObject> js_object = factory->NewJSObjectFromMap(map);
EXPECT_TRUE(js_object->HasFastProperties());


When we call js_object->HasFastProperties() this will delegate to the map instance:

return !map()->is_dictionary_map();


How do you add a property to a JSObject instance? Take a look at jsobject_test.cc for an example.

### Caching

Caches are a way to optimize polymorphic function calls in dynamic languages, for example JavaScript.

#### Lookup caches

Sending a message to a receiver requires the runtime to find the correct target method using the runtime type of the receiver. A lookup cache maps the type of the receiver/message name pair to methods and stores the most recently used lookup results. The cache is first consulted and if there is a cache miss a normal lookup is performed and the result stored in the cache.

#### Inline caches

Using a lookup cache as described above still takes a considerable amount of time since the cache must be probed for each message. It can be observed that the type of the target often does not vary: if a call to type A is made at a particular call site, it is very likely that the next time it is called the type will also be A. The method address looked up by the system lookup routine can be cached and the call instruction overwritten. Subsequent calls for the same type can jump directly to the cached method and completely avoid the lookup. The prolog of the called method must verify that the receiver's type has not changed, and do the lookup if it has changed (the type is incorrect, no longer A for example).

The target method's address is stored in the caller's code, or "inline" with the caller's code, hence the name "inline cache".

If V8 is able to make a good assumption about the type of object that will be passed to a method, it can bypass the process of figuring out how to access the object's properties, and instead use the stored information from previous lookups to the object's hidden class.

#### Polymorphic Inline Cache (PIC)

A polymorphic call site is one where there are many equally likely receiver types (and thus call targets).

• Monomorphic: there is only one receiver type
• Polymorphic: a few receiver types
• Megamorphic: very many receiver types

This type of caching extends inline caching to not just cache the last lookup, but to cache all lookup results for a given polymorphic call site using a specially generated stub. Let's say we have a method that iterates through a list of types and calls a method. If all the types are the same (monomorphic) a PIC acts just like an inline cache: the calls will directly call the target method (with the method prolog followed by the method body). If a different type exists in the list there will be a cache miss in the prolog and the lookup routine is called. In normal inline caching this would rebind the call, replacing it with a call to this type's target method. This would happen each time the type changes.

With a PIC the cache miss handler will generate a small stub routine and rebind the call to this stub. The stub will check if the receiver is of a type that it has seen before and branch to the correct target. Since the type of the target is already known at this point it can directly branch to the target method body without the need for the prolog. If the type has not been seen before it will be added to the stub to handle that type. Eventually the stub will contain all types used and there will be no more cache misses/lookups.

The problem is that we don't have type information, so methods cannot be called directly but must instead be looked up. In a static language a virtual table might have been used. In JavaScript there is no inheritance relationship, so it is not possible to know a vtable offset ahead of time. What can be done is to observe and learn about the "types" used in the program. When an object is seen it can be stored, and the target of that method call can be stored and inlined into that call. Basically the type will be checked, and if that particular type has been seen before the method can just be invoked directly. But how do we check the type in a dynamic language? The answer is hidden classes, which allow the VM to quickly check an object against a hidden class.

The inline caching sources are located in src/ic.

## --trace-ic

$ out/x64.debug/d8 --trace-ic --trace-maps class.js
before
[TraceMaps: Normalize from= 0x19a314288b89 to= 0x19a31428aff9 reason= NormalizeAsPrototype ]
[TraceMaps: ReplaceDescriptors from= 0x19a31428aff9 to= 0x19a31428b051 reason= CopyAsPrototype ]
[TraceMaps: InitialMap map= 0x19a31428afa1 SFI= 34_Person ]
[StoreIC in ~Person+65 at class.js:2 (0->.) map=0x19a31428afa1 0x10e68ba83361 <String[4]: name>]
[TraceMaps: Transition from= 0x19a31428afa1 to= 0x19a31428b0a9 name= name ]
[StoreIC in ~Person+102 at class.js:3 (0->.) map=0x19a31428b0a9 0x2beaa25abd89 <String[3]: age>]
[TraceMaps: Transition from= 0x19a31428b0a9 to= 0x19a31428b101 name= age ]
[TraceMaps: SlowToFast from= 0x19a31428b051 to= 0x19a31428b159 reason= OptimizeAsPrototype ]
[StoreIC in ~Person+65 at class.js:2 (.->1) map=0x19a31428afa1 0x10e68ba83361 <String[4]: name>]
[StoreIC in ~Person+102 at class.js:3 (.->1) map=0x19a31428b0a9 0x2beaa25abd89 <String[3]: age>]
[LoadIC in ~+546 at class.js:9 (0->.) map=0x19a31428b101 0x10e68ba83361 <String[4]: name>]
[CallIC in ~+571 at class.js:9 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
Daniel
[LoadIC in ~+642 at class.js:10 (0->.) map=0x19a31428b101 0x2beaa25abd89 <String[3]: age>]
[CallIC in ~+667 at class.js:10 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
41
[LoadIC in ~+738 at class.js:11 (0->.) map=0x19a31428b101 0x10e68ba83361 <String[4]: name>]
[CallIC in ~+763 at class.js:11 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
Tilda
[LoadIC in ~+834 at class.js:12 (0->.) map=0x19a31428b101 0x2beaa25abd89 <String[3]: age>]
[CallIC in ~+859 at class.js:12 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
2
[CallIC in ~+927 at class.js:13 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
after

LoadIC (0->.) means that the IC has transitioned from uninitialized state (0) to pre-monomorphic state (.). Monomorphic state is specified with a 1. These states can be found in src/ic/ic.cc.

What we are doing is caching knowledge about the layout of previously seen objects in the StoreIC/LoadIC calls.

$ lldb -- out/x64.debug/d8 class.js


#### HeapObject

This class describes heap allocated objects. It is in this class we find information regarding the type of object. This information is contained in v8::internal::Map.

### v8::internal::Map

src/objects/map.h

• bit_field1
• bit_field2
• bit_field3 contains information about the number of properties that this Map has, and a pointer to a DescriptorArray. The DescriptorArray contains information like the name of a property and the position where its value is stored in the JSObject. This information is available in src/objects/map.h.

#### DescriptorArray

Can be found in src/objects/descriptor-array.h. This class extends FixedArray and has the following entries:

[0] the number of descriptors it contains
[1] if uninitialized this will be Smi(0), otherwise an enum cache bridge which is a FixedArray of size 2:
    [0] enum cache: FixedArray containing all own enumerable keys
    [1] either Smi(0) or a pointer to a FixedArray with indices
[2] first key (an internalized String)
[3] first descriptor


### Factory

Each internal Isolate has a Factory which is used to create instances. This is because all handles need to be allocated using the factory (src/heap/factory.h).

### Objects

All objects extend the abstract class Object (src/objects/objects.h).

### Oddball

This class extends HeapObject and describes null, undefined, true, and false objects.

#### Map

Extends HeapObject and all heap objects have a Map which describes the objects structure. This is where you can find the size of the instance, access to the inobject_properties.

### Compiler pipeline

When a script is compiled all of the top level code is parsed. These are function declarations (but not the function bodies).

function f1() {         <- top level code
  console.log('f1');    <- non top level
}

function f2() {         <- top level code
  f1();                 <- non top level
  console.log('f2');    <- non top level
}

f2();                 <- top level code
var i = 10;           <- top level code


The non top level code must be pre-parsed to check for syntax errors. The top level code is parsed and compiled by the full-codegen compiler. This compiler does not perform any optimizations; its only task is to generate machine code as quickly as possible (this is pre-Turbofan).

Source ------> Parser  --------> Full-codegen ---------> Unoptimized Machine Code


So the whole script is parsed even though we only generated code for the top-level code. The result of the pre-parse (the syntax checking) is not stored in any way. The functions are lazy stubs: when/if a function gets called it is compiled, which means it has to be parsed again (the first time was only the pre-parse, remember).

If a function is determined to be hot it will be optimized by one of the two optimizing compilers: Crankshaft for older parts of JavaScript, or Turbofan for WebAssembly (WASM) and some of the newer ES6 features.

The first time V8 sees a function it will parse it into an AST but not do any further processing of that tree until that function is used.

                    +-----> Full-codegen -----> Unoptimized code
                   /                                 \/ /\
Parser ------> AST +-------> Crankshaft  -----> Optimized code
                   \
                    +-----> Turbofan    -----> Optimized code


Inline Caching (IC) is done here, which also helps to gather type information. V8 also has a profiler thread which monitors which functions are hot and should be optimized. This profiling also allows V8 to find out information about types using IC. This type information can then be fed to Crankshaft/Turbofan. The type information is stored as an 8-bit value.

When a function is optimized the unoptimized code cannot be thrown away, as it might still be needed: since JavaScript is highly dynamic, the optimized function might become invalid, and in that case we fall back to the unoptimized code. This takes up a lot of memory, which may be important for low end devices. Also the time spent parsing (twice) adds up.

The idea with Ignition is to be a bytecode interpreter and to reduce memory consumption; the bytecode is very concise compared to native code, which can vary depending on the target platform. The whole source can be parsed and compiled, compared to the current pipeline that has the pre-parse and parse stages mentioned above. So even unused functions will get compiled. The bytecode becomes the source of truth instead of, as before, the AST.

Source ------> Parser ------> Ignition-codegen ------> Bytecode ------> Turbofan ----> Optimized Code --+
                                                          /\                                            |
                                                          +--------------------------------------------+

function bajja(a, b, c) {
  var d = c - 100;
  return a + d * b;
}

var result = bajja(2, 2, 150);
print(result);

$ ./d8 test.js --ignition --print_bytecode

[generating bytecode for function: bajja]
Parameter count 4
Frame size 8
 14 E> 0x2eef8d9b103e @    0 : 7f          StackCheck
 38 S> 0x2eef8d9b103f @    1 : 03 64       LdaSmi [100]  // load 100
 38 E> 0x2eef8d9b1041 @    3 : 2b 02 02    Sub a2, [2]   // a2 is the third argument. a2 is an argument register
       0x2eef8d9b1044 @    6 : 1f fa       Star r0       // r0 is a register for local variables. We only have one, which is d
 47 S> 0x2eef8d9b1046 @    8 : 1e 03       Ldar a1       // LoaD accumulator from Register argument a1, which is b
 60 E> 0x2eef8d9b1048 @   10 : 2c fa 03    Mul r0, [3]   // multiply by our local variable in r0
 56 E> 0x2eef8d9b104b @   13 : 2a 04 04    Add a0, [4]   // add that to argument register 0, which is a
 65 S> 0x2eef8d9b104e @   16 : 83          Return        // return the value in the accumulator

### Abstract Syntax Tree (AST)

In src/ast/ast.h. You can print the AST using the --print-ast option for d8. Let's take the following JavaScript and look at the AST:

const msg = 'testing';
console.log(msg);

$ d8 --print-ast simple.js
[generating interpreter code for user-defined function: ]
--- AST ---
FUNC at 0
. KIND 0
. SUSPEND COUNT 0
. NAME ""
. INFERRED NAME ""
. DECLS
. . VARIABLE (0x7ffe5285b0f8) (mode = CONST) "msg"
. BLOCK NOCOMPLETIONS at -1
. . EXPRESSION STATEMENT at 12
. . . INIT at 12
. . . . VAR PROXY context[4] (0x7ffe5285b0f8) (mode = CONST) "msg"
. . . . LITERAL "testing"
. EXPRESSION STATEMENT at 23
. . ASSIGN at -1
. . . VAR PROXY local[0] (0x7ffe5285b330) (mode = TEMPORARY) ".result"
. . . CALL Slot(0)
. . . . PROPERTY Slot(4) at 31
. . . . . VAR PROXY Slot(2) unallocated (0x7ffe5285b3d8) (mode = DYNAMIC_GLOBAL) "console"
. . . . . NAME log
. . . . VAR PROXY context[4] (0x7ffe5285b0f8) (mode = CONST) "msg"
. RETURN at -1
. . VAR PROXY local[0] (0x7ffe5285b330) (mode = TEMPORARY) ".result"


You can find the declaration of EXPRESSION in ast.h.

### Bytecode

Can be found in src/interpreter/bytecodes.h

• StackCheck: checks that stack limits are not exceeded, to guard against overflow.
• Star: Store the content of the accumulator in the register operand.
• Ldar: LoaD the Accumulator from the Register operand.

The registers are not machine registers; apart from the accumulator, as I understand it, they are instead stack allocated.

#### Parsing

Parsing is the processing of the JavaScript source and the generation of the abstract syntax tree. That tree is then visited and bytecode is generated from it. This section tries to figure out where in the code these operations are performed.

For example, take the script example.

$ make run-script
$ lldb -- run-script
(lldb) br s -n main
(lldb) r


Lets take a look at the following line:

Local<Script> script = Script::Compile(context, source).ToLocalChecked();


This will land us in api.cc

ScriptCompiler::Source script_source(source);
return ScriptCompiler::Compile(context, &script_source);

MaybeLocal<Script> ScriptCompiler::Compile(Local<Context> context, Source* source, CompileOptions options) {
...
auto isolate = context->GetIsolate();
auto maybe = CompileUnboundInternal(isolate, source, options);


CompileUnboundInternal will call GetSharedFunctionInfoForScript (in src/compiler.cc):

result = i::Compiler::GetSharedFunctionInfoForScript(
str, name_obj, line_offset, column_offset, source->resource_options,
source_map_url, isolate->native_context(), NULL, &script_data, options,
i::NOT_NATIVES_CODE);

(lldb) br s -f compiler.cc -l 1259

LanguageMode language_mode = construct_language_mode(FLAG_use_strict);
(lldb) p language_mode
(v8::internal::LanguageMode) $10 = SLOPPY

LanguageMode can be found in src/globals.h and it is an enum with three values:

enum LanguageMode : uint32_t { SLOPPY, STRICT, LANGUAGE_END };

SLOPPY mode, I assume, is the mode when there is no "use strict";. Remember that this can go inside a function and does not have to be at the top level of the file.

ParseInfo parse_info(script);

There is a unit test that shows how a ParseInfo instance can be created and inspected. This will call ParseInfo's constructor (in src/parsing/parse-info.cc), which will call ParseInfo::InitFromIsolate:

DCHECK_NOT_NULL(isolate);
set_hash_seed(isolate->heap()->HashSeed());
set_stack_limit(isolate->stack_guard()->real_climit());
set_unicode_cache(isolate->unicode_cache());
set_runtime_call_stats(isolate->counters()->runtime_call_stats());
set_ast_string_constants(isolate->ast_string_constants());

I was curious about these ast_string_constants:

(lldb) p *ast_string_constants_
(const v8::internal::AstStringConstants) $58 = {
zone_ = {
allocation_size_ = 1312
segment_bytes_allocated_ = 8192
position_ = 0x0000000105052538 <no value available>
limit_ = 0x0000000105054000 <no value available>
allocator_ = 0x0000000103e00080
name_ = 0x0000000101623a70 "../../src/ast/ast-value-factory.h:365"
sealed_ = false
}
string_table_ = {
v8::base::TemplateHashMapImpl<void *, void *, v8::base::HashEqualityThenKeyMatcher<void *, bool (*)(void *, void *)>, v8::base::DefaultAllocationPolicy> = {
map_ = 0x0000000105054000
capacity_ = 64
occupancy_ = 41
match_ = {
match_ = 0x000000010014b260 (libv8.dylibv8::internal::AstRawString::Compare(void*, void*) at ast-value-factory.cc:122)
}
}
}
hash_seed_ = 500815076
anonymous_function_string_ = 0x0000000105052018
arguments_string_ = 0x0000000105052038
async_string_ = 0x0000000105052058
await_string_ = 0x0000000105052078
boolean_string_ = 0x0000000105052098
constructor_string_ = 0x00000001050520b8
default_string_ = 0x00000001050520d8
done_string_ = 0x00000001050520f8
dot_string_ = 0x0000000105052118
dot_for_string_ = 0x0000000105052138
dot_generator_object_string_ = 0x0000000105052158
dot_iterator_string_ = 0x0000000105052178
dot_result_string_ = 0x0000000105052198
dot_switch_tag_string_ = 0x00000001050521b8
dot_catch_string_ = 0x00000001050521d8
empty_string_ = 0x00000001050521f8
eval_string_ = 0x0000000105052218
function_string_ = 0x0000000105052238
get_space_string_ = 0x0000000105052258
length_string_ = 0x0000000105052278
let_string_ = 0x0000000105052298
name_string_ = 0x00000001050522b8
native_string_ = 0x00000001050522d8
new_target_string_ = 0x00000001050522f8
next_string_ = 0x0000000105052318
number_string_ = 0x0000000105052338
object_string_ = 0x0000000105052358
proto_string_ = 0x0000000105052378
prototype_string_ = 0x0000000105052398
return_string_ = 0x00000001050523b8
set_space_string_ = 0x00000001050523d8
star_default_star_string_ = 0x00000001050523f8
string_string_ = 0x0000000105052418
symbol_string_ = 0x0000000105052438
this_string_ = 0x0000000105052458
this_function_string_ = 0x0000000105052478
throw_string_ = 0x0000000105052498
undefined_string_ = 0x00000001050524b8
use_asm_string_ = 0x00000001050524d8
use_strict_string_ = 0x00000001050524f8
value_string_ = 0x0000000105052518
}


So these are constants that are set on the new ParseInfo instance using the values from the isolate. Not exactly sure what I want with this but I might come back to it later. So, we are back in ParseInfo's constructor:

set_allow_lazy_parsing();
set_toplevel();
set_script(script);


Script is of type v8::internal::Script which can be found in src/objects/script.h.

Back now in compiler.cc and the GetSharedFunctionInfoForScript function:

Zone compile_zone(isolate->allocator(), ZONE_NAME);

...
if (parse_info->literal() == nullptr && !parsing::ParseProgram(parse_info, isolate))


ParseProgram:

Parser parser(info);
...
FunctionLiteral* result = nullptr;
result = parser.ParseProgram(isolate, info);


parser.ParseProgram:

Handle<String> source(String::cast(info->script()->source()));

(lldb) job *source
"var user1 = new Person('Fletch');\x0avar user2 = new Person('Dr.Rosen');\x0aprint("user1 = " + user1.name);\x0aprint("user2 = " + user2.name);\x0a\x0a"


So here we can see our JavaScript as a String.

std::unique_ptr<Utf16CharacterStream> stream(ScannerStream::For(source));
scanner_.Initialize(stream.get(), info->is_module());
result = DoParseProgram(info);


DoParseProgram:

(lldb) br s -f parser.cc -l 639
...

this->scope()->SetLanguageMode(info->language_mode());
ParseStatementList(body, Token::EOS, &ok);


This call will land in parser-base.h and its ParseStatementList function.

(lldb) br s -f parser-base.h -l 4695

StatementT stat = ParseStatementListItem(CHECK_OK_CUSTOM(Return, kLazyParsingComplete));

result = CompileToplevel(&parse_info, isolate, Handle<SharedFunctionInfo>::null());


This will land in CompileToplevel (in the same file, which is src/compiler.cc):

// Compile the code.
result = CompileUnoptimizedCode(parse_info, shared_info, isolate);


This will land in CompileUnoptimizedCode (in the same file which is src/compiler.cc):

// Prepare and execute compilation of the outer-most function.
std::unique_ptr<CompilationJob> outer_job(
    PrepareAndExecuteUnoptimizedCompileJob(parse_info, parse_info->literal(),
                                           shared_info, isolate));

std::unique_ptr<CompilationJob> job(
    interpreter::Interpreter::NewCompilationJob(parse_info, literal, isolate));
if (job->PrepareJob() == CompilationJob::SUCCEEDED &&
    job->ExecuteJob() == CompilationJob::SUCCEEDED) {
  return job;
}


PrepareJobImpl:

CodeGenerator::MakeCodePrologue(parse_info(), compilation_info(),
"interpreter");
return SUCCEEDED;


codegen.cc MakeCodePrologue:

interpreter.cc ExecuteJobImpl:

generator()->GenerateBytecode(stack_limit());


src/interpreter/bytecode-generator.cc

 RegisterAllocationScope register_scope(this);


The bytecode is register based (if that is the correct term) and we had an example previously. I'm guessing that this is what this call is about.

VisitDeclarations will iterate over all the declarations in the file which in our case are:

var user1 = new Person('Fletch');
var user2 = new Person('Dr.Rosen');

(lldb) p *variable->raw_name()
(const v8::internal::AstRawString) $33 = {
   = {
    next_ = 0x000000010600a280
    string_ = 0x000000010600a280
  }
  literal_bytes_ = (start_ = "user1", length_ = 5)
  hash_field_ = 1303438034
  is_one_byte_ = true
  has_string_ = false
}

// Perform a stack-check before the body.
builder()->StackCheck(info()->literal()->start_position());

So that call will output a StackCheck instruction, like in the example above:

14 E> 0x2eef8d9b103e @ 0 : 7f StackCheck

### Performance

Say you have the expression x + y; the full-codegen compiler might produce:

movq rax, x
movq rbx, y
callq RuntimeAdd

If x and y are integers, just using the add operation would be much quicker:

movq rax, x
movq rbx, y
add rax, rbx

Recall that functions are optimized, so if the compiler has to bail out and unoptimize part of a function then the whole function will be affected and it will go back to the unoptimized version.

## Bytecode

This section will examine the bytecode for the following JavaScript:

function beve() {
  const p = new Promise((resolve, reject) => {
    resolve('ok');
  });
  p.then(msg => {
    console.log(msg);
  });
}

beve();

$ d8 --print-bytecode promise.js


First we have the main function, which does not have a name:

[generating bytecode for function: ]
(The code that generated this can be found in src/objects.cc BytecodeArray::Disassemble)
Parameter count 1
Frame size 32
// load whatever FixedArray[4] in the constant pool is into the accumulator.
0x34423e7ac19e @    0 : 09 00             LdaConstant [0]
// store the FixedArray[4] in register r1
0x34423e7ac1a0 @    2 : 1e f9             Star r1
// store zero into the accumulator.
0x34423e7ac1a2 @    4 : 02                LdaZero
// store zero (the contents of the accumulator) into register r2.
0x34423e7ac1a3 @    5 : 1e f8             Star r2
//
0x34423e7ac1a5 @    7 : 1f fe f7          Mov <closure>, r3
0x34423e7ac1a8 @   10 : 53 96 01 f9 03    CallRuntime [DeclareGlobalsForInterpreter], r1-r3
0 E> 0x34423e7ac1ad @   15 : 90                StackCheck
141 S> 0x34423e7ac1ae @   16 : 0a 01 00          LdaGlobal [1], [0]
0x34423e7ac1b1 @   19 : 1e f9             Star r1
141 E> 0x34423e7ac1b3 @   21 : 4f f9 03          CallUndefinedReceiver0 r1, [3]
0x34423e7ac1b6 @   24 : 1e fa             Star r0
148 S> 0x34423e7ac1b8 @   26 : 94                Return

Constant pool (size = 2)
0x34423e7ac149: [FixedArray] in OldSpace
- map = 0x344252182309 <Map(HOLEY_ELEMENTS)>
- length: 2
0: 0x34423e7ac069 <FixedArray[4]>
1: 0x34423e7abf59 <String[4]: beve>

Handler Table (size = 16)

• LdaConstant: Load the constant at the given index from the constant pool into the accumulator.
• Star: Store the contents of the accumulator in register dst.
• Ldar: Load the accumulator with the value from register src.
• LdaGlobal: Load the global with the name in constant pool entry idx into the accumulator, using the given FeedbackVector slot outside of a typeof.
• Mov src, dst: Store the value of register src in register dst.

You can find the declarations for these instructions in src/interpreter/interpreter-generator.cc.

## FeedbackVector

Is attached to every function and is responsible for recording and managing all execution feedback, which is information about types observed during execution. You can find the declaration for this class in src/feedback-vector.h.

## BytecodeGenerator

Is currently the only part of V8 that cares about the AST.

## BytecodeGraphBuilder

Produces high-level IR graph based on interpreter bytecodes.

## TurboFan

Is a compiler backend that gets fed a control flow graph and then does instruction selection, register allocation, and code generation.

### Execution/Runtime

I'm not sure if V8 follows this exactly, but I've heard and read that when the engine comes across a function declaration it only parses and verifies the syntax and saves a ref to the function name. The statements inside the function are not checked at this stage, only the syntax of the function declaration (parentheses, arguments, brackets etc).

### Function methods

The declaration of Function can be found in include/v8.h (just noting this as I've looked for it several times)

### Symbol

The declarations for the Symbol class can be found in v8.h and the internal implementation in src/api/api.cc.

The well known Symbols are generated using macros, so you won't find them just by searching for the static function names like 'GetToPrimitive'.

#define WELL_KNOWN_SYMBOLS(V)                 \
  V(AsyncIterator, async_iterator)            \
  V(HasInstance, has_instance)                \
  V(Iterator, iterator)                       \
  V(Match, match)                             \
  V(Replace, replace)                         \
  V(Search, search)                           \
  V(Split, split)                             \
  V(ToPrimitive, to_primitive)                \
  V(ToStringTag, to_string_tag)               \
  V(Unscopables, unscopables)

#define SYMBOL_GETTER(Name, name)                                   \
  Local<Symbol> v8::Symbol::Get##Name(Isolate* isolate) {           \
    i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate); \
    return Utils::ToLocal(i_isolate->factory()->name##_symbol());   \
  }


So GetToPrimitive would become:

Local<Symbol> v8::Symbol::GetToPrimitive(Isolate* isolate) {
  i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
  return Utils::ToLocal(i_isolate->factory()->to_primitive_symbol());
}


There is an example in symbol-test.cc.

## Builtins

Are JavaScript functions/objects that are provided by V8. These are built using a C++ DSL and are passed through:

CodeStubAssembler -> CodeAssembler -> RawMachineAssembler.


Builtins need to have bytecode generated for them so that they can be run in TurboFan.

src/code-stub-assembler.h

All the builtins are declared in src/builtins/builtins-definitions.h by the BUILTIN_LIST_BASE macro. There are different types of builtins (TF = Turbo Fan):

TFJ: JavaScript linkage, which means it is callable as a JavaScript function.

TFS: CodeStub linkage. A builtin with stub linkage can be used to extract common code into a separate code object which can then be used by multiple callers. This is useful because builtins are generated at compile time and included in the V8 snapshot, which means they are part of every isolate that is created. Being able to share common code between multiple builtins saves space.

TFC: CodeStub linkage with a custom descriptor.

To see how this works in action we first need to disable snapshots. If we don't, we won't be able to set breakpoints, as the heap will be serialized at compile time and deserialized upon startup of V8.

To find the option to disable snapshots use:

$ gn args --list out.gn/learning --short | more
...
v8_use_snapshot=true

$ gn args out.gn/learning
v8_use_snapshot=false

$ gn -C out.gn/learning

After building we should be able to set a break point in bootstrapper.cc and its function Genesis::InitializeGlobal:

(lldb) br s -f bootstrapper.cc -l 2684

Let's take a look at how the JSON object is set up:

Handle<String> name = factory->InternalizeUtf8String("JSON");
Handle<JSObject> json_object = factory->NewJSObject(isolate->object_function(), TENURED);

TENURED means that this object should be allocated directly in the old generation.

JSObject::AddProperty(global, name, json_object, DONT_ENUM);

DONT_ENUM is checked by some builtin functions and if set this object will be ignored by those functions.

SimpleInstallFunction(json_object, "parse", Builtins::kJsonParse, 2, false);

Here we can see that we are installing a function named parse, which takes 2 parameters. You can find the definition in src/builtins/builtins-json.cc. What does SimpleInstallFunction do? Let's take console as an example, which was created using:

Handle<JSObject> console = factory->NewJSObject(cons, TENURED);
JSObject::AddProperty(global, name, console, DONT_ENUM);
SimpleInstallFunction(console, "debug", Builtins::kConsoleDebug, 1, false, NONE);

V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
    Handle<JSObject> base, const char* name, Builtins::Name call, int len,
    bool adapt, PropertyAttributes attrs = DONT_ENUM,
    BuiltinFunctionId id = kInvalidBuiltinFunctionId) {

So we can see that base is our Handle to a JSObject, and name is "debug". Builtins::Name is Builtins::kConsoleDebug. Where is this defined? You can find a macro named CPP in src/builtins/builtins-definitions.h:

CPP(ConsoleDebug)

What does this macro expand to? It is part of the BUILTIN_LIST_BASE macro in builtins-definitions.h. We have to look at where BUILTIN_LIST is used, which we can find in builtins.cc.
In builtins.cc we have an array of BuiltinMetadata which is declared as:

const BuiltinMetadata builtin_metadata[] = {
  BUILTIN_LIST(DECL_CPP, DECL_API, DECL_TFJ, DECL_TFC, DECL_TFS, DECL_TFH, DECL_ASM)
};

#define DECL_CPP(Name, ...) { #Name, Builtins::CPP, \
                              { FUNCTION_ADDR(Builtin_##Name) }},

Which will expand to the creation of a BuiltinMetadata struct entry in the array. The BuiltinMetadata struct looks like this, which might help understand what is going on:

struct BuiltinMetadata {
  const char* name;
  Builtins::Kind kind;
  union {
    Address cpp_entry;       // For CPP and API builtins.
    int8_t parameter_count;  // For TFJ builtins.
  } kind_specific_data;
};

So CPP(ConsoleDebug) will expand to an entry in the array which would look something like this:

{ "ConsoleDebug", Builtins::CPP,
  { reinterpret_cast<v8::internal::Address>(reinterpret_cast<intptr_t>(Builtin_ConsoleDebug)) } },

The third parameter is the creation of the union, which might not be obvious.

Back to the question I'm trying to answer: "Builtins::Name is Builtins::kConsoleDebug. Where is this defined?" For this we have to look at builtins.h and the enum Name:

enum Name : int32_t {
#define DEF_ENUM(Name, ...) k##Name,
  BUILTIN_LIST_ALL(DEF_ENUM)
#undef DEF_ENUM
  builtin_count
};

This will expand to the complete list of builtins in builtins-definitions.h using the DEF_ENUM macro. So the expansion for ConsoleDebug will look like:

enum Name : int32_t {
  ...
  kConsoleDebug,
  ...
};

So, backing up to looking at the arguments to SimpleInstallFunction, which are:

SimpleInstallFunction(console, "debug", Builtins::kConsoleDebug, 1, false, NONE);

V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
    Handle<JSObject> base, const char* name, Builtins::Name call, int len,
    bool adapt, PropertyAttributes attrs = DONT_ENUM,
    BuiltinFunctionId id = kInvalidBuiltinFunctionId) {

We know about Builtins::Name, so let's look at len, which is one. What is this?
SimpleInstallFunction will call:

Handle<JSFunction> fun =
    SimpleCreateFunction(base->GetIsolate(), function_name, call, len, adapt);

len would be used if adapt was true, but it is false in our case. This is what it would be used for if adapt was true:

fun->shared()->set_internal_formal_parameter_count(len);

I'm not exactly sure what adapt is referring to here.

PropertyAttributes is not specified, so it will get the default value of DONT_ENUM. The last parameter, which is of type BuiltinFunctionId, is not specified either, so the default value of kInvalidBuiltinFunctionId will be used. This is an enum defined in src/objects/objects.h.

This blog provides an example of adding a function to the String object.

$ out.gn/learning/mksnapshot --print-code > output


You can then see the generated code from this. This will produce a code stub that can be called through C++. Let's update this to have it be called from JavaScript:

Update builtins/builtins-string-get.cc :

TF_BUILTIN(GetStringLength, StringBuiltinsAssembler) {
  // Load the receiver and return its length as a Smi.
  Node* const str = Parameter(Descriptor::kReceiver);
  Return(LoadStringLengthAsSmi(str));
}


We also have to update builtins/builtins-definitions.h:

TFJ(GetStringLength, 0)


And bootstrapper.cc:

SimpleInstallFunction(prototype, "len", Builtins::kGetStringLength, 0, true);


If you now build using 'ninja -C out.gn/learning_v8' you should be able to run d8 and try this out:

d8> const s = 'testing'
undefined
d8> s.len()
7


Now let's take a closer look at the code that is generated for this:

$ out.gn/learning/mksnapshot --print-code > output

Looking at the output generated I was surprised to see two entries for GetStringLength (I changed the name just to make sure there was not something else generating the second one). Why two?

The following uses Intel assembly syntax, which means there are no register/immediate prefixes, and the first operand is the destination and the second operand the source.

--- Code ---
kind = BUILTIN
name = BeveStringLength
compiler = turbofan
Instructions (size = 136)
0x1fafde09b3a0   0  55                    push rbp
0x1fafde09b3a1   1  4889e5                REX.W movq rbp,rsp          // move rsp into rbp
0x1fafde09b3a4   4  56                    push rsi                    // push the value of rsi (first parameter) onto the stack
0x1fafde09b3a5   5  57                    push rdi                    // push the value of rdi (second parameter) onto the stack
0x1fafde09b3a6   6  50                    push rax                    // push the value of rax (accumulator) onto the stack
0x1fafde09b3a7   7  4883ec08              REX.W subq rsp,0x8          // make room for an 8 byte value on the stack
0x1fafde09b3ab   b  488b4510              REX.W movq rax,[rbp+0x10]   // move the value at rbp + 0x10 to rax
0x1fafde09b3af   f  488b58ff              REX.W movq rbx,[rax-0x1]
0x1fafde09b3b3  13  807b0b80              cmpb [rbx+0xb],0x80         // IsString(object): compare byte
0x1fafde09b3b7  17  0f8350000000          jnc 0x1fafde09b40d <+0x6d>  // jump if carry flag was not set
0x1fafde09b3bd  1d  488b400f              REX.W movq rax,[rax+0xf]
0x1fafde09b3c1  21  4989e2                REX.W movq r10,rsp
0x1fafde09b3c4  24  4883ec08              REX.W subq rsp,0x8
0x1fafde09b3c8  28  4883e4f0              REX.W andq rsp,0xf0
0x1fafde09b3cc  2c  4c891424              REX.W movq [rsp],r10
0x1fafde09b3d0  30  488945e0              REX.W movq [rbp-0x20],rax
0x1fafde09b3d4  34  48be0000000001000000  REX.W movq rsi,0x100000000
0x1fafde09b3de  3e  48bad9c228dfa8090000  REX.W movq rdx,0x9a8df28c2d9  ;; object: 0x9a8df28c2d9 <String[101]: CAST(LoadObjectField(object, offset, MachineTypeOf<T>::value)) at ../../src/code-stub-assembler.h:432>
0x1fafde09b3e8  48  488bf8                REX.W movq rdi,rax
0x1fafde09b3eb  4b  48b830726d0a01000000  REX.W movq rax,0x10a6d7230    ;; external reference (check_object_type)
0x1fafde09b3f5  55  40f6c40f              testb rsp,0xf
0x1fafde09b3f9  59  7401                  jz 0x1fafde09b3fc <+0x5c>
0x1fafde09b3fb  5b  cc                    int3l
0x1fafde09b3fc  5c  ffd0                  call rax
0x1fafde09b3fe  5e  488b2424              REX.W movq rsp,[rsp]
0x1fafde09b402  62  488b45e0              REX.W movq rax,[rbp-0x20]
0x1fafde09b406  66  488be5                REX.W movq rsp,rbp
0x1fafde09b409  69  5d                    pop rbp
0x1fafde09b40a  6a  c20800                ret 0x8
// this is where we jump to if IsString failed
0x1fafde09b40d  6d  48ba71c228dfa8090000  REX.W movq rdx,0x9a8df28c271  ;; object: 0x9a8df28c271 <String[76]: CSA_ASSERT failed: IsString(object) [../../src/code-stub-assembler.cc:1498]>
0x1fafde09b417  77  e8e4d1feff            call 0x1fafde088600           ;; code: BUILTIN
0x1fafde09b41c  7c  cc                    int3l
0x1fafde09b41d  7d  cc                    int3l
0x1fafde09b41e  7e  90                    nop
0x1fafde09b41f  7f  90                    nop
Safepoints (size = 8)
RelocInfo (size = 7)
0x1fafde09b3e0  embedded object (0x9a8df28c2d9 <String[101]: CAST(LoadObjectField(object, offset, MachineTypeOf<T>::value)) at ../../src/code-stub-assembler.h:432>)
0x1fafde09b3ed  external reference (check_object_type) (0x10a6d7230)
0x1fafde09b40f  embedded object (0x9a8df28c271 <String[76]: CSA_ASSERT failed: IsString(object) [../../src/code-stub-assembler.cc:1498]>)
0x1fafde09b418  code target (BUILTIN) (0x1fafde088600)
--- End code ---

### TF_BUILTIN macro

This is a macro for defining Turbofan (TF) builtins and can be found in builtins/builtins-utils-gen.h.

Take a look at the file src/builtins/builtins-bigint-gen.cc and the following function:

TF_BUILTIN(BigIntToI64, CodeStubAssembler) {
  if (!Is64()) {
    Unreachable();
    return;
  }
  TNode<Object> value = CAST(Parameter(Descriptor::kArgument));
  TNode<Context> context = CAST(Parameter(Descriptor::kContext));
  TNode<BigInt> n = ToBigInt(context, value);

  TVARIABLE(UintPtrT, var_low);
  TVARIABLE(UintPtrT, var_high);

  BigIntToRawBytes(n, &var_low, &var_high);
  Return(var_low.value());
}

Let's see what this will be expanded to after the macro has been processed:

$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot \
    -isystem=./buildtools/third_party/libc++/trunk/include \
    -isystem=buildtools/third_party/libc++/trunk/include \
    -I. -E src/builtins/builtins-bigint-gen.cc > builtins-bigint-gen.cc.pp

static void Generate_BigIntToI64(compiler::CodeAssemblerState* state);

class BigIntToI64Assembler : public CodeStubAssembler {
public:
using Descriptor = Builtin_BigIntToI64_InterfaceDescriptor;
explicit BigIntToI64Assembler(compiler::CodeAssemblerState* state) : CodeStubAssembler(state) {}
void GenerateBigIntToI64Impl();
Node* Parameter(Descriptor::ParameterIndices index) {
return CodeAssembler::Parameter(static_cast<int>(index));
}
};

void Builtins::Generate_BigIntToI64(compiler::CodeAssemblerState* state) {
BigIntToI64Assembler assembler(state);
state->SetInitialDebugInformation("BigIntToI64", "src/builtins/builtins-bigint-gen.cc", 14);
if (Builtins::KindOf(Builtins::kBigIntToI64) == Builtins::TFJ) {
assembler.PerformStackCheck(assembler.GetJSContextParameter());
}
assembler.GenerateBigIntToI64Impl();
}
void BigIntToI64Assembler::GenerateBigIntToI64Impl() {
if (!Is64()) {
Unreachable();
return;
}

TNode<Object> value = Cast(Parameter(Descriptor::kArgument));
TNode<Context> context = Cast(Parameter(Descriptor::kContext));
TNode<BigInt> n = ToBigInt(context, value);

TVariable<UintPtrT> var_low(this);
TVariable<UintPtrT> var_high(this);

BigIntToRawBytes(n, &var_low, &var_high);
Return(var_low.value());
}


From the resulting class you can see how Parameter can be used from within TF_BUILTIN macro.
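The two macro techniques used in this section — a list macro that generates both an enum and a metadata array (BUILTIN_LIST with DEF_ENUM/DECL_CPP), and a macro whose expansion ends in a function signature so that the brace-block after the invocation becomes the function body (TF_BUILTIN) — can be seen in a small standalone sketch. All names here (DEMO_BUILTIN_LIST, DemoName, DEMO_TF_BUILTIN, etc.) are hypothetical; this mirrors the pattern only, it is not V8 code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 1) A BUILTIN_LIST-style "list macro": it takes another macro V and
//    applies it to every entry, so a single list can generate both an
//    enum (k##Name token-pasting, like DEF_ENUM) and a metadata array
//    (#Name stringification, like DECL_CPP).
#define DEMO_BUILTIN_LIST(V) \
  V(ConsoleDebug)            \
  V(GetStringLength)

#define DEMO_DEF_ENUM(Name) k##Name,
enum DemoName : int32_t { DEMO_BUILTIN_LIST(DEMO_DEF_ENUM) demo_builtin_count };
#undef DEMO_DEF_ENUM

struct DemoMetadata { const char* name; };
#define DEMO_DECL(Name) { #Name },
const DemoMetadata demo_metadata[] = { DEMO_BUILTIN_LIST(DEMO_DECL) };
#undef DEMO_DECL

// 2) The TF_BUILTIN trick: the macro expansion ends with the start of a
//    member function definition, so the `{ ... }` block written after the
//    macro invocation becomes the body of Generate##Name##Impl.
#define DEMO_TF_BUILTIN(Name)    \
  class Name##Assembler {        \
   public:                       \
    int Generate##Name##Impl();  \
  };                             \
  int Name##Assembler::Generate##Name##Impl()

DEMO_TF_BUILTIN(BigIntToI64) {
  return 64;  // the user-supplied body, like the CSA code in TF_BUILTIN
}
```

Running the preprocessor over such a file (clang++ -E, as shown above for builtins-bigint-gen.cc) is the easiest way to convince yourself of what the expansion looks like.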

## Building V8

You'll need to have checked out the Google V8 sources to your local file system and built them by following the instructions found here.

### Configure v8 build for learning-v8

There is a make target that can generate a build configuration for V8 that is specific to this project. It can be run using the following command:

$ make configure_v8

Then to compile this configuration:

$ make compile_v8


### gclient sync

$ gclient sync

#### Troubleshooting build

/v8_src/v8/out/x64.release/obj/libv8_monolith.a(eh-frame.o):eh-frame.cc:function v8::internal::EhFrameWriter::WriteEmptyEhFrame(std::__1::basic_ostream<char, std::__1::char_traits<char> >&): error: undefined reference to 'std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

-stdlib=libc++ is LLVM's C++ runtime, and this runtime has a __1 namespace. It looks like the static library above was compiled with clang/LLVM's libc++, as we are seeing the __1 namespace. -stdlib=libstdc++ is GNU's C++ runtime.

So we can see that the namespace std::__1 is used, which we now know is the namespace of libc++, clang's C++ standard library. I guess we could go about this in two ways: either change the V8 build to use libstdc++ when compiling, so that the symbols are correct when we want to link against it, or update our linker (ld) invocation to use libc++. We need to include the correct libraries to link with during linking, which means specifying:

-stdlib=libc++ -Wl,-L$(v8_build_dir)


If we look in $(v8_build_dir) we find libc++.so. We also need this library to be found at runtime by the dynamic linker, using LD_LIBRARY_PATH:

$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release/ ./hello-world


Notice that this is using ld from our path. We can tell clang to use a different search path with the -B option:

$ clang++ --help | grep -- '-B'
  -B <dir>  Add <dir> to search path for binaries and object files used implicitly

libgcc_s is GCC's low-level runtime library. I've been confusing it with GNU's C++ standard libraries for some reason, but they are not the same.

Running cctest:

$ out.gn/learning/cctest test-heap-profiler/HeapSnapshotRetainedObjectInfo


To get a list of the available tests:

$ out.gn/learning/cctest --list

Checking formatting/linting:

$ git cl format


You can then git diff and see the changes.

Running pre-submit checks:

$ git cl presubmit

Then upload using:

$ git cl upload


#### Build details

So when we run gn it will generate a Ninja build file. GN itself is written in C++ but has a python wrapper around it.

A group in gn is just a collection of other targets which enables them to have a name.

So when we run gn there will be a number of .ninja files generated. If we look in the root of the output directory we find two .ninja files:

build.ninja
toolchain.ninja


By default ninja will look for build.ninja and when we run ninja we usually specify the -C out/dir. If no targets are specified on the command line ninja will execute all outputs unless there is one specified as default. V8 has the following default target:

default all

build all: phony $./bytecode_builtins_list_generator$
./d8 $obj/fuzzer_support.stamp$
./gen-regexp-special-case $obj/generate_bytecode_builtins_list.stamp$
obj/gn_all.stamp $obj/json_fuzzer.stamp$
obj/lib_wasm_fuzzer_common.stamp $./mksnapshot$
obj/multi_return_fuzzer.stamp $obj/parser_fuzzer.stamp$
obj/postmortem-metadata.stamp $obj/regexp_builtins_fuzzer.stamp$
obj/regexp_fuzzer.stamp $obj/run_gen-regexp-special-case.stamp$
obj/run_mksnapshot_default.stamp $obj/run_torque.stamp$
./torque $./torque-language-server$
obj/torque_base.stamp $obj/torque_generated_definitions.stamp$
obj/torque_generated_initializers.stamp $obj/torque_ls_base.stamp$
./libv8.so.TOC $obj/v8_archive.stamp$
...


A phony rule can be used to create an alias for other targets. The $ in ninja is an escape character, so in the case of the all target it escapes the newline, like using \ in a shell script.

Let's take a look at bytecode_builtins_list_generator:

build$:bytecode_builtins_list_generator: phony ./bytecode_builtins_list_generator


The format of the ninja build statement is:

build outputs: rulename inputs


We are again seeing the $ ninja escape character, but this time it is escaping the colon, which would otherwise be interpreted as separating file names. The output in this case is bytecode_builtins_list_generator.

The default target_out_dir in this case is //out/x64.release_gcc/obj. The executable target in BUILD.gn which generates this does not specify any output directory, so I'm assuming the generated .ninja file is placed in the target_out_dir, which is where we can find bytecode_builtins_list_generator.ninja. This file has a label named:

label_name = bytecode_builtins_list_generator

Notice that in build.ninja there is the following command:

subninja toolchain.ninja

And in toolchain.ninja we have:

subninja obj/bytecode_builtins_list_generator.ninja

This is what makes ./bytecode_builtins_list_generator available.

$ ninja -C out/x64.release_gcc/ -t targets all | grep bytecode_builtins_list_generator
$ rm out/x64.release_gcc/bytecode_builtins_list_generator
$ ninja -C out/x64.release_gcc/ bytecode_builtins_list_generator
ninja: Entering directory `out/x64.release_gcc/'


Alright, so I'd like to understand when in the process torque is run to generate classes like TorqueGeneratedStruct:

class Struct : public TorqueGeneratedStruct<Struct, HeapObject> {

./torque $./torque-language-server$
obj/torque_base.stamp $obj/torque_generated_definitions.stamp$
obj/torque_generated_initializers.stamp $obj/torque_ls_base.stamp$


Like before we can find that obj/torque.ninja in included by the subninja command in toolchain.ninja:

subninja obj/torque.ninja


So this is building the executable torque, but it has not been run yet.

$ gn ls out/x64.release_gcc/ --type=action
//:generate_bytecode_builtins_list
//:postmortem-metadata
//:run_gen-regexp-special-case
//:run_mksnapshot_default
//:run_torque
//:v8_dump_build_config
//src/inspector:protocol_compatibility
//src/inspector:protocol_generated_sources
//tools/debug_helper:gen_heap_constants
//tools/debug_helper:run_mkgrokdump

Notice the run_torque target:

$ gn desc out/x64.release_gcc/ //:run_torque


If we look in toolchain.ninja we have a rule named ___run_torque___build_toolchain_linux_x64__rule

command = python ../../tools/run.py ./torque -o gen/torque-generated -v8-root ../..
src/builtins/array-copywithin.tq
src/builtins/array-every.tq
src/builtins/array-filter.tq
src/builtins/array-find.tq
...


And there is a build statement that lists the .h and .cc files in gen/torque-generated as outputs of this rule, so they will be regenerated if the inputs change.

## Building chromium

When making changes to V8 you might need to verify that your changes have not broken anything in Chromium.

Generate Your Projects (gyp): You'll have to run this once before building:

$ gclient sync
$ gclient runhooks


#### Update the code base

$ git fetch origin master
$ git co master
$ git merge origin/master

### Building using GN

$ gn args out.gn/learning


### Building using Ninja

$ ninja -C out.gn/learning

Building the tests:

$ ninja -C out.gn/learning chrome/test:unit_tests


An error I got when building the first time:

Traceback (most recent call last):
File "./gyp-mac-tool", line 713, in <module>
sys.exit(main(sys.argv[1:]))
File "./gyp-mac-tool", line 29, in main
exit_code = executor.Dispatch(args)
File "./gyp-mac-tool", line 44, in Dispatch
return getattr(self, method)(*args[1:])
File "./gyp-mac-tool", line 68, in ExecCopyBundleResource
self._CopyStringsFile(source, dest)
File "./gyp-mac-tool", line 134, in _CopyStringsFile
import CoreFoundation
ImportError: No module named CoreFoundation
[6644/20987] ACTION base_nacl: build newlib plib_9b4f41e4158ebb93a5d28e6734a13e85
ninja: build stopped: subcommand failed.


I was able to get around this by:

$ pip install -U pyobjc

#### Using a specific version of V8

The instructions below work, but it is also possible to create a soft link from chromium/src/v8 to a local v8 repository and then build/test.

So, we want to include our updated version of V8 so that we can verify that it builds correctly with our change to V8. While I'm not sure this is the proper way to do it, I was able to update DEPS in src (chromium) and set the v8 entry to git@github.com:danbev/v8.git@064718a8921608eaf9b5eadbb7d734ec04068a87:

"git@github.com:danbev/v8.git@064718a8921608eaf9b5eadbb7d734ec04068a87"

You'll have to run gclient sync after this.

Another way is to not update the DEPS file, which is a version controlled file, but instead update .gclientrc and add a custom_deps entry:

solutions = [{u'managed': False, u'name': u'src',
  u'url': u'https://chromium.googlesource.com/chromium/src.git',
  u'custom_deps': {
    "src/v8": "git@github.com:danbev/v8.git@27a666f9be7ca3959c7372bdeeee14aef2a4b7ba"
  },
  u'deps_file': u'.DEPS.git', u'safesync_url': u''}]

## Building pdfium

You may have to compile this project (in addition to chromium) to verify that changes in v8 are not breaking code in pdfium.

### Create/clone the project

$ mkdir pdfium_repo
$ gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git
$ gclient sync
$ cd pdfium

### Building

$ ninja -C out/Default


#### Using a branch of v8

You should be able to update the .gclient file adding a custom_deps entry:

solutions = [
{
"name"        : "pdfium",
"deps_file"   : "DEPS",
"managed"     : False,
"custom_deps" : {
},
},


]
cache_dir = None

You'll have to run gclient sync after this too.

## Code in this repo

#### hello-world

hello-world is heavily commented and show the usage of a static int being exposed and accessed from JavaScript.

#### instances

instances shows the usage of creating new instances of a C++ class from JavaScript.

#### run-script

run-script is basically the same as instances, but reads an external file, script.js, and runs the script.

#### tests

The test directory contains unit tests for individual classes/concepts in V8 to help understand them.

$ make

## Running

$ ./hello-world


## Cleaning

$ make clean

## Contributing a change to V8

1. Create a working branch using git new-branch name
2. git cl upload

See Google's contributing-code for more details.

### Find the current issue number

$ git cl issue


## Debugging

$ lldb hello-world
(lldb) br s -f hello-world.cc -l 27

There are a number of useful functions in src/objects-printer.cc which can also be used in lldb.

#### Print value of a Local object

(lldb) print _v8_internal_Print_Object(*(v8::internal::Object**)(*init_fn))

#### Print stacktrace

(lldb) p _v8_internal_Print_StackTrace()

#### Creating command aliases in lldb

Create a file named .lldbinit (in your project directory or home directory). This file can now be found in v8's tools directory.

### Using d8

This is the source used for the following examples:

$ cat class.js
function Person(name, age) {
this.name = name;
this.age = age;
}

print("before");
const p = new Person("Daniel", 41);
print(p.name);
print(p.age);
print("after");


### V8_shell startup

What happens when the v8_shell is run?

$ lldb -- out/x64.debug/d8 --enable-inspector class.js
(lldb) breakpoint set --file d8.cc --line 2662
Breakpoint 1: where = d8`v8::Shell::Main(int, char**) + 96 at d8.cc:2662, address = 0x0000000100015150

First v8::base::debug::EnableInProcessStackDumping() is called, followed by some Windows-specific code guarded by macros. Next, all the options are set using v8::Shell::SetOptions.

SetOptions will call v8::V8::SetFlagsFromCommandLine, which is found in src/api.cc:

i::FlagList::SetFlagsFromCommandLine(argc, argv, remove_flags);

This function can be found in src/flags.cc. The flags themselves are defined in src/flag-definitions.h.

Next a new SourceGroup array is created:

options.isolate_sources = new SourceGroup[options.num_isolates];
SourceGroup* current = options.isolate_sources;
current->Begin(argv, 1);
for (int i = 1; i < argc; i++) {
  const char* str = argv[i];

(lldb) p str
(const char *) $6 = 0x00007fff5fbfed4d "manual.js"


There are then checks performed to see if the args is --isolate or --module, or -e and if not (like in our case)

} else if (strncmp(str, "-", 1) != 0) {
// Not a flag, so it must be a script to execute.
options.script_executed = true;


TODO: I'm not exactly sure what SourceGroups are about but just noting this and will revisit later.
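The argument classification in the snippet above — anything not starting with '-' is treated as a script to execute — can be sketched standalone. IsScriptArgument is a hypothetical helper name, not d8's actual function; it only mirrors the strncmp check:

```cpp
#include <cassert>
#include <cstring>

// Mirrors d8's check: a leading '-' marks a flag; anything else is
// assumed to be a script file to execute.
bool IsScriptArgument(const char* str) {
  return strncmp(str, "-", 1) != 0;
}
```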

This will take us back into Shell::Main in src/d8.cc:

::V8::InitializeICUDefaultLocation(argv[0], options.icu_data_file);

(lldb) p argv[0]
(char *) $8 = 0x00007fff5fbfed48 "./d8"  See ICU a little more details. Next the default V8 platform is initialized: g_platform = i::FLAG_verify_predictable ? new PredictablePlatform() : v8::platform::CreateDefaultPlatform();  v8::platform::CreateDefaultPlatform() will be called in our case. We are then back in Main and have the following lines: 2685 v8::V8::InitializePlatform(g_platform); 2686 v8::V8::Initialize();  This is very similar to what I've seen in the Node.js startup process. We did not specify any natives_blob or snapshot_blob as an option on the command line so the defaults will be used: v8::V8::InitializeExternalStartupData(argv[0]);  back in src/d8.cc line 2918: Isolate* isolate = Isolate::New(create_params);  this call will bring us into api.cc line 8185:  i::Isolate* isolate = new i::Isolate(false);  So, we are invoking the Isolate constructor (in src/isolate.cc). isolate->set_snapshot_blob(i::Snapshot::DefaultSnapshotBlob());  api.cc: isolate->Init(NULL); compilation_cache_ = new CompilationCache(this); context_slot_cache_ = new ContextSlotCache(); descriptor_lookup_cache_ = new DescriptorLookupCache(); unicode_cache_ = new UnicodeCache(); inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this); global_handles_ = new GlobalHandles(this); eternal_handles_ = new EternalHandles(); bootstrapper_ = new Bootstrapper(this); handle_scope_implementer_ = new HandleScopeImplementer(this); load_stub_cache_ = new StubCache(this, Code::LOAD_IC); store_stub_cache_ = new StubCache(this, Code::STORE_IC); materialized_object_store_ = new MaterializedObjectStore(this); regexp_stack_ = new RegExpStack(); regexp_stack_->isolate_ = this; date_cache_ = new DateCache(); call_descriptor_data_ = new CallInterfaceDescriptorData[CallDescriptors::NUMBER_OF_DESCRIPTORS]; access_compiler_data_ = new AccessCompilerData(); cpu_profiler_ = new CpuProfiler(this); heap_profiler_ = new HeapProfiler(heap()); interpreter_ = new interpreter::Interpreter(this); 
compiler_dispatcher_ = new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);  src/builtins/builtins.cc, this is where the builtins are defined. TODO: sort out what these macros do. In src/v8.cc we have a couple of checks for if the options passed are for a stress_run but since we did not pass in any such flags this code path will be followed which will call RunMain: result = RunMain(isolate, argc, argv, last_run);  this will end up calling: options.isolate_sources[0].Execute(isolate);  Which will call SourceGroup::Execute(Isolate* isolate) // Use all other arguments as names of files to load and run. HandleScope handle_scope(isolate); Local<String> file_name = String::NewFromUtf8(isolate, arg, NewStringType::kNormal).ToLocalChecked(); Local<String> source = ReadFile(isolate, arg); if (source.IsEmpty()) { printf("Error reading '%s'\n", arg); Shell::Exit(1); } Shell::options.script_executed = true; if (!Shell::ExecuteString(isolate, source, file_name, false, true)) { exception_was_thrown = true; break; } ScriptOrigin origin(name); if (compile_options == ScriptCompiler::kNoCompileOptions) { ScriptCompiler::Source script_source(source, origin); return ScriptCompiler::Compile(context, &script_source, compile_options); }  Which will delegate to ScriptCompiler(Local, Source* source, CompileOptions options): auto maybe = CompileUnboundInternal(isolate, source, options);  CompileUnboundInternal result = i::Compiler::GetSharedFunctionInfoForScript( str, name_obj, line_offset, column_offset, source->resource_options, source_map_url, isolate->native_context(), NULL, &script_data, options, i::NOT_NATIVES_CODE);  src/compiler.cc // Compile the function and add it to the cache. 
ParseInfo parse_info(script); Zone compile_zone(isolate->allocator(), ZONE_NAME); CompilationInfo info(&compile_zone, &parse_info, Handle<JSFunction>::null());  Back in src/compiler.cc-info.cc: result = CompileToplevel(&info); (lldb) job *result 0x17df0df309f1: [SharedFunctionInfo] - name = 0x1a7f12d82471 <String[0]: > - formal_parameter_count = 0 - expected_nof_properties = 10 - ast_node_count = 23 - instance class name = #Object - code = 0x1d8484d3661 <Code: BUILTIN> - source code = function bajja(a, b, c) { var d = c - 100; return a + d * b; } var result = bajja(2, 2, 150); print(result); - anonymous expression - function token position = -1 - start position = 0 - end position = 114 - no debug info - length = 0 - optimized_code_map = 0x1a7f12d82241 <FixedArray[0]> - feedback_metadata = 0x17df0df30d09: [FeedbackMetadata] - length: 3 - slot_count: 11 Slot #0 LOAD_GLOBAL_NOT_INSIDE_TYPEOF_IC Slot #2 kCreateClosure Slot #3 LOAD_GLOBAL_NOT_INSIDE_TYPEOF_IC Slot #5 CALL_IC Slot #7 CALL_IC Slot #9 LOAD_GLOBAL_NOT_INSIDE_TYPEOF_IC - bytecode_array = 0x17df0df30c61  Back in d8.cc: maybe_result = script->Run(realm);  src/api.cc auto fun = i::Handle<i::JSFunction>::cast(Utils::OpenHandle(this)); (lldb) job *fun 0x17df0df30e01: [Function] - map = 0x19cfe0003859 [FastProperties] - prototype = 0x17df0df043b1 - elements = 0x1a7f12d82241 <FixedArray[0]> [FAST_HOLEY_ELEMENTS] - initial_map = - shared_info = 0x17df0df309f1 <SharedFunctionInfo> - name = 0x1a7f12d82471 <String[0]: > - formal_parameter_count = 0 - context = 0x17df0df03bf9 <FixedArray[245]> - feedback vector cell = 0x17df0df30ed1 Cell for 0x17df0df30e49 <FixedArray[13]> - code = 0x1d8484d3661 <Code: BUILTIN> - properties = 0x1a7f12d82241 <FixedArray[0]> { #length: 0x2c35a5718089 <AccessorInfo> (const accessor descriptor) #name: 0x2c35a57180f9 <AccessorInfo> (const accessor descriptor) #arguments: 0x2c35a5718169 <AccessorInfo> (const accessor descriptor) #caller: 0x2c35a57181d9 <AccessorInfo> (const accessor 
descriptor) #prototype: 0x2c35a5718249 <AccessorInfo> (const accessor descriptor) }

i::Handle<i::Object> receiver = isolate->global_proxy();
Local<Value> result;
has_pending_exception = !ToLocal<Value>(
    i::Execution::Call(isolate, fun, receiver, 0, nullptr), &result);

src/execution.cc

### Zone

Taken directly from src/zone/zone.h:

// The Zone supports very fast allocation of small chunks of
// memory. The chunks cannot be deallocated individually, but instead
// the Zone supports deallocating all chunks in one fast
// operation. The Zone is used to hold temporary data structures like
// the abstract syntax tree, which is deallocated after compilation.

### V8 flags

$ ./d8 --help
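The Zone comment quoted above describes a classic arena (bump) allocator: allocation is a pointer increment, individual objects are never freed, and the whole zone is released in one operation. A minimal standalone sketch of that idea — DemoZone is hypothetical and far simpler than V8's Zone (no alignment handling, and it assumes allocations fit in one chunk):

```cpp
#include <cstddef>
#include <vector>

// Minimal arena in the spirit of V8's Zone: fast pointer-bump
// allocation, no per-object deallocation, everything freed at once.
class DemoZone {
 public:
  static constexpr size_t kChunkSize = 4096;

  // Allocate `size` bytes by bumping a pointer; grab a new chunk when
  // the current one is exhausted (assumes size <= kChunkSize).
  void* New(size_t size) {
    if (position_ == nullptr ||
        size > static_cast<size_t>(limit_ - position_)) {
      chunks_.push_back(new char[kChunkSize]);
      position_ = chunks_.back();
      limit_ = position_ + kChunkSize;
    }
    void* result = position_;
    position_ += size;
    return result;
  }

  // One fast operation releases every chunk, like Zone's DeleteAll.
  ~DemoZone() {
    for (char* chunk : chunks_) delete[] chunk;
  }

 private:
  std::vector<char*> chunks_;
  char* position_ = nullptr;
  char* limit_ = nullptr;
};
```

This is why the Zone is a good fit for the AST: many small nodes allocated during compilation, all discarded together when compilation finishes.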


### d8

(lldb) br s -f d8.cc -l 2935

return v8::Shell::Main(argc, argv);

api.cc:6112
natives-external.cc


### v8::String::NewFromOneByte

So I was a little confused when I first read this function name and thought it had something to do with the length of the string. But the byte is the type of the chars that make up the string. For example, a one byte char would be reinterpreted as uint8_t:

const char* data

reinterpret_cast<const uint8_t*>(data)
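The point can be shown in a tiny sketch (AsOneByte is a hypothetical helper, not a V8 function): reinterpreting the char data as uint8_t does not copy or change any bytes, it only changes the type the characters are viewed through.

```cpp
#include <cassert>
#include <cstdint>

// "One byte" refers to the character type, not the string length:
// the same bytes are simply viewed as uint8_t values. No copy is made.
const uint8_t* AsOneByte(const char* data) {
  return reinterpret_cast<const uint8_t*>(data);
}
```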


• gdbinit has been updated. Check if there is something that should be ported to lldbinit

### Invocation walkthrough

This section will go through calling a Script to understand what happens in V8.

I'll be using run-scripts.cc as the example for this.

$lldb -- ./run-scripts (lldb) br s -n main  I'll step through until the following call: script->Run(context).ToLocalChecked();  So, Script::Run is defined in api.cc First things that happens in this function is a macro: PREPARE_FOR_EXECUTION_WITH_CONTEXT_IN_RUNTIME_CALL_STATS_SCOPE( "v8", "V8.Execute", context, Script, Run, MaybeLocal<Value>(), InternalEscapableScope, true); TRACE_EVENT_CALL_STATS_SCOPED(isolate, category, name); PREPARE_FOR_EXECUTION_GENERIC(isolate, context, class_name, function_name, \ bailout_value, HandleScopeClass, do_callback);  So, what does the preprocessor replace this with then: auto isolate = context.IsEmpty() ? i::Isolate::Current() : reinterpret_cast<i::Isolate*>(context->GetIsolate());  I'm skipping TRACE_EVENT_CALL_STATS_SCOPED for now. PREPARE_FOR_EXECUTION_GENERIC will be replaced with: if (IsExecutionTerminatingCheck(isolate)) { \ return bailout_value; \ } \ HandleScopeClass handle_scope(isolate); \ CallDepthScope<do_callback> call_depth_scope(isolate, context); \ LOG_API(isolate, class_name, function_name); \ ENTER_V8_DO_NOT_USE(isolate); \ bool has_pending_exception = false auto fun = i::Handle<i::JSFunction>::cast(Utils::OpenHandle(this)); (lldb) job *fun 0x33826912c021: [Function] - map = 0x1d0656c03599 [FastProperties] - prototype = 0x338269102e69 - elements = 0x35190d902241 <FixedArray[0]> [FAST_HOLEY_ELEMENTS] - initial_map = - shared_info = 0x33826912bc11 <SharedFunctionInfo> - name = 0x35190d902471 <String[0]: > - formal_parameter_count = 0 - context = 0x338269102611 <FixedArray[265]> - feedback vector cell = 0x33826912c139 <Cell value= 0x33826912c069 <FixedArray[24]>> - code = 0x1319e25fcf21 <Code BUILTIN> - properties = 0x35190d902241 <FixedArray[0]> { #length: 0x2e9d97ce68b1 <AccessorInfo> (const accessor descriptor) #name: 0x2e9d97ce6921 <AccessorInfo> (const accessor descriptor) #arguments: 0x2e9d97ce6991 <AccessorInfo> (const accessor descriptor) #caller: 0x2e9d97ce6a01 <AccessorInfo> (const accessor descriptor) 
#prototype: 0x2e9d97ce6a71 <AccessorInfo> (const accessor descriptor) }  The code for i::JSFunction is generated in src/api.h. Lets take a closer look at this. #define DECLARE_OPEN_HANDLE(From, To) \ static inline v8::internal::Handle<v8::internal::To> \ OpenHandle(const From* that, bool allow_empty_handle = false); OPEN_HANDLE_LIST(DECLARE_OPEN_HANDLE)  OPEN_HANDLE_LIST looks like this: #define OPEN_HANDLE_LIST(V) \ .... V(Script, JSFunction) \  So lets expand this for JSFunction and it should become:  static inline v8::internal::Handle<v8::internal::JSFunction> \ OpenHandle(const Script* that, bool allow_empty_handle = false);  So there will be an function named OpenHandle that will take a const pointer to Script. A little further down in src/api.h there is another macro which looks like this: OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)  MAKE_OPEN_HANDLE:  #define MAKE_OPEN_HANDLE(From, To) v8::internal::Handle<v8::internal::To> Utils::OpenHandle( const v8::From* that, bool allow_empty_handle) { return v8::internal::Handle<v8::internal::To>( reinterpret_cast<v8::internal::Address*>(const_cast<v8::From*>(that))); }  And remember that JSFunction is included in the OPEN_HANDLE_LIST so there will be the following in the source after the preprocessor has processed this header: A concrete example would look like this: v8::internal::Handle<v8::internal::JSFunction> Utils::OpenHandle( const v8::Script* that, bool allow_empty_handle) { return v8::internal::Handle<v8::internal::JSFunction>( reinterpret_cast<v8::internal::Address*>(const_cast<v8::Script*>(that))); }  You can inspect the output of the preprocessor using: $ clang++ -I./out/x64.release/gen -I. -I./include -E src/api/api-inl.h > api-inl.output


So where is JSFunction declared? It is defined in objects.h

## Ignition interpreter

User JavaScript also needs to have bytecode generated for it, and it also uses the C++ DSL and the CodeStubAssembler -> CodeAssembler -> RawMachineAssembler layers, just like builtins.

## C++ Domain Specific Language (DSL)

#### Build failure

After rebasing I've seen the following issue:

$ ninja -C out/Debug chrome
ninja: Entering directory `out/Debug'
ninja: error: '../../chrome/renderer/resources/plugins/plugin_delay.html', needed by 'gen/chrome/grit/renderer_resources.h', missing and no known rule to make it

The "solution" was to remove the out directory and rebuild.

### Tasks

To find a suitable task you can use label:HelpWanted at bugs.chromium.org.

### OpenHandle

What does this call do?

Utils::OpenHandle(*(source->source_string));

OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)

This is a macro defined in src/api.h:

#define MAKE_OPEN_HANDLE(From, To) \
  v8::internal::Handle<v8::internal::To> Utils::OpenHandle( \
      const v8::From* that, bool allow_empty_handle) { \
    DCHECK(allow_empty_handle || that != NULL); \
    DCHECK(that == NULL || \
           (*reinterpret_cast<v8::internal::Object* const*>(that))->Is##To()); \
    return v8::internal::Handle<v8::internal::To>( \
        reinterpret_cast<v8::internal::To**>(const_cast<v8::From*>(that))); \
  }

OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)

If we take a closer look at the macro, it should expand to something like this in our case:

v8::internal::Handle<v8::internal::String> Utils::OpenHandle(
    const v8::String* that, bool allow_empty_handle) {
  DCHECK(allow_empty_handle || that != NULL);
  DCHECK(that == NULL ||
         (*reinterpret_cast<v8::internal::Object* const*>(that))->IsString());
  return v8::internal::Handle<v8::internal::String>(
      reinterpret_cast<v8::internal::String**>(const_cast<v8::String*>(that)));
}

So this returns a new v8::internal::Handle; the constructor is defined in src/handles.h:95.

src/objects.cc:

Handle WeakFixedArray::Add(Handle maybe_array,
                           Handle value,
                           int* assigned_index) {

Notice the name of the first parameter, maybe_array — but it is not of type Maybe?

### Context

JavaScript provides a set of builtin functions and objects. These functions and objects can be changed by user code. Each context is a separate collection of these objects and functions.
And internal::Context is declared in deps/v8/src/contexts.h and extends FixedArray class Context: public FixedArray {  A Context can be create by calling: const v8::HandleScope handle_scope(isolate_); Handle<Context> context = Context::New(isolate_, nullptr, v8::Local<v8::ObjectTemplate>());  Context::New can be found in src/api.cc:6405: Local<Context> v8::Context::New( v8::Isolate* external_isolate, v8::ExtensionConfiguration* extensions, v8::MaybeLocal<ObjectTemplate> global_template, v8::MaybeLocal<Value> global_object, DeserializeInternalFieldsCallback internal_fields_deserializer) { return NewContext(external_isolate, extensions, global_template, global_object, 0, internal_fields_deserializer); }  The declaration of this function can be found in include/v8.h: static Local<Context> New( Isolate* isolate, ExtensionConfiguration* extensions = NULL, MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(), MaybeLocal<Value> global_object = MaybeLocal<Value>(), DeserializeInternalFieldsCallback internal_fields_deserializer = DeserializeInternalFieldsCallback());  So we can see the reason why we did not have to specify internal_fields_deserialize. What is ExtensionConfiguration? This class can be found in include/v8.h and only has two members, a count of the extension names and an array with the names. If specified these will be installed by Boostrapper::InstallExtensions which will delegate to Genesis::InstallExtensions, both can be found in src/boostrapper.cc. Where are extensions registered? 
This is done once per process and is called from V8::Initialize():

```c++
void Bootstrapper::InitializeOncePerProcess() {
  free_buffer_extension_ = new FreeBufferExtension;
  v8::RegisterExtension(free_buffer_extension_);
  gc_extension_ = new GCExtension(GCFunctionName());
  v8::RegisterExtension(gc_extension_);
  externalize_string_extension_ = new ExternalizeStringExtension;
  v8::RegisterExtension(externalize_string_extension_);
  statistics_extension_ = new StatisticsExtension;
  v8::RegisterExtension(statistics_extension_);
  trigger_failure_extension_ = new TriggerFailureExtension;
  v8::RegisterExtension(trigger_failure_extension_);
  ignition_statistics_extension_ = new IgnitionStatisticsExtension;
  v8::RegisterExtension(ignition_statistics_extension_);
}
```

The extensions can be found in src/extensions. You can register your own extensions, and an example of this can be found in test/context_test.cc.

```console
(lldb) br s -f node.cc -l 4439
(lldb) expr context->length()
(int) $522 = 281
```


This output was taken from an lldb session.

Creating a new Context is done by v8::CreateEnvironment:

```console
(lldb) br s -f api.cc -l 6565
```

```c++
        InvokeBootstrapper<ObjectType> invoke;
6635    result =
-> 6636        invoke.Invoke(isolate, maybe_proxy, proxy_template, extensions,
6637                      context_snapshot_index, embedder_fields_deserializer);
```


This will later end up in Snapshot::NewContextFromSnapshot:

```c++
Vector<const byte> context_data =
    ExtractContextData(blob, static_cast<uint32_t>(context_index));
SnapshotData snapshot_data(context_data);

MaybeHandle<Context> maybe_result = PartialDeserializer::DeserializeContext(
    isolate, &snapshot_data, can_rehash, global_proxy,
    embedder_fields_deserializer);
```


So we can see here that the Context is deserialized from the snapshot. What does the Context contain at this stage?

```console
(lldb) expr result->length()
(int) $650 = 281
(lldb) expr result->Print()
// not including the complete output
```

Let's take a look at an entry:

```console
(lldb) expr result->get(0)->Print()
0xc201584331: [Function] in OldSpace
 - map = 0xc24c002251 [FastProperties]
 - prototype = 0xc201584371
 - elements = 0xc2b2882251 <FixedArray[0]> [HOLEY_ELEMENTS]
 - initial_map =
 - shared_info = 0xc2b2887521 <SharedFunctionInfo>
 - name = 0xc2b2882441 <String[0]: >
 - formal_parameter_count = -1
 - kind = [ NormalFunction ]
 - context = 0xc201583a59 <FixedArray[281]>
 - code = 0x2df1f9865a61 <Code BUILTIN>
 - source code = () {}
 - properties = 0xc2b2882251 <FixedArray[0]> {
    #length: 0xc2cca83729 <AccessorInfo> (const accessor descriptor)
    #name: 0xc2cca83799 <AccessorInfo> (const accessor descriptor)
    #arguments: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
    #caller: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
    #constructor: 0xc201584c29 <JSFunction Function (sfi = 0xc2b28a6fb1)> (const data descriptor)
    #apply: 0xc201588079 <JSFunction apply (sfi = 0xc2b28a7051)> (const data descriptor)
    #bind: 0xc2015880b9 <JSFunction bind (sfi = 0xc2b28a70f1)> (const data descriptor)
    #call: 0xc2015880f9 <JSFunction call (sfi = 0xc2b28a7191)> (const data descriptor)
    #toString: 0xc201588139 <JSFunction toString (sfi = 0xc2b28a7231)> (const data descriptor)
    0xc2b28bc669 <Symbol: Symbol.hasInstance>: 0xc201588179 <JSFunction [Symbol.hasInstance] (sfi = 0xc2b28a72d1)> (const data descriptor)
 }
 - feedback vector: not available
```

So we can see that this is of type [Function] which we can cast using:

```console
(lldb) expr JSFunction::cast(result->get(0))->code()->Print()
0x2df1f9865a61: [Code]
kind = BUILTIN
name = EmptyFunction

(lldb) expr JSFunction::cast(result->closure())->Print()
0xc201584331: [Function] in OldSpace
(same output as for result->get(0) above)
```

So this is the JSFunction associated with the deserialized context. Not sure what this is about, as looking at the source code it looks like an empty function. A function can also be set on the context, so I'm guessing that this gives access to the function of a context once it has been set. Where is the function set? Well, it is probably deserialized, but we can see it being used in deps/v8/src/bootstrapper.cc:

```c++
{
  Handle<JSFunction> function =
      SimpleCreateFunction(isolate, factory->empty_string(),
                           Builtins::kAsyncFunctionAwaitCaught, 2, false);
  native_context->set_async_function_await_caught(*function);
}
```

```console
(lldb) expr isolate()->builtins()->builtin_handle(Builtins::Name::kAsyncFunctionAwaitCaught)->Print()
```

Context::Scope is a RAII class used to Enter/Exit a context.
Let's take a closer look at Enter:

```c++
void Context::Enter() {
  i::Handle<i::Context> env = Utils::OpenHandle(this);
  i::Isolate* isolate = env->GetIsolate();
  ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate);
  i::HandleScopeImplementer* impl = isolate->handle_scope_implementer();
  impl->EnterContext(env);
  impl->SaveContext(isolate->context());
  isolate->set_context(*env);
}
```

So the current context is saved and then this context (env) is set as the current context on the isolate. EnterContext will push the passed-in context (deps/v8/src/api.cc):

```c++
void HandleScopeImplementer::EnterContext(Handle<Context> context) {
  entered_contexts_.push_back(*context);
}
...
DetachableVector<Context*> entered_contexts_;
```

DetachableVector is a delegate/adaptor that adds some additional features on top of a std::vector.

```c++
Handle<Context> context1 = NewContext(isolate);
Handle<Context> context2 = NewContext(isolate);
Context::Scope context_scope1(context1);  // entered_contexts_ [context1], saved_contexts_ [isolateContext]
Context::Scope context_scope2(context2);  // entered_contexts_ [context1, context2], saved_contexts_ [isolateContext, context1]
```

Now, SaveContext uses the current context (not this context, env) and pushes it onto the end of the saved_contexts_ vector. We can look at this as: we entered context_scope2 from context_scope1.

And Exit looks like:

```c++
void Context::Exit() {
  i::Handle<i::Context> env = Utils::OpenHandle(this);
  i::Isolate* isolate = env->GetIsolate();
  ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate);
  i::HandleScopeImplementer* impl = isolate->handle_scope_implementer();
  if (!Utils::ApiCheck(impl->LastEnteredContextWas(env),
                       "v8::Context::Exit()",
                       "Cannot exit non-entered context")) {
    return;
  }
  impl->LeaveContext();
  isolate->set_context(impl->RestoreContext());
}
```

#### EmbedderData
A context can have embedder data set on it. As described above, a Context is internally a FixedArray.
SetEmbedderData in Context is implemented in src/api.cc:

```c++
const char* location = "v8::Context::SetEmbedderData()";
i::Handle<i::FixedArray> data = EmbedderDataFor(this, index, true, location);
```

location is only used for logging and we can ignore it for now. EmbedderDataFor:

```c++
i::Handle<i::Context> env = Utils::OpenHandle(context);
...
i::Handle<i::FixedArray> data(env->embedder_data());
```

We can find embedder_data in src/contexts-inl.h:

```c++
#define NATIVE_CONTEXT_FIELD_ACCESSORS(index, type, name) \
  inline void set_##name(type* value);                    \
  inline bool is_##name(type* value) const;               \
  inline type* name() const;
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSORS)
```

And NATIVE_CONTEXT_FIELDS in context.h:

```c++
#define NATIVE_CONTEXT_FIELDS(V)                       \
  V(GLOBAL_PROXY_INDEX, JSObject, global_proxy_object) \
  V(EMBEDDER_DATA_INDEX, FixedArray, embedder_data)    \
  ...

#define NATIVE_CONTEXT_FIELD_ACCESSORS(index, type, name) \
  void Context::set_##name(type* value) {                 \
    DCHECK(IsNativeContext());                            \
    set(index, value);                                    \
  }                                                       \
  bool Context::is_##name(type* value) const {            \
    DCHECK(IsNativeContext());                            \
    return type::cast(get(index)) == value;               \
  }                                                       \
  type* Context::name() const {                           \
    DCHECK(IsNativeContext());                            \
    return type::cast(get(index));                        \
  }
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSORS)
#undef NATIVE_CONTEXT_FIELD_ACCESSORS
```

So for embedder_data the preprocessor would expand this to:

```c++
void Context::set_embedder_data(FixedArray* value) {
  DCHECK(IsNativeContext());
  set(EMBEDDER_DATA_INDEX, value);
}
bool Context::is_embedder_data(FixedArray* value) const {
  DCHECK(IsNativeContext());
  return FixedArray::cast(get(EMBEDDER_DATA_INDEX)) == value;
}
FixedArray* Context::embedder_data() const {
  DCHECK(IsNativeContext());
  return FixedArray::cast(get(EMBEDDER_DATA_INDEX));
}
```

We can take a look at the initial data:

```console
(lldb) expr data->Print()
0x2fac3e896439: [FixedArray] in OldSpace
 - map = 0x2fac9de82341 <Map(HOLEY_ELEMENTS)>
 - length: 3
        0-2: 0x2fac1cb822e1 <undefined>
(lldb) expr data->length()
(int) $5 = 3
```


And after setting:

```console
(lldb) expr data->Print()
0x2fac3e896439: [FixedArray] in OldSpace
 - map = 0x2fac9de82341 <Map(HOLEY_ELEMENTS)>
 - length: 3
          0: 0x2fac20c866e1 <String[7]: embdata>
        1-2: 0x2fac1cb822e1 <undefined>

(lldb) expr v8::internal::String::cast(data->get(0))->Print()
"embdata"
```


This was taken while debugging ContextTest::EmbedderData.

### ENTER_V8_FOR_NEW_CONTEXT

This macro is used in CreateEnvironment (src/api.cc) and the call in this function looks like this:

```c++
ENTER_V8_FOR_NEW_CONTEXT(isolate);
```


### Factory::NewMap

This section will take a look at the following call:

```c++
i::Handle<i::Map> map = factory->NewMap(i::JS_OBJECT_TYPE, 24);
```


Let's take a closer look at this function, which can be found in src/factory.cc:

```c++
Handle<Map> Factory::NewMap(InstanceType type, int instance_size,
                            ElementsKind elements_kind,
                            int inobject_properties) {
  CALL_HEAP_FUNCTION(
      isolate(),
      isolate()->heap()->AllocateMap(type, instance_size, elements_kind,
                                     inobject_properties),
      Map);
}
```


If we take a look at factory.h we can see the default values for elements_kind and inobject_properties:

```c++
Handle<Map> NewMap(InstanceType type, int instance_size,
                   ElementsKind elements_kind = TERMINAL_FAST_ELEMENTS_KIND,
                   int inobject_properties = 0);
```


If we expand the CALL_HEAP_FUNCTION macro we will get:

```c++
AllocationResult __allocation__ = isolate()->heap()->AllocateMap(
    type, instance_size, elements_kind, inobject_properties);
Object* __object__ = nullptr;
RETURN_OBJECT_UNLESS_RETRY(isolate(), Map)
/* Two GCs before panicking.  In newspace will almost always succeed. */
for (int __i__ = 0; __i__ < 2; __i__++) {
  (isolate())->heap()->CollectGarbage(
      __allocation__.RetrySpace(),
      GarbageCollectionReason::kAllocationFailure);
  __allocation__ = isolate()->heap()->AllocateMap(
      type, instance_size, elements_kind, inobject_properties);
  RETURN_OBJECT_UNLESS_RETRY(isolate(), Map)
}
(isolate())->counters()->gc_last_resort_from_handles()->Increment();
(isolate())->heap()->CollectAllAvailableGarbage(
    GarbageCollectionReason::kLastResort);
{
  AlwaysAllocateScope __scope__(isolate());
  __allocation__ = isolate()->heap()->AllocateMap(
      type, instance_size, elements_kind, inobject_properties);
}
RETURN_OBJECT_UNLESS_RETRY(isolate(), Map)
/* TODO(1181417): Fix this. */
v8::internal::Heap::FatalProcessOutOfMemory("CALL_AND_RETRY_LAST", true);
return Handle<Map>();
```


So, let's take a look at isolate()->heap()->AllocateMap in src/heap/heap.cc:

```c++
HeapObject* result = nullptr;
AllocationResult allocation = AllocateRaw(Map::kSize, MAP_SPACE);
```


AllocateRaw can be found in src/heap/heap-inl.h:

```c++
bool large_object = size_in_bytes > kMaxRegularHeapObjectSize;
HeapObject* object = nullptr;
AllocationResult allocation;
if (NEW_SPACE == space) {
  if (large_object) {
    space = LO_SPACE;
  } else {
    allocation = new_space_->AllocateRaw(size_in_bytes, alignment);
    if (allocation.To(&object)) {
      OnAllocationEvent(object, size_in_bytes);
    }
    return allocation;
  }
} else if (MAP_SPACE == space) {
  allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
}
```

```console
(lldb) expr large_object
(bool) $3 = false
(lldb) expr size_in_bytes
(int) $5 = 80
(lldb) expr map_space_
(v8::internal::MapSpace *) $6 = 0x0000000104700f60
```

AllocateRawUnaligned can be found in src/heap/spaces-inl.h:

```c++
HeapObject* object = AllocateLinearly(size_in_bytes);
```

### v8::internal::Object
This is an abstract superclass for all classes in the object hierarchy. Both Smi and HeapObject are subclasses of Object, so there are no data members in Object, only functions. For example:

```c++
bool IsObject() const { return true; }
INLINE(bool IsSmi() const
INLINE(bool IsLayoutDescriptor() const
INLINE(bool IsHeapObject() const
INLINE(bool IsPrimitive() const
INLINE(bool IsNumber() const
INLINE(bool IsNumeric() const
INLINE(bool IsAbstractCode() const
INLINE(bool IsAccessCheckNeeded() const
INLINE(bool IsArrayList() const
INLINE(bool IsBigInt() const
INLINE(bool IsUndefined() const
INLINE(bool IsNull() const
INLINE(bool IsTheHole() const
INLINE(bool IsException() const
INLINE(bool IsUninitialized() const
INLINE(bool IsTrue() const
INLINE(bool IsFalse() const
...
```

### v8::internal::Smi
Extends v8::internal::Object. Smis are not allocated on the heap; there are no members, as the pointer itself is used to store the information.

In our case the call to v8::Isolate::New is done by the test fixture:

```c++
virtual void SetUp() {
  isolate_ = v8::Isolate::New(create_params_);
}
```

This will call:

```c++
Isolate* Isolate::New(const Isolate::CreateParams& params) {
  Isolate* isolate = Allocate();
  Initialize(isolate, params);
  return isolate;
}
```

In Isolate::Initialize we'll call i::Snapshot::Initialize(i_isolate):

```c++
if (params.entry_hook || !i::Snapshot::Initialize(i_isolate)) {
  ...
```

Which will call:

```c++
bool success = isolate->Init(&deserializer);
```

Before this call all the roots are uninitialized. Reading this blog, it says that the Isolate class contains a roots table; it looks to me that the Heap contains this data structure, but perhaps that is what they meant.
```console
(lldb) bt 3
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
  * frame #0: 0x0000000101584f43 libv8.dylib`v8::internal::StartupDeserializer::DeserializeInto(this=0x00007ffeefbfe200, isolate=0x000000010481cc00) at startup-deserializer.cc:39
    frame #1: 0x0000000101028bb6 libv8.dylib`v8::internal::Isolate::Init(this=0x000000010481cc00, des=0x00007ffeefbfe200) at isolate.cc:3036
    frame #2: 0x000000010157c682 libv8.dylib`v8::internal::Snapshot::Initialize(isolate=0x000000010481cc00) at snapshot-common.cc:54
```

In startup-deserializer.cc we can find StartupDeserializer::DeserializeInto:

```c++
DisallowHeapAllocation no_gc;
isolate->heap()->IterateSmiRoots(this);
isolate->heap()->IterateStrongRoots(this, VISIT_ONLY_STRONG);
```

If we take a look in src/roots.h we can find the read-only roots in Heap. If we take the 10th value, which is:

```c++
V(String, empty_string, empty_string) \
```

we can then inspect this value:

```console
(lldb) expr roots_[9]
(v8::internal::Object *) $32 = 0x0000152d30b82851
(lldb) expr roots_[9]->IsString()
(bool) $30 = true
(lldb) expr roots_[9]->Print()
#
```

So this entry is a pointer to an object on the managed heap which has been deserialized from the snapshot.

The Heap class has a lot of members that are initialized during construction, but the body of the constructor looks like this:

```c++
{
  // Ensure old_generation_size_ is a multiple of kPageSize.
  DCHECK_EQ(0, max_old_generation_size_ & (Page::kPageSize - 1));
  memset(roots_, 0, sizeof(roots_[0]) * kRootListLength);
  set_native_contexts_list(nullptr);
  set_allocation_sites_list(Smi::kZero);
  set_encountered_weak_collections(Smi::kZero);
  // Put a dummy entry in the remembered pages so we can find the list the
  // minidump even if there are no real unmapped pages.
  RememberUnmappedPage(nullptr, false);
}
```

We can see that roots_ is filled with 0 values. We can inspect roots_ using:

```console
(lldb) expr roots_
(lldb) expr RootListIndex::kRootListLength
(int) $16 = 509
```


Now they are all 0 at this stage, so when will this array get populated? This happens in Isolate::Init:

```c++
heap_.SetUp()
if (!create_heap_objects) des->DeserializeInto(this);
```

```console
void StartupDeserializer::DeserializeInto(Isolate* isolate) {
-> 17    Initialize(isolate);
```

In startup-deserializer.cc:37 we then have:

```c++
isolate->heap()->IterateSmiRoots(this);
```


This will delegate to ConfigureHeapDefaults() which will call Heap::ConfigureHeap:

```c++
enum RootListIndex {
  kFreeSpaceMapRootIndex,
  kOnePointerFillerMapRootIndex,
  ...
}
```

```console
(lldb) expr heap->RootListIndex::kFreeSpaceMapRootIndex
(int) $3 = 0
(lldb) expr heap->RootListIndex::kOnePointerFillerMapRootIndex
(int) $4 = 1
```


### MemoryChunk

Found in src/heap/spaces.h, an instance of MemoryChunk represents a region in memory that is owned by a specific space.

### Embedded builtins

The blog post on embedded builtins explains how the builtins are embedded into the executable in the .TEXT section, which is read-only and can therefore be shared among multiple processes. We know that builtins are compiled and stored in the snapshot, but now it seems that they are instead placed into out.gn/learning/gen/embedded.cc and then combined with the object files from the compile to produce libv8.dylib. V8 has a configuration option named v8_enable_embedded_builtins, in which case embedded.cc will be added to the list of sources. This is done in BUILD.gn and the v8_snapshot target. If v8_enable_embedded_builtins is false then src/snapshot/embedded-empty.cc will be included instead. Both of these files have the following functions:

```c++
const uint8_t* DefaultEmbeddedBlob()
uint32_t DefaultEmbeddedBlobSize()

#ifdef V8_MULTI_SNAPSHOTS
const uint8_t* TrustedEmbeddedBlob()
uint32_t TrustedEmbeddedBlobSize()
#endif
```


These functions are used by isolate.cc and declared extern:

```c++
extern const uint8_t* DefaultEmbeddedBlob();
extern uint32_t DefaultEmbeddedBlobSize();
```


And the usage of DefaultEmbeddedBlob can be seen in Isolate::Isolate where it sets the embedded blob:

```c++
SetEmbeddedBlob(DefaultEmbeddedBlob(), DefaultEmbeddedBlobSize());
```


Let's set a breakpoint there and see if this is empty or not.

```console
(lldb) expr v8_embedded_blob_size_
(uint32_t) $0 = 4021088
```

So we can see that we are not using the empty one. The blob itself is set by Isolate::SetEmbeddedBlob. In src/snapshot/deserializer.cc (line 552) we have a check for the embedded_blob():

```c++
CHECK_NOT_NULL(isolate->embedded_blob());
EmbeddedData d = EmbeddedData::FromBlob();
Address address = d.InstructionStartOfBuiltin(builtin_index);
```

EmbeddedData can be found in src/snapshot/snapshot.h and the implementation can be found in snapshot-common.cc:

```c++
Address EmbeddedData::InstructionStartOfBuiltin(int i) const {
  const struct Metadata* metadata = Metadata();
  const uint8_t* result = RawData() + metadata[i].instructions_offset;
  return reinterpret_cast<Address>(result);
}
```

```console
(lldb) expr *metadata
(const v8::internal::EmbeddedData::Metadata) $7 = (instructions_offset = 0, instructions_length = 1464)
```

```c++
struct Metadata {
  // Blob layout information.
  uint32_t instructions_offset;
  uint32_t instructions_length;
};
```

```console
(lldb) expr *this
(v8::internal::EmbeddedData) $10 = (data_ = "\xffffffdc\xffffffc0\xffffff88'"y[\xffffffd6", size_ = 4021088)
(lldb) expr metadata[i]
(const v8::internal::EmbeddedData::Metadata) $8 = (instructions_offset = 0, instructions_length = 1464)
```


So, is it possible for us to verify that this information is in the .text section?

```console
(lldb) expr result
(const uint8_t *) $13 = 0x0000000101b14ee0 "UH\x89\xe5j\x18H\x83\xec(H\x89U"
(lldb) image lookup --address 0x0000000101b14ee0 --verbose
      Address: libv8.dylib[0x00000000019cdee0] (libv8.dylib.__TEXT.__text + 27054464)
      Summary: libv8.dylib`v8_Default_embedded_blob_ + 7072
       Module: file = "/Users/danielbevenius/work/google/javascript/v8/out.gn/learning/libv8.dylib", arch = "x86_64"
       Symbol: id = {0x0004b596}, range = [0x0000000101b13340-0x0000000101ee8ea0), name="v8_Default_embedded_blob_"
```

So what we have is a pointer into the .text segment which is returned:

```console
(lldb) memory read -f x -s 1 -c 13 0x0000000101b14ee0
0x101b14ee0: 0x55 0x48 0x89 0xe5 0x6a 0x18 0x48 0x83
0x101b14ee8: 0xec 0x28 0x48 0x89 0x55
```

And we can compare this with out.gn/learning/gen/embedded.cc:

```c++
V8_EMBEDDED_TEXT_HEADER(v8_Default_embedded_blob_)
__asm__(
  ...
  ".byte 0x55,0x48,0x89,0xe5,0x6a,0x18,0x48,0x83,0xec,0x28,0x48,0x89,0x55\n"
  ...
);
```

The macro V8_EMBEDDED_TEXT_HEADER can be found in src/snapshot/macros.h:

```c++
#define V8_EMBEDDED_TEXT_HEADER(LABEL)        \
  __asm__(V8_ASM_DECLARE(#LABEL)              \
          ".csect " #LABEL "[DS]\n"           \
          #LABEL ":\n"                        \
          ".llong ." #LABEL ", TOC[tc0], 0\n" \
          V8_ASM_TEXT_SECTION                 \
          "." #LABEL ":\n");

#define V8_ASM_DECLARE(NAME) ".private_extern " V8_ASM_MANGLE_LABEL NAME "\n"
#define V8_ASM_MANGLE_LABEL "_"
#define V8_ASM_TEXT_SECTION ".csect .text[PR]\n"
```

And would be expanded by the preprocessor into:

```c++
__asm__(".private_extern " "_" "v8_Default_embedded_blob_" "\n"
        ".csect " "v8_Default_embedded_blob_" "[DS]\n"
        "v8_Default_embedded_blob_" ":\n"
        ".llong ." "v8_Default_embedded_blob_" ", TOC[tc0], 0\n"
        ".csect .text[PR]\n"
        "." "v8_Default_embedded_blob_" ":\n");

__asm__(
  ...
  ".byte 0x55,0x48,0x89,0xe5,0x6a,0x18,0x48,0x83,0xec,0x28,0x48,0x89,0x55\n"
  ...
);
```

Back in src/snapshot/deserializer.cc we are on this line:

```c++
Address address = d.InstructionStartOfBuiltin(builtin_index);
CHECK_NE(kNullAddress, address);
if (RelocInfo::OffHeapTargetIsCodedSpecially()) {
  // is false in our case so skipping the code here
} else {
  MaybeObject* o = reinterpret_cast<MaybeObject*>(address);
  UnalignedCopy(current, &o);
  current++;
}
break;
```

### print-code

$ ./d8 -print-bytecode -print-code sample.js
[generated bytecode for function:  (0x2a180824ffbd <SharedFunctionInfo>)]
Parameter count 1
Register count 5
Frame size 40
0x2a1808250066 @    0 : 12 00             LdaConstant [0]
0x2a1808250068 @    2 : 26 f9             Star r2
0x2a180825006a @    4 : 27 fe f8          Mov <closure>, r3
0x2a180825006d @    7 : 61 32 01 f9 02    CallRuntime [DeclareGlobals], r2-r3
0x2a1808250072 @   12 : 0b                LdaZero
0x2a1808250073 @   13 : 26 fa             Star r1
0x2a1808250075 @   15 : 0d                LdaUndefined
0x2a1808250076 @   16 : 26 fb             Star r0
0x2a1808250078 @   18 : 00 0c 10 27       LdaSmi.Wide [10000]
0x2a180825007c @   22 : 69 fa 00          TestLessThan r1, [0]
0x2a180825007f @   25 : 9a 1c             JumpIfFalse [28] (0x2a180825009b @ 53)
0x2a1808250081 @   27 : a7                StackCheck
0x2a1808250082 @   28 : 13 01 01          LdaGlobal [1], [1]
0x2a1808250085 @   31 : 26 f9             Star r2
0x2a1808250087 @   33 : 0c 02             LdaSmi [2]
0x2a1808250089 @   35 : 26 f7             Star r4
0x2a180825008b @   37 : 5e f9 fa f7 03    CallUndefinedReceiver2 r2, r1, r4, [3]
0x2a1808250090 @   42 : 26 fb             Star r0
0x2a1808250092 @   44 : 25 fa             Ldar r1
0x2a1808250094 @   46 : 4c 05             Inc [5]
0x2a1808250096 @   48 : 26 fa             Star r1
0x2a1808250098 @   50 : 8a 20 00          JumpLoop [32], [0] (0x2a1808250078 @ 18)
0x2a180825009b @   53 : 25 fb             Ldar r0
0x2a180825009d @   55 : ab                Return
Constant pool (size = 2)
0x2a1808250035: [FixedArray] in OldSpace
- map: 0x2a18080404b1 <Map>
- length: 2
0: 0x2a180824ffe5 <FixedArray[2]>
1: 0x2a180824ff61 <String[#9]: something>
Handler Table (size = 0)
Source Position Table (size = 0)
[generated bytecode for function: something (0x2a180824fff5 <SharedFunctionInfo something>)]
Parameter count 3
Register count 0
Frame size 0
0x2a18082501ba @    0 : 25 02             Ldar a1
0x2a18082501bc @    2 : 34 03 00          Add a0, [0]
0x2a18082501bf @    5 : ab                Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 0)
--- Raw source ---
function something(x, y) {
return x + y
}
for (let i = 0; i < 10000; i++) {
something(i, 2);
}

--- Optimized code ---
optimization_id = 0
source_position = 0
kind = OPTIMIZED_FUNCTION
stack_slots = 14
compiler = turbofan

Instructions (size = 536)
0x108400082b20     0  488d1df9ffffff REX.W leaq rbx,[rip+0xfffffff9]
0x108400082b27     7  483bd9         REX.W cmpq rbx,rcx
0x108400082b2a     a  7418           jz 0x108400082b44  <+0x24>
0x108400082b2c     c  48ba6800000000000000 REX.W movq rdx,0x68
0x108400082b36    16  49bae0938c724b560000 REX.W movq r10,0x564b728c93e0  (Abort)    ;; off heap target
0x108400082b40    20  41ffd2         call r10
0x108400082b43    23  cc             int3l
0x108400082b44    24  8b59d0         movl rbx,[rcx-0x30]
0x108400082b47    27  4903dd         REX.W addq rbx,r13
0x108400082b4a    2a  f6430701       testb [rbx+0x7],0x1
0x108400082b4e    2e  740d           jz 0x108400082b5d  <+0x3d>
0x108400082b50    30  49bae0f781724b560000 REX.W movq r10,0x564b7281f7e0  (CompileLazyDeoptimizedCode)    ;; off heap target
0x108400082b5a    3a  41ffe2         jmp r10
0x108400082b5d    3d  55             push rbp
0x108400082b5e    3e  4889e5         REX.W movq rbp,rsp
0x108400082b61    41  56             push rsi
0x108400082b62    42  57             push rdi
0x108400082b63    43  48ba4200000000000000 REX.W movq rdx,0x42
0x108400082b6d    4d  4c8b15c4ffffff REX.W movq r10,[rip+0xffffffc4]
0x108400082b74    54  41ffd2         call r10
0x108400082b77    57  cc             int3l
0x108400082b78    58  4883ec18       REX.W subq rsp,0x18
0x108400082b7c    5c  488975a0       REX.W movq [rbp-0x60],rsi
0x108400082b80    60  488b4dd0       REX.W movq rcx,[rbp-0x30]
0x108400082b84    64  f6c101         testb rcx,0x1
0x108400082b87    67  0f8557010000   jnz 0x108400082ce4  <+0x1c4>
0x108400082b8d    6d  81f9204e0000   cmpl rcx,0x4e20
0x108400082b93    73  0f8c0b000000   jl 0x108400082ba4  <+0x84>
0x108400082b99    79  488b45d8       REX.W movq rax,[rbp-0x28]
0x108400082b9d    7d  488be5         REX.W movq rsp,rbp
0x108400082ba0    80  5d             pop rbp
0x108400082ba1    81  c20800         ret 0x8
0x108400082ba4    84  493b6560       REX.W cmpq rsp,[r13+0x60] (external value (StackGuard::address_of_jslimit()))
0x108400082ba8    88  0f8669000000   jna 0x108400082c17  <+0xf7>
0x108400082bae    8e  488bf9         REX.W movq rdi,rcx
0x108400082bb1    91  d1ff           sarl rdi, 1
0x108400082bb3    93  4c8bc7         REX.W movq r8,rdi
0x108400082bba    9a  0f8030010000   jo 0x108400082cf0  <+0x1d0>
0x108400082bc3    a3  0f8033010000   jo 0x108400082cfc  <+0x1dc>
0x108400082bc9    a9  e921000000     jmp 0x108400082bef  <+0xcf>
0x108400082bce    ae  6690           nop
0x108400082bd0    b0  488bcf         REX.W movq rcx,rdi
0x108400082bd6    b6  0f802c010000   jo 0x108400082d08  <+0x1e8>
0x108400082bdc    bc  4c8bc7         REX.W movq r8,rdi
0x108400082be3    c3  0f802b010000   jo 0x108400082d14  <+0x1f4>
0x108400082be9    c9  498bf8         REX.W movq rdi,r8
0x108400082bec    cc  4c8bc1         REX.W movq r8,rcx
0x108400082bef    cf  81ff10270000   cmpl rdi,0x2710
0x108400082bf5    d5  0f8d0b000000   jge 0x108400082c06  <+0xe6>
0x108400082bfb    db  493b6560       REX.W cmpq rsp,[r13+0x60] (external value (StackGuard::address_of_jslimit()))
0x108400082bff    df  77cf           ja 0x108400082bd0  <+0xb0>
0x108400082c01    e1  e943000000     jmp 0x108400082c49  <+0x129>
0x108400082c06    e6  498bc8         REX.W movq rcx,r8
0x108400082c0c    ec  0f8061000000   jo 0x108400082c73  <+0x153>
0x108400082c12    f2  488bc1         REX.W movq rax,rcx
0x108400082c15    f5  eb86           jmp 0x108400082b9d  <+0x7d>
0x108400082c17    f7  33c0           xorl rax,rax
0x108400082c19    f9  48bef50c240884100000 REX.W movq rsi,0x108408240cf5    ;; object: 0x108408240cf5 <NativeContext[261]>
0x108400082c23   103  48bb101206724b560000 REX.W movq rbx,0x564b72061210    ;; external reference (Runtime::StackGuard)
0x108400082c2d   10d  488bf8         REX.W movq rdi,rax
0x108400082c30   110  4c8bc6         REX.W movq r8,rsi
0x108400082c33   113  49ba2089a3724b560000 REX.W movq r10,0x564b72a38920  (CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit)    ;; off heap target
0x108400082c3d   11d  41ffd2         call r10
0x108400082c40   120  488b4dd0       REX.W movq rcx,[rbp-0x30]
0x108400082c44   124  e965ffffff     jmp 0x108400082bae  <+0x8e>
0x108400082c49   129  48897da8       REX.W movq [rbp-0x58],rdi
0x108400082c4d   12d  488b1dd1ffffff REX.W movq rbx,[rip+0xffffffd1]
0x108400082c54   134  33c0           xorl rax,rax
0x108400082c56   136  48bef50c240884100000 REX.W movq rsi,0x108408240cf5    ;; object: 0x108408240cf5 <NativeContext[261]>
0x108400082c60   140  4c8b15ceffffff REX.W movq r10,[rip+0xffffffce]
0x108400082c67   147  41ffd2         call r10
0x108400082c6a   14a  488b7da8       REX.W movq rdi,[rbp-0x58]
0x108400082c6e   14e  e95dffffff     jmp 0x108400082bd0  <+0xb0>
0x108400082c73   153  48b968ea2f744b560000 REX.W movq rcx,0x564b742fea68    ;; external reference (Heap::NewSpaceAllocationTopAddress())
0x108400082c7d   15d  488b39         REX.W movq rdi,[rcx]
0x108400082c80   160  4c8d4f0c       REX.W leaq r9,[rdi+0xc]
0x108400082c84   164  4c8945b0       REX.W movq [rbp-0x50],r8
0x108400082c88   168  49bb70ea2f744b560000 REX.W movq r11,0x564b742fea70    ;; external reference (Heap::NewSpaceAllocationLimitAddress())
0x108400082c92   172  4d390b         REX.W cmpq [r11],r9
0x108400082c95   175  0f8721000000   ja 0x108400082cbc  <+0x19c>
0x108400082c9b   17b  ba0c000000     movl rdx,0xc
0x108400082ca0   180  49ba200282724b560000 REX.W movq r10,0x564b72820220  (AllocateRegularInYoungGeneration)    ;; off heap target
0x108400082caa   18a  41ffd2         call r10
0x108400082cad   18d  488d78ff       REX.W leaq rdi,[rax-0x1]
0x108400082cb1   191  488b0dbdffffff REX.W movq rcx,[rip+0xffffffbd]
0x108400082cb8   198  4c8b45b0       REX.W movq r8,[rbp-0x50]
0x108400082cbc   19c  4c8d4f0c       REX.W leaq r9,[rdi+0xc]
0x108400082cc0   1a0  4c8909         REX.W movq [rcx],r9
0x108400082cc3   1a3  488d4f01       REX.W leaq rcx,[rdi+0x1]
0x108400082cc7   1a7  498bbd40010000 REX.W movq rdi,[r13+0x140] (root (heap_number_map))
0x108400082cce   1ae  8979ff         movl [rcx-0x1],rdi
0x108400082cd1   1b1  c4c1032ac0     vcvtlsi2sd xmm0,xmm15,r8
0x108400082cd6   1b6  c5fb114103     vmovsd [rcx+0x3],xmm0
0x108400082cdb   1bb  488bc1         REX.W movq rax,rcx
0x108400082cde   1be  e9bafeffff     jmp 0x108400082b9d  <+0x7d>
0x108400082ce3   1c3  90             nop
0x108400082ce4   1c4  49c7c500000000 REX.W movq r13,0x0
0x108400082ceb   1cb  e850f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082cf0   1d0  49c7c501000000 REX.W movq r13,0x1
0x108400082cf7   1d7  e844f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082cfc   1dc  49c7c502000000 REX.W movq r13,0x2
0x108400082d03   1e3  e838f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d08   1e8  49c7c503000000 REX.W movq r13,0x3
0x108400082d0f   1ef  e82cf30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d14   1f4  49c7c504000000 REX.W movq r13,0x4
0x108400082d1b   1fb  e820f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d20   200  49c7c505000000 REX.W movq r13,0x5
0x108400082d27   207  e814f30700     call 0x108400102040     ;; lazy deoptimization bailout
0x108400082d2c   20c  49c7c506000000 REX.W movq r13,0x6
0x108400082d33   213  e808f30700     call 0x108400102040     ;; lazy deoptimization bailout

Source positions:
pc offset  position
f7         0

Inlined functions (count = 1)
0x10840824fff5 <SharedFunctionInfo something>

Deoptimization Input Data (deopt points = 7)
index  bytecode-offset    pc
0               22    NA
1                2    NA
2               46    NA
3                2    NA
4               46    NA
5               27   120
6               27   14a

Safepoints (size = 50)
0x108400082c40     120   200  10000010000000 (sp -> fp)       5
0x108400082c6a     14a   20c  10000000000000 (sp -> fp)       6
0x108400082cad     18d    NA  00000000000000 (sp -> fp)  <none>

RelocInfo (size = 34)
0x108400082b38  off heap target
0x108400082b52  off heap target
0x108400082c1b  full embedded object  (0x108408240cf5 <NativeContext[261]>)
0x108400082c25  external reference (Runtime::StackGuard)  (0x564b72061210)
0x108400082c35  off heap target
0x108400082c58  full embedded object  (0x108408240cf5 <NativeContext[261]>)
0x108400082ca2  off heap target
0x108400082cec  runtime entry  (eager deoptimization bailout)
0x108400082cf8  runtime entry  (eager deoptimization bailout)
0x108400082d04  runtime entry  (eager deoptimization bailout)
0x108400082d10  runtime entry  (eager deoptimization bailout)
0x108400082d1c  runtime entry  (eager deoptimization bailout)
0x108400082d28  runtime entry  (lazy deoptimization bailout)
0x108400082d34  runtime entry  (lazy deoptimization bailout)

--- End code ---
### Building Google Test

$ mkdir lib
$ mkdir deps ; cd deps
$ git clone git@github.com:google/googletest.git
$ cd googletest/googletest
$ /usr/bin/clang++ --std=c++14 -Iinclude -I. -pthread -c src/gtest-all.cc
$ ar -rv libgtest-linux.a gtest-all.o
$ cp libgtest-linux.a ../../../../lib/gtest


./lib/gtest/libgtest-linux.a(gtest-all.o):gtest-all.cc:function testing::internal::BoolFromGTestEnv(char const*, bool): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str() const'

$ nm lib/gtest/libgtest-linux.a | grep basic_string | c++filt
....

There are a lot of symbols listed above, but the point is that in the object file of libgtest-linux.a these symbols were compiled in. Now, when we compile v8 and the tests we are using -std=c++14, so we have to use the same standard when compiling gtest. Just adding that flag does not help in this case; we also need to check which C++ headers are being used:

$ /usr/bin/clang++ -print-search-dirs
programs: =/usr/bin:/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../x86_64-redhat-linux/bin
libraries: =/usr/lib64/clang/9.0.0:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../lib64:
/usr/bin/../lib64:
/lib/../lib64:
/usr/lib/../lib64:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../..:
/usr/bin/../lib:
/lib:/usr/lib
Let's search for the string header and inspect the namespace in that header:

$ find /usr/ -name string
/usr/include/c++/9/debug/string
/usr/include/c++/9/experimental/string
/usr/include/c++/9/string
/usr/src/debug/gcc-9.2.1-1.fc31.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/string

$ vi /usr/include/c++/9/string

So this looks alright, and thinking about this a little more I realized I had been bitten by the linking-with-different-libc++-symbols issue (again). When we compile using Make we are using the C++ headers that are shipped with v8 (clang's libc++). Take the string header for example, in v8/buildtools/third_party/libc++/trunk/include/string, which is from clang's C++ library and does not use inline namespaces (__11, __14, etc). But when I compiled gtest I did not specify the -isystem include path, so the default headers were used, adding symbols with __11 in them. When the linker tries to find these symbols it fails, as it does not have any such symbols in the libraries that it searches.

Create a simple test linking with the standard build of gtest to see if that compiles and runs:

$ /usr/bin/clang++ -std=c++14 -I./deps/googletest/googletest/include -L$PWD/lib -g -O0 -o test/simple_test test/main.cc test/simple.cc lib/libgtest.a -lpthread

That worked and does not segfault. But when I run the version that is built using the makefile I get:

(lldb) target create "./test/persistent-object_test"
Current executable set to './test/persistent-object_test' (x86_64).
(lldb) r
Process 1024232 launched: '/home/danielbevenius/work/google/learning-v8/test/persistent-object_test' (x86_64)
warning: (x86_64) /lib64/libgcc_s.so.1 unsupported DW_FORM values: 0x1f20 0x1f21
[ FATAL ]
Process 1024232 stopped
* thread #1, name = 'persistent-obje', stop reason = signal SIGSEGV: invalid address (fault address: 0x33363658)
    frame #0: 0x00007ffff7c0a7b0 libc.so.6`__GI___libc_free + 32
libc.so.6`__GI___libc_free:
->  0x7ffff7c0a7b0 <+32>: mov  rax, qword ptr [rdi - 0x8]
    0x7ffff7c0a7b4 <+36>: lea  rsi, [rdi - 0x10]
    0x7ffff7c0a7b8 <+40>: test al, 0x2
    0x7ffff7c0a7ba <+42>: jne  0x7ffff7c0a7f0 ; <+96>
(lldb) bt
* thread #1, name = 'persistent-obje', stop reason = signal SIGSEGV: invalid address (fault address: 0x33363658)
  * frame #0: 0x00007ffff7c0a7b0 libc.so.6`__GI___libc_free + 32
    frame #1: 0x000000000042bb58 persistent-object_test`std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringbuf(this=0x000000000046e908) at iosfwd:130:32
    frame #2: 0x000000000042ba4f persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0, vtt=0x000000000044db28) at iosfwd:139:32
    frame #3: 0x0000000000420176 persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0) at iosfwd:139:32
    frame #4: 0x000000000042bacc persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0) at iosfwd:139:32
    frame #5: 0x0000000000427f4e persistent-object_test`testing::internal::scoped_ptr<std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::reset(this=0x00007fffffffcee8, p=0x0000000000000000) at gtest-port.h:1216:9
    frame #6: 0x0000000000427ee9 persistent-object_test`testing::internal::scoped_ptr<std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::~scoped_ptr(this=0x00007fffffffcee8) at gtest-port.h:1201:19
    frame #7: 0x000000000041f265 persistent-object_test`testing::Message::~Message(this=0x00007fffffffcee8) at gtest-message.h:89:18
    frame #8: 0x00000000004235ec persistent-object_test`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > testing::internal::StreamableToString<int>(streamable=0x00007fffffffcf9c) at gtest-message.h:247:3
    frame #9: 0x000000000040d2bd persistent-object_test`testing::internal::FormatFileLocation(file="/home/danielbevenius/work/google/learning-v8/deps/googletest/googletest/src/gtest-internal-inl.h", line=663) at gtest-port.cc:946:28
    frame #10: 0x000000000041b7e2 persistent-object_test`testing::internal::GTestLog::GTestLog(this=0x00007fffffffd060, severity=GTEST_FATAL, file="/home/danielbevenius/work/google/learning-v8/deps/googletest/googletest/src/gtest-internal-inl.h", line=663) at gtest-port.cc:972:18
    frame #11: 0x000000000042242c persistent-object_test`testing::internal::UnitTestImpl::AddTestInfo(this=0x000000000046e480, set_up_tc=(persistent-object_test`testing::Test::SetUpTestCase() at gtest.h:427), tear_down_tc=(persistent-object_test`testing::Test::TearDownTestCase() at gtest.h:435), test_info=0x000000000046e320)(), void (*)(), testing::TestInfo*) at gtest-internal-inl.h:663:7
    frame #12: 0x000000000040d04f persistent-object_test`testing::internal::MakeAndRegisterTestInfo(test_case_name="Persistent", name="object", type_param=0x0000000000000000, value_param=0x0000000000000000, code_location=<unavailable>, fixture_class_id=0x000000000046d748, set_up_tc=(persistent-object_test`testing::Test::SetUpTestCase() at gtest.h:427), tear_down_tc=(persistent-object_test`testing::Test::TearDownTestCase() at gtest.h:435), factory=0x000000000046e300)(), void (*)(), testing::internal::TestFactoryBase*) at gtest.cc:2599:22
    frame #13: 0x00000000004048b8 persistent-object_test`::__cxx_global_var_init() at persistent-object_test.cc:5:1
    frame #14: 0x00000000004048e9 persistent-object_test`_GLOBAL__sub_I_persistent_object_test.cc at persistent-object_test.cc:0
    frame #15: 0x00000000004497a5 persistent-object_test`__libc_csu_init + 69
    frame #16: 0x00007ffff7ba512e libc.so.6`__libc_start_main + 126
    frame #17: 0x0000000000404eba persistent-object_test`_start + 42

### Google test (gtest) linking issue

This issue came up when linking a unit test with gtest:

/usr/bin/ld: ./lib/gtest/libgtest-linux.a(gtest-all.o): in function `testing::internal::BoolFromGTestEnv(char const*, bool)':
/home/danielbevenius/work/google/learning-v8/deps/googletest/googletest/src/gtest-port.cc:1259: undefined reference to `std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()'

So this indicated that the object files in libgtest-linux.a were in fact using headers from libc++ and not libstdc++. This was a really stupid mistake on my part: I had not specified the output file explicitly (-o), so the object file was written into the current working directory, but the file included in the archive was taken from within the deps/googletest/googletest/ directory, which was old and compiled using libc++.

### Persistent cast-function-type

This issue was seen in Node.js when compiling with GCC.
It can also be seen when building V8 using GCC with -Wcast-function-type enabled in BUILD.gn:

  "-Wcast-function-type",

There are unit tests in V8 that also produce this warning, for example test/cctest/test-global-handles.cc.

Original:

g++ -MMD -MF obj/test/cctest/cctest_sources/test-global-handles.o.d -DV8_INTL_SUPPORT -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DCR_SYSROOT_HASH=9c905c99558f10e19cc878b5dca1d4bd58c607ae -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DENABLE_DISASSEMBLER -DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64 -DENABLE_GDB_JIT_INTERFACE -DENABLE_MINOR_MC -DOBJECT_PRINT -DV8_TRACE_MAPS -DV8_ENABLE_ALLOCATION_TIMEOUT -DV8_ENABLE_FORCE_SLOW_PATH -DV8_ENABLE_DOUBLE_CONST_STORE_CHECK -DV8_INTL_SUPPORT -DENABLE_HANDLE_ZAPPING -DV8_SNAPSHOT_NATIVE_CODE_COUNTERS -DV8_CONCURRENT_MARKING -DV8_ENABLE_LAZY_SOURCE_POSITIONS -DV8_CHECK_MICROTASKS_SCOPES_CONSISTENCY -DV8_EMBEDDED_BUILTINS -DV8_WIN64_UNWINDING_INFO -DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH -DV8_SNAPSHOT_COMPRESSION -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DV8_IMMINENT_DEPRECATION_WARNINGS -DV8_TARGET_ARCH_X64 -DV8_HAVE_TARGET_OS -DV8_TARGET_OS_LINUX -DDEBUG -DDISABLE_UNTRUSTED_CODE_MITIGATIONS -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DV8_IMMINENT_DEPRECATION_WARNINGS -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DUCHAR_TYPE=uint16_t -I../.. -Igen -I../../include -Igen/include -I../.. -Igen -I../../third_party/icu/source/common -I../../third_party/icu/source/i18n -I../../include -I../../tools/debug_helper -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -funwind-tables -fPIC -pipe -B../../third_party/binutils/Linux_x64/Release/bin -pthread -m64 -march=x86-64 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -Wall -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-deprecated-declarations -Wno-comments -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-unused-parameter -fno-omit-frame-pointer -g2 -Wno-strict-overflow -Wno-return-type -Wcast-function-type -O3 -fno-ident -fdata-sections -ffunction-sections -fvisibility=default -std=gnu++14 -Wno-narrowing -Wno-class-memaccess -fno-exceptions -fno-rtti --sysroot=../../build/linux/debian_sid_amd64-sysroot -c ../../test/cctest/test-global-handles.cc -o obj/test/cctest/cctest_sources/test-global-handles.o

In file included from ../../include/v8-inspector.h:14,
                 from ../../src/execution/isolate.h:15,
                 from ../../src/api/api.h:10,
                 from ../../src/api/api-inl.h:8,
                 from ../../test/cctest/test-global-handles.cc:28:
../../include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = v8::Global<v8::Object>; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)]’:
../../test/cctest/test-global-handles.cc:292:47:   required from here
../../include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<v8::Global<v8::Object> >::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = v8::internal::{anonymous}::FlagAndGlobal; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>&)]’:
../../test/cctest/test-global-handles.cc:493:53:   required from here
../../include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]

Formatted for git commit message:

g++ -MMD -MF obj/test/cctest/cctest_sources/test-global-handles.o.d ...
In file included from ../../include/v8-inspector.h:14,
                 from ../../src/execution/isolate.h:15,
                 from ../../src/api/api.h:10,
                 from ../../src/api/api-inl.h:8,
                 from ../../test/cctest/test-global-handles.cc:28:
../../include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(
    P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType)
    [with P = v8::Global<v8::Object>; T = v8::Object;
    typename v8::WeakCallbackInfo<P>::Callback =
    void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)]’:
../../test/cctest/test-global-handles.cc:292:47:   required from here
../../include/v8.h:10750:16: warning: cast between incompatible function types
    from ‘v8::WeakCallbackInfo<v8::Global<v8::Object> >::Callback’
    {aka ‘void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)’}
    to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’}
    [-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This commit suggests adding a pragma specifically for GCC to suppress this warning.
The motivation for this is that there were quite a few of these warnings in the Node.js build, but those have been suppressed by adding a similar pragma around the include of v8.h [1].
In file included from persistent-obj.cc:8:
/home/danielbevenius/work/google/v8_src/v8/include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = Something; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<Something>&)]’:

persistent-obj.cc:57:38:   required from here
/home/danielbevenius/work/google/v8_src/v8/include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<Something>::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<Something>&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Currently, we have added a pragma to avoid this warning in Node.js, but we would like to add it in V8, closer to the actual code that is causing it. In Node we have to set the pragma on the header.

template <class T>
template <typename P>
V8_INLINE void PersistentBase<T>::SetWeak(
    P* parameter,
    typename WeakCallbackInfo<P>::Callback callback,
    WeakCallbackType type) {
  typedef typename WeakCallbackInfo<void>::Callback Callback;
  V8::MakeWeak(reinterpret_cast<internal::Address*>(this->val_), parameter,
               reinterpret_cast<Callback>(callback), type);
}


Notice the second parameter is typename WeakCallbackInfo<P>::Callback which is a typedef:

  typedef void (*Callback)(const WeakCallbackInfo<T>& data);


This declares Callback as a pointer to a function that takes a reference to a const WeakCallbackInfo<T> and returns void. So we could define one like this:

void WeakCallback(const v8::WeakCallbackInfo<Something>& data) {
  Something* obj = data.GetParameter();
  std::cout << "in make weak callback..." << '\n';
}


And then we try to cast it to:

typedef typename v8::WeakCallbackInfo<void>::Callback Callback;
Callback cb = reinterpret_cast<Callback>(WeakCallback);


This is done as V8::MakeWeak has the following signature:

void V8::MakeWeak(i::Address* location, void* parameter,
                  WeakCallbackInfo<void>::Callback weak_callback,
                  WeakCallbackType type) {
  i::GlobalHandles::MakeWeak(location, parameter, weak_callback, type);
}


### gdb warnings

warning: Could not find DWO CU obj/v8_compiler/common-node-cache.dwo(0x42b8adb87d74d56b) referenced by CU at offset 0x206f7 [in module /home/danielbevenius/work/google/learning-v8/hello-world]


This can be worked around by specifying the --cd argument to gdb:

$ gdb --cd=/home/danielbevenius/work/google/v8_src/v8/out/x64.release --args /home/danielbevenius/work/google/learning-v8/hello-world

### Building with g++

Update args.gn to include:

is_clang = false

Next I got the following error when trying to compile:

$ ninja -v -C out/x64.release/ obj/test/cctest/cctest_sources/test-global-handles.o
ux/debian_sid_amd64-sysroot -fexceptions -frtti -c ../../src/torque/instance-type-generator.cc -o obj/torque_base/instance-type-generator.o
In file included from /usr/include/c++/9/bits/stl_algobase.h:59,
from /usr/include/c++/9/memory:62,
from ../../src/torque/implementation-visitor.h:8,
from ../../src/torque/instance-type-generator.cc:5:
/usr/include/c++/9/x86_64-redhat-linux/bits/c++config.h:3:10: fatal error: bits/wordsize.h: No such file or directory
3 | #include <bits/wordsize.h>
|          ^~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

$ export CPATH=/usr/include

third_party/binutils/Linux_x64/Release/bin/ld.gold: error: cannot open /usr/lib64/libatomic.so.1.2.0: No such file or directory

$ sudo dnf install -y libatomic


I still got an error because of a warning, so I'm trying to build using:

treat_warnings_as_errors = false


Let's see how that works out. I also had to use GNU's linker by disabling gold:

use_gold = false


### CodeStubAssembler

The history of this is that JavaScript builtins used to be written in assembly, which gave very good performance but made porting V8 to different architectures more difficult, as these builtins needed a specific implementation for each supported architecture, so it did not scale very well. With the addition of features to the JavaScript specification, supporting a new feature meant having to implement it for all platforms, which made it difficult to keep up and deliver new features.

The goal is to have the performance of handcoded assembly without having to write it for every platform. So a portable assembly language was built on top of Turbofan's backend. This is an API that generates Turbofan's machine-level IR, and that IR can be used by Turbofan to produce very good machine code on all platforms. So one "only" has to implement a builtin once and it can be made available to all platforms; all that handwritten assembly no longer has to be maintained.

Just to be clear, CSA is a C++ API that is used to generate IR, which is then compiled into machine code for the target instruction set architecture.

### Torque

Torque is a DSL (domain-specific language) that avoids having to use the CodeStubAssembler directly (it is still used behind the scenes). The language is statically typed, garbage collected, and compatible with JavaScript.

The JavaScript standard library was implemented in V8 previously using hand written assembly. But as we mentioned in the previous section this did not scale.

It could have been written in JavaScript too, and I think this was done in the past, but that has some issues: builtins would need warmup time to become optimized, and there were also issues with monkey-patching and unintentionally exposing VM internals.

Is Torque run at build time? I'm thinking yes, as it has to generate the C++ code.

There is a main function in torque.cc which will be built into an executable

$ ./out/x64.release_gcc/torque --help
Unexpected command-line argument "--help", expected a .tq file.

The files that are processed by torque are defined in BUILD.gn in the torque_files section. There is also a template named run_torque. I've noticed that this template and others in GN use the script tools/run.py. This is apparently because GN can only execute scripts at the moment, and what this script does is use python to create a subprocess with the passed-in argument:

$ gn help action


And a template is a way to reuse code in GN.

There is a make target that shows what is generated by torque:

$ make torque-example

This will create a directory in the current directory named gen/torque-generated. Notice that this directory contains C++ headers and sources. It takes torque-example.tq as input. For this file the following header will be generated:

#ifndef V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_
#define V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_

#include "src/builtins/builtins-promise.h"
#include "src/compiler/code-assembler.h"
#include "src/codegen/code-stub-assembler.h"
#include "src/utils/utils.h"
#include "torque-generated/field-offsets-tq.h"
#include "torque-generated/csa-types-tq.h"

namespace v8 {
namespace internal {

void HelloWorld_0(compiler::CodeAssemblerState* state_);

}  // namespace internal
}  // namespace v8
#endif  // V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_

This is only to show the generated files and make it clear that Torque will generate these files, which will then be compiled during the V8 build. So, let's try copying torque-example.tq to the v8/src/builtins directory:

$ cp torque-example.tq ../v8_src/v8/src/builtins/


This is not enough to get it included in the build; we also have to update BUILD.gn and add this file to the torque_files list. After running the build we can see that a file named src/builtins/torque-example-tq-csa.h is generated, along with a .cc.

To understand how this works I'm going to use https://v8.dev/docs/torque-builtins as a starting point:

transitioning javascript builtin
MathIs42(js-implicit context: NativeContext, receiver: JSAny)(x: JSAny): Boolean {
  const number: Number = ToNumber_Inline(x);
  typeswitch (number) {
    case (smi: Smi): {
      return smi == 42 ? True : False;
    }
    case (heapNumber: HeapNumber): {
      return Convert<float64>(heapNumber) == 42 ? True : False;
    }
  }
}


This has been updated to work with the latest V8 version.

Next, we need to update src/init/bootstrapper.cc to add/install this function on the Math object:

  SimpleInstallFunction(isolate_, math, "is42", Builtins::kMathIs42, 1, true);


After this we need to rebuild v8:

$ env CPATH=/usr/include ninja -v -C out/x64.release_gcc
$ d8
d8> Math.is42(42)
true
d8> Math.is42(2)
false


Let's look at the generated code that Torque has produced in out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc; we can run it through the preprocessor using:

$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -E out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc > math.cc.pp

If we open math.cc.pp and search for Is42 we can find:

class MathIs42Assembler : public CodeStubAssembler {
 public:
  using Descriptor = Builtin_MathIs42_InterfaceDescriptor;
  explicit MathIs42Assembler(compiler::CodeAssemblerState* state)
      : CodeStubAssembler(state) {}
  void GenerateMathIs42Impl();
  Node* Parameter(Descriptor::ParameterIndices index) {
    return CodeAssembler::Parameter(static_cast<int>(index));
  }
};

void Builtins::Generate_MathIs42(compiler::CodeAssemblerState* state) {
  MathIs42Assembler assembler(state);
  state->SetInitialDebugInformation("MathIs42", "out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc", 2121);
  if (Builtins::KindOf(Builtins::kMathIs42) == Builtins::TFJ) {
    assembler.PerformStackCheck(assembler.GetJSContextParameter());
  }
  assembler.GenerateMathIs42Impl();
}

void MathIs42Assembler::GenerateMathIs42Impl() {
  ...

So this is what gets generated by the Torque compiler, and what we see above is a CodeStubAssembler class. If we take a look in out/x64.release_gcc/gen/torque-generated/builtin-definitions-tq.h we can find the following line that has been generated:

TFJ(MathIs42, 1, kReceiver, kX) \

There is a section about the TF_BUILTIN macro later, which will create function declarations, and function and class definitions. Now, in src/builtins/builtins.h we have the following macros:

class Builtins {
 public:
  enum Name : int32_t {
#define DEF_ENUM(Name, ...) k##Name,
    BUILTIN_LIST(DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM)
#undef DEF_ENUM
    ...
  }

#define DECLARE_TF(Name, ...) \
  static void Generate_##Name(compiler::CodeAssemblerState* state);
  BUILTIN_LIST(IGNORE_BUILTIN, DECLARE_TF, DECLARE_TF, DECLARE_TF, DECLARE_TF, IGNORE_BUILTIN, DECLARE_ASM)

And BUILTIN_LIST is declared in src/builtins/builtins-definitions.h, and this file includes:

#include "torque-generated/builtin-definitions-tq.h"

#define BUILTIN_LIST(CPP, TFJ, TFC, TFS, TFH, BCH, ASM)  \
  BUILTIN_LIST_BASE(CPP, TFJ, TFC, TFS, TFH, ASM)        \
  BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \
  BUILTIN_LIST_INTL(CPP, TFJ, TFS)                       \
  BUILTIN_LIST_BYTECODE_HANDLERS(BCH)

Notice BUILTIN_LIST_FROM_TORQUE; this is how our MathIs42 gets included from builtin-definitions-tq.h, which is in turn included by builtins.h. If we take a look at this header after it has gone through the preprocessor, we can see what has been generated for MathIs42:

$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -I./out/x64.release_gcc/gen/ -E src/builtins/builtins.h > builtins.h.pp


First, MathIs42 will become a member in the Name enum of the Builtins class:

class Builtins {
 public:
  enum Name : int32_t {
    ...
    kMathIs42,
  };

  static void Generate_MathIs42(compiler::CodeAssemblerState* state);


We should also take a look in src/builtins/builtins-descriptors.h, as BUILTIN_LIST is used there too; specific to our current example, a DEFINE_TFJ_INTERFACE_DESCRIPTOR macro is used:

BUILTIN_LIST(IGNORE_BUILTIN, DEFINE_TFJ_INTERFACE_DESCRIPTOR,
DEFINE_TFC_INTERFACE_DESCRIPTOR, DEFINE_TFS_INTERFACE_DESCRIPTOR,
DEFINE_TFH_INTERFACE_DESCRIPTOR, IGNORE_BUILTIN,
DEFINE_ASM_INTERFACE_DESCRIPTOR)

#define DEFINE_TFJ_INTERFACE_DESCRIPTOR(Name, Argc, ...)                \
struct Builtin_##Name##_InterfaceDescriptor {                         \
enum ParameterIndices {                                             \
kJSTarget = compiler::CodeAssembler::kTargetParameterIndex,       \
##__VA_ARGS__,                                                    \
kJSNewTarget,                                                     \
kJSActualArgumentsCount,                                          \
kContext,                                                         \
kParameterCount,                                                  \
};                                                                  \
};


So the above will generate the following code but this time for builtins.cc:

$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -I./out/x64.release_gcc/gen/ -E src/builtins/builtins.cc > builtins.cc.pp

struct Builtin_MathIs42_InterfaceDescriptor {
  enum ParameterIndices {
    kJSTarget = compiler::CodeAssembler::kTargetParameterIndex,
    kReceiver,
    kX,
    kJSNewTarget,
    kJSActualArgumentsCount,
    kContext,
    kParameterCount,
  };
};

const BuiltinMetadata builtin_metadata[] = {
  ...
  {"MathIs42", Builtins::TFJ, {1, 0}}
  ...
};

BuiltinMetadata is a struct defined in builtins.cc; in our case the name is passed, then the type, and the last struct specifies the number of parameters. The trailing 0 is unused as far as I can tell and is only there to make it different from the constructor that takes an Address parameter.

So, where is Generate_MathIs42 used?

void SetupIsolateDelegate::SetupBuiltinsInternal(Isolate* isolate) {
  Code code;
  ...
  code = BuildWithCodeStubAssemblerJS(isolate, index, &Builtins::Generate_MathIs42, 1, "MathIs42");
  AddBuiltin(builtins, index++, code);
  ...

BuildWithCodeStubAssemblerJS can be found in src/builtins/setup-builtins-internal.cc:

Code BuildWithCodeStubAssemblerJS(Isolate* isolate, int32_t builtin_index,
                                  CodeAssemblerGenerator generator, int argc,
                                  const char* name) {
  Zone zone(isolate->allocator(), ZONE_NAME);
  const int argc_with_recv = (argc == kDontAdaptArgumentsSentinel) ? 0 : argc + 1;
  compiler::CodeAssemblerState state(
      isolate, &zone, argc_with_recv, Code::BUILTIN, name,
      PoisoningMitigationLevel::kDontPoison, builtin_index);
  generator(&state);
  Handle<Code> code = compiler::CodeAssembler::GenerateCode(
      &state, BuiltinAssemblerOptions(isolate, builtin_index));
  return *code;

Let's add a conditional break point so that we can stop in this function when MathIs42 is passed in:

(gdb) br setup-builtins-internal.cc:161
(gdb) cond 1 ((int)strcmp(name, "MathIs42")) == 0

We can see that we first create a new CodeAssemblerState, which we saw previously is the type that the Generate_MathIs42 function takes. TODO: look into this class a little more. After this, generator will be called with the newly created state passed in:

(gdb) p generator
$8 = (v8::internal::(anonymous namespace)::CodeAssemblerGenerator) 0x5619fd61b66e <v8::internal::Builtins::Generate_MathIs42(v8::internal::compiler::CodeAssemblerState*)>


TODO: Take a closer look at generator and how that code works. After generator returns we will have the following calls:

generator(&state);
Handle<Code> code = compiler::CodeAssembler::GenerateCode(
    &state, BuiltinAssemblerOptions(isolate, builtin_index));
return *code;


The next thing that will happen is that the returned code will be added to the builtins by calling SetupIsolateDelegate::AddBuiltin:

void SetupIsolateDelegate::AddBuiltin(Builtins* builtins, int index, Code code) {
builtins->set_builtin(index, code);
}


set_builtin can be found in src/builtins/builtins.cc and looks like this:

void Builtins::set_builtin(int index, Code builtin) {
isolate_->heap()->set_builtin(index, builtin);
}


And Heap::set_builtin does:

void Heap::set_builtin(int index, Code builtin) {
  isolate()->builtins_table()[index] = builtin.ptr();
}


So this is how the builtins_table is populated.

And when is SetupBuiltinsInternal called?
It is called from SetupIsolateDelegate::SetupBuiltins, which is called from Isolate::Init.

Just to recap before I lose track of what is going on... We have math.tq, which is the Torque source file. This is parsed by the Torque compiler/parser, and it will generate C++ headers and source files, one of which will contain a CodeStubAssembler class for our MathIs42 function. It will also generate torque-generated/builtin-definitions-tq.h. After this has happened, the sources need to be compiled into object files. After that, if a snapshot is configured to be created, mksnapshot will create a new Isolate, and in that process the MathIs42 builtin will get added. Then a context will be created and saved. The snapshot can then be deserialized into an Isolate at some later point.

Alright, so we have seen what gets generated for the function MathIs42, but how does this get hooked up to enable us to call Math.is42(11)?

In bootstrapper.cc we can see a number of lines:

```c++
SimpleInstallFunction(isolate_, math, "trunc", Builtins::kMathTrunc, 1, true);
```


And we are going to add a line like the following:

```c++
SimpleInstallFunction(isolate_, math, "is42", Builtins::kMathIs42, 1, true);
```


The signature for SimpleInstallFunction looks like this:

```c++
V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
    Isolate* isolate, Handle<JSObject> base, const char* name,
    Builtins::Name call, int len, bool adapt,
    PropertyAttributes attrs = DONT_ENUM) {
  Handle<String> internalized_name =
      isolate->factory()->InternalizeUtf8String(name);
  Handle<JSFunction> fun =
      SimpleCreateFunction(isolate, internalized_name, call, len, adapt);
  JSObject::AddProperty(isolate, base, internalized_name, fun, attrs);
  return fun;
}
```


So we see that the function is added as a property to the Math object. Notice that we also have to add kMathIs42 to the Builtins class; it then becomes part of the builtins_table_ array which we went through above.

#### Transitioning/Transient

In Torque source files we can sometimes see types declared as transient, and functions that have a transitioning specifier. In V8, HeapObjects can change at runtime (I think an example of this would be deleting an element in an array, which would transition it to a different type of array, HoleyElementArray or something like that. TODO: verify and explain this). And a function that calls JavaScript, which can cause such a transition, is marked with transitioning.

#### Callables

These are like functions in JS/C++ but have some additional capabilities. There are several different types of callables:

macro callables

These correspond to generated CodeStubAssembler C++ that will be inlined at the call site.

builtin callables

These will become V8 builtins, with info added to builtin-definitions.h (via the include of torque-generated/builtin-definitions-tq.h). There is only one copy of each of these, and they will be called instead of being inlined as is the case with macros.

runtime callables

intrinsic callables

#### Explicit parameters

macros and builtins can have parameters. For example:

```
@export
macro HelloWorld1(msg: JSAny) {
  Print(msg);
}
```


And we can call this from another macro like this:

```
@export
macro HelloWorld() {
  HelloWorld1('Hello World');
}
```


#### Implicit parameters

In the previous section we showed explicit parameters but we can also have implicit parameters:

```
@export
macro HelloWorld2(implicit msg: JSAny)() {
  Print(msg);
}

@export
macro HelloWorld() {
  const msg = 'Hello implicit';
  HelloWorld2();
}
```


### Troubleshooting

Compilation error when including src/objects/objects-inl.h:

```console
/home/danielbevenius/work/google/v8_src/v8/src/objects/object-macros.h:263:14: error: no declaration matches ‘bool v8::internal::HeapObject::IsJSCollator() const’
```


Does this need i18n perhaps?

```console
$ gn args --list out/x64.release_gcc | grep i18n
v8_enable_i18n_support
```

Next, there was a linking error:

```console
/usr/bin/ld: /tmp/ccJOrUMl.o: in function `v8::internal::MaybeHandle<v8::internal::Object>::Check() const':
/home/danielbevenius/work/google/v8_src/v8/src/handles/maybe-handles.h:44: undefined reference to `V8_Fatal(char const*, ...)'
collect2: error: ld returned 1 exit status
```

V8_Fatal is referenced but not defined in v8_monolith.a:

```console
$ nm libv8_monolith.a | grep V8_Fatal | c++filt
...
U V8_Fatal(char const*, int, char const*, ...)
```


And I thought it might be defined in libv8_libbase.a, but it is the same there. Actually, I was looking at the wrong symbol; that was not from the logging.o object file. If we look at that one we find:

```console
v8_libbase/logging.o:
...
0000000000000000 T V8_Fatal(char const*, int, char const*, ...)
```


In out/x64.release/obj/logging.o we can find it defined:

```console
$ nm -C libv8_libbase.a | grep -A 50 logging.o | grep V8_Fatal
0000000000000000 T V8_Fatal(char const*, int, char const*, ...)
```

T means that the symbol is in the text section. So if the linker is able to find libv8_libbase.a it should be able to resolve this. So we need to make sure the linker can find the directory where the libraries are located (`-Wl,-L<dir>`) and also that it will include the library (`-Wl,-l<libname>`). With this in place I can see that the linker can open the archive:

```console
attempt to open /home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_libbase.so failed
attempt to open /home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_libbase.a succeeded
/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_libbase.a
```

But I'm still getting the same linking error. If we look closer at the error message we can see that it is maybe-handles.h that is complaining. Could it be that the order is incorrect when linking? libv8_libbase.a needs to come after libv8_monolith. Something I noticed is that even though the library libv8_libbase.a is found, it does not look like the linker actually reads the object files. I can see that it does this for libv8_monolith.a:

```console
(/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_monolith.a)common-node-cache.o
```

Hmm, actually looking at the signature of the function, it is V8_Fatal(char const*, ...) and not V8_Fatal(char const*, int, char const*, ...). For a debug build it will be:

```c++
void V8_Fatal(const char* file, int line, const char* format, ...);
```

And else:

```c++
void V8_Fatal(const char* format, ...);
```

So it looks like I need to set debug to false. With this, the V8_Fatal symbol in logging.o is:

```console
$ nm -C out/x64.release_gcc/obj/v8_libbase/logging.o | grep V8_Fatal
0000000000000000 T V8_Fatal(char const*, ...)
```


### V8 Build artifacts

What is actually built when you specify v8_monolithic? When this type is chosen the build cannot be a component build; there is an assert for this. In this case a static library is built:

```
if (v8_monolithic) {
  # A component build is not monolithic.
  assert(!is_component_build)

  # Using external startup data would produce separate files.
  assert(!v8_use_external_startup_data)
  v8_static_library("v8_monolith") {
    deps = [
      ":v8",
      ":v8_libbase",
      ":v8_libplatform",
      ":v8_libsampler",
      "//build/win:default_exe_manifest",
    ]

    configs = [ ":internal_config" ]
  }
}
```


Notice that v8_static_library is not a built-in GN function but a template that can be found in gni/v8.gni.

v8_static_library: when this is set to false, a source_set is used instead of creating a static library, meaning the object files are passed directly on the linker command line. This can speed up the build as the creation of the static libraries is skipped, but it does not really help when linking to V8 externally, as from this project.

is_component_build: This will compile targets declared as components as shared libraries. All the v8_components in BUILD.gn will be built as .so files in the output directory (not the obj directory, which is the case for static libraries).

So the only two options are v8_monolith or is_component_build, where an advantage of the latter might be being able to rebuild a single component instead of the whole monolith at times.

### wee8

libwee8 can be produced, which is a library that only supports WebAssembly and does not support JavaScript.

```console
$ ninja -C out/wee8 wee8
```

### V8 Internal Isolate

src/execution/isolate.h is where you can find the v8::internal::Isolate:

```c++
class V8_EXPORT_PRIVATE Isolate final : private HiddenFactory {
```

And HiddenFactory is just there to allow Isolate to inherit privately from Factory, which can be found in src/heap/factory.h.

### Startup Walk-through

This section will walk through the startup of V8 by using the hello-world example in this project:

```console
$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release_gcc/ lldb ./hello-world
(lldb) br s -n main
Breakpoint 1: where = hello-world`main + 25 at hello-world.cc:41:38, address = 0x0000000000402821
```

The first call of interest is:

```c++
V8::InitializeExternalStartupData(argv[0]);
```


This call will land in api.cc, which will just delegate the call to an internal function (internal namespace, that is). If you try to step into this function you will just land on the next line in hello_world. This is because we compiled V8 without external startup data, so this function will be empty:

```console
$ objdump -Cd out/x64.release_gcc/obj/v8_base_without_compiler/startup-data-util.o

Disassembly of section .text._ZN2v88internal37InitializeExternalStartupDataFromFileEPKc:

0000000000000000 <v8::internal::InitializeExternalStartupDataFromFile(char const*)>:
   0:   c3      retq
```

Next, we have:

```c++
std::unique_ptr<Platform> platform = platform::NewDefaultPlatform();
```

This will land in src/libplatform/default-platform.cc, which will create a new DefaultPlatform.

```c++
Isolate* isolate = Isolate::New(create_params);
```

This will call Allocate:

```c++
Isolate* isolate = Allocate();

Isolate* Isolate::Allocate() {
  return reinterpret_cast<Isolate*>(i::Isolate::New());
}
```

Remember that the internal Isolate can be found in src/execution/isolate.h. In src/execution/isolate.cc we find Isolate::New:

```c++
Isolate* Isolate::New(IsolateAllocationMode mode) {
  std::unique_ptr<IsolateAllocator> isolate_allocator =
      std::make_unique<IsolateAllocator>(mode);
  void* isolate_ptr = isolate_allocator->isolate_memory();
  Isolate* isolate = new (isolate_ptr) Isolate(std::move(isolate_allocator));
```

So we first create an IsolateAllocator instance, which will allocate memory for a single Isolate instance. This is then passed into the Isolate constructor. Notice the usage of new here: this is not a normal heap allocation but placement new. The default new operator has been deleted and an override provided that takes a void pointer, which is just returned:

```c++
void* operator new(size_t, void* ptr) { return ptr; }
void* operator new(size_t) = delete;
void operator delete(void*) = delete;
```

In this case it just returns the memory allocated by isolate_memory(). The reason for doing this is that using the new operator not only invokes operator new, but the compiler will also add a call to the type's constructor, passing in the address of the allocated memory.
```c++
Isolate::Isolate(std::unique_ptr<i::IsolateAllocator> isolate_allocator)
    : isolate_data_(this),
      isolate_allocator_(std::move(isolate_allocator)),
      id_(isolate_counter.fetch_add(1, std::memory_order_relaxed)),
      allocator_(FLAG_trace_zone_stats
                     ? new VerboseAccountingAllocator(&heap_, 256 * KB)
                     : new AccountingAllocator()),
      builtins_(this),
      rail_mode_(PERFORMANCE_ANIMATION),
      code_event_dispatcher_(new CodeEventDispatcher()),
      jitless_(FLAG_jitless),
#if V8_SFI_HAS_UNIQUE_ID
      next_unique_sfi_id_(0),
#endif
      cancelable_task_manager_(new CancelableTaskManager()) {
```

Notice that isolate_data_ will be populated by calling the constructor which takes a pointer to an Isolate:

```c++
class IsolateData final {
 public:
  explicit IsolateData(Isolate* isolate) : stack_guard_(isolate) {}
```

Back in Isolate's constructor we have:

```c++
#define ISOLATE_INIT_LIST(V)                          \
  /* Assembler state. */                              \
  V(FatalErrorCallback, exception_behavior, nullptr)  \
  ...

#define ISOLATE_INIT_EXECUTE(type, name, initial_value) \
  name##_ = (initial_value);
  ISOLATE_INIT_LIST(ISOLATE_INIT_EXECUTE)
#undef ISOLATE_INIT_EXECUTE
```

So let's expand the first entries to understand what is going on:

```c++
exception_behavior_ = (nullptr);
oom_behavior_ = (nullptr);
event_logger_ = (nullptr);
allow_code_gen_callback_ = (nullptr);
modify_code_gen_callback_ = (nullptr);
allow_wasm_code_gen_callback_ = (nullptr);
wasm_module_callback_ = (&NoExtension);
wasm_instance_callback_ = (&NoExtension);
wasm_streaming_callback_ = (nullptr);
wasm_threads_enabled_callback_ = (nullptr);
wasm_load_source_map_callback_ = (nullptr);
relocatable_top_ = (nullptr);
string_stream_debug_object_cache_ = (nullptr);
string_stream_current_security_token_ = (Object());
api_external_references_ = (nullptr);
external_reference_map_ = (nullptr);
root_index_map_ = (nullptr);
default_microtask_queue_ = (nullptr);
turbo_statistics_ = (nullptr);
code_tracer_ = (nullptr);
per_isolate_assert_data_ = (0xFFFFFFFFu);
promise_reject_callback_ = (nullptr);
snapshot_blob_ = (nullptr);
```
```c++
code_and_metadata_size_ = (0);
bytecode_and_metadata_size_ = (0);
external_script_source_size_ = (0);
is_profiling_ = (false);
num_cpu_profilers_ = (0);
formatting_stack_trace_ = (false);
debug_execution_mode_ = (DebugInfo::kBreakpoints);
code_coverage_mode_ = (debug::CoverageMode::kBestEffort);
type_profile_mode_ = (debug::TypeProfileMode::kNone);
last_stack_frame_info_id_ = (0);
last_console_context_id_ = (0);
inspector_ = (nullptr);
next_v8_call_is_safe_for_termination_ = (false);
only_terminate_in_safe_scope_ = (false);
detailed_source_positions_for_profiling_ = (FLAG_detailed_line_info);
embedder_wrapper_type_index_ = (-1);
embedder_wrapper_object_index_ = (-1);
```

So all of the entries in this list will become private members of the Isolate class after the preprocessor is finished. There will also be public accessors to get and set these values (generated from the same ISOLATE_INIT_LIST entries above). Back in the isolate.cc constructor we have:

```c++
#define ISOLATE_INIT_ARRAY_EXECUTE(type, name, length) \
  memset(name##_, 0, sizeof(type) * length);
  ISOLATE_INIT_ARRAY_LIST(ISOLATE_INIT_ARRAY_EXECUTE)
#undef ISOLATE_INIT_ARRAY_EXECUTE

#define ISOLATE_INIT_ARRAY_LIST(V)                                             \
  /* SerializerDeserializer state. */                                          \
  V(int32_t, jsregexp_static_offsets_vector, kJSRegexpStaticOffsetsVectorSize) \
  ...
```
```c++
InitializeDefaultEmbeddedBlob();
MicrotaskQueue::SetUpDefaultMicrotaskQueue(this);
```

After this we have created a new Isolate; we were in this function call:

```c++
Isolate* isolate = new (isolate_ptr) Isolate(std::move(isolate_allocator));
```

After this we will be back in api.cc:

```c++
Initialize(isolate, params);

void Isolate::Initialize(Isolate* isolate,
                         const v8::Isolate::CreateParams& params) {
```

We are not using any external snapshot data, so the following will be false:

```c++
if (params.snapshot_blob != nullptr) {
  i_isolate->set_snapshot_blob(params.snapshot_blob);
} else {
  i_isolate->set_snapshot_blob(i::Snapshot::DefaultSnapshotBlob());
```

```console
(gdb) p snapshot_blob_
$7 = (const v8::StartupData *) 0x0
(gdb) n
(gdb) p i_isolate->snapshot_blob_
$8 = (const v8::StartupData *) 0x7ff92d7d6cf0 <v8::internal::blob>
```

snapshot_blob_ is also one of the members that was set up with ISOLATE_INIT_LIST. So we are setting up the Isolate instance for creation:

```c++
Isolate::Scope isolate_scope(isolate);
if (!i::Snapshot::Initialize(i_isolate)) {
```

In src/snapshot/snapshot-common.cc we find:

```c++
bool Snapshot::Initialize(Isolate* isolate) {
  ...
  const v8::StartupData* blob = isolate->snapshot_blob();
  Vector<const byte> startup_data = ExtractStartupData(blob);
  Vector<const byte> read_only_data = ExtractReadOnlyData(blob);
  SnapshotData startup_snapshot_data(MaybeDecompress(startup_data));
  SnapshotData read_only_snapshot_data(MaybeDecompress(read_only_data));
  StartupDeserializer startup_deserializer(&startup_snapshot_data);
  ReadOnlyDeserializer read_only_deserializer(&read_only_snapshot_data);
  startup_deserializer.SetRehashability(ExtractRehashability(blob));
  read_only_deserializer.SetRehashability(ExtractRehashability(blob));

  bool success = isolate->InitWithSnapshot(&read_only_deserializer,
                                           &startup_deserializer);
```

So we get the blob and create deserializers for it, which are then passed to isolate->InitWithSnapshot, which delegates to Isolate::Init. The blob will have been created previously using mksnapshot (more on this can be found later).
This will use a FOR_EACH_ISOLATE_ADDRESS_NAME macro to assign to the isolate_addresses_ field:

```c++
isolate_addresses_[IsolateAddressId::kHandlerAddress] = reinterpret_cast<Address>(handler_address());
isolate_addresses_[IsolateAddressId::kCEntryFPAddress] = reinterpret_cast<Address>(c_entry_fp_address());
isolate_addresses_[IsolateAddressId::kCFunctionAddress] = reinterpret_cast<Address>(c_function_address());
isolate_addresses_[IsolateAddressId::kContextAddress] = reinterpret_cast<Address>(context_address());
isolate_addresses_[IsolateAddressId::kPendingExceptionAddress] = reinterpret_cast<Address>(pending_exception_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerContextAddress] = reinterpret_cast<Address>(pending_handler_context_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerEntrypointAddress] = reinterpret_cast<Address>(pending_handler_entrypoint_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerConstantPoolAddress] = reinterpret_cast<Address>(pending_handler_constant_pool_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerFPAddress] = reinterpret_cast<Address>(pending_handler_fp_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerSPAddress] = reinterpret_cast<Address>(pending_handler_sp_address());
isolate_addresses_[IsolateAddressId::kExternalCaughtExceptionAddress] = reinterpret_cast<Address>(external_caught_exception_address());
isolate_addresses_[IsolateAddressId::kJSEntrySPAddress] = reinterpret_cast<Address>(js_entry_sp_address());
```

After this we have a number of members that are assigned to:

```c++
compilation_cache_ = new CompilationCache(this);
descriptor_lookup_cache_ = new DescriptorLookupCache();
inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this);
global_handles_ = new GlobalHandles(this);
eternal_handles_ = new EternalHandles();
bootstrapper_ = new Bootstrapper(this);
handle_scope_implementer_ = new HandleScopeImplementer(this);
load_stub_cache_ = new StubCache(this);
store_stub_cache_ = new StubCache(this);
materialized_object_store_ = new MaterializedObjectStore(this);
regexp_stack_ = new RegExpStack();
regexp_stack_->isolate_ = this;
date_cache_ = new DateCache();
heap_profiler_ = new HeapProfiler(heap());
interpreter_ = new interpreter::Interpreter(this);
compiler_dispatcher_ = new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);
```

After this we have:

```c++
isolate_data_.external_reference_table()->Init(this);
```

This will land in src/codegen/external-reference-table.cc where we have:

```c++
void ExternalReferenceTable::Init(Isolate* isolate) {
  int index = 0;
  Add(kNullAddress, &index);
  AddReferences(isolate, &index);
  AddBuiltins(&index);
  AddRuntimeFunctions(&index);
  AddIsolateAddresses(isolate, &index);
  AddAccessors(&index);
  AddStubCache(isolate, &index);
  AddNativeCodeStatsCounters(isolate, &index);
  is_initialized_ = static_cast<uint32_t>(true);
  CHECK_EQ(kSize, index);
}

void ExternalReferenceTable::Add(Address address, int* index) {
  ref_addr_[(*index)++] = address;
}

Address ref_addr_[kSize];
```

Now, let's take a look at AddReferences:

```c++
Add(ExternalReference::abort_with_reason().address(), index);
```

What are ExternalReferences? They represent C++ addresses used in generated code. After that we have AddBuiltins:

```c++
static const Address c_builtins[] = {
    (reinterpret_cast<v8::internal::Address>(&Builtin_HandleApiCall)),
    ...

Address Builtin_HandleApiCall(int argc, Address* args, Isolate* isolate);
```

I can see that the function declaration is in external-reference.h, but the implementation is not there.
Instead this is defined in src/builtins/builtins-api.cc:

```c++
BUILTIN(HandleApiCall) {
```

This will expand to:

```c++
V8_WARN_UNUSED_RESULT static Object Builtin_Impl_HandleApiCall(
    BuiltinArguments args, Isolate* isolate);

V8_NOINLINE static Address Builtin_Impl_Stats_HandleApiCall(
    int args_length, Address* args_object, Isolate* isolate) {
  BuiltinArguments args(args_length, args_object);
  RuntimeCallTimerScope timer(isolate,
                              RuntimeCallCounterId::kBuiltin_HandleApiCall);
  TRACE_EVENT0(TRACE_DISABLED_BY_DEFAULT("v8.runtime"),
               "V8.Builtin_HandleApiCall");
  return CONVERT
}

V8_WARN_UNUSED_RESULT Address Builtin_HandleApiCall(
    int args_length, Address* args_object, Isolate* isolate) {
  DCHECK(isolate->context().is_null() || isolate->context().IsContext());
  if (V8_UNLIKELY(TracingFlags::is_runtime_stats_enabled())) {
    return Builtin_Impl_Stats_HandleApiCall(args_length, args_object, isolate);
  }
  BuiltinArguments args(args_length, args_object);
  return CONVERT_OBJECT(Builtin_Impl_HandleApiCall(args, isolate));
}

V8_WARN_UNUSED_RESULT static Object Builtin_Impl_HandleApiCall(
    BuiltinArguments args, Isolate* isolate) {
  HandleScope scope(isolate);
  Handle<JSFunction> function = args.target();
  Handle<Object> receiver = args.receiver();
  Handle<HeapObject> new_target = args.new_target();
  Handle<FunctionTemplateInfo> fun_data(function->shared().get_api_func_data(),
                                        isolate);
  if (new_target->IsJSReceiver()) {
    RETURN_RESULT_OR_FAILURE(
        isolate, HandleApiCallHelper<true>(isolate, function, new_target,
                                           fun_data, receiver, args));
  } else {
    RETURN_RESULT_OR_FAILURE(
        isolate, HandleApiCallHelper<false>(isolate, function, new_target,
                                            fun_data, receiver, args));
  }
}
```

The BUILTIN macro can be found in src/builtins/builtins-utils.h:

```c++
#define BUILTIN(name)                                      \
  V8_WARN_UNUSED_RESULT static Object Builtin_Impl_##name( \
      BuiltinArguments args, Isolate* isolate);
```

Back in Isolate::Init we then have:

```c++
if (setup_delegate_ == nullptr) {
  setup_delegate_ = new SetupIsolateDelegate(create_heap_objects);
}
if (!setup_delegate_->SetupHeap(&heap_)) {
```
```c++
  V8::FatalProcessOutOfMemory(this, "heap object creation");
  return false;
}
```

This does nothing in the current code path; the code comment says that the heap will be deserialized from the snapshot and true will be returned.

```c++
InitializeThreadLocal();
startup_deserializer->DeserializeInto(this);

DisallowHeapAllocation no_gc;
isolate->heap()->IterateSmiRoots(this);
isolate->heap()->IterateStrongRoots(this, VISIT_FOR_SERIALIZATION);
Iterate(isolate, this);
isolate->heap()->IterateWeakRoots(this, VISIT_FOR_SERIALIZATION);
DeserializeDeferredObjects();
RestoreExternalReferenceRedirectors(accessor_infos());
RestoreExternalReferenceRedirectors(call_handler_infos());
```

In heap.cc we find IterateSmiRoots, which takes a pointer to a RootVisitor. RootVisitor is used for visiting and (optionally) modifying the pointers contained in roots. This is used in garbage collection and also in serializing and deserializing snapshots.

### Roots

RootVisitor:

```c++
class RootVisitor {
 public:
  virtual void VisitRootPointers(Root root, const char* description,
                                 FullObjectSlot start, FullObjectSlot end) = 0;

  virtual void VisitRootPointer(Root root, const char* description,
                                FullObjectSlot p) {
    VisitRootPointers(root, description, p, p + 1);
  }

  static const char* RootName(Root root);
```

Root is an enum in src/objects/visitors.h. This enum is generated by a macro and expands to:

```c++
enum class Root {
  kStringTable,
  kExternalStringsTable,
  kReadOnlyRootList,
  kStrongRootList,
  kSmiRootList,
  kBootstrapper,
  kTop,
  kRelocatable,
  kDebug,
  kCompilationCache,
  kHandleScope,
  kBuiltins,
  kGlobalHandles,
  kEternalHandles,
  kThreadManager,
  kStrongRoots,
  kExtensions,
  kCodeFlusher,
  kPartialSnapshotCache,
  kReadOnlyObjectCache,
  kWeakCollections,
  kWrapperTracing,
  kUnknown,
  kNumberOfRoots
};
```

These can be displayed using:

```console
$ ./test/roots_test --gtest_filter=RootsTest.visitor_roots
```


Just to keep things clear for myself here, these visitor roots are only used for GC and serialization/deserialization (at least I think so) and should not be confused with the RootIndex enum in src/roots/roots.h.

Let's set a breakpoint in mksnapshot and see if we can find where one of the above Root enum elements is used, to make it a little more clear what these are used for.

```console
$ lldb ../v8_src/v8/out/x64.debug/mksnapshot
(lldb) target create "../v8_src/v8/out/x64.debug/mksnapshot"
Current executable set to '../v8_src/v8/out/x64.debug/mksnapshot' (x86_64).
(lldb) br s -n main
Breakpoint 1: where = mksnapshot`main + 42, address = 0x00000000009303ca
(lldb) r
```

What mksnapshot does is create a V8 environment (Platform, Isolate, Context) and then save it to a file — either a binary file on disk, or a .cc file that can be used in programs, in which case the binary is a byte array. It does this in much the same way as the hello-world example: create a platform and then initialize it, then create and initialize a new Isolate. After the Isolate, a new Context will be created using the Isolate. If there was an embedded-src flag passed to mksnapshot, that script will be run. StartupSerializer will use the Root enum elements, for example, and the deserializer will use the same enum elements. Adding a script to a snapshot:

```console
$ gdb ../v8_src/v8/out/x64.release_gcc/mksnapshot --embedded-src="$PWD/embed.js"
```

TODO: Look into CreateOffHeapTrampolines.

So the VisitRootPointers function takes one of these Roots and visits all those roots. In our case the first Root to be visited is in Heap::IterateSmiRoots:

```c++
void Heap::IterateSmiRoots(RootVisitor* v) {
  ExecutionAccess access(isolate());
  v->VisitRootPointers(Root::kSmiRootList, nullptr,
                       roots_table().smi_roots_begin(),
                       roots_table().smi_roots_end());
  v->Synchronize(VisitorSynchronization::kSmiRootList);
}
```

And here we can see that it is using Root::kSmiRootList, and passing nullptr for the description argument (I wonder what this is used for?). Next come the start and end arguments:

```console
(lldb) p roots_table().smi_roots_begin()
(v8::internal::FullObjectSlot) $5 = {
  v8::internal::SlotBase<v8::internal::FullObjectSlot, unsigned long, 8> = (ptr_ = 50680614097760)
}
```


We can list all the values of roots_table using:

```console
(lldb) expr -A -- roots_table()
```


In src/snapshot/deserializer.cc we can find VisitRootPointers:

```c++
void Deserializer::VisitRootPointers(Root root, const char* description,
                                     FullObjectSlot start, FullObjectSlot end)
```


Notice that description is never used. ReadData is in the same source file.

The class SnapshotByteSource has a data member that is initialized upon construction from a const char* or a Vector. Where is this done?
This was done back in Snapshot::Initialize:

```c++
const v8::StartupData* blob = isolate->snapshot_blob();
Vector<const byte> startup_data = ExtractStartupData(blob);
SnapshotData startup_snapshot_data(MaybeDecompress(startup_data));
StartupDeserializer startup_deserializer(&startup_snapshot_data);
```

```console
(lldb) expr *this
(v8::internal::SnapshotByteSource) $30 = (data_ = "\x04", length_ = 125752, position_ = 1)
```

All the roots in a heap are declared in src/roots/roots.h. You can access the roots using RootsTable via the Isolate, using isolate_data->roots() or isolate->roots_table. The roots_ field is an array of Address elements:

```c++
class RootsTable {
 public:
  static constexpr size_t kEntriesCount =
      static_cast<size_t>(RootIndex::kRootListLength);
  ...
 private:
  Address roots_[kEntriesCount];
  static const char* root_names_[kEntriesCount];
```

RootIndex is generated by a macro:

```c++
enum class RootIndex : uint16_t {
```

The complete enum can be displayed using:

```console
$ ./test/roots_test --gtest_filter=RootsTest.list_root_index
```


Let's take a look at an entry:

```console
(lldb) p roots_[(uint16_t)RootIndex::kError_string]
(v8::internal::Address) $1 = 42318447256121
```

Now, there are functions in Factory which can be used to retrieve these addresses, like factory->Error_string():

```console
(lldb) expr *isolate->factory()->Error_string()
(v8::internal::String) $9 = {
  v8::internal::TorqueGeneratedString<v8::internal::String, v8::internal::Name> = {
    v8::internal::Name = {
      v8::internal::TorqueGeneratedName<v8::internal::Name, v8::internal::PrimitiveHeapObject> = {
        v8::internal::PrimitiveHeapObject = {
          v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
            v8::internal::HeapObject = {
              v8::internal::Object = {
                v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 42318447256121)
              }
            }
          }
        }
      }
    }
  }
}
(lldb) expr $9.length()
(int32_t) $10 = 5
```
```console
(lldb) expr $9.Print()
#Error
```

These accessor function declarations are generated by the ROOT_LIST(ROOT_ACCESSOR) macros:

```c++
#define ROOT_ACCESSOR(Type, name, CamelName) inline Handle<Type> name();
  ROOT_LIST(ROOT_ACCESSOR)
#undef ROOT_ACCESSOR
```

The definitions can be found in src/heap/factory-inl.h, and the implementations look like this:

```c++
String ReadOnlyRoots::Error_string() const {
  return String::unchecked_cast(Object(at(RootIndex::kError_string)));
}

Handle<String> ReadOnlyRoots::Error_string_handle() const {
  return Handle<String>(&at(RootIndex::kError_string));
}
```

The unit test roots_test shows an example of this. That covers the usage of root entries, but where are the roots added to this array? roots_ is a member of IsolateData in src/execution/isolate-data.h:

```c++
RootsTable roots_;
```

We can inspect the roots_ content by using the internal Isolate:

```console
(lldb) f
frame #0: 0x00007ffff6261cdf libv8.so`v8::Isolate::Initialize(isolate=0x00000eb900000000, params=0x00007fffffffd0d0) at api.cc:8269:31
   8266 void Isolate::Initialize(Isolate* isolate,
   8267                          const v8::Isolate::CreateParams& params) {
(lldb) expr i_isolate->isolate_data_.roots_
(v8::internal::RootsTable) $5 = {
  roots_ = {
    [0] = 0
    [1] = 0
    [2] = 0
```

So we can see that the roots are initially zeroed out. And the type of roots_ is an array of Addresses.

```console
frame #3: 0x00007ffff6c33d58 libv8.so`v8::internal::Deserializer::VisitRootPointers(this=0x00007fffffffcce0, root=kReadOnlyRootList, description=0x0000000000000000, start=FullObjectSlot @ 0x00007fffffffc530, end=FullObjectSlot @ 0x00007fffffffc528) at deserializer.cc:94:11
frame #4: 0x00007ffff6b6212f libv8.so`v8::internal::ReadOnlyRoots::Iterate(this=0x00007fffffffc5c8, visitor=0x00007fffffffcce0) at roots.cc:21:29
frame #5: 0x00007ffff6c46fee libv8.so`v8::internal::ReadOnlyDeserializer::DeserializeInto(this=0x00007fffffffcce0, isolate=0x00000f7500000000) at read-only-deserializer.cc:41:18
frame #7: 0x00007ffff66af5de libv8.so`v8::internal::ReadOnlyHeap::SetUp(isolate=0x00000f7500000000, des=0x00007fffffffcce0) at read-only-heap.cc:78:53
```


This will land us in roots.cc ReadOnlyRoots::Iterate(RootVisitor* visitor):

```c++
void ReadOnlyRoots::Iterate(RootVisitor* visitor) {
}
```


Deserializer::VisitRootPointers calls Deserializer::ReadData, and the roots_ array is still zeroed out when we enter this function.

```c++
void Deserializer::VisitRootPointers(Root root, const char* description,
                                     FullObjectSlot start, FullObjectSlot end) {
```

Notice that we called VisitRootPointers and passed in Root::kReadOnlyRootList, nullptr (the description), and start and end addresses as FullObjectSlots. The signature of VisitRootPointers looks like this:

```c++
virtual void VisitRootPointers(Root root, const char* description,
                               FullObjectSlot start, FullObjectSlot end)
```


In our case we are using the address of read_only_roots_ from src/roots/roots.h, and the end is found by using the static member ReadOnlyRoots::kEntriesCount.

The switch statement in ReadData is generated by macros, so let's take a look at an expanded snippet to understand what is going on:

```c++
template <typename TSlot>
bool Deserializer::ReadData(TSlot current, TSlot limit,
                            SnapshotSpace source_space,
                            Address current_object_address) {
  Isolate* const isolate = isolate_;
  ...
  while (current < limit) {
    byte data = source_.Get();
```


So current is the start address of the read_only_list and limit the end. source_ is a member of ReadOnlyDeserializer and is of type SnapshotByteSource.

source_ got populated back in Snapshot::Initialize(internal_isolate):

```c++
const v8::StartupData* blob = isolate->snapshot_blob();
```


And ReadOnlyDeserializer extends Deserializer (src/snapshot/deserializer.h), which has a constructor that sets the source_ member to data->Payload(). So source_ will be an instance of SnapshotByteSource, which can be found in src/snapshot/snapshot-source-sink.h:

```c++
class SnapshotByteSource final {
 public:
  SnapshotByteSource(const char* data, int length)
      : data_(reinterpret_cast<const byte*>(data)),
        length_(length),
        position_(0) {}

  byte Get() {
    return data_[position_++];
  }
  ...
 private:
  const byte* data_;
  int length_;
  int position_;
```


Alright, so we are calling source_.Get(), which we can see returns the current entry from the byte array data_ and increments the position. So with that in mind, let's take a closer look at the switch statement:

```c++
while (current < limit) {
  byte data = source_.Get();
  switch (data) {
    case kNewObject + static_cast<int>(SnapshotSpace::kNew):
      break;
    case kNewObject + static_cast<int>(SnapshotSpace::kOld):
      [[clang::fallthrough]];
    case kNewObject + static_cast<int>(SnapshotSpace::kCode):
      [[clang::fallthrough]];
    case kNewObject + static_cast<int>(SnapshotSpace::kMap):
      [[clang::fallthrough]];
    ...
```


We can see that the switch statement will assign the result of ReadDataCase to the passed-in current:

  current = ReadDataCase<TSlot, kNewObject, SnapshotSpace::kNew>(isolate,


Notice that kNewObject is the type of SerializerDeserializer::Bytecode that is to be read (I think); this enum can be found in src/snapshot/serializer-common.h. TSlot I think stands for the "Type of Slot", which in our case is a FullMaybeObjectSlot.

HeapObject heap_object;
if (bytecode == kNewObject) {


ReadObject is also in deserializer.cc:

Address address = allocator()->Allocate(space, size);
isolate_->heap()->OnAllocationEvent(obj, size);

Alright, let's set a watchpoint on the roots_ array to see when the first entry
is populated and try to figure this out that way:
(lldb) watch set variable  isolate->isolate_data_.roots_.roots_[0]
Watchpoint created: Watchpoint 5: addr = 0xf7500000080 size = 8 state = enabled type = w
watchpoint spec = 'isolate->isolate_data_.roots_.roots_[0]'
new value: 0
(lldb) r

Watchpoint 5 hit:
old value: 0
new value: 16995320070433
Process 1687448 stopped
* thread #1, name = 'hello-world', stop reason = watchpoint 5
frame #0: 0x00007ffff664e5b1 libv8.so`v8::internal::FullMaybeObjectSlot::store(this=0x00007fffffffc3b0, value=MaybeObject @ 0x00007fffffffc370) const at slots-inl.h:74:1
71
72      void FullMaybeObjectSlot::store(MaybeObject value) const {
73        *location() = value.ptr();
-> 74      }
75


We can verify that location actually contains the address of roots_[0]:

(lldb) expr -f hex -- this->ptr_
(v8::internal::Address) $164 = 0x00000f7500000080
(lldb) expr -f hex -- &this->isolate_->isolate_data_.roots_.roots_[0]
(v8::internal::Address *) $171 = 0x00000f7500000080

(lldb) expr -f hex -- value.ptr()
(unsigned long) $184 = 0x00000f7508040121
(lldb) expr -f hex -- isolate_->isolate_data_.roots_.roots_[0]
(v8::internal::Address) $183 = 0x00000f7508040121


The first entry is free_space_map.

(lldb) expr v8::internal::Map::unchecked_cast(v8::internal::Object(value->ptr()))
(v8::internal::Map) $185 = { v8::internal::HeapObject = { v8::internal::Object = { v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 16995320070433) } }


Next, we will go through the while loop again:

(lldb) expr -f hex -- isolate_->isolate_data_.roots_.roots_[1]
(v8::internal::Address) $191 = 0x0000000000000000
(lldb) expr -f hex -- &isolate_->isolate_data_.roots_.roots_[1]
(v8::internal::Address *) $192 = 0x00000f7500000088
(lldb) expr -f hex -- location()
(v8::internal::SlotBase<v8::internal::FullMaybeObjectSlot, unsigned long, 8>::TData *) $194 = 0x00000f7500000088


Notice that in Deserializer::Write we have:

dest.store(value);
return dest + 1;


And its current value is:

(v8::internal::Address) $197 = 0x00000f7500000088


Which is the same address as roots_[1] that we just wrote to.

If we know the type that an Address points to, we can use Type::cast(Object obj) to cast it into that type. I think this works with all types.

(lldb) expr -A -f hex -- v8::internal::Oddball::cast(v8::internal::Object(isolate_->isolate_data_.roots_.roots_[4]))
(v8::internal::Oddball) $258 = {
v8::internal::TorqueGeneratedOddball<v8::internal::Oddball, v8::internal::PrimitiveHeapObject> = {
v8::internal::PrimitiveHeapObject = {
v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
v8::internal::HeapObject = {
v8::internal::Object = {
v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 0x00000f750804030d)
}
}
}
}
}
}


You can also just cast it to an object and try printing it:

(lldb) expr -A -f hex  -- v8::internal::Object(isolate_->isolate_data_.roots_.roots_[4]).Print()
#undefined


This is actually the Oddball UndefinedValue so it makes sense in this case I think. With this value in the roots_ array we can use the function ReadOnlyRoots::undefined_value():

(lldb) expr v8::internal::ReadOnlyRoots(&isolate_->heap_).undefined_value()
(v8::internal::Oddball) $265 = {
  v8::internal::TorqueGeneratedOddball<v8::internal::Oddball, v8::internal::PrimitiveHeapObject> = {
    v8::internal::PrimitiveHeapObject = {
      v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
        v8::internal::HeapObject = {
          v8::internal::Object = {
            v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 16995320070925)
          }
        }
      }
    }
  }
}


So how are these roots used, take the above undefined_value for example? Well, most things (perhaps all) that are needed go via the Factory, which the internal Isolate is a type of. In factory we can find:

Handle<Oddball> Factory::undefined_value() {
  return Handle<Oddball>(&isolate()->roots_table()[RootIndex::kUndefinedValue]);
}

Notice that this is basically what we did in the debugger before, but here it is wrapped in a Handle so that it can be tracked by the GC. The unit test isolate_test explores the internal isolate and has examples of usage of the above mentioned methods.

InitWithSnapshot will call Isolate::Init:

bool Isolate::Init(ReadOnlyDeserializer* read_only_deserializer,
                   StartupDeserializer* startup_deserializer) {
#define ASSIGN_ELEMENT(CamelName, hacker_name)                  \
  isolate_addresses_[IsolateAddressId::k##CamelName##Address] = \
      reinterpret_cast<Address>(hacker_name##_address());
  FOR_EACH_ISOLATE_ADDRESS_NAME(ASSIGN_ELEMENT)
#undef ASSIGN_ELEMENT

Address isolate_addresses_[kIsolateAddressCount + 1] = {};

(gdb) p isolate_addresses_
$16 = {0 <repeats 13 times>}


Let's take a look at the expanded code in Isolate::Init:

$ clang++ -I./out/x64.release/gen -I. -I./include -E src/execution/isolate.cc > output

isolate_addresses_[IsolateAddressId::kHandlerAddress] = reinterpret_cast<Address>(handler_address());
isolate_addresses_[IsolateAddressId::kCEntryFPAddress] = reinterpret_cast<Address>(c_entry_fp_address());
isolate_addresses_[IsolateAddressId::kCFunctionAddress] = reinterpret_cast<Address>(c_function_address());
isolate_addresses_[IsolateAddressId::kContextAddress] = reinterpret_cast<Address>(context_address());
isolate_addresses_[IsolateAddressId::kPendingExceptionAddress] = reinterpret_cast<Address>(pending_exception_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerContextAddress] = reinterpret_cast<Address>(pending_handler_context_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerEntrypointAddress] = reinterpret_cast<Address>(pending_handler_entrypoint_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerConstantPoolAddress] = reinterpret_cast<Address>(pending_handler_constant_pool_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerFPAddress] = reinterpret_cast<Address>(pending_handler_fp_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerSPAddress] = reinterpret_cast<Address>(pending_handler_sp_address());
isolate_addresses_[IsolateAddressId::kExternalCaughtExceptionAddress] = reinterpret_cast<Address>(external_caught_exception_address());
isolate_addresses_[IsolateAddressId::kJSEntrySPAddress] = reinterpret_cast<Address>(js_entry_sp_address());

Then functions, like handler_address(), are implemented as:

inline Address* handler_address() { return &thread_local_top()->handler_; }

(gdb) x/x isolate_addresses_[0]
0x1a3500003240: 0x00000000

At this point in the program we have only set the entries to contain the addresses specified in ThreadLocalTop. At the time these are initialized they will mostly be set to kNullAddress:

static const Address kNullAddress = 0;

And notice that the functions above return pointers, so later these pointers can be updated to point to something. What/when does this happen? Let's continue and find out...

Back in Isolate::Init we have:

compilation_cache_ = new CompilationCache(this);
descriptor_lookup_cache_ = new DescriptorLookupCache();
inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this);
global_handles_ = new GlobalHandles(this);
eternal_handles_ = new EternalHandles();
bootstrapper_ = new Bootstrapper(this);
handle_scope_implementer_ = new HandleScopeImplementer(this);
load_stub_cache_ = new StubCache(this);
store_stub_cache_ = new StubCache(this);
materialized_object_store_ = new MaterializedObjectStore(this);
regexp_stack_ = new RegExpStack();
regexp_stack_->isolate_ = this;
date_cache_ = new DateCache();
heap_profiler_ = new HeapProfiler(heap());
interpreter_ = new interpreter::Interpreter(this);
compiler_dispatcher_ = new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);
// SetUp the object heap.
DCHECK(!heap_.HasBeenSetUp());
heap_.SetUp();
...
InitializeThreadLocal();

Let's take a look at InitializeThreadLocal:

void Isolate::InitializeThreadLocal() {
  thread_local_top()->Initialize(this);
  clear_pending_exception();
  clear_pending_message();
  clear_scheduled_exception();
}

void Isolate::clear_pending_exception() {
  DCHECK(!thread_local_top()->pending_exception_.IsException(this));
  thread_local_top()->pending_exception_ = ReadOnlyRoots(this).the_hole_value();
}

ReadOnlyRoots:

#define ROOT_ACCESSOR(Type, name, CamelName)     \
  V8_INLINE class Type name() const;             \
  V8_INLINE Handle<Type> name##_handle() const;
  READ_ONLY_ROOT_LIST(ROOT_ACCESSOR)
#undef ROOT_ACCESSOR

This will expand to a number of function declarations that look like this:

$ clang++ -I./out/x64.release/gen -I. -I./include -E src/roots/roots.h > output

inline __attribute__((always_inline)) class Map free_space_map() const;
inline __attribute__((always_inline)) Handle<Map> free_space_map_handle() const;


The Map class is what all HeapObjects use to describe their structure. Notice that there is also a Handle accessor declared. These are generated by a macro in roots-inl.h:

Map ReadOnlyRoots::free_space_map() const {
  ((void) 0);
  return Map::unchecked_cast(Object(at(RootIndex::kFreeSpaceMap)));
}

Handle<Map> ReadOnlyRoots::free_space_map_handle() const {
  ((void) 0);
  return Handle<Map>(&at(RootIndex::kFreeSpaceMap));
}


Notice that this is using the RootIndex enum that was mentioned earlier:

  return Map::unchecked_cast(Object(at(RootIndex::kFreeSpaceMap)));


In object/map.h there is the following line:

  DECL_CAST(Map)


Which can be found in objects/object-macros.h:

#define DECL_CAST(Type)                                 \
V8_INLINE static Type cast(Object object);            \
V8_INLINE static Type unchecked_cast(Object object) { \
return bit_cast<Type>(object);                      \
}


This will expand to something like

  static Map cast(Object object);
static Map unchecked_cast(Object object) {
return bit_cast<Map>(object);
}


And the Object part is the Object constructor that takes an Address:

  explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}


That leaves the at function which is a private function in ReadOnlyRoots:

  V8_INLINE Address& at(RootIndex root_index) const;


So we are now back in Isolate::Init after the call to InitializeThreadLocal we have:

setup_delegate_->SetupBuiltins(this);


In the following line in api.cc, where does i::OBJECT_TEMPLATE_INFO_TYPE come from?

  i::Handle<i::Struct> struct_obj = isolate->factory()->NewStruct(
i::OBJECT_TEMPLATE_INFO_TYPE, i::AllocationType::kOld);


### InstanceType

The enum InstanceType is defined in src/objects/instance-type.h:

#include "torque-generated/instance-types-tq.h"

enum InstanceType : uint16_t {
...
#define MAKE_TORQUE_INSTANCE_TYPE(TYPE, value) TYPE = value,
TORQUE_ASSIGNED_INSTANCE_TYPES(MAKE_TORQUE_INSTANCE_TYPE)
#undef MAKE_TORQUE_INSTANCE_TYPE
...
};


And in gen/torque-generated/instance-types-tq.h we can find:

#define TORQUE_ASSIGNED_INSTANCE_TYPES(V) \
...
V(OBJECT_TEMPLATE_INFO_TYPE, 79) \
...


There is a list in src/objects/objects-definitions.h:

#define STRUCT_LIST_GENERATOR_BASE(V, _)                                      \
...
V(_, OBJECT_TEMPLATE_INFO_TYPE, ObjectTemplateInfo, object_template_info)   \
...

template <typename Impl>
Handle<Struct> FactoryBase<Impl>::NewStruct(InstanceType type,
AllocationType allocation) {


If we look in Map::GetInstanceTypeMap in map.cc we find:

  Map map;
switch (type) {
#define MAKE_CASE(TYPE, Name, name) \
case TYPE:                        \
map = roots.name##_map();       \
break;
STRUCT_LIST(MAKE_CASE)
#undef MAKE_CASE


Now, we know that our type is:

(gdb) p type
$1 = v8::internal::OBJECT_TEMPLATE_INFO_TYPE

So the case that matches is:

  map = roots.object_template_info_map(); \

And we can inspect the output of the preprocessor of roots.cc and find:

Map ReadOnlyRoots::object_template_info_map() const {
  ((void) 0);
  return Map::unchecked_cast(Object(at(RootIndex::kObjectTemplateInfoMap)));
}

And this is something we have seen before.

One thing I ran into was wanting to print the InstanceType using the overloaded << operator, which is defined for InstanceType in objects.cc:

std::ostream& operator<<(std::ostream& os, InstanceType instance_type) {
  switch (instance_type) {
#define WRITE_TYPE(TYPE) \
  case TYPE:             \
    return os << #TYPE;
    INSTANCE_TYPE_LIST(WRITE_TYPE)
#undef WRITE_TYPE
  }
  UNREACHABLE();
}

The code I'm using is the following:

i::InstanceType type = map.instance_type();
std::cout << "object_template_info_map type: " << type << '\n';

This will cause the UNREACHABLE() function to be called and a fatal error thrown. But note that the following line works:

std::cout << "object_template_info_map type: " << v8::internal::OBJECT_TEMPLATE_INFO_TYPE << '\n';

And prints:

object_template_info_map type: OBJECT_TEMPLATE_INFO_TYPE

In the switch/case block above the case for this value is:

case OBJECT_TEMPLATE_INFO_TYPE:
  return os << "OBJECT_TEMPLATE_INFO_TYPE";

When map.instance_type() is called, it returns a value of 1023, but the value of OBJECT_TEMPLATE_INFO_TYPE is:

OBJECT_TEMPLATE_INFO_TYPE = 79

And we can confirm this using:

std::cout << "object_template_info_map type: " << static_cast<uint16_t>(v8::internal::OBJECT_TEMPLATE_INFO_TYPE) << '\n';

Which will print:

object_template_info_map type: 79

### IsolateData

### Context creation

When we create a new context using:

Local<ObjectTemplate> global = ObjectTemplate::New(isolate_);
Local<Context> context = Context::New(isolate_, nullptr, global);

The Context class in include/v8.h declares New as follows:

static Local<Context> New(
    Isolate* isolate, ExtensionConfiguration* extensions = nullptr,
    MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(),
    MaybeLocal<Value> global_object = MaybeLocal<Value>(),
    DeserializeInternalFieldsCallback internal_fields_deserializer =
        DeserializeInternalFieldsCallback(),
    MicrotaskQueue* microtask_queue = nullptr);

When we step into Context::New(isolate_, nullptr, global), this will first break in the constructor of DeserializeInternalFieldsCallback in v8.h, which has default values for the callback function and data_args (both are nullptr). After that gdb will break in MaybeLocal, setting val_ to nullptr. Next it will break in Local::operator* for the value of global, which is then passed to the MaybeLocal<v8::ObjectTemplate> constructor. After those break points the break point will be in api.cc and v8::Context::New. New will call NewContext in api.cc. There will be some checks and logging/tracing, and then a call to CreateEnvironment:

i::Handle<i::Context> env = CreateEnvironment<i::Context>(
    isolate, extensions, global_template, global_object,
    context_snapshot_index, embedder_fields_deserializer, microtask_queue);

The first line in CreateEnvironment is:

ENTER_V8_FOR_NEW_CONTEXT(isolate);

Which is a macro defined in api.cc:

i::VMState<v8::OTHER> __state__((isolate)); \
i::DisallowExceptions __no_exceptions__((isolate))

So the first break point will be in execution/vm-state-inl.h and VMState's constructor:

template <StateTag Tag>
VMState<Tag>::VMState(Isolate* isolate)
    : isolate_(isolate), previous_tag_(isolate->current_vm_state()) {
  isolate_->set_current_vm_state(Tag);
}

In gdb you'll see this:

(gdb) s
v8::internal::VMState<(v8::StateTag)5>::VMState (isolate=0x372500000000, this=<synthetic pointer>) at ../../src/api/api.cc:6005
6005        context_snapshot_index, embedder_fields_deserializer, microtask_queue);
(gdb) s
v8::internal::Isolate::current_vm_state (this=0x372500000000) at ../../src/execution/isolate.h:1072
1072      THREAD_LOCAL_TOP_ACCESSOR(StateTag, current_vm_state)

Notice that VMState's constructor sets its previous_tag_ to isolate->current_vm_state(), which is generated by the macro THREAD_LOCAL_TOP_ACCESSOR.

The next break point will be:

#0 v8::internal::PerIsolateAssertScopeDebugOnly<(v8::internal::PerIsolateAssertType)5, false>::PerIsolateAssertScopeDebugOnly (
    isolate=0x372500000000, this=0x7ffc7b51b500) at ../../src/common/assert-scope.h:107
107     explicit PerIsolateAssertScopeDebugOnly(Isolate* isolate)

We can find that DisallowExceptions is defined in src/common/assert-scope.h as:

using DisallowExceptions = PerIsolateAssertScopeDebugOnly<NO_EXCEPTION_ASSERT, false>;

After all that we can start to look at the code in CreateEnvironment:

// Create the environment.
InvokeBootstrapper<ObjectType> invoke;
result = invoke.Invoke(isolate, maybe_proxy, proxy_template, extensions,
                       context_snapshot_index, embedder_fields_deserializer,
                       microtask_queue);

template <typename ObjectType>
struct InvokeBootstrapper;

template <>
struct InvokeBootstrapper<i::Context> {
  i::Handle<i::Context> Invoke(
      i::Isolate* isolate, i::MaybeHandle<i::JSGlobalProxy> maybe_global_proxy,
      v8::Local<v8::ObjectTemplate> global_proxy_template,
      v8::ExtensionConfiguration* extensions, size_t context_snapshot_index,
      v8::DeserializeInternalFieldsCallback embedder_fields_deserializer,
      v8::MicrotaskQueue* microtask_queue) {
    return isolate->bootstrapper()->CreateEnvironment(
        maybe_global_proxy, global_proxy_template, extensions,
        context_snapshot_index, embedder_fields_deserializer, microtask_queue);
  }
};

Bootstrapper can be found in src/init/bootstrapper.cc:

HandleScope scope(isolate_);
Handle<Context> env;
{
  Genesis genesis(isolate_, maybe_global_proxy, global_proxy_template,
                  context_snapshot_index, embedder_fields_deserializer,
                  microtask_queue);
  env = genesis.result();
  if (env.is_null() || !InstallExtensions(env, extensions)) {
    return Handle<Context>();
  }
}

Notice that the break point will be in the HandleScope constructor.
Then a new instance of Genesis is created, which performs some actions in its constructor:

global_proxy = isolate->factory()->NewUninitializedJSGlobalProxy(instance_size);

This will land in factory.cc:

Handle<Map> map = NewMap(JS_GLOBAL_PROXY_TYPE, size);

size will be 16 in this case. NewMap is declared in factory.h, which has default values for its parameters:

Handle<Map> NewMap(InstanceType type, int instance_size,
                   ElementsKind elements_kind = TERMINAL_FAST_ELEMENTS_KIND,
                   int inobject_properties = 0);

In Factory::InitializeMap we have the following check:

DCHECK_EQ(map.GetInObjectProperties(), inobject_properties);

Remember that I called Context::New with the following arguments:

Local<ObjectTemplate> global = ObjectTemplate::New(isolate_);
Local<Context> context = Context::New(isolate_, nullptr, global);

### VMState

### TaggedImpl

Has a single private member which is declared as:

StorageType ptr_;

An instance can be created using:

i::TaggedImpl<i::HeapObjectReferenceType::STRONG, i::Address> tagged{};

The storage type can also be Tagged_t, which is defined in globals.h:

using Tagged_t = uint32_t;

It looks like it can be a different value when using pointer compression.

### Object (internal)

This class extends TaggedImpl:

class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {

An Object can be created using the default constructor, or by passing in an Address, which will delegate to TaggedImpl's constructors. Object itself does not have any members (apart from ptr_, which is inherited from TaggedImpl that is).
So if we create an Object on the stack it is like a pointer/reference to an object:

+------+
|Object|
|------|
|ptr_  |----> +------+

Now, ptr_ is a TaggedImpl, so it could be a Smi, in which case it just contains the value directly, for example a small integer:

+------+
|Object|
|------|
| 18   |
+------+

### Handle

A Handle is similar to an Object and an ObjectSlot in that it also contains an Address member (called location_ and declared in HandleBase), but with the difference that Handles can be relocated by the garbage collector.

### HeapObject

### NewContext

When we create a new context using:

const v8::Local<v8::ObjectTemplate> obt = v8::Local<v8::ObjectTemplate>();
v8::Handle<v8::Context> context = v8::Context::New(isolate_, nullptr, obt);

The above is using the static function New declared in include/v8.h:

static Local<Context> New(
    Isolate* isolate, ExtensionConfiguration* extensions = nullptr,
    MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(),
    MaybeLocal<Value> global_object = MaybeLocal<Value>(),
    DeserializeInternalFieldsCallback internal_fields_deserializer =
        DeserializeInternalFieldsCallback(),
    MicrotaskQueue* microtask_queue = nullptr);

The implementation for this function can be found in src/api/api.cc.

How does a Local become a MaybeLocal in the above case? This is because MaybeLocal has a constructor that takes a Local<S>, and this will be cast into the val_ member of the MaybeLocal instance.

### Genesis

TODO

### What is the difference between a Local and a Handle?

Currently, the torque generator will generate Print functions that look like the following:

template <>
void TorqueGeneratedEnumCache<EnumCache, Struct>::EnumCachePrint(std::ostream& os) {
  this->PrintHeader(os, "TorqueGeneratedEnumCache");
  os << "\n - keys: " << Brief(this->keys());
  os << "\n - indices: " << Brief(this->indices());
  os << "\n";
}

Notice the last line, where the newline character is printed as a string. This could just be a char instead: '\n'.
There are a number of things that need to happen only once upon startup for each process. These things are placed in V8::InitializeOncePerProcessImpl, which can be found in src/init/v8.cc. This is called by v8::V8::Initialize().

CpuFeatures::Probe(false);
ElementsAccessor::InitializeOncePerProcess();
Bootstrapper::InitializeOncePerProcess();
CallDescriptors::InitializeOncePerProcess();
wasm::WasmEngine::InitializeOncePerProcess();

ElementsAccessor populates the accessor_array with the Elements listed in ELEMENTS_LIST. TODO: take a closer look at Elements.

v8::Isolate::Initialize will set up the heap:

i_isolate->heap()->ConfigureHeap(params.constraints);

It is when we create a new Context that Genesis is created. This will call Snapshot::NewContextFromSnapshot, so the context is read from the StartupData* blob with ExtractContextData(blob).

What is the global proxy?

### Builtins runtime error

Builtins is a member of Isolate, and an instance is created by the Isolate constructor. We can inspect the value of initialized_ and see that it is false:

(gdb) p *this->builtins()
$3 = {static kNoBuiltinId = -1, static kFirstWideBytecodeHandler = 1248, static kFirstExtraWideBytecodeHandler = 1398,
  static kLastBytecodeHandlerPlusOne = 1548, static kAllBuiltinsAreIsolateIndependent = true, isolate_ = 0x0, initialized_ = false,
  js_entry_handler_offset_ = 0}


The above is printed from Isolate's constructor, and it is not changed in the constructor.

This is very strange; while I thought that initialized_ was being updated, it now looks like there might be two instances: one which has this value as false and the other as true. Also, one has a nullptr as the isolate and the other an actual value. For example, when I run the hello-world example:

(gdb) p builtins()
$4 = (v8::internal::Builtins *) 0x33b20000a248
(gdb) p &builtins_
$5 = (v8::internal::Builtins *) 0x33b20000a248


Notice that these are pointing to the same location in memory. But in the test:

(gdb) p &builtins_
$1 = (v8::internal::Builtins *) 0x25210000a248
(gdb) p builtins()
$2 = (v8::internal::Builtins *) 0x25210000a228


Alright, so after looking into this closer I noticed that I was including internal headers in the test itself. When I include src/builtins/builtins.h I get an implementation of isolate->builtins() in an object file that is part of the cctest, while the field it reads lives in an object file that is part of the shared library libv8.so. So the method the test calls is a different one, not the method in the libv8_v8.so shared library.

As I'm only interested in exploring V8 internals, and my goal is only for each unit test to verify my understanding, I've statically linked the object files needed, like builtins.o and code.o, to the test.

 Fatal error in ../../src/snapshot/read-only-deserializer.cc, line 35
# Debug check failed: !isolate->builtins()->is_initialized().
#
#
#
#FailureMessage Object: 0x7ffed92ceb20
==== C stack trace ===============================

/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/libv8_libbase.so(V8_Fatal(char const*, int, char const*, ...)+0x172) [0x7fabe6c2416d]
/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/libv8_libbase.so(V8_Dcheck(char const*, int, char const*)+0x2d) [0x7fabe6c241b1]
./test/builtins_test() [0x4135a2]
./test/builtins_test() [0x43a1b7]
./test/builtins_test() [0x434c99]
./test/builtins_test() [0x41a3a7]
./test/builtins_test() [0x41aafb]
./test/builtins_test() [0x41b085]
./test/builtins_test() [0x4238e0]
./test/builtins_test() [0x43b1aa]
./test/builtins_test() [0x435773]
./test/builtins_test() [0x422836]
./test/builtins_test() [0x412ea4]
./test/builtins_test() [0x412e3d]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fabe66b31a3]
./test/builtins_test() [0x412d5e]
Illegal instruction (core dumped)


The issue here is that I'm including the header in the test, which means that code will be in the object code of the test, while the implementation part will be in the linked dynamic library, which is why these are pointing to different areas in memory. The one retrieved by the function call will use the

### Goma

I've seen goma referenced in a number of places, so just making a note of what it is here: Goma is Google's internal distributed compile service.

### WebAssembly

This section is going to take a closer look at how wasm works in V8.

We can use a wasm module like this:

  const buffer = fixtures.readSync('add.wasm');
const module = new WebAssembly.Module(buffer);
const instance = new WebAssembly.Instance(module);


Where is the WebAssembly object set up? We have seen previously that objects and functions are added in src/init/bootstrapper.cc, and for wasm there is a function named Genesis::InstallSpecialObjects which calls:

  WasmJs::Install(isolate, true);


This call will land in src/wasm/wasm-js.cc where we can find:

void WasmJs::Install(Isolate* isolate, bool exposed_on_global_object) {
...
Handle<String> name = v8_str(isolate, "WebAssembly");
...
NewFunctionArgs args = NewFunctionArgs::ForFunctionWithoutCode(
name, isolate->strict_function_map(), LanguageMode::kStrict);
Handle<JSFunction> cons = factory->NewFunction(args);
JSFunction::SetPrototype(cons, isolate->initial_object_prototype());
Handle<JSObject> webassembly =
factory->NewJSObject(cons, AllocationType::kOld);
JSObject::AddProperty(isolate, webassembly, factory->to_string_tag_symbol(),
                      name, ro_attributes);

InstallFunc(isolate, webassembly, "compile", WebAssemblyCompile, 1);
InstallFunc(isolate, webassembly, "validate", WebAssemblyValidate, 1);
InstallFunc(isolate, webassembly, "instantiate", WebAssemblyInstantiate, 1);
...
Handle<JSFunction> module_constructor =
InstallConstructorFunc(isolate, webassembly, "Module", WebAssemblyModule);
...
}


And all the rest of the functions that are available on the WebAssembly object are setup in the same function.

(lldb) br s -name Genesis::InstallSpecialObjects


Now, lets also set a break point in WebAssemblyModule:

(lldb) br s -n WebAssemblyModule
(lldb) r

  v8::Isolate* isolate = args.GetIsolate();
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
if (i_isolate->wasm_module_callback()(args)) return;


Notice the wasm_module_callback() function which is a function that is setup on the internal Isolate in src/execution/isolate.h:

#define ISOLATE_INIT_LIST(V)                                                   \
...
V(ExtensionCallback, wasm_module_callback, &NoExtension)                     \
V(ExtensionCallback, wasm_instance_callback, &NoExtension)                   \
V(WasmStreamingCallback, wasm_streaming_callback, nullptr)                   \

#define GLOBAL_ACCESSOR(type, name, initialvalue)                \
inline type name() const {                                     \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
return name##_;                                              \
}                                                              \
inline void set_##name(type value) {                           \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
name##_ = value;                                             \
}
ISOLATE_INIT_LIST(GLOBAL_ACCESSOR)
#undef GLOBAL_ACCESSOR


So this would be expanded by the preprocessor into:

inline ExtensionCallback wasm_module_callback() const {
((void) 0);
return wasm_module_callback_;
}
inline void set_wasm_module_callback(ExtensionCallback value) {
((void) 0);
wasm_module_callback_ = value;
}


Also notice that if wasm_module_callback() returns true, the WebAssemblyModule function will return and no further processing of the instructions in that function will be done. NoExtension is a function that looks like this:

bool NoExtension(const v8::FunctionCallbackInfo<v8::Value>&) { return false; }


And is set as the default function for module/instance callbacks.

Looking a little further we can see checks for WASM Threads support (TODO: take a look at this). And then we have:

  module_obj = i_isolate->wasm_engine()->SyncCompile(
i_isolate, enabled_features, &thrower, bytes);


SyncCompile can be found in src/wasm/wasm-engine.cc and will call DecodeWasmModule which can be found in src/wasm/module-decoder.cc.

ModuleResult result = DecodeWasmModule(enabled, bytes.start(), bytes.end(),
false, kWasmOrigin,
isolate->counters(), allocator());

ModuleResult DecodeWasmModule(const WasmFeatures& enabled,
const byte* module_start, const byte* module_end,
bool verify_functions, ModuleOrigin origin,
Counters* counters,
AccountingAllocator* allocator) {
...
ModuleDecoderImpl decoder(enabled, module_start, module_end, origin);
return decoder.DecodeModule(counters, allocator, verify_functions);


  uint32_t magic_word = consume_u32("wasm magic");


This will land in consume_little_endian(name) in src/wasm/decoder.h:




A wasm module has the following preamble:

magic nr: 0x6d736100
version: 0x1


These can be found as a constant in src/wasm/wasm-constants.h:

constexpr uint32_t kWasmMagic = 0x6d736100;
constexpr uint32_t kWasmVersion = 0x01;


After DecodeModuleHeader the code will iterate over the sections (type, import, function, table, memory, global, export, start, element, code, data, custom). For each section, DecodeSection will be called:

DecodeSection(section_iter.section_code(), section_iter.payload(),
offset, verify_functions);


There is an enum named SectionCode in src/wasm/wasm-constants.h which contains the various sections, and it is used in a switch statement in DecodeSection. Depending on the section_code, different DecodeSection methods will be called. In our case section_code is:

(lldb) expr section_code
(v8::internal::wasm::SectionCode) $5 = kTypeSectionCode


And this will match the kTypeSectionCode and DecodeTypeSection will be called.

ValueType can be found in src/wasm/value-type.h and there are types for each of the currently supported types:

constexpr ValueType kWasmI32 = ValueType(ValueType::kI32);
constexpr ValueType kWasmI64 = ValueType(ValueType::kI64);
constexpr ValueType kWasmF32 = ValueType(ValueType::kF32);
constexpr ValueType kWasmF64 = ValueType(ValueType::kF64);
constexpr ValueType kWasmAnyRef = ValueType(ValueType::kAnyRef);
constexpr ValueType kWasmExnRef = ValueType(ValueType::kExnRef);
constexpr ValueType kWasmFuncRef = ValueType(ValueType::kFuncRef);
constexpr ValueType kWasmNullRef = ValueType(ValueType::kNullRef);
constexpr ValueType kWasmS128 = ValueType(ValueType::kS128);
constexpr ValueType kWasmStmt = ValueType(ValueType::kStmt);
constexpr ValueType kWasmBottom = ValueType(ValueType::kBottom);


FunctionSig is declared with a using statement in value-type.h:

using FunctionSig = Signature<ValueType>;


We can find Signature in src/codegen/signature.h:

template <typename T>
class Signature : public ZoneObject {
 public:
  constexpr Signature(size_t return_count, size_t parameter_count,
                      const T* reps)
      : return_count_(return_count),
        parameter_count_(parameter_count),
        reps_(reps) {}


The return count can be zero or one (or greater if multi-value return types are enabled). The parameter count also makes sense, but it was not clear to me what reps represents.

(lldb) fr v
(v8::internal::Signature<v8::internal::wasm::ValueType> *) this = 0x0000555555583950
(size_t) return_count = 1
(size_t) parameter_count = 2
(const v8::internal::wasm::ValueType *) reps = 0x0000555555583948


Before the call to Signature's constructor we have:

// FunctionSig stores the return types first.
ValueType* buffer = zone->NewArray<ValueType>(param_count + return_count);
uint32_t b = 0;
for (uint32_t i = 0; i < return_count; ++i) buffer[b++] = returns[i];
for (uint32_t i = 0; i < param_count; ++i) buffer[b++] = params[i];

return new (zone) FunctionSig(return_count, param_count, buffer);


So reps_ contains the return types (re?) followed by the param types (ps?).
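That layout can be demonstrated with a simplified stand-in for Signature<T>. V8's real class in src/codegen/signature.h provides similar accessors; the int "types" below are purely for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for v8::internal::Signature<T>: reps_ stores the
// return types first, followed by the parameter types, exactly as the
// buffer is filled in the decoder snippet above.
template <typename T>
class Signature {
 public:
  Signature(size_t return_count, size_t parameter_count, const T* reps)
      : return_count_(return_count),
        parameter_count_(parameter_count),
        reps_(reps) {}

  T GetReturn(size_t i = 0) const {
    assert(i < return_count_);
    return reps_[i];  // returns live at the front of reps_
  }
  T GetParam(size_t i) const {
    assert(i < parameter_count_);
    return reps_[return_count_ + i];  // params follow the returns
  }

 private:
  size_t return_count_;
  size_t parameter_count_;
  const T* reps_;
};
```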

After DecodeWasmModule has returned in SyncCompile we will have a ModuleResult. This will be compiled to a NativeModule:

ModuleResult result =
    DecodeWasmModule(enabled, bytes.start(), bytes.end(), false, kWasmOrigin,
                     isolate->counters(), allocator());
Handle<FixedArray> export_wrappers;
std::shared_ptr<NativeModule> native_module =
    CompileToNativeModule(isolate, enabled, thrower,
                          std::move(result).value(), bytes, &export_wrappers);


CompileToNativeModule can be found in module-compiler.cc

TODO: CompileNativeModule...

There is an example in wasm_test.cc.

### ExtensionCallback

Is a typedef defined in include/v8.h:

typedef bool (*ExtensionCallback)(const FunctionCallbackInfo<Value>&);
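This is a plain function-pointer typedef. A minimal illustration of the pattern, using a hypothetical FakeCallbackInfo struct as a stand-in for v8::FunctionCallbackInfo<v8::Value> (so the snippet compiles without V8 headers):

```cpp
#include <cassert>

// Stand-in for v8::FunctionCallbackInfo<v8::Value>; not a V8 type.
struct FakeCallbackInfo {
  int argc;
};

// Same shape as ExtensionCallback: a pointer to a function taking the
// callback info by const reference and returning bool.
typedef bool (*Callback)(const FakeCallbackInfo&);

bool my_callback(const FakeCallbackInfo& info) {
  return info.argc > 0;  // succeed only when arguments were passed
}

bool invoke(Callback cb, const FakeCallbackInfo& info) {
  return cb(info);  // call through the function pointer
}
```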


### JSEntry

TODO: This section should describe the functions calls below.

 * frame #0: 0x00007ffff79a52e4 libv8.so`v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&) [inlined] v8::FunctionCallbackInfo<v8::Value>::GetIsolate(this=0x00007fffffffc9a0) const at v8.h:11204:40
frame #1: 0x00007ffff79a52e4 libv8.so`v8::(anonymous namespace)::WebAssemblyModule(args=0x00007fffffffc9a0) at wasm-js.cc:638
frame #2: 0x00007ffff6fe9e92 libv8.so`v8::internal::FunctionCallbackArguments::Call(this=0x00007fffffffca40, handler=CallHandlerInfo @ 0x00007fffffffc998) at api-arguments-inl.h:158:3
frame #3: 0x00007ffff6fe7c42 libv8.so`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(isolate=<unavailable>, function=Handle<v8::internal::HeapObject> @ 0x00007fffffffca20, new_target=<unavailable>, fun_data=<unavailable>, receiver=<unavailable>, args=BuiltinArguments @ 0x00007fffffffcae0) at builtins-api.cc:111:36
frame #4: 0x00007ffff6fe67d4 libv8.so`v8::internal::Builtin_Impl_HandleApiCall(args=BuiltinArguments @ 0x00007fffffffcb20, isolate=0x00000f8700000000) at builtins-api.cc:137:5
frame #5: 0x00007ffff6fe6319 libv8.so`v8::internal::Builtin_HandleApiCall(args_length=6, args_object=0x00007fffffffcc10, isolate=0x00000f8700000000) at builtins-api.cc:129:1
frame #6: 0x00007ffff6b2c23f libv8.so`Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit + 63
frame #7: 0x00007ffff68fde25 libv8.so`Builtins_JSBuiltinsConstructStub + 101
frame #8: 0x00007ffff6daf46d libv8.so`Builtins_ConstructHandler + 1485
frame #9: 0x00007ffff690e1d5 libv8.so`Builtins_InterpreterEntryTrampoline + 213
frame #10: 0x00007ffff6904b5a libv8.so`Builtins_JSEntryTrampoline + 90
frame #11: 0x00007ffff6904938 libv8.so`Builtins_JSEntry + 120
frame #12: 0x00007ffff716ba0c libv8.so`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long**>::Call(this=<unavailable>, args=17072495001600, args=<unavailable>, args=17072631376141, args=17072630006049, args=<unavailable>, args=<unavailable>) at simulator.h:142:12
frame #13: 0x00007ffff716ba01 libv8.so`v8::internal::(anonymous namespace)::Invoke(isolate=<unavailable>, params=0x00007fffffffcf50)::InvokeParams const&) at execution.cc:367
frame #14: 0x00007ffff716aa10 libv8.so`v8::internal::Execution::Call(isolate=0x00000f8700000000, callable=<unavailable>, receiver=<unavailable>, argc=<unavailable>, argv=<unavailable>) at execution.cc:461:10


### CustomArguments

Subclasses of CustomArguments, like PropertyCallbackArguments and FunctionCallbackArguments, are used for setting up and accessing values on the stack. These subclasses also provide methods to call various things, like CallNamedSetter for PropertyCallbackArguments and Call for FunctionCallbackArguments.

#### FunctionCallbackArguments

class FunctionCallbackArguments
    : public CustomArguments<FunctionCallbackInfo<Value> > {
  FunctionCallbackArguments(internal::Isolate* isolate, internal::Object data,
                            internal::HeapObject callee,
                            internal::Object holder,
                            internal::HeapObject new_target,

This class is in the namespace v8::internal so I'm curious why the explicit namespace is used here?

#### BuiltinArguments

This class extends JavaScriptArguments

class BuiltinArguments : public JavaScriptArguments {
 public:
  BuiltinArguments(int length, Address* arguments)
      : Arguments(length, arguments) {

  static constexpr int kNewTargetOffset = 0;
  static constexpr int kTargetOffset = 1;
  static constexpr int kArgcOffset = 2;
  static constexpr int kPaddingOffset = 3;

  static constexpr int kNumExtraArgs = 4;
  static constexpr int kNumExtraArgsWithReceiver = 5;

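These constants describe where the extra slots sit at the start of the argument array. The following is a simplified model of that layout, using plain ints instead of tagged Object values; the receiver's position right after the four extra args is inferred from the constant names, so treat it as an assumption rather than V8's exact implementation:

```cpp
#include <cassert>

// Offsets mirroring the BuiltinArguments constants above.
constexpr int kNewTargetOffset = 0;
constexpr int kTargetOffset = 1;
constexpr int kArgcOffset = 2;
constexpr int kPaddingOffset = 3;
constexpr int kNumExtraArgs = 4;
constexpr int kNumExtraArgsWithReceiver = 5;

// Hypothetical model: the extra args occupy the first four slots, the
// receiver the fifth, and user arguments follow it.
struct BuiltinArgsModel {
  const int* base;  // first slot of the argument array

  int new_target() const { return base[kNewTargetOffset]; }
  int target() const { return base[kTargetOffset]; }
  int argc() const { return base[kArgcOffset]; }
  int receiver() const { return base[kNumExtraArgs]; }
  int arg(int i) const { return base[kNumExtraArgsWithReceiver + i]; }
};
```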

JavaScriptArguments is declared in src/common/globals.h:

using JavaScriptArguments = Arguments<ArgumentsType::kJS>;


Arguments can be found in src/execution/arguments.h and is templated with a type of ArgumentsType (in src/common/globals.h):

enum class ArgumentsType {
  kRuntime,
  kJS,
};


An instance of Arguments only has a length, which is the number of arguments, and an Address pointer which points to the first argument. The functions it provides allow for getting/setting specific arguments and handling various types (like Handle<S>, smi, etc). It also overloads operator[], allowing one to specify an index and get back an Object for that argument. In BuiltinArguments the constants specify the indexes and there are functions to get them:

inline Handle<Object> receiver() const;
inline Handle<JSFunction> target() const;
inline Handle<HeapObject> new_target() const;

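The length-plus-pointer design with operator[] can be sketched as follows. This is a minimal illustration of the idea only; V8's real class works on tagged Address slots rather than the ints used here:

```cpp
#include <cassert>

// Minimal sketch of the Arguments idea: a length plus a pointer to the
// first argument, with operator[] for indexed access.
class Arguments {
 public:
  Arguments(int length, int* arguments)
      : length_(length), arguments_(arguments) {}

  int length() const { return length_; }

  // Indexed access to a specific argument, with a bounds check.
  int& operator[](int index) {
    assert(index >= 0 && index < length_);
    return arguments_[index];
  }

 private:
  int length_;
  int* arguments_;
};
```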

### NativeContext

Can be found in src/objects/contexts.h and has the following definition:

class NativeContext : public Context {
 public:

  inline OSROptimizedCodeCache GetOSROptimizedCodeCache();
  void ResetErrorsThrown();
  void IncrementErrorsThrown();
  int GetErrorsThrown();


In src/parsing/parser.h we can find:

class V8_EXPORT_PRIVATE Parser : public NON_EXPORTED_BASE(ParserBase<Parser>) {
  ...
  enum CompletionKind {
    kNormalCompletion,
    kThrowCompletion,
    kAbruptCompletion
  };

But I can't find any usages of this enum?

#### Internal fields/methods

When you see something like [[Notation]] you can think of this as a field in an object that is not exposed to JavaScript user code but internal to the JavaScript engine. These can also be used for internal methods.

Author: Danbev
Source Code: https://github.com/danbev/learning-v8