
# Geodesy.jl: Work with Points Defined in Various Coordinate Systems

## Geodesy

Geodesy is a Julia package for working with points in various world and local coordinate systems. The primary feature of Geodesy is to define and perform coordinate transformations in a convenient and safe framework, leveraging the CoordinateTransformations package. Transformations are accurate and efficient and implemented in native Julia code (with many functions being ported from Charles Karney's GeographicLib C++ library), and some common geodetic datums are provided for convenience.

## Quick start

Let's define a 3D point by its latitude, longitude and altitude (LLA):

```julia
x_lla = LLA(-27.468937, 153.023628, 0.0) # City Hall, Brisbane, Australia
```

This can be converted to a Cartesian Earth-Centered-Earth-Fixed (ECEF) coordinate simply by calling the constructor

```julia
x_ecef = ECEF(x_lla, wgs84)
```

Here we have used the WGS-84 ellipsoid to calculate the transformation, but other datums such as `osgb36`, `nad27` and `grs80` are provided. All transformations use the CoordinateTransformations interface, and the above is short for

```julia
x_ecef = ECEFfromLLA(wgs84)(x_lla)
```

where `ECEFfromLLA` is a type inheriting from CoordinateTransformations' `Transformation`. (Similar names `XfromY` exist for each of the coordinate types.)

Often, points are measured or required in a local frame, such as the north-east-up coordinates with respect to a given origin. The `ENU` type represents points in this coordinate system and we may transform between ENU and globally referenced coordinates using `ENUfromLLA`, etc.

```julia
origin_lla = LLA(-27.468937, 153.023628, 0.0) # City Hall, Brisbane, Australia
point_lla = LLA(-27.465933, 153.025900, 0.0)  # Central Station, Brisbane, Australia

# Define the transformation and execute it
trans = ENUfromLLA(origin_lla, wgs84)
point_enu = trans(point_lla)

# Equivalently
point_enu = ENU(point_lla, origin_lla, wgs84)
```

Similarly, we could convert to UTM/UPS coordinates, and two types are provided for this - `UTM` stores 3D coordinates `x`, `y`, and `z` in an unspecified zone, while `UTMZ` includes the `zone` number and `hemisphere` bool (where `true` = northern, `false` = southern). To get the canonical zone for your coordinates, simply use:

```julia
x_utmz = UTMZ(x_lla, wgs84)
```

If you are transforming a large number of points to or from a given zone, it may be more efficient to define the transformation explicitly and use the lighter `UTM` storage type.

```julia
points_lla::Vector{LLA{Float64}}
utm_from_lla = UTMfromLLA(56, false, wgs84) # Zone 56-South
points_utm = map(utm_from_lla, points_lla)  # A new vector of UTM coordinates
```

Geodesy becomes particularly powerful when you chain together transformations. For example, you can define a single transformation from your data on disk in UTM coordinates to a local frame in ENU coordinates. Internally, this will perform UTM (+ zone) → LLA → ECEF → ENU via composing transformations with `∘` into a `ComposedTransformation`:

```julia
julia> origin = LLA(-27.468937, 153.023628, 0.0) # City Hall, Brisbane, Australia
LLA(lat=-27.468937°, lon=153.023628°, alt=0.0)

julia> trans = ENUfromUTMZ(origin, wgs84)
(ENUfromECEF(ECEF(-5.046925124630393e6, 2.5689157252069353e6, -2.924416653602336e6), lat=-27.468937°, lon=153.023628°) ∘ (ECEFfromLLA(wgs84) ∘ LLAfromUTMZ(wgs84)))
```

This transformation can then be composed with rotations and translations from CoordinateTransformations (or your own custom-defined `AbstractTransformation`) to define further reference frames. For example, in this way, a point measured by a scanner on a moving vehicle at a particular time may be globally georeferenced with a single call to the `Transformation`!

Finally, the Cartesian distance between world points can be calculated via automatic transformation to a Cartesian frame:

```julia
x_lla = LLA(-27.468937, 153.023628, 0.0) # City Hall, Brisbane, Australia
y_lla = LLA(-27.465933, 153.025900, 0.0) # Central Station, Brisbane, Australia
euclidean_distance(x_lla, y_lla)         # 401.54 meters
```

(assuming the `wgs84` datum, which can be configured via `euclidean_distance(x, y, datum)`).

## Basic Terminology

This section describes some terminology and concepts that are relevant to Geodesy.jl, attempting to define Geodesy-specific jargon where possible. For a longer, less technical discussion with more historical context, ICSM's Fundamentals of Mapping page is highly recommended.

### Coordinate Reference Systems and Spatial Reference Identifiers

A position on the Earth can be given by some numerical coordinate values, but those don't mean much without more information. The extra information is called the Coordinate Reference System or CRS (also known as a Spatial Reference System or SRS). A CRS tells you two main things:

• The measurement procedure: which real world objects were used to define the frame of reference or datum of the measurement?
• The coordinate system: how do coordinate numerical values relate to the reference frame defined by the datum?

The full specification of a CRS can be complex, so a short label called a Spatial Reference IDentifier or SRID is usually used instead. For example, EPSG:4326 is one way to refer to the 2D WGS84 latitude and longitude you'd get from a mobile phone GPS device. An SRID is of the form `AUTHORITY:CODE`, where the code is a number and the authority is the name of an organization maintaining a list of codes with associated CRS information. There are services where you can look up a CRS, for example, http://epsg.io is a convenient interface to the SRIDs maintained by the European Petroleum Survey Group (EPSG) authority. Likewise, http://spatialreference.org is an open registry to which anyone can contribute.

When maintaining a spatial database, it's typical to define an internal list of SRIDs (effectively making your organization the authority), and a mapping from these to CRS information. A link back to a definitive SRID from an external authority should also be included where possible.

### Datums

In spatial measurement and positioning, a datum is a set of reference objects with given coordinates, relative to which other objects may be positioned. For example, in traditional surveying a datum might comprise a pair of pegs in the ground, separated by a carefully measured distance. When surveying the position of an unknown but nearby point, the angle back to the original datum objects can be measured using a theodolite. After this, the relative position of the new point can be computed using simple triangulation. Repeating this trick with any of the now three known points, an entire triangulation network of surveyed objects can be extended outward. Any point surveyed relative to the network is said to be measured in the datum of the original objects. Datums are often named with an acronym, for example OSGB36 is the Ordnance Survey of Great Britain, 1936.

In the era of satellite geodesy, coordinates are determined for an object by timing signals from a satellite constellation (eg, the GPS satellites) and computing position relative to those satellites. Where is the datum here? At first glance the situation seems quite different from the traditional setup described above. However, the satellite positions as a function of time (ephemerides, in the jargon) must themselves be defined relative to some frame. This is done by continuously observing the satellites from a set of highly stable ground stations equipped with GPS receivers. It is the full set of these ground stations and their assigned coordinates which form the datum.

Let's inspect the flow of positional information in both cases:

• For traditional surveying,
```
datum object positions -> triangulation network -> newly surveyed point
```
• For satellite geodesy,
```
datum object positions -> satellite ephemerides -> newly surveyed point
```

We see that the basic nature of a datum is precisely the same regardless of whether we're doing a traditional survey or using a GPS receiver.

### Terrestrial reference systems and frames

Coordinates for new points are measured by transferring coordinates from the datum objects, as described above. However, how do we decide on coordinates for the datum objects themselves? This is purely a matter of convention, consistency and measurement.

For example, the International Terrestrial Reference System (ITRS) is a reference system that rotates with the Earth so that the average velocity of the crust is zero. That is, in this reference system the only crust movement is geophysical. Roughly speaking, the defining conventions for the ITRS are:

• Space is modeled as a three-dimensional Euclidean affine space.
• The origin is at the center of mass of the Earth (it is geocentric).
• The z-axis is the axis of rotation of the Earth.
• The scale is set to 1 SI meter.
• The x-axis is orthogonal to the z-axis and aligns with the international reference meridian through Greenwich.
• The y-axis is set to the cross product of the z and x axes, forming a right handed coordinate frame.
• Various rates of change of the above must also be specified, for example, the scale should stay constant in time.

The precise conventions are defined in chapter 4 of the IERS conventions published by the International Earth Rotation and Reference Service (IERS). These conventions define an ideal reference system, but they're useless without physical measurements that give coordinates for a set of real world datum objects. The process of measuring and computing coordinates for datum objects is called realizing the reference system and the result is called a reference frame. For example, the International Terrestrial Reference Frame of 2014 (ITRF2014) realizes the ITRS conventions using raw measurement data gathered in the 25 years prior to 2014.

To measure and compute coordinates, several space geodesy techniques are used to gather raw measurement data; currently the IERS includes VLBI (very long baseline interferometry) of distant astronomical radio sources, SLR (satellite laser ranging), GPS (global positioning system) and DORIS (gosh these acronyms are tiring). The raw data is not in the form of positions, but must be condensed down in a large scale fitting problem, ideally by requiring physical and statistical consistency of all measurements, tying measurements at different sites together with physical models.

### Coordinate systems

In geometry, a coordinate system is a system which uses one or more numbers, or coordinates to uniquely determine the position of a point in a mathematical space such as Euclidean space. For example, in geodesy a point is commonly referred to using geodetic latitude, longitude and height relative to a given reference ellipsoid; this is called a geodetic coordinate system.

An ellipsoid is chosen because it's a reasonable model for the shape of the Earth and its gravitational field without being overly complex; it has only a few parameters, and a simple mathematical form. The term spheroid is also used because the ellipsoids in use today are rotationally symmetric around the pole. Note that there are several ways to define latitude on an ellipsoid. The most natural for geodesy is geodetic latitude, used by default because it's physically accessible in any location as a good approximation to the angle between the gravity vector and the equatorial plane. (This type of latitude is not an angle measured at the centre of the ellipsoid, which may be surprising if you're used to spherical coordinates!)
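For reference, on an ellipsoid of eccentricity e the geodetic latitude φ relates to the geocentric latitude ψ (the angle actually measured at the centre) by the standard identity

```
tan(ψ) = (1 − e²) tan(φ)
```

so the two notions of latitude agree only at the equator and the poles.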

There are usually several useful coordinate systems for the same space. As well as the geodetic coordinates mentioned above, it's common to see

• The x,y,z components in an Earth-Centred Cartesian coordinate system rotating with the Earth. This is conventionally called an Earth-Centred Earth-Fixed (ECEF) coordinate system. This is a natural coordinate system in which to define coordinates for the datum objects defining a terrestrial reference frame.
• The east,north and up ENU components of a Cartesian coordinate frame at a particular point on the ellipsoid. This coordinate system is useful as a local frame for navigation.
• Easting,northing and vertical components of a projected coordinate system or map projection. There's an entire zoo of these, designed to represent the curved surface of an ellipsoid with a flat map.

Different coordinate systems provide different coordinates for the same point, so it's obviously important to specify exactly which coordinate system you're using. In particular, you should specify which ellipsoid parameters are in use if you deal with latitude and longitude, as in principle you could have more than one ellipsoid. This is a point of confusion, because a datum in geodesy also comes with a reference ellipsoid as a very strong matter of convention (thus being called a geodetic datum).

With its conventional ellipsoid, a geodetic datum also defines a conventional geodetic coordinate system, thus bringing together concepts which are interconnected but conceptually distinct. To emphasize:

• A coordinate system is a mathematical abstraction allowing us to manipulate geometric quantities using numeric and algebraic techniques. By itself, mathematical geometry is pure abstraction without a connection to the physical world.
• A datum is a set of physical objects with associated coordinates, thereby defining a reference frame in a way which is physically accessible. A datum is the bridge which connects physical reality to the abstract ideal of mathematical geometry, via the algebraic mechanism of a coordinate system.

## The API

### Coordinate types

Geodesy provides several in-built coordinate storage types for convenience and safety. The philosophy is to avoid carrying around raw data in generic containers like `Vector`s with no concept of what coordinate system it is in.

`LLA{T}` - latitude, longitude and altitude

The global `LLA` type stores data in a lat-lon-alt order, where latitude and longitude are expected in degrees (not radians). A keyword constructor, `LLA(lat=x, lon=y, alt=z)`, is also provided to help with having to remember the storage order.

`LatLon{T}` - latitude and longitude

The 2D `LatLon` type stores data in a lat-lon order, where latitude and longitude are expected in degrees (not radians). A keyword constructor, `LatLon(lat=x, lon=y)`, is also provided. `LatLon` is currently the only supported 2D coordinate.

`ECEF{T}` - Earth-centered, Earth-fixed

The global `ECEF` type stores Cartesian coordinates `x`, `y`, `z`, according to the usual convention. Being a Cartesian frame, `ECEF` is a subtype of StaticArrays' `StaticVector` and they can be added and subtracted with themselves and other vectors.

`UTM{T}` - universal transverse-Mercator

The `UTM` type encodes the easting `x`, northing `y` and height `z` of a UTM coordinate in an unspecified zone. This data type is also used to encode universal polar-stereographic (UPS) coordinates (where the zone is `0`).

`UTMZ{T}` - universal transverse-Mercator + zone

In addition to the easting `x`, northing `y` and height `z`, the global `UTMZ` type also encodes the UTM `zone` and `hemisphere`, where `zone` is a `UInt8` and `hemisphere` is a `Bool` for compact storage. The northern hemisphere is denoted as `true`, and the southern as `false`. Zone `0` corresponds to the UPS projection about the corresponding pole, otherwise `zone` is an integer between `1` and `60`.

`ENU{T}` - east-north-up

The `ENU` type is a local Cartesian coordinate that encodes a point's distance towards east `e`, towards north `n` and upwards `u` with respect to an unspecified origin. Like `ECEF`, `ENU` is also a subtype of `StaticVector`.

### Geodetic Datums

Geodetic datums are modelled as subtypes of the abstract type `Datum`. The associated ellipsoid may be obtained by calling the `ellipsoid()` function, for example, `ellipsoid(NAD83())`.

There are several pre-defined datums. Worldwide datums include

• `WGS84` - standard GPS datum for moderate precision work (representing either the latest frame realization or, if a time is supplied, a discontinuous dynamic datum where the time is used to look up the frame implementation date in the broadcast ephemerides.)
• `WGS84{GpsWeek}` - specific realizations of the WGS84 frame.
• `ITRF{Year}` - Realizations of the International Terrestrial Reference System for high precision surveying.

National datums include

• `OSGB36` - Ordnance Survey of Great Britain of 1936.
• `NAD27`, `NAD83` - North American Datums of 1927 and 1983, respectively.
• `GDA94` - Geocentric Datum of Australia, 1994.

Datums may also be passed to coordinate transformation constructors such as transverse-Mercator and polar-stereographic projections in which case the associated ellipsoid will be extracted to form the transformation. For datums without extra parameters (everything except `ITRF` and `WGS84{Week}`) there is a standard instance defined to reduce the amount of brackets you have to type. For example, `LLAfromECEF(NAD83())` and `LLAfromECEF(nad83)` are equivalent.

### Transformations and conversions

Geodesy provides two interfaces for changing coordinate systems.

"Transformations" are based on CoordinateTransformations interface for defining `AbstractTransformation`s and allow the user to apply them by calling them, invert them with `inv()` and compose them with `compose()` or `∘`. The transformations cache any possible pre-calculations for efficiency when the same transformation is applied to many points.

"Conversions" are based on type-constructors, obeying simple syntax like `LLA(ecef, datum)`. The `datum` or other information is always necessary, as no assumptions are made by Geodesy for safety and consistency reasons. Similarly, `Base.convert` is not defined because, without assumptions, it would require additional information. The main drawback of this approach is that some calculations may not be pre-cached (for instance, the origin of an ENU transformation).

#### Between `LLA` and `ECEF`

The `LLAfromECEF` and `ECEFfromLLA` transformations require an ellipsoidal datum to perform the conversion. The exact transformation is performed in both directions, using a port of the ECEF → LLA transformation from GeographicLib.
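For reference, the forward (LLA → ECEF) direction has a standard closed form. For geodetic latitude φ, longitude λ and height h on an ellipsoid with semi-major axis a and eccentricity e:

```
N(φ) = a / √(1 − e² sin²(φ))      (prime vertical radius of curvature)
x = (N(φ) + h) cos(φ) cos(λ)
y = (N(φ) + h) cos(φ) sin(λ)
z = (N(φ) (1 − e²) + h) sin(φ)
```

The reverse direction has no similarly simple closed form, which is why a dedicated algorithm such as GeographicLib's is used.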

Note that in some cases where points are very close to the centre of the ellipsoid, multiple equivalent `LLA` points are valid solutions to the transformation problem. Here, as in GeographicLib, the point with the greatest altitude is chosen.

#### Between `LLA` and `UTM`/`UTMZ`

The `LLAfromUTM(Z)` and `UTM(Z)fromLLA` transformations also require an ellipsoidal datum to perform the conversion. The transformation retains a cache of the parameters used in the transformation, which in the case of the transverse-Mercator projection leads to a significant saving.

In all cases zone `0` corresponds to the UPS coordinate system, and the polar-stereographic projection of GeographicLib has been ported to Julia to perform the transformation.

An approximate, 6th-order expansion is used by default for the transverse-Mercator projection and its inverse (though orders 4-8 are defined). The algorithm is a native Julia port of that used in GeographicLib, and is accurate to nanometers for up to several UTM zones away from the reference meridian. However, the series expansion diverges at ±90° from the reference meridian. While the `UTMZ`-methods will automatically choose the canonical zone and hemisphere for the input, extreme care must be taken to choose an appropriate zone for the `UTM` methods. (In the future, we may implement the exact UTM transformation as a fallback; contributions welcome!)

There are also `UTMfromUTMZ` and `UTMZfromUTM` transformations that are helpful for converting between these two formats and putting data into the same `UTM` zone.

#### To and from local `ENU` frames

The `ECEFfromENU` and `ENUfromECEF` transformations define the transformation around a specific origin. Both the origin coordinates as an `ECEF` as well as its corresponding latitude and longitude are stored in the transformation for maximal efficiency when transforming multiple points. The transformation can be inverted with `inv` to perform the reverse transformation with respect to the same origin.

#### Web Mercator support

We support the Web Mercator / Pseudo Mercator projection with the `WebMercatorfromLLA` and `LLAfromWebMercator` transformations for interoperability with many web mapping systems. The scaling of the northing and easting is defined to be meters at the Equator, the same as how proj handles this (see https://proj.org/operations/projections/webmerc.html ).
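Given that convention, for longitude λ and geodetic latitude φ in radians and a the WGS-84 semi-major axis (6378137 m), the standard spherical Web Mercator formulas are:

```
easting  = a λ
northing = a ln(tan(π/4 + φ/2))
```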

If you need to deal with web mapping tile coordinate systems (zoom levels and pixel coordinates, etc) these could be added by composing another transformation on top of the web mercator projection defined in this package.

#### Composed transformations

Many other methods are defined as convenience constructors for composed transformations, to go between any two of the coordinate types defined here. These include:

• `ECEFfromUTMZ(datum) = ECEFfromLLA(datum) ∘ LLAfromUTMZ(datum)`
• `UTMZfromECEF(datum) = UTMZfromLLA(datum) ∘ LLAfromECEF(datum)`
• `UTMfromECEF(zone, hemisphere, datum) = UTMfromLLA(zone, hemisphere, datum) ∘ LLAfromECEF(datum)`
• `ECEFfromUTM(zone, hemisphere, datum) = ECEFfromLLA(datum) ∘ LLAfromUTM(zone, hemisphere, datum)`
• `ENUfromLLA(origin, datum) = ENUfromECEF(origin, datum) ∘ ECEFfromLLA(datum)`
• `LLAfromENU(origin, datum) = LLAfromECEF(datum) ∘ ECEFfromENU(origin, datum)`
• `ECEFfromUTMZ(datum) = ECEFfromLLA(datum) ∘ LLAfromUTMZ(datum)`
• `ENUfromUTMZ(origin, datum) = ENUfromLLA(origin, datum) ∘ LLAfromUTMZ(datum`
• `UTMZfromENU(origin, datum) = UTMZfromLLA(datum) ∘ LLAfromENU(origin, datum)`
• `UTMfromENU(origin, zone, hemisphere, datum) = UTMfromLLA(zone, hemisphere, datum) ∘ LLAfromENU(origin, datum)`
• `ENUfromUTM(origin, zone, hemisphere, datum) = ENUfromLLA(origin, datum) ∘ LLAfromUTM(zone, hemisphere, datum)`

Constructor-based transforms for these are also provided, such as `UTMZ(ecef, datum)` which converts to `LLA` as an intermediary, as above. When converting multiple points to or from the same ENU reference frame, it is recommended to use the transformation-based approach for efficiency. However, the other constructor-based conversions should be similar in speed to their transformation counterparts.

### Distance

Currently, the only defined distance measure is the straight-line or Euclidean distance, `euclidean_distance(x, y, [datum = wgs84])`, which works for all combinations of types for `x` and `y` - except that the UTM zone and hemisphere must also be provided for `UTM` types, as in `euclidean_distance(utm1, utm2, zone, hemisphere, [datum = wgs84])` (the Cartesian distance for `UTM` types is not approximated, but achieved via conversion to `ECEF`).

This is the only function currently in Geodesy which takes a default datum, and should be relatively accurate for close points where Euclidean distances are most important. Future work may focus on geodesics and related calculations (contributions welcome!).

Author: JuliaGeo

Source Code: https://github.com/JuliaGeo/Geodesy.jl


## Learning-v8: Project for Learning V8 internals

The sole purpose of this project is to aid me in learning Google's V8 JavaScript engine.

### Isolate

An Isolate is an independent copy of the V8 runtime, which includes its own heap. Two different Isolates can run in parallel and can be seen as entirely separate sandboxed instances of a V8 runtime.

### Context

To allow separate JavaScript applications to run in the same isolate a context must be specified for each one. This is to avoid them interfering with each other, for example by changing the builtin objects provided.

### Template

This is the superclass of both ObjectTemplate and FunctionTemplate. Remember that in JavaScript a function can have fields just like objects.

```cpp
class V8_EXPORT Template : public Data {
 public:
  void Set(Local<Name> name, Local<Data> value,
           PropertyAttribute attributes = None);
  void SetPrivate(Local<Private> name, Local<Data> value,
                  PropertyAttribute attributes = None);
  V8_INLINE void Set(Isolate* isolate, const char* name, Local<Data> value);

  void SetAccessorProperty(
      Local<Name> name,
      Local<FunctionTemplate> getter = Local<FunctionTemplate>(),
      Local<FunctionTemplate> setter = Local<FunctionTemplate>(),
      PropertyAttribute attribute = None,
      AccessControl settings = DEFAULT);
```

The `Set` function can be used to have a name and value set on instances created from this template. `SetAccessorProperty` is for properties that are get/set using functions.

```cpp
enum PropertyAttribute {
  /** None. **/
  None = 0,
  /** ReadOnly, i.e., not writable. **/
  ReadOnly = 1 << 0,
  /** DontEnum, i.e., not enumerable. **/
  DontEnum = 1 << 1,
  /** DontDelete, i.e., not configurable. **/
  DontDelete = 1 << 2
};

enum AccessControl {
  DEFAULT               = 0,
  ALL_CAN_WRITE         = 1 << 1,
  PROHIBITS_OVERWRITING = 1 << 2
};
```
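The attributes are bit flags, so they can be combined with bitwise or before being passed to `Template::Set`. A minimal sketch using a local copy of the enum (not the real V8 header; the values match the listing above):

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for v8::PropertyAttribute with the same values as above.
enum PropertyAttribute : uint32_t {
  None = 0,
  ReadOnly = 1 << 0,
  DontEnum = 1 << 1,
  DontDelete = 1 << 2,
};

// Combine flags the way they would be passed as the `attributes` argument.
inline uint32_t frozen_attributes() {
  return ReadOnly | DontEnum | DontDelete;  // all three bits set
}

inline bool is_read_only(uint32_t attrs) { return (attrs & ReadOnly) != 0; }
```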

### ObjectTemplate

These allow you to create JavaScript objects without a dedicated constructor. When an instance is created using an ObjectTemplate the new instance will have the properties and functions configured on the ObjectTemplate.

This would be something like:

```javascript
const obj = {};
```

This class is declared in `include/v8.h` and extends Template:

```cpp
class V8_EXPORT ObjectTemplate : public Template {
  ...
};

class V8_EXPORT Template : public Data {
  ...
};

class V8_EXPORT Data {
 private:
  Data();
};
```

We create an instance of ObjectTemplate and can add properties to it that all instances created using this ObjectTemplate will have. This is done by calling `Set`, which is a member of the `Template` class. You specify a `Local<Name>` for the property name; `Name` is a superclass of `Symbol` and `String`, both of which can be used as names for a property.

The implementation for `Set` can be found in `src/api/api.cc`:

```cpp
void Template::Set(v8::Local<Name> name, v8::Local<Data> value,
                   v8::PropertyAttribute attribute) {
  ...
      value_obj,
      static_cast<i::PropertyAttributes>(attribute));
}
```

There is an example in `objecttemplate_test.cc`.

### FunctionTemplate

FunctionTemplate is a template used to create functions; like ObjectTemplate, it inherits from Template:

```cpp
class V8_EXPORT FunctionTemplate : public Template {
};
```

Remember that a function in JavaScript can have properties just like objects.

There is an example in `functiontemplate_test.cc`.

An instance of a function template can be created using:

```cpp
Local<FunctionTemplate> ft = FunctionTemplate::New(isolate_, function_callback, data);
Local<Function> function = ft->GetFunction(context).ToLocalChecked();
```

And the function can be called using:

```cpp
MaybeLocal<Value> ret = function->Call(context, recv, 0, nullptr);
```

Function::Call can be found in `src/api/api.cc`:

```cpp
bool has_pending_exception = false;
auto self = Utils::OpenHandle(this);
i::Handle<i::Object> recv_obj = Utils::OpenHandle(*recv);
i::Handle<i::Object>* args = reinterpret_cast<i::Handle<i::Object>*>(argv);
Local<Value> result;
has_pending_exception = !ToLocal<Value>(
    i::Execution::Call(isolate, self, recv_obj, argc, args), &result);
```

Notice that the return value of `Call`, which is a `MaybeHandle<Object>`, will be passed to `ToLocal`, which is defined in `api.h`:

```cpp
template <class T>
inline bool ToLocal(v8::internal::MaybeHandle<v8::internal::Object> maybe,
                    Local<T>* local) {
  v8::internal::Handle<v8::internal::Object> handle;
  if (maybe.ToHandle(&handle)) {
    *local = Utils::Convert<v8::internal::Object, T>(handle);
    return true;
  }
  return false;
}
```

So let's take a look at `Execution::Call`, which can be found in `execution/execution.cc`; it calls:

```cpp
return Invoke(isolate, InvokeParams::SetUpForCall(isolate, callable, receiver, argc, argv));
```

`SetUpForCall` will return an `InvokeParams`. TODO: Take a closer look at InvokeParams.

```cpp
V8_WARN_UNUSED_RESULT MaybeHandle<Object> Invoke(Isolate* isolate,
                                                 const InvokeParams& params) {
```
```cpp
Handle<Object> receiver = params.is_construct
                              ? isolate->factory()->the_hole_value()
```

In our case `is_construct` is false, as we are not using `new`, and the receiver (the `this` in the function) should be set to the receiver that we passed in. After that we have `Builtins::InvokeApiFunction`:

```cpp
auto value = Builtins::InvokeApiFunction(
    params.argv, Handle<HeapObject>::cast(params.new_target));
```
```cpp
result = HandleApiCallHelper<false>(isolate, function, new_target,
```

`api-arguments-inl.h` has:

```cpp
FunctionCallbackArguments::Call(CallHandlerInfo handler) {
  ...
  FunctionCallbackInfo<v8::Value> info(values_, argv_, argc_);
  f(info);
  return GetReturnValue<Object>(isolate);
}
```

The call to `f(info)` is what invokes the callback; it is just a normal function call.

Back in `HandleApiCallHelper` we have:

```cpp
Handle<Object> result = custom.Call(call_data);

RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION(isolate, Object);
```

`RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION` expands to:

```cpp
Handle<Object> result = custom.Call(call_data);
do {
  Isolate* __isolate__ = (isolate);
  ((void) 0);
  if (__isolate__->has_scheduled_exception()) {
    __isolate__->PromoteScheduledException();
    return MaybeHandle<Object>();
  }
} while (false);
```

Notice that if there was an exception, an empty object is returned. Later in `Invoke` in `execution.cc`:

```cpp
auto value = Builtins::InvokeApiFunction(
    params.argv, Handle<HeapObject>::cast(params.new_target));
bool has_exception = value.is_null();
if (has_exception) {
  if (params.message_handling == Execution::MessageHandling::kReport) {
    isolate->ReportPendingMessages();
  }
  return MaybeHandle<Object>();
} else {
  isolate->clear_pending_message();
}
return value;
```

Looking at this, it seems that passing back an empty object is how an exception is signaled.

`Address` can be found in `include/v8-internal.h`:

```cpp
typedef uintptr_t Address;
```

`uintptr_t` is an optional type specified in `cstdint` that is capable of storing a data pointer: an unsigned integer type such that any valid pointer to `void` can be converted to it (and back).
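This round trip can be demonstrated in plain C++ without any V8 code (the function name here is mine, not V8's):

```cpp
#include <cassert>
#include <cstdint>

// Round-trip a pointer through uintptr_t, the same property that
// V8's Address typedef relies on.
inline bool pointer_roundtrip_ok(int* p) {
  uintptr_t bits = reinterpret_cast<uintptr_t>(p);  // pointer -> integer
  int* back = reinterpret_cast<int*>(bits);         // integer -> pointer
  return back == p;
}
```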

### TaggedImpl

This class is declared in `src/objects/tagged-impl.h` and has a single private member, declared as:

```cpp
 public:
  constexpr StorageType ptr() const { return ptr_; }

 private:
  StorageType ptr_;
```

An instance can be created using:

```cpp
i::TaggedImpl<i::HeapObjectReferenceType::STRONG, i::Address> tagged{};
```

The storage type can also be `Tagged_t`, which is defined in `globals.h`:

```cpp
using Tagged_t = uint32_t;
```

It looks like a different type is used when pointer compression is enabled.

See tagged_test.cc for an example.

### Object

This class extends TaggedImpl:

```cpp
class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
```

An Object can be created using the default constructor, or by passing in an Address, which delegates to the TaggedImpl constructor. Object itself does not have any members apart from `ptr_`, which is inherited from TaggedImpl. So if we create an Object on the stack, it is like a pointer/reference to an object:

```
+------+
|Object|
|------|
|ptr_  |---->
+------+
```

Now, `ptr_` is a StorageType, so it could be a `Smi`, in which case it just contains the value directly, for example a small integer:

```
+------+
|Object|
|------|
|  18  |
+------+
```
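The way a single word can hold either a small integer or a pointer is tagging: one or more low bits record which case it is. A simplified sketch of this idea in plain C++ (this mirrors V8's 32-bit Smi scheme with a one-bit tag; the production 64-bit layout differs, and all names here are mine):

```cpp
#include <cassert>
#include <cstdint>

using Address = uintptr_t;

// Low bit 0 -> small integer (Smi), low bit 1 -> heap object pointer.
constexpr Address kSmiTag = 0;
constexpr Address kHeapObjectTag = 1;
constexpr Address kTagMask = 1;

inline Address smi_from_int(intptr_t value) {
  // Store the value shifted up one bit, leaving the tag bit as 0.
  return (static_cast<Address>(value) << 1) | kSmiTag;
}

inline intptr_t smi_to_int(Address ptr) {
  return static_cast<intptr_t>(ptr) >> 1;  // drop the tag bit
}

inline bool is_smi(Address ptr) { return (ptr & kTagMask) == kSmiTag; }
```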

See object_test.cc for an example.

### ObjectSlot

```cpp
i::Object obj{18};
i::FullObjectSlot slot{&obj};
```
```
+----------+      +---------+
|ObjectSlot|      | Object  |
|----------|      |---------|
| address  | ---> |   18    |
+----------+      +---------+
```

See objectslot_test.cc for an example.

### Maybe

A Maybe is like an optional which can either hold a value or nothing.

``````template <class T>
class Maybe {
public:
V8_INLINE bool IsNothing() const { return !has_value_; }
V8_INLINE bool IsJust() const { return has_value_; }
...

private:
bool has_value_;
T value_;
};
``````

I first thought the name `Just` was a little confusing, but if you read it like this:

``````  bool cond = true;
Maybe<int> maybe = cond ? Just<int>(10) : Nothing<int>();
``````

I think it makes more sense. There are functions that check if the Maybe is nothing and crash the process if so. You can also check and return the value by using `FromJust`.

Maybe is used where API calls can fail; returning Nothing is a way of signaling this.

See maybe_test.cc for an example.
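The Just/Nothing shape can be sketched as a self-contained type. This is a simplified stand-in, not V8's Maybe — in particular, V8's `FromJust` crashes the process on Nothing, which is omitted here:

```cpp
// Minimal Maybe-like type with Just/Nothing helper functions.
template <class T>
class Maybe {
 public:
  bool IsNothing() const { return !has_value_; }
  bool IsJust() const { return has_value_; }
  // V8's FromJust crashes on Nothing; this sketch just returns value_.
  T FromJust() const { return value_; }

  static Maybe MakeJust(T value) { return Maybe(true, value); }
  static Maybe MakeNothing() { return Maybe(false, T()); }

 private:
  Maybe(bool has_value, T value) : has_value_(has_value), value_(value) {}
  bool has_value_;
  T value_;
};

// Free helpers mirroring v8::Just and v8::Nothing.
template <class T> Maybe<T> Just(T value) { return Maybe<T>::MakeJust(value); }
template <class T> Maybe<T> Nothing() { return Maybe<T>::MakeNothing(); }
```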

### MaybeLocal

``````template <class T>
class MaybeLocal {
public:
V8_INLINE MaybeLocal() : val_(nullptr) {}
V8_INLINE Local<T> ToLocalChecked();
V8_INLINE bool IsEmpty() const { return val_ == nullptr; }
template <class S>
V8_WARN_UNUSED_RESULT V8_INLINE bool ToLocal(Local<S>* out) const {
out->val_ = IsEmpty() ? nullptr : this->val_;
return !IsEmpty();
}

private:
T* val_;
};
``````

`ToLocalChecked` will crash the process if `val_` is a nullptr. To avoid a potential crash, one can use `ToLocal` instead.

See maybelocal_test.cc for an example.
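The MaybeLocal/Local relationship can be sketched with plain pointers. The names mirror the V8 API but this is not the real implementation — just the `ToLocal` pattern shown above:

```cpp
#include <cstddef>

// Simplified Local: wraps a raw pointer.
template <class T>
class Local {
 public:
  Local() : val_(nullptr) {}
  explicit Local(T* val) : val_(val) {}
  bool IsEmpty() const { return val_ == nullptr; }
  T* operator->() const { return val_; }

 private:
  T* val_;
};

// Simplified MaybeLocal: hands out its pointer only when non-null.
template <class T>
class MaybeLocal {
 public:
  MaybeLocal() : val_(nullptr) {}
  explicit MaybeLocal(T* val) : val_(val) {}
  bool IsEmpty() const { return val_ == nullptr; }
  // Returns false (and writes an empty Local) instead of crashing.
  bool ToLocal(Local<T>* out) const {
    *out = IsEmpty() ? Local<T>() : Local<T>(val_);
    return !IsEmpty();
  }

 private:
  T* val_;
};
```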

### Data

`Data` is the superclass of all objects that can exist on the V8 heap:

``````class V8_EXPORT Data {
private:
Data();
};
``````

### Value

Value extends `Data` and adds a number of methods that check if a Value is of a certain type, like `IsUndefined()`, `IsNull()`, `IsNumber()`, etc. It also has useful methods to convert to a Local, for example:

``````V8_WARN_UNUSED_RESULT MaybeLocal<Number> ToNumber(Local<Context> context) const;
V8_WARN_UNUSED_RESULT MaybeLocal<String> ToString(Local<Context> context) const;
...
``````

### Handle

A Handle is similar to an Object and an ObjectSlot in that it also contains an Address member (called `location_` and declared in `HandleBase`), but with the difference that a Handle acts as a layer of abstraction and can be relocated by the garbage collector. It can be found in `src/handles/handles.h`.

``````class HandleBase {
...
protected:
};

template <typename T>
class Handle final : public HandleBase {
...
};
``````
``````+----------+                  +--------+         +---------+
|  Handle  |                  | Object |         |   int   |
|----------|      +-----+     |--------|         |---------|
|*location_| ---> |&ptr_| --> | ptr_   | ----->  |     5   |
+----------+      +-----+     +--------+         +---------+
``````
``````(gdb) p handle
\$8 = {<v8::internal::HandleBase> = {location_ = 0x7ffdf81d60c0}, <No data fields>}
``````

Notice that `location_` contains a pointer:

``````(gdb) p /x *(int*)0x7ffdf81d60c0
\$9 = 0xa9d330
``````

And this is the same as the value in obj:

``````(gdb) p /x obj.ptr_
\$14 = 0xa9d330
``````

And we can access the int using any of the pointers:

``````(gdb) p /x *value
\$16 = 0x5
(gdb) p /x *obj.ptr_
\$17 = 0x5
(gdb) p /x *(int*)0x7ffdf81d60c0
\$18 = 0xa9d330
(gdb) p /x *(*(int*)0x7ffdf81d60c0)
\$19 = 0x5
``````

See handle_test.cc for an example.
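The double indirection in the diagram and gdb session above can be reproduced without V8. This is a sketch with invented names (`FakeHandle`, `Deref`), showing why the GC can move the data and update the slot without invalidating the handle:

```cpp
#include <cstdint>

using Address = uintptr_t;

// A handle stores location_, a pointer to a slot; the slot holds the
// object's ptr_; and ptr_ is the address of the actual data.
struct FakeHandle {
  Address* location_;
};

int Deref(const FakeHandle& handle) {
  Address ptr = *handle.location_;      // follow location_ to ptr_
  return *reinterpret_cast<int*>(ptr);  // follow ptr_ to the int
}
```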

### HandleScope

Contains a number of Locals/Handles (think pointers to objects, but managed by V8) and will take care of deleting them for us. HandleScopes are stack allocated.

When ~HandleScope is called, all handles created within that scope are removed from the stack maintained by the HandleScope, which makes the objects to which the handles point eligible for deletion from the heap by the GC.

A HandleScope only has three members:

``````  internal::Isolate* isolate_;
internal::Address* prev_next_;
internal::Address* prev_limit_;
``````

Let's take a closer look at what happens when we construct a HandleScope:

``````  v8::HandleScope handle_scope{isolate_};
``````

The constructor call will end up in `src/api/api.cc` and the constructor simply delegates to `Initialize`:

``````HandleScope::HandleScope(Isolate* isolate) { Initialize(isolate); }

void HandleScope::Initialize(Isolate* isolate) {
i::Isolate* internal_isolate = reinterpret_cast<i::Isolate*>(isolate);
...
i::HandleScopeData* current = internal_isolate->handle_scope_data();
isolate_ = internal_isolate;
prev_next_ = current->next;
prev_limit_ = current->limit;
current->level++;
}
``````

Every `v8::internal::Isolate` has a member of type HandleScopeData:

``````HandleScopeData* handle_scope_data() { return &handle_scope_data_; }
HandleScopeData handle_scope_data_;
``````

HandleScopeData is a struct defined in `src/handles/handles.h`:

``````struct HandleScopeData final {
Address* next;
Address* limit;
int level;
int sealed_level;
CanonicalHandleScope* canonical_scope;

void Initialize() {
next = limit = nullptr;
sealed_level = level = 0;
canonical_scope = nullptr;
}
};
``````

Notice that there are two pointers (`Address*`), `next` and `limit`. When a HandleScope is initialized, the current handle_scope_data will be retrieved from the internal isolate. The HandleScope instance that is getting created stores the next/limit pointers of the current isolate so that they can be restored when this HandleScope is closed (see CloseScope).
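The save/restore dance can be sketched as an RAII guard. This is not V8's code — `ScopeData` and `ScopeGuard` are invented names — but it shows why every handle created inside the scope is discarded at once on exit:

```cpp
#include <cstdint>

using Address = uintptr_t;

struct ScopeData {
  Address* next;
  Address* limit;
  int level;
};

// Records next/limit on entry and restores them on exit, like
// HandleScope::Initialize and CloseScope.
class ScopeGuard {
 public:
  explicit ScopeGuard(ScopeData* data)
      : data_(data), prev_next_(data->next), prev_limit_(data->limit) {
    data->level++;
  }
  ~ScopeGuard() {
    data_->next = prev_next_;
    data_->limit = prev_limit_;
    data_->level--;
  }

 private:
  ScopeData* data_;
  Address* prev_next_;
  Address* prev_limit_;
};
```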

So with a HandleScope created, how does a Local interact with this instance?

When a Local is created this will/might go through FactoryBase::NewStruct which will allocate a new Map and then create a Handle for the InstanceType being created:

``````Handle<Struct> str = handle(Struct::cast(result), isolate());
``````

This will land in the Handle constructor in `src/handles/handles-inl.h`:

``````template <typename T>
Handle<T>::Handle(T object, Isolate* isolate) : HandleBase(object.ptr(), isolate) {}

HandleBase::HandleBase(Address object, Isolate* isolate)
: location_(HandleScope::GetHandle(isolate, object)) {}
``````

Notice that `object.ptr()` is used to pass the Address to HandleBase. And also notice that HandleBase sets its location_ to the result of HandleScope::GetHandle.

``````Address* HandleScope::GetHandle(Isolate* isolate, Address value) {
DCHECK(AllowHandleAllocation::IsAllowed());
HandleScopeData* data = isolate->handle_scope_data();
CanonicalHandleScope* canonical = data->canonical_scope;
return canonical ? canonical->Lookup(value) : CreateHandle(isolate, value);
}
``````

This will call `CreateHandle` in our case, and that function will retrieve the current isolate's handle_scope_data:

``````  HandleScopeData* data = isolate->handle_scope_data();
Address* result = data->next;
if (result == data->limit) {
result = Extend(isolate);
}
``````

In this case both next and limit will be 0x0, so Extend will be called. Extend will also get the isolate's handle_scope_data, check the current level, and after that get the isolate's HandleScopeImplementer:

``````  HandleScopeImplementer* impl = isolate->handle_scope_implementer();
``````

`HandleScopeImplementer` is declared in `src/api/api.h`

`HandleScope::CreateHandle` will get the handle_scope_data from the isolate:

``````Address* HandleScope::CreateHandle(Isolate* isolate, Address value) {
HandleScopeData* data = isolate->handle_scope_data();
Address* result = data->next;
if (result == data->limit) {
result = Extend(isolate);
}
// Update the current next field, set the value in the created handle,
// and return the result.
data->next = reinterpret_cast<Address*>(reinterpret_cast<Address>(result) + sizeof(Address));
*result = value;
return result;
}
``````

Notice that `data->next` is advanced by the size of an Address, so it points at the next free slot, while `result` still points at the slot that received the value.

The destructor for HandleScope will call CloseScope. See handlescope_test.cc for an example.
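The bump allocation that CreateHandle performs can be modeled in isolation. This is a sketch, not V8's code — `Extend()` is modeled by simply failing when the fixed backing array runs out:

```cpp
#include <cstdint>
#include <cstddef>

using Address = uintptr_t;

struct ScopeData {
  Address* next;
  Address* limit;
};

// Hand out the slot at next, then advance next by one Address.
Address* CreateHandle(ScopeData* data, Address value) {
  if (data->next == data->limit) return nullptr;  // real code calls Extend()
  Address* result = data->next;
  data->next = result + 1;  // i.e. advanced by sizeof(Address) bytes
  *result = value;
  return result;
}
```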

### EscapableHandleScope

Local handles are located on the stack and are deleted when the appropriate destructor is called. If there is a local HandleScope then it will take care of this when the scope returns. When there are no references left to a handle, the object it points to can be garbage collected. This means that if a function has a HandleScope and wants to return a handle/local, that handle will not be available after the function returns. This is what EscapableHandleScope is for: it enables the value to be placed in the enclosing handle scope, allowing it to survive. When the enclosing HandleScope goes out of scope it will be cleaned up.

``````class V8_EXPORT EscapableHandleScope : public HandleScope {
public:
explicit EscapableHandleScope(Isolate* isolate);
V8_INLINE ~EscapableHandleScope() = default;
template <class T>
V8_INLINE Local<T> Escape(Local<T> value) {
internal::Address* slot = Escape(reinterpret_cast<internal::Address*>(*value));
return Local<T>(reinterpret_cast<T*>(slot));
}

template <class T>
V8_INLINE MaybeLocal<T> EscapeMaybe(MaybeLocal<T> value) {
return Escape(value.FromMaybe(Local<T>()));
}

private:
...
};
``````

From `api.cc`

``````EscapableHandleScope::EscapableHandleScope(Isolate* v8_isolate) {
i::Isolate* isolate = reinterpret_cast<i::Isolate*>(v8_isolate);
escape_slot_ = CreateHandle(isolate, i::ReadOnlyRoots(isolate).the_hole_value().ptr());
Initialize(v8_isolate);
}
``````

So when an EscapableHandleScope is created it will create a handle holding the hole value and store it in `escape_slot_`. This handle is created in the current (enclosing) HandleScope, and the EscapableHandleScope can later set a value for that pointer/address that it wants to escape. When that enclosing HandleScope goes out of scope it will be cleaned up. It then calls Initialize just like a normal HandleScope would.

``````i::Address* HandleScope::CreateHandle(i::Isolate* isolate, i::Address value) {
return i::HandleScope::CreateHandle(isolate, value);
}
``````

From `handles-inl.h`:

``````Address* HandleScope::CreateHandle(Isolate* isolate, Address value) {
DCHECK(AllowHandleAllocation::IsAllowed());
HandleScopeData* data = isolate->handle_scope_data();
if (result == data->limit) {
result = Extend(isolate);
}
// Update the current next field, set the value in the created handle,
// and return the result.
*result = value;
return result;
}
``````

When Escape is called the following happens (v8.h):

``````template <class T>
V8_INLINE Local<T> Escape(Local<T> value) {
internal::Address* slot = Escape(reinterpret_cast<internal::Address*>(*value));
return Local<T>(reinterpret_cast<T*>(slot));
}
``````

And then `EscapableHandleScope::Escape` (api.cc):

``````i::Address* EscapableHandleScope::Escape(i::Address* escape_value) {
i::Heap* heap = reinterpret_cast<i::Isolate*>(GetIsolate())->heap();
Utils::ApiCheck(i::Object(*escape_slot_).IsTheHole(heap->isolate()),
"EscapableHandleScope::Escape", "Escape value set twice");
if (escape_value == nullptr) {
*escape_slot_ = i::ReadOnlyRoots(heap).undefined_value().ptr();
return nullptr;
}
*escape_slot_ = *escape_value;
return escape_slot_;
}
``````

If the escape_value is null, the `escape_slot_` (which is a pointer into the parent HandleScope) is set to the undefined_value() instead of the hole value it held previously, and nullptr is returned. Otherwise the escaped value is stored in the slot, and the slot's address is returned and later cast to `T*`. Next, we take a look at what happens when the EscapableHandleScope goes out of scope. This will call HandleScope::~HandleScope, which makes sense, as any other Local handles should be cleaned up.

`Escape` copies the value of its argument into the enclosing scope, deletes all its local handles, and then gives back the new handle copy, which can safely be returned.

TODO:

### Local

Has a single member `val_` which is of type pointer to `T`:

``````template <class T> class Local {
...
private:
T* val_;
};
``````

Notice that this is a pointer to T. We could create a local using:

``````  v8::Local<v8::Value> empty_value;
``````

So a Local contains a pointer to type T. We can access this pointer using `operator->` and `operator*`.

We can cast from a subtype to a supertype using Local::Cast:

``````v8::Local<v8::Number> nr = v8::Local<v8::Number>(v8::Number::New(isolate_, 12));
v8::Local<v8::Value> val = v8::Local<v8::Value>::Cast(nr);
``````

And there is also the `As` member function:

``````v8::Local<v8::Value> val2 = nr.As<v8::Value>();
``````

See local_test.cc for an example.

### PrintObject

Using `_v8_internal_Print_Object` from C++:

``````\$ nm -C libv8_monolith.a | grep Print_Object
0000000000000000 T _v8_internal_Print_Object(void*)
``````

Notice that this function does not have a namespace. We can use this as:

``````extern void _v8_internal_Print_Object(void* object);

_v8_internal_Print_Object(*((v8::internal::Object**)(*global)));
``````

Lets take a closer look at the above:

``````  v8::internal::Object** gl = ((v8::internal::Object**)(*global));
``````

We use the dereference operator to get the value of a Local (`*global`), which is just of type `T*`, a pointer to the type of the Local:

``````template <class T>
class Local {
...
private:
T* val_;
};
``````

We are then casting that to be of type pointer-to-pointer to Object.

``````  gl**        Object*         Object
+-----+      +------+      +-------+
|     |----->|      |----->|       |
+-----+      +------+      +-------+
``````

An instance of `v8::internal::Object` only has a single data member which is a field named `ptr_` of type `Address`:

`src/objects/objects.h`:

``````class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
public:
explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}

#define IS_TYPE_FUNCTION_DECL(Type) \
V8_INLINE bool Is##Type() const;  \
V8_INLINE bool Is##Type(const Isolate* isolate) const;
OBJECT_TYPE_LIST(IS_TYPE_FUNCTION_DECL)
HEAP_OBJECT_TYPE_LIST(IS_TYPE_FUNCTION_DECL)
IS_TYPE_FUNCTION_DECL(HashTableBase)
IS_TYPE_FUNCTION_DECL(SmallOrderedHashTable)
#undef IS_TYPE_FUNCTION_DECL
};
``````

Let's take a look at one of these functions and see how it is implemented. For example, in the OBJECT_TYPE_LIST we have:

``````#define OBJECT_TYPE_LIST(V) \
V(LayoutDescriptor)       \
V(Primitive)              \
V(Number)                 \
V(Numeric)
``````

So the object class will have a function that looks like:

``````inline bool IsNumber() const;
inline bool IsNumber(const Isolate* isolate) const;
``````

And in src/objects/objects-inl.h we will have the implementations:

``````bool Object::IsNumber() const {
return IsHeapObject() && HeapObject::cast(*this).IsNumber();
}
``````
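The list macro plus per-entry macro is the classic X-macro technique. A self-contained, simplified sketch of what `OBJECT_TYPE_LIST`/`IS_TYPE_FUNCTION_DECL` do (the `Kind` enum is invented for illustration):

```cpp
// One list macro...
#define TYPE_LIST(V) \
  V(Number)          \
  V(Primitive)

enum class Kind { Number, Primitive };

struct Obj {
  Kind kind;
// ...and one per-entry macro generate a family of Is##Type() predicates.
#define IS_TYPE_FUNCTION_DECL(Type) \
  bool Is##Type() const { return kind == Kind::Type; }
  TYPE_LIST(IS_TYPE_FUNCTION_DECL)
#undef IS_TYPE_FUNCTION_DECL
};
```

Adding a new entry to the list macro automatically generates the corresponding predicate.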

`IsHeapObject` is defined in TaggedImpl:

``````  constexpr inline bool IsHeapObject() const { return IsStrong(); }

constexpr inline bool IsStrong() const {
#if V8_HAS_CXX14_CONSTEXPR
DCHECK_IMPLIES(!kCanBeWeak, !IsSmi() == HAS_STRONG_HEAP_OBJECT_TAG(ptr_));
#endif
return kCanBeWeak ? HAS_STRONG_HEAP_OBJECT_TAG(ptr_) : !IsSmi();
}
``````

The macro can be found in src/common/globals.h:

#define HAS_STRONG_HEAP_OBJECT_TAG(value)                          \
(((static_cast<i::Tagged_t>(value) & ::i::kHeapObjectTagMask) ==   \
::i::kHeapObjectTag))
``````

So we are casting `ptr_` which is of type Address into type `Tagged_t` which is defined in src/common/global.h and can be different depending on if compressed pointers are used or not. If they are not supported it is the same as Address:

``````using Tagged_t = Address;
``````

`src/objects/tagged-impl.h`:

``````template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {

StorageType ptr_;
}
``````

The HeapObjectReferenceType can be either WEAK or STRONG. And the storage type is `Address` in this case. So Object itself only has one member that is inherited from its only super class and this is `ptr_`.

So the following is telling the compiler to treat the value of our Local, `*global`, as a pointer (which it already is) to a pointer that points to a memory location that adheres to the layout of a `v8::internal::Object` type, which we now know has a `ptr_` member. And we want to dereference it and pass it into the function.

``````_v8_internal_Print_Object(*((v8::internal::Object**)(*global)));
``````
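The cast chain can be reproduced without V8. This is a sketch with illustrative types only: `*global` yields the Local's raw pointer, which points at a slot holding the object's address word; casting to `Object**` and dereferencing reads that word:

```cpp
#include <cstdint>

using Address = uintptr_t;

// Treat the Local's payload as a pointer to an address word and read it,
// mirroring *((v8::internal::Object**)(*global)).
Address ReadObjectWord(void* local_payload) {
  return *reinterpret_cast<Address*>(local_payload);
}
```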

### ObjectTemplate

But I'm still missing the connection between ObjectTemplate and object. When we create it we use:

``````Local<ObjectTemplate> global = ObjectTemplate::New(isolate);
``````

In `src/api/api.cc` we have:

``````static Local<ObjectTemplate> ObjectTemplateNew(
i::Isolate* isolate, v8::Local<FunctionTemplate> constructor,
bool do_not_cache) {
i::Handle<i::Struct> struct_obj = isolate->factory()->NewStruct(
i::OBJECT_TEMPLATE_INFO_TYPE, i::AllocationType::kOld);
i::Handle<i::ObjectTemplateInfo> obj = i::Handle<i::ObjectTemplateInfo>::cast(struct_obj);
InitializeTemplate(obj, Consts::OBJECT_TEMPLATE);
int next_serial_number = 0;
if (!constructor.IsEmpty())
obj->set_constructor(*Utils::OpenHandle(*constructor));
obj->set_data(i::Smi::zero());
return Utils::ToLocal(obj);
}
``````

What is a `Struct` in this context?
`src/objects/struct.h`

``````#include "torque-generated/class-definitions-tq.h"

class Struct : public TorqueGeneratedStruct<Struct, HeapObject> {
public:
inline void InitializeBody(int object_size);
void BriefPrintDetails(std::ostream& os);
TQ_OBJECT_CONSTRUCTORS(Struct)
};
``````

Notice that the include is specifying a `torque-generated` include, which can be found in `out/x64.release_gcc/gen/torque-generated/class-definitions-tq`. So somewhere there must be a call to the `torque` executable, which generates the Code Stub Assembler C++ headers and sources before compiling the main source files. There is, and there is a section about this in `Building V8`. The macro `TQ_OBJECT_CONSTRUCTORS` can be found in `src/objects/object-macros.h` and expands to:

``````  constexpr Struct() = default;

protected:
template <typename TFieldType, int kFieldOffset>
friend class TaggedField;

``````

So what does the TorqueGeneratedStruct look like?

``````template <class D, class P>
class TorqueGeneratedStruct : public P {
public:
``````

Where D is Struct and P is HeapObject in this case. But the above is only the declaration of the type; what ends up in the generated .h file is the full definition.

This type is defined in `src/objects/struct.tq`:

``````@abstract
@generatePrint
@generateCppClass
extern class Struct extends HeapObject {
}
``````

`NewStruct` can be found in `src/heap/factory-base.cc`

``````template <typename Impl>
HandleFor<Impl, Struct> FactoryBase<Impl>::NewStruct(
InstanceType type, AllocationType allocation) {
Map map = Map::GetStructMap(read_only_roots(), type);
int size = map.instance_size();
HeapObject result = AllocateRawWithImmortalMap(size, allocation, map);
HandleFor<Impl, Struct> str = handle(Struct::cast(result), isolate());
str->InitializeBody(size);
return str;
}
``````

Every object that is stored on the v8 heap has a Map (`src/objects/map.h`) that describes the structure of the object being stored.

``````class Map : public HeapObject {
``````
``````1725      return Utils::ToLocal(obj);
(gdb) p obj
\$6 = {<v8::internal::HandleBase> = {location_ = 0x30b5160}, <No data fields>}
``````

So this is the connection: what we see as a Local is backed by a HandleBase. TODO: dig into this some more when I have time.

``````(lldb) expr gl
(v8::internal::Object **) \$0 = 0x00000000020ee160
(lldb) memory read -f x -s 8 -c 1 gl
0x020ee160: 0x00000aee081c0121

(lldb) memory read -f x -s 8 -c 1 *gl
0xaee081c0121: 0x0200000002080433
``````

You can reload `.lldbinit` using the following command:

``````(lldb) command source ~/.lldbinit
``````

This can be useful when debugging a lldb command. You can set a breakpoint and break at that location and make updates to the command and reload without having to restart lldb.

Currently, the lldb-commands.py that ships with V8 performs an extra operation on the parameter passed to `ptr_arg_cmd`:

``````def ptr_arg_cmd(debugger, name, param, cmd):
if not param:
print("'{}' requires an argument".format(name))
return
param = '(void*)({})'.format(param)
no_arg_cmd(debugger, cmd.format(param))
``````

Notice that `param` is the object that we want to print, for example lets say it is a local named obj:

``````param = "(void*)(obj)"
``````

This will then be "passed"/formatted into the command string:

``````"_v8_internal_Print_Object(*(v8::internal::Object**)(*(void*)(obj)))"
``````

V8 is single threaded (the execution of JavaScript and the functions on the stack), but there are supporting threads used for garbage collection and profiling (IC, and perhaps other things). Let's see what threads there are:

``````\$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release_gcc/ lldb ./hello-world
(lldb) br s -n main
(lldb) r
thread #1: tid = 0x2efca6, 0x0000000100001e16 hello-world`main(argc=1, argv=0x00007fff5fbfee98) + 38 at hello-world.cc:40, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
``````

So at startup there is only one thread, which is what we expected. Let's skip ahead to where we create the platform:

``````Platform* platform = platform::CreateDefaultPlatform();
...
DefaultPlatform* platform = new DefaultPlatform(idle_task_support, tracing_controller);

``````

Next, there is a check for 0, and the number of processors minus 1 is used as the size of the thread pool:

``````(lldb) fr v thread_pool_size
``````

This is all that `SetThreadPoolSize` does. After this we have:

``````platform->EnsureInitialized();

for (int i = 0; i < thread_pool_size_; ++i)
``````

`new WorkerThread` will create a new pthread (on my system, which is macOS):

``````result = pthread_create(&data_->thread_, &attr, ThreadEntry, this);
``````

ThreadEntry can be found in `src/base/platform/platform-posix.cc`.

### International Component for Unicode (ICU)

International Components for Unicode (ICU) deals with internationalization (i18n). ICU provides support for locale-sensitive string comparisons, date/time/number/currency formatting, etc.

There is an optional API called ECMAScript 402 which V8 supports and which is enabled by default. i18n-support says that even if your application does not use ICU you still need to call `InitializeICU`:

``````V8::InitializeICU();
``````

### Local

``````Local<String> script_name = ...;
``````

So what is `script_name`? Well, it is an object reference that is managed by the V8 GC. The GC needs to be able to move things (pointers) around and also track whether things should be GC'd. Local handles, as opposed to persistent handles, are lightweight and mostly used for local operations. These handles are managed by HandleScopes, so you must have a HandleScope on the stack, and the Local is only valid as long as that HandleScope is valid. This uses Resource Acquisition Is Initialization (RAII), so when the HandleScope instance goes out of scope it will remove all the Local instances.

The `Local` class (in `include/v8.h`) only has one member which is of type pointer to the type `T`. So for the above example it would be:

``````  String* val_;
``````

You can find the available operations for a Local in `include/v8.h`.

``````(lldb) p script_name.IsEmpty()
(bool) \$12 = false
``````

A Local has overloaded a number of operators, for example ->:

``````(lldb) p script_name->Length()
(int) \$14 = 7
``````

Where Length is a method on the v8 String class.

The handle stack is not part of the C++ call stack, but the handle scopes are embedded in the C++ stack. Handle scopes can only be stack-allocated, not allocated with new.

### Persistent

https://v8.dev/docs/embed: Persistent handles provide a reference to a heap-allocated JavaScript Object, just like a local handle. There are two flavors, which differ in the lifetime management of the reference they handle. Use a persistent handle when you need to keep a reference to an object for more than one function call, or when handle lifetimes do not correspond to C++ scopes. Google Chrome, for example, uses persistent handles to refer to Document Object Model (DOM) nodes.

A persistent handle can be made weak, using PersistentBase::SetWeak, to trigger a callback from the garbage collector when the only references to an object are from weak persistent handles.

A UniquePersistent handle relies on C++ constructors and destructors to manage the lifetime of the underlying object. A Persistent can be constructed with its constructor, but must be explicitly cleared with Persistent::Reset.

So how is a persistent object created?
Let's write a test and find out (`test/persistent-object_text.cc`):

``````\$ make test/persistent-object_test
\$ ./test/persistent-object_test --gtest_filter=PersistentTest.value
``````

Now, to create an instance of Persistent we need a Local instance or the Persistent instance will just be empty.

``````Local<Object> o = Local<Object>::New(isolate_, Object::New(isolate_));
``````

`Local<Object>::New` can be found in `src/api/api.cc`:

``````Local<v8::Object> v8::Object::New(Isolate* isolate) {
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
LOG_API(i_isolate, Object, New);
ENTER_V8_NO_SCRIPT_NO_EXCEPTION(i_isolate);
i::Handle<i::JSObject> obj =
i_isolate->factory()->NewJSObject(i_isolate->object_function());
return Utils::ToLocal(obj);
}
``````

The first thing that happens is that the public Isolate pointer is cast to a pointer to the internal `Isolate` type. `LOG_API` is a macro in the same source file (src/api/api.cc):

``````#define LOG_API(isolate, class_name, function_name)                           \
i::RuntimeCallTimerScope _runtime_timer(                                    \
isolate, i::RuntimeCallCounterId::kAPI_##class_name##_##function_name); \
LOG(isolate, ApiEntryCall("v8::" #class_name "::" #function_name))
``````

In our case the preprocessor would expand that to:

``````  i::RuntimeCallTimerScope _runtime_timer(
isolate, i::RuntimeCallCounterId::kAPI_Object_New);
LOG(isolate, ApiEntryCall("v8::Object::New"))
``````

`LOG` is a macro that can be found in `src/log.h`:

``````#define LOG(isolate, Call)                              \
do {                                                  \
v8::internal::Logger* logger = (isolate)->logger(); \
if (logger->is_logging()) logger->Call;             \
} while (false)
``````

And this would expand to:

``````  v8::internal::Logger* logger = isolate->logger();
if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");
``````

So with the LOG_API macro expanded we have:

``````Local<v8::Object> v8::Object::New(Isolate* isolate) {
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
i::RuntimeCallTimerScope _runtime_timer( isolate, i::RuntimeCallCounterId::kAPI_Object_New);
v8::internal::Logger* logger = isolate->logger();
if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");

ENTER_V8_NO_SCRIPT_NO_EXCEPTION(i_isolate);
i::Handle<i::JSObject> obj =
i_isolate->factory()->NewJSObject(i_isolate->object_function());
return Utils::ToLocal(obj);
}
``````

Next we have `ENTER_V8_NO_SCRIPT_NO_EXCEPTION`:

``````#define ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate)                    \
i::VMState<v8::OTHER> __state__((isolate));                       \
i::DisallowJavascriptExecutionDebugOnly __no_script__((isolate)); \
i::DisallowExceptions __no_exceptions__((isolate))
``````

So with the macros expanded we have:

``````Local<v8::Object> v8::Object::New(Isolate* isolate) {
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
i::RuntimeCallTimerScope _runtime_timer( isolate, i::RuntimeCallCounterId::kAPI_Object_New);
v8::internal::Logger* logger = isolate->logger();
if (logger->is_logging()) logger->ApiEntryCall("v8::Object::New");

i::VMState<v8::OTHER> __state__(i_isolate);
i::DisallowJavascriptExecutionDebugOnly __no_script__(i_isolate);
i::DisallowExceptions __no_exceptions__(i_isolate);

i::Handle<i::JSObject> obj =
i_isolate->factory()->NewJSObject(i_isolate->object_function());

return Utils::ToLocal(obj);
}
``````

TODO: Look closer at `VMState`.

First, `i_isolate->object_function()` is called and the result passed to `NewJSObject`. `object_function` is generated by a macro named `NATIVE_CONTEXT_FIELDS`:

``````#define NATIVE_CONTEXT_FIELD_ACCESSOR(index, type, name)     \
Handle<type> Isolate::name() {                             \
return Handle<type>(raw_native_context()->name(), this); \
}                                                          \
bool Isolate::is_##name(type* value) {                     \
return raw_native_context()->is_##name(value);           \
}
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSOR)
``````

`NATIVE_CONTEXT_FIELDS` is a macro in `src/contexts` and it contains entries like:

``````#define NATIVE_CONTEXT_FIELDS(V)                                               \
...                                                                            \
V(OBJECT_FUNCTION_INDEX, JSFunction, object_function)                        \
``````

And for `object_function` the generated accessors look like:

``````  Handle<JSFunction> Isolate::object_function() {
return Handle<JSFunction>(raw_native_context()->object_function(), this);
}

bool Isolate::is_object_function(JSFunction* value) {
return raw_native_context()->is_object_function(value);
}
``````

I'm not clear on the different types of context; there is a native context and a "normal/public" context. In `src/contexts-inl.h` we have the native_context function:

``````Context* Context::native_context() const {
Object* result = get(NATIVE_CONTEXT_INDEX);
DCHECK(IsBootstrappingOrNativeContext(this->GetIsolate(), result));
return reinterpret_cast<Context*>(result);
}
``````

`Context` extends `FixedArray` so the get function is the get function of FixedArray and `NATIVE_CONTEXT_INDEX` is the index into the array where the native context is stored.

Now, let's take a closer look at `NewJSObject`, which can be found in `src/heap/factory.cc`:

``````Handle<JSObject> Factory::NewJSObject(Handle<JSFunction> constructor, PretenureFlag pretenure) {
JSFunction::EnsureHasInitialMap(constructor);
Handle<Map> map(constructor->initial_map(), isolate());
return NewJSObjectFromMap(map, pretenure);
}
``````

`NewJSObjectFromMap`

``````...
HeapObject* obj = AllocateRawWithAllocationSite(map, pretenure, allocation_site);
``````

So we have used the constructor's initial map to allocate the new JSObject.

### Map

So a HeapObject contains a pointer to a Map, or rather has a function that returns a pointer to a Map; I can't see any map member in the HeapObject class.

Let's take a look at when a map is created.

``````(lldb) br s -f map_test.cc -l 63
``````
``````Handle<Map> Factory::NewMap(InstanceType type,
int instance_size,
ElementsKind elements_kind,
int inobject_properties) {
HeapObject* result = isolate()->heap()->AllocateRawWithRetryOrFail(Map::kSize, MAP_SPACE);
result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
return handle(InitializeMap(Map::cast(result), type, instance_size,
elements_kind, inobject_properties),
isolate());
}
``````

We can see that the above is calling `AllocateRawWithRetryOrFail` on the heap instance passing a size of `88` and specifying the `MAP_SPACE`:

``````HeapObject* Heap::AllocateRawWithRetryOrFail(int size, AllocationSpace space,
AllocationAlignment alignment) {
AllocationResult alloc;
HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
if (result) return result;

isolate()->counters()->gc_last_resort_from_handles()->Increment();
CollectAllAvailableGarbage(GarbageCollectionReason::kLastResort);
{
AlwaysAllocateScope scope(isolate());
alloc = AllocateRaw(size, space, alignment);
}
if (alloc.To(&result)) {
DCHECK(result != exception());
return result;
}
// TODO(1181417): Fix this.
FatalProcessOutOfMemory("CALL_AND_RETRY_LAST");
return nullptr;
}
``````

The default value for `alignment` is `kWordAligned`. Reading the docs in the header, it says that this function will try to perform an allocation of size `88` in the `MAP_SPACE` and, if it fails, a full GC will be performed and the allocation retried. Let's take a look at `AllocateRawWithLigthRetry`:

``````  AllocationResult alloc = AllocateRaw(size, space, alignment);
``````

`AllocateRaw` can be found in `src/heap/heap-inl.h`. There are different paths that will be taken depending on the `space` parameter. Since it is `MAP_SPACE` in our case we will focus on that path:

``````AllocationResult Heap::AllocateRaw(int size_in_bytes, AllocationSpace space, AllocationAlignment alignment) {
...
HeapObject* object = nullptr;
AllocationResult allocation;
if (OLD_SPACE == space) {
...
} else if (MAP_SPACE == space) {
allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
}
...
}
``````

`map_space_` is a private member of Heap (src/heap/heap.h):

``````MapSpace* map_space_;
``````

`AllocateRawUnaligned` can be found in `src/heap/spaces-inl.h`:

``````AllocationResult PagedSpace::AllocateRawUnaligned( int size_in_bytes, UpdateSkipList update_skip_list) {
if (!EnsureLinearAllocationArea(size_in_bytes)) {
return AllocationResult::Retry(identity());
}

HeapObject* object = AllocateLinearly(size_in_bytes);
return object;
}
``````

The default value for `update_skip_list` is `UPDATE_SKIP_LIST`. So lets take a look at `AllocateLinearly`:

``````HeapObject* PagedSpace::AllocateLinearly(int size_in_bytes) {
Address current_top = allocation_info_.top();
Address new_top = current_top + size_in_bytes;
allocation_info_.set_top(new_top);
return HeapObject::FromAddress(current_top);
}
``````

Recall that `size_in_bytes` in our case is `88`.

``````(lldb) expr current_top
(lldb) expr new_top
(lldb) expr new_top - current_top
(unsigned long) \$7 = 88
``````

Notice that the top is first set to `new_top` and then `current_top` is returned; that is a pointer to the start of the object in memory (which in this case is a `v8::internal::Map`, which is also a `HeapObject`). I've been wondering why `Map` (and other `HeapObject`s) don't have any member fields and only/mostly getters/setters for the various fields that make up an object. The answer is that a pointer to an instance of, for example, `Map` points to the first memory location of the instance, and the getter/setter functions use offsets to read from and write to those memory locations. The offsets are mostly in the form of enum fields that define the memory layout of the type.

Next, in `AllocateRawUnaligned` we have the `MSAN_ALLOCATED_UNINITIALIZED_MEMORY` macro:

``````  MSAN_ALLOCATED_UNINITIALIZED_MEMORY(object->address(), size_in_bytes);
``````

`MSAN_ALLOCATED_UNINITIALIZED_MEMORY` can be found in `src/msan.h`; `msan` stands for Memory Sanitizer, and the macro only does anything if `V8_USE_MEMORY_SANITIZER` is defined. The returned `object` will be used to construct an `AllocationResult` when returned. Back in `AllocateRaw` we have:

``````if (allocation.To(&object)) {
...
OnAllocationEvent(object, size_in_bytes);
}

return allocation;
``````

This will return us in `AllocateRawWithLightRetry`:

``````AllocationResult alloc = AllocateRaw(size, space, alignment);
if (alloc.To(&result)) {
DCHECK(result != exception());
return result;
}
``````

This will return us back in `AllocateRawWithRetryOrFail`:

``````  HeapObject* result = AllocateRawWithLightRetry(size, space, alignment);
if (result) return result;
``````

And that return will return to `NewMap` in `src/heap/factory.cc`:

``````  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
return handle(InitializeMap(Map::cast(result), type, instance_size,
elements_kind, inobject_properties),
isolate());
``````

`InitializeMap`:

``````  map->set_instance_type(type);
  map->set_prototype(*null_value(), SKIP_WRITE_BARRIER);
  map->set_constructor_or_backpointer(*null_value(), SKIP_WRITE_BARRIER);
  map->set_instance_size(instance_size);
  if (map->IsJSObjectMap()) {
    map->SetInObjectPropertiesStartInWords(instance_size / kPointerSize - inobject_properties);
    DCHECK_EQ(map->GetInObjectProperties(), inobject_properties);
    map->set_prototype_validity_cell(*invalid_prototype_validity_cell());
  } else {
    DCHECK_EQ(inobject_properties, 0);
    map->set_inobject_properties_start_or_constructor_function_index(0);
    map->set_prototype_validity_cell(Smi::FromInt(Map::kPrototypeChainValid));
  }
  map->set_dependent_code(DependentCode::cast(*empty_fixed_array()), SKIP_WRITE_BARRIER);
  map->set_weak_cell_cache(Smi::kZero);
  map->set_raw_transitions(MaybeObject::FromSmi(Smi::kZero));
  map->SetInObjectUnusedPropertyFields(inobject_properties);
  map->set_instance_descriptors(*empty_descriptor_array());

  map->set_visitor_id(Map::GetVisitorId(map));
  map->set_bit_field(0);
  int bit_field3 = Map::EnumLengthBits::encode(kInvalidEnumCacheSentinel) |
                   Map::OwnsDescriptorsBit::encode(true) |
                   Map::ConstructionCounterBits::encode(Map::kNoSlackTracking);
  map->set_bit_field3(bit_field3);
  map->set_elements_kind(elements_kind); // HOLEY_ELEMENTS
  map->set_new_target_is_base(true);
  isolate()->counters()->maps_created()->Increment();
  if (FLAG_trace_maps) LOG(isolate(), MapCreate(map));
  return map;
``````

Creating a new map (`map_test.cc`):

``````  i::Handle<i::Map> map = i::Map::Create(asInternal(isolate_), 10);
std::cout << map->instance_type() << '\n';
``````

`Map::Create` can be found in objects.cc:

``````Handle<Map> Map::Create(Isolate* isolate, int inobject_properties) {
Handle<Map> copy = Copy(handle(isolate->object_function()->initial_map()), "MapCreate");
``````

So, the first thing that will happen is that `isolate->object_function()` will be called. This is a function generated by the preprocessor.

``````// from src/context.h
#define NATIVE_CONTEXT_FIELDS(V)                                               \
...                                                                          \
V(OBJECT_FUNCTION_INDEX, JSFunction, object_function)                        \

// from src/isolate.h
#define NATIVE_CONTEXT_FIELD_ACCESSOR(index, type, name)     \
Handle<type> Isolate::name() {                             \
return Handle<type>(raw_native_context()->name(), this); \
}                                                          \
bool Isolate::is_##name(type* value) {                     \
return raw_native_context()->is_##name(value);           \
}
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSOR)
``````

`object_function()` will become:

``````  Handle<JSFunction> Isolate::object_function() {
return Handle<JSFunction>(raw_native_context()->object_function(), this);
}
``````

Let's look closer at `JSFunction::initial_map()` in `src/objects-inl.h`:

``````Map* JSFunction::initial_map() {
return Map::cast(prototype_or_initial_map());
}
``````

`prototype_or_initial_map` is generated by a macro:

``````ACCESSORS_CHECKED(JSFunction, prototype_or_initial_map, Object,
kPrototypeOrInitialMapOffset, map()->has_prototype_slot())
``````

`ACCESSORS_CHECKED` can be found in `src/objects/object-macros.h`:

``````#define ACCESSORS_CHECKED(holder, name, type, offset, condition) \
ACCESSORS_CHECKED2(holder, name, type, offset, condition, condition)

#define ACCESSORS_CHECKED2(holder, name, type, offset, get_condition, \
set_condition)                             \
type* holder::name() const {                                        \
type* value = type::cast(READ_FIELD(this, offset));               \
DCHECK(get_condition);                                            \
return value;                                                     \
}                                                                   \
void holder::set_##name(type* value, WriteBarrierMode mode) {       \
DCHECK(set_condition);                                            \
WRITE_FIELD(this, offset, value);                                 \
CONDITIONAL_WRITE_BARRIER(GetHeap(), this, offset, value, mode);  \
}

``````

The preprocessor will expand `prototype_or_initial_map` to:

``````  Object* JSFunction::prototype_or_initial_map() const {
    Object* value = Object::cast(
        *reinterpret_cast<Object* const*>(FIELD_ADDR(this, kPrototypeOrInitialMapOffset)));
    DCHECK(map()->has_prototype_slot());
    return value;
  }
``````

Notice that `map()->has_prototype_slot()` will be called; `map()` looks like this:

``````Map* HeapObject::map() const {
return map_word().ToMap();
}
``````

``````MapWord HeapObject::map_word() const {
  return MapWord(reinterpret_cast<uintptr_t>(RELAXED_READ_FIELD(this, kMapOffset)));
}
``````

The first thing that will happen is `RELAXED_READ_FIELD(this, kMapOffset)`:

``````#define RELAXED_READ_FIELD(p, offset)          \
  reinterpret_cast<Object*>(base::Relaxed_Load( \
      reinterpret_cast<const base::AtomicWord*>(FIELD_ADDR(p, offset))))
``````

This will get expanded by the preprocessor to:

``````  reinterpret_cast<Object*>(base::Relaxed_Load(
      reinterpret_cast<const base::AtomicWord*>(FIELD_ADDR(this, kMapOffset))))
``````

`src/base/atomicops_internals_portable.h`:

``````inline Atomic8 Relaxed_Load(volatile const Atomic8* ptr) {
  return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
``````

So this will do an atomic load of the pointer with the memory order `__ATOMIC_RELAXED`.

`ACCESSORS_CHECKED` also generates a `set_prototype_or_initial_map`:

``````  void JSFunction::set_prototype_or_initial_map(JSFunction* value, WriteBarrierMode mode) {
DCHECK(map()->has_prototype_slot());
WRITE_FIELD(this, kPrototypeOrInitialMapOffset, value);
CONDITIONAL_WRITE_BARRIER(GetHeap(), this, kPrototypeOrInitialMapOffset, value, mode);
}
``````

What does `WRITE_FIELD` do?

``````#define WRITE_FIELD(p, offset, value)                             \
  base::Relaxed_Store(                                            \
      reinterpret_cast<base::AtomicWord*>(FIELD_ADDR(p, offset)), \
      reinterpret_cast<base::AtomicWord>(value));
``````

Which would expand into:

``````  base::Relaxed_Store(
      reinterpret_cast<base::AtomicWord*>(FIELD_ADDR(this, kPrototypeOrInitialMapOffset)),
      reinterpret_cast<base::AtomicWord>(value));
``````

Let's take a look at what `instance_type` does:

``````InstanceType Map::instance_type() const {
  return static_cast<InstanceType>(READ_UINT16_FIELD(this, kInstanceTypeOffset));
}
``````

To see what the above is doing we can do the same thing in the debugger. Note that I got `11` below from `map->kInstanceTypeOffset - i::kHeapObjectTag`:

``````(lldb) memory read -f u -c 1 -s 8 `*map + 11`
0x6d4e6609ed4: 585472345729139745
(lldb) expr static_cast<InstanceType>(585472345729139745)
(v8::internal::InstanceType) \$34 = JS_OBJECT_TYPE
``````

Take `map->has_non_instance_prototype()`:

``````(lldb) br s -n has_non_instance_prototype
(lldb) expr -i 0 -- map->has_non_instance_prototype()
``````

The above command will break in `src/objects/map-inl.h`:

``````BIT_FIELD_ACCESSORS(Map, bit_field, has_non_instance_prototype, Map::HasNonInstancePrototypeBit)

// src/objects/object-macros.h
#define BIT_FIELD_ACCESSORS(holder, field, name, BitField)      \
typename BitField::FieldType holder::name() const {           \
return BitField::decode(field());                           \
}                                                             \
void holder::set_##name(typename BitField::FieldType value) { \
set_##field(BitField::update(field(), value));              \
}
``````

The preprocessor will expand that to:

``````  typename Map::HasNonInstancePrototypeBit::FieldType Map::has_non_instance_prototype() const {
    return Map::HasNonInstancePrototypeBit::decode(bit_field());
  }
  void Map::set_has_non_instance_prototype(typename Map::HasNonInstancePrototypeBit::FieldType value) {
    set_bit_field(Map::HasNonInstancePrototypeBit::update(bit_field(), value));
  }
``````

So where can we find `Map::HasNonInstancePrototypeBit`?
It is generated by a macro in `src/objects/map.h`:

``````// Bit positions for |bit_field|.
#define MAP_BIT_FIELD_FIELDS(V, _)          \
V(HasNonInstancePrototypeBit, bool, 1, _) \
...
DEFINE_BIT_FIELDS(MAP_BIT_FIELD_FIELDS)
#undef MAP_BIT_FIELD_FIELDS

#define DEFINE_BIT_FIELDS(LIST_MACRO) \
DEFINE_BIT_RANGES(LIST_MACRO)       \
LIST_MACRO(DEFINE_BIT_FIELD_TYPE, LIST_MACRO##_Ranges)

#define DEFINE_BIT_RANGES(LIST_MACRO)                               \
struct LIST_MACRO##_Ranges {                                      \
enum { LIST_MACRO(DEFINE_BIT_FIELD_RANGE_TYPE, _) kBitsCount }; \
};

#define DEFINE_BIT_FIELD_RANGE_TYPE(Name, Type, Size, _) \
k##Name##Start, k##Name##End = k##Name##Start + Size - 1,
``````

Alright, let's see what the preprocessor expands that to:

``````  struct MAP_BIT_FIELD_FIELDS_Ranges {
enum {
kHasNonInstancePrototypeBitStart,
kHasNonInstancePrototypeBitEnd = kHasNonInstancePrototypeBitStart + 1 - 1,
... // not showing the rest of the entries.
kBitsCount
};
};
``````

So this would create a struct with an enum that could be accessed using `i::Map::MAP_BIT_FIELD_FIELDS_Ranges::kHasNonInstancePrototypeBitStart`. The next part of the macro is:

``````  LIST_MACRO(DEFINE_BIT_FIELD_TYPE, LIST_MACRO##_Ranges)

#define DEFINE_BIT_FIELD_TYPE(Name, Type, Size, RangesName) \
typedef BitField<Type, RangesName::k##Name##Start, Size> Name;
``````

Which will get expanded to:

``````  typedef BitField<bool, MAP_BIT_FIELD_FIELDS_Ranges::kHasNonInstancePrototypeBitStart, 1> HasNonInstancePrototypeBit;
``````

So this is how `HasNonInstancePrototypeBit` is declared. Notice that it is of type `BitField`, which can be found in `src/utils.h`:

``````template<class T, int shift, int size>
class BitField : public BitFieldBase<T, shift, size, uint32_t> { };

template<class T, int shift, int size, class U>
class BitFieldBase {
public:
typedef T FieldType;
``````

`Map::HasNonInstancePrototypeBit::decode(bit_field())` first calls `bit_field()`:

``````byte Map::bit_field() const { return READ_BYTE_FIELD(this, kBitFieldOffset); }
``````

And the result of that is passed to `Map::HasNonInstancePrototypeBit::decode`:

``````(lldb) br s -n bit_field
(lldb) expr -i 0 --  map->bit_field()
``````
``````byte Map::bit_field() const { return READ_BYTE_FIELD(this, kBitFieldOffset); }
``````

So, `this` is the current `Map` instance and `kBitFieldOffset` is the offset we are going to read from.

``````#define READ_BYTE_FIELD(p, offset) \
  (*reinterpret_cast<const byte*>(FIELD_ADDR(p, offset)))
``````

Which will get expanded to:

``````byte Map::bit_field() const {
  return *reinterpret_cast<const byte*>(FIELD_ADDR(this, kBitFieldOffset));
}
``````

The `instance_size` is `instance_size_in_words << kPointerSizeLog2` (`kPointerSizeLog2` is 3 on my machine):

``````(lldb) memory read -f x -s 1 -c 1 *map+8
0x24d1cd509ed1: 0x03
(lldb) expr 0x03 << 3
(int) \$2 = 24
(lldb) expr map->instance_size()
(int) \$3 = 24
``````

`i::HeapObject::kHeaderSize` is 8 on my system, and it is used in the `DEFINE_FIELD_OFFSET_CONSTANTS` macro:

``````#define MAP_FIELDS(V)                                                     \
  V(kInstanceSizeInWordsOffset, kUInt8Size)                               \
  V(kInObjectPropertiesStartOrConstructorFunctionIndexOffset, kUInt8Size) \
  ...
``````

So we can use this information to read the `inobject_properties_start_or_constructor_function_index` directly from memory using:

``````(lldb) expr map->inobject_properties_start_or_constructor_function_index()
(lldb) memory read -f x -s 1 -c 1 map+9
error: address expression "map+9" evaluation failed
(lldb) memory read -f x -s 1 -c 1 *map+9
0x17b027209ed2: 0x03
``````

Inspect the `visitor_id` (the byte at offset 10, i.e. `*map+10`):

``````(lldb) memory read -f x -s 1 -c 1 *map+10
0x17b027209ed3: 0x15
(lldb) expr (int) 0x15
(int) \$8 = 21
(lldb) expr map->visitor_id()
(v8::internal::VisitorId) \$11 = kVisitJSObjectFast
(lldb) expr (int) \$11
(int) \$12 = 21
``````

Inspect the `instance_type` (a 16-bit value starting at offset 11):

``````(lldb) expr map->instance_type()
(v8::internal::InstanceType) \$41 = JS_OBJECT_TYPE
(lldb) expr v8::internal::InstanceType::JS_OBJECT_TYPE
(uint16_t) \$35 = 1057
(lldb) memory read -f x -s 2 -c 1 *map+11
0x17b027209ed4: 0x0421
(lldb) expr (int)0x0421
(int) \$40 = 1057
``````

Notice that `instance_type` is a short, so it takes up 2 bytes.

``````(lldb) expr map->has_non_instance_prototype()
(bool) \$60 = false
(lldb) expr map->is_callable()
(bool) \$46 = false
(lldb) expr map->has_named_interceptor()
(bool) \$51 = false
(lldb) expr map->has_indexed_interceptor()
(bool) \$55 = false
(lldb) expr map->is_undetectable()
(bool) \$56 = false
(lldb) expr map->is_access_check_needed()
(bool) \$57 = false
(lldb) expr map->is_constructor()
(bool) \$58 = false
(lldb) expr map->has_prototype_slot()
(bool) \$59 = false
``````

Verify that the above is correct:

``````(lldb) expr map->has_non_instance_prototype()
(bool) \$44 = false
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x00

(lldb) expr map->set_has_non_instance_prototype(true)
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x01

(lldb) expr map->set_has_prototype_slot(true)
(lldb) memory read -f x -s 1 -c 1 *map+13
0x17b027209ed6: 0x81
``````

Inspect second int field (bit_field2):

``````(lldb) memory read -f x -s 1 -c 1 *map+14
0x17b027209ed7: 0x19
(lldb) expr map->is_extensible()
(bool) \$78 = true
(lldb) expr -- 0x19 & (1 << 0)
(bool) \$90 = 1

(lldb) expr map->is_prototype_map()
(bool) \$79 = false

(lldb) expr map->is_in_retained_map_list()
(bool) \$80 = false

(lldb) expr map->elements_kind()
(v8::internal::ElementsKind) \$81 = HOLEY_ELEMENTS
(lldb) expr v8::internal::ElementsKind::HOLEY_ELEMENTS
(int) \$133 = 3
(lldb) expr  0x19 >> 3
(int) \$134 = 3
``````

Inspect third int field (bit_field3):

``````(lldb) memory read -f b -s 4 -c 1 *map+15
0x17b027209ed8: 0b00001000001000000000001111111111
(lldb) memory read -f x -s 4 -c 1 *map+15
0x17b027209ed8: 0x082003ff
``````

So we know that a Map instance is a pointer to memory allocated on the heap with a specific size. Fields are accessed using offsets (remember, there are no member fields in the Map class). We also know that every HeapObject has a Map. The Map is sometimes referred to as the HiddenClass and sometimes as the shape of an object. If two objects have the same properties they will share the same Map. This makes sense, and I've seen blog posts that show this, but I'd like to verify it to fully understand it. I'm going to try to match https://v8project.blogspot.com/2017/08/fast-properties.html with the code.

So, lets take a look at adding a property to a JSObject. We start by creating a new Map and then use it to create a new JSObject:

``````  i::Handle<i::Map> map = factory->NewMap(i::JS_OBJECT_TYPE, 32);
i::Handle<i::JSObject> js_object = factory->NewJSObjectFromMap(map);

i::Handle<i::String> prop_name = factory->InternalizeUtf8String("prop_name");
i::Handle<i::String> prop_value = factory->InternalizeUtf8String("prop_value");
``````

Let's take a closer look at `AddProperty` and how it interacts with the Map. This function can be found in `src/objects.cc`:

``````void JSObject::AddProperty(Handle<JSObject> object, Handle<Name> name,
                           Handle<Object> value,
                           PropertyAttributes attributes) {
  LookupIterator it(object, name, object, LookupIterator::OWN_SKIP_INTERCEPTOR);
  CHECK_NE(LookupIterator::ACCESS_CHECK, it.state());
``````

First we have the LookupIterator constructor (`src/lookup.h`), but since this is a new property, which we know does not exist, it will not find any existing property.

``````CHECK(AddDataProperty(&it, value, attributes, kThrowOnError,
CERTAINLY_NOT_STORE_FROM_KEYED)
.IsJust());
``````
``````  Handle<JSReceiver> receiver = it->GetStoreTarget<JSReceiver>();
...
it->UpdateProtector();
// Migrate to the most up-to-date map that will be able to store |value|
// under it->name() with |attributes|.
DCHECK_EQ(LookupIterator::TRANSITION, it->state());

// Write the property value.
it->WriteDataValue(value, true);
``````

`PrepareTransitionToDataProperty`:

``````  Representation representation = value->OptimalRepresentation();
Handle<FieldType> type = value->OptimalType(isolate, representation);
maybe_map = Map::CopyWithField(map, name, type, attributes, constness,
representation, flag);
``````

`Map::CopyWithField`:

``````  Descriptor d = Descriptor::DataField(name, index, attributes, constness, representation, wrapped_type);
``````

Let's take a closer look at `Descriptor`, which can be found in `src/property.cc`:

``````Descriptor Descriptor::DataField(Handle<Name> key, int field_index,
                                 PropertyAttributes attributes,
                                 PropertyConstness constness,
                                 Representation representation,
                                 MaybeObjectHandle wrapped_field_type) {
  DCHECK(wrapped_field_type->IsSmi() || wrapped_field_type->IsWeakHeapObject());
  PropertyDetails details(kData, attributes, kField, constness, representation,
                          field_index);
  return Descriptor(key, wrapped_field_type, details);
}
``````

`Descriptor` is declared in `src/property.h` and describes an element in an instance-descriptor array. These are returned when calling `map->instance_descriptors()`. Let's check some of the arguments:

``````(lldb) job *key
#prop_name
(lldb) expr attributes
(v8::internal::PropertyAttributes) \$27 = NONE
(lldb) expr constness
(v8::internal::PropertyConstness) \$28 = kMutable
(lldb) expr representation
(v8::internal::Representation) \$29 = (kind_ = '\b')
``````

The Descriptor class contains three members:

`````` private:
Handle<Name> key_;
MaybeObjectHandle value_;
PropertyDetails details_;
``````

Let's take a closer look at `PropertyDetails`, which only has a single member, named `value_`:

``````  uint32_t value_;
``````

It also declares a number of classes that extend `BitField`, for example:

``````class KindField : public BitField<PropertyKind, 0, 1> {};
class LocationField : public BitField<PropertyLocation, KindField::kNext, 1> {};
class ConstnessField : public BitField<PropertyConstness, LocationField::kNext, 1> {};
class AttributesField : public BitField<PropertyAttributes, ConstnessField::kNext, 3> {};
class PropertyCellTypeField : public BitField<PropertyCellType, AttributesField::kNext, 2> {};
class DictionaryStorageField : public BitField<uint32_t, PropertyCellTypeField::kNext, 23> {};

// Bit fields for fast objects.
class RepresentationField : public BitField<uint32_t, AttributesField::kNext, 4> {};
class DescriptorPointer : public BitField<uint32_t, RepresentationField::kNext, kDescriptorIndexBitCount> {};
class FieldIndexField : public BitField<uint32_t, DescriptorPointer::kNext, kDescriptorIndexBitCount> {};

enum PropertyKind { kData = 0, kAccessor = 1 };
enum PropertyLocation { kField = 0, kDescriptor = 1 };
enum class PropertyConstness { kMutable = 0, kConst = 1 };
enum PropertyAttributes {
NONE = ::v8::None,
DONT_ENUM = ::v8::DontEnum,
DONT_DELETE = ::v8::DontDelete,
SEALED = DONT_DELETE,
ABSENT = 64,  // Used in runtime to indicate a property is absent.
// ABSENT can never be stored in or returned from a descriptor's attributes
// bitfield.  It is only used as a return value meaning the attributes of
// a non-existent property.
};
enum class PropertyCellType {
// Meaningful when a property cell does not contain the hole.
kUndefined,     // The PREMONOMORPHIC of property cells.
kConstant,      // Cell has been assigned only once.
kConstantType,  // Cell has been assigned only one type.
kMutable,       // Cell will no longer be tracked as constant.
// Meaningful when a property cell contains the hole.
kUninitialized = kUndefined,  // Cell has never been initialized.
kInvalidated = kConstant,     // Cell has been deleted, invalidated or never
// existed.
// For dictionaries not holding cells.
kNoCell = kMutable,
};

template<class T, int shift, int size>
class BitField : public BitFieldBase<T, shift, size, uint32_t> { };
``````

The type `T` of `KindField` will be `PropertyKind`, the `shift` will be 0, and the `size` 1. Notice that `LocationField` is using `KindField::kNext` as its shift. This is a static class constant of type `uint32_t` and is defined as:

``````static const U kNext = kShift + kSize;
``````

So `LocationField` would get the value from KindField which should be:

``````class LocationField : public BitField<PropertyLocation, 1, 1> {};
``````

The constructor for PropertyDetails looks like this:

``````PropertyDetails(PropertyKind kind, PropertyAttributes attributes, PropertyCellType cell_type, int dictionary_index = 0) {
  value_ = KindField::encode(kind) | LocationField::encode(kField) |
           AttributesField::encode(attributes) |
           DictionaryStorageField::encode(dictionary_index) |
           PropertyCellTypeField::encode(cell_type);
}
``````

So what does `KindField::encode(kind)` actually do then?

``````(lldb) expr static_cast<uint32_t>(kind())
(uint32_t) \$36 = 0
(lldb) expr static_cast<uint32_t>(kind()) << 0
(uint32_t) \$37 = 0
``````

This value is later returned by calling `kind()`:

``````PropertyKind kind() const { return KindField::decode(value_); }
``````

So we have all this information about this property: its type (Representation), constness, whether it is read-only, enumerable, deletable, sealed, or frozen. After that little detour we are back in `Descriptor::DataField`:

``````  return Descriptor(key, wrapped_field_type, details);
``````

Here we are using the key (name of the property), the wrapped_field_type, and PropertyDetails we created. What is `wrapped_field_type` again?
If we back up a few frames to `Map::TransitionToDataProperty` we can see that the type passed in is taken from the following code:

``````  Representation representation = value->OptimalRepresentation();
Handle<FieldType> type = value->OptimalType(isolate, representation);
``````

So this is only taking the type of the field:

``````(lldb) expr representation.kind()
(v8::internal::Representation::Kind) \$51 = kHeapObject
``````

This makes sense as the map only deals with the shape of the property and not its value. Next, in `Map::CopyWithField` we have:

``````  Handle<Map> new_map = Map::CopyAddDescriptor(map, &d, flag);
``````

`CopyAddDescriptor` does:

``````  Handle<DescriptorArray> descriptors(map->instance_descriptors());

  int nof = map->NumberOfOwnDescriptors();
  Handle<DescriptorArray> new_descriptors = DescriptorArray::CopyUpTo(descriptors, nof, 1);
  new_descriptors->Append(descriptor);

  Handle<LayoutDescriptor> new_layout_descriptor =
      FLAG_unbox_double_fields
          ? LayoutDescriptor::New(map, new_descriptors, nof + 1)
          : handle(LayoutDescriptor::FastPointerLayout(), map->GetIsolate());

  return CopyReplaceDescriptors(map, new_descriptors, new_layout_descriptor,
                                SIMPLE_PROPERTY_TRANSITION);
``````

Let's take a closer look at `LayoutDescriptor`:

``````(lldb) expr new_layout_descriptor->Print()
Layout descriptor: <all tagged>
``````

TODO: Take a closer look at LayoutDescriptor.

Later when actually adding the value in `Object::AddDataProperty`:

``````  it->WriteDataValue(value, true);
``````

This call will end up in `src/lookup.cc` and in our case the path will be the following call:

``````  JSObject::cast(*holder)->WriteToField(descriptor_number(), property_details_, *value);
``````

TODO: Take a closer look at LookupIterator. `WriteToField` can be found in `src/objects-inl.h`:

``````  FieldIndex index = FieldIndex::ForDescriptor(map(), descriptor);
``````

`FieldIndex::ForDescriptor` can be found in `src/field-index-inl.h`:

``````inline FieldIndex FieldIndex::ForDescriptor(const Map* map, int descriptor_index) {
  PropertyDetails details = map->instance_descriptors()->GetDetails(descriptor_index);
  int field_index = details.field_index();
  return ForPropertyIndex(map, field_index, details.representation());
}
``````

Notice that this is calling `instance_descriptors()` on the passed-in map. As we recall from earlier, this returns a DescriptorArray (which is a type of WeakFixedArray).

Our DescriptorArray only has one entry:

``````(lldb) expr map->instance_descriptors()->number_of_descriptors()
(int) \$6 = 1
(lldb) expr map->instance_descriptors()->GetKey(0)->Print()
#prop_name
(lldb) expr map->instance_descriptors()->GetFieldIndex(0)
(int) \$11 = 0
``````

We can also use `Print` on the DescriptorArray:

``````(lldb) expr map->instance_descriptors()->Print()

[0]: #prop_name (data field 0:h, p: 0, attrs: [WEC]) @ Any
``````

In our case we are accessing the PropertyDetails and then getting the `field_index`, which I think tells us where in the object the value for this property is stored. The last call in `ForDescriptor` is `ForPropertyIndex`:

``````inline FieldIndex FieldIndex::ForPropertyIndex(const Map* map,
                                               int property_index,
                                               Representation representation) {
  int inobject_properties = map->GetInObjectProperties();
  bool is_inobject = property_index < inobject_properties;
  int first_inobject_offset;
  int offset;
  if (is_inobject) {
    first_inobject_offset = map->GetInObjectPropertyOffset(0);
    offset = map->GetInObjectPropertyOffset(property_index);
  } else {
    property_index -= inobject_properties;
    offset = FixedArray::kHeaderSize + property_index * kPointerSize;
  }
  Encoding encoding = FieldEncoding(representation);
  return FieldIndex(is_inobject, offset, encoding, inobject_properties,
                    first_inobject_offset);
}
``````

I was expecting `inobject_properties` to be 1 here but it is 0:

``````(lldb) expr inobject_properties
(int) \$14 = 0
``````

Why is that, what am I missing?
These in-object properties are stored directly on the object instance and do not use the properties array. I'll get back to an example of this later to clarify it. TODO: Add in-object properties example.

Back in `JSObject::WriteToField`:

``````  RawFastPropertyAtPut(index, value);
``````
``````void JSObject::RawFastPropertyAtPut(FieldIndex index, Object* value) {
  if (index.is_inobject()) {
    int offset = index.offset();
    WRITE_FIELD(this, offset, value);
    WRITE_BARRIER(GetHeap(), this, offset, value);
  } else {
    property_array()->set(index.outobject_array_index(), value);
  }
}
``````

In our case we know that `index.is_inobject()` is false:

``````(lldb) expr index.is_inobject()
(bool) \$18 = false
``````

So, `property_array()->set()` will be called.

``````(lldb) expr this
(v8::internal::JSObject *) \$21 = 0x00002c31c6a88b59
``````

`JSObject` inherits from `JSReceiver`, which is where the `property_array()` function is declared.

``````  inline PropertyArray* property_array() const;
``````
``````(lldb) expr property_array()->Print()
0x2c31c6a88bb1: [PropertyArray]
- map: 0x2c31f5603e21 <Map>
- length: 3
- hash: 0
0: 0x2c31f56025a1 <Odd Oddball: uninitialized>
1-2: 0x2c31f56026f1 <undefined>
(lldb) expr index.outobject_array_index()
(int) \$26 = 0
(lldb) expr value->Print()
#prop_value
``````

Looking at the above values printed we should see the property be written to entry 0.

``````(lldb) expr property_array()->get(0)->Print()
#uninitialized
// after call to set
(lldb) expr property_array()->get(0)->Print()
#prop_value
``````
``````(lldb) expr map->instance_descriptors()
(v8::internal::DescriptorArray *) \$4 = 0x000039a927082339
``````

So a map has a pointer to an instance of DescriptorArray.

``````(lldb) expr map->GetInObjectProperties()
(int) \$19 = 1
``````

Each Map has an int that tells us the number of in-object properties it has. This is the number specified when creating a new Map, for example:

``````i::Handle<i::Map> map = i::Map::Create(asInternal(isolate_), 1);
``````

But at this stage we don't really have any properties. The value for a property is associated with the actual instance of the object. What the Map specifies is the index of the value for a particular property.

#### Creating a Map instance

Let's take a look at when a map is created.

``````(lldb) br s -f map_test.cc -l 63
``````
``````Handle<Map> Factory::NewMap(InstanceType type,
                            int instance_size,
                            ElementsKind elements_kind,
                            int inobject_properties) {
  HeapObject* result = isolate()->heap()->AllocateRawWithRetryOrFail(Map::kSize, MAP_SPACE);
  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
  return handle(InitializeMap(Map::cast(result), type, instance_size,
                              elements_kind, inobject_properties),
                isolate());
}
``````

We can see that the above is calling `AllocateRawWithRetryOrFail` on the heap instance passing a size of `88` and specifying the `MAP_SPACE`:

``````HeapObject* Heap::AllocateRawWithRetryOrFail(int size, AllocationSpace space,
                                             AllocationAlignment alignment) {
  AllocationResult alloc;
  HeapObject* result = AllocateRawWithLightRetry(size, space, alignment);
  if (result) return result;

  isolate()->counters()->gc_last_resort_from_handles()->Increment();
  CollectAllAvailableGarbage(GarbageCollectionReason::kLastResort);
  {
    AlwaysAllocateScope scope(isolate());
    alloc = AllocateRaw(size, space, alignment);
  }
  if (alloc.To(&result)) {
    DCHECK(result != exception());
    return result;
  }
  // TODO(1181417): Fix this.
  FatalProcessOutOfMemory("CALL_AND_RETRY_LAST");
  return nullptr;
}
``````

The default value for `alignment` is `kWordAligned`. Reading the docs in the header it says that this function will try to perform an allocation of size `88` in the `MAP_SPACE` and if it fails a full GC will be performed and the allocation retried. Lets take a look at `AllocateRawWithLigthRetry`:

``````  AllocationResult alloc = AllocateRaw(size, space, alignment);
``````

`AllocateRaw` can be found in `src/heap/heap-inl.h`. There are different paths that will be taken depending on the `space` parameteter. Since it is `MAP_SPACE` in our case we will focus on that path:

``````AllocationResult Heap::AllocateRaw(int size_in_bytes, AllocationSpace space, AllocationAlignment alignment) {
...
HeapObject* object = nullptr;
AllocationResult allocation;
if (OLD_SPACE == space) {
...
} else if (MAP_SPACE == space) {
allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
}
...
}
``````

`map_space_` is a private member of Heap (src/heap/heap.h):

``````MapSpace* map_space_;
``````

`AllocateRawUnaligned` can be found in `src/heap/spaces-inl.h`:

``````AllocationResult PagedSpace::AllocateRawUnaligned( int size_in_bytes, UpdateSkipList update_skip_list) {
if (!EnsureLinearAllocationArea(size_in_bytes)) {
return AllocationResult::Retry(identity());
}

HeapObject* object = AllocateLinearly(size_in_bytes);
return object;
}
``````

The default value for `update_skip_list` is `UPDATE_SKIP_LIST`. So lets take a look at `AllocateLinearly`:

``````HeapObject* PagedSpace::AllocateLinearly(int size_in_bytes) {
  Address current_top = allocation_info_.top();
  Address new_top = current_top + size_in_bytes;
  allocation_info_.set_top(new_top);
  return HeapObject::FromAddress(current_top);
}
``````

Recall that `size_in_bytes` in our case is `88`.

``````(lldb) expr current_top
(lldb) expr new_top
(lldb) expr new_top - current_top
(unsigned long) \$7 = 88
``````

Notice that first the top is set to `new_top` and then `current_top` is returned as a pointer to the start of the object in memory (which in this case is a v8::internal::Map, which is also a HeapObject). I've been wondering why Map (and other HeapObjects) don't have any member fields and only/mostly getters/setters for the various fields that make up an object. The answer is that pointers to instances of, for example, Map point to the first memory location of the instance, and the getter/setter functions use indexes to read/write those memory locations. The indexes are mostly in the form of enum fields that define the memory layout of the type.

Next, in `AllocateRawUnaligned` we have the `MSAN_ALLOCATED_UNINITIALIZED_MEMORY` macro:

``````  MSAN_ALLOCATED_UNINITIALIZED_MEMORY(object->address(), size_in_bytes);
``````

`MSAN_ALLOCATED_UNINITIALIZED_MEMORY` can be found in `src/msan.h`; `msan` stands for Memory Sanitizer and the macro only does anything if `V8_USE_MEMORY_SANITIZER` is defined. The returned `object` will be used to construct an `AllocationResult` when returned. Back in `AllocateRaw` we have:

``````if (allocation.To(&object)) {
  ...
  OnAllocationEvent(object, size_in_bytes);
}

return allocation;
``````

This returns us to `AllocateRawWithLightRetry`:

``````AllocationResult alloc = AllocateRaw(size, space, alignment);
if (alloc.To(&result)) {
DCHECK(result != exception());
return result;
}
``````

This returns us back to `AllocateRawWithRetryOrFail`:

``````  HeapObject* result = AllocateRawWithLigthRetry(size, space, alignment);
if (result) return result;
``````

And that will return us to `NewMap` in `src/heap/factory.cc`:

``````  result->set_map_after_allocation(*meta_map(), SKIP_WRITE_BARRIER);
  return handle(InitializeMap(Map::cast(result), type, instance_size,
                              elements_kind, inobject_properties),
                isolate());
``````

`InitializeMap`:

``````  map->set_instance_type(type);
  map->set_prototype(*null_value(), SKIP_WRITE_BARRIER);
  map->set_constructor_or_backpointer(*null_value(), SKIP_WRITE_BARRIER);
  map->set_instance_size(instance_size);
  if (map->IsJSObjectMap()) {
    map->SetInObjectPropertiesStartInWords(instance_size / kPointerSize - inobject_properties);
    DCHECK_EQ(map->GetInObjectProperties(), inobject_properties);
    map->set_prototype_validity_cell(*invalid_prototype_validity_cell());
  } else {
    DCHECK_EQ(inobject_properties, 0);
    map->set_inobject_properties_start_or_constructor_function_index(0);
    map->set_prototype_validity_cell(Smi::FromInt(Map::kPrototypeChainValid));
  }
  map->set_dependent_code(DependentCode::cast(*empty_fixed_array()), SKIP_WRITE_BARRIER);
  map->set_weak_cell_cache(Smi::kZero);
  map->set_raw_transitions(MaybeObject::FromSmi(Smi::kZero));
  map->SetInObjectUnusedPropertyFields(inobject_properties);
  map->set_instance_descriptors(*empty_descriptor_array());

  map->set_visitor_id(Map::GetVisitorId(map));
  map->set_bit_field(0);
  int bit_field3 = Map::EnumLengthBits::encode(kInvalidEnumCacheSentinel) |
                   Map::OwnsDescriptorsBit::encode(true) |
                   Map::ConstructionCounterBits::encode(Map::kNoSlackTracking);
  map->set_bit_field3(bit_field3);
  map->set_elements_kind(elements_kind); // HOLEY_ELEMENTS
  map->set_new_target_is_base(true);
  isolate()->counters()->maps_created()->Increment();
  if (FLAG_trace_maps) LOG(isolate(), MapCreate(map));
  return map;
``````

### Context

Context extends `FixedArray` (`src/context.h`), so an instance of Context is a FixedArray and we can use `Get(index)` etc. to get entries in the array.

### V8_EXPORT

This can be found in quite a few places in v8 source code. For example:

``````class V8_EXPORT ArrayBuffer : public Object {
``````

What is this?
It is a preprocessor macro which looks like this:

``````#if V8_HAS_ATTRIBUTE_VISIBILITY && defined(V8_SHARED)
# ifdef BUILDING_V8_SHARED
#  define V8_EXPORT __attribute__ ((visibility("default")))
# else
#  define V8_EXPORT
# endif
#else
# define V8_EXPORT
#endif
``````

So we can see that if `V8_HAS_ATTRIBUTE_VISIBILITY` and `V8_SHARED` are defined, and also `BUILDING_V8_SHARED`, then `V8_EXPORT` is set to `__attribute__ ((visibility("default")))`. In all other cases `V8_EXPORT` is empty and the preprocessor does not insert anything (nothing will be there come compile time). But what about `__attribute__ ((visibility("default")))`, what is this?

In the GNU compiler collection (GCC) environment, the term that is used for exporting is visibility. As it applies to functions and variables in a shared object, visibility refers to the ability of other shared objects to call a C/C++ function. Functions with default visibility have a global scope and can be called from other shared objects. Functions with hidden visibility have a local scope and cannot be called from other shared objects.

Visibility can be controlled by using either compiler options or visibility attributes. In your header files, wherever you want an interface or API made public outside the current Dynamic Shared Object (DSO), place `__attribute__ ((visibility ("default")))` in the struct, class and function declarations you wish to make public. With `-fvisibility=hidden`, you are telling GCC that every declaration not explicitly marked with a visibility attribute has hidden visibility. There is such a flag in build/common.gypi.

### ToLocalChecked()

You'll see a few of these calls in the hello_world example:

``````  Local<String> source = String::NewFromUtf8(isolate, js, NewStringType::kNormal).ToLocalChecked();
``````

`NewFromUtf8` actually returns a `Local` wrapped in a `MaybeLocal`, which forces a check to see if the `Local<>` is empty before using it. `NewStringType` is an enum which can be `kNormal` (the `k` prefix denotes a constant) or `kInternalized`.

The following is after running the preprocessor (clang -E src/api.cc):

``````# 5961 "src/api.cc"
Local<String> String::NewFromUtf8(Isolate* isolate,
                                  const char* data,
                                  NewStringType type,
                                  int length) {
  MaybeLocal<String> result;
  if (length == 0) {
    result = String::Empty(isolate);
  } else if (length > i::String::kMaxLength) {
    result = MaybeLocal<String>();
  } else {
    i::Isolate* i_isolate = reinterpret_cast<internal::Isolate*>(isolate);
    i::VMState<v8::OTHER> __state__((i_isolate));
    i::RuntimeCallTimerScope _runtime_timer(i_isolate, &i::RuntimeCallStats::API_String_NewFromUtf8);
    LOG(i_isolate, ApiEntryCall("v8::" "String" "::" "NewFromUtf8"));
    if (length < 0) length = StringLength(data);
    i::Handle<i::String> handle_result =
        NewString(i_isolate->factory(), static_cast<v8::NewStringType>(type),
                  i::Vector<const char>(data, length)).ToHandleChecked();
    result = Utils::ToLocal(handle_result);
  };
  return result.FromMaybe(Local<String>());;
}
``````

I was wondering where the Utils::ToLocal was defined but could not find it until I found:

``````MAKE_TO_LOCAL(ToLocal, String, String)

#define MAKE_TO_LOCAL(Name, From, To)                                       \
  Local<v8::To> Utils::Name(v8::internal::Handle<v8::internal::From> obj) { \
    return Convert<v8::internal::From, v8::To>(obj);                        \
  }
``````

The above can be found in `src/api.h`. The same goes for `Local<Object>, Local<String>` etc.

### Small Integers

Reading through v8.h I came across `// Tag information for Smi`. Smi stands for small integer.

A pointer is really just an integer that is treated like a memory address. We can use that memory address to get the start of the data located in that memory slot. But we can also just store a normal value like 18 in it. There might be cases where it does not make sense to store a small integer somewhere in the heap and have a pointer to it; instead the value can be stored directly in the pointer itself. But that only works for small integers, so there needs to be a way to know if the value we want is stored in the pointer or if we should follow the pointer into the heap to get the value.

A word on a 64 bit machine is 8 bytes (64 bits) and all of the pointers need to be aligned to multiples of 8. So a pointer could be:

``````1000       = 8
10000      = 16
11000      = 24
100000     = 32
1000000000 = 512
``````

Remember that we are talking about the pointers and not the values stored at the memory locations they point to. We can see that there are always three bits that are zero in the pointers, so we can use them for something else and just mask them out when using them as pointers.

Tagging involves borrowing one bit of the word and using it as a tag. If the tag is zero the value is a plain small integer, but if the tag is one the value is a pointer that must be followed. On a 32-bit system this leaves 31 bits for the integer value, which is why Smis are 31-bit integers. This does not only have to be for numbers; the same tag distinguishes pointers to heap objects. So when V8 finds a tagged value it checks the tag to know whether it holds the small integer directly or whether it has to follow the pointer to get the complete value.

### Properties/Elements

Take the following object:

``````{ firstname: "Jon", lastname: "Doe" }
``````

The above object has two named properties. Named properties differ from integer indexed which is what you have when you are working with arrays.

Memory layout of JavaScript Object:

``````Properties                  JavaScript Object               Elements
+-----------+              +-----------------+         +----------------+
|property1  |<------+      | HiddenClass     |  +----->|                |
+-----------+       |      +-----------------+  |      +----------------+
|...        |       +------| Properties      |  |      | element1       |<------+
+-----------+              +-----------------+  |      +----------------+       |
|...        |              | Elements        |--+      | ...            |       |
+-----------+              +-----------------+         +----------------+       |
|propertyN  | <---------------------+                  | elementN       |       |
+-----------+                       |                  +----------------+       |
                                    |                                           |
                                    |                                           |
                                    |                                           |
Named properties: { firstname: "Jon", lastname: "Doe" }    Indexed Properties: {1: "Jon", 2: "Doe"}
``````

We can see that properties and elements are stored in different data structures. Elements are usually implemented as a plain array and the indexes can be used for fast access to the elements. But for the properties this is not the case; instead there is a mapping between the property names and the indexes into the properties.

In `src/objects/objects.h` we can find JSObject:

``````class JSObject: public JSReceiver {
...
DECL_ACCESSORS(elements, FixedArrayBase)
``````

And looking at the `DECL_ACCESSORS` macro:

``````#define DECL_ACCESSORS(name, type)    \
  inline type* name() const;          \
  inline void set_##name(type* value, \
      WriteBarrierMode mode = UPDATE_WRITE_BARRIER);

// Expanded for DECL_ACCESSORS(elements, FixedArrayBase):
inline FixedArrayBase* elements() const;
inline void set_elements(FixedArrayBase* value, WriteBarrierMode mode = UPDATE_WRITE_BARRIER);
``````

Notice that JSObject extends JSReceiver, which is extended by all types that can have properties defined on them. I think this includes all JSObjects and JSProxy. It is in JSReceiver that we find the properties array:

``````DECL_ACCESSORS(raw_properties_or_hash, Object)
``````

Now, properties (named properties, not elements) can be of different kinds internally. These work just like simple dictionaries from the outside, but a dictionary is only used in certain circumstances at runtime.

``````Properties                  JSObject                    HiddenClass (Map)
+-----------+              +-----------------+         +----------------+
|property1  |<------+      | HiddenClass     |-------->| bit field1     |
+-----------+       |      +-----------------+         +----------------+
|...        |       +------| Properties      |         | bit field2     |
+-----------+              +-----------------+         +----------------+
|...        |              | Elements        |         | bit field3     |
+-----------+              +-----------------+         +----------------+
|propertyN  |              | property1       |
+-----------+              +-----------------+
                           | property2       |
                           +-----------------+
                           | ...             |
                           +-----------------+
``````

#### JSObject

Each JSObject has as its first field a pointer to the generated HiddenClass. A HiddenClass contains mappings from property names to indices into the properties data structure. When an instance of JSObject is created a `Map` is passed in. As mentioned earlier, JSObject inherits from JSReceiver which inherits from HeapObject.

For example, in jsobject_test.cc we first create a new Map using the internal Isolate Factory:

``````v8::internal::Handle<v8::internal::Map> map = factory->NewMap(v8::internal::JS_OBJECT_TYPE, 24);
v8::internal::Handle<v8::internal::JSObject> js_object = factory->NewJSObjectFromMap(map);
EXPECT_TRUE(js_object->HasFastProperties());
``````

When we call `js_object->HasFastProperties()` this will delegate to the map instance:

``````return !map()->is_dictionary_map();
``````

How do you add a property to a JSObject instance? Take a look at jsobject_test.cc for an example.

### Caching

Caches are a way to optimize polymorphic function calls in dynamic languages, for example JavaScript.

#### Lookup caches

Sending a message to a receiver requires the runtime to find the correct target method using the runtime type of the receiver. A lookup cache maps the type of the receiver/message name pair to methods and stores the most recently used lookup results. The cache is first consulted and if there is a cache miss a normal lookup is performed and the result stored in the cache.

#### Inline caches

Using a lookup cache as described above still takes a considerable amount of time since the cache must be probed for each message. It can be observed that the type of the target often does not vary: if a call with receiver type A is made at a particular call site, it is very likely that the next time it is made the receiver type will also be A. The method address looked up by the system lookup routine can be cached and the call instruction overwritten. Subsequent calls for the same type can jump directly to the cached method and completely avoid the lookup. The prolog of the called method must verify that the receiver's type has not changed, and do the lookup if it has (the type is incorrect, no longer A for example).

The target method's address is stored in the caller's code, or "inline" with the caller's code, hence the name "inline cache".

If V8 is able to make a good assumption about the type of object that will be passed to a method, it can bypass the process of figuring out how to access the object's properties, and instead use the stored information from previous lookups to the object's hidden class.

#### Polymorphic Inline Cache (PIC)

A polymorphic call site is one where there are many equally likely receiver types (and thus call targets).

• Monomorphic: there is only one receiver type
• Polymorphic: a few receiver types
• Megamorphic: very many receiver types

This type of caching extends inline caching to not just cache the last lookup, but to cache all lookup results for a given polymorphic call site using a specially generated stub. Let's say we have a method that iterates through a list of types and calls a method. If all the types are the same (monomorphic) a PIC acts just like an inline cache. The calls will directly call the target method (with the method prolog followed by the method body). If a different type exists in the list there will be a cache miss in the prolog and the lookup routine called. In normal inline caching this would rebind the call, replacing the call with this type's target method, and this would happen each time the type changes.

With a PIC the cache miss handler will generate a small stub routine and rebind the call to this stub. The stub will check if the receiver is of a type that it has seen before and branch to the correct target. Since the type of the target is already known at this point it can directly branch to the target method body without the need for the prolog. If the type has not been seen before it will be added to the stub to handle that type. Eventually the stub will contain all types used and there will be no more cache misses/lookups.

The problem is that we don't have type information, so methods cannot be called directly but must instead be looked up. In a static language a virtual table might have been used. In JavaScript there is no inheritance relationship so it is not possible to know a vtable offset ahead of time. What can be done is to observe and learn about the "types" used in the program. When an object is seen its type can be stored and the target of that method call inlined into that call site. Basically the type will be checked and if that particular type has been seen before the method can just be invoked directly. But how do we check the type in a dynamic language? The answer is hidden classes, which allow the VM to quickly check an object against a hidden class.

The inline caching sources are located in `src/ic`.

## --trace-ic

``````\$ out/x64.debug/d8 --trace-ic --trace-maps class.js

before
[TraceMaps: Normalize from= 0x19a314288b89 to= 0x19a31428aff9 reason= NormalizeAsPrototype ]
[TraceMaps: ReplaceDescriptors from= 0x19a31428aff9 to= 0x19a31428b051 reason= CopyAsPrototype ]
[TraceMaps: InitialMap map= 0x19a31428afa1 SFI= 34_Person ]

[StoreIC in ~Person+65 at class.js:2 (0->.) map=0x19a31428afa1 0x10e68ba83361 <String[4]: name>]
[TraceMaps: Transition from= 0x19a31428afa1 to= 0x19a31428b0a9 name= name ]
[StoreIC in ~Person+102 at class.js:3 (0->.) map=0x19a31428b0a9 0x2beaa25abd89 <String[3]: age>]
[TraceMaps: Transition from= 0x19a31428b0a9 to= 0x19a31428b101 name= age ]
[TraceMaps: SlowToFast from= 0x19a31428b051 to= 0x19a31428b159 reason= OptimizeAsPrototype ]
[StoreIC in ~Person+65 at class.js:2 (.->1) map=0x19a31428afa1 0x10e68ba83361 <String[4]: name>]
[StoreIC in ~Person+102 at class.js:3 (.->1) map=0x19a31428b0a9 0x2beaa25abd89 <String[3]: age>]
[LoadIC in ~+546 at class.js:9 (0->.) map=0x19a31428b101 0x10e68ba83361 <String[4]: name>]
[CallIC in ~+571 at class.js:9 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
Daniel
[LoadIC in ~+642 at class.js:10 (0->.) map=0x19a31428b101 0x2beaa25abd89 <String[3]: age>]
[CallIC in ~+667 at class.js:10 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
41
[LoadIC in ~+738 at class.js:11 (0->.) map=0x19a31428b101 0x10e68ba83361 <String[4]: name>]
[CallIC in ~+763 at class.js:11 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
Tilda
[LoadIC in ~+834 at class.js:12 (0->.) map=0x19a31428b101 0x2beaa25abd89 <String[3]: age>]
[CallIC in ~+859 at class.js:12 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
2
[CallIC in ~+927 at class.js:13 (0->1) map=0x0 0x32f481082231 <String[5]: print>]
after
``````

LoadIC (0->.) means that it has transitioned from uninitialized state (0) to premonomorphic state (.); monomorphic state is indicated with a `1`. These states can be found in src/ic/ic.cc. What we are doing is caching knowledge about the layout of the previously seen object inside the StoreIC/LoadIC calls.

``````\$ lldb -- out/x64.debug/d8 class.js
``````

#### HeapObject

This class describes heap allocated objects. It is in this class we find information regarding the type of object. This information is contained in `v8::internal::Map`.

### v8::internal::Map

`src/objects/map.h`

• `bit_field1`
• `bit_field2`
• `bit_field3` contains information about the number of properties that this Map has and a pointer to a DescriptorArray. The DescriptorArray contains information like the name of a property and the position where its value is stored in the JSObject. This information is available in src/objects/map.h.

#### DescriptorArray

Can be found in src/objects/descriptor-array.h. This class extends FixedArray and has the following entries:

``````[0] the number of descriptors it contains
[1] if uninitialized this will be Smi(0), otherwise an enum cache bridge which is a FixedArray of size 2:
    [0] enum cache: FixedArray containing all own enumerable keys
    [1] either Smi(0) or a pointer to a FixedArray with indices
[2] first key (an internalized String)
[3] first descriptor
``````

### Factory

Each internal Isolate has a Factory which is used to create instances. This is because all handles need to be allocated using the factory (src/heap/factory.h).

### Objects

All objects extend the abstract class Object (src/objects/objects.h).

### Oddball

This class extends HeapObject and describes `null`, `undefined`, `true`, and `false` objects.

#### Map

Extends HeapObject. All heap objects have a Map which describes the object's structure. This is where you can find the size of the instance and access to the in-object properties.

### Compiler pipeline

When a script is compiled all of the top level code is parsed. This includes function declarations (but not the function bodies).

``````function f1() {       <- top level code
  console.log('f1');  <- non top level
}

function f2() {       <- top level code
  f1();               <- non top level
  console.log('f2');  <- non top level
}

f2();                 <- top level code
var i = 10;           <- top level code
``````

The non top level code must be pre-parsed to check for syntax errors. The top level code is parsed and compiled by the full-codegen compiler. This compiler does not perform any optimizations; its only task is to generate machine code as quickly as possible (this is pre-Turbofan).

``````Source ------> Parser  --------> Full-codegen ---------> Unoptimized Machine Code
``````

So the whole script is parsed even though we only generated code for the top-level code. The result of the pre-parse (the syntax check) is not stored in any way. The functions are lazy stubs: when/if a function gets called it gets compiled, which means the function has to be parsed again (the first time was the pre-parse, remember).

If a function is determined to be hot it will be optimized by one of the two optimizing compilers: Crankshaft for older parts of JavaScript, or Turbofan for WebAssembly (Wasm) and some of the newer ES6 features.

The first time V8 sees a function it will parse it into an AST but not do any further processing of that tree until that function is used.

``````
                 +-----> Full-codegen -----> Unoptimized code
                /
Parser ---> AST +-----> Crankshaft   -----> Optimized code
                \
                 +-----> Turbofan    -----> Optimized code
``````

Inline Caching (IC) is done here, which also helps to gather type information. V8 also has a profiler thread which monitors which functions are hot and should be optimized. This profiling also allows V8 to find out information about types using IC. This type information can then be fed to Crankshaft/Turbofan. The type information is stored as an 8-bit value.

When a function is optimized the unoptimized code cannot be thrown away, as it might still be needed: JavaScript is highly dynamic, so the optimized function might have to be deoptimized, in which case we fall back to the unoptimized code. This takes up a lot of memory, which may be important for low-end devices. Also, the time spent parsing (twice) adds up.

The idea with Ignition is to be a bytecode interpreter and to reduce memory consumption; bytecode is very concise compared to native code, which can vary depending on the target platform. The whole source can be parsed and compiled, compared to the current pipeline that has the pre-parse and parse stages mentioned above, so even unused functions will get compiled. The bytecode becomes the source of truth instead of, as before, the AST.

``````Source ------> Parser  --------> Ignition-codegen ---------> Bytecode ---------> Turbofan ----> Optimized Code ---+
                                                                  /\                                               |
                                                                  +-----------------------------------------------+

function bajja(a, b, c) {
  var d = c - 100;
  return a + d * b;
}

var result = bajja(2, 2, 150);
print(result);

\$ ./d8 test.js --ignition  --print_bytecode

[generating bytecode for function: bajja]
Parameter count 4
Frame size 8
14 E> 0x2eef8d9b103e @    0 : 7f                StackCheck
38 S> 0x2eef8d9b103f @    1 : 03 64             LdaSmi [100]   // load 100
38 E> 0x2eef8d9b1041 @    3 : 2b 02 02          Sub a2, [2]    // a2 is the third argument. a2 is an argument register
0x2eef8d9b1044 @    6 : 1f fa             Star r0        // r0 is a register for local variables. We only have one which is d
47 S> 0x2eef8d9b1046 @    8 : 1e 03             Ldar a1        // LoaD accumulator from Register argument a1 which is b
60 E> 0x2eef8d9b1048 @   10 : 2c fa 03          Mul r0, [3]    // multiply that is our local variable in r0
56 E> 0x2eef8d9b104b @   13 : 2a 04 04          Add a0, [4]    // add that to our argument register 0 which is a
65 S> 0x2eef8d9b104e @   16 : 83                Return         // return the value in the accumulator?
``````

### Abstract Syntax Tree (AST)

The AST code is in src/ast/ast.h. You can print the AST using the `--print-ast` option of d8.

Let's take the following JavaScript and look at the AST:

``````const msg = 'testing';
console.log(msg);
``````
``````\$ d8 --print-ast simple.js
[generating interpreter code for user-defined function: ]
--- AST ---
FUNC at 0
. KIND 0
. SUSPEND COUNT 0
. NAME ""
. INFERRED NAME ""
. DECLS
. . VARIABLE (0x7ffe5285b0f8) (mode = CONST) "msg"
. BLOCK NOCOMPLETIONS at -1
. . EXPRESSION STATEMENT at 12
. . . INIT at 12
. . . . VAR PROXY context[4] (0x7ffe5285b0f8) (mode = CONST) "msg"
. . . . LITERAL "testing"
. EXPRESSION STATEMENT at 23
. . ASSIGN at -1
. . . VAR PROXY local[0] (0x7ffe5285b330) (mode = TEMPORARY) ".result"
. . . CALL Slot(0)
. . . . PROPERTY Slot(4) at 31
. . . . . VAR PROXY Slot(2) unallocated (0x7ffe5285b3d8) (mode = DYNAMIC_GLOBAL) "console"
. . . . . NAME log
. . . . VAR PROXY context[4] (0x7ffe5285b0f8) (mode = CONST) "msg"
. RETURN at -1
. . VAR PROXY local[0] (0x7ffe5285b330) (mode = TEMPORARY) ".result"
``````

You can find the declaration of EXPRESSION in ast.h.

### Bytecode

Can be found in `src/interpreter/bytecodes.h`

• `StackCheck` checks that stack limits are not exceeded, to guard against overflow.
• `Star` stores the contents of the accumulator register in the register operand.
• `Ldar` loads the accumulator from a register (for example `Ldar a1` loads argument `a1`, which is `b`).

The registers are not machine registers (apart from the accumulator, as I understand it) but are instead stack allocated.

#### Parsing

Parsing here means parsing the JavaScript source and generating the abstract syntax tree; that tree is then visited and bytecode is generated from it. This section tries to figure out where in the code these operations are performed.

For example, take the script example.

``````\$ make run-script
\$ lldb -- run-script
(lldb) br s -n main
(lldb) r
``````

Let's take a look at the following line:

``````Local<Script> script = Script::Compile(context, source).ToLocalChecked();
``````

This will land us in `api.cc`

``````ScriptCompiler::Source script_source(source);
return ScriptCompiler::Compile(context, &script_source);

MaybeLocal<Script> ScriptCompiler::Compile(Local<Context> context, Source* source, CompileOptions options) {
...
auto isolate = context->GetIsolate();
auto maybe = CompileUnboundInternal(isolate, source, options);
``````

`CompileUnboundInternal` will call `GetSharedFunctionInfoForScript` (in src/compiler.cc):

``````result = i::Compiler::GetSharedFunctionInfoForScript(
str, name_obj, line_offset, column_offset, source->resource_options,
source_map_url, isolate->native_context(), NULL, &script_data, options,
i::NOT_NATIVES_CODE);

(lldb) br s -f compiler.cc -l 1259

LanguageMode language_mode = construct_language_mode(FLAG_use_strict);
(lldb) p language_mode
(v8::internal::LanguageMode) \$10 = SLOPPY
``````

`LanguageMode` can be found in src/globals.h and it is an enum with three values:

``````enum LanguageMode : uint32_t { SLOPPY, STRICT, LANGUAGE_END };
``````

`SLOPPY` mode, I assume, is the mode when there is no `"use strict";`. Remember that this can go inside a function and does not have to be at the top level of the file.

``````ParseInfo parse_info(script);
``````

There is a unit test that shows how a ParseInfo instance can be created and inspected.

This will call ParseInfo's constructor (in src/parsing/parse-info.cc), which will call `ParseInfo::InitFromIsolate`:

``````DCHECK_NOT_NULL(isolate);
set_hash_seed(isolate->heap()->HashSeed());
set_stack_limit(isolate->stack_guard()->real_climit());
set_unicode_cache(isolate->unicode_cache());
set_runtime_call_stats(isolate->counters()->runtime_call_stats());
set_ast_string_constants(isolate->ast_string_constants());
``````

I was curious about these ast_string_constants:

``````(lldb) p *ast_string_constants_
(const v8::internal::AstStringConstants) \$58 = {
zone_ = {
allocation_size_ = 1312
segment_bytes_allocated_ = 8192
position_ = 0x0000000105052538 <no value available>
limit_ = 0x0000000105054000 <no value available>
allocator_ = 0x0000000103e00080
name_ = 0x0000000101623a70 "../../src/ast/ast-value-factory.h:365"
sealed_ = false
}
string_table_ = {
v8::base::TemplateHashMapImpl<void *, void *, v8::base::HashEqualityThenKeyMatcher<void *, bool (*)(void *, void *)>, v8::base::DefaultAllocationPolicy> = {
map_ = 0x0000000105054000
capacity_ = 64
occupancy_ = 41
match_ = {
match_ = 0x000000010014b260 (libv8.dylib`v8::internal::AstRawString::Compare(void*, void*) at ast-value-factory.cc:122)
}
}
}
hash_seed_ = 500815076
anonymous_function_string_ = 0x0000000105052018
arguments_string_ = 0x0000000105052038
async_string_ = 0x0000000105052058
await_string_ = 0x0000000105052078
boolean_string_ = 0x0000000105052098
constructor_string_ = 0x00000001050520b8
default_string_ = 0x00000001050520d8
done_string_ = 0x00000001050520f8
dot_string_ = 0x0000000105052118
dot_for_string_ = 0x0000000105052138
dot_generator_object_string_ = 0x0000000105052158
dot_iterator_string_ = 0x0000000105052178
dot_result_string_ = 0x0000000105052198
dot_switch_tag_string_ = 0x00000001050521b8
dot_catch_string_ = 0x00000001050521d8
empty_string_ = 0x00000001050521f8
eval_string_ = 0x0000000105052218
function_string_ = 0x0000000105052238
get_space_string_ = 0x0000000105052258
length_string_ = 0x0000000105052278
let_string_ = 0x0000000105052298
name_string_ = 0x00000001050522b8
native_string_ = 0x00000001050522d8
new_target_string_ = 0x00000001050522f8
next_string_ = 0x0000000105052318
number_string_ = 0x0000000105052338
object_string_ = 0x0000000105052358
proto_string_ = 0x0000000105052378
prototype_string_ = 0x0000000105052398
return_string_ = 0x00000001050523b8
set_space_string_ = 0x00000001050523d8
star_default_star_string_ = 0x00000001050523f8
string_string_ = 0x0000000105052418
symbol_string_ = 0x0000000105052438
this_string_ = 0x0000000105052458
this_function_string_ = 0x0000000105052478
throw_string_ = 0x0000000105052498
undefined_string_ = 0x00000001050524b8
use_asm_string_ = 0x00000001050524d8
use_strict_string_ = 0x00000001050524f8
value_string_ = 0x0000000105052518
}
``````

So these are constants that are set on the new ParseInfo instance using the values from the isolate. Not exactly sure what I want with this but I might come back to it later. So, we are back in ParseInfo's constructor:

``````set_allow_lazy_parsing();
set_toplevel();
set_script(script);
``````

Script is of type v8::internal::Script which can be found in src/objects/script.h.

Back now in compiler.cc and the GetSharedFunctionInfoForScript function:

``````Zone compile_zone(isolate->allocator(), ZONE_NAME);

...
if (parse_info->literal() == nullptr && !parsing::ParseProgram(parse_info, isolate))
``````

`ParseProgram`:

``````Parser parser(info);
...
FunctionLiteral* result = nullptr;
result = parser.ParseProgram(isolate, info);
``````

`parser.ParseProgram`:

``````Handle<String> source(String::cast(info->script()->source()));

(lldb) job *source
"var user1 = new Person('Fletch');\x0avar user2 = new Person('Dr.Rosen');\x0aprint("user1 = " + user1.name);\x0aprint("user2 = " + user2.name);\x0a\x0a"
``````

So here we can see our JavaScript as a String.

``````std::unique_ptr<Utf16CharacterStream> stream(ScannerStream::For(source));
scanner_.Initialize(stream.get(), info->is_module());
result = DoParseProgram(info);
``````

`DoParseProgram`:

``````(lldb) br s -f parser.cc -l 639
...

this->scope()->SetLanguageMode(info->language_mode());
ParseStatementList(body, Token::EOS, &ok);
``````

This call will land in parser-base.h and its `ParseStatementList` function.

``````(lldb) br s -f parser-base.h -l 4695

StatementT stat = ParseStatementListItem(CHECK_OK_CUSTOM(Return, kLazyParsingComplete));

result = CompileToplevel(&parse_info, isolate, Handle<SharedFunctionInfo>::null());
``````

This will land in `CompileToplevel` (in the same file, src/compiler.cc):

``````// Compile the code.
result = CompileUnoptimizedCode(parse_info, shared_info, isolate);
``````

This will land in `CompileUnoptimizedCode` (also in src/compiler.cc):

``````// Prepare and execute compilation of the outer-most function.
std::unique_ptr<CompilationJob> outer_job(
PrepareAndExecuteUnoptimizedCompileJob(parse_info, parse_info->literal(),
shared_info, isolate));

std::unique_ptr<CompilationJob> job(
interpreter::Interpreter::NewCompilationJob(parse_info, literal, isolate));
if (job->PrepareJob() == CompilationJob::SUCCEEDED &&
job->ExecuteJob() == CompilationJob::SUCCEEDED) {
return job;
}
``````

PrepareJobImpl:

``````CodeGenerator::MakeCodePrologue(parse_info(), compilation_info(),
"interpreter");
return SUCCEEDED;
``````

codegen.cc `MakeCodePrologue`:

interpreter.cc ExecuteJobImpl:

``````generator()->GenerateBytecode(stack_limit());
``````

src/interpreter/bytecode-generator.cc

`````` RegisterAllocationScope register_scope(this);
``````

The bytecode is register based (if that is the correct term) and we had an example previously. I'm guessing that this is what this call is about.

VisitDeclarations will iterate over all the declarations in the file which in our case are:

``````var user1 = new Person('Fletch');
var user2 = new Person('Dr.Rosen');

(lldb) p *variable->raw_name()
(const v8::internal::AstRawString) \$33 = {
= {
next_ = 0x000000010600a280
string_ = 0x000000010600a280
}
literal_bytes_ = (start_ = "user1", length_ = 5)
hash_field_ = 1303438034
is_one_byte_ = true
has_string_ = false
}

// Perform a stack-check before the body.
builder()->StackCheck(info()->literal()->start_position());
``````

So that call will output a stackcheck instruction, like in the example above:

``````14 E> 0x2eef8d9b103e @    0 : 7f                StackCheck
``````

### Performance

Say you have the expression `x + y`. The full-codegen compiler might produce something like the following (the call target name is illustrative; the point is that a generic, type-checking add is invoked):

``````movq rax, x
movq rbx, y
callq Builtin_Add        ; generic add, handles any operand types
``````

If x and y are known to be integers, just using the `add` instruction would be much quicker:

``````movq rax, x
movq rbx, y
addq rax, rbx
``````

Recall that functions are optimized as a whole, so if the compiler has to bail out and deoptimize part of a function then the whole function is affected and goes back to the unoptimized version.

## Bytecode

This section will examine the bytecode for the following JavaScript:

``````function beve() {
const p = new Promise((resolve, reject) => {
resolve('ok');
});

p.then(msg => {
console.log(msg);
});
}

beve();

\$ d8 --print-bytecode promise.js
``````

First have the main function which does not have a name:

``````[generating bytecode for function: ]
(The code that generated this can be found in src/objects.cc BytecodeArray::Disassemble)
Parameter count 1
Frame size 32
// load whatever the FixedArray[4] in the constant pool is into the accumulator.
0x34423e7ac19e @    0 : 09 00             LdaConstant [0]
// store the FixedArray[4] in register r1
0x34423e7ac1a0 @    2 : 1e f9             Star r1
// store zero into the accumulator.
0x34423e7ac1a2 @    4 : 02                LdaZero
// store zero (the contents of the accumulator) into register r2.
0x34423e7ac1a3 @    5 : 1e f8             Star r2
//
0x34423e7ac1a5 @    7 : 1f fe f7          Mov <closure>, r3
0x34423e7ac1a8 @   10 : 53 96 01 f9 03    CallRuntime [DeclareGlobalsForInterpreter], r1-r3
0 E> 0x34423e7ac1ad @   15 : 90                StackCheck
141 S> 0x34423e7ac1ae @   16 : 0a 01 00          LdaGlobal [1], [0]
0x34423e7ac1b1 @   19 : 1e f9             Star r1
141 E> 0x34423e7ac1b3 @   21 : 4f f9 03          CallUndefinedReceiver0 r1, [3]
0x34423e7ac1b6 @   24 : 1e fa             Star r0
148 S> 0x34423e7ac1b8 @   26 : 94                Return

Constant pool (size = 2)
0x34423e7ac149: [FixedArray] in OldSpace
- map = 0x344252182309 <Map(HOLEY_ELEMENTS)>
- length: 2
0: 0x34423e7ac069 <FixedArray[4]>
1: 0x34423e7abf59 <String[4]: beve>

Handler Table (size = 16)
``````
• LdaConstant Load the constant at index from the constant pool into the accumulator.
• Star Store the contents of the accumulator register in dst.
• Ldar Load accumulator with value from register src.
• LdaGlobal Load the global with name in constant pool entry idx into the accumulator using FeedBackVector slot outside of a typeof.
• Mov src, dst Store the value of register src in register dst.

You can find the declarations for these instructions in `src/interpreter/interpreter-generator.cc`.

## FeedbackVector

Is attached to every function and is responsible for recording and managing all execution feedback, which is information about observed types that enables later optimization. You can find the declaration for this class in `src/feedback-vector.h`

## BytecodeGenerator

Is currently the only part of V8 that cares about the AST.

## BytecodeGraphBuilder

Produces high-level IR graph based on interpreter bytecodes.

## TurboFan

Is a compiler backend that gets fed a control flow graph and then does instruction selection, register allocation and code generation. The code generation stage emits machine code for the target architecture.

### Execution/Runtime

I'm not sure if V8 follows this exactly, but I've heard and read that when the engine comes across a function declaration it only parses and verifies the syntax and saves a reference to the function name. The statements inside the function are not checked at this stage, only the syntax of the function declaration (parentheses, arguments, brackets etc.).

### Function methods

The declaration of Function can be found in `include/v8.h` (just noting this as I've looked for it several times)

### Symbol

The declarations for the Symbol class can be found in `v8.h` and the internal implementation in `src/api/api.cc`.

The well-known Symbols are generated using macros, so you won't find them just by searching for the static function names like `GetToPrimitive`.

``````#define WELL_KNOWN_SYMBOLS(V)                 \
V(AsyncIterator, async_iterator)            \
V(HasInstance, has_instance)                \
V(Iterator, iterator)                       \
V(Match, match)                             \
V(Replace, replace)                         \
V(Search, search)                           \
V(Split, split)                             \
V(ToPrimitive, to_primitive)                \
V(ToStringTag, to_string_tag)               \
V(Unscopables, unscopables)

#define SYMBOL_GETTER(Name, name)                                   \
Local<Symbol> v8::Symbol::Get##Name(Isolate* isolate) {           \
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate); \
return Utils::ToLocal(i_isolate->factory()->name##_symbol());   \
}
``````

So GetToPrimitive would become:

``````Local<Symbol> v8::Symbol::GetToPrimitive(Isolate* isolate) {
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
return Utils::ToLocal(i_isolate->factory()->to_primitive_symbol());
}
``````

There is an example in symbol-test.cc.

## Builtins

Are JavaScript functions/objects that are provided by V8. These are built using a C++ DSL and are passed through:

``````CodeStubAssembler -> CodeAssembler -> RawMachineAssembler.
``````

Builtins need to have bytecode generated for them so that they can be run in TurboFan.

`src/code-stub-assembler.h`

All the builtins are declared in `src/builtins/builtins-definitions.h` by the `BUILTIN_LIST_BASE` macro. There are different types of builtins (TF = TurboFan):

TFJ JavaScript linkage which means it is callable as a JavaScript function

TFS CodeStub linkage. A builtin with stub linkage can be used to extract common code into a separate code object which can then be used by multiple callers. This is useful because builtins are generated at compile time and included in the V8 snapshot. This means that they are part of every isolate that is created. Being able to share common code between multiple builtins saves space.

TFC CodeStub linkage with custom descriptor

To see how this works in action we first need to disable snapshots. If we don't, we won't be able to set breakpoints as the heap will be serialized at compile time and deserialized upon startup of v8.

To find the option to disable snapshots use:

``````\$ gn args --list out.gn/learning --short | more
...
v8_use_snapshot=true
\$ gn args out.gn/learning
v8_use_snapshot=false
\$ ninja -C out.gn/learning
``````

After building we should be able to set a break point in bootstrapper.cc and its function `Genesis::InitializeGlobal`:

``````(lldb) br s -f bootstrapper.cc -l 2684
``````

Lets take a look at how the `JSON` object is setup:

``````Handle<String> name = factory->InternalizeUtf8String("JSON");
Handle<JSObject> json_object = factory->NewJSObject(isolate->object_function(), TENURED);
``````

`TENURED` means that this object should be allocated directly in the old generation.

``````JSObject::AddProperty(global, name, json_object, DONT_ENUM);
``````

`DONT_ENUM` is checked by some builtin functions and if set this object will be ignored by those functions.

``````SimpleInstallFunction(json_object, "parse", Builtins::kJsonParse, 2, false);
``````

Here we can see that we are installing a function named `parse`, which takes 2 parameters. You can find the definition in src/builtins/builtins-json.cc. What does the `SimpleInstallFunction` do?

Lets take `console` as an example which was created using:

``````Handle<JSObject> console = factory->NewJSObject(cons, TENURED);
SimpleInstallFunction(console, "debug", Builtins::kConsoleDebug, 1, false,
NONE);

V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
Handle<JSObject> base,
const char* name,
Builtins::Name call,
int len,
PropertyAttributes attrs = DONT_ENUM,
BuiltinFunctionId id = kInvalidBuiltinFunctionId) {
``````

So we can see that base is our Handle to a JSObject, and name is "debug". Builtins::Name is Builtins::kConsoleDebug. Where is this defined?
You can find a macro named `CPP` in `src/builtins/builtins-definitions.h`:

CPP(ConsoleDebug)

What does this macro expand to?
It is part of the `BUILTIN_LIST_BASE` macro in builtin-definitions.h. We have to look at where BUILTIN_LIST is used, which we can find in builtins.cc. In `builtins.cc` we have an array of `BuiltinMetadata` which is declared as:

``````const BuiltinMetadata builtin_metadata[] = {
BUILTIN_LIST(DECL_CPP, DECL_API, DECL_TFJ, DECL_TFC, DECL_TFS, DECL_TFH, DECL_ASM)
};

#define DECL_CPP(Name, ...) { #Name, Builtins::CPP, \
``````

Which will expand to the creation of a BuiltinMetadata struct entry in the array. The BuiltinMetadata struct looks like this, which might help understand what is going on:

``````struct BuiltinMetadata {
const char* name;
Builtins::Kind kind;
union {
Address cpp_entry;       // For CPP and API builtins.
int8_t parameter_count;  // For TFJ builtins.
} kind_specific_data;
};
``````

So the `CPP(ConsoleDebug)` will expand to an entry in the array which would look something like this:

``````{ "ConsoleDebug",
Builtins::CPP,
{
}
},
``````

The third parameter is the initialization of the union, which might not be obvious.

Back to the question I'm trying to answer:
"Builtins::Name is Builtins::kConsoleDebug. Where is this defined?"
For this we have to look at `builtins.h` and the enum Name:

``````enum Name : int32_t {
#define DEF_ENUM(Name, ...) k##Name,
BUILTIN_LIST_ALL(DEF_ENUM)
#undef DEF_ENUM
builtin_count
};
``````

This will expand to the complete list of builtins in builtin-definitions.h using the DEF_ENUM macro. So the expansion for ConsoleDebug will look like:

``````enum Name: int32_t {
...
kConsoleDebug,
...
};
``````

So backing up to looking at the arguments to SimpleInstallFunction which are:

``````SimpleInstallFunction(console, "debug", Builtins::kConsoleDebug, 1, false,
NONE);

V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
Handle<JSObject> base,
const char* name,
Builtins::Name call,
int len,
PropertyAttributes attrs = DONT_ENUM,
BuiltinFunctionId id = kInvalidBuiltinFunctionId) {
``````

We know about `Builtins::Name`, so let's look at `len`, which is 1 here. What is this?
SimpleInstallFunction will call:

``````Handle<JSFunction> fun =
    SimpleCreateFunction(base->GetIsolate(), name_string, call, len, adapt);
``````

`len` would be used if adapt was true but it is false in our case. This is what it would be used for if adapt was true:

``````fun->shared()->set_internal_formal_parameter_count(len);
``````

I'm not exactly sure what adapt is referring to here.

PropertyAttributes is not specified so it will get the default value of `DONT_ENUM`. The last parameter which is of type BuiltinFunctionId is not specified either so the default value of `kInvalidBuiltinFunctionId` will be used. This is an enum defined in `src/objects/objects.h`.

This blog provides an example of adding a function to the String object.

``````\$ out.gn/learning/mksnapshot --print-code > output
``````

You can then see the generated code from this. This will produce a code stub that can be called through C++. Lets update this to have it be called from JavaScript:

Update builtins/builtins-string-get.cc :

``````TF_BUILTIN(GetStringLength, StringBuiltinsAssembler) {
}
``````

We also have to update builtins/builtins-definitions.h:

``````TFJ(GetStringLength, 0)
``````

And bootstrapper.cc:

``````SimpleInstallFunction(prototype, "len", Builtins::kGetStringLength, 0, true);
``````

If you now build using `ninja -C out.gn/learning` you should be able to run d8 and try this out:

``````d8> const s = 'testing'
undefined
d8> s.len()
7
``````

Now lets take a closer look at the code that is generated for this:

``````\$ out.gn/learning/mksnapshot --print-code > output
``````

Looking at the output generated I was surprised to see two entries for GetStringLength (I changed the name just to make sure there was not something else generating the second one). Why two?

The following uses Intel assembly syntax, which means there are no register/immediate prefixes (no `%` or `$`), the first operand is the destination, and the second operand is the source.

``````--- Code ---
kind = BUILTIN
name = BeveStringLength
compiler = turbofan
Instructions (size = 136)
0x1fafde09b3a0     0  55             push rbp
0x1fafde09b3a1     1  4889e5         REX.W movq rbp,rsp                  // movq rsp into rbp

0x1fafde09b3a4     4  56             push rsi                            // push the value of rsi (first parameter) onto the stack
0x1fafde09b3a5     5  57             push rdi                            // push the value of rdi (second parameter) onto the stack
0x1fafde09b3a6     6  50             push rax                            // push the value of rax (accumulator) onto the stack

0x1fafde09b3a7     7  4883ec08       REX.W subq rsp,0x8                  // make room for an 8-byte value on the stack
0x1fafde09b3ab     b  488b4510       REX.W movq rax,[rbp+0x10]           // move the value at rbp + 0x10 to rax
0x1fafde09b3af     f  488b58ff       REX.W movq rbx,[rax-0x1]
0x1fafde09b3b3    13  807b0b80       cmpb [rbx+0xb],0x80                // IsString(object): compare the instance-type byte
0x1fafde09b3b7    17  0f8350000000   jnc 0x1fafde09b40d  <+0x6d>        // jump if the carry flag is not set

0x1fafde09b3bd    1d  488b400f       REX.W movq rax,[rax+0xf]
0x1fafde09b3c1    21  4989e2         REX.W movq r10,rsp
0x1fafde09b3c4    24  4883ec08       REX.W subq rsp,0x8
0x1fafde09b3c8    28  4883e4f0       REX.W andq rsp,0xf0
0x1fafde09b3cc    2c  4c891424       REX.W movq [rsp],r10
0x1fafde09b3d0    30  488945e0       REX.W movq [rbp-0x20],rax
0x1fafde09b3d4    34  48be0000000001000000 REX.W movq rsi,0x100000000
0x1fafde09b3de    3e  48bad9c228dfa8090000 REX.W movq rdx,0x9a8df28c2d9    ;; object: 0x9a8df28c2d9 <String[101]: CAST(LoadObjectField(object, offset, MachineTypeOf<T>::value)) at ../../src/code-stub-assembler.h:432>
0x1fafde09b3e8    48  488bf8         REX.W movq rdi,rax
0x1fafde09b3eb    4b  48b830726d0a01000000 REX.W movq rax,0x10a6d7230    ;; external reference (check_object_type)
0x1fafde09b3f5    55  40f6c40f       testb rsp,0xf
0x1fafde09b3f9    59  7401           jz 0x1fafde09b3fc  <+0x5c>
0x1fafde09b3fb    5b  cc             int3l
0x1fafde09b3fc    5c  ffd0           call rax
0x1fafde09b3fe    5e  488b2424       REX.W movq rsp,[rsp]
0x1fafde09b402    62  488b45e0       REX.W movq rax,[rbp-0x20]
0x1fafde09b406    66  488be5         REX.W movq rsp,rbp
0x1fafde09b409    69  5d             pop rbp
0x1fafde09b40a    6a  c20800         ret 0x8

0x1fafde09b40d    6d  48ba71c228dfa8090000 REX.W movq rdx,0x9a8df28c271    ;; object: 0x9a8df28c271 <String[76]\: CSA_ASSERT failed: IsString(object) [../../src/code-stub-assembler.cc:1498]\n>
0x1fafde09b417    77  e8e4d1feff     call 0x1fafde088600     ;; code: BUILTIN
0x1fafde09b41c    7c  cc             int3l
0x1fafde09b41d    7d  cc             int3l
0x1fafde09b41e    7e  90             nop
0x1fafde09b41f    7f  90             nop

Safepoints (size = 8)

RelocInfo (size = 7)
0x1fafde09b3e0  embedded object  (0x9a8df28c2d9 <String[101]: CAST(LoadObjectField(object, offset, MachineTypeOf<T>::value)) at ../../src/code-stub-assembler.h:432>)
0x1fafde09b3ed  external reference (check_object_type)  (0x10a6d7230)
0x1fafde09b40f  embedded object  (0x9a8df28c271 <String[76]\: CSA_ASSERT failed: IsString(object) [../../src/code-stub-assembler.cc:1498]\n>)
0x1fafde09b418  code target (BUILTIN)  (0x1fafde088600)

--- End code ---
``````

### TF_BUILTIN macro

Is a macro for defining TurboFan (TF) builtins; it can be found in `builtins/builtins-utils-gen.h`

If we take a look at the file src/builtins/builtins-bigint-gen.cc and the following function:

``````TF_BUILTIN(BigIntToI64, CodeStubAssembler) {
if (!Is64()) {
Unreachable();
return;
}

TNode<Object> value = CAST(Parameter(Descriptor::kArgument));
TNode<Context> context = CAST(Parameter(Descriptor::kContext));
TNode<BigInt> n = ToBigInt(context, value);

TVARIABLE(UintPtrT, var_low);
TVARIABLE(UintPtrT, var_high);

BigIntToRawBytes(n, &var_low, &var_high);
Return(var_low.value());
}
``````

Let's take the `BigIntToI64` example above and see what it will be expanded to after processing this macro:

``````\$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -E src/builtins/builtins-bigint-gen.cc > builtins-bigint-gen.cc.pp
``````
``````static void Generate_BigIntToI64(compiler::CodeAssemblerState* state);

class BigIntToI64Assembler : public CodeStubAssembler {
public:
using Descriptor = Builtin_BigIntToI64_InterfaceDescriptor;
explicit BigIntToI64Assembler(compiler::CodeAssemblerState* state) : CodeStubAssembler(state) {}
void GenerateBigIntToI64Impl();
Node* Parameter(Descriptor::ParameterIndices index) {
return CodeAssembler::Parameter(static_cast<int>(index));
}
};

void Builtins::Generate_BigIntToI64(compiler::CodeAssemblerState* state) {
BigIntToI64Assembler assembler(state);
state->SetInitialDebugInformation("BigIntToI64", "src/builtins/builtins-bigint-gen.cc", 14);
if (Builtins::KindOf(Builtins::kBigIntToI64) == Builtins::TFJ) {
assembler.PerformStackCheck(assembler.GetJSContextParameter());
}
assembler.GenerateBigIntToI64Impl();
}
void BigIntToI64Assembler::GenerateBigIntToI64Impl() {
if (!Is64()) {
Unreachable();
return;
}

TNode<Object> value = Cast(Parameter(Descriptor::kArgument));
TNode<Context> context = Cast(Parameter(Descriptor::kContext));
TNode<BigInt> n = ToBigInt(context, value);

TVariable<UintPtrT> var_low(this);
TVariable<UintPtrT> var_high(this);

BigIntToRawBytes(n, &var_low, &var_high);
Return(var_low.value());
}
``````

From the resulting class you can see how `Parameter` can be used from within `TF_BUILTIN` macro.

## Building V8

You'll need to have checked out the Google V8 sources to your local file system and built them by following the instructions found here.

### Configure v8 build for learning-v8

There is a make target that can generate a build configuration for V8 that is specific to this project. It can be run using the following command:

``````\$ make configure_v8
``````

Then to compile this configuration:

``````\$ make compile_v8
``````

### gclient sync

``````\$ gclient sync
``````

#### Troubleshooting build:

``````/v8_src/v8/out/x64.release/obj/libv8_monolith.a(eh-frame.o):eh-frame.cc:function v8::internal::EhFrameWriter::WriteEmptyEhFrame(std::__1::basic_ostream<char, std::__1::char_traits<char> >&): error: undefined reference to 'std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
``````

`-stdlib=libc++` is LLVM's C++ runtime. This runtime has a `__1` namespace. It looks like the static library above was compiled with clang/LLVM's `libc++`, as we are seeing the `__1` namespace.

`-stdlib=libstdc++` is GNU's C++ runtime.

So we can see that the namespace `std::__1` is used, which we now know is the namespace of clang's libc++ library. I guess we could go about this in two ways: either change the V8 build to use libstdc++ when compiling so that the symbols match when we want to link against it, or update our link step to use libc++.

We need to include the correct libraries to link with during linking, which means specifying:

``````-stdlib=libc++ -Wl,-L\$(v8_build_dir)
``````

If we look in \$(v8_build_dir) we find `libc++.so`. We also need this library to be found at runtime by the dynamic linker, using `LD_LIBRARY_PATH`:

``````\$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release/ ./hello-world
``````

Notice that this is using `ld` from our path. We can tell clang to use a different search path with the `-B` option:

``````\$ clang++ --help | grep -- '-B'
-B <dir>                Add <dir> to search path for binaries and object files used implicitly
``````

`libgcc_s` is GCC's low-level runtime library. I've been confusing this with the GNU C++ libraries for some reason, but they are not the same.

Running cctest:

``````\$ out.gn/learning/cctest test-heap-profiler/HeapSnapshotRetainedObjectInfo
``````

To get a list of the available tests:

``````\$ out.gn/learning/cctest --list
``````

Checking formatting/linting:

``````\$ git cl format
``````

You can then `git diff` and see the changes.

Running pre-submit checks:

``````\$ git cl presubmit
``````

``````\$ git cl upload
``````

#### Build details

So when we run gn it will generate Ninja build files. GN itself is written in C++ but has a python wrapper around it.

A group in gn is just a collection of other targets which enables them to have a name.

So when we run gn there will be a number of .ninja files generated. If we look in the root of the output directory we find two .ninja files:

``````build.ninja  toolchain.ninja
``````

By default ninja will look for `build.ninja` and when we run ninja we usually specify `-C out/dir`. If no targets are specified on the command line, ninja will build all outputs unless one is specified as the default. V8 has the following default target:

``````default all

build all: phony \$
./bytecode_builtins_list_generator \$
./d8 \$
obj/fuzzer_support.stamp \$
./gen-regexp-special-case \$
obj/generate_bytecode_builtins_list.stamp \$
obj/gn_all.stamp \$
obj/json_fuzzer.stamp \$
obj/lib_wasm_fuzzer_common.stamp \$
./mksnapshot \$
obj/multi_return_fuzzer.stamp \$
obj/parser_fuzzer.stamp \$
obj/regexp_builtins_fuzzer.stamp \$
obj/regexp_fuzzer.stamp \$
obj/run_gen-regexp-special-case.stamp \$
obj/run_mksnapshot_default.stamp \$
obj/run_torque.stamp \$
./torque \$
./torque-language-server \$
obj/torque_base.stamp \$
obj/torque_generated_definitions.stamp \$
obj/torque_generated_initializers.stamp \$
obj/torque_ls_base.stamp \$
./libv8.so.TOC \$
obj/v8_archive.stamp \$
...
``````

A `phony` rule can be used to create an alias for other targets. The `\$` in ninja is an escape character so in the case of the all target it escapes the new line, like using \ in a shell script.

Lets take a look at `bytecode_builtins_list_generator`:

``````build \$:bytecode_builtins_list_generator: phony ./bytecode_builtins_list_generator
``````

The format of the ninja build statement is:

``````build outputs: rulename inputs
``````

We are again seeing the `\$` ninja escape character, but this time it is escaping the colon which would otherwise be interpreted as separating file names. The output in this case is bytecode_builtins_list_generator. And I'm guessing, as I can't find a direct connection between `./bytecode_builtins_list_generator` and a rule in build.ninja itself, that it must be defined elsewhere.

The default `target_out_dir` in this case is //out/x64.release_gcc/obj. The executable target in BUILD.gn which generates this does not specify any output directory, so I'm assuming the generated .ninja file is placed in the `target_out_dir`, which is where we can find `bytecode_builtins_list_generator.ninja`. This file has a label named:

``````label_name = bytecode_builtins_list_generator
``````

Hmm, notice that in build.ninja there is the following command:

``````subninja toolchain.ninja
``````

And in `toolchain.ninja` we have:

``````subninja obj/bytecode_builtins_list_generator.ninja
``````

This is what is making `./bytecode_builtins_list_generator` available.

``````\$ ninja -C out/x64.release_gcc/ -t targets all  | grep bytecode_builtins_list_generator
\$ rm out/x64.release_gcc/bytecode_builtins_list_generator
\$ ninja -C out/x64.release_gcc/ bytecode_builtins_list_generator
ninja: Entering directory `out/x64.release_gcc/'
``````

Alright, so I'd like to understand when in the process torque is run to generate classes like TorqueGeneratedStruct:

``````class Struct : public TorqueGeneratedStruct<Struct, HeapObject> {
``````
``````./torque \$
./torque-language-server \$
obj/torque_base.stamp \$
obj/torque_generated_definitions.stamp \$
obj/torque_generated_initializers.stamp \$
obj/torque_ls_base.stamp \$
``````

Like before, we can find that obj/torque.ninja is included by the subninja command in toolchain.ninja:

``````subninja obj/torque.ninja
``````

So this is building the executable `torque`, but it has not been run yet.

``````\$ gn ls out/x64.release_gcc/ --type=action
//:generate_bytecode_builtins_list
//:run_gen-regexp-special-case
//:run_mksnapshot_default
//:run_torque
//:v8_dump_build_config
//src/inspector:protocol_compatibility
//src/inspector:protocol_generated_sources
//tools/debug_helper:gen_heap_constants
//tools/debug_helper:run_mkgrokdump
``````

Notice the `run_torque` target

``````\$ gn desc out/x64.release_gcc/ //:run_torque
``````

If we look in toolchain.ninja we have a rule named `___run_torque___build_toolchain_linux_x64__rule`

``````command = python ../../tools/run.py ./torque -o gen/torque-generated -v8-root ../..
src/builtins/array-copywithin.tq
src/builtins/array-every.tq
src/builtins/array-filter.tq
src/builtins/array-find.tq
...
``````

And there is a build statement that lists the .h and .cc files in gen/torque-generated as outputs of this rule, so they will be regenerated when the inputs change.

## Building chromium

When making changes to V8 you might need to verify that your changes have not broken anything in Chromium.

Generate Your Projects (gyp): You'll have to run this once before building:

``````\$ gclient sync
\$ gclient runhooks
``````

#### Update the code base

``````\$ git fetch origin master
\$ git co master
\$ git merge origin/master
``````

### Building using GN

``````\$ gn args out.gn/learning
``````

### Building using Ninja

``````\$ ninja -C out.gn/learning
``````

Building the tests:

``````\$ ninja -C out.gn/learning chrome/test:unit_tests
``````

An error I got when building the first time:

``````traceback (most recent call last):
File "./gyp-mac-tool", line 713, in <module>
sys.exit(main(sys.argv[1:]))
File "./gyp-mac-tool", line 29, in main
exit_code = executor.Dispatch(args)
File "./gyp-mac-tool", line 44, in Dispatch
return getattr(self, method)(*args[1:])
File "./gyp-mac-tool", line 68, in ExecCopyBundleResource
self._CopyStringsFile(source, dest)
File "./gyp-mac-tool", line 134, in _CopyStringsFile
import CoreFoundation
ImportError: No module named CoreFoundation
[6644/20987] ACTION base_nacl: build newlib plib_9b4f41e4158ebb93a5d28e6734a13e85
ninja: build stopped: subcommand failed.
``````

I was able to get around this by:

``````\$ pip install -U pyobjc
``````

#### Using a specific version of V8

The instructions below work, but it is also possible to create a soft link from chromium/src/v8 to a local v8 repository and then build/test.

So, we want to include our updated version of V8 so that we can verify that it builds correctly with our change to V8. While I'm not sure this is the proper way to do it, I was able to update DEPS in src (chromium) and set the v8 entry to git@github.com:danbev/v8.git@064718a8921608eaf9b5eadbb7d734ec04068a87:

``````"git@github.com:danbev/v8.git@064718a8921608eaf9b5eadbb7d734ec04068a87"
``````

You'll have to run `gclient sync` after this.

Another way is to not update the `DEPS` file, which is a version controlled file, but instead update `.gclientrc` and add a `custom_deps` entry:

``````solutions = [{u'managed': False, u'name': u'src', u'url': u'https://chromium.googlesource.com/chromium/src.git',
u'custom_deps': {
"src/v8": "git@github.com:danbev/v8.git@27a666f9be7ca3959c7372bdeeee14aef2a4b7ba"
}, u'deps_file': u'.DEPS.git', u'safesync_url': u''}]
``````

## Building pdfium

You may have to compile this project (in addition to chromium) to verify that changes in v8 are not breaking code in pdfium.

### Create/clone the project

``````\$ mkdir pdfium_repo
\$ gclient sync
\$ cd pdfium
``````

### Building

``````\$ ninja -C out/Default
``````

#### Using a branch of v8

You should be able to update the .gclient file adding a custom_deps entry:

``````solutions = [
{
"name"        : "pdfium",
"deps_file"   : "DEPS",
"managed"     : False,
"custom_deps" : {
},
},
]
cache_dir = None
``````

You'll have to run `gclient sync` after this too.

## Code in this repo

#### hello-world

hello-world is heavily commented and shows the usage of a static int being exposed and accessed from JavaScript.

#### instances

instances shows the usage of creating new instances of a C++ class from JavaScript.

#### run-script

run-script is basically the same as instances but reads an external file, `script.js`, and runs the script.

#### tests

The test directory contains unit tests for individual classes/concepts in V8 to help understand them.

## Building this project's code

``````\$ make
``````

## Running

``````\$ ./hello-world
``````

## Cleaning

``````\$ make clean
``````

## Contributing a change to V8

1. Create a working branch using `git new-branch name`

See Google's contributing-code documentation for more details.

### Find the current issue number

``````\$ git cl issue
``````

## Debugging

``````\$ lldb hello-world
(lldb) br s -f hello-world.cc -l 27
``````

There are a number of useful functions in `src/objects-printer.cc` which can also be used in lldb.

#### Print value of a Local object

``````(lldb) print _v8_internal_Print_Object(*(v8::internal::Object**)(*init_fn))
``````

#### Print stacktrace

``````(lldb) p _v8_internal_Print_StackTrace()
``````

#### Creating command aliases in lldb

Create a file named .lldbinit in your project directory or home directory. Such a file can now be found in v8's tools directory.
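For example, a `.lldbinit` could define shortcuts for the helpers shown above (the alias names here are my own invention, not v8's):

``````
# .lldbinit - hypothetical example aliases
# print the current stack trace via the helper from src/objects-printer.cc
command alias pst expr -- _v8_internal_Print_StackTrace()
# set a breakpoint by file and line, e.g.: bfl d8.cc 2662
command alias bfl breakpoint set -f %1 -l %2
``````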

### Using d8

This is the source used for the following examples:

``````\$ cat class.js
function Person(name, age) {
this.name = name;
this.age = age;
}

print("before");
const p = new Person("Daniel", 41);
print(p.name);
print(p.age);
print("after");
``````

### V8_shell startup

What happens when the v8_shell is run?

``````\$ lldb -- out/x64.debug/d8 --enable-inspector class.js
(lldb) breakpoint set --file d8.cc --line 2662
Breakpoint 1: where = d8`v8::Shell::Main(int, char**) + 96 at d8.cc:2662, address = 0x0000000100015150
``````

First `v8::base::debug::EnableInProcessStackDumping()` is called, followed by some Windows-specific code guarded by macros. Next, all the options are set using `v8::Shell::SetOptions`.

SetOptions will call `v8::V8::SetFlagsFromCommandLine` which is found in src/api.cc:

``````i::FlagList::SetFlagsFromCommandLine(argc, argv, remove_flags);
``````

This function can be found in src/flags.cc. The flags themselves are defined in src/flag-definitions.h

Next, a new SourceGroup array is created:

``````options.isolate_sources = new SourceGroup[options.num_isolates];
SourceGroup* current = options.isolate_sources;
current->Begin(argv, 1);
for (int i = 1; i < argc; i++) {
const char* str = argv[i];

(lldb) p str
(const char *) \$6 = 0x00007fff5fbfed4d "manual.js"
``````

There are then checks performed to see if the argument is `--isolate`, `--module`, or `-e`, and if not (as in our case):

``````} else if (strncmp(str, "-", 1) != 0) {
// Not a flag, so it must be a script to execute.
options.script_executed = true;
``````

TODO: I'm not exactly sure what SourceGroups are about but just noting this and will revisit later.

This will take us back to `int Shell::Main` in src/d8.cc:

``````::V8::InitializeICUDefaultLocation(argv[0], options.icu_data_file);

(lldb) p argv[0]
(char *) \$8 = 0x00007fff5fbfed48 "./d8"
``````

See ICU for a little more detail.

Next the default V8 platform is initialized:

``````g_platform = i::FLAG_verify_predictable ? new PredictablePlatform() : v8::platform::CreateDefaultPlatform();
``````

v8::platform::CreateDefaultPlatform() will be called in our case.

We are then back in Main and have the following lines:

``````2685 v8::V8::InitializePlatform(g_platform);
2686 v8::V8::Initialize();
``````

This is very similar to what I've seen in the Node.js startup process.

We did not specify any natives_blob or snapshot_blob as an option on the command line so the defaults will be used:

``````v8::V8::InitializeExternalStartupData(argv[0]);
``````

Back in src/d8.cc line 2918:

``````Isolate* isolate = Isolate::New(create_params);
``````

This call will bring us into api.cc line 8185:

`````` i::Isolate* isolate = new i::Isolate(false);
``````

So, we are invoking the Isolate constructor (in src/isolate.cc).

``````isolate->set_snapshot_blob(i::Snapshot::DefaultSnapshotBlob());
``````

api.cc:

``````isolate->Init(NULL);

compilation_cache_ = new CompilationCache(this);
context_slot_cache_ = new ContextSlotCache();
descriptor_lookup_cache_ = new DescriptorLookupCache();
unicode_cache_ = new UnicodeCache();
inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this);
global_handles_ = new GlobalHandles(this);
eternal_handles_ = new EternalHandles();
bootstrapper_ = new Bootstrapper(this);
handle_scope_implementer_ = new HandleScopeImplementer(this);
store_stub_cache_ = new StubCache(this, Code::STORE_IC);
materialized_object_store_ = new MaterializedObjectStore(this);
regexp_stack_ = new RegExpStack();
regexp_stack_->isolate_ = this;
date_cache_ = new DateCache();
call_descriptor_data_ =
new CallInterfaceDescriptorData[CallDescriptors::NUMBER_OF_DESCRIPTORS];
access_compiler_data_ = new AccessCompilerData();
cpu_profiler_ = new CpuProfiler(this);
heap_profiler_ = new HeapProfiler(heap());
interpreter_ = new interpreter::Interpreter(this);
compiler_dispatcher_ =
new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);
``````

src/builtins/builtins.cc, this is where the builtins are defined. TODO: sort out what these macros do.

In src/v8.cc there are a couple of checks for whether the options passed are for a stress run, but since we did not pass in any such flags, this code path is followed, which calls RunMain:

``````result = RunMain(isolate, argc, argv, last_run);
``````

this will end up calling:

``````options.isolate_sources[0].Execute(isolate);
``````

Which will call SourceGroup::Execute(Isolate* isolate)

``````// Use all other arguments as names of files to load and run.
HandleScope handle_scope(isolate);
Local<String> file_name = String::NewFromUtf8(isolate, arg, NewStringType::kNormal).ToLocalChecked();
if (source.IsEmpty()) {
Shell::Exit(1);
}
Shell::options.script_executed = true;
if (!Shell::ExecuteString(isolate, source, file_name, false, true)) {
exception_was_thrown = true;
break;
}

ScriptOrigin origin(name);
if (compile_options == ScriptCompiler::kNoCompileOptions) {
ScriptCompiler::Source script_source(source, origin);
return ScriptCompiler::Compile(context, &script_source, compile_options);
}
``````

Which will delegate to `ScriptCompiler::Compile(Local<Context> context, Source* source, CompileOptions options)`:

``````auto maybe = CompileUnboundInternal(isolate, source, options);
``````

CompileUnboundInternal

``````result = i::Compiler::GetSharedFunctionInfoForScript(
str, name_obj, line_offset, column_offset, source->resource_options,
source_map_url, isolate->native_context(), NULL, &script_data, options,
i::NOT_NATIVES_CODE);
``````

src/compiler.cc

``````// Compile the function and add it to the cache.
ParseInfo parse_info(script);
Zone compile_zone(isolate->allocator(), ZONE_NAME);
CompilationInfo info(&compile_zone, &parse_info, Handle<JSFunction>::null());
``````

Back in src/compiler.cc:

``````result = CompileToplevel(&info);

(lldb) job *result
0x17df0df309f1: [SharedFunctionInfo]
- name = 0x1a7f12d82471 <String[0]: >
- formal_parameter_count = 0
- expected_nof_properties = 10
- ast_node_count = 23
- instance class name = #Object

- code = 0x1d8484d3661 <Code: BUILTIN>
- source code = function bajja(a, b, c) {
var d = c - 100;
return a + d * b;
}

var result = bajja(2, 2, 150);
print(result);

- anonymous expression
- function token position = -1
- start position = 0
- end position = 114
- no debug info
- length = 0
- optimized_code_map = 0x1a7f12d82241 <FixedArray[0]>
- length: 3
- slot_count: 11
Slot #2 kCreateClosure
Slot #5 CALL_IC
Slot #7 CALL_IC

- bytecode_array = 0x17df0df30c61
``````

Back in d8.cc:

``````maybe_result = script->Run(realm);
``````

src/api.cc

``````auto fun = i::Handle<i::JSFunction>::cast(Utils::OpenHandle(this));

(lldb) job *fun
0x17df0df30e01: [Function]
- map = 0x19cfe0003859 [FastProperties]
- prototype = 0x17df0df043b1
- elements = 0x1a7f12d82241 <FixedArray[0]> [FAST_HOLEY_ELEMENTS]
- initial_map =
- shared_info = 0x17df0df309f1 <SharedFunctionInfo>
- name = 0x1a7f12d82471 <String[0]: >
- formal_parameter_count = 0
- context = 0x17df0df03bf9 <FixedArray[245]>
- feedback vector cell = 0x17df0df30ed1 Cell for 0x17df0df30e49 <FixedArray[13]>
- code = 0x1d8484d3661 <Code: BUILTIN>
- properties = 0x1a7f12d82241 <FixedArray[0]> {
#length: 0x2c35a5718089 <AccessorInfo> (const accessor descriptor)
#name: 0x2c35a57180f9 <AccessorInfo> (const accessor descriptor)
#arguments: 0x2c35a5718169 <AccessorInfo> (const accessor descriptor)
#caller: 0x2c35a57181d9 <AccessorInfo> (const accessor descriptor)
#prototype: 0x2c35a5718249 <AccessorInfo> (const accessor descriptor)

}

Local<Value> result;
has_pending_exception = !ToLocal<Value>(i::Execution::Call(isolate, fun, receiver, 0, nullptr), &result);
``````

src/execution.cc

### Zone

Taken directly from src/zone/zone.h:

``````// The Zone supports very fast allocation of small chunks of
// memory. The chunks cannot be deallocated individually, but instead
// the Zone supports deallocating all chunks in one fast
// operation. The Zone is used to hold temporary data structures like
// the abstract syntax tree, which is deallocated after compilation.
``````

### V8 flags

``````\$ ./d8 --help
``````

### d8

``````(lldb) br s -f d8.cc -l 2935

return v8::Shell::Main(argc, argv);

api.cc:6112
natives-external.cc
``````

### v8::String::NewFromOneByte

So I was a little confused when I first read this function name and thought it had something to do with the length of the string. But the byte refers to the character type that makes up the string. For example, a one-byte char would be reinterpreted as uint8_t:

``````const char* data

reinterpret_cast<const uint8_t*>(data)
``````

- gdbinit has been updated. Check if there is something that should be ported to lldbinit.

### Invocation walkthrough

This section will go through calling a Script to understand what happens in V8.

I'll be using run-scripts.cc as the example for this.

``````\$ lldb -- ./run-scripts
(lldb) br s -n main
``````

I'll step through until the following call:

``````script->Run(context).ToLocalChecked();
``````

So, Script::Run is defined in api.cc. The first thing that happens in this function is a macro:

``````PREPARE_FOR_EXECUTION_WITH_CONTEXT_IN_RUNTIME_CALL_STATS_SCOPE(
"v8",
"V8.Execute",
context,
Script,
Run,
MaybeLocal<Value>(),
InternalEscapableScope,
true);
TRACE_EVENT_CALL_STATS_SCOPED(isolate, category, name);
PREPARE_FOR_EXECUTION_GENERIC(isolate, context, class_name, function_name, \
bailout_value, HandleScopeClass, do_callback);
``````

So, what does the preprocessor replace this with?

``````auto isolate = context.IsEmpty()
    ? i::Isolate::Current()
    : reinterpret_cast<i::Isolate*>(context->GetIsolate());
``````

I'm skipping TRACE_EVENT_CALL_STATS_SCOPED for now. `PREPARE_FOR_EXECUTION_GENERIC` will be replaced with:

``````if (IsExecutionTerminatingCheck(isolate)) {                        \
return bailout_value;                                            \
}                                                                  \
HandleScopeClass handle_scope(isolate);                            \
CallDepthScope<do_callback> call_depth_scope(isolate, context);    \
LOG_API(isolate, class_name, function_name);                       \
ENTER_V8_DO_NOT_USE(isolate);                                      \
bool has_pending_exception = false

auto fun = i::Handle<i::JSFunction>::cast(Utils::OpenHandle(this));

(lldb) job *fun
0x33826912c021: [Function]
- map = 0x1d0656c03599 [FastProperties]
- prototype = 0x338269102e69
- elements = 0x35190d902241 <FixedArray[0]> [FAST_HOLEY_ELEMENTS]
- initial_map =
- shared_info = 0x33826912bc11 <SharedFunctionInfo>
- name = 0x35190d902471 <String[0]: >
- formal_parameter_count = 0
- context = 0x338269102611 <FixedArray[265]>
- feedback vector cell = 0x33826912c139 <Cell value= 0x33826912c069 <FixedArray[24]>>
- code = 0x1319e25fcf21 <Code BUILTIN>
- properties = 0x35190d902241 <FixedArray[0]> {
#length: 0x2e9d97ce68b1 <AccessorInfo> (const accessor descriptor)
#name: 0x2e9d97ce6921 <AccessorInfo> (const accessor descriptor)
#arguments: 0x2e9d97ce6991 <AccessorInfo> (const accessor descriptor)
#caller: 0x2e9d97ce6a01 <AccessorInfo> (const accessor descriptor)
#prototype: 0x2e9d97ce6a71 <AccessorInfo> (const accessor descriptor)
}
``````

The OpenHandle code for i::JSFunction is generated in src/api.h. Let's take a closer look at this.

``````#define DECLARE_OPEN_HANDLE(From, To) \
static inline v8::internal::Handle<v8::internal::To> \
OpenHandle(const From* that, bool allow_empty_handle = false);

OPEN_HANDLE_LIST(DECLARE_OPEN_HANDLE)
``````

OPEN_HANDLE_LIST looks like this:

``````#define OPEN_HANDLE_LIST(V)                    \
....
V(Script, JSFunction)                        \
``````

So let's expand this for JSFunction; it should become:

``````  static inline v8::internal::Handle<v8::internal::JSFunction> \
OpenHandle(const Script* that, bool allow_empty_handle = false);
``````

So there will be a function named OpenHandle that takes a const pointer to Script.

A little further down in src/api.h there is another macro which looks like this:

``````OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)
``````

MAKE_OPEN_HANDLE:

``````    #define MAKE_OPEN_HANDLE(From, To)
v8::internal::Handle<v8::internal::To> Utils::OpenHandle(
const v8::From* that, bool allow_empty_handle) {
return v8::internal::Handle<v8::internal::To>(
}
``````

And remember that JSFunction is included in the `OPEN_HANDLE_LIST`, so after the preprocessor has processed this header the source will contain the following. A concrete example would look like this:

``````v8::internal::Handle<v8::internal::JSFunction> Utils::OpenHandle(
const v8::Script* that, bool allow_empty_handle) {
return v8::internal::Handle<v8::internal::JSFunction>(
``````

You can inspect the output of the preprocessor using:

``````\$ clang++ -I./out/x64.release/gen -I. -I./include -E src/api/api-inl.h > api-inl.output
``````

So where is JSFunction declared? It is defined in objects.h

## Ignition interpreter

Bytecode also needs to be generated for user JavaScript, and it goes through the same C++ DSL and CodeStubAssembler -> CodeAssembler -> RawMachineAssembler chain as the builtins.

## C++ Domain Specific Language (DSL)

#### Build failure

After rebasing I've seen the following issue:

``````\$ ninja -C out/Debug chrome
ninja: Entering directory `out/Debug'
ninja: error: '../../chrome/renderer/resources/plugins/plugin_delay.html', needed by 'gen/chrome/grit/renderer_resources.h', missing and no known rule to make it
``````

The "solution" was to remove the out directory and rebuild.

To find a suitable task you can use `label:HelpWanted` at bugs.chromium.org.

### OpenHandle

What does this call do:

``````Utils::OpenHandle(*(source->source_string));

OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)
``````

Which is a macro defined in src/api.h:

``````#define MAKE_OPEN_HANDLE(From, To)                                             \
v8::internal::Handle<v8::internal::To> Utils::OpenHandle(                    \
const v8::From* that, bool allow_empty_handle) {                         \
DCHECK(allow_empty_handle || that != NULL);                                \
DCHECK(that == NULL ||                                                     \
(*reinterpret_cast<v8::internal::Object* const*>(that))->Is##To()); \
return v8::internal::Handle<v8::internal::To>(                             \
reinterpret_cast<v8::internal::To**>(const_cast<v8::From*>(that)));    \
}

OPEN_HANDLE_LIST(MAKE_OPEN_HANDLE)
``````

If we take a closer look at the macro, it should expand to something like this in our case:

`````` v8::internal::Handle<v8::internal::String> Utils::OpenHandle(const v8::String* that, bool allow_empty_handle) {
DCHECK(allow_empty_handle || that != NULL);                                \
DCHECK(that == NULL ||                                                     \
(*reinterpret_cast<v8::internal::Object* const*>(that))->IsString()); \
return v8::internal::Handle<v8::internal::String>(                             \
reinterpret_cast<v8::internal::String**>(const_cast<v8::String*>(that)));    \
}
``````

So this is returning a new v8::internal::Handle, the constructor is defined in src/handles.h:95.

In src/objects.cc we find:

``````Handle WeakFixedArray::Add(Handle maybe_array,
                           Handle value,
                           int* assigned_index) {
``````

Notice the name of the first parameter, `maybe_array`, even though it is not of a Maybe type.

### Context

JavaScript provides a set of builtin functions and objects. These functions and objects can be changed by user code, and each context is a separate collection of them.

internal::Context is declared in `deps/v8/src/contexts.h` and extends FixedArray:

``````class Context: public FixedArray {
``````

A Context can be created by calling:

``````const v8::HandleScope handle_scope(isolate_);
Handle<Context> context = Context::New(isolate_,
nullptr,
v8::Local<v8::ObjectTemplate>());
``````

`Context::New` can be found in `src/api.cc:6405`:

``````Local<Context> v8::Context::New(
v8::Isolate* external_isolate, v8::ExtensionConfiguration* extensions,
v8::MaybeLocal<ObjectTemplate> global_template,
v8::MaybeLocal<Value> global_object,
DeserializeInternalFieldsCallback internal_fields_deserializer) {
return NewContext(external_isolate, extensions, global_template,
global_object, 0, internal_fields_deserializer);
}
``````

The declaration of this function can be found in `include/v8.h`:

``````static Local<Context> New(
Isolate* isolate, ExtensionConfiguration* extensions = NULL,
MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(),
MaybeLocal<Value> global_object = MaybeLocal<Value>(),
DeserializeInternalFieldsCallback internal_fields_deserializer =
DeserializeInternalFieldsCallback());
``````

So we can see the reason why we did not have to specify `internal_fields_deserializer`. What is `ExtensionConfiguration`?
This class can be found in `include/v8.h` and only has two members: a count of the extension names and an array with the names.

If specified, these will be installed by `Bootstrapper::InstallExtensions`, which will delegate to `Genesis::InstallExtensions`; both can be found in `src/bootstrapper.cc`. Where are extensions registered?
This is done once per process and called from `V8::Initialize()`:

``````void Bootstrapper::InitializeOncePerProcess() {
free_buffer_extension_ = new FreeBufferExtension;
v8::RegisterExtension(free_buffer_extension_);
gc_extension_ = new GCExtension(GCFunctionName());
v8::RegisterExtension(gc_extension_);
externalize_string_extension_ = new ExternalizeStringExtension;
v8::RegisterExtension(externalize_string_extension_);
statistics_extension_ = new StatisticsExtension;
v8::RegisterExtension(statistics_extension_);
trigger_failure_extension_ = new TriggerFailureExtension;
v8::RegisterExtension(trigger_failure_extension_);
ignition_statistics_extension_ = new IgnitionStatisticsExtension;
v8::RegisterExtension(ignition_statistics_extension_);
}
``````

The extensions can be found in `src/extensions`. You can register your own extensions; an example of this can be found in test/context_test.cc.

``````(lldb) br s -f node.cc -l 4439
(lldb) expr context->length()
(int) \$522 = 281
``````

This output was taken while debugging Node.js.

Creating a new Context is done by `v8::CreateEnvironment`

``````(lldb) br s -f api.cc -l 6565
``````
``````InvokeBootstrapper<ObjectType> invoke;
6635    result =
-> 6636        invoke.Invoke(isolate, maybe_proxy, proxy_template, extensions,
6637                      context_snapshot_index, embedder_fields_deserializer);
``````

This will later end up in `Snapshot::NewContextFromSnapshot`:

``````Vector<const byte> context_data =
ExtractContextData(blob, static_cast<uint32_t>(context_index));
SnapshotData snapshot_data(context_data);

MaybeHandle<Context> maybe_result = PartialDeserializer::DeserializeContext(
isolate, &snapshot_data, can_rehash, global_proxy,
embedder_fields_deserializer);
``````

So we can see here that the Context is deserialized from the snapshot. What does the Context contain at this stage:

``````(lldb) expr result->length()
(int) \$650 = 281
(lldb) expr result->Print()
// not including the complete output
``````

Let's take a look at an entry:

``````(lldb) expr result->get(0)->Print()
0xc201584331: [Function] in OldSpace
- map = 0xc24c002251 [FastProperties]
- prototype = 0xc201584371
- elements = 0xc2b2882251 <FixedArray[0]> [HOLEY_ELEMENTS]
- initial_map =
- shared_info = 0xc2b2887521 <SharedFunctionInfo>
- name = 0xc2b2882441 <String[0]: >
- formal_parameter_count = -1
- kind = [ NormalFunction ]
- context = 0xc201583a59 <FixedArray[281]>
- code = 0x2df1f9865a61 <Code BUILTIN>
- source code = () {}
- properties = 0xc2b2882251 <FixedArray[0]> {
#length: 0xc2cca83729 <AccessorInfo> (const accessor descriptor)
#name: 0xc2cca83799 <AccessorInfo> (const accessor descriptor)
#arguments: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
#caller: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
#constructor: 0xc201584c29 <JSFunction Function (sfi = 0xc2b28a6fb1)> (const data descriptor)
#apply: 0xc201588079 <JSFunction apply (sfi = 0xc2b28a7051)> (const data descriptor)
#bind: 0xc2015880b9 <JSFunction bind (sfi = 0xc2b28a70f1)> (const data descriptor)
#call: 0xc2015880f9 <JSFunction call (sfi = 0xc2b28a7191)> (const data descriptor)
#toString: 0xc201588139 <JSFunction toString (sfi = 0xc2b28a7231)> (const data descriptor)
0xc2b28bc669 <Symbol: Symbol.hasInstance>: 0xc201588179 <JSFunction [Symbol.hasInstance] (sfi = 0xc2b28a72d1)> (const data descriptor)
}

- feedback vector: not available
``````

So we can see that this is of type `[Function]` which we can cast using:

``````(lldb) expr JSFunction::cast(result->get(0))->code()->Print()
0x2df1f9865a61: [Code]
kind = BUILTIN
name = EmptyFunction
``````
``````(lldb) expr JSFunction::cast(result->closure())->Print()
0xc201584331: [Function] in OldSpace
- map = 0xc24c002251 [FastProperties]
- prototype = 0xc201584371
- elements = 0xc2b2882251 <FixedArray[0]> [HOLEY_ELEMENTS]
- initial_map =
- shared_info = 0xc2b2887521 <SharedFunctionInfo>
- name = 0xc2b2882441 <String[0]: >
- formal_parameter_count = -1
- kind = [ NormalFunction ]
- context = 0xc201583a59 <FixedArray[281]>
- code = 0x2df1f9865a61 <Code BUILTIN>
- source code = () {}
- properties = 0xc2b2882251 <FixedArray[0]> {
#length: 0xc2cca83729 <AccessorInfo> (const accessor descriptor)
#name: 0xc2cca83799 <AccessorInfo> (const accessor descriptor)
#arguments: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
#caller: 0xc201587fd1 <AccessorPair> (const accessor descriptor)
#constructor: 0xc201584c29 <JSFunction Function (sfi = 0xc2b28a6fb1)> (const data descriptor)
#apply: 0xc201588079 <JSFunction apply (sfi = 0xc2b28a7051)> (const data descriptor)
#bind: 0xc2015880b9 <JSFunction bind (sfi = 0xc2b28a70f1)> (const data descriptor)
#call: 0xc2015880f9 <JSFunction call (sfi = 0xc2b28a7191)> (const data descriptor)
#toString: 0xc201588139 <JSFunction toString (sfi = 0xc2b28a7231)> (const data descriptor)
0xc2b28bc669 <Symbol: Symbol.hasInstance>: 0xc201588179 <JSFunction [Symbol.hasInstance] (sfi = 0xc2b28a72d1)> (const data descriptor)
}

- feedback vector: not available
``````

So this is the JSFunction associated with the deserialized context. I'm not sure what this is about, as looking at the source code it appears to be an empty function. A function can also be set on the context, so I'm guessing this gives access to the function of a context once set. Where is the function set? It is probably deserialized, but we can see it being used in `deps/v8/src/bootstrapper.cc`:

``````{
Handle<JSFunction> function = SimpleCreateFunction(isolate, factory->empty_string(), Builtins::kAsyncFunctionAwaitCaught, 2, false);
native_context->set_async_function_await_caught(*function);
}
``````

``````
(lldb) expr isolate()->builtins()->builtin_handle(Builtins::Name::kAsyncFunctionAwaitCaught)->Print()
``````

`Context::Scope` is a RAII class used to Enter/Exit a context. Let's take a closer look at `Enter`:

``````void Context::Enter() {
i::Handle<i::Context> env = Utils::OpenHandle(this);
i::Isolate* isolate = env->GetIsolate();
ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate);
i::HandleScopeImplementer* impl = isolate->handle_scope_implementer();
impl->EnterContext(env);
impl->SaveContext(isolate->context());
isolate->set_context(*env);
}
``````

So the current context is saved and then this context, `env`, is set as the current one on the isolate. `EnterContext` will push the passed-in context (deps/v8/src/api.cc):

``````void HandleScopeImplementer::EnterContext(Handle<Context> context) {
entered_contexts_.push_back(*context);
}
...
DetachableVector<Context*> entered_contexts_;
``````
DetachableVector is a delegate/adapter with some additional features on top of a std::vector.

``````Handle<Context> context1 = NewContext(isolate);
Handle<Context> context2 = NewContext(isolate);
Context::Scope context_scope1(context1);        // entered_contexts_ [context1], saved_contexts_[isolateContext]
Context::Scope context_scope2(context2);        // entered_contexts_ [context1, context2], saved_contexts_[isolateContext, context1]
``````

Now, `SaveContext` uses the current context, not `this` context (`env`), and pushes that onto the end of the saved_contexts_ vector. We can see this in action as we enter context_scope2 from context_scope1 above.

And `Exit` looks like:

``````void Context::Exit() {
i::Handle<i::Context> env = Utils::OpenHandle(this);
i::Isolate* isolate = env->GetIsolate();
ENTER_V8_NO_SCRIPT_NO_EXCEPTION(isolate);
i::HandleScopeImplementer* impl = isolate->handle_scope_implementer();
if (!Utils::ApiCheck(impl->LastEnteredContextWas(env),
"v8::Context::Exit()",
"Cannot exit non-entered context")) {
return;
}
impl->LeaveContext();
isolate->set_context(impl->RestoreContext());
}
``````

#### EmbedderData

A context can have embedder data set on it. As described above, a Context is internally a FixedArray. `SetEmbedderData` in Context is implemented in `src/api.cc`:

``````const char* location = "v8::Context::SetEmbedderData()";
i::Handle<i::FixedArray> data = EmbedderDataFor(this, index, true, location);
``````

`location` is only used for logging and we can ignore it for now. `EmbedderDataFor`:

``````i::Handle<i::Context> env = Utils::OpenHandle(context);
...
i::Handle<i::FixedArray> data(env->embedder_data());
``````

We can find `embedder_data` in `src/contexts-inl.h`

``````#define NATIVE_CONTEXT_FIELD_ACCESSORS(index, type, name) \
inline void set_##name(type* value);                    \
inline bool is_##name(type* value) const;               \
inline type* name() const;
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSORS)
``````

And `NATIVE_CONTEXT_FIELDS` in context.h:

``````#define NATIVE_CONTEXT_FIELDS(V)                                               \
V(GLOBAL_PROXY_INDEX, JSObject, global_proxy_object)                         \
V(EMBEDDER_DATA_INDEX, FixedArray, embedder_data)                            \
...

#define NATIVE_CONTEXT_FIELD_ACCESSORS(index, type, name) \
void Context::set_##name(type* value) {                 \
DCHECK(IsNativeContext());                            \
set(index, value);                                    \
}                                                       \
bool Context::is_##name(type* value) const {            \
DCHECK(IsNativeContext());                            \
return type::cast(get(index)) == value;               \
}                                                       \
type* Context::name() const {                           \
DCHECK(IsNativeContext());                            \
return type::cast(get(index));                        \
}
NATIVE_CONTEXT_FIELDS(NATIVE_CONTEXT_FIELD_ACCESSORS)
#undef NATIVE_CONTEXT_FIELD_ACCESSORS
``````

So the preprocessor would expand this to:

``````FixedArray* embedder_data() const;

void Context::set_embedder_data(FixedArray* value) {
DCHECK(IsNativeContext());
set(EMBEDDER_DATA_INDEX, value);
}

bool Context::is_embedder_data(FixedArray* value) const {
DCHECK(IsNativeContext());
return FixedArray::cast(get(EMBEDDER_DATA_INDEX)) == value;
}

FixedArray* Context::embedder_data() const {
DCHECK(IsNativeContext());
return FixedArray::cast(get(EMBEDDER_DATA_INDEX));
}
``````

We can take a look at the initial data:

``````(lldb) expr data->Print()
0x2fac3e896439: [FixedArray] in OldSpace
- map = 0x2fac9de82341 <Map(HOLEY_ELEMENTS)>
- length: 3
0-2: 0x2fac1cb822e1 <undefined>
(lldb) expr data->length()
(int) \$5 = 3
``````

And after setting:

``````(lldb) expr data->Print()
0x2fac3e896439: [FixedArray] in OldSpace
- map = 0x2fac9de82341 <Map(HOLEY_ELEMENTS)>
- length: 3
0: 0x2fac20c866e1 <String[7]: embdata>
1-2: 0x2fac1cb822e1 <undefined>

(lldb) expr v8::internal::String::cast(data->get(0))->Print()
"embdata"
``````

This was taken while debugging ContextTest::EmbedderData.

### ENTER_V8_FOR_NEW_CONTEXT

This macro is used in `CreateEnvironment` (src/api.cc) and the call in this function looks like this:

``````ENTER_V8_FOR_NEW_CONTEXT(isolate);
``````

### Factory::NewMap

This section will take a look at the following call:

``````i::Handle<i::Map> map = factory->NewMap(i::JS_OBJECT_TYPE, 24);
``````

Let's take a closer look at this function, which can be found in `src/factory.cc`:

``````Handle<Map> Factory::NewMap(InstanceType type, int instance_size,
ElementsKind elements_kind,
int inobject_properties) {
CALL_HEAP_FUNCTION(
isolate(),
isolate()->heap()->AllocateMap(type, instance_size, elements_kind,
inobject_properties),
Map);
}
``````

If we take a look at factory.h we can see the default values for elements_kind and inobject_properties:

``````Handle<Map> NewMap(InstanceType type, int instance_size,
ElementsKind elements_kind = TERMINAL_FAST_ELEMENTS_KIND,
int inobject_properties = 0);
``````

If we expand the CALL_HEAP_FUNCTION macro we will get:

``````    AllocationResult __allocation__ = isolate()->heap()->AllocateMap(type,
instance_size,
elements_kind,
inobject_properties);
Object* __object__ = nullptr;
RETURN_OBJECT_UNLESS_RETRY(isolate(), Map)
/* Two GCs before panicking.  In newspace will almost always succeed. */
for (int __i__ = 0; __i__ < 2; __i__++) {
(isolate())->heap()->CollectGarbage(
__allocation__.RetrySpace(),
GarbageCollectionReason::kAllocationFailure);
__allocation__ = isolate()->heap()->AllocateMap(type, instance_size, elements_kind, inobject_properties);
RETURN_OBJECT_UNLESS_RETRY(isolate, Map)
}
(isolate())->counters()->gc_last_resort_from_handles()->Increment();
(isolate())->heap()->CollectAllAvailableGarbage(
GarbageCollectionReason::kLastResort);
{
AlwaysAllocateScope __scope__(isolate());
__allocation__ = isolate()->heap()->AllocateMap(type,
instance_size,
elements_kind,
inobject_properties);
}
RETURN_OBJECT_UNLESS_RETRY(isolate, Map)
/* TODO(1181417): Fix this. */
v8::internal::Heap::FatalProcessOutOfMemory("CALL_AND_RETRY_LAST", true);
return Handle<Map>();
``````

So, let's take a look at `isolate()->heap()->AllocateMap` in `src/heap/heap.cc`:

``````  HeapObject* result = nullptr;
AllocationResult allocation = AllocateRaw(Map::kSize, MAP_SPACE);
``````

`AllocateRaw` can be found in src/heap/heap-inl.h:

``````  bool large_object = size_in_bytes > kMaxRegularHeapObjectSize;
HeapObject* object = nullptr;
AllocationResult allocation;
if (NEW_SPACE == space) {
if (large_object) {
space = LO_SPACE;
} else {
allocation = new_space_->AllocateRaw(size_in_bytes, alignment);
if (allocation.To(&object)) {
OnAllocationEvent(object, size_in_bytes);
}
return allocation;
}
} else if (MAP_SPACE == space) {
allocation = map_space_->AllocateRawUnaligned(size_in_bytes);
}
``````
``````(lldb) expr large_object
(bool) \$3 = false
(lldb) expr size_in_bytes
(int) \$5 = 80
(lldb) expr map_space_
(v8::internal::MapSpace *) \$6 = 0x0000000104700f60
``````

`AllocateRawUnaligned` can be found in `src/heap/spaces-inl.h`

``````  HeapObject* object = AllocateLinearly(size_in_bytes);
``````

### v8::internal::Object

This is an abstract superclass for all classes in the object hierarchy. Both Smi and HeapObject are subclasses of Object, so there are no data members in Object, only functions. For example:

``````  bool IsObject() const { return true; }
INLINE(bool IsSmi() const
INLINE(bool IsLayoutDescriptor() const
INLINE(bool IsHeapObject() const
INLINE(bool IsPrimitive() const
INLINE(bool IsNumber() const
INLINE(bool IsNumeric() const
INLINE(bool IsAbstractCode() const
INLINE(bool IsAccessCheckNeeded() const
INLINE(bool IsArrayList() const
INLINE(bool IsBigInt() const
INLINE(bool IsUndefined() const
INLINE(bool IsNull() const
INLINE(bool IsTheHole() const
INLINE(bool IsException() const
INLINE(bool IsUninitialized() const
INLINE(bool IsTrue() const
INLINE(bool IsFalse() const
...
``````

### v8::internal::Smi

`Smi` extends `v8::internal::Object`, and instances are not allocated on the heap. There are no data members, as the pointer value itself is used to store the information.

In our case the call to `v8::Isolate::New` is done by the test fixture:

``````virtual void SetUp() {
isolate_ = v8::Isolate::New(create_params_);
}
``````

This will call:

``````Isolate* Isolate::New(const Isolate::CreateParams& params) {
Isolate* isolate = Allocate();
Initialize(isolate, params);
return isolate;
}
``````

In `Isolate::Initialize` we'll call `i::Snapshot::Initialize(i_isolate)`:

``````if (params.entry_hook || !i::Snapshot::Initialize(i_isolate)) {
...
``````

Which will call:

``````bool success = isolate->Init(&deserializer);
``````

Before this call, all the roots are uninitialized. The blog post says that the Isolate class contains a roots table; it looks to me like the Heap contains this data structure, but perhaps that is what was meant.

``````(lldb) bt 3
* frame #0: 0x0000000101584f43 libv8.dylib`v8::internal::StartupDeserializer::DeserializeInto(this=0x00007ffeefbfe200, isolate=0x000000010481cc00) at startup-deserializer.cc:39
frame #1: 0x0000000101028bb6 libv8.dylib`v8::internal::Isolate::Init(this=0x000000010481cc00, des=0x00007ffeefbfe200) at isolate.cc:3036
frame #2: 0x000000010157c682 libv8.dylib`v8::internal::Snapshot::Initialize(isolate=0x000000010481cc00) at snapshot-common.cc:54
``````

In `startup-deserializer.cc` we can find `StartupDeserializer::DeserializeInto`:

``````  DisallowHeapAllocation no_gc;
isolate->heap()->IterateSmiRoots(this);
isolate->heap()->IterateStrongRoots(this, VISIT_ONLY_STRONG);
``````

If we take a look in `src/roots.h` we can find the read-only roots in `Heap`. If we take the 10th value, which is:

``````V(String, empty_string, empty_string)                                        \
``````

we can then inspect this value:

``````(lldb) expr roots_[9]
(v8::internal::Object *) \$32 = 0x0000152d30b82851
(lldb) expr roots_[9]->IsString()
(bool) \$30 = true
(lldb) expr roots_[9]->Print()
#
``````

So this entry is a pointer to an object on the managed heap which has been deserialized from the snapshot.

The heap class has a lot of members that are initialized during construction. The body of the constructor looks like this:

``````{
  // Ensure old_generation_size_ is a multiple of kPageSize.
  DCHECK_EQ(0, max_old_generation_size_ & (Page::kPageSize - 1));

  memset(roots_, 0, sizeof(roots_[0]) * kRootListLength);
  set_native_contexts_list(nullptr);
  set_allocation_sites_list(Smi::kZero);
  set_encountered_weak_collections(Smi::kZero);
  // Put a dummy entry in the remembered pages so we can find the list in
  // the minidump even if there are no real unmapped pages.
  RememberUnmappedPage(nullptr, false);
}
``````

We can see that `roots_` is filled with 0 values. We can inspect `roots_` using:

``````(lldb) expr roots_
(lldb) expr RootListIndex::kRootListLength
(int) \$16 = 509
``````

They are all 0 at this stage, so when does this array get populated?
This happens in `Isolate::Init`:

``````  heap_.SetUp()
if (!create_heap_objects) des->DeserializeInto(this);

void StartupDeserializer::DeserializeInto(Isolate* isolate) {
-> 17    Initialize(isolate);
startup-deserializer.cc:37

isolate->heap()->IterateSmiRoots(this);
``````

This will delegate to `ConfigureHeapDefaults()`, which will call `Heap::ConfigureHeap`:

``````enum RootListIndex {
kFreeSpaceMapRootIndex,
kOnePointerFillerMapRootIndex,
...
}
``````
``````(lldb) expr heap->RootListIndex::kFreeSpaceMapRootIndex
(int) \$3 = 0
(lldb) expr heap->RootListIndex::kOnePointerFillerMapRootIndex
(int) \$4 = 1
``````

### MemoryChunk

Found in `src/heap/spaces.h`, an instance of `MemoryChunk` represents a region of memory that is owned by a specific space.

### Embedded builtins

The blog post explains how the builtins are embedded into the executable, in the .text section, which is read-only and can therefore be shared among multiple processes. Builtins used to be compiled and stored in the snapshot, but now it seems they are instead placed into `out.gn/learning/gen/embedded.cc` and combined with the object files from the build to produce `libv8.dylib`. V8 has a configuration option named `v8_enable_embedded_builtins`, in which case `embedded.cc` will be added to the list of sources. This is done in `BUILD.gn` in the `v8_snapshot` target. If `v8_enable_embedded_builtins` is false then `src/snapshot/embedded-empty.cc` will be included instead. Both of these files have the following functions:

``````const uint8_t* DefaultEmbeddedBlob()
uint32_t DefaultEmbeddedBlobSize()

#ifdef V8_MULTI_SNAPSHOTS
const uint8_t* TrustedEmbeddedBlob()
uint32_t TrustedEmbeddedBlobSize()
#endif
``````

These functions are used by `isolate.cc` and declared `extern`:

``````extern const uint8_t* DefaultEmbeddedBlob();
extern uint32_t DefaultEmbeddedBlobSize();
``````

The usage of `DefaultEmbeddedBlob` can be seen in `Isolate::Isolate`, where it sets the embedded blob:

``````SetEmbeddedBlob(DefaultEmbeddedBlob(), DefaultEmbeddedBlobSize());
``````

Let's set a breakpoint there and see whether the blob is empty or not.

``````(lldb) expr v8_embedded_blob_size_
(uint32_t) \$0 = 4021088
``````

So we can see that we are not using the empty blob; it is `Isolate::SetEmbeddedBlob` that stores these values.

In `src/snapshot/deserializer.cc` (line 552) we have a check for `embedded_blob()`:

``````  CHECK_NOT_NULL(isolate->embedded_blob());
EmbeddedData d = EmbeddedData::FromBlob();
``````

`EmbeddedData` can be found in `src/snapshot/snapshot.h` and the implementation can be found in `snapshot-common.cc`.

``````Address EmbeddedData::InstructionStartOfBuiltin(int i) const {
  const struct Metadata* metadata = Metadata();
  const uint8_t* result = RawData() + metadata[i].instructions_offset;
  return reinterpret_cast<Address>(const_cast<uint8_t*>(result));
}
``````
``````(lldb) expr *metadata
(const v8::internal::EmbeddedData::Metadata) \$7 = (instructions_offset = 0, instructions_length = 1464)
``````
``````  struct Metadata {
// Blob layout information.
uint32_t instructions_offset;
uint32_t instructions_length;
};
``````
``````(lldb) expr *this
(v8::internal::EmbeddedData) \$10 = (data_ = "\xffffffdc\xffffffc0\xffffff88'"y[\xffffffd6", size_ = 4021088)
(const v8::internal::EmbeddedData::Metadata) \$8 = (instructions_offset = 0, instructions_length = 1464)
``````

So, is it possible for us to verify that this information is in the .text section?

``````(lldb) expr result
(const uint8_t *) \$13 = 0x0000000101b14ee0 "UH\x89�jH\x83�(H\x89U�H�\x16H\x89}�H�u�H�E�H\x89U�H\x83�
(lldb) image lookup --address 0x0000000101b14ee0 --verbose
Summary: libv8.dylib`v8_Default_embedded_blob_ + 7072
Module: file = "/Users/danielbevenius/work/google/javascript/v8/out.gn/learning/libv8.dylib", arch = "x86_64"
Symbol: id = {0x0004b596}, range = [0x0000000101b13340-0x0000000101ee8ea0), name="v8_Default_embedded_blob_"
``````

So what we have is a pointer to the .text segment which is returned:

``````(lldb) memory read -f x -s 1 -c 13 0x0000000101b14ee0
0x101b14ee0: 0x55 0x48 0x89 0xe5 0x6a 0x18 0x48 0x83
0x101b14ee8: 0xec 0x28 0x48 0x89 0x55
``````

And we can compare this with `out.gn/learning/gen/embedded.cc`:

``````V8_EMBEDDED_TEXT_HEADER(v8_Default_embedded_blob_)
__asm__(
...
".byte 0x55,0x48,0x89,0xe5,0x6a,0x18,0x48,0x83,0xec,0x28,0x48,0x89,0x55\n"
...
);
``````

The macro `V8_EMBEDDED_TEXT_HEADER` can be found in `src/snapshot/macros.h`:

``````#define V8_EMBEDDED_TEXT_HEADER(LABEL)         \
__asm__(V8_ASM_DECLARE(#LABEL)               \
".csect " #LABEL "[DS]\n"            \
#LABEL ":\n"                         \
".llong ." #LABEL ", TOC[tc0], 0\n"  \
V8_ASM_TEXT_SECTION                  \
"." #LABEL ":\n");

#define V8_ASM_DECLARE(NAME) ".private_extern " V8_ASM_MANGLE_LABEL NAME "\n"
#define V8_ASM_MANGLE_LABEL "_"
#define V8_ASM_TEXT_SECTION ".csect .text[PR]\n"
``````

And would be expanded by the preprocessor into:

``````  __asm__(".private_extern " _ v8_Default_embedded_blob_ "\n"
".csect " v8_Default_embedded_blob_ "[DS]\n"
v8_Default_embedded_blob_ ":\n"
".llong ." v8_Default_embedded_blob_ ", TOC[tc0], 0\n"
".csect .text[PR]\n"
"." v8_Default_embedded_blob_ ":\n");
__asm__(
...
".byte 0x55,0x48,0x89,0xe5,0x6a,0x18,0x48,0x83,0xec,0x28,0x48,0x89,0x55\n"
...
);
``````

Back in `src/snapshot/deserializer.cc` we are on this line:

``````  Address address = d.InstructionStartOfBuiltin(builtin_index);
if (RelocInfo::OffHeapTargetIsCodedSpecially()) {
// is false in our case so skipping the code here
} else {
UnalignedCopy(current, &o);
current++;
}
break;
``````

### print-code

``````\$ ./d8 -print-bytecode  -print-code sample.js
[generated bytecode for function:  (0x2a180824ffbd <SharedFunctionInfo>)]
Parameter count 1
Register count 5
Frame size 40
0x2a1808250066 @    0 : 12 00             LdaConstant [0]
0x2a1808250068 @    2 : 26 f9             Star r2
0x2a180825006a @    4 : 27 fe f8          Mov <closure>, r3
0x2a180825006d @    7 : 61 32 01 f9 02    CallRuntime [DeclareGlobals], r2-r3
0x2a1808250072 @   12 : 0b                LdaZero
0x2a1808250073 @   13 : 26 fa             Star r1
0x2a1808250075 @   15 : 0d                LdaUndefined
0x2a1808250076 @   16 : 26 fb             Star r0
0x2a1808250078 @   18 : 00 0c 10 27       LdaSmi.Wide [10000]
0x2a180825007c @   22 : 69 fa 00          TestLessThan r1, [0]
0x2a180825007f @   25 : 9a 1c             JumpIfFalse [28] (0x2a180825009b @ 53)
0x2a1808250081 @   27 : a7                StackCheck
0x2a1808250082 @   28 : 13 01 01          LdaGlobal [1], [1]
0x2a1808250085 @   31 : 26 f9             Star r2
0x2a1808250087 @   33 : 0c 02             LdaSmi [2]
0x2a1808250089 @   35 : 26 f7             Star r4
0x2a180825008b @   37 : 5e f9 fa f7 03    CallUndefinedReceiver2 r2, r1, r4, [3]
0x2a1808250090 @   42 : 26 fb             Star r0
0x2a1808250092 @   44 : 25 fa             Ldar r1
0x2a1808250094 @   46 : 4c 05             Inc [5]
0x2a1808250096 @   48 : 26 fa             Star r1
0x2a1808250098 @   50 : 8a 20 00          JumpLoop [32], [0] (0x2a1808250078 @ 18)
0x2a180825009b @   53 : 25 fb             Ldar r0
0x2a180825009d @   55 : ab                Return
Constant pool (size = 2)
0x2a1808250035: [FixedArray] in OldSpace
- map: 0x2a18080404b1 <Map>
- length: 2
0: 0x2a180824ffe5 <FixedArray[2]>
1: 0x2a180824ff61 <String[#9]: something>
Handler Table (size = 0)
Source Position Table (size = 0)
[generated bytecode for function: something (0x2a180824fff5 <SharedFunctionInfo something>)]
Parameter count 3
Register count 0
Frame size 0
0x2a18082501ba @    0 : 25 02             Ldar a1
0x2a18082501bc @    2 : 34 03 00          Add a0, [0]
0x2a18082501bf @    5 : ab                Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 0)
--- Raw source ---
function something(x, y) {
return x + y
}
for (let i = 0; i < 10000; i++) {
something(i, 2);
}

--- Optimized code ---
optimization_id = 0
source_position = 0
kind = OPTIMIZED_FUNCTION
stack_slots = 14
compiler = turbofan

Instructions (size = 536)
0x108400082b20     0  488d1df9ffffff REX.W leaq rbx,[rip+0xfffffff9]
0x108400082b27     7  483bd9         REX.W cmpq rbx,rcx
0x108400082b2a     a  7418           jz 0x108400082b44  <+0x24>
0x108400082b2c     c  48ba6800000000000000 REX.W movq rdx,0x68
0x108400082b36    16  49bae0938c724b560000 REX.W movq r10,0x564b728c93e0  (Abort)    ;; off heap target
0x108400082b40    20  41ffd2         call r10
0x108400082b43    23  cc             int3l
0x108400082b44    24  8b59d0         movl rbx,[rcx-0x30]
0x108400082b47    27  4903dd         REX.W addq rbx,r13
0x108400082b4a    2a  f6430701       testb [rbx+0x7],0x1
0x108400082b4e    2e  740d           jz 0x108400082b5d  <+0x3d>
0x108400082b50    30  49bae0f781724b560000 REX.W movq r10,0x564b7281f7e0  (CompileLazyDeoptimizedCode)    ;; off heap target
0x108400082b5a    3a  41ffe2         jmp r10
0x108400082b5d    3d  55             push rbp
0x108400082b5e    3e  4889e5         REX.W movq rbp,rsp
0x108400082b61    41  56             push rsi
0x108400082b62    42  57             push rdi
0x108400082b63    43  48ba4200000000000000 REX.W movq rdx,0x42
0x108400082b6d    4d  4c8b15c4ffffff REX.W movq r10,[rip+0xffffffc4]
0x108400082b74    54  41ffd2         call r10
0x108400082b77    57  cc             int3l
0x108400082b78    58  4883ec18       REX.W subq rsp,0x18
0x108400082b7c    5c  488975a0       REX.W movq [rbp-0x60],rsi
0x108400082b80    60  488b4dd0       REX.W movq rcx,[rbp-0x30]
0x108400082b84    64  f6c101         testb rcx,0x1
0x108400082b87    67  0f8557010000   jnz 0x108400082ce4  <+0x1c4>
0x108400082b8d    6d  81f9204e0000   cmpl rcx,0x4e20
0x108400082b93    73  0f8c0b000000   jl 0x108400082ba4  <+0x84>
0x108400082b99    79  488b45d8       REX.W movq rax,[rbp-0x28]
0x108400082b9d    7d  488be5         REX.W movq rsp,rbp
0x108400082ba0    80  5d             pop rbp
0x108400082ba1    81  c20800         ret 0x8
0x108400082ba4    84  493b6560       REX.W cmpq rsp,[r13+0x60] (external value (StackGuard::address_of_jslimit()))
0x108400082ba8    88  0f8669000000   jna 0x108400082c17  <+0xf7>
0x108400082bae    8e  488bf9         REX.W movq rdi,rcx
0x108400082bb1    91  d1ff           sarl rdi, 1
0x108400082bb3    93  4c8bc7         REX.W movq r8,rdi
0x108400082bba    9a  0f8030010000   jo 0x108400082cf0  <+0x1d0>
0x108400082bc3    a3  0f8033010000   jo 0x108400082cfc  <+0x1dc>
0x108400082bc9    a9  e921000000     jmp 0x108400082bef  <+0xcf>
0x108400082bce    ae  6690           nop
0x108400082bd0    b0  488bcf         REX.W movq rcx,rdi
0x108400082bd6    b6  0f802c010000   jo 0x108400082d08  <+0x1e8>
0x108400082bdc    bc  4c8bc7         REX.W movq r8,rdi
0x108400082be3    c3  0f802b010000   jo 0x108400082d14  <+0x1f4>
0x108400082be9    c9  498bf8         REX.W movq rdi,r8
0x108400082bec    cc  4c8bc1         REX.W movq r8,rcx
0x108400082bef    cf  81ff10270000   cmpl rdi,0x2710
0x108400082bf5    d5  0f8d0b000000   jge 0x108400082c06  <+0xe6>
0x108400082bfb    db  493b6560       REX.W cmpq rsp,[r13+0x60] (external value (StackGuard::address_of_jslimit()))
0x108400082bff    df  77cf           ja 0x108400082bd0  <+0xb0>
0x108400082c01    e1  e943000000     jmp 0x108400082c49  <+0x129>
0x108400082c06    e6  498bc8         REX.W movq rcx,r8
0x108400082c0c    ec  0f8061000000   jo 0x108400082c73  <+0x153>
0x108400082c12    f2  488bc1         REX.W movq rax,rcx
0x108400082c15    f5  eb86           jmp 0x108400082b9d  <+0x7d>
0x108400082c17    f7  33c0           xorl rax,rax
0x108400082c19    f9  48bef50c240884100000 REX.W movq rsi,0x108408240cf5    ;; object: 0x108408240cf5 <NativeContext[261]>
0x108400082c23   103  48bb101206724b560000 REX.W movq rbx,0x564b72061210    ;; external reference (Runtime::StackGuard)
0x108400082c2d   10d  488bf8         REX.W movq rdi,rax
0x108400082c30   110  4c8bc6         REX.W movq r8,rsi
0x108400082c33   113  49ba2089a3724b560000 REX.W movq r10,0x564b72a38920  (CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit)    ;; off heap target
0x108400082c3d   11d  41ffd2         call r10
0x108400082c40   120  488b4dd0       REX.W movq rcx,[rbp-0x30]
0x108400082c44   124  e965ffffff     jmp 0x108400082bae  <+0x8e>
0x108400082c49   129  48897da8       REX.W movq [rbp-0x58],rdi
0x108400082c4d   12d  488b1dd1ffffff REX.W movq rbx,[rip+0xffffffd1]
0x108400082c54   134  33c0           xorl rax,rax
0x108400082c56   136  48bef50c240884100000 REX.W movq rsi,0x108408240cf5    ;; object: 0x108408240cf5 <NativeContext[261]>
0x108400082c60   140  4c8b15ceffffff REX.W movq r10,[rip+0xffffffce]
0x108400082c67   147  41ffd2         call r10
0x108400082c6a   14a  488b7da8       REX.W movq rdi,[rbp-0x58]
0x108400082c6e   14e  e95dffffff     jmp 0x108400082bd0  <+0xb0>
0x108400082c73   153  48b968ea2f744b560000 REX.W movq rcx,0x564b742fea68    ;; external reference (Heap::NewSpaceAllocationTopAddress())
0x108400082c7d   15d  488b39         REX.W movq rdi,[rcx]
0x108400082c80   160  4c8d4f0c       REX.W leaq r9,[rdi+0xc]
0x108400082c84   164  4c8945b0       REX.W movq [rbp-0x50],r8
0x108400082c88   168  49bb70ea2f744b560000 REX.W movq r11,0x564b742fea70    ;; external reference (Heap::NewSpaceAllocationLimitAddress())
0x108400082c92   172  4d390b         REX.W cmpq [r11],r9
0x108400082c95   175  0f8721000000   ja 0x108400082cbc  <+0x19c>
0x108400082c9b   17b  ba0c000000     movl rdx,0xc
0x108400082ca0   180  49ba200282724b560000 REX.W movq r10,0x564b72820220  (AllocateRegularInYoungGeneration)    ;; off heap target
0x108400082caa   18a  41ffd2         call r10
0x108400082cad   18d  488d78ff       REX.W leaq rdi,[rax-0x1]
0x108400082cb1   191  488b0dbdffffff REX.W movq rcx,[rip+0xffffffbd]
0x108400082cb8   198  4c8b45b0       REX.W movq r8,[rbp-0x50]
0x108400082cbc   19c  4c8d4f0c       REX.W leaq r9,[rdi+0xc]
0x108400082cc0   1a0  4c8909         REX.W movq [rcx],r9
0x108400082cc3   1a3  488d4f01       REX.W leaq rcx,[rdi+0x1]
0x108400082cc7   1a7  498bbd40010000 REX.W movq rdi,[r13+0x140] (root (heap_number_map))
0x108400082cce   1ae  8979ff         movl [rcx-0x1],rdi
0x108400082cd1   1b1  c4c1032ac0     vcvtlsi2sd xmm0,xmm15,r8
0x108400082cd6   1b6  c5fb114103     vmovsd [rcx+0x3],xmm0
0x108400082cdb   1bb  488bc1         REX.W movq rax,rcx
0x108400082cde   1be  e9bafeffff     jmp 0x108400082b9d  <+0x7d>
0x108400082ce3   1c3  90             nop
0x108400082ce4   1c4  49c7c500000000 REX.W movq r13,0x0
0x108400082ceb   1cb  e850f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082cf0   1d0  49c7c501000000 REX.W movq r13,0x1
0x108400082cf7   1d7  e844f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082cfc   1dc  49c7c502000000 REX.W movq r13,0x2
0x108400082d03   1e3  e838f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d08   1e8  49c7c503000000 REX.W movq r13,0x3
0x108400082d0f   1ef  e82cf30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d14   1f4  49c7c504000000 REX.W movq r13,0x4
0x108400082d1b   1fb  e820f30300     call 0x1084000c2040     ;; eager deoptimization bailout
0x108400082d20   200  49c7c505000000 REX.W movq r13,0x5
0x108400082d27   207  e814f30700     call 0x108400102040     ;; lazy deoptimization bailout
0x108400082d2c   20c  49c7c506000000 REX.W movq r13,0x6
0x108400082d33   213  e808f30700     call 0x108400102040     ;; lazy deoptimization bailout

Source positions:
pc offset  position
f7         0

Inlined functions (count = 1)
0x10840824fff5 <SharedFunctionInfo something>

Deoptimization Input Data (deopt points = 7)
index  bytecode-offset    pc
0               22    NA
1                2    NA
2               46    NA
3                2    NA
4               46    NA
5               27   120
6               27   14a

Safepoints (size = 50)
0x108400082c40     120   200  10000010000000 (sp -> fp)       5
0x108400082c6a     14a   20c  10000000000000 (sp -> fp)       6
0x108400082cad     18d    NA  00000000000000 (sp -> fp)  <none>

RelocInfo (size = 34)
0x108400082b38  off heap target
0x108400082b52  off heap target
0x108400082c1b  full embedded object  (0x108408240cf5 <NativeContext[261]>)
0x108400082c25  external reference (Runtime::StackGuard)  (0x564b72061210)
0x108400082c35  off heap target
0x108400082c58  full embedded object  (0x108408240cf5 <NativeContext[261]>)
0x108400082ca2  off heap target
0x108400082cec  runtime entry  (eager deoptimization bailout)
0x108400082cf8  runtime entry  (eager deoptimization bailout)
0x108400082d04  runtime entry  (eager deoptimization bailout)
0x108400082d10  runtime entry  (eager deoptimization bailout)
0x108400082d1c  runtime entry  (eager deoptimization bailout)
0x108400082d28  runtime entry  (lazy deoptimization bailout)
0x108400082d34  runtime entry  (lazy deoptimization bailout)

--- End code ---
\$
``````

### Building gtest

To link the unit tests with gtest, I first built a static gtest library:

``````\$ mkdir lib
\$ mkdir deps ; cd deps
\$ /usr/bin/clang++ --std=c++14 -Iinclude -I. -pthread -c src/gtest-all.cc
\$ ar -rv libgtest-linux.a gtest-all.o
\$ cp libgtest-linux.a ../../../../lib/gtest
``````

Linking a unit test against this archive then failed with:

``````./lib/gtest/libgtest-linux.a(gtest-all.o):gtest-all.cc:function testing::internal::BoolFromGTestEnv(char const*, bool): error: undefined reference to 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str() const'
``````

Inspecting the symbols in the archive:

``````\$ nm lib/gtest/libgtest-linux.a | grep basic_string | c++filt
....
``````

There are a lot of symbols listed above, but the point is that these symbols were compiled into the object file of `libgtest-linux.a`. Now, when we compile V8 and the tests we are using `-std=c++14`, and we have to use the same standard when compiling gtest. Let's try that. Just adding that flag does not help in this case, so we need to check which C++ headers are being used:

``````\$ /usr/bin/clang++ -print-search-dirs
programs: =/usr/bin:/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../x86_64-redhat-linux/bin
libraries: =/usr/lib64/clang/9.0.0:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../../../lib64:
/usr/bin/../lib64:
/lib/../lib64:
/usr/lib/../lib64:
/usr/bin/../lib/gcc/x86_64-redhat-linux/9/../../..:
/usr/bin/../lib:
/lib:/usr/lib
\$
``````

Let's search for the `string` header and inspect the namespace in that header:

``````\$ find /usr/ -name string
/usr/include/c++/9/debug/string
/usr/include/c++/9/experimental/string
/usr/include/c++/9/string
/usr/src/debug/gcc-9.2.1-1.fc31.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/string
``````
``````\$ vi /usr/include/c++/9/string
``````

So this looks alright, and thinking about it a little more, I've been bitten by the issue of linking against different libc++ symbols (again). When we compile using Make we are using the C++ headers that are shipped with V8 (clang's libc++). Take the string header for example, in v8/buildtools/third_party/libc++/trunk/include/string, which is from clang's C++ library and uses a different inline namespace than libstdc++ (`__1` rather than `__cxx11`).

But when I compiled gtest I did not specify the `-isystem` include path, so the default headers were used, producing symbols with `__cxx11` in them. When the linker tries to find these symbols it fails, as no such symbols exist in the libraries it searches.

Create a simple test linking with the standard build of gtest to see if that compiles and runs:

``````\$ /usr/bin/clang++ -std=c++14 -I./deps/googletest/googletest/include  -L\$PWD/lib -g -O0 -o test/simple_test test/main.cc test/simple.cc lib/libgtest.a -lpthread
``````

That worked and did not segfault.

But when I run the version that is built using the makefile I get:

``````(lldb) target create "./test/persistent-object_test"
Current executable set to './test/persistent-object_test' (x86_64).
(lldb) r
warning: (x86_64) /lib64/libgcc_s.so.1 unsupported DW_FORM values: 0x1f20 0x1f21

[ FATAL ] Process 1024232 stopped
frame #0: 0x00007ffff7c0a7b0 libc.so.6`__GI___libc_free + 32
libc.so.6`__GI___libc_free:
->  0x7ffff7c0a7b0 <+32>: mov    rax, qword ptr [rdi - 0x8]
0x7ffff7c0a7b4 <+36>: lea    rsi, [rdi - 0x10]
0x7ffff7c0a7b8 <+40>: test   al, 0x2
0x7ffff7c0a7ba <+42>: jne    0x7ffff7c0a7f0            ; <+96>
(lldb) bt
* frame #0: 0x00007ffff7c0a7b0 libc.so.6`__GI___libc_free + 32
frame #1: 0x000000000042bb58 persistent-object_test`std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringbuf(this=0x000000000046e908) at iosfwd:130:32
frame #2: 0x000000000042ba4f persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0, vtt=0x000000000044db28) at iosfwd:139:32
frame #3: 0x0000000000420176 persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0) at iosfwd:139:32
frame #4: 0x000000000042bacc persistent-object_test`std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_stringstream(this=0x000000000046e8f0) at iosfwd:139:32
frame #5: 0x0000000000427f4e persistent-object_test`testing::internal::scoped_ptr<std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::reset(this=0x00007fffffffcee8, p=0x0000000000000000) at gtest-port.h:1216:9
frame #6: 0x0000000000427ee9 persistent-object_test`testing::internal::scoped_ptr<std::__1::basic_stringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::~scoped_ptr(this=0x00007fffffffcee8) at gtest-port.h:1201:19
frame #7: 0x000000000041f265 persistent-object_test`testing::Message::~Message(this=0x00007fffffffcee8) at gtest-message.h:89:18
frame #8: 0x00000000004235ec persistent-object_test`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > testing::internal::StreamableToString<int>(streamable=0x00007fffffffcf9c) at gtest-message.h:247:3
frame #11: 0x000000000042242c persistent-object_test`testing::internal::UnitTestImpl::AddTestInfo(this=0x000000000046e480, set_up_tc=(persistent-object_test`testing::Test::SetUpTestCase() at gtest.h:427), tear_down_tc=(persistent-object_test`testing::Test::TearDownTestCase() at gtest.h:435), test_info=0x000000000046e320)(), void (*)(), testing::TestInfo*) at gtest-internal-inl.h:663:7
frame #12: 0x000000000040d04f persistent-object_test`testing::internal::MakeAndRegisterTestInfo(test_case_name="Persistent", name="object", type_param=0x0000000000000000, value_param=0x0000000000000000, code_location=<unavailable>, fixture_class_id=0x000000000046d748, set_up_tc=(persistent-object_test`testing::Test::SetUpTestCase() at gtest.h:427), tear_down_tc=(persistent-object_test`testing::Test::TearDownTestCase() at gtest.h:435), factory=0x000000000046e300)(), void (*)(), testing::internal::TestFactoryBase*) at gtest.cc:2599:22
frame #13: 0x00000000004048b8 persistent-object_test`::__cxx_global_var_init() at persistent-object_test.cc:5:1
frame #14: 0x00000000004048e9 persistent-object_test`_GLOBAL__sub_I_persistent_object_test.cc at persistent-object_test.cc:0
frame #15: 0x00000000004497a5 persistent-object_test`__libc_csu_init + 69
frame #16: 0x00007ffff7ba512e libc.so.6`__libc_start_main + 126
frame #17: 0x0000000000404eba persistent-object_test`_start + 42
``````

This issue came up when linking a unit test with gtest:

``````/usr/bin/ld: ./lib/gtest/libgtest-linux.a(gtest-all.o): in function `testing::internal::BoolFromGTestEnv(char const*, bool)':
``````

So this indicated that the object files in `libgtest-linux.a` were in fact using headers from libc++ and not libstdc++. This was a really stupid mistake on my part: I had not specified the output file explicitly (`-o`), so the object file was written to the current working directory, but the file included in the archive was taken from within the deps/googletest/googletest/ directory, which was old and compiled using libc++.

### Persistent cast-function-type

This issue was seen in Node.js when compiling with GCC. It can also be seen when building V8 with GCC while enabling `-Wcast-function-type` in `BUILD.gn`:

``````      "-Wcast-function-type",
``````

There are unit tests in V8 that also produce this warning, for example `test/cctest/test-global-handles.cc`. Original output:

``````g++ -MMD -MF obj/test/cctest/cctest_sources/test-global-handles.o.d -DV8_INTL_SUPPORT -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DCR_SYSROOT_HASH=9c905c99558f10e19cc878b5dca1d4bd58c607ae -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DENABLE_DISASSEMBLER -DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64 -DENABLE_GDB_JIT_INTERFACE -DENABLE_MINOR_MC -DOBJECT_PRINT -DV8_TRACE_MAPS -DV8_ENABLE_ALLOCATION_TIMEOUT -DV8_ENABLE_FORCE_SLOW_PATH -DV8_ENABLE_DOUBLE_CONST_STORE_CHECK -DV8_INTL_SUPPORT -DENABLE_HANDLE_ZAPPING -DV8_SNAPSHOT_NATIVE_CODE_COUNTERS -DV8_CONCURRENT_MARKING -DV8_ENABLE_LAZY_SOURCE_POSITIONS -DV8_CHECK_MICROTASKS_SCOPES_CONSISTENCY -DV8_EMBEDDED_BUILTINS -DV8_WIN64_UNWINDING_INFO -DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH -DV8_SNAPSHOT_COMPRESSION -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DV8_IMMINENT_DEPRECATION_WARNINGS -DV8_TARGET_ARCH_X64 -DV8_HAVE_TARGET_OS -DV8_TARGET_OS_LINUX -DDEBUG -DDISABLE_UNTRUSTED_CODE_MITIGATIONS -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DV8_IMMINENT_DEPRECATION_WARNINGS -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DUCHAR_TYPE=uint16_t -I../.. -Igen -I../../include -Igen/include -I../.. 
-Igen -I../../third_party/icu/source/common -I../../third_party/icu/source/i18n -I../../include -I../../tools/debug_helper -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -funwind-tables -fPIC -pipe -B../../third_party/binutils/Linux_x64/Release/bin -pthread -m64 -march=x86-64 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -Wall -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-deprecated-declarations -Wno-comments -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-unused-parameter -fno-omit-frame-pointer -g2 -Wno-strict-overflow -Wno-return-type -Wcast-function-type -O3 -fno-ident -fdata-sections -ffunction-sections -fvisibility=default -std=gnu++14 -Wno-narrowing -Wno-class-memaccess -fno-exceptions -fno-rtti --sysroot=../../build/linux/debian_sid_amd64-sysroot -c ../../test/cctest/test-global-handles.cc -o obj/test/cctest/cctest_sources/test-global-handles.o
In file included from ../../include/v8-inspector.h:14,
from ../../src/execution/isolate.h:15,
from ../../src/api/api.h:10,
from ../../src/api/api-inl.h:8,
from ../../test/cctest/test-global-handles.cc:28:
../../include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = v8::Global<v8::Object>; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)]’:
../../test/cctest/test-global-handles.cc:292:47:   required from here
../../include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<v8::Global<v8::Object> >::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = v8::internal::{anonymous}::FlagAndGlobal; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>&)]’:
../../test/cctest/test-global-handles.cc:493:53:   required from here
../../include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<v8::internal::{anonymous}::FlagAndGlobal>&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]
``````

Formatted for git commit message:

``````g++ -MMD -MF obj/test/cctest/cctest_sources/test-global-handles.o.d
...
In file included from ../../include/v8-inspector.h:14,
from ../../src/execution/isolate.h:15,
from ../../src/api/api.h:10,
from ../../src/api/api-inl.h:8,
from ../../test/cctest/test-global-handles.cc:28:
../../include/v8.h:
In instantiation of ‘void v8::PersistentBase<T>::SetWeak(
P*,
typename v8::WeakCallbackInfo<P>::Callback,
v8::WeakCallbackType)
[with
P = v8::Global<v8::Object>;
T = v8::Object;
typename v8::WeakCallbackInfo<P>::Callback =
void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)
]’:
../../test/cctest/test-global-handles.cc:292:47:   required from here
../../include/v8.h:10750:16: warning:
cast between incompatible function types from
‘v8::WeakCallbackInfo<v8::Global<v8::Object> >::Callback’ {aka
‘void (*)(const v8::WeakCallbackInfo<v8::Global<v8::Object> >&)’} to
‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’}
[-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``````

This commit suggests adding a pragma specifically for GCC to suppress this warning. The motivation is that there were quite a few of these warnings in the Node.js build, which have been suppressed by adding a similar pragma around the include of v8.h [1].

``````$
In file included from persistent-obj.cc:8:
/home/danielbevenius/work/google/v8_src/v8/include/v8.h: In instantiation of ‘void v8::PersistentBase<T>::SetWeak(P*, typename v8::WeakCallbackInfo<P>::Callback, v8::WeakCallbackType) [with P = Something; T = v8::Object; typename v8::WeakCallbackInfo<P>::Callback = void (*)(const v8::WeakCallbackInfo<Something>&)]’:

persistent-obj.cc:57:38:   required from here
/home/danielbevenius/work/google/v8_src/v8/include/v8.h:10750:16: warning: cast between incompatible function types from ‘v8::WeakCallbackInfo<Something>::Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<Something>&)’} to ‘Callback’ {aka ‘void (*)(const v8::WeakCallbackInfo<void>&)’} [-Wcast-function-type]
10750 |                reinterpret_cast<Callback>(callback), type);
|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``````

Currently, we have added a pragma to avoid this warning in Node.js, but we'd like to add it in V8, closer to the actual code that is causing it. In Node we have to set the pragma on the header.

``````template <class T>
template <typename P>
V8_INLINE void PersistentBase<T>::SetWeak(
    P* parameter, typename WeakCallbackInfo<P>::Callback callback,
    WeakCallbackType type) {
  typedef typename WeakCallbackInfo<void>::Callback Callback;
  V8::MakeWeak(reinterpret_cast<internal::Address*>(this->val_), parameter,
               reinterpret_cast<Callback>(callback), type);
}
``````

Notice the second parameter is `typename WeakCallbackInfo<P>::Callback` which is a typedef:

``````  typedef void (*Callback)(const WeakCallbackInfo<T>& data);
``````

This declares `Callback` as a pointer to a function that takes a reference to a const WeakCallbackInfo and returns void. So we could define a callback like this:

``````void WeakCallback(const v8::WeakCallbackInfo<Something>& data) {
  Something* obj = data.GetParameter();
  std::cout << "in make weak callback..." << '\n';
}
``````

And then try to cast it into:

``````typedef typename v8::WeakCallbackInfo<void>::Callback Callback;
Callback cb = reinterpret_cast<Callback>(WeakCallback);
``````

This is done as V8::MakeWeak has the following signature:

``````void V8::MakeWeak(i::Address* location, void* parameter,
                  WeakCallbackInfo<void>::Callback weak_callback,
                  WeakCallbackType type) {
  i::GlobalHandles::MakeWeak(location, parameter, weak_callback, type);
}
``````

### gdb warnings

``````warning: Could not find DWO CU obj/v8_compiler/common-node-cache.dwo(0x42b8adb87d74d56b) referenced by CU at offset 0x206f7 [in module /home/danielbevenius/work/google/learning-v8/hello-world]
``````

This can be worked around by specifying the `--cd` argument to gdb:

``````$ gdb --cd=/home/danielbevenius/work/google/v8_src/v8/out/x64.release --args /home/danielbevenius/work/google/learning-v8/hello-world
``````

### Building with g++

Update args.gn to include:

``````is_clang = false
``````

Next I got the following error when trying to compile:

``````$ ninja -v -C out/x64.release/ obj/test/cctest/cctest_sources/test-global-handles.o
ux/debian_sid_amd64-sysroot -fexceptions -frtti -c ../../src/torque/instance-type-generator.cc -o obj/torque_base/instance-type-generator.o
In file included from /usr/include/c++/9/bits/stl_algobase.h:59,
from /usr/include/c++/9/memory:62,
from ../../src/torque/implementation-visitor.h:8,
from ../../src/torque/instance-type-generator.cc:5:
/usr/include/c++/9/x86_64-redhat-linux/bits/c++config.h:3:10: fatal error: bits/wordsize.h: No such file or directory
3 | #include <bits/wordsize.h>
|          ^~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
``````
This was fixed by setting `CPATH` so the system headers could be found:

``````$ export CPATH=/usr/include
``````
After that the linker failed to open libatomic:

``````third_party/binutils/Linux_x64/Release/bin/ld.gold: error: cannot open /usr/lib64/libatomic.so.1.2.0: No such file or directory
``````

which was fixed by installing it:

``````$ sudo dnf install -y libatomic
``````

I still got an error because a warning was treated as an error, so I'm trying to build using:

``````treat_warnings_as_errors = false
``````

Let's see how that works out. I also had to use GNU's linker by disabling gold:

``````use_gold = false
``````

### CodeStubAssembler

The history of this is that JavaScript builtins used to be written in assembly, which gave very good performance but made porting V8 to different architectures more difficult, as these builtins had to have specific implementations for each supported architecture, so it did not scale very well. With the addition of features to the JavaScript specification, supporting new features meant having to implement them for all platforms, which made it difficult to keep up and deliver these new features.

The goal is to have the performance of hand-coded assembly without having to write it for every platform. So a portable assembly language was built on top of Turbofan's backend. This is an API that generates Turbofan's machine-level IR. This IR can be used by Turbofan to produce very good machine code on all platforms. So one "only" has to implement one component/function/feature (not sure what to call this) and then it can be made available to all platforms; they no longer have to maintain all that handwritten assembly.

Just to be clear, the CSA is a C++ API that is used to generate IR, which is then compiled into machine code for the target instruction set architecture.

### Torque

Torque is a DSL to avoid having to use the CodeStubAssembler directly (it is still used behind the scenes). This language is statically typed, garbage collected, and compatible with JavaScript.

The JavaScript standard library was previously implemented in V8 using hand-written assembly, but as we mentioned in the previous section this did not scale.

It could have been written in JavaScript too, and I think this was done in the past, but this has some issues: builtins would need warm-up time to become optimized, and there were also issues with monkey-patching and unintentionally exposing VM internals.

Is Torque run at build time? I'm thinking yes, as it has to generate the C++ code.

There is a main function in torque.cc which will be built into an executable:

``````$ ./out/x64.release_gcc/torque --help
Unexpected command-line argument "--help", expected a .tq file.
``````

The files that are processed by Torque are defined in BUILD.gn in the `torque_files` section. There is also a template named `run_torque`. I've noticed that this template and others in GN use the script `tools/run.py`. This is apparently because GN can only execute scripts at the moment, and what this script does is use Python to create a subprocess with the passed-in arguments:

``````$ gn help action
``````

And a template is a way to reuse code in GN.

There is a make target that shows what is generated by torque:

``````$ make torque-example
``````

This will create a directory in the current directory named `gen/torque-generated`. Notice that this directory contains C++ headers and sources.

It takes torque-example.tq as input. For this file the following header will be generated:

``````#ifndef V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_
#define V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_

#include "src/builtins/builtins-promise.h"
#include "src/compiler/code-assembler.h"
#include "src/codegen/code-stub-assembler.h"
#include "src/utils/utils.h"
#include "torque-generated/field-offsets-tq.h"
#include "torque-generated/csa-types-tq.h"

namespace v8 {
namespace internal {

void HelloWorld_0(compiler::CodeAssemblerState* state_);

}  // namespace internal
}  // namespace v8

#endif  // V8_GEN_TORQUE_GENERATED_TORQUE_EXAMPLE_TQ_H_
``````

This is only to show the generated files and make it clear that Torque will generate these files, which will then be compiled during the V8 build. So, let's try copying `torque-example.tq` to the v8/src/builtins directory.

``````$ cp torque-example.tq ../v8_src/v8/src/builtins/
``````

This is not enough to get it included in the build; we have to update BUILD.gn and add this file to the `torque_files` list. After running the build we can see that there is a file named `src/builtins/torque-example-tq-csa.h` generated, along with a .cc.

To understand how this works I'm going to use https://v8.dev/docs/torque-builtins as a starting point:

``````transitioning javascript builtin
MathIs42(js-implicit context: NativeContext, receiver: JSAny)(x: JSAny): Boolean {
  const number: Number = ToNumber_Inline(x);
  typeswitch (number) {
    case (smi: Smi): {
      return smi == 42 ? True : False;
    }
    case (heapNumber: HeapNumber): {
      return Convert<float64>(heapNumber) == 42 ? True : False;
    }
  }
}
``````

This has been updated to work with the latest V8 version.

Next, we need to update `src/init/bootstrapper.cc` to add/install this function on the math object:

``````  SimpleInstallFunction(isolate_, math, "is42", Builtins::kMathIs42, 1, true);
``````

After this we need to rebuild v8:

``````$ env CPATH=/usr/include ninja -v -C out/x64.release_gcc
``````
``````$ d8
d8> Math.is42(42)
true
d8> Math.is42(2)
false
``````

Let's look at the generated code that Torque has produced in `out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc`. We can run it through the preprocessor using:

``````$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -E out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc > math.cc.pp
``````

If we open math.cc.pp and search for `Is42` we can find:

``````class MathIs42Assembler : public CodeStubAssembler {
 public:
  using Descriptor = Builtin_MathIs42_InterfaceDescriptor;
  explicit MathIs42Assembler(compiler::CodeAssemblerState* state) : CodeStubAssembler(state) {}
  void GenerateMathIs42Impl();
  Node* Parameter(Descriptor::ParameterIndices index) {
    return CodeAssembler::Parameter(static_cast<int>(index));
  }
};

void Builtins::Generate_MathIs42(compiler::CodeAssemblerState* state) {
  MathIs42Assembler assembler(state);
  state->SetInitialDebugInformation("MathIs42", "out/x64.release_gcc/gen/torque-generated/src/builtins/math-tq-csa.cc", 2121);
  if (Builtins::KindOf(Builtins::kMathIs42) == Builtins::TFJ) {
    assembler.PerformStackCheck(assembler.GetJSContextParameter());
  }
  assembler.GenerateMathIs42Impl();
}

void MathIs42Assembler::GenerateMathIs42Impl() {
...
``````

So this is what gets generated by the Torque compiler, and what we see above is a CodeStubAssembler class.

If we take a look in out/x64.release_gcc/gen/torque-generated/builtin-definitions-tq.h we can find the following line that has been generated:

``````TFJ(MathIs42, 1, kReceiver, kX) \
``````

Now, there is a section about the TF_BUILTIN macro, which will create function declarations, and function and class definitions.

Now, in src/builtins/builtins.h we have the following macros:

``````class Builtins {
 public:
  enum Name : int32_t {
#define DEF_ENUM(Name, ...) k##Name,
    BUILTIN_LIST(DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM, DEF_ENUM,
                 DEF_ENUM)
#undef DEF_ENUM
    ...
  }

#define DECLARE_TF(Name, ...) \
  static void Generate_##Name(compiler::CodeAssemblerState* state);

  BUILTIN_LIST(IGNORE_BUILTIN, DECLARE_TF, DECLARE_TF, DECLARE_TF, DECLARE_TF,
               IGNORE_BUILTIN, DECLARE_ASM)
``````

And `BUILTIN_LIST` is declared in src/builtins/builtins-definitions.h, and this file includes:

``````#include "torque-generated/builtin-definitions-tq.h"

#define BUILTIN_LIST(CPP, TFJ, TFC, TFS, TFH, BCH, ASM)  \
BUILTIN_LIST_BASE(CPP, TFJ, TFC, TFS, TFH, ASM)        \
BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \
BUILTIN_LIST_INTL(CPP, TFJ, TFS)                       \
BUILTIN_LIST_BYTECODE_HANDLERS(BCH)
``````

Notice `BUILTIN_LIST_FROM_TORQUE`, this is how our MathIs42 gets included from builtin-definitions-tq.h. This is in turn included by builtins.h.

If we take a look at this header after it has gone through the preprocessor, we can see what has been generated for MathIs42:

``````$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -I./out/x64.release_gcc/gen/ -E src/builtins/builtins.h > builtins.h.pp
``````

First, MathIs42 will become a member of the Name enum of the Builtins class:

``````class Builtins {
 public:
  enum Name : int32_t {
    ...
    kMathIs42,
  };

  static void Generate_MathIs42(compiler::CodeAssemblerState* state);
``````

We should also take a look in `src/builtins/builtins-descriptors.h`, as BUILTIN_LIST is used there too, and specific to our current example there is a `DEFINE_TFJ_INTERFACE_DESCRIPTOR` macro used:

``````BUILTIN_LIST(IGNORE_BUILTIN, DEFINE_TFJ_INTERFACE_DESCRIPTOR,
             DEFINE_TFC_INTERFACE_DESCRIPTOR, DEFINE_TFS_INTERFACE_DESCRIPTOR,
             DEFINE_TFH_INTERFACE_DESCRIPTOR, IGNORE_BUILTIN,
             DEFINE_ASM_INTERFACE_DESCRIPTOR)

#define DEFINE_TFJ_INTERFACE_DESCRIPTOR(Name, Argc, ...)                \
  struct Builtin_##Name##_InterfaceDescriptor {                         \
    enum ParameterIndices {                                             \
      kJSTarget = compiler::CodeAssembler::kTargetParameterIndex,       \
      ##__VA_ARGS__,                                                    \
      kJSNewTarget,                                                     \
      kJSActualArgumentsCount,                                          \
      kContext,                                                         \
      kParameterCount,                                                  \
    };                                                                  \
  };
``````

So the above will generate the following code but this time for builtins.cc:

``````$ clang++ --sysroot=build/linux/debian_sid_amd64-sysroot -isystem=./buildtools/third_party/libc++/trunk/include -isystem=buildtools/third_party/libc++/trunk/include -I. -I./out/x64.release_gcc/gen/ -E src/builtins/builtins.cc > builtins.cc.pp
``````
``````struct Builtin_MathIs42_InterfaceDescriptor {
  enum ParameterIndices {
    kJSTarget = compiler::CodeAssembler::kTargetParameterIndex,
    kX,
    kJSNewTarget,
    kJSActualArgumentsCount,
    kContext,
    kParameterCount,
  };
};

...
{"MathIs42", Builtins::TFJ, {1, 0}}
...
};
``````

BuiltinMetadata is a struct defined in builtins.cc. In our case the name is passed, then the type, and the last struct specifies the number of parameters; the trailing 0 is unused as far as I can tell, and is only there to make this constructor different from the one that takes an Address parameter.

So, where is `Generate_MathIs42` used?

``````void SetupIsolateDelegate::SetupBuiltinsInternal(Isolate* isolate) {
  Code code;
  ...
  code = BuildWithCodeStubAssemblerJS(isolate, index, &Builtins::Generate_MathIs42, 1, "MathIs42");
  ...
``````

`BuildWithCodeStubAssemblerJS` can be found in `src/builtins/setup-builtins-internal.cc`:

``````Code BuildWithCodeStubAssemblerJS(Isolate* isolate, int32_t builtin_index,
                                  CodeAssemblerGenerator generator, int argc,
                                  const char* name) {
  Zone zone(isolate->allocator(), ZONE_NAME);
  const int argc_with_recv = (argc == kDontAdaptArgumentsSentinel) ? 0 : argc + 1;
  compiler::CodeAssemblerState state(
      isolate, &zone, argc_with_recv, Code::BUILTIN, name,
      PoisoningMitigationLevel::kDontPoison, builtin_index);
  generator(&state);
  Handle<Code> code = compiler::CodeAssembler::GenerateCode(
      &state, BuiltinAssemblerOptions(isolate, builtin_index));
  return *code;
}
``````

Let's add a conditional breakpoint so that we can stop in this function when `MathIs42` is passed in:

``````(gdb) br setup-builtins-internal.cc:161
(gdb) cond 1 ((int)strcmp(name, "MathIs42")) == 0
``````

We can see that we first create a new `CodeAssemblerState`, which we saw previously is the type that the `Generate_MathIs42` function takes. TODO: look into this class a little more. After this, `generator` will be called with the newly created state passed in:

``````(gdb) p generator
$8 = (v8::internal::(anonymous namespace)::CodeAssemblerGenerator) 0x5619fd61b66e <v8::internal::Builtins::Generate_MathIs42(v8::internal::compiler::CodeAssemblerState*)>
``````

TODO: Take a closer look at `generator` and how that code works. After `generator` returns we have the following calls:

``````  generator(&state);
  Handle<Code> code = compiler::CodeAssembler::GenerateCode(
      &state, BuiltinAssemblerOptions(isolate, builtin_index));
  return *code;
``````

The next thing that will happen is that the returned code will be added to the builtins by calling `SetupIsolateDelegate::AddBuiltin`:

``````void SetupIsolateDelegate::AddBuiltin(Builtins* builtins, int index, Code code) {
  builtins->set_builtin(index, code);
}
``````

`set_builtin` can be found in `src/builtins/builtins.cc` and looks like this:

``````void Builtins::set_builtin(int index, Code builtin) {
  isolate_->heap()->set_builtin(index, builtin);
}
``````

And Heap::set_builtin does:

``````void Heap::set_builtin(int index, Code builtin) {
  isolate()->builtins_table()[index] = builtin.ptr();
}
``````

So this is how the builtins_table is populated.

And when is `SetupBuiltinsInternal` called?
It is called from `SetupIsolateDelegate::SetupBuiltins`, which is called from `Isolate::Init`.

Just to recap before I lose track of what is going on: we have math.tq, which is the Torque source file. This is parsed by the Torque compiler/parser, and it will generate C++ headers and source files, one of which will be a CodeStubAssembler class for our MathIs42 function. It will also generate `torque-generated/builtin-definitions-tq.h`. After this has happened the sources need to be compiled into object files. After that, if a snapshot is configured to be created, mksnapshot will create a new Isolate, and in that process the MathIs42 builtin will get added. Then a context will be created and saved. The snapshot can then be deserialized into an Isolate at some later point.

Alright, so we have seen what gets generated for the function MathIs42, but how does this get hooked up to enable us to call `Math.is42(11)`?

In bootstrapper.cc we can see a number of lines:

`````` SimpleInstallFunction(isolate_, math, "trunc", Builtins::kMathTrunc, 1, true);
``````

And we are going to add a line like the following:

`````` SimpleInstallFunction(isolate_, math, "is42", Builtins::kMathIs42, 1, true);
``````

The signature for `SimpleInstallFunction` looks like this:

``````V8_NOINLINE Handle<JSFunction> SimpleInstallFunction(
    Isolate* isolate, Handle<JSObject> base, const char* name,
    Builtins::Name call, int len, bool adapt,
    PropertyAttributes attrs = DONT_ENUM) {
  Handle<String> internalized_name = isolate->factory()->InternalizeUtf8String(name);
  Handle<JSFunction> fun = SimpleCreateFunction(isolate, internalized_name, call, len, adapt);
  JSObject::AddProperty(isolate, base, internalized_name, fun, attrs);
  return fun;
}
``````

So we see that the function is added as a property to the Math object. Notice that we also need `kMathIs42` in the Builtins class, which is now part of the builtins_table_ array that we went through above.

#### Transitioning/Transient

In Torque source files we can sometimes see types declared as `transient`, and functions that have a `transitioning` specifier. In V8, HeapObjects can change at runtime (I think an example of this would be deleting an element in an array, which would transition it to a different type of array, HoleyElementArray or something like that. TODO: verify and explain this). A function that calls JavaScript, which can cause such a transition, is marked with `transitioning`.

#### Callables

Callables are like functions in JS/C++ but have some additional capabilities, and there are several different types of callables:

macro callables

These correspond to generated CodeStubAssembler C++ code that will be inlined at the callsite.

builtin callables

These will become V8 builtins, with info added to builtin-definitions.h (via the include of torque-generated/builtin-definitions-tq.h). There is only one copy of a builtin, and it will be invoked via a call instead of being inlined as is the case with macros.

runtime callables

intrinsic callables

#### Explicit parameters

Macros and builtins can have explicit parameters. For example:

``````@export
macro HelloWorld1(msg: JSAny) {
  Print(msg);
}
``````

And we can call this from another macro like this:

``````@export
macro HelloWorld() {
  HelloWorld1('Hello World');
}
``````

#### Implicit parameters

In the previous section we showed explicit parameters but we can also have implicit parameters:

``````@export
macro HelloWorld2(implicit msg: JSAny)() {
  Print(msg);
}

@export
macro HelloWorld() {
  const msg = 'Hello implicit';
  HelloWorld2();
}
``````

### Troubleshooting

Compilation error when including `src/objects/objects-inl.h`:

``````/home/danielbevenius/work/google/v8_src/v8/src/objects/object-macros.h:263:14: error: no declaration matches ‘bool v8::internal::HeapObject::IsJSCollator() const’
``````

Does this need i18n perhaps?

``````$ gn args --list out/x64.release_gcc | grep i18n
v8_enable_i18n_support
``````
``````usr/bin/ld: /tmp/ccJOrUMl.o: in function `v8::internal::MaybeHandle<v8::internal::Object>::Check() const':
/home/danielbevenius/work/google/v8_src/v8/src/handles/maybe-handles.h:44: undefined reference to `V8_Fatal(char const*, ...)'
collect2: error: ld returned 1 exit status
``````

V8_Fatal is referenced but not defined in v8_monolith.a:

``````$ nm libv8_monolith.a | grep V8_Fatal | c++filt
...
U V8_Fatal(char const*, int, char const*, ...)
``````

And I thought it might be defined in libv8_libbase.a, but it is the same there. Actually, I was looking at the wrong symbol; this was not from the logging.o object file. If we look at that one we find:

``````v8_libbase/logging.o:
...
0000000000000000 T V8_Fatal(char const*, int, char const*, ...)
``````

In out/x64.release/obj/logging.o we can find it defined:

``````$ nm -C  libv8_libbase.a | grep -A 50 logging.o | grep V8_Fatal
0000000000000000 T V8_Fatal(char const*, int, char const*, ...)
``````

`T` means that the symbol is in the text section. So if the linker is able to find libv8_libbase.a it should be able to resolve this.

So we need to make sure the linker can find the directory where the libraries are located (`-L<dir>`), and also that it links the library (`-l<name>`).

With this in place I can see that the linker can open the archive:

``````attempt to open /home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_libbase.so failed
``````

But I'm still getting the same linking error. If we look closer at the error message we can see that it is maybe-handles.h that is complaining. Could it be that the order is incorrect when linking? libv8_libbase.a needs to come after libv8_monolith. Something I noticed is that even though the library libv8_libbase.a is found, it does not look like the linker actually reads the object files. I can see that it does this for libv8_monolith.a:

``````(/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/obj/libv8_monolith.a)common-node-cache.o
``````

Hmm, actually looking at the signature of the function, it is `V8_Fatal(char const*, ...)` and not `V8_Fatal(char const*, int, char const*, ...)`.

For a debug build it will be:

``````    void V8_Fatal(const char* file, int line, const char* format, ...);
``````

And else

``````    void V8_Fatal(const char* format, ...);
``````

So it looks like I need to set debug to false. With this the V8_Fatal symbol in logging.o is:

``````$ nm -C out/x64.release_gcc/obj/v8_libbase/logging.o | grep V8_Fatal
0000000000000000 T V8_Fatal(char const*, ...)
``````

### V8 Build artifacts

What is actually built when you specify v8_monolithic? When this option is chosen the build cannot be a component build; there is an assert for this. In this case a static library is built:

``````if (v8_monolithic) {
  # A component build is not monolithic.
  assert(!is_component_build)

  # Using external startup data would produce separate files.
  assert(!v8_use_external_startup_data)
  v8_static_library("v8_monolith") {
    deps = [
      ":v8",
      ":v8_libbase",
      ":v8_libplatform",
      ":v8_libsampler",
      "//build/win:default_exe_manifest",
    ]

    configs = [ ":internal_config" ]
  }
}
``````

Notice that `v8_static_library` is not the GN built-in `static_library`; it is a template that can be found in `gni/v8.gni`.

v8_static_library: when set to false, a `source_set` is used instead of creating a static library, and the object files are included directly in the linker command. This can speed up the build as the creation of the static libraries is skipped, but it does not really help when linking to V8 externally, as from this project.

is_component_build: this will compile targets declared as components as shared libraries. All the v8_components in BUILD.gn will be built as .so files in the output directory (not the obj directory, which is the case for static libraries).

So the only two options are v8_monolith or is_component_build, where the latter might have the advantage of being able to rebuild a single component instead of the whole monolith at times.

### wee8

A library named `libwee8` can be produced which only supports WebAssembly and does not support JavaScript:

``````$ ninja -C out/wee8 wee8
``````

### V8 Internal Isolate

`src/execution/isolate.h` is where you can find the v8::internal::Isolate.

``````class V8_EXPORT_PRIVATE Isolate final : private HiddenFactory {
``````

And HiddenFactory is just to allow Isolate to inherit privately from Factory which can be found in src/heap/factory.h.

### Startup Walk through

This section will walk through the startup of V8 using the hello-world example in this project:

``````$ LD_LIBRARY_PATH=../v8_src/v8/out/x64.release_gcc/ lldb ./hello-world
(lldb) br s -n main
Breakpoint 1: where = hello-world`main + 25 at hello-world.cc:41:38, address = 0x0000000000402821
``````
``````    V8::InitializeExternalStartupData(argv[0]);
``````

This call will land in `api.cc`, which will just delegate the call to an internal function (in the internal namespace, that is). If you try to step into this function you will just land on the next line in hello-world. This is because we compiled V8 without external startup data, so this function will be empty:

``````$ objdump -Cd out/x64.release_gcc/obj/v8_base_without_compiler/startup-data-util.o
Disassembly of section .text._ZN2v88internal37InitializeExternalStartupDataFromFileEPKc:

0000000000000000 <v8::internal::InitializeExternalStartupDataFromFile(char const*)>:
0:    c3                       retq
``````

Next, we have:

``````    std::unique_ptr<Platform> platform = platform::NewDefaultPlatform();
``````

This will land in `src/libplatform/default-platform.cc` which will create a new DefaultPlatform.

``````Isolate* isolate = Isolate::New(create_params);
``````

This will call Allocate:

``````Isolate* isolate = Allocate();
``````
``````Isolate* Isolate::Allocate() {
  return reinterpret_cast<Isolate*>(i::Isolate::New());
}
``````

Remember that the internal Isolate can be found in `src/execution/isolate.h`. In `src/execution/isolate.cc` we find `Isolate::New`:

``````Isolate* Isolate::New(IsolateAllocationMode mode) {
  std::unique_ptr<IsolateAllocator> isolate_allocator = std::make_unique<IsolateAllocator>(mode);
  void* isolate_ptr = isolate_allocator->isolate_memory();
  Isolate* isolate = new (isolate_ptr) Isolate(std::move(isolate_allocator));
``````

So we first create an IsolateAllocator instance which will allocate memory for a single Isolate instance. This is then passed into the Isolate constructor. Notice the usage of `new` here: this is placement new, not a normal heap allocation.

The default new operator has been deleted and an override provided that takes a void pointer, which is just returned:

``````  void* operator new(size_t, void* ptr) { return ptr; }
  void* operator new(size_t) = delete;
  void operator delete(void*) = delete;
``````

In this case it just returns the memory allocated by `isolate_memory()`. The reason for doing this is that placement new does not only return the pointer; the compiler will also add a call to the type's constructor, passing in the address of the allocated memory.

``````Isolate::Isolate(std::unique_ptr<i::IsolateAllocator> isolate_allocator)
    : isolate_data_(this),
      isolate_allocator_(std::move(isolate_allocator)),
      allocator_(FLAG_trace_zone_stats
                     ? new VerboseAccountingAllocator(&heap_, 256 * KB)
                     : new AccountingAllocator()),
      builtins_(this),
      rail_mode_(PERFORMANCE_ANIMATION),
      code_event_dispatcher_(new CodeEventDispatcher()),
      jitless_(FLAG_jitless),
#if V8_SFI_HAS_UNIQUE_ID
      next_unique_sfi_id_(0),
#endif
``````

Notice that `isolate_data_` will be populated by calling the constructor which takes a pointer to an Isolate.

``````class IsolateData final {
 public:
  explicit IsolateData(Isolate* isolate) : stack_guard_(isolate) {}
``````

Back in Isolate's constructor we have:

``````#define ISOLATE_INIT_LIST(V)                                                   \
/* Assembler state. */                                                       \
V(FatalErrorCallback, exception_behavior, nullptr)                           \
...

#define ISOLATE_INIT_EXECUTE(type, name, initial_value) \
name##_ = (initial_value);
ISOLATE_INIT_LIST(ISOLATE_INIT_EXECUTE)
#undef ISOLATE_INIT_EXECUTE
``````

So lets expand the first entry to understand what is going on:

``````   exception_behavior_ = (nullptr);
oom_behavior_ = (nullptr);
event_logger_ = (nullptr);
allow_code_gen_callback_ = (nullptr);
modify_code_gen_callback_ = (nullptr);
allow_wasm_code_gen_callback_ = (nullptr);
wasm_module_callback_ = (&NoExtension);
wasm_instance_callback_ = (&NoExtension);
wasm_streaming_callback_ = (nullptr);
relocatable_top_ = (nullptr);
string_stream_debug_object_cache_ = (nullptr);
string_stream_current_security_token_ = (Object());
api_external_references_ = (nullptr);
external_reference_map_ = (nullptr);
root_index_map_ = (nullptr);
turbo_statistics_ = (nullptr);
code_tracer_ = (nullptr);
per_isolate_assert_data_ = (0xFFFFFFFFu);
promise_reject_callback_ = (nullptr);
snapshot_blob_ = (nullptr);
external_script_source_size_ = (0);
is_profiling_ = (false);
num_cpu_profilers_ = (0);
formatting_stack_trace_ = (false);
debug_execution_mode_ = (DebugInfo::kBreakpoints);
code_coverage_mode_ = (debug::CoverageMode::kBestEffort);
type_profile_mode_ = (debug::TypeProfileMode::kNone);
last_stack_frame_info_id_ = (0);
last_console_context_id_ = (0);
inspector_ = (nullptr);
next_v8_call_is_safe_for_termination_ = (false);
only_terminate_in_safe_scope_ = (false);
detailed_source_positions_for_profiling_ = (FLAG_detailed_line_info);
embedder_wrapper_type_index_ = (-1);
embedder_wrapper_object_index_ = (-1);
``````

So all of the entries in this list will become private members of the Isolate class after the preprocessor is finished. There will also be public accessors to get and set these values (the initial value is the last entry in each ISOLATE_INIT_LIST row above).

Back in isolate.cc constructor we have:

``````#define ISOLATE_INIT_ARRAY_EXECUTE(type, name, length) \
  memset(name##_, 0, sizeof(type) * length);
ISOLATE_INIT_ARRAY_LIST(ISOLATE_INIT_ARRAY_EXECUTE)
#undef ISOLATE_INIT_ARRAY_EXECUTE

#define ISOLATE_INIT_ARRAY_LIST(V)                                             \
  /* SerializerDeserializer state. */                                          \
  V(int32_t, jsregexp_static_offsets_vector, kJSRegexpStaticOffsetsVectorSize) \
  ...

InitializeDefaultEmbeddedBlob();
``````

Now that a new Isolate has been created, recall that we were in this function call:

``````  Isolate* isolate = new (isolate_ptr) Isolate(std::move(isolate_allocator));
``````

After this we will be back in `api.cc`:

``````  Initialize(isolate, params);
``````
``````void Isolate::Initialize(Isolate* isolate,
const v8::Isolate::CreateParams& params) {
``````

We are not using any external snapshot data so the following will be false:

``````  if (params.snapshot_blob != nullptr) {
i_isolate->set_snapshot_blob(params.snapshot_blob);
} else {
i_isolate->set_snapshot_blob(i::Snapshot::DefaultSnapshotBlob());
``````
``````(gdb) p snapshot_blob_
\$7 = (const v8::StartupData *) 0x0
(gdb) n
(gdb) p i_isolate->snapshot_blob_
\$8 = (const v8::StartupData *) 0x7ff92d7d6cf0 <v8::internal::blob>
``````

`snapshot_blob_` is also one of the members that was set up with ISOLATE_INIT_LIST. So we are setting up the Isolate instance for creation.

``````Isolate::Scope isolate_scope(isolate);
if (!i::Snapshot::Initialize(i_isolate)) {
``````

In `src/snapshot/snapshot-common.cc` we find

``````bool Snapshot::Initialize(Isolate* isolate) {
...
const v8::StartupData* blob = isolate->snapshot_blob();
Vector<const byte> startup_data = ExtractStartupData(blob);
SnapshotData startup_snapshot_data(MaybeDecompress(startup_data));
StartupDeserializer startup_deserializer(&startup_snapshot_data);
startup_deserializer.SetRehashability(ExtractRehashability(blob));

``````

So we get the blob and create deserializers for it, which are then passed to `isolate->InitWithSnapshot`, which delegates to `Isolate::Init`. The blob will have been created previously using `mksnapshot` (more on this can be found later).

This will use a `FOR_EACH_ISOLATE_ADDRESS_NAME` macro to assign to the `isolate_addresses_` field:

``````isolate_addresses_[IsolateAddressId::kHandlerAddress] = reinterpret_cast<Address>(handler_address());
``````

After this we have a number of members that are assigned to:

``````  compilation_cache_ = new CompilationCache(this);
descriptor_lookup_cache_ = new DescriptorLookupCache();
inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this);
global_handles_ = new GlobalHandles(this);
eternal_handles_ = new EternalHandles();
bootstrapper_ = new Bootstrapper(this);
handle_scope_implementer_ = new HandleScopeImplementer(this);
store_stub_cache_ = new StubCache(this);
materialized_object_store_ = new MaterializedObjectStore(this);
regexp_stack_ = new RegExpStack();
regexp_stack_->isolate_ = this;
date_cache_ = new DateCache();
heap_profiler_ = new HeapProfiler(heap());
interpreter_ = new interpreter::Interpreter(this);
compiler_dispatcher_ =
new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);
``````

After this we have:

``````isolate_data_.external_reference_table()->Init(this);
``````

This will land in `src/codegen/external-reference-table.cc` where we have:

``````void ExternalReferenceTable::Init(Isolate* isolate) {
int index = 0;
is_initialized_ = static_cast<uint32_t>(true);
...
CHECK_EQ(kSize, index);
}
``````

Now, let's take a look at `AddReferences`:

``````Add(ExternalReference::abort_with_reason().address(), index);
``````

What are ExternalReferences?
They represent C++ addresses used in generated code.

``````static const Address c_builtins[] = {
...

``````

I can see that the function declaration is in external-reference.h but the implementation is not there. Instead this is defined in `src/builtins/builtins-api.cc`:

``````BUILTIN(HandleApiCall) {
``````

will expand to something like:

``````V8_WARN_UNUSED_RESULT static Object Builtin_Impl_HandleApiCall(
BuiltinArguments args, Isolate* isolate);

V8_NOINLINE static Address Builtin_Impl_Stats_HandleApiCall(
int args_length, Address* args_object, Isolate* isolate) {
BuiltinArguments args(args_length, args_object);
RuntimeCallTimerScope timer(isolate,
RuntimeCallCounterId::kBuiltin_HandleApiCall);
TRACE_EVENT0(TRACE_DISABLED_BY_DEFAULT("v8.runtime"), "V8.Builtin_HandleApiCall");
return CONVERT_OBJECT(Builtin_Impl_HandleApiCall(args, isolate));
}

V8_WARN_UNUSED_RESULT Address Builtin_HandleApiCall(
int args_length, Address* args_object, Isolate* isolate) {
DCHECK(isolate->context().is_null() || isolate->context().IsContext());
if (V8_UNLIKELY(TracingFlags::is_runtime_stats_enabled())) {
return Builtin_Impl_Stats_HandleApiCall(args_length, args_object, isolate);
}
BuiltinArguments args(args_length, args_object);
return CONVERT_OBJECT(Builtin_Impl_HandleApiCall(args, isolate));
}

V8_WARN_UNUSED_RESULT static Object Builtin_Impl_HandleApiCall(
BuiltinArguments args, Isolate* isolate) {
HandleScope scope(isolate);
Handle<JSFunction> function = args.target();
Handle<HeapObject> new_target = args.new_target();
Handle<FunctionTemplateInfo> fun_data(function->shared().get_api_func_data(),
isolate);
if (new_target->IsJSReceiver()) {
RETURN_RESULT_OR_FAILURE(
isolate, HandleApiCallHelper<true>(isolate, function, new_target, ...));
} else {
RETURN_RESULT_OR_FAILURE(
isolate, HandleApiCallHelper<false>(isolate, function, new_target, ...));
}
}
``````

The `BUILTIN` macro can be found in `src/builtins/builtins-utils.h`:

``````#define BUILTIN(name)                                                       \
V8_WARN_UNUSED_RESULT static Object Builtin_Impl_##name(                  \
BuiltinArguments args, Isolate* isolate);
``````
Back in `Isolate::Init` we have:

``````  if (setup_delegate_ == nullptr) {
setup_delegate_ = new SetupIsolateDelegate(create_heap_objects);
}

if (!setup_delegate_->SetupHeap(&heap_)) {
V8::FatalProcessOutOfMemory(this, "heap object creation");
return false;
}
``````

This does nothing in the current code path and the code comment says that the heap will be deserialized from the snapshot and true will be returned.

``````InitializeThreadLocal();
startup_deserializer->DeserializeInto(this);
``````
``````DisallowHeapAllocation no_gc;
isolate->heap()->IterateSmiRoots(this);
isolate->heap()->IterateStrongRoots(this, VISIT_FOR_SERIALIZATION);
Iterate(isolate, this);
isolate->heap()->IterateWeakRoots(this, VISIT_FOR_SERIALIZATION);
DeserializeDeferredObjects();
RestoreExternalReferenceRedirectors(accessor_infos());
RestoreExternalReferenceRedirectors(call_handler_infos());
``````

In `heap.cc` we find `IterateSmiRoots`, which takes a pointer to a `RootVisitor`. A RootVisitor is used for visiting and optionally modifying the pointers contained in roots. This is used in garbage collection and also in serializing and deserializing snapshots.

### Roots

RootVisitor:

``````class RootVisitor {
public:
virtual void VisitRootPointers(Root root, const char* description,
FullObjectSlot start, FullObjectSlot end) = 0;

virtual void VisitRootPointer(Root root, const char* description,
FullObjectSlot p) {
VisitRootPointers(root, description, p, p + 1);
}

static const char* RootName(Root root);
``````

Root is an enum in `src/objects/visitors.h`. This enum is generated by a macro and expands to:

``````enum class Root {
kStringTable,
kExternalStringsTable,
kStrongRootList,
kSmiRootList,
kBootstrapper,
kTop,
kRelocatable,
kDebug,
kCompilationCache,
kHandleScope,
kBuiltins,
kGlobalHandles,
kEternalHandles,
kStrongRoots,
kExtensions,
kCodeFlusher,
kPartialSnapshotCache,
kWeakCollections,
kWrapperTracing,
kUnknown,
kNumberOfRoots
};
``````

These can be displayed using:

``````\$ ./test/roots_test --gtest_filter=RootsTest.visitor_roots
``````

Just to keep things clear for myself here, these visitor roots are only used for GC and serialization/deserialization (at least I think so) and should not be confused with the RootIndex enum in `src/roots/roots.h`.

Let's set a break point in `mksnapshot` and see if we can find where one of the above Root enum elements is used, to make it a little more clear what these are used for.

``````\$ lldb ../v8_src/v8/out/x64.debug/mksnapshot
(lldb) target create "../v8_src/v8/out/x64.debug/mksnapshot"
Current executable set to '../v8_src/v8/out/x64.debug/mksnapshot' (x86_64).
(lldb) br s -n main
Breakpoint 1: where = mksnapshot`main + 42, address = 0x00000000009303ca
(lldb) r
``````

What this does is create a V8 environment (Platform, Isolate, Context) and then save it to a file, either a binary file on disk or a .cc file that can be compiled into programs, in which case the binary is a byte array. It does this in much the same way as the hello-world example: create a platform and initialize it, then create and initialize a new Isolate. After the Isolate a new Context will be created using the Isolate. If an embedded-src flag was passed to mksnapshot the script it names will be run.

The StartupSerializer, for example, will use these Root enum elements, and the deserializer will use the same elements.

Adding a script to a snapshot:

``````\$ gdb ../v8_src/v8/out/x64.release_gcc/mksnapshot --embedded-src="\$PWD/embed.js"
``````

TODO: Look into CreateOffHeapTrampolines.

So the VisitRootPointers function takes one of these Root's and visits all those roots. In our case the first Root to be visited is Heap::IterateSmiRoots:

``````void Heap::IterateSmiRoots(RootVisitor* v) {
ExecutionAccess access(isolate());
v->VisitRootPointers(Root::kSmiRootList, nullptr,
roots_table().smi_roots_begin(),
roots_table().smi_roots_end());
v->Synchronize(VisitorSynchronization::kSmiRootList);
}
``````

And here we can see that it is using `Root::kSmiRootList`, and passing nullptr for the description argument (I wonder what this is used for?). Next come the start and end arguments.

``````(lldb) p roots_table().smi_roots_begin()
(v8::internal::FullObjectSlot) \$5 = {
v8::internal::SlotBase<v8::internal::FullObjectSlot, unsigned long, 8> = (ptr_ = 50680614097760)
}
``````

We can list all the values of roots_table using:

``````(lldb) expr -A -- roots_table()
``````

In `src/snapshot/deserializer.cc` we can find VisitRootPointers:

``````void Deserializer::VisitRootPointers(Root root, const char* description,
FullObjectSlot start, FullObjectSlot end)
``````

Notice that description is never used. `ReadData` is in the same source file.

The class SnapshotByteSource has a `data` member that is initialized upon construction from a const char* or a Vector. Where is this done?
This was done back in `Snapshot::Initialize`:

``````  const v8::StartupData* blob = isolate->snapshot_blob();
Vector<const byte> startup_data = ExtractStartupData(blob);
SnapshotData startup_snapshot_data(MaybeDecompress(startup_data));
StartupDeserializer startup_deserializer(&startup_snapshot_data);
``````
``````(lldb) expr *this
(v8::internal::SnapshotByteSource) \$30 = (data_ = "`\x04", length_ = 125752, position_ = 1)
``````

All the roots in a heap are declared in `src/roots/roots.h`. You can access the roots using `RootsTable` via the Isolate, using `isolate_data->roots()` or `isolate->roots_table()`. The `roots_` field is an array of `Address` elements:

``````class RootsTable {
public:
static constexpr size_t kEntriesCount = static_cast<size_t>(RootIndex::kRootListLength);
...
private:
static const char* root_names_[kEntriesCount];
``````

RootIndex is generated by a macro

``````enum class RootIndex : uint16_t {
``````

The complete enum can be displayed using:

``````\$ ./test/roots_test --gtest_filter=RootsTest.list_root_index
``````

Let's take a look at an entry:

``````(lldb) p roots_[(uint16_t)RootIndex::kError_string]
``````

Now, there are functions in factory which can be used to retrieve these addresses, like factory->Error_string():

``````(lldb) expr *isolate->factory()->Error_string()
(v8::internal::String) \$9 = {
v8::internal::TorqueGeneratedString<v8::internal::String, v8::internal::Name> = {
v8::internal::Name = {
v8::internal::TorqueGeneratedName<v8::internal::Name, v8::internal::PrimitiveHeapObject> = {
v8::internal::PrimitiveHeapObject = {
v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
v8::internal::HeapObject = {
v8::internal::Object = {
v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 42318447256121)
}
}
}
}
}
}
}
}
(lldb) expr \$9.length()
(int32_t) \$10 = 5
(lldb) expr \$9.Print()
#Error
``````

These accessor functions declarations are generated by the `ROOT_LIST(ROOT_ACCESSOR))` macros:

``````#define ROOT_ACCESSOR(Type, name, CamelName) inline Handle<Type> name();
ROOT_LIST(ROOT_ACCESSOR)
#undef ROOT_ACCESSOR
``````

And the definitions can be found in `src/heap/factory-inl.h`. The implementations then look like this:

``````String ReadOnlyRoots::Error_string() const {
return String::unchecked_cast(Object(at(RootIndex::kError_string)));
}

Handle<String> ReadOnlyRoots::Error_string_handle() const {
return Handle<String>(&at(RootIndex::kError_string));
}
``````

The unit test roots_test shows an example of this.

This shows the usage of root entries, but where are the roots added to this array? `roots_` is a member of `IsolateData` in `src/execution/isolate-data.h`:

``````  RootsTable roots_;
``````

We can inspect the `roots_` content by using the internal Isolate:

``````(lldb) f
frame #0: 0x00007ffff6261cdf libv8.so`v8::Isolate::Initialize(isolate=0x00000eb900000000, params=0x00007fffffffd0d0) at api.cc:8269:31
8266    void Isolate::Initialize(Isolate* isolate,
8267                             const v8::Isolate::CreateParams& params) {

(lldb) expr i_isolate->isolate_data_.roots_
(v8::internal::RootsTable) \$5 = {
roots_ = {
[0] = 0
[1] = 0
[2] = 0
``````

So we can see that the roots are initially zeroed out, and the type of `roots_` is an array of `Address` values.

``````    frame #3: 0x00007ffff6c33d58 libv8.so`v8::internal::Deserializer::VisitRootPointers(this=0x00007fffffffcce0, root=kReadOnlyRootList, description=0x0000000000000000, start=FullObjectSlot @ 0x00007fffffffc530, end=FullObjectSlot @ 0x00007fffffffc528) at deserializer.cc:94:11
frame #4: 0x00007ffff6b6212f libv8.so`v8::internal::ReadOnlyRoots::Iterate(this=0x00007fffffffc5c8, visitor=0x00007fffffffcce0) at roots.cc:21:29
``````

This will land us in `roots.cc` ReadOnlyRoots::Iterate(RootVisitor* visitor):

``````void ReadOnlyRoots::Iterate(RootVisitor* visitor) {
...
}
``````

Deserializer::VisitRootPointers calls `Deserializer::ReadData`, and the `roots_` array is still zeroed out when we enter this function.

``````void Deserializer::VisitRootPointers(Root root, const char* description,
FullObjectSlot start, FullObjectSlot end) {
``````

Notice that we called VisitRootPointers and passed in `Root::kReadOnlyRootList`, nullptr (the description), and the start and end addresses as FullObjectSlots. The signature of `VisitRootPointers` looks like this:

``````virtual void VisitRootPointers(Root root, const char* description,
FullObjectSlot start, FullObjectSlot end)
``````

In our case we are using the address of `read_only_roots_` from `src/roots/roots.h`, and the end is found by using the static member `ReadOnlyRoots::kEntriesCount`.

The switch statement in `ReadData` is generated by macros, so let's take a look at an expanded snippet to understand what is going on:

``````template <typename TSlot>
bool Deserializer::ReadData(TSlot current, TSlot limit,
SnapshotSpace source_space,
Address current_object_address) {
Isolate* const isolate = isolate_;
...
while (current < limit) {
byte data = source_.Get();
``````

So current is the start address of the read_only_list and limit the end. `source_` is a member of `ReadOnlyDeserializer` and is of type SnapshotByteSource.

`source_` got populated back in Snapshot::Initialize(internal_isolate):

``````const v8::StartupData* blob = isolate->snapshot_blob();
``````

And `ReadOnlyDeserializer` extends `Deserializer` (src/snapshot/deserializer.h), which has a constructor that sets the `source_` member to data->Payload(). So `source_` will be an instance of `SnapshotByteSource`, which can be found in `src/snapshot/snapshot-source-sink.h`:

``````class SnapshotByteSource final {
public:
SnapshotByteSource(const char* data, int length)
: data_(reinterpret_cast<const byte*>(data)),
length_(length),
position_(0) {}

byte Get() {
return data_[position_++];
}
...
private:
const byte* data_;
int length_;
int posistion_;
``````

Alright, so we are calling source_.Get(), which we can see returns the current entry from the byte array `data_` and increments the position. So with that in mind let's take a closer look at the switch statement:

``````  while (current < limit) {
byte data = source_.Get();
switch (data) {
case kNewObject + static_cast<int>(SnapshotSpace::kNew):
break;
case kNewObject + static_cast<int>(SnapshotSpace::kOld):
[[clang::fallthrough]];
case kNewObject + static_cast<int>(SnapshotSpace::kCode):
[[clang::fallthrough]];
case kNewObject + static_cast<int>(SnapshotSpace::kMap):
[[clang::fallthrough]];
...
``````

We can see that the switch statement will assign the passed-in `current` the result of a call to `ReadDataCase`.

``````  current = ReadDataCase<TSlot, kNewObject, SnapshotSpace::kNew>(isolate,
``````

Notice that kNewObject is the type of SerializerDeserializer::Bytecode that is to be read (I think); this enum can be found in `src/snapshot/serializer-common.h`. `TSlot` I think stands for the "Type of Slot", which in our case is a FullMaybeObjectSlot.

``````  HeapObject heap_object;
if (bytecode == kNewObject) {
``````

ReadObject is also in deserializer.cc:

``````Address address = allocator()->Allocate(space, size);
...
isolate_->heap()->OnAllocationEvent(obj, size);
``````

Alright, let's set a watch point on the roots_ array to see when the first entry is populated and try to figure this out that way:

``````(lldb) watch set variable isolate->isolate_data_.roots_.roots_[0]
Watchpoint created: Watchpoint 5: addr = 0xf7500000080 size = 8 state = enabled type = w
watchpoint spec = 'isolate->isolate_data_.roots_.roots_[0]'
new value: 0
(lldb) r

Watchpoint 5 hit:
old value: 0
new value: 16995320070433
Process 1687448 stopped
* thread #1, name = 'hello-world', stop reason = watchpoint 5
frame #0: 0x00007ffff664e5b1 libv8.so`v8::internal::FullMaybeObjectSlot::store(this=0x00007fffffffc3b0, value=MaybeObject @ 0x00007fffffffc370) const at slots-inl.h:74:1
71
72      void FullMaybeObjectSlot::store(MaybeObject value) const {
73        *location() = value.ptr();
-> 74      }
75
``````

We can verify that location actually contains the address of `roots_[0]`:

``````(lldb) expr -f hex -- this->ptr_
(lldb) expr -f hex -- &this->isolate_->isolate_data_.roots_.roots_[0]

(lldb) expr -f hex -- value.ptr()
(unsigned long) \$184 = 0x00000f7508040121
(lldb) expr -f hex -- isolate_->isolate_data_.roots_.roots_[0]
``````

The first entry is free_space_map.

``````(lldb) expr v8::internal::Map::unchecked_cast(v8::internal::Object(value->ptr()))
(v8::internal::Map) \$185 = {
v8::internal::HeapObject = {
v8::internal::Object = {
v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 16995320070433)
}
}
``````

Next, we will go through the while loop again:

``````(lldb) expr -f hex -- isolate_->isolate_data_.roots_.roots_[1]
(lldb) expr -f hex -- &isolate_->isolate_data_.roots_.roots_[1]
(lldb) expr -f hex -- location()
(v8::internal::SlotBase<v8::internal::FullMaybeObjectSlot, unsigned long, 8>::TData *) \$194 = 0x00000f7500000088
``````

Notice that in Deserializer::Write we have:

``````  dest.store(value);
return dest + 1;
``````

And it's current value is:

``````(v8::internal::Address) \$197 = 0x00000f7500000088
``````

Which is the same address as roots_[1] that we just wrote to.

If we know the type that an Address points to, we can use Type::cast(Object obj) to cast it into a pointer of that type. I think this works with all types.

``````(lldb) expr -A -f hex  -- v8::internal::Oddball::cast(v8::internal::Object(isolate_->isolate_data_.roots_.roots_[4]))
(v8::internal::Oddball) \$258 = {
v8::internal::TorqueGeneratedOddball<v8::internal::Oddball, v8::internal::PrimitiveHeapObject> = {
v8::internal::PrimitiveHeapObject = {
v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
v8::internal::HeapObject = {
v8::internal::Object = {
v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 0x00000f750804030d)
}
}
}
}
}
}
``````

You can also just cast it to an object and try printing it:

``````(lldb) expr -A -f hex  -- v8::internal::Object(isolate_->isolate_data_.roots_.roots_[4]).Print()
#undefined
``````

This is actually the Oddball UndefinedValue so it makes sense in this case I think. With this value in the roots_ array we can use the function ReadOnlyRoots::undefined_value():

``````(lldb) expr v8::internal::ReadOnlyRoots(&isolate_->heap_).undefined_value()
(v8::internal::Oddball) \$265 = {
v8::internal::TorqueGeneratedOddball<v8::internal::Oddball, v8::internal::PrimitiveHeapObject> = {
v8::internal::PrimitiveHeapObject = {
v8::internal::TorqueGeneratedPrimitiveHeapObject<v8::internal::PrimitiveHeapObject, v8::internal::HeapObject> = {
v8::internal::HeapObject = {
v8::internal::Object = {
v8::internal::TaggedImpl<v8::internal::HeapObjectReferenceType::STRONG, unsigned long> = (ptr_ = 16995320070925)
}
}
}
}
}
}
``````

So how are these roots used, take the above `undefined_value` for example?
Well, most things (perhaps all) that are needed go via the Factory, which the internal Isolate is a type of. In factory we can find:

``````Handle<Oddball> Factory::undefined_value() {
return Handle<Oddball>(&isolate()->roots_table()[RootIndex::kUndefinedValue]);
}
``````

Notice that this is basically what we did in the debugger before but here it is wrapped in Handle so that it can be tracked by the GC.

The unit test isolate_test explores the internal isolate and has examples of usage of the above mentioned methods.

InitWithSnapshot will call Isolate::Init:

``````bool Isolate::Init(ReadOnlyDeserializer* read_only_deserializer,
StartupDeserializer* startup_deserializer) {
...
#define ASSIGN_ELEMENT(CamelName, hacker_name)                  \
...
#undef ASSIGN_ELEMENT
``````
``````  Address isolate_addresses_[kIsolateAddressCount + 1] = {};
``````
``````(gdb) p isolate_addresses_
\$16 = {0 <repeats 13 times>}
``````

Lets take a look at the expanded code in Isolate::Init:

``````\$ clang++ -I./out/x64.release/gen -I. -I./include -E src/execution/isolate.cc > output
``````
``````isolate_addresses_[IsolateAddressId::kHandlerAddress] = reinterpret_cast<Address>(handler_address());
``````

Then functions, like handler_address() are implemented as:

``````inline Address* handler_address() { return &thread_local_top()->handler_; }
``````
``````(gdb) x/x isolate_addresses_[0]
0x1a3500003240:    0x00000000
``````

At this point in the program we have only set the entries to contain the addresses specified in ThreadLocalTop. At the time these are initialized they will mostly be set to `kNullAddress`:

``````static const Address kNullAddress = 0;
``````

And notice that the functions above return pointers, so later these pointers can be updated to point to something. What/when does this happen? Let's continue and find out...

Back in Isolate::Init we have:

``````  compilation_cache_ = new CompilationCache(this);
descriptor_lookup_cache_ = new DescriptorLookupCache();
inner_pointer_to_code_cache_ = new InnerPointerToCodeCache(this);
global_handles_ = new GlobalHandles(this);
eternal_handles_ = new EternalHandles();
bootstrapper_ = new Bootstrapper(this);
handle_scope_implementer_ = new HandleScopeImplementer(this);
store_stub_cache_ = new StubCache(this);
materialized_object_store_ = new MaterializedObjectStore(this);
regexp_stack_ = new RegExpStack();
regexp_stack_->isolate_ = this;
date_cache_ = new DateCache();
heap_profiler_ = new HeapProfiler(heap());
interpreter_ = new interpreter::Interpreter(this);

compiler_dispatcher_ =
new CompilerDispatcher(this, V8::GetCurrentPlatform(), FLAG_stack_size);

// SetUp the object heap.
DCHECK(!heap_.HasBeenSetUp());
heap_.SetUp();

...
``````

Let's take a look at `InitializeThreadLocal`:

``````void Isolate::InitializeThreadLocal() {
clear_pending_exception();
clear_pending_message();
clear_scheduled_exception();
}
``````
``````void Isolate::clear_pending_exception() {
}
``````

``````#define ROOT_ACCESSOR(Type, name, CamelName) \
V8_INLINE class Type name() const;         \
V8_INLINE Handle<Type> name##_handle() const;

ROOT_LIST(ROOT_ACCESSOR)
#undef ROOT_ACCESSOR
``````

This will expand to a number of function declarations that looks like this:

``````\$ clang++ -I./out/x64.release/gen -I. -I./include -E src/roots/roots.h > output
``````
``````inline __attribute__((always_inline)) class Map free_space_map() const;
inline __attribute__((always_inline)) Handle<Map> free_space_map_handle() const;
``````

The Map class is what all HeapObjects use to describe their structure. Notice that there is also a Handle accessor declared. The definitions are generated by a macro in roots-inl.h:

``````Map ReadOnlyRoots::free_space_map() const {
((void) 0);
return Map::unchecked_cast(Object(at(RootIndex::kFreeSpaceMap)));
}

Handle<Map> ReadOnlyRoots::free_space_map_handle() const {
((void) 0);
return Handle<Map>(&at(RootIndex::kFreeSpaceMap));
}
``````

Notice that this is using the RootIndex enum that was mentioned earlier:

``````  return Map::unchecked_cast(Object(at(RootIndex::kFreeSpaceMap)));
``````

In object/map.h there is the following line:

``````  DECL_CAST(Map)
``````

Which can be found in objects/object-macros.h:

``````#define DECL_CAST(Type)                                 \
V8_INLINE static Type cast(Object object);            \
V8_INLINE static Type unchecked_cast(Object object) { \
return bit_cast<Type>(object);                      \
}
``````

This will expand to something like

``````  static Map cast(Object object);
static Map unchecked_cast(Object object) {
return bit_cast<Map>(object);
}
``````

And the `Object` part is the Object constructor that takes an Address:

``````  explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}
``````

That leaves the `at` function, which is a private function in ReadOnlyRoots:

``````  V8_INLINE Address& at(RootIndex root_index) const;
``````

So we are now back in Isolate::Init, and after the call to InitializeThreadLocal we have:

``````setup_delegate_->SetupBuiltins(this);
``````

In the following line in api.cc, where does `i::OBJECT_TEMPLATE_INFO_TYPE` come from:

``````  i::Handle<i::Struct> struct_obj = isolate->factory()->NewStruct(
i::OBJECT_TEMPLATE_INFO_TYPE, i::AllocationType::kOld);
``````

### InstanceType

The enum `InstanceType` is defined in `src/objects/instance-type.h`:

``````#include "torque-generated/instance-types-tq.h"

enum InstanceType : uint16_t {
...
#define MAKE_TORQUE_INSTANCE_TYPE(TYPE, value) TYPE = value,
TORQUE_ASSIGNED_INSTANCE_TYPES(MAKE_TORQUE_INSTANCE_TYPE)
#undef MAKE_TORQUE_INSTANCE_TYPE
...
};
``````

And in `gen/torque-generated/instance-types-tq.h` we can find:

``````#define TORQUE_ASSIGNED_INSTANCE_TYPES(V) \
...
V(OBJECT_TEMPLATE_INFO_TYPE, 79) \
...
``````

There is a list in `src/objects/objects-definitions.h`:

``````#define STRUCT_LIST_GENERATOR_BASE(V, _)                                      \
...
V(_, OBJECT_TEMPLATE_INFO_TYPE, ObjectTemplateInfo, object_template_info)   \
...
``````
And `NewStruct` looks like this:

``````template <typename Impl>
Handle<Struct> FactoryBase<Impl>::NewStruct(InstanceType type,
AllocationType allocation) {
``````

If we look in `Map::GetInstanceTypeMap` in map.cc we find:

``````  Map map;
switch (type) {
#define MAKE_CASE(TYPE, Name, name) \
case TYPE:                        \
map = roots.name##_map();       \
break;
STRUCT_LIST(MAKE_CASE)
#undef MAKE_CASE
``````

Now, we know that our type is:

``````(gdb) p type
\$1 = v8::internal::OBJECT_TEMPLATE_INFO_TYPE
``````
So the case that will be taken is:

``````    map = roots.object_template_info_map();       \
``````

And we can inspect the output of the preprocessor of roots.cc and find:

``````Map ReadOnlyRoots::object_template_info_map() const {
((void) 0);
return Map::unchecked_cast(Object(at(RootIndex::kObjectTemplateInfoMap)));
}
``````

And this is something we have seen before.

One thing I ran into was wanting to print the InstanceType using the overloaded << operator, which is defined for InstanceType in objects.cc.

``````std::ostream& operator<<(std::ostream& os, InstanceType instance_type) {
switch (instance_type) {
#define WRITE_TYPE(TYPE) \
case TYPE:             \
return os << #TYPE;
INSTANCE_TYPE_LIST(WRITE_TYPE)
#undef WRITE_TYPE
}
UNREACHABLE();
}
``````

The code I'm using is the following:

``````  i::InstanceType type = map.instance_type();
std::cout << "object_template_info_map type: " << type << '\n';
``````

This will cause the `UNREACHABLE()` function to be called and a fatal error to be thrown. But note that the following line works:

``````  std::cout << "object_template_info_map type: " << v8::internal::OBJECT_TEMPLATE_INFO_TYPE << '\n';
``````

And prints

``````object_template_info_map type: OBJECT_TEMPLATE_INFO_TYPE
``````

In the switch/case block above the case for this value is:

``````  case OBJECT_TEMPLATE_INFO_TYPE:
return os << "OBJECT_TEMPLATE_INFO_TYPE"
``````

When map.instance_type() is called, it returns a value of `1023`, but the value of OBJECT_TEMPLATE_INFO_TYPE is:

``````OBJECT_TEMPLATE_INFO_TYPE = 79
``````

And we can confirm this using:

``````  std::cout << "object_template_info_map type: " << static_cast<uint16_t>(v8::internal::OBJECT_TEMPLATE_INFO_TYPE) << '\n';
``````

Which will print:

``````object_template_info_map type: 79
``````

### Context creation

When we create a new context using:

``````  Local<ObjectTemplate> global = ObjectTemplate::New(isolate_);
Local<Context> context = Context::New(isolate_, nullptr, global);
``````

The Context class in `include/v8.h` declares New as follows:

``````static Local<Context> New(Isolate* isolate,
ExtensionConfiguration* extensions = nullptr,
MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(),
MaybeLocal<Value> global_object = MaybeLocal<Value>(),
DeserializeInternalFieldsCallback internal_fields_deserializer = DeserializeInternalFieldsCallback(),
``````

When we step into Context::New(isolate_, nullptr, global), this will first break in the constructor of DeserializeInternalFieldsCallback in v8.h, which has default values for the callback function and data_args (both are nullptr). After that gdb will break in MaybeLocal, setting val_ to nullptr. Next it will break in Local::operator* for the value of `global`, which is then passed to the MaybeLocal<v8::ObjectTemplate> constructor. After those break points the break point will be in api.cc and v8::Context::New. New will call NewContext in api.cc.

There will be some checks and logging/tracing and then a call to CreateEnvironment:

``````i::Handle<i::Context> env = CreateEnvironment<i::Context>(
isolate,
extensions,
global_template,
global_object,
context_snapshot_index,
embedder_fields_deserializer,
``````

The first line in CreateEnvironment is:

``````ENTER_V8_FOR_NEW_CONTEXT(isolate);
``````

Which is a macro defined in api.cc

``````i::VMState<v8::OTHER> __state__((isolate)); \
i::DisallowExceptions __no_exceptions__((isolate))
``````

So the first break point we break on will be the execution/vm-state-inl.h and VMState's constructor:

``````template <StateTag Tag>
VMState<Tag>::VMState(Isolate* isolate)
: isolate_(isolate), previous_tag_(isolate->current_vm_state()) {
isolate_->set_current_vm_state(Tag);
}
``````

In gdb you'll see this:

``````(gdb) s
v8::internal::VMState<(v8::StateTag)5>::VMState (isolate=0x372500000000, this=<synthetic pointer>) at ../../src/api/api.cc:6005
(gdb) s
v8::internal::Isolate::current_vm_state (this=0x372500000000) at ../../src/execution/isolate.h:1072
``````

Notice that VMState's constructor sets its `previous_tag_` to isolate->current_vm_state() which is generated by the macro THREAD_LOCAL_TOP_ACCESSOR. The next break point will be:

``````#0  v8::internal::PerIsolateAssertScopeDebugOnly<(v8::internal::PerIsolateAssertType)5, false>::PerIsolateAssertScopeDebugOnly (
isolate=0x372500000000, this=0x7ffc7b51b500) at ../../src/common/assert-scope.h:107
107      explicit PerIsolateAssertScopeDebugOnly(Isolate* isolate)
``````

We can find that `DisallowExceptions` is defined in src/common/assert-scope.h as:

``````using DisallowExceptions =
PerIsolateAssertScopeDebugOnly<NO_EXCEPTION_ASSERT, false>;
``````

After all that we can start to look at the code in CreateEnvironment.

``````    // Create the environment.
    InvokeBootstrapper<ObjectType> invoke;
    result = invoke.Invoke(isolate, maybe_proxy, proxy_template, extensions,
                           context_snapshot_index, embedder_fields_deserializer,
                           ...

template <typename ObjectType>
struct InvokeBootstrapper;

template <>
struct InvokeBootstrapper<i::Context> {
  i::Handle<i::Context> Invoke(
      i::Isolate* isolate, i::MaybeHandle<i::JSGlobalProxy> maybe_global_proxy,
      v8::Local<v8::ObjectTemplate> global_proxy_template,
      v8::ExtensionConfiguration* extensions, size_t context_snapshot_index,
      v8::DeserializeInternalFieldsCallback embedder_fields_deserializer,
      ...
    return isolate->bootstrapper()->CreateEnvironment(
        maybe_global_proxy, global_proxy_template, extensions,
        ...
  }
};
``````

Bootstrapper can be found in `src/init/bootstrapper.cc`:

``````HandleScope scope(isolate_);
Handle<Context> env;
{
  Genesis genesis(isolate_, maybe_global_proxy, global_proxy_template,
                  context_snapshot_index, embedder_fields_deserializer,
                  ...
  env = genesis.result();
  if (env.is_null() || !InstallExtensions(env, extensions)) {
    return Handle<Context>();
  }
}
``````

Notice that the break point will be in the HandleScope constructor. Then a new instance of Genesis is created which performs some actions in its constructor.

``````global_proxy = isolate->factory()->NewUninitializedJSGlobalProxy(instance_size);
``````

This will land in factory.cc:

``````Handle<Map> map = NewMap(JS_GLOBAL_PROXY_TYPE, size);
``````

`size` will be 16 in this case. `NewMap` is declared in factory.h which has default values for its parameters:

``````  Handle<Map> NewMap(InstanceType type, int instance_size,
ElementsKind elements_kind = TERMINAL_FAST_ELEMENTS_KIND,
int inobject_properties = 0);
``````

In Factory::InitializeMap we have the following check:

``````DCHECK_EQ(map.GetInObjectProperties(), inobject_properties);
``````

Remember that I called `Context::New` with the following arguments:

``````  Local<ObjectTemplate> global = ObjectTemplate::New(isolate_);
Local<Context> context = Context::New(isolate_, nullptr, global);
``````

### TaggedImpl

Has a single private member which is declared as:

``````StorageType ptr_;
``````

An instance can be created using:

``````  i::TaggedImpl<i::HeapObjectReferenceType::STRONG, i::Address>  tagged{};
``````

Storage type can also be `Tagged_t` which is defined in globals.h:

`````` using Tagged_t = uint32_t;
``````

It looks like it can be a different value when using pointer compression.

### Object (internal)

This class extends TaggedImpl:

``````class Object : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
``````

An Object can be created using the default constructor, or by passing in an Address which will delegate to TaggedImpl constructors. Object itself does not have any members (apart from ptr_ which is inherited from TaggedImpl that is). So if we create an Object on the stack this is like a pointer/reference to an object:

``````+------+
|Object|
|------|
|ptr_  |---->
+------+
``````

Now, `ptr_` is a TaggedImpl, so it could be a Smi, in which case it just contains the value directly, for example a small integer:

``````+------+
|Object|
|------|
|  18  |
+------+
``````

### Handle

A Handle is similar to an Object and ObjectSlot in that it also contains an Address member (called `location_` and declared in HandleBase), but the difference is that objects referenced through Handles can be relocated by the garbage collector.

### NewContext

When we create a new context using:

``````const v8::Local<v8::ObjectTemplate> obt = v8::Local<v8::ObjectTemplate>();
v8::Handle<v8::Context> context = v8::Context::New(isolate_, nullptr, obt);
``````

The above is using the static function New declared in `include/v8.h`

``````static Local<Context> New(
Isolate* isolate,
ExtensionConfiguration* extensions = nullptr,
MaybeLocal<ObjectTemplate> global_template = MaybeLocal<ObjectTemplate>(),
MaybeLocal<Value> global_object = MaybeLocal<Value>(),
DeserializeInternalFieldsCallback internal_fields_deserializer = DeserializeInternalFieldsCallback(),
``````

The implementation of this function can be found in `src/api/api.cc`. How does a Local become a MaybeLocal in the above case?
This works because MaybeLocal has a constructor that takes a `Local<S>`, which is cast into the `val_` member of the MaybeLocal instance.

TODO

### What is the difference between a Local and a Handle?

Currently, the torque generator will generate Print functions that look like the following:

``````template <>
void TorqueGeneratedEnumCache<EnumCache, Struct>::EnumCachePrint(std::ostream& os) {
os << "\n - keys: " << Brief(this->keys());
os << "\n - indices: " << Brief(this->indices());
os << "\n";
}
``````

Notice the last line where the newline character is printed as a string. This could just be a char, `'\n'`, instead.

There are a number of things that need to happen only once upon startup for each process. These things are placed in `V8::InitializeOncePerProcessImpl` which can be found in `src/init/v8.cc`. This is called by v8::V8::Initialize().

``````  CpuFeatures::Probe(false);
ElementsAccessor::InitializeOncePerProcess();
Bootstrapper::InitializeOncePerProcess();
CallDescriptors::InitializeOncePerProcess();
wasm::WasmEngine::InitializeOncePerProcess();
``````

ElementsAccessor populates the accessor_array with Elements listed in `ELEMENTS_LIST`. TODO: take a closer look at Elements.

v8::Isolate::Initialize will set up the heap.

``````i_isolate->heap()->ConfigureHeap(params.constraints);
``````

It is when we create a new Context that Genesis is created. This will call Snapshot::NewContextFromSnapshot. So the context is read from the StartupData* blob with ExtractContextData(blob).

What is the global proxy?

### Builtins runtime error

Builtins is a member of Isolate and an instance is created by the Isolate constructor. We can inspect the value of `initialized_` and see that it is false:

``````(gdb) p *this->builtins()
\$3 = {static kNoBuiltinId = -1, static kFirstWideBytecodeHandler = 1248, static kFirstExtraWideBytecodeHandler = 1398,
static kLastBytecodeHandlerPlusOne = 1548, static kAllBuiltinsAreIsolateIndependent = true, isolate_ = 0x0, initialized_ = false,
js_entry_handler_offset_ = 0}
``````

The above is printed from Isolate's constructor and it is not changed in the constructor.

This is very strange. While I thought that `initialized_` was being updated, it now looks like there might be two instances: one which has this value as false and the other as true. Also, one has a nullptr as the isolate and the other an actual value. For example, when I run the hello-world example:

``````\$4 = (v8::internal::Builtins *) 0x33b20000a248
(gdb) p &builtins_
\$5 = (v8::internal::Builtins *) 0x33b20000a248
``````

Notice that these are pointing to the same location in memory.

``````(gdb) p &builtins_
\$1 = (v8::internal::Builtins *) 0x25210000a248
(gdb) p builtins()
\$2 = (v8::internal::Builtins *) 0x25210000a228
``````

Alright, so after looking into this closer I noticed that I was including internal headers in the test itself. When I include `src/builtins/builtins.h` I get an implementation of isolate->builtins() in the object file which is in the shared library libv8.so, but the field is part of an object file that is part of the cctest. This will be a different method and not the method that is in the libv8_v8.so shared library.

As I'm only interested in exploring v8 internals, and my goal is only for each unit test to verify my understanding, I've statically linked the object files needed, like builtins.o and code.o, into the test.

`````` Fatal error in ../../src/snapshot/read-only-deserializer.cc, line 35
# Debug check failed: !isolate->builtins()->is_initialized().
#
#
#
#FailureMessage Object: 0x7ffed92ceb20
==== C stack trace ===============================

/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/libv8_libbase.so(V8_Fatal(char const*, int, char const*, ...)+0x172) [0x7fabe6c2416d]
/home/danielbevenius/work/google/v8_src/v8/out/x64.release_gcc/libv8_libbase.so(V8_Dcheck(char const*, int, char const*)+0x2d) [0x7fabe6c241b1]
./test/builtins_test() [0x4135a2]
./test/builtins_test() [0x43a1b7]
./test/builtins_test() [0x434c99]
./test/builtins_test() [0x41a3a7]
./test/builtins_test() [0x41aafb]
./test/builtins_test() [0x41b085]
./test/builtins_test() [0x4238e0]
./test/builtins_test() [0x43b1aa]
./test/builtins_test() [0x435773]
./test/builtins_test() [0x422836]
./test/builtins_test() [0x412ea4]
./test/builtins_test() [0x412e3d]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fabe66b31a3]
./test/builtins_test() [0x412d5e]
Illegal instruction (core dumped)
``````

The issue here is that I'm including the header in the test, which means that code will be in the object code of the test, while the implementation part will be in the linked dynamic library, which is why these are pointing to different areas in memory. The one retrieved by the function call will use the

### Goma

I've seen goma referenced in a number of places, so I'm just making a note of what it is here: Goma is Google's internal distributed compile service.

### WebAssembly

This section is going to take a closer look at how wasm works in V8.

We can use a wasm module like this:

``````  const buffer = fixtures.readSync('add.wasm');
const module = new WebAssembly.Module(buffer);
const instance = new WebAssembly.Instance(module);
``````

Where is the WebAssembly object set up? We have seen previously that objects and functions are added in `src/init/bootstrapper.cc`, and for Wasm there is a function named Genesis::InstallSpecialObjects which calls:

``````  WasmJs::Install(isolate, true);
``````

This call will land in `src/wasm/wasm-js.cc` where we can find:

``````void WasmJs::Install(Isolate* isolate, bool exposed_on_global_object) {
  ...
  Handle<String> name = v8_str(isolate, "WebAssembly");
  ...
  NewFunctionArgs args = NewFunctionArgs::ForFunctionWithoutCode(
      name, isolate->strict_function_map(), LanguageMode::kStrict);
  Handle<JSFunction> cons = factory->NewFunction(args);
  JSFunction::SetPrototype(cons, isolate->initial_object_prototype());
  Handle<JSObject> webassembly =
      factory->NewJSObject(cons, AllocationType::kOld);
  ...

  InstallFunc(isolate, webassembly, "compile", WebAssemblyCompile, 1);
  InstallFunc(isolate, webassembly, "validate", WebAssemblyValidate, 1);
  InstallFunc(isolate, webassembly, "instantiate", WebAssemblyInstantiate, 1);
  ...
  Handle<JSFunction> module_constructor =
      InstallConstructorFunc(isolate, webassembly, "Module", WebAssemblyModule);
  ...
}
``````

And all the rest of the functions that are available on the `WebAssembly` object are setup in the same function.

``````(lldb) br s -name Genesis::InstallSpecialObjects
``````

Now, let's also set a break point in WebAssemblyModule:

``````(lldb) br s -n WebAssemblyModule
(lldb) r
``````
``````  v8::Isolate* isolate = args.GetIsolate();
i::Isolate* i_isolate = reinterpret_cast<i::Isolate*>(isolate);
if (i_isolate->wasm_module_callback()(args)) return;
``````

Notice the `wasm_module_callback()` function which is a function that is setup on the internal Isolate in `src/execution/isolate.h`:

``````#define ISOLATE_INIT_LIST(V)                                                   \
...
V(ExtensionCallback, wasm_module_callback, &NoExtension)                     \
V(ExtensionCallback, wasm_instance_callback, &NoExtension)                   \
V(WasmStreamingCallback, wasm_streaming_callback, nullptr)                   \

#define GLOBAL_ACCESSOR(type, name, initialvalue)                \
inline type name() const {                                     \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
return name##_;                                              \
}                                                              \
inline void set_##name(type value) {                           \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
name##_ = value;                                             \
}
ISOLATE_INIT_LIST(GLOBAL_ACCESSOR)
#undef GLOBAL_ACCESSOR
``````

So this would be expanded by the preprocessor into:

``````inline ExtensionCallback wasm_module_callback() const {
((void) 0);
return wasm_module_callback_;
}
inline void set_wasm_module_callback(ExtensionCallback value) {
((void) 0);
wasm_module_callback_ = value;
}
``````

Also notice that if `wasm_module_callback()` returns true, the `WebAssemblyModule` function will return and no further processing of the instructions in that function will be done. `NoExtension` is a function that looks like this:

``````bool NoExtension(const v8::FunctionCallbackInfo<v8::Value>&) { return false; }
``````

And is set as the default function for module/instance callbacks.

Looking a little further we can see checks for WASM Threads support (TODO: take a look at this). And then we have:

``````  module_obj = i_isolate->wasm_engine()->SyncCompile(
i_isolate, enabled_features, &thrower, bytes);
``````

`SyncCompile` can be found in `src/wasm/wasm-engine.cc` and will call `DecodeWasmModule` which can be found in `src/wasm/module-decoder.cc`.

``````ModuleResult result = DecodeWasmModule(enabled, bytes.start(), bytes.end(),
false, kWasmOrigin,
isolate->counters(), allocator());
``````
``````ModuleResult DecodeWasmModule(const WasmFeatures& enabled,
const byte* module_start, const byte* module_end,
bool verify_functions, ModuleOrigin origin,
Counters* counters,
AccountingAllocator* allocator) {
...
ModuleDecoderImpl decoder(enabled, module_start, module_end, origin);
return decoder.DecodeModule(counters, allocator, verify_functions);
``````

``````  uint32_t magic_word = consume_u32("wasm magic");
``````

This will land in `consume_little_endian(name)` in `src/wasm/decoder.h`.

A wasm module has the following preamble:

``````magic nr: 0x6d736100
version: 0x1
``````

These can be found as a constant in `src/wasm/wasm-constants.h`:

``````constexpr uint32_t kWasmMagic = 0x6d736100;
constexpr uint32_t kWasmVersion = 0x01;
``````

After the DecodeModuleHeader the code will iterate over the sections (type, import, function, table, memory, global, export, start, element, code, data, custom). For each section `DecodeSection` will be called:

``````DecodeSection(section_iter.section_code(), section_iter.payload(),
offset, verify_functions);
``````

There is an enum named `SectionCode` in `src/wasm/wasm-constants.h` which contains the various sections and is used in a switch statement in DecodeSection. Depending on the `section_code`, different DecodeSection methods will be called. In our case section_code is:

``````(lldb) expr section_code
(v8::internal::wasm::SectionCode) \$5 = kTypeSectionCode
``````

And this will match the `kTypeSectionCode` and `DecodeTypeSection` will be called.

ValueType can be found in `src/wasm/value-type.h` and there are types for each of the currently supported types:

``````constexpr ValueType kWasmI32 = ValueType(ValueType::kI32);
constexpr ValueType kWasmI64 = ValueType(ValueType::kI64);
constexpr ValueType kWasmF32 = ValueType(ValueType::kF32);
constexpr ValueType kWasmF64 = ValueType(ValueType::kF64);
constexpr ValueType kWasmAnyRef = ValueType(ValueType::kAnyRef);
constexpr ValueType kWasmExnRef = ValueType(ValueType::kExnRef);
constexpr ValueType kWasmFuncRef = ValueType(ValueType::kFuncRef);
constexpr ValueType kWasmNullRef = ValueType(ValueType::kNullRef);
constexpr ValueType kWasmS128 = ValueType(ValueType::kS128);
constexpr ValueType kWasmStmt = ValueType(ValueType::kStmt);
constexpr ValueType kWasmBottom = ValueType(ValueType::kBottom);
``````

`FunctionSig` is declared with a `using` statement in value-type.h:

``````using FunctionSig = Signature<ValueType>;
``````

We can find `Signature` in src/codegen/signature.h:

``````template <typename T>
class Signature : public ZoneObject {
public:
constexpr Signature(size_t return_count, size_t parameter_count,
const T* reps)
: return_count_(return_count),
parameter_count_(parameter_count),
reps_(reps) {}
``````

The return count can be zero or one (or greater if multi-value return types are enabled). The parameter count also makes sense, but it was not immediately clear to me what `reps` represents.

``````(lldb) fr v
(v8::internal::Signature<v8::internal::wasm::ValueType> *) this = 0x0000555555583950
(size_t) return_count = 1
(size_t) parameter_count = 2
(const v8::internal::wasm::ValueType *) reps = 0x0000555555583948
``````

Before the call to `Signature`'s constructor we have:

``````    // FunctionSig stores the return types first.
ValueType* buffer = zone->NewArray<ValueType>(param_count + return_count);
uint32_t b = 0;
for (uint32_t i = 0; i < return_count; ++i) buffer[b++] = returns[i];
for (uint32_t i = 0; i < param_count; ++i) buffer[b++] = params[i];

return new (zone) FunctionSig(return_count, param_count, buffer);
``````

So `reps_` contains the return types followed by the parameter types (re + ps?).

After the DecodeWasmModule has returned in SyncCompile we will have a ModuleResult. This will be compiled to NativeModule:

``````ModuleResult result =
DecodeWasmModule(enabled, bytes.start(), bytes.end(), false, kWasmOrigin,
isolate->counters(), allocator());
Handle<FixedArray> export_wrappers;
std::shared_ptr<NativeModule> native_module =
CompileToNativeModule(isolate, enabled, thrower,
std::move(result).value(), bytes, &export_wrappers);
``````

`CompileToNativeModule` can be found in `module-compiler.cc`

TODO: CompileNativeModule...

There is an example in wasm_test.cc.

### ExtensionCallback

Is a typedef defined in `include/v8.h`:

``````typedef bool (*ExtensionCallback)(const FunctionCallbackInfo<Value>&);
``````

### JSEntry

TODO: This section should describe the functions calls below.

`````` * frame #0: 0x00007ffff79a52e4 libv8.so`v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&) [inlined] v8::FunctionCallbackInfo<v8::Value>::GetIsolate(this=0x00007fffffffc9a0) const at v8.h:11204:40
frame #1: 0x00007ffff79a52e4 libv8.so`v8::(anonymous namespace)::WebAssemblyModule(args=0x00007fffffffc9a0) at wasm-js.cc:638
frame #2: 0x00007ffff6fe9e92 libv8.so`v8::internal::FunctionCallbackArguments::Call(this=0x00007fffffffca40, handler=CallHandlerInfo @ 0x00007fffffffc998) at api-arguments-inl.h:158:3
frame #3: 0x00007ffff6fe7c42 libv8.so`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(isolate=<unavailable>, function=Handle<v8::internal::HeapObject> @ 0x00007fffffffca20, new_target=<unavailable>, fun_data=<unavailable>, receiver=<unavailable>, args=BuiltinArguments @ 0x00007fffffffcae0) at builtins-api.cc:111:36
frame #4: 0x00007ffff6fe67d4 libv8.so`v8::internal::Builtin_Impl_HandleApiCall(args=BuiltinArguments @ 0x00007fffffffcb20, isolate=0x00000f8700000000) at builtins-api.cc:137:5
frame #5: 0x00007ffff6fe6319 libv8.so`v8::internal::Builtin_HandleApiCall(args_length=6, args_object=0x00007fffffffcc10, isolate=0x00000f8700000000) at builtins-api.cc:129:1
frame #6: 0x00007ffff6b2c23f libv8.so`Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit + 63
frame #7: 0x00007ffff68fde25 libv8.so`Builtins_JSBuiltinsConstructStub + 101
frame #8: 0x00007ffff6daf46d libv8.so`Builtins_ConstructHandler + 1485
frame #9: 0x00007ffff690e1d5 libv8.so`Builtins_InterpreterEntryTrampoline + 213
frame #10: 0x00007ffff6904b5a libv8.so`Builtins_JSEntryTrampoline + 90
frame #11: 0x00007ffff6904938 libv8.so`Builtins_JSEntry + 120
frame #12: 0x00007ffff716ba0c libv8.so`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long**>::Call(this=<unavailable>, args=17072495001600, args=<unavailable>, args=17072631376141, args=17072630006049, args=<unavailable>, args=<unavailable>) at simulator.h:142:12
frame #13: 0x00007ffff716ba01 libv8.so`v8::internal::(anonymous namespace)::Invoke(isolate=<unavailable>, params=0x00007fffffffcf50)::InvokeParams const&) at execution.cc:367
frame #14: 0x00007ffff716aa10 libv8.so`v8::internal::Execution::Call(isolate=0x00000f8700000000, callable=<unavailable>, receiver=<unavailable>, argc=<unavailable>, argv=<unavailable>) at execution.cc:461:10
``````

### CustomArguments

Subclasses of CustomArguments, like PropertyCallbackArguments and FunctionCallbackArguments, are used for setting up and accessing values on the stack, and the subclasses also provide methods to call various things, like `CallNamedSetter` for PropertyCallbackArguments and `Call` for FunctionCallbackArguments.

#### FunctionCallbackArguments

``````class FunctionCallbackArguments
    : public CustomArguments<FunctionCallbackInfo<Value> > {
  FunctionCallbackArguments(internal::Isolate* isolate, internal::Object data,
                            internal::HeapObject callee,
                            internal::Object holder,
                            internal::HeapObject new_target,
                            ...
``````

This class is in the namespace v8::internal so I'm curious why the explicit namespace is used here?

#### BuiltinArguments

This class extends `JavaScriptArguments`

``````class BuiltinArguments : public JavaScriptArguments {
 public:
  BuiltinArguments(int length, Address* arguments)
      : Arguments(length, arguments) {
  ...

  static constexpr int kNewTargetOffset = 0;
  static constexpr int kTargetOffset = 1;
  static constexpr int kArgcOffset = 2;
  static constexpr int kPaddingOffset = 3;

  static constexpr int kNumExtraArgs = 4;
  static constexpr int kNumExtraArgsWithReceiver = 5;
``````

`JavaScriptArguments` is declared in `src/common/globals.h`:

``````using JavaScriptArguments = Arguments<ArgumentsType::kJS>;
``````

`Arguments` can be found in `src/execution/arguments.h` and is templated with a type of `ArgumentsType` (in `src/common/globals.h`):

``````enum class ArgumentsType {
kRuntime,
kJS,
};
``````

An instance of Arguments only has a length, which is the number of arguments, and an Address pointer which points to the first argument. The functions it provides allow getting/setting specific arguments and handling various types (like `Handle<S>`, smi, etc.). It also overloads operator[], allowing an index to be specified to get back an Object for that argument. In `BuiltinArguments` the constants specify the indices, and there are functions to get them:

``````  inline Handle<Object> receiver() const;
inline Handle<JSFunction> target() const;
inline Handle<HeapObject> new_target() const;
``````

### NativeContext

Can be found in `src/objects/contexts.h` and has the following definition:

``````class NativeContext : public Context {
public:

inline OSROptimizedCodeCache GetOSROptimizedCodeCache();
void ResetErrorsThrown();
void IncrementErrorsThrown();
int GetErrorsThrown();
``````

`src/parsing/parser.h` we can find:

``````class V8_EXPORT_PRIVATE Parser : public NON_EXPORTED_BASE(ParserBase<Parser>) {
...
enum CompletionKind {
kNormalCompletion,
kThrowCompletion,
kAbruptCompletion
};
``````

But I can't find any usages of this enum?

#### Internal fields/methods

When you see something like [[Notation]] you can think of this as a field in an object that is not exposed to JavaScript user code but internal to the JavaScript engine. These can also be used for internal methods.

Author: Danbev
Source Code: https://github.com/danbev/learning-v8


## Generis: Versatile Go Code Generator


## Description

Generis is a lightweight code preprocessor adding the following features to the Go language :

• Generics.
• Free-form macros.
• Conditional compilation.
• HTML templating.
• Allman style conversion.

## Sample

``````package main;

// -- IMPORTS

import (
"html"
"io"
"log"
"net/http"
"net/url"
"strconv"
);

// -- DEFINITIONS

#define DebugMode
#as true

// ~~

#define HttpPort
#as 8080

// ~~

#define WriteLine( {{text}} )
#as log.Println( {{text}} )

// ~~

#define local {{variable}} : {{type}};
#as var {{variable}} {{type}};

// ~~

#define DeclareStack( {{type}}, {{name}} )
#as
// -- TYPES

type {{name}}Stack struct
{
ElementArray []{{type}};
}

// -- INQUIRIES

func ( stack * {{name}}Stack ) IsEmpty(
) bool
{
return len( stack.ElementArray ) == 0;
}

// -- OPERATIONS

func ( stack * {{name}}Stack ) Push(
element {{type}}
)
{
stack.ElementArray = append( stack.ElementArray, element );
}

// ~~

func ( stack * {{name}}Stack ) Pop(
) {{type}}
{
local
element : {{type}};

element = stack.ElementArray[ len( stack.ElementArray ) - 1 ];

stack.ElementArray = stack.ElementArray[ : len( stack.ElementArray ) - 1 ];

return element;
}
#end

// ~~

#define DeclareStack( {{type}} )
#as DeclareStack( {{type}}, {{type:PascalCase}} )

// -- TYPES

DeclareStack( string )
DeclareStack( int32 )

// -- FUNCTIONS

func HandleRootPage(
response_writer http.ResponseWriter,
request * http.Request
)
{
local
boolean : bool;
local
natural : uint;
local
integer : int;
local
real : float64;
local
escaped_html_text,
escaped_url_text,
text : string;
local
integer_stack : Int32Stack;

boolean = true;
natural = 10;
integer = 20;
real = 30.0;
text = "text";
escaped_url_text = "&escaped text?";
escaped_html_text = "<escaped text/>";

integer_stack.Push( 10 );
integer_stack.Push( 20 );
integer_stack.Push( 30 );

#write response_writer
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title><%= request.URL.Path %></title>
<body>
<% if ( boolean ) { %>
<%= "URL : " + request.URL.Path %>
<br/>
<%@ natural %>
<%# integer %>
<%& real %>
<br/>
<%~ text %>
<%^ escaped_url_text %>
<%= escaped_html_text %>
<%= "<%% ignored %%>" %>
<%% ignored %%>
<% } %>
<br/>
Stack :
<br/>
<% for !integer_stack.IsEmpty() { %>
<%# integer_stack.Pop() %>
<% } %>
</body>
</html>
#end
}

// ~~

func main()
{
http.HandleFunc( "/", HandleRootPage );

#if DebugMode
WriteLine( "Listening on http://localhost:HttpPort" );
#end

log.Fatal(
http.ListenAndServe( ":HttpPort", nil )
);
}
``````

## Syntax

### #define directive

Constants and generic code can be defined with the following syntax :

``````#define old code
#as new code

#define old code
#as
new
code
#end

#define
old
code
#as new code

#define
old
code
#as
new
code
#end
``````

#### #define parameter

The `#define` directive can contain one or several parameters :

``````{{variable name}} : hierarchical code (with properly matching brackets and parentheses)
{{variable name#}} : statement code (hierarchical code without semicolon)
{{variable name\$}} : plain code
{{variable name:boolean expression}} : conditional hierarchical code
{{variable name#:boolean expression}} : conditional statement code
{{variable name\$:boolean expression}} : conditional plain code
``````

They can have a boolean expression to require they match specific conditions :

``````HasText text
HasPrefix prefix
HasSuffix suffix
HasIdentifier text
false
true
!expression
expression && expression
expression || expression
( expression )
``````

The `#define` directive must not start or end with a parameter.

#### #as parameter

The `#as` directive can use the value of the `#define` parameters :

``````{{variable name}}
{{variable name:filter function}}
{{variable name:filter function:filter function:...}}
``````

Their value can be changed through one or several filter functions :

``````LowerCase
UpperCase
MinorCase
MajorCase
SnakeCase
PascalCase
CamelCase
RemoveBlanks
PackStrings
PackIdentifiers
ReplacePrefix old_prefix new_prefix
ReplaceSuffix old_suffix new_suffix
ReplaceText old_text new_text
ReplaceIdentifier old_identifier new_identifier
RemovePrefix prefix
RemoveSuffix suffix
RemoveText text
RemoveIdentifier identifier
``````

### #if directive

Conditional code can be defined with the following syntax :

``````#if boolean expression
#if boolean expression
...
#else
...
#end
#else
#if boolean expression
...
#else
...
#end
#end
``````

The boolean expression can use the following operators :

``````false
true
!expression
expression && expression
expression || expression
( expression )
``````

### #write directive

Templated HTML code can be sent to a stream writer using the following syntax :

``````#write writer expression
<% code %>
<%@ natural expression %>
<%# integer expression %>
<%& real expression %>
<%~ text expression %>
<%= escaped text expression %>
<%! removed content %>
<%% ignored tags %%>
#end
``````

## Limitations

• There is no operator precedence in boolean expressions.
• The `--join` option requires statements to end with a semicolon.
• The `#write` directive is only available for the Go language.

## Installation

Install the DMD 2 compiler (using the MinGW setup option on Windows).

Build the executable with the following command line :

``````dmd -m64 generis.d
``````

## Command line

``````generis [options]
``````

### Options

``````--prefix # : set the command prefix
--parse INPUT_FOLDER/ : parse the definitions of the Generis files in the input folder
--process INPUT_FOLDER/ OUTPUT_FOLDER/ : reads the Generis files in the input folder and writes the processed files in the output folder
--trim : trim the HTML templates
--join : join the split statements
--create : create the output folders if needed
--watch : watch the Generis files for modifications
--pause 500 : time to wait before checking the Generis files again
--tabulation 4 : set the tabulation space count
--extension .go : generate files with this extension
``````

### Examples

``````generis --process GS/ GO/
``````

Reads the Generis files in the `GS/` folder and writes Go files in the `GO/` folder.

``````generis --process GS/ GO/ --create
``````

Reads the Generis files in the `GS/` folder and writes Go files in the `GO/` folder, creating the output folders if needed.

``````generis --process GS/ GO/ --create --watch
``````

Reads the Generis files in the `GS/` folder and writes Go files in the `GO/` folder, creating the output folders if needed and watching the Generis files for modifications.

``````generis --process GS/ GO/ --trim --join --create --watch
``````

Reads the Generis files in the `GS/` folder and writes Go files in the `GO/` folder, trimming the HTML templates, joining the split statements, creating the output folders if needed and watching the Generis files for modifications.

## Version

2.0

Author: Senselogic
Source Code: https://github.com/senselogic/GENERIS


## Klib | A Standalone and Lightweight C Library

Klib is a standalone and lightweight C library distributed under MIT/X11 license. Most components are independent of external libraries, except the standard C library, and independent of each other. To use a component of this library, you only need to copy a couple of files to your source code tree without worrying about library dependencies.

Klib strives for efficiency and a small memory footprint. Some components, such as khash.h, kbtree.h, ksort.h and kvec.h, are among the most efficient implementations of similar algorithms or data structures in all programming languages, in terms of both speed and memory use.

New documentation is available which includes most of the information in this README file.

## Methodology

For the implementation of generic containers, klib extensively uses C macros. To use these data structures, we usually need to instantiate methods by expanding a long macro. This makes the source code look unusual or even ugly and adds difficulty to debugging. Unfortunately, for efficient generic programming in C, which lacks templates, using macros is the only solution. Only with macros can we write a generic container which, once instantiated, competes with a type-specific container in efficiency. Some generic libraries in C, such as Glib, use the `void*` type to implement containers. These implementations are usually slower and use more memory than klib (see this benchmark).

To effectively use klib, it is important to understand how it achieves generic programming. We will use the hash table library as an example:

``````#include "khash.h"
KHASH_MAP_INIT_INT(m32, char)        // instantiate structs and methods
int main() {
    int ret, is_missing;
    khint_t k;
    khash_t(m32) *h = kh_init(m32);  // allocate a hash table
    k = kh_put(m32, h, 5, &ret);     // insert a key to the hash table
    if (!ret) kh_del(m32, h, k);
    kh_value(h, k) = 10;             // set the value
    k = kh_get(m32, h, 10);          // query the hash table
    is_missing = (k == kh_end(h));   // test if the key is present
    k = kh_get(m32, h, 5);
    kh_del(m32, h, k);               // remove a key-value pair
    for (k = kh_begin(h); k != kh_end(h); ++k)  // traverse
        if (kh_exist(h, k))          // test if a bucket contains data
            kh_value(h, k) = 1;
    kh_destroy(m32, h);              // deallocate the hash table
    return 0;
}
``````

In this example, the second line instantiates a hash table with `unsigned` as the key type and `char` as the value type. `m32` names such a type of hash table. All types and functions associated with this name are macros, which will be explained later. Macro `kh_init()` initiates a hash table and `kh_destroy()` frees it. `kh_put()` inserts a key and returns the iterator (or the position) in the hash table. `kh_get()` and `kh_del()` get a key and delete an element, respectively. Macro `kh_exist()` tests if an iterator (or a position) is filled with data.

An immediate question is why this piece of code does not look like a valid C program (e.g. the lack of semicolons, assignment to an apparent function call, and the apparently undefined `m32` 'variable'). To understand why the code is correct, let's go a bit further into the source code of `khash.h`, whose skeleton looks like:

``````#define KHASH_INIT(name, SCOPE, key_t, val_t, is_map, _hashf, _hasheq) \
    typedef struct { \
        int n_buckets, size, n_occupied, upper_bound; \
        unsigned *flags; \
        key_t *keys; \
        val_t *vals; \
    } kh_##name##_t; \
    SCOPE inline kh_##name##_t *init_##name() { \
        return (kh_##name##_t*)calloc(1, sizeof(kh_##name##_t)); \
    } \
    SCOPE inline int get_##name(kh_##name##_t *h, key_t k) \
    ... \
    SCOPE inline void destroy_##name(kh_##name##_t *h) { \
        if (h) { \
            free(h->keys); free(h->flags); free(h->vals); free(h); \
        } \
    }

#define _int_hf(key) (unsigned)(key)
#define _int_heq(a, b) (a == b)
#define khash_t(name) kh_##name##_t
#define kh_value(h, k) ((h)->vals[k])
#define kh_begin(h) 0
#define kh_end(h) ((h)->n_buckets)
#define kh_init(name) init_##name()
#define kh_get(name, h, k) get_##name(h, k)
#define kh_destroy(name, h) destroy_##name(h)
...
#define KHASH_MAP_INIT_INT(name, val_t) \
    KHASH_INIT(name, static, unsigned, val_t, is_map, _int_hf, _int_heq)
``````

`KHASH_INIT()` is a huge macro defining all the structs and methods. When this macro is called, all the code inside it will be inserted by the C preprocessor at the place where it is called. If the macro is called multiple times, multiple copies of the code will be inserted. To avoid naming conflicts between hash tables with different key-value types, the library uses token concatenation, a preprocessor feature whereby we can substitute part of a symbol based on a macro parameter. In the end, the C preprocessor will generate the following code and feed it to the compiler (macro `kh_exist(h,k)` is a little complex and not expanded for simplicity):

``````typedef struct {
    int n_buckets, size, n_occupied, upper_bound;
    unsigned *flags;
    unsigned *keys;
    char *vals;
} kh_m32_t;
static inline kh_m32_t *init_m32() {
    return (kh_m32_t*)calloc(1, sizeof(kh_m32_t));
}
static inline int get_m32(kh_m32_t *h, unsigned k)
...
static inline void destroy_m32(kh_m32_t *h) {
    if (h) {
        free(h->keys); free(h->flags); free(h->vals); free(h);
    }
}

int main() {
    int ret, is_missing;
    khint_t k;
    kh_m32_t *h = init_m32();
    k = put_m32(h, 5, &ret);
    if (!ret) del_m32(h, k);
    h->vals[k] = 10;
    k = get_m32(h, 10);
    is_missing = (k == h->n_buckets);
    k = get_m32(h, 5);
    del_m32(h, k);
    for (k = 0; k != h->n_buckets; ++k)
        if (kh_exist(h, k)) h->vals[k] = 1;
    destroy_m32(h);
    return 0;
}
``````

This is the C program we know.
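
The token-pasting mechanism is easy to try in isolation. Below is a minimal, self-contained sketch in the same spirit; the `VEC_INIT` macro and `vec_i32_*` names are ours for illustration only (klib's own dynamic array lives in `kvec.h`):

```c
#include <stdlib.h>

/* One macro, many typed instances: VEC_INIT(name, type) expands to a
   struct and push/destroy functions whose names embed `name` via the
   ## token-concatenation operator. */
#define VEC_INIT(name, type)                                        \
    typedef struct { size_t n, m; type *a; } vec_##name##_t;        \
    static int vec_##name##_push(vec_##name##_t *v, type x) {       \
        if (v->n == v->m) {        /* grow geometrically */         \
            size_t m = v->m ? v->m << 1 : 4;                        \
            type *a = (type *)realloc(v->a, m * sizeof(type));      \
            if (!a) return -1;                                      \
            v->m = m; v->a = a;                                     \
        }                                                           \
        v->a[v->n++] = x;                                           \
        return 0;                                                   \
    }                                                               \
    static void vec_##name##_destroy(vec_##name##_t *v) { free(v->a); }

/* Instantiate once per element type; the compiler sees ordinary typed code. */
VEC_INIT(i32, int)
```

Each `VEC_INIT(name, type)` call pastes `name` into every identifier, so `VEC_INIT(i32, int)` and, say, `VEC_INIT(dbl, double)` can coexist in one translation unit without conflicts.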

From this example, we can see that macros and the C preprocessor play a key role in klib. Klib is fast partly because the compiler knows the key-value type at compile time and is able to optimize the code to the same level as type-specific code. A generic library written with `void*` will not get such a performance boost.
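
The performance argument can be made concrete with a toy example. The two functions below compute the same maximum; `generic_max` mimics the `void*` approach (an indirect call per comparison), while `DEFINE_MAX` mimics klib's macro instantiation. All names here are ours for illustration, not GLib's or klib's actual code:

```c
#include <stddef.h>

/* void*-style genericity: one function body, but every comparison goes
   through a function pointer and elements are addressed as raw bytes. */
static const void *generic_max(const void *base, size_t n, size_t size,
                               int (*cmp)(const void *, const void *))
{
    const char *best = (const char *)base;
    for (size_t i = 1; i < n; ++i) {
        const char *cur = (const char *)base + i * size;
        if (cmp(cur, best) > 0)
            best = cur;
    }
    return best;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Macro-style genericity: the expansion is plain typed code, so the
   compiler can inline the comparison and keep values in registers. */
#define DEFINE_MAX(name, type)                        \
    static type max_##name(const type *a, size_t n) { \
        type best = a[0];                             \
        for (size_t i = 1; i < n; ++i)                \
            if (a[i] > best)                          \
                best = a[i];                          \
        return best;                                  \
    }

DEFINE_MAX(i32, int)
```

With optimization enabled, `max_i32` typically compiles to a tight loop with no calls, whereas `generic_max` must keep the indirect `cmp` call unless the compiler manages to devirtualize it.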

Massively inserting code upon instantiation may remind us of C++'s slow compilation and huge binaries when STL/boost is in use. Klib is much better in this respect due to its small code size and component independence. Inserting several hundred lines of code won't make compilation noticeably slower.

## Resources

• Library documentation, if present, is available in the header files. Examples can be found in the test/ directory.
• Obsolete documentation of the hash table library can be found at SourceForge. This README is partly adapted from the old documentation.
• Blog post describing the hash table library.
• Blog post on why using `void*` for generic programming may be inefficient.
• Blog post on the generic stream buffer.
• Blog post evaluating the performance of `kvec.h`.
• Blog post arguing B-tree may be a better data structure than a binary search tree.
• Blog post evaluating the performance of `khash.h` and `kbtree.h` among many other implementations. An older version of the benchmark is also available.
• Blog post benchmarking internal sorting algorithms and implementations.
• Blog post on the k-small algorithm.
• Blog post on the Hooke-Jeeves algorithm for nonlinear programming.

Author: attractivechaos
Official Website: https://github.com/attractivechaos/klib

#cprogramming #c

1661092140

## ud.hpp

``````#pragma once
#include <optional>
#include <string>
#include <vector>
#include <array>
#include <algorithm>
#include <string_view>
#include <fstream>
#include <unordered_map>
#include <cstdint>   // std::uint8_t, std::uint32_t, std::uintptr_t
#include <cstring>   // std::strlen
#include <limits>    // std::numeric_limits
#include <ostream>   // operator<< overload at the bottom

#include <Windows.h>
#include <winternl.h>

#if defined(_MSC_VER)
#define UD_FORCEINLINE __forceinline
#pragma warning( push )
#pragma warning( disable : 4244 4083 )
#else
#define UD_FORCEINLINE __attribute__( ( always_inline ) )
#endif

#define ud_encode_c( str ) ud::rot::decode( ud::rot::rot_t<str>{ } ).data
#define ud_encode( str ) std::string_view( ud::rot::decode( ud::rot::rot_t<str>{ } ) )

#define ud_xorstr_c( str ) ud::xorstr::decrypt( ud::xorstr::xorstr_t< str, __COUNTER__ + 1 ^ 0x90 >{ } ).data
#define ud_xorstr( str ) std::string_view{ ud::xorstr::decrypt( ud::xorstr::xorstr_t< str, __COUNTER__ + 1 ^ 0x90 >{ } ) }

#define ud_stack_str( str ) ud::details::comp_string_t{ str }.data

#define ud_import( mod, func )	reinterpret_cast< decltype( &func ) >( ud::lazy_import::find_module_export< TEXT( mod ), #func >( ) )
#define ud_first_import( func ) reinterpret_cast< decltype( &func ) >( ud::lazy_import::find_first_export< #func >( ) )

// preprocessed settings due to MSVC (not clang or gcc) throwing errors even in `if constexpr` bodies
#define UD_USE_SEH false

namespace ud
{
namespace details
{
struct LDR_DATA_TABLE_ENTRY32
{
// reconstructed: the two list links were missing here; the 64-bit variant
// below has them, and entries are cast from LIST_ENTRY pointers
LIST_ENTRY dummy_0;
LIST_ENTRY dummy_1;

std::uintptr_t dll_base;
std::uintptr_t entry_point;
std::size_t size_of_image;

UNICODE_STRING full_name;
UNICODE_STRING base_name;
};

struct LDR_DATA_TABLE_ENTRY64
{
LIST_ENTRY dummy_0;
LIST_ENTRY dummy_1;

std::uintptr_t dll_base;
std::uintptr_t entry_point;
union {
unsigned long size_of_image;
const char* _dummy;
};

UNICODE_STRING full_name;
UNICODE_STRING base_name;
};

#if defined( _M_X64 )
using LDR_DATA_TABLE_ENTRY = LDR_DATA_TABLE_ENTRY64;
#else
using LDR_DATA_TABLE_ENTRY = LDR_DATA_TABLE_ENTRY32;
#endif

template < std::size_t sz >
struct comp_string_t
{
std::size_t size = sz;
char data[ sz ]{ };

comp_string_t( ) = default;
consteval explicit comp_string_t( const char( &str )[ sz ] )
{
std::copy_n( str, sz, data );
}

constexpr explicit operator std::string_view( ) const
{
return { data, size };
}
};

template < std::size_t sz >
struct wcomp_string_t
{
std::size_t size = sz;
wchar_t data[ sz ]{ };

wcomp_string_t( ) = default;
consteval explicit wcomp_string_t( const wchar_t( &str )[ sz ] )
{
std::copy_n( str, sz, data );
}

constexpr explicit operator std::wstring_view( ) const
{
return { data, size };
}
};

inline constexpr std::uint64_t multiplier = 0x5bd1e995;
inline consteval std::uint64_t get_seed( )
{
constexpr auto time_str = __TIME__;
constexpr auto time_len = sizeof( __TIME__ ) - 1;

constexpr auto time_int = [ ] ( const char* const str, const std::size_t len )
{
auto res = 0ull;
for ( auto i = 0u; i < len; ++i )
if ( str[ i ] >= '0' && str[ i ] <= '9' )
res = res * 10 + str[ i ] - '0';

return res;
}( time_str, time_len );

return time_int;
}

template < auto v >
struct constant_t
{
enum : decltype( v )
{
value = v
};
};

template < auto v >
inline constexpr auto constant_v = constant_t< v >::value;

#undef max
#undef min

template < std::uint32_t seq >
consteval std::uint64_t recursive_random( )
{
constexpr auto seed = get_seed( );
constexpr auto mask = std::numeric_limits< std::uint64_t >::max( );

constexpr auto x = ( ( seq * multiplier ) + seed ) & mask;
constexpr auto x_prime = ( x >> 0x10 ) | ( x << 0x10 );

return constant_v< x_prime >;
}
}

namespace rot
{
template < details::comp_string_t str >
struct rot_t
{
char rotted[ str.size ];

[[nodiscard]] consteval const char* encoded( ) const
{
return rotted;
}

consteval rot_t( )
{
for ( auto i = 0u; i < str.size; ++i )
{
const auto c = str.data[ i ];
const auto set = c >= 'A' && c <= 'Z' ? 'A' : c >= 'a' && c <= 'z' ? 'a' : c;

if ( set == 'a' || set == 'A' )
rotted[ i ] = ( c - set - 13 + 26 ) % 26 + set;

else
rotted[ i ] = c;
}
}
};

template < details::comp_string_t str >
UD_FORCEINLINE details::comp_string_t< str.size > decode( rot_t< str > encoded )
{
details::comp_string_t< str.size > result{ };

for ( auto i = 0u; i < str.size; ++i )
{
const auto c = encoded.rotted[ i ];
const auto set = c >= 'A' && c <= 'Z' ? 'A' : c >= 'a' && c <= 'z' ? 'a' : c;

if ( set == 'a' || set == 'A' )
result.data[ i ] = ( c - set - 13 + 26 ) % 26 + set;

else
result.data[ i ] = c;
}

return result;
}
}

namespace fnv
{
inline constexpr std::uint32_t fnv_1a( const char* const str, const std::size_t size )
{
constexpr auto prime = 16777619u;

std::uint32_t hash = 2166136261;

for ( auto i = 0u; i < size; ++i )
{
hash ^= str[ i ];
hash *= prime;
}

return hash;
}

inline constexpr std::uint32_t fnv_1a( const wchar_t* const str, const std::size_t size )
{
constexpr auto prime = 16777619u;

std::uint32_t hash = 2166136261;

for ( auto i = 0u; i < size; ++i )
{
hash ^= static_cast< char >( str[ i ] );
hash *= prime;
}

return hash;
}

inline constexpr std::uint32_t fnv_1a( const std::wstring_view str )
{
return fnv_1a( str.data( ), str.size( ) );
}

inline constexpr std::uint32_t fnv_1a( const std::string_view str )
{
return fnv_1a( str.data( ), str.size( ) );
}

template < details::comp_string_t str >
consteval std::uint32_t fnv_1a( )
{
return fnv_1a( str.data, str.size );
}

template < details::wcomp_string_t str >
consteval std::uint32_t fnv_1a( )
{
return fnv_1a( str.data, str.size );
}
}

namespace xorstr
{
template < details::comp_string_t str, std::uint32_t key_multiplier >
struct xorstr_t
{
char xored[ str.size ];

[[nodiscard]] consteval std::uint64_t xor_key( ) const
{
return details::recursive_random< key_multiplier >( );
}

consteval xorstr_t( )
{
for ( auto i = 0u; i < str.size; ++i )
xored[ i ] = str.data[ i ] ^ xor_key( );
}
};

template < details::comp_string_t str, std::uint32_t key_multiplier >
UD_FORCEINLINE details::comp_string_t< str.size > decrypt( xorstr_t< str, key_multiplier > enc )
{
details::comp_string_t< str.size > result{ };

for ( auto i = 0u; i < str.size; ++i )
{
const auto c = enc.xored[ i ];

result.data[ i ] = c ^ enc.xor_key( );
}

return result;
}
}

namespace lazy_import
{
UD_FORCEINLINE std::uintptr_t get_module_handle( const std::uint64_t hash )
{
#if defined( _M_X64 )
const auto peb = reinterpret_cast< const PEB* >( __readgsqword( 0x60 ) );
#else
const auto peb = reinterpret_cast< const PEB* >( __readfsdword( 0x30 ) );
#endif

const auto modules = reinterpret_cast< const LIST_ENTRY* >( peb->Ldr->InMemoryOrderModuleList.Flink );

for ( auto i = modules->Flink; i != modules; i = i->Flink )
{
const auto entry = reinterpret_cast< const details::LDR_DATA_TABLE_ENTRY* >( i );

const auto name = entry->base_name.Buffer;
const auto len = entry->base_name.Length;

if ( fnv::fnv_1a( static_cast< const wchar_t* >( name ), len ) == hash )
return entry->dll_base;
}

return 0;
}

UD_FORCEINLINE void* find_primitive_export( const std::uint64_t dll_hash, const std::uint64_t function_hash )
{
const auto module = get_module_handle( dll_hash );

if ( !module )
return nullptr;

const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( module );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( module + dos->e_lfanew );

// reconstructed: the export directory pointer was missing from this listing
const auto exports = reinterpret_cast< const IMAGE_EXPORT_DIRECTORY* >( module + nt->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_EXPORT ].VirtualAddress );

const auto names = reinterpret_cast< const std::uint32_t* >( module + exports->AddressOfNames );
const auto ordinals = reinterpret_cast< const std::uint16_t* >( module + exports->AddressOfNameOrdinals );
const auto functions = reinterpret_cast< const std::uint32_t* >( module + exports->AddressOfFunctions );

for ( auto i = 0u; i < exports->NumberOfNames; ++i )
{
const auto name = reinterpret_cast< const char* >( module + names[ i ] );
std::size_t len = 0;

for ( ; name[ len ]; ++len );

if ( fnv::fnv_1a( name, len ) == function_hash )
return reinterpret_cast< void* >( module + functions[ ordinals[ i ] ] );
}

return nullptr;
}

template < details::wcomp_string_t dll_name, details::comp_string_t function_name >
UD_FORCEINLINE void* find_module_export( )
{
return find_primitive_export( fnv::fnv_1a< dll_name >( ), fnv::fnv_1a< function_name >( ) );
}

template < details::comp_string_t function_name >
UD_FORCEINLINE void* find_first_export( )
{
constexpr auto function_hash = fnv::fnv_1a< function_name >( );

#if defined( _M_X64 )
const auto peb = reinterpret_cast< const PEB* >( __readgsqword( 0x60 ) );
#else
const auto peb = reinterpret_cast< const PEB* >( __readfsdword( 0x30 ) );
#endif

const auto modules = reinterpret_cast< const LIST_ENTRY* >( peb->Ldr->InMemoryOrderModuleList.Flink );

for ( auto i = modules->Flink; i != modules; i = i->Flink )
{
const auto entry = reinterpret_cast< const details::LDR_DATA_TABLE_ENTRY* >( i );

const auto name = entry->base_name.Buffer;
std::size_t len = 0;

if ( !name )
continue;

for ( ; name[ len ]; ++len );

if ( const auto exp = find_primitive_export( fnv::fnv_1a( name, len ), function_hash ) )
return exp;
}

return nullptr;
}
}

template < typename ty = std::uintptr_t >
std::optional< ty > find_pattern_primitive( const std::uintptr_t start, const std::uintptr_t end, const std::string_view pattern )
{
std::vector< std::pair< bool, std::uint8_t > > bytes;

for ( auto it = pattern.begin( ); it != pattern.end( ); ++it )
{
if ( *it == ' ' )
continue;

else if ( *it == '?' )
{
if ( it + 1 < pattern.end( ) && *( it + 1 ) == '?' )
{
bytes.push_back( { true, 0x00 } );
++it;
}

else
bytes.push_back( { false, 0x00 } );
}

else
{
if ( it + 1 == pattern.end( ) )
break;

const auto get_byte = [ ] ( const std::string& x ) -> std::uint8_t
{
return static_cast< std::uint8_t >( std::stoul( x, nullptr, 16 ) );
};

bytes.emplace_back( false, get_byte( std::string( it - 1, ( ++it ) + 1 ) ) );
}
}

for ( auto i = reinterpret_cast< const std::uint8_t* >( start ); i < reinterpret_cast< const std::uint8_t* >( end ); )
{
auto found = true;
for ( const auto& [ is_wildcard, byte ] : bytes )
{
++i;

if ( is_wildcard )
continue;

if ( *i != byte )
{
found = false;
break;
}
}

if ( found )
return ty( i - bytes.size( ) + 1 );
}

return std::nullopt;
}

struct segment_t
{
std::string_view name = "";
std::uintptr_t start{ }, end{ };
std::size_t size{ };

template < typename ty = std::uintptr_t >
std::optional< ty > find_pattern( const std::string_view pattern ) const
{
return find_pattern_primitive< ty >( start, end, pattern );
}

explicit segment_t( const std::string_view segment_name )
{
init( GetModuleHandle( nullptr ), segment_name );
}

segment_t( const void* const module, const std::string_view segment_name )
{
init( module, segment_name );
}

segment_t( const void* const handle, const IMAGE_SECTION_HEADER* section )
{
init( handle, section );
}

private:
void init( const void* const handle, const IMAGE_SECTION_HEADER* section )
{
name = std::string_view( reinterpret_cast< const char* >( section->Name ), 8 );
start = reinterpret_cast< std::uintptr_t >( handle ) + section->VirtualAddress;
end = start + section->Misc.VirtualSize;
size = section->Misc.VirtualSize;
}

void init( const void* const handle, const std::string_view segment_name )
{
const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( handle );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( reinterpret_cast< const std::uint8_t* >( handle ) + dos->e_lfanew );
const auto section = IMAGE_FIRST_SECTION( nt ); // reconstructed: `section` was undefined in this listing

for ( auto i = 0u; i < nt->FileHeader.NumberOfSections; ++i )
{
if ( std::string_view( reinterpret_cast< const char* >( section[ i ].Name ), 8 ).find( segment_name ) != std::string_view::npos )
{
start = reinterpret_cast< std::uintptr_t >( handle ) + section[ i ].VirtualAddress;
end = start + section[ i ].Misc.VirtualSize;
size = section[ i ].Misc.VirtualSize;
name = segment_name;
return;
}
}
}
};

#pragma code_seg( push, ".text" )
template < auto... bytes>
struct shellcode_t
{
static constexpr std::size_t size = sizeof...( bytes );
__declspec( allocate( ".text" ) ) static constexpr std::uint8_t data[ ]{ bytes... };
};
#pragma code_seg( pop )

template < typename ty, auto... bytes >
constexpr ty make_shellcode( )
{
return reinterpret_cast< const ty >( &shellcode_t< bytes... >::data );
}

template < std::uint8_t... bytes >
UD_FORCEINLINE constexpr void emit( )
{
#if defined( __clang__ ) || defined( __GNUC__ )
constexpr std::uint8_t data[ ]{ bytes... };

for ( auto i = 0u; i < sizeof...( bytes ); ++i )
__asm volatile( ".byte %c0\t\n" :: "i" ( data[ i ] ) );
#endif
}

template < std::size_t size, std::uint32_t seed = __COUNTER__ + 0x69, std::size_t count = 0 >
UD_FORCEINLINE constexpr void emit_random( )
{
if constexpr ( count < size )
{
constexpr auto random = details::recursive_random< seed >( );
emit< static_cast< std::uint8_t >( random ) >( );
emit_random< size, static_cast< std::uint32_t >( random )* seed, count + 1 >( );
}
}

inline bool is_valid_page( const void* const data, const std::uint32_t flags = PAGE_READWRITE )
{
MEMORY_BASIC_INFORMATION mbi{ };

if ( !VirtualQuery( data, &mbi, sizeof( mbi ) ) )
return false;

return mbi.Protect & flags;
}

struct export_t
{
std::string_view name;
std::uint16_t ordinal{ };
std::uintptr_t address{ }; // reconstructed: get_address( ) and the aggregate initializers below use a third field
};

struct module_t
{
std::string name;
std::uintptr_t start, end;

segment_t operator[ ]( const std::string_view segment_name ) const
{
return { reinterpret_cast< const void* >( start ), segment_name };
}

std::vector< export_t > get_exports( ) const
{
const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( start );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( start + dos->e_lfanew );

// reconstructed: the directory lookup was missing and a stray `return { };` made the rest unreachable
const auto directory_header = nt->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_EXPORT ];

if ( !directory_header.VirtualAddress )
return { };

const auto export_dir = reinterpret_cast< const IMAGE_EXPORT_DIRECTORY* >( start + directory_header.VirtualAddress );
const auto name_table = reinterpret_cast< const std::uint32_t* >( start + export_dir->AddressOfNames );
const auto ord_table = reinterpret_cast< const std::uint16_t* >( start + export_dir->AddressOfNameOrdinals );
const auto addr_table = reinterpret_cast< const std::uint32_t* >( start + export_dir->AddressOfFunctions );

std::vector< export_t > exports( export_dir->NumberOfNames );

for ( auto i = 0u; i < export_dir->NumberOfNames; ++i )
{
const auto name_str = reinterpret_cast< const char* >( start + name_table[ i ] );
const auto ord = ord_table[ i ];
const auto addr = start + addr_table[ ord ];

exports[ i ] = { name_str, ord, addr };
}

return exports;
}

[[nodiscard]] std::vector< segment_t > get_segments( ) const
{
const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( start );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( start + dos->e_lfanew );
const auto section = IMAGE_FIRST_SECTION( nt ); // reconstructed: `section` was undefined in this listing

std::vector< segment_t > segments;

for ( auto i = 0u; i < nt->FileHeader.NumberOfSections; ++i )
{
const segment_t seg( dos, &section[ i ] );
segments.push_back( seg );
}

return segments;
}

[[nodiscard]] std::vector< export_t > get_imports( ) const
{
const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( start );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( start + dos->e_lfanew );

// reconstructed: the directory lookup was missing and a stray `return { };` made the rest unreachable
const auto directory_header = nt->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_IMPORT ];

if ( !directory_header.VirtualAddress )
return { };

const auto import_dir = reinterpret_cast< const IMAGE_IMPORT_DESCRIPTOR* >( start + directory_header.VirtualAddress );
std::vector< export_t > imports;

for ( auto i = 0u;; ++i )
{
if ( !import_dir[ i ].OriginalFirstThunk )
break;

const auto directory = &import_dir[ i ];

const auto name_table = reinterpret_cast< const std::uint32_t* >( start + directory->OriginalFirstThunk );
const auto addr_table = reinterpret_cast< const std::uint32_t* >( start + directory->FirstThunk );

for ( auto j = 0u;; ++j )
{
if ( !addr_table[ j ] )
break;

if ( !name_table[ j ] )
continue;

std::string_view name_str;

constexpr auto name_alignment = 2;

const auto name_ptr = reinterpret_cast< const char* >( start + name_table[ j ] ) + name_alignment;

#if UD_USE_SEH
// using SEH here is not a very good solution
// however, it's faster than querying that page protection to see if it's readable
__try
{
name_str = name_ptr; // reconstructed: this listing assigned to an undefined `name`
}
__except ( EXCEPTION_EXECUTE_HANDLER )
{
name_str = "";
}
#else
// runtime overhead of ~3us compared to SEH on single calls
// on bulk calls it can go up to ~300-500us
name_str = is_valid_page( name_ptr, PAGE_READONLY ) ? name_ptr : "";
#endif

// emplace_back doesn't allow for implicit conversion, so we have to do it manually
imports.push_back( { name_str, static_cast< std::uint16_t >( j ), static_cast< std::uintptr_t >( addr_table[ j ] ) } ); // reconstructed: `addr` was undefined; the IAT entry holds the bound address
}
}

return imports;
}

template < typename ty = std::uintptr_t >
ty get_address( const std::string_view name ) const
{
for ( const auto& export_ : get_exports( ) )
{
if ( export_.name.find( name ) != std::string_view::npos )
return ty( export_.address ); // reconstructed: the return statement was missing
}

return 0;
}

template < typename ty = std::uintptr_t >
std::optional< ty > find_pattern( const std::string_view pattern ) const
{
return find_pattern_primitive< ty >( start, end, pattern );
}

[[nodiscard]] std::vector< std::string_view > get_strings( const std::size_t minimum_size = 0 ) const
{
std::vector< std::string_view > result;

const auto rdata = ( *this )[ ".rdata" ];

if ( !rdata.size )
return { };

const auto start = reinterpret_cast< const std::uint8_t* >( rdata.start );
const auto end = reinterpret_cast< const std::uint8_t* >( rdata.end );

for ( auto i = start; i < end; ++i )
{
if ( *i == 0 || *i > 127 )
continue;

const auto str = reinterpret_cast< const char* >( i );
const auto sz = std::strlen( str );

if ( !sz || sz < minimum_size )
continue;

result.emplace_back( str, sz );
i += sz;
}

return result;
}

module_t( )
{
init( GetModuleHandle( nullptr ) );
}

explicit module_t( void* const handle )
{
init( handle );
}

explicit module_t( const std::string_view module_name )
{
init( GetModuleHandleA( module_name.data( ) ) );
}

private:
void* module;

void init( void* const handle )
{
module = handle;

const auto dos = reinterpret_cast< const IMAGE_DOS_HEADER* >( handle );
const auto nt = reinterpret_cast< const IMAGE_NT_HEADERS* >( reinterpret_cast< const std::uint8_t* >( handle ) + dos->e_lfanew );

start = reinterpret_cast< std::uintptr_t >( handle );
end = start + nt->OptionalHeader.SizeOfImage; // reconstructed: `end` was never initialized

char buffer[ MAX_PATH ];
const auto sz = GetModuleFileNameA( static_cast< HMODULE >( handle ), buffer, MAX_PATH );

name = sz ? std::string{ buffer, sz } : std::string{ };
}
};

inline std::vector< module_t > get_modules( )
{
std::vector< module_t > result;

#if defined( _M_X64 )
const auto peb = reinterpret_cast< const PEB* >( __readgsqword( 0x60 ) );
#else
const auto peb = reinterpret_cast< const PEB* >( __readfsdword( 0x30 ) );
#endif

const auto modules = reinterpret_cast< const LIST_ENTRY* >( peb->Ldr->InMemoryOrderModuleList.Flink );
for ( auto i = modules->Flink; i != modules; i = i->Flink )
{
const auto entry = reinterpret_cast< const LDR_DATA_TABLE_ENTRY* >( i );

if ( entry->Reserved2[ 0 ] || entry->DllBase )
result.emplace_back( entry->Reserved2[ 0 ] ? entry->Reserved2[ 0 ] : entry->DllBase );
}

return result;
}

// reconstructed: the signature and range check were missing from this listing;
// the helper appears to look up the module containing a given address
inline std::optional< module_t > get_module( const std::uintptr_t address )
{
for ( const auto& module : get_modules( ) )
{
if ( address >= module.start && address < module.end )
return module;
}

return std::nullopt;
}

inline std::optional< export_t > get_export( const std::uintptr_t address )
{
for ( const auto& module : get_modules( ) )
{
if ( address >= module.start && address < module.end ) // reconstructed: the range and address checks were missing
{
const auto exports = module.get_exports( );
for ( const auto& export_ : exports )
{
if ( export_.address == address )
return export_;
}
}
}

return std::nullopt;
}

template < typename rel_t, typename ty = std::uintptr_t >
ty calculate_relative( const std::uintptr_t address, const std::uint8_t size, const std::uint8_t offset )
{
return ty( address + *reinterpret_cast< rel_t* >( address + offset ) + size );
}
}

template < std::size_t size >
UD_FORCEINLINE std::ostream& operator<<( std::ostream& os, const ud::details::comp_string_t< size >& str )
{
return os << std::string_view{ str.data, str.size };
}

#if defined( _MSC_VER )
#pragma warning( pop )
#endif``````
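
For reference, the hash implemented in the `fnv` namespace above is 32-bit FNV-1a; a standalone C sketch of the same function (note that the header's compile-time overloads hash `comp_string_t` buffers whose size includes the trailing `'\0'`, so their digests differ from hashing only the visible characters, as done here):

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a with the same constants as ud::fnv:
   offset basis 2166136261, prime 16777619. */
static uint32_t fnv_1a(const char *str, size_t len)
{
    uint32_t hash = 2166136261u;        /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        hash ^= (uint8_t)str[i];        /* xor in one byte */
        hash *= 16777619u;              /* multiply by the FNV prime */
    }
    return hash;
}
```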

Author: AmJayden
Source code: https://github.com/AmJayden/udlib

#cplusplus

1659817260

## Overview

The AWS IoT Device SDK for Embedded C (C-SDK) is a collection of C source files under the MIT open source license that can be used in embedded applications to securely connect IoT devices to AWS IoT Core. It contains MQTT client, HTTP client, JSON Parser, AWS IoT Device Shadow, AWS IoT Jobs, and AWS IoT Device Defender libraries. This SDK is distributed in source form, and can be built into customer firmware along with application code, other libraries and an operating system (OS) of your choice. These libraries are only dependent on standard C libraries, so they can be ported to various OS's - from embedded Real Time Operating Systems (RTOS) to Linux/Mac/Windows. You can find sample usage of C-SDK libraries on POSIX systems using OpenSSL (e.g. Linux demos in this repository), and on FreeRTOS using mbedTLS (e.g. FreeRTOS demos in FreeRTOS repository).

For the latest release of C-SDK, please see the section for Releases and Documentation.

C-SDK includes libraries that are part of the FreeRTOS 202012.01 LTS release. Learn more about the FreeRTOS 202012.01 LTS libraries by clicking here.

### Features

C-SDK simplifies access to various AWS IoT services. C-SDK has been tested to work with AWS IoT Core and an open source MQTT broker to ensure interoperability. The AWS IoT Device Shadow, AWS IoT Jobs, and AWS IoT Device Defender libraries are flexible enough to work with any MQTT client and JSON parser. The MQTT client and JSON parser libraries are offered as choices without being tightly coupled with the rest of the SDK. C-SDK contains the following libraries:

#### coreMQTT

The coreMQTT library provides the ability to establish an MQTT connection with a broker over a customer-implemented transport layer, which can either be a secure channel like a TLS session (mutually authenticated or server-only authentication) or a non-secure channel like a plaintext TCP connection. This MQTT connection can be used for performing publish operations to MQTT topics and subscribing to MQTT topics. The library provides a mechanism to register customer-defined callbacks for receiving incoming PUBLISH, acknowledgement and keep-alive response events from the broker. The library has been refactored for memory optimization and is compliant with the MQTT 3.1.1 standard. It has no dependencies on any additional libraries other than the standard C library, a customer-implemented network transport interface, and optionally a customer-implemented platform time function. The refactored design embraces different use-cases, ranging from resource-constrained platforms using only QoS 0 MQTT PUBLISH messages to resource-rich platforms using QoS 2 MQTT PUBLISH over TLS connections.

See memory requirements for the latest release here.

#### coreHTTP

The coreHTTP library provides the ability to establish an HTTP connection with a server over a customer-implemented transport layer, which can either be a secure channel like a TLS session (mutually authenticated or server-only authentication) or a non-secure channel like a plaintext TCP connection. The HTTP connection can be used to make "GET" (including range requests), "PUT", "POST" and "HEAD" requests. The library provides a mechanism to register a customer-defined callback for receiving parsed header fields in an HTTP response. The library has been refactored for memory optimization, and is a client implementation of a subset of the HTTP/1.1 standard.
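
To illustrate the range-request support mentioned above, a partial GET asks for a byte range via the standard HTTP/1.1 `Range` header; host and path below are placeholders:

```text
GET /firmware.bin HTTP/1.1
Host: example.com
Range: bytes=0-1023
```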

See memory requirements for the latest release here.

#### coreJSON

The coreJSON library is a JSON parser that strictly enforces the ECMA-404 JSON standard. It provides a function to validate a JSON document, and a function to search for a key and return its value. A search can descend into nested structures using a compound query key. A JSON document validation also checks for illegal UTF-8 encodings and illegal Unicode escape sequences.
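
As a sketch of such a compound query (document and key names below are illustrative), a `.`-separated key descends one nesting level per segment:

```text
document: {"foo": {"bar": "xyz"}}
query:    foo.bar
value:    "xyz"
```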

See memory requirements for the latest release here.

#### corePKCS11

The corePKCS11 library is an implementation of the PKCS #11 interface (API) that makes it easier to develop applications that rely on cryptographic operations. Only a subset of the PKCS #11 v2.4 standard has been implemented, with a focus on operations involving asymmetric keys, random number generation, and hashing.

The Cryptoki or PKCS #11 standard defines a platform-independent API to manage and use cryptographic tokens. The name, "PKCS #11", is used interchangeably to refer to the API itself and the standard which defines it.

The PKCS #11 API is useful for writing software without taking a dependency on any particular implementation or hardware. By writing against the PKCS #11 standard interface, code can be used interchangeably with multiple algorithms, implementations and hardware.

Generally vendors for secure cryptoprocessors such as Trusted Platform Module (TPM), Hardware Security Module (HSM), Secure Element, or any other type of secure hardware enclave, distribute a PKCS #11 implementation with the hardware. The purpose of the corePKCS11 mock is therefore to provide a PKCS #11 implementation that allows for rapid prototyping and development before switching to a cryptoprocessor-specific PKCS #11 implementation in production devices.

Since the PKCS #11 interface is defined as part of the PKCS #11 specification, replacing corePKCS11 with another implementation should require little porting effort, as the interface will not change. The system tests distributed in the corePKCS11 repository can be leveraged to verify that the behavior of a different implementation is similar to corePKCS11.

See memory requirements for the latest release here.

#### AWS IoT Device Shadow

The AWS IoT Device Shadow library enables you to store and retrieve the current state of one or more shadows of every registered device. A device's shadow is a persistent, virtual representation of your device that you can interact with from AWS IoT Core even if the device is offline. The device state captured in its "shadow" is represented as a JSON document. The device can send commands over MQTT to get, update and delete its latest state, as well as receive notifications over MQTT about changes in its state. A device's shadow(s) are uniquely identified by the name of the corresponding "thing", a representation of a specific device or logical entity on the AWS Cloud. See Managing Devices with AWS IoT for more information on IoT "things". This library supports named shadows, a feature of the AWS IoT Device Shadow service that allows you to create multiple shadows for a single IoT device. More details about AWS IoT Device Shadow can be found in the AWS IoT documentation.
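
For orientation, a shadow's JSON document pairs a `desired` state (set by applications) with a `reported` state (set by the device); the `powerState` field below is illustrative, not part of the service's schema:

```json
{
  "state": {
    "desired":  { "powerState": "on" },
    "reported": { "powerState": "off" }
  },
  "version": 12
}
```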

The AWS IoT Device Shadow library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with coreMQTT and coreJSON).

See memory requirements for the latest release here.

#### AWS IoT Jobs

The AWS IoT Jobs library enables you to interact with the AWS IoT Jobs service which notifies one or more connected devices of a pending “Job”. A Job can be used to manage your fleet of devices, update firmware and security certificates on your devices, or perform administrative tasks such as restarting devices and performing diagnostics. For documentation of the service, please see the AWS IoT Developer Guide. Interactions with the Jobs service use the MQTT protocol. This library provides an API to compose and recognize the MQTT topic strings used by the Jobs service.
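The Jobs topic strings follow the pattern documented by the service. A minimal sketch in plain C (`snprintf`, not the library's own composition API; the thing name and job ID below are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Topic used to request the next pending job for a thing. */
static int jobsStartNextTopic( char * buf, size_t size, const char * thingName )
{
    return snprintf( buf, size, "$aws/things/%s/jobs/start-next", thingName );
}

/* Topic used to publish a status update for a specific job. */
static int jobsUpdateTopic( char * buf, size_t size,
                            const char * thingName, const char * jobId )
{
    return snprintf( buf, size, "$aws/things/%s/jobs/%s/update",
                     thingName, jobId );
}
```

The Jobs library additionally recognizes incoming topics of these shapes, which is the harder half of what its API provides.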

The AWS IoT Jobs library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with libmosquitto and coreJSON).

See memory requirements for the latest release here.

#### AWS IoT Device Defender

The AWS IoT Device Defender library enables you to interact with the AWS IoT Device Defender service to continuously monitor security metrics from devices for deviations from what you have defined as appropriate behavior for each device. If something doesn’t look right, AWS IoT Device Defender sends out an alert so you can take action to remediate the issue. More details about Device Defender can be found in AWS IoT Device Defender documentation. This library supports custom metrics, a feature that helps you monitor operational health metrics that are unique to your fleet or use case. For example, you can define a new metric to monitor the memory usage or CPU usage on your devices.

The AWS IoT Device Defender library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with coreMQTT and coreJSON).

See memory requirements for the latest release here.

#### AWS IoT Over-the-air Update

The AWS IoT Over-the-air Update (OTA) library enables you to manage the notification of a newly available update, download the update, and perform cryptographic verification of the firmware update. Using the OTA library, you can logically separate firmware updates from the application running on your devices. You can also use the library to send other files (e.g. images, certificates) to one or more devices registered with AWS IoT. More details about OTA library can be found in AWS IoT Over-the-air Update documentation.

The AWS IoT Over-the-air Update library has a dependency on coreJSON for parsing of JSON job document and tinyCBOR for decoding encoded data streams, other than the standard C library. It can be used with any MQTT library, HTTP library, and operating system (e.g. Linux, FreeRTOS) (see demos with coreMQTT and coreHTTP over Linux).

See memory requirements for the latest release here.

#### AWS IoT Fleet Provisioning

The AWS IoT Fleet Provisioning library enables you to interact with the AWS IoT Fleet Provisioning MQTT APIs in order to provision IoT devices without preexisting device certificates. With AWS IoT Fleet Provisioning, devices can securely receive unique device certificates from AWS IoT when they connect for the first time. For an overview of all provisioning options offered by AWS IoT, see the device provisioning documentation. For details about Fleet Provisioning, refer to the AWS IoT Fleet Provisioning documentation.

See memory requirements for the latest release here.

#### AWS SigV4

The AWS SigV4 library enables you to sign HTTP requests using the Signature Version 4 signing process. Signature Version 4 (SigV4) is the process of adding authentication information to HTTP requests to AWS services. For security, most requests to AWS must be signed with an access key, which consists of an access key ID and a secret access key.
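As a rough illustration of one piece of the signing process, SigV4 embeds a "credential scope" in the signature: the request date, region, and service joined with the literal `aws4_request`. A sketch in plain C (the SigV4 library derives the full canonical request and string to sign internally; the date, region, and service inputs below are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the SigV4 credential scope: "<YYYYMMDD>/<region>/<service>/aws4_request". */
static int sigv4CredentialScope( char * buf, size_t size, const char * date,
                                 const char * region, const char * service )
{
    return snprintf( buf, size, "%s/%s/%s/aws4_request", date, region, service );
}
```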

See memory requirements for the latest release here.

#### backoffAlgorithm

The backoffAlgorithm library is a utility library that calculates the backoff period for retrying network operations (such as a failed network connection with a server) using an exponential backoff with jitter algorithm. This library uses the "Full Jitter" strategy for the exponential backoff with jitter algorithm. More information about the algorithm can be found in the Exponential Backoff and Jitter AWS blog.

Exponential backoff with jitter is typically used when retrying a failed connection or network request to a server. It helps mitigate failed network operations caused by network congestion or high server load by spreading out retry requests across multiple devices attempting network operations. In addition, in an environment with poor connectivity, a client can get disconnected at any time; a backoff strategy helps the client conserve battery by not repeatedly attempting reconnections when they are unlikely to succeed.
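The "Full Jitter" strategy described above picks the next delay uniformly at random between zero and a capped exponential window. A minimal sketch in plain C (the backoffAlgorithm library wraps this logic in its own context/status API; `rand()` stands in for an application-supplied entropy source):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Next retry delay, uniform in [0, min(maxDelayMs, baseMs * 2^attempt)]. */
static uint32_t fullJitterBackoffMs( uint32_t baseMs, uint32_t maxDelayMs,
                                     uint32_t attempt )
{
    /* Cap the shift so the exponential window cannot overflow. */
    uint64_t window = ( uint64_t ) baseMs << ( attempt < 32U ? attempt : 32U );

    if( window > maxDelayMs )
    {
        window = maxDelayMs;
    }

    /* rand() stands in for the application's entropy source. */
    return ( uint32_t ) ( ( uint64_t ) rand() % ( window + 1U ) );
}
```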

The backoffAlgorithm library has no dependencies on libraries other than the standard C library.

See memory requirements for the latest release here.

### Sending metrics to AWS IoT

When establishing a connection with AWS IoT, users can optionally report the Operating System, Hardware Platform and MQTT client version information of their device to AWS. This information can help AWS IoT provide faster issue resolution and technical support. If users want to report this information, they can send a specially formatted string (see below) in the username field of the MQTT CONNECT packet.

Format

The format of the username string with metrics is:

``````<Actual_Username>?SDK=<OS_Name>&Version=<OS_Version>&Platform=<Hardware_Platform>&MQTTLib=<MQTT_Library_name>@<MQTT_Library_version>
``````

Where

• `Actual_Username` is the actual username used for authentication, if username and password are used for authentication. When username/password based authentication is not used, this is an empty value.
• `OS_Name` is the Operating System the application is running on (e.g. Ubuntu)
• `OS_Version` is the version number of the Operating System (e.g. 20.10)
• `Hardware_Platform` is the Hardware Platform the application is running on (e.g. RaspberryPi)
• `MQTT_Library_name` is the MQTT Client library being used (e.g. coreMQTT)
• `MQTT_Library_version` is the version of the MQTT Client library being used (e.g. 1.1.0)

Example

• Actual_Username = “iotuser”, OS_Name = Ubuntu, OS_Version = 20.10, Hardware_Platform_Name = RaspberryPi, MQTT_Library_Name = coremqtt, MQTT_Library_version = 1.1.0. If username is not used, then “iotuser” can be removed.
``````/* Username string:
* iotuser?SDK=Ubuntu&Version=20.10&Platform=RaspberryPi&MQTTLib=coremqtt@1.1.0
*/

#define OS_NAME                   "Ubuntu"
#define OS_VERSION                "20.10"
#define HARDWARE_PLATFORM_NAME    "RaspberryPi"
#define MQTT_LIB                  "coremqtt@1.1.0"

#define USERNAME_STRING           "iotuser?SDK=" OS_NAME "&Version=" OS_VERSION "&Platform=" HARDWARE_PLATFORM_NAME "&MQTTLib=" MQTT_LIB
#define USERNAME_STRING_LENGTH    ( ( uint16_t ) ( sizeof( USERNAME_STRING ) - 1 ) )

MQTTConnectInfo_t connectInfo = { 0 };
/* Pass the formatted string as the username of the MQTT CONNECT packet. */
connectInfo.pUserName = USERNAME_STRING;
connectInfo.userNameLength = USERNAME_STRING_LENGTH;
mqttStatus = MQTT_Connect( pMqttContext, &connectInfo, NULL, CONNACK_RECV_TIMEOUT_MS, pSessionPresent );
``````

## Versioning

C-SDK releases will now follow a date based versioning scheme with the format YYYYMM.NN, where:

• Y represents the year.
• M represents the month.
• N represents the release order within the designated month (00 being the first release).

For example, a second release in June 2021 would be 202106.01. Although the SDK releases have moved to date-based versioning, each library within the SDK will still retain semantic versioning. In semantic versioning, the version number itself (X.Y.Z) indicates whether the release is a major, minor, or point release. You can use the semantic version of a library to assess the scope and impact of a new release on your application.

## Releases and Documentation

All of the released versions of the C-SDK libraries are available as git tags. For example, the last release of the v3 SDK version is available at tag 3.1.2.

### 202108.00

API documentation of 202108.00 release

This release introduces the refactored AWS IoT Fleet Provisioning library and the new AWS SigV4 library.

Additionally, this release brings minor version updates in the AWS IoT Over-the-Air Update and corePKCS11 libraries.

### 202103.00

API documentation of 202103.00 release

This release includes a major update to the APIs of the AWS IoT Over-the-air Update library.

Additionally, the AWS IoT Device Shadow library introduces a minor update by adding support for named shadows, a feature of the AWS IoT Device Shadow service that allows you to create multiple shadows for a single IoT device. The AWS IoT Jobs library introduces a minor update that adds macros for the `\$next` job ID and compile-time generation of topic strings. The AWS IoT Device Defender library introduces a minor update that adds macros to its API for the custom metrics feature of the AWS IoT Device Defender service.

corePKCS11 also introduces a patch update by removing the `pkcs11configPAL_DESTROY_SUPPORTED` config and mbedTLS platform abstraction layer of `DestroyObject`. Lastly, no code changes are introduced for backoffAlgorithm, coreHTTP, coreMQTT, and coreJSON; however, patch updates are made to improve documentation and CI.

### 202012.01

API documentation of 202012.01 release

This release includes the AWS IoT Over-the-air Update (Release Candidate), backoffAlgorithm, and PKCS #11 libraries. Additionally, there is a major update to the coreJSON and coreHTTP APIs. All libraries continue to undergo code quality checks (e.g. MISRA-C compliance) and Coverity static analysis. In addition, all libraries except AWS IoT Over-the-air Update and backoffAlgorithm undergo validation of memory safety with the C Bounded Model Checker (CBMC) automated reasoning tool.

### 202011.00

API documentation of 202011.00 release

This release includes refactored HTTP client, AWS IoT Device Defender, and AWS IoT Jobs libraries. Additionally, there is a major update to the coreJSON API. All libraries continue to undergo code quality checks (e.g. MISRA-C compliance), Coverity static analysis, and validation of memory safety with the C Bounded Model Checker (CBMC) automated reasoning tool.

### 202009.00

API documentation of 202009.00 release

This release includes refactored MQTT, JSON Parser, and AWS IoT Device Shadow libraries for optimized memory usage and modularity. These libraries are included in the SDK via Git submoduling. These libraries have gone through code quality checks including verification that no function has a GNU Complexity score over 8, and checks against deviations from mandatory rules in the MISRA coding standard. Deviations from the MISRA C:2012 guidelines are documented under MISRA Deviations. These libraries have also undergone both static code analysis from Coverity static analysis, and validation of memory safety and data structure invariance through the CBMC automated reasoning tool.

If you are upgrading from v3.x API of the C-SDK to the 202009.00 release, please refer to Migration guide from v3.1.2 to 202009.00 and newer releases. If you are using the C-SDK v4_beta_deprecated branch, note that we will continue to maintain this branch for critical bug fixes and security patches but will not add new features to it. See the C-SDK v4_beta_deprecated branch README for additional details.

### v3.1.2

Details available here.

## Porting Guide for 202009.00 and newer releases

All libraries depend on the ISO C90 standard library and additionally on the `stdint.h` header for fixed-width integer types, including `uint8_t`, `int8_t`, `uint16_t`, `uint32_t` and `int32_t`, and constant macros like `UINT16_MAX`. If your platform does not provide `stdint.h`, definitions of the mentioned fixed-width integer types will be required to port any C-SDK library to your platform.
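For example, on a toolchain without `stdint.h`, minimal definitions might look like the following. The underlying types are an assumption (a typical target where `int` is 32 bits) and must be checked against your compiler's ABI:

```c
#include <assert.h>

/* Minimal stand-in for <stdint.h> on toolchains that lack it. The mappings
 * below assume a typical target where int is 32 bits; verify against your ABI. */
typedef signed char     int8_t;
typedef unsigned char   uint8_t;
typedef short           int16_t;
typedef unsigned short  uint16_t;
typedef int             int32_t;
typedef unsigned int    uint32_t;

#define UINT16_MAX    ( 0xFFFFU )
```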

### Porting coreMQTT

Guide for porting coreMQTT library to your platform is available here.

### Porting coreHTTP

Guide for porting coreHTTP library is available here.

### Porting AWS IoT Device Shadow

Guide for porting AWS IoT Device Shadow library is available here.

### Porting AWS IoT Device Defender

Guide for porting AWS IoT Device Defender library is available here.

### Porting AWS IoT Over-the-air Update

Guide for porting OTA library to your platform is available here.

## Migration guide from v3.1.2 to 202009.00 and newer releases

### MQTT Migration

Migration guide for MQTT library is available here.

Migration guide for Shadow library is available here.

### Jobs Migration

Migration guide for Jobs library is available here.

## Branches

### main branch

The main branch hosts the continuous development of the AWS IoT Embedded C SDK (C-SDK) libraries. Please be aware that the development at the tip of the main branch is continuously in progress, and may have bugs. Consider using the tagged releases of the C-SDK for production ready software.

### v4_beta_deprecated branch (formerly named v4_beta)

The v4_beta_deprecated branch contains a beta version of the C-SDK libraries, which is now deprecated. This branch was earlier named as v4_beta, and was renamed to v4_beta_deprecated. The libraries in this branch will not be released. However, critical bugs will be fixed and tested. No new features will be added to this branch.

## Getting Started

### Cloning

This repository uses Git Submodules to bring in the C-SDK libraries (e.g., MQTT) and third-party dependencies (e.g., mbedtls for the POSIX platform transport layer). Note: If you download the ZIP file provided by the GitHub UI, you will not get the contents of the submodules (the ZIP file is also not a valid git repository). If you download from the 202012.00 Release Page, you will get the entire repository (including the submodules) in the ZIP file, aws-iot-device-sdk-embedded-c-202012.00.zip. To clone the latest commit to the main branch using HTTPS:

``````git clone --recurse-submodules https://github.com/aws/aws-iot-device-sdk-embedded-C.git
``````

Using SSH:

``````git clone --recurse-submodules git@github.com:aws/aws-iot-device-sdk-embedded-C.git
``````

If you have downloaded the repo without using the `--recurse-submodules` argument, you need to run:

``````git submodule update --init --recursive
``````

When building with CMake, submodules are also recursively cloned automatically. However, `-DBUILD_CLONE_SUBMODULES=0` can be passed as a CMake flag to disable this functionality. This is useful when you'd like to build with CMake while using a different commit of a submodule.

### Configuring Demos

The libraries in this SDK are not dependent on any operating system. However, the demos for the libraries in this SDK are built and tested on a Linux platform. The demos build with CMake, a cross-platform build tool.

#### Prerequisites

• CMake 3.2.0 or any newer version for utilizing the build system of the repository.
• C90 compiler such as gcc
• Due to the use of mbedtls in corePKCS11, a C99 compiler is required if building the PKCS11 demos or the CMake install target.
• Although not a part of the ISO C90 standard, `stdint.h` is required for fixed-width integer types that include `uint8_t`, `int8_t`, `uint16_t`, `uint32_t` and `int32_t`, and constant macros like `UINT16_MAX`, while `stdbool.h` is required for boolean parameters in coreMQTT. For compilers that do not provide these header files, coreMQTT provides the files stdint.readme and stdbool.readme, which can be renamed to `stdint.h` and `stdbool.h`, respectively, to provide the required type definitions.
• A supported operating system. The ports provided with this repo are expected to work with all recent versions of the following operating systems, although we cannot guarantee the behavior on all systems.
• Linux system with POSIX sockets, threads, RT, and timer APIs. (We have tested on Ubuntu 18.04).

Build Dependencies

The following table shows libraries that need to be installed on your system to run certain demos. If a dependency is not installed and cannot be built from source, demos that require that dependency will be excluded from the default `all` target.

#### AWS IoT Account Setup

You need to set up an AWS account and access the AWS IoT console to run the AWS IoT Device Shadow library, AWS IoT Device Defender library, AWS IoT Jobs library, AWS IoT OTA library, and coreHTTP S3 download demos. The AWS account can also be used to run the MQTT mutual auth demo against the AWS IoT broker. Note that running the AWS IoT Device Defender, AWS IoT Jobs, and AWS IoT Device Shadow library demos requires the setup of a Thing resource for the device running the demo. Follow the links to:

The MQTT Mutual Authentication and AWS IoT Shadow demos include example AWS IoT policy documents to run each respective demo with AWS IoT. You may use the MQTT Mutual auth and Shadow example policies by replacing `[AWS_REGION]` and `[AWS_ACCOUNT_ID]` with the strings of your region and account identifier. While the IoT Thing name and MQTT client identifier do not need to match for the demos to run, the example policies use an identical Thing name and client identifier, as per AWS IoT best practices.

It can be very helpful to also have the AWS Command Line Interface tooling installed.

#### Configuring mutual authentication demos of MQTT and HTTP

You can pass the following configuration settings as command line options in order to run the mutual auth demos. Make sure to run the following command in the root directory of the C-SDK:

``````## optionally find your-aws-iot-endpoint from the command line
aws iot describe-endpoint --endpoint-type iot:Data-ATS
cmake -S . -Bbuild \
-DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DCLIENT_CERT_PATH="<your-client-certificate-path>" -DCLIENT_PRIVATE_KEY_PATH="<your-client-private-key-path>"
``````

In order to set these configurations manually, edit `demo_config.h` in `demos/mqtt/mqtt_demo_mutual_auth/` and `demos/http/http_demo_mutual_auth/` to `#define` the following:

• Set `AWS_IOT_ENDPOINT` to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of `ABCDEFG1234567.iot.<aws-region>.amazonaws.com` where `<aws-region>` can be an AWS region like `us-east-2`.
• Optionally, it can also be found with the AWS CLI command `aws iot describe-endpoint --endpoint-type iot:Data-ATS`.
• Set `CLIENT_CERT_PATH` to the path of the client certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
• Set `CLIENT_PRIVATE_KEY_PATH` to the path of the private key downloaded when setting up the device certificate in AWS IoT Account Setup.

It is possible to configure `ROOT_CA_CERT_PATH` to any PEM-encoded Root CA Certificate. However, this is optional because CMake will download and set it to AmazonRootCA1.pem when unspecified.

#### Configuring AWS IoT Device Defender and AWS IoT Device Shadow demos

To build the AWS IoT Device Defender and AWS IoT Device Shadow demos, you can pass the following configuration settings as command line options. Make sure to run the following command in the root directory of the C-SDK:

``````cmake -S . -Bbuild -DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DROOT_CA_CERT_PATH="<your-path-to-amazon-root-ca>" -DCLIENT_CERT_PATH="<your-client-certificate-path>" -DCLIENT_PRIVATE_KEY_PATH="<your-client-private-key-path>" -DTHING_NAME="<your-registered-thing-name>"
``````

In order to set these configurations manually, edit `demo_config.h` in the demo folder to `#define` the following:

• Set `AWS_IOT_ENDPOINT` to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of `ABCDEFG1234567.iot.us-east-2.amazonaws.com`.
• Set `ROOT_CA_CERT_PATH` to the path of the root CA certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
• Set `CLIENT_CERT_PATH` to the path of the client certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
• Set `CLIENT_PRIVATE_KEY_PATH` to the path of the private key downloaded when setting up the device certificate in AWS IoT Account Setup.
• Set `THING_NAME` to the name of the Thing created in AWS IoT Account Setup.

#### Configuring the AWS IoT Fleet Provisioning demo

To build the AWS IoT Fleet Provisioning Demo, you can pass the following configuration settings as command line options. Make sure to run the following command in the root directory of the C-SDK:

``````cmake -S . -Bbuild -DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DROOT_CA_CERT_PATH="<your-path-to-amazon-root-ca>" -DCLAIM_CERT_PATH="<your-claim-certificate-path>" -DCLAIM_PRIVATE_KEY_PATH="<your-claim-private-key-path>" -DPROVISIONING_TEMPLATE_NAME="<your-template-name>" -DDEVICE_SERIAL_NUMBER="<your-serial-number>"
``````

To create a provisioning template and claim credentials, sign into your AWS account and visit here. Make sure to enable the "Use the AWS IoT registry to manage your device fleet" option. Once you have created the template and credentials, modify the claim certificate's policy to match the sample policy.

In order to set these configurations manually, edit `demo_config.h` in the demo folder to `#define` the following:

• Set `AWS_IOT_ENDPOINT` to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of `ABCDEFG1234567.iot.us-east-2.amazonaws.com`.
• Set `ROOT_CA_CERT_PATH` to the path of the root CA certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
• Set `CLAIM_CERT_PATH` to the path of the claim certificate downloaded when setting up the template and claim credentials.
• Set `CLAIM_PRIVATE_KEY_PATH` to the path of the private key downloaded when setting up the template and claim credentials.
• Set `PROVISIONING_TEMPLATE_NAME` to the name of the provisioning template created.
• Set `DEVICE_SERIAL_NUMBER` to an arbitrary string representing a device identifier.

#### Configuring the S3 demos

You can pass the following configuration settings as command line options in order to run the S3 demos. Make sure to run the following command in the root directory of the C-SDK:

``````cmake -S . -Bbuild -DS3_PRESIGNED_GET_URL="s3-get-url" -DS3_PRESIGNED_PUT_URL="s3-put-url"
``````

`S3_PRESIGNED_PUT_URL` is only needed for the S3 upload demo.

In order to set these configurations manually, edit `demo_config.h` in `demos/http/http_demo_s3_download_multithreaded`, and `demos/http/http_demo_s3_upload` to `#define` the following:

• Set `S3_PRESIGNED_GET_URL` to a S3 presigned URL with GET access.
• Set `S3_PRESIGNED_PUT_URL` to a S3 presigned URL with PUT access.

You can generate the presigned URLs using demos/http/common/src/presigned_urls_gen.py. More info can be found here.

#### Setup for AWS IoT Jobs demo

1. The demo requires the Linux platform to contain curl and libmosquitto. On a Debian platform, these dependencies can be installed with:
``````    apt install curl libmosquitto-dev
``````

If the platform does not contain the `libmosquitto` library, the demo will build the library from source.

`libmosquitto` 1.4.10 or any later version of the first major release is required to run this demo.

2. A job that specifies the URL to download for the demo needs to be created on the AWS account for the Thing resource that will be used by the demo.
The job can be created directly from the AWS IoT console or using the aws cli tool.

`````` aws iot create-job \
--job-id 'job_1' \
--targets arn:aws:iot:us-west-2:<account-id>:thing/<thing-name> \
--document '{"url":"https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.8.5.tar.xz"}'
``````

#### Prerequisites for the AWS Over-The-Air Update (OTA) demos

1. To perform a successful OTA update, you need to complete the prerequisites mentioned here.
2. A code signing certificate is required to authenticate the update. A code signing certificate based on the SHA-256 ECDSA algorithm will work with the current demos. An example of how to generate this kind of certificate can be found here.

#### Scheduling an OTA Update Job

After you build and run the initial executable, you will have to create a second executable and schedule an OTA update job with this image.

1. Increase the version of the application by setting macro `APP_VERSION_BUILD` in `demos/ota/ota_demo_core_[mqtt/http]/demo_config.h` to a different version than what is running.
2. Rebuild the application using the build steps below into a different directory, say `build-dir-2`.
3. Rename the demo executable to reflect the change, e.g. `mv ota_demo_core_mqtt ota_demo_core_mqtt2`
4. Create an OTA job:
1. Go to the AWS IoT Core console.
2. Manage → Jobs → Create → Create a FreeRTOS OTA update job → Select the corresponding name for your device from the thing list.
3. Sign a new firmware → Create a new profile → Select any SHA-ECDSA signing platform → Upload the code signing certificate(from prerequisites) and provide its path on the device.
4. Select the image → Select the bucket you created during the prerequisite steps → Upload the binary `build-dir-2/bin/ota_demo2`.
5. The path on device should be the absolute path to place the executable and the binary name: e.g. `/home/ubuntu/aws-iot-device-sdk-embedded-C-staging/build-dir/bin/ota_demo_core_mqtt2`.
6. Select the IAM role created during the prerequisite steps.
7. Create the Job.
5. Run the initial executable again with the following command: `sudo ./ota_demo_core_mqtt` or `sudo ./ota_demo_core_http`.
6. After the initial executable has finished running, go to the directory where the downloaded firmware image resides which is the path name used when creating an OTA job.
7. Change the permissions of the downloaded firmware to make it executable, as it may be downloaded with read (user default) permissions only: `chmod 775 ota_demo_core_mqtt2`
8. Run the downloaded firmware image with the following command: `sudo ./ota_demo_core_mqtt2`

### Building and Running Demos

Before building the demos, ensure you have installed the prerequisite software. On Ubuntu 18.04 and 20.04, `gcc`, `cmake`, and OpenSSL can be installed with:

``````sudo apt install build-essential cmake libssl-dev
``````

#### Build a single demo

• Go to the root directory of the C-SDK.
• Run cmake to generate the Makefiles: `cmake -S . -Bbuild && cd build`
• Choose a demo from the list below or alternatively, run `make help | grep demo`:
``````defender_demo
http_demo_basic_tls
http_demo_mutual_auth
http_demo_plaintext
jobs_demo_mosquitto
mqtt_demo_basic_tls
mqtt_demo_mutual_auth
mqtt_demo_plaintext
mqtt_demo_serializer
mqtt_demo_subscription_manager
ota_demo_core_http
ota_demo_core_mqtt
pkcs11_demo_management_and_rng
pkcs11_demo_mechanisms_and_digests
pkcs11_demo_objects
pkcs11_demo_sign_and_verify
``````
• Replace `demo_name` with your desired demo, then build it: `make demo_name`
• Go to the `build/bin` directory and run any demo executables from there.

#### Build all configured demos

• Go to the root directory of the C-SDK.
• Run cmake to generate the Makefiles: `cmake -S . -Bbuild && cd build`
• Run this command to build all configured demos: `make`
• Go to the `build/bin` directory and run any demo executables from there.

#### Running corePKCS11 demos

The corePKCS11 demos do not require any AWS IoT resources setup, and are standalone. The demos build upon each other to introduce concepts in PKCS #11 sequentially. Below is the recommended order.

1. `pkcs11_demo_management_and_rng`
2. `pkcs11_demo_mechanisms_and_digests`
3. `pkcs11_demo_objects`
4. `pkcs11_demo_sign_and_verify`
1. Please note that this demo requires the private and public key generated from `pkcs11_demo_objects` to be in the directory the demo is executed from.

#### Alternative option of Docker containers for running demos locally

Install Docker:

``````curl -fsSL https://get.docker.com -o get-docker.sh

sh get-docker.sh
``````

Installing Mosquitto to run MQTT demos locally

The following instructions have been tested on an Ubuntu 18.04 environment with Docker and OpenSSL installed.

Download the official Docker image for Mosquitto 1.6.14. This version is deliberately chosen so that the Docker container can load certificates from the host system. Any version after 1.6.14 will drop privileges as soon as the configuration file has been read (before TLS certificates are loaded).

``docker pull eclipse-mosquitto:1.6.14``

If a Mosquitto broker with TLS communication needs to be run, ignore this step and proceed to the next step. A Mosquitto broker with plain text communication can be run by executing the command below.

``docker run -it -p 1883:1883 --name mosquitto-plain-text eclipse-mosquitto:1.6.14``

Set `BROKER_ENDPOINT` defined in `demos/mqtt/mqtt_demo_plaintext/demo_config.h` to `localhost`.

Ignore the remaining steps unless a Mosquitto broker with TLS communication also needs to be run.

For TLS communication with Mosquitto broker, server and CA credentials need to be created. Use OpenSSL commands to generate the credentials for the Mosquitto server.

``````# Generate CA key and certificate. Provide the Subject field information as appropriate for the CA certificate.
openssl req -x509 -nodes -sha256 -days 365 -newkey rsa:2048 -keyout ca.key -out ca.crt
``````

``````# Generate server key and certificate. Provide the Subject field information as appropriate for the Server certificate. Make sure the Common Name (CN) field is different from that of the root CA certificate.
openssl req -nodes -sha256 -new -keyout server.key -out server.csr

# Sign with the CA cert.
openssl x509 -req -sha256 -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365
``````

Note: Make sure to use different Common Name (CN) details for the CA and server certificates; otherwise, the SSL handshake fails when both certificates use the same Common Name.

Create a mosquitto.conf file to use port 8883 (for TLS communication) and provide the path to the generated credentials.

``````port 8883

cafile /mosquitto/config/ca.crt
certfile /mosquitto/config/server.crt
keyfile /mosquitto/config/server.key

# Use this option for TLS mutual authentication (where client will provide CA signed certificate)
#require_certificate true
tls_version tlsv1.2
``````

Run the docker container from the local directory containing the generated credential and mosquitto.conf files.

``````docker run -it -p 8883:8883 -v \$(pwd):/mosquitto/config/ --name mosquitto-basic-tls eclipse-mosquitto:1.6.14
``````

Update `demos/mqtt/mqtt_demo_basic_tls/demo_config.h` to the following:

• Set `BROKER_ENDPOINT` to `localhost`.
• Set `ROOT_CA_CERT_PATH` to the absolute path of the CA certificate created above for the local Mosquitto server.

Installing httpbin to run HTTP demos locally

Run httpbin through port 80:

``````docker pull kennethreitz/httpbin
docker run -p 80:80 kennethreitz/httpbin
``````

`SERVER_HOST` defined in `demos/http/http_demo_plaintext/demo_config.h` can now be set to `localhost`.

To run `http_demo_basic_tls`, download ngrok in order to create an HTTPS tunnel to the httpbin server currently hosted on port 80:

``````./ngrok http 80 # May have to use ./ngrok.exe depending on OS or filename of the executable
``````

`ngrok` will provide an https link that can be substituted in `demos/http/http_demo_basic_tls/demo_config.h` and has a format of `https://ABCDEFG12345.ngrok.io`.

Set `SERVER_HOST` in `demos/http/http_demo_basic_tls/demo_config.h` to the https link provided by ngrok, without `https://` preceding it.

You must also download the Root CA certificate provided by the ngrok https link and set `ROOT_CA_CERT_PATH` in `demos/http/http_demo_basic_tls/demo_config.h` to the file path of the downloaded certificate.

### Installation

The C-SDK libraries and platform abstractions can be installed to a file system through CMake. To do so, run the following command in the root directory of the C-SDK. Note that installation is not required to run any of the demos.

``````cmake -S . -Bbuild -DBUILD_DEMOS=0 -DBUILD_TESTS=0
cd build
sudo make install
``````

Note that because `make install` will automatically build the `all` target, it may be useful to disable building demos and tests with `-DBUILD_DEMOS=0 -DBUILD_TESTS=0` unless they have already been configured. Super-user permissions may be needed if installing to a system include or system library path.

To install only a subset of all libraries, pass `-DINSTALL_LIBS` to install only the libraries you need. By default, all libraries will be installed, but you may exclude any library that you don't need from this list:

``````-DINSTALL_LIBS="DEFENDER;SHADOW;JOBS;OTA;OTA_HTTP;OTA_MQTT;BACKOFF_ALGORITHM;HTTP;JSON;MQTT;PKCS"
``````

By default, the install path will be in the `project` directory of the SDK. You can also set `-DINSTALL_TO_SYSTEM=1` to install to the system path for headers and libraries in your OS (e.g. `/usr/local/include` & `/usr/local/lib` for Linux).
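For example, to install only the MQTT and JSON libraries to the system path, the flags above can be combined as follows (a sketch; run from the C-SDK root directory):

``````cmake -S . -Bbuild -DBUILD_DEMOS=0 -DBUILD_TESTS=0 \
  -DINSTALL_LIBS="MQTT;JSON" -DINSTALL_TO_SYSTEM=1
cd build
sudo make install
``````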

When `make install` runs, the location of each installed library is printed first, followed by the location of each installed header:

``````-- Installing: /usr/local/lib/libaws_iot_defender.so
...
-- Installing: /usr/local/include/aws/defender.h
-- Installing: /usr/local/include/aws/defender_config_defaults.h
``````

You may also set an installation path of your choice by passing the following flags through CMake. Make sure to run the following command in the root directory of the C-SDK:

``````cmake -S . -Bbuild -DBUILD_DEMOS=0 -DBUILD_TESTS=0 \
-DCSDK_HEADER_INSTALL_PATH="/header/path" -DCSDK_LIB_INSTALL_PATH="/lib/path"
cd build
sudo make install
``````

POSIX platform abstractions are used together with the C-SDK libraries in the demos. By default, these abstractions are also installed but can be excluded by passing the flag: `-DINSTALL_PLATFORM_ABSTRACTIONS=0`.

Lastly, a custom config path for any specific library can also be specified through the following CMake flags, allowing libraries to be compiled with a config of your choice:

``````-DDEFENDER_CUSTOM_CONFIG_DIR="defender-config-directory"
-DJOBS_CUSTOM_CONFIG_DIR="jobs-config-directory"
-DOTA_CUSTOM_CONFIG_DIR="ota-config-directory"
-DHTTP_CUSTOM_CONFIG_DIR="http-config-directory"
-DJSON_CUSTOM_CONFIG_DIR="json-config-directory"
-DMQTT_CUSTOM_CONFIG_DIR="mqtt-config-directory"
-DPKCS_CUSTOM_CONFIG_DIR="pkcs-config-directory"
``````

Note that the file name of the header should not be included in the directory.
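For example, to build the MQTT library against a custom `core_mqtt_config.h` kept in a local `configs/` directory (the directory name is illustrative), pass only the directory:

``````cmake -S . -Bbuild -DMQTT_CUSTOM_CONFIG_DIR="configs"
``````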

## Generating Documentation

Note: For pre-generated documentation, please visit the Releases and Documentation section.

The Doxygen references were created using Doxygen version 1.9.2. To generate the Doxygen pages, use the provided Python script at `tools/doxygen/generate_docs.py`. Please ensure that each of the library submodules under `libraries/standard/` and `libraries/aws/` is cloned before using this script.

``````cd <CSDK_ROOT>
git submodule update --init --recursive --checkout
python3 tools/doxygen/generate_docs.py
``````

The generated documentation landing page is located at `docs/doxygen/output/html/index.html`.

Author: aws
Source code: https://github.com/aws/aws-iot-device-sdk-embedded-C