Hibernate Naming Under Spring Boot and Embedded JPA Entities

Hibernate Naming Under Spring Boot and Embedded JPA Entities

I encountered a little gotcha when setting up multiple Embedded JPA entities while developing a module for my Spring Core online course. In this video, I show you a simple solution to support multiple Embedded JPA Entities without additional JPA annotations to customize the default column names.

You can learn more about my courses on the Spring Framework here: http://bit.ly/1RPhI2A

#spring boot #spring #jpa #hibernate

What is GEEK

Buddha Community

Hibernate Naming Under Spring Boot and Embedded JPA Entities

Enhance Amazon Aurora Read/Write Capability with ShardingSphere-JDBC

1. Introduction

Amazon Aurora is a relational database management system (RDBMS) developed by AWS(Amazon Web Services). Aurora gives you the performance and availability of commercial-grade databases with full MySQL and PostgreSQL compatibility. In terms of high performance, Aurora MySQL and Aurora PostgreSQL have shown an increase in throughput of up to 5X over stock MySQL and 3X over stock PostgreSQL respectively on similar hardware. In terms of scalability, Aurora achieves enhancements and innovations in storage and computing, horizontal and vertical functions.

Aurora supports up to 128TB of storage capacity and supports dynamic scaling of storage layer in units of 10GB. In terms of computing, Aurora supports scalable configurations for multiple read replicas. Each region can have an additional 15 Aurora replicas. In addition, Aurora provides multi-primary architecture to support four read/write nodes. Its Serverless architecture allows vertical scaling and reduces typical latency to under a second, while the Global Database enables a single database cluster to span multiple AWS Regions in low latency.

Aurora already provides great scalability with the growth of user data volume. Can it handle more data and support more concurrent access? You may consider using sharding to support the configuration of multiple underlying Aurora clusters. To this end, a series of blogs, including this one, provides you with a reference in choosing between Proxy and JDBC for sharding.

1.1 Why sharding is needed

AWS Aurora offers a single relational database. Primary-secondary, multi-primary, and global database, and other forms of hosting architecture can satisfy various architectural scenarios above. However, Aurora doesn’t provide direct support for sharding scenarios, and sharding has a variety of forms, such as vertical and horizontal forms. If we want to further increase data capacity, some problems have to be solved, such as cross-node database Join, associated query, distributed transactions, SQL sorting, page turning, function calculation, database global primary key, capacity planning, and secondary capacity expansion after sharding.

1.2 Sharding methods

It is generally accepted that when the capacity of a MySQL table is less than 10 million, the time spent on queries is optimal because at this time the height of its BTREE index is between 3 and 5. Data sharding can reduce the amount of data in a single table and distribute the read and write loads to different data nodes at the same time. Data sharding can be divided into vertical sharding and horizontal sharding.

1. Advantages of vertical sharding

  • Address the coupling of business system and make clearer.
  • Implement hierarchical management, maintenance, monitoring, and expansion to data of different businesses, like micro-service governance.
  • In high concurrency scenarios, vertical sharding removes the bottleneck of IO, database connections, and hardware resources on a single machine to some extent.

2. Disadvantages of vertical sharding

  • After splitting the library, Join can only be implemented by interface aggregation, which will increase the complexity of development.
  • After splitting the library, it is complex to process distributed transactions.
  • There is a large amount of data on a single table and horizontal sharding is required.

3. Advantages of horizontal sharding

  • There is no such performance bottleneck as a large amount of data on a single database and high concurrency, and it increases system stability and load capacity.
  • The business modules do not need to be split due to minor modification on the application client.

4. Disadvantages of horizontal sharding

  • Transaction consistency across shards is hard to be guaranteed;
  • The performance of associated query in cross-library Join is poor.
  • It’s difficult to scale the data many times and maintenance is a big workload.

Based on the analysis above, and the available studis on popular sharding middleware, we selected ShardingSphere, an open source product, combined with Amazon Aurora to introduce how the combination of these two products meets various forms of sharding and how to solve the problems brought by sharding.

ShardingSphere is an open source ecosystem including a set of distributed database middleware solutions, including 3 independent products, Sharding-JDBC, Sharding-Proxy & Sharding-Sidecar.

2. ShardingSphere introduction:

The characteristics of Sharding-JDBC are:

  1. With the client end connecting directly to the database, it provides service in the form of jar and requires no extra deployment and dependence.
  2. It can be considered as an enhanced JDBC driver, which is fully compatible with JDBC and all kinds of ORM frameworks.
  3. Applicable in any ORM framework based on JDBC, such as JPA, Hibernate, Mybatis, Spring JDBC Template or direct use of JDBC.
  4. Support any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, HikariCP;
  5. Support any kind of JDBC standard database: MySQL, Oracle, SQLServer, PostgreSQL and any databases accessible to JDBC.
  6. Sharding-JDBC adopts decentralized architecture, applicable to high-performance light-weight OLTP application developed with Java

Hybrid Structure Integrating Sharding-JDBC and Applications

Sharding-JDBC’s core concepts

Data node: The smallest unit of a data slice, consisting of a data source name and a data table, such as ds_0.product_order_0.

Actual table: The physical table that really exists in the horizontal sharding database, such as product order tables: product_order_0, product_order_1, and product_order_2.

Logic table: The logical name of the horizontal sharding databases (tables) with the same schema. For instance, the logic table of the order product_order_0, product_order_1, and product_order_2 is product_order.

Binding table: It refers to the primary table and the joiner table with the same sharding rules. For example, product_order table and product_order_item are sharded by order_id, so they are binding tables with each other. Cartesian product correlation will not appear in the multi-tables correlating query, so the query efficiency will increase greatly.

Broadcast table: It refers to tables that exist in all sharding database sources. The schema and data must consist in each database. It can be applied to the small data volume that needs to correlate with big data tables to query, dictionary table and configuration table for example.

3. Testing ShardingSphere-JDBC

3.1 Example project

Download the example project code locally. In order to ensure the stability of the test code, we choose shardingsphere-example-4.0.0 version.

git clone https://github.com/apache/shardingsphere-example.git

Project description:

shardingsphere-example
  ├── example-core
  │   ├── config-utility
  │   ├── example-api
  │   ├── example-raw-jdbc
  │   ├── example-spring-jpa #spring+jpa integration-based entity,repository
  │   └── example-spring-mybatis
  ├── sharding-jdbc-example
  │   ├── sharding-example
  │   │   ├── sharding-raw-jdbc-example
  │   │   ├── sharding-spring-boot-jpa-example #integration-based sharding-jdbc functions
  │   │   ├── sharding-spring-boot-mybatis-example
  │   │   ├── sharding-spring-namespace-jpa-example
  │   │   └── sharding-spring-namespace-mybatis-example
  │   ├── orchestration-example
  │   │   ├── orchestration-raw-jdbc-example
  │   │   ├── orchestration-spring-boot-example #integration-based sharding-jdbc governance function
  │   │   └── orchestration-spring-namespace-example
  │   ├── transaction-example
  │   │   ├── transaction-2pc-xa-example #sharding-jdbc sample of two-phase commit for a distributed transaction
  │   │   └──transaction-base-seata-example #sharding-jdbc distributed transaction seata sample
  │   ├── other-feature-example
  │   │   ├── hint-example
  │   │   └── encrypt-example
  ├── sharding-proxy-example
  │   └── sharding-proxy-boot-mybatis-example
  └── src/resources
        └── manual_schema.sql  

Configuration file description:

application-master-slave.properties #read/write splitting profile
application-sharding-databases-tables.properties #sharding profile
application-sharding-databases.properties       #library split profile only
application-sharding-master-slave.properties    #sharding and read/write splitting profile
application-sharding-tables.properties          #table split profile
application.properties                         #spring boot profile

Code logic description:

The following is the entry class of the Spring Boot application below. Execute it to run the project.

The execution logic of demo is as follows:

3.2 Verifying read/write splitting

As business grows, the write and read requests can be split to different database nodes to effectively promote the processing capability of the entire database cluster. Aurora uses a reader/writer endpoint to meet users' requirements to write and read with strong consistency, and a read-only endpoint to meet the requirements to read without strong consistency. Aurora's read and write latency is within single-digit milliseconds, much lower than MySQL's binlog-based logical replication, so there's a lot of loads that can be directed to a read-only endpoint.

Through the one primary and multiple secondary configuration, query requests can be evenly distributed to multiple data replicas, which further improves the processing capability of the system. Read/write splitting can improve the throughput and availability of system, but it can also lead to data inconsistency. Aurora provides a primary/secondary architecture in a fully managed form, but applications on the upper-layer still need to manage multiple data sources when interacting with Aurora, routing SQL requests to different nodes based on the read/write type of SQL statements and certain routing policies.

ShardingSphere-JDBC provides read/write splitting features and it is integrated with application programs so that the complex configuration between application programs and database clusters can be separated from application programs. Developers can manage the Shard through configuration files and combine it with ORM frameworks such as Spring JPA and Mybatis to completely separate the duplicated logic from the code, which greatly improves the ability to maintain code and reduces the coupling between code and database.

3.2.1 Setting up the database environment

Create a set of Aurora MySQL read/write splitting clusters. The model is db.r5.2xlarge. Each set of clusters has one write node and two read nodes.

3.2.2 Configuring Sharding-JDBC

application.properties spring boot Master profile description:

You need to replace the green ones with your own environment configuration.

# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create-drop
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true

#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#Activate master-slave configuration item so that sharding-jdbc can use master-slave profile
spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave

application-master-slave.properties sharding-jdbc profile description:

spring.shardingsphere.datasource.names=ds_master,ds_slave_0,ds_slave_1
# data souce-master
spring.shardingsphere.datasource.ds_master.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master.password=Your master DB password
spring.shardingsphere.datasource.ds_master.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master.jdbc-url=Your primary DB data sourceurl spring.shardingsphere.datasource.ds_master.username=Your primary DB username
# data source-slave
spring.shardingsphere.datasource.ds_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_slave_0.password= Your slave DB password
spring.shardingsphere.datasource.ds_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_slave_0.jdbc-url=Your slave DB data source url
spring.shardingsphere.datasource.ds_slave_0.username= Your slave DB username
# data source-slave
spring.shardingsphere.datasource.ds_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_slave_1.password= Your slave DB password
spring.shardingsphere.datasource.ds_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_slave_1.jdbc-url= Your slave DB data source url
spring.shardingsphere.datasource.ds_slave_1.username= Your slave DB username
# Routing Policy Configuration
spring.shardingsphere.masterslave.load-balance-algorithm-type=round_robin
spring.shardingsphere.masterslave.name=ds_ms
spring.shardingsphere.masterslave.master-data-source-name=ds_master
spring.shardingsphere.masterslave.slave-data-source-names=ds_slave_0,ds_slave_1
# sharding-jdbc configures the information storage mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log,and you can see the conversion from logical SQL to actual SQL from the print
spring.shardingsphere.props.sql.show=true

 

3.2.3 Test and verification process description

  • Test environment data initialization: Spring JPA initialization automatically creates tables for testing.

  • Write data to the master instance

As shown in the ShardingSphere-SQL log figure below, the write SQL is executed on the ds_master data source.

  • Data query operations are performed on the slave library.

As shown in the ShardingSphere-SQL log figure below, the read SQL is executed on the ds_slave data source in the form of polling.

[INFO ] 2022-04-02 19:43:39,376 --main-- [ShardingSphere-SQL] Rule Type: master-slave 
[INFO ] 2022-04-02 19:43:39,376 --main-- [ShardingSphere-SQL] SQL: select orderentit0_.order_id as order_id1_1_, orderentit0_.address_id as address_2_1_, 
orderentit0_.status as status3_1_, orderentit0_.user_id as user_id4_1_ from t_order orderentit0_ ::: DataSources: ds_slave_0 
---------------------------- Print OrderItem Data -------------------
Hibernate: select orderiteme1_.order_item_id as order_it1_2_, orderiteme1_.order_id as order_id2_2_, orderiteme1_.status as status3_2_, orderiteme1_.user_id 
as user_id4_2_ from t_order orderentit0_ cross join t_order_item orderiteme1_ where orderentit0_.order_id=orderiteme1_.order_id
[INFO ] 2022-04-02 19:43:40,898 --main-- [ShardingSphere-SQL] Rule Type: master-slave 
[INFO ] 2022-04-02 19:43:40,898 --main-- [ShardingSphere-SQL] SQL: select orderiteme1_.order_item_id as order_it1_2_, orderiteme1_.order_id as order_id2_2_, orderiteme1_.status as status3_2_, 
orderiteme1_.user_id as user_id4_2_ from t_order orderentit0_ cross join t_order_item orderiteme1_ where orderentit0_.order_id=orderiteme1_.order_id ::: DataSources: ds_slave_1 

Note: As shown in the figure below, if there are both reads and writes in a transaction, Sharding-JDBC routes both read and write operations to the master library. If the read/write requests are not in the same transaction, the corresponding read requests are distributed to different read nodes according to the routing policy.

@Override
@Transactional // When a transaction is started, both read and write in the transaction go through the master library. When closed, read goes through the slave library and write goes through the master library
public void processSuccess() throws SQLException {
    System.out.println("-------------- Process Success Begin ---------------");
    List<Long> orderIds = insertData();
    printData();
    deleteData(orderIds);
    printData();
    System.out.println("-------------- Process Success Finish --------------");
}

3.2.4 Verifying Aurora failover scenario

The Aurora database environment adopts the configuration described in Section 2.2.1.

3.2.4.1 Verification process description

  1. Start the Spring-Boot project

2. Perform a failover on Aurora’s console

3. Execute the Rest API request

4. Repeatedly execute POST (http://localhost:8088/save-user) until the call to the API failed to write to Aurora and eventually recovered successfully.

5. The following figure shows the process of executing code failover. It takes about 37 seconds from the time when the latest SQL write is successfully performed to the time when the next SQL write is successfully performed. That is, the application can be automatically recovered from Aurora failover, and the recovery time is about 37 seconds.

3.3 Testing table sharding-only function

3.3.1 Configuring Sharding-JDBC

application.properties spring boot master profile description

# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create-drop
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
#spring.profiles.active=sharding-databases
#Activate sharding-tables configuration items
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
# spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave

application-sharding-tables.properties sharding-jdbc profile description

## configure primary-key policy
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# configure the binding relation of t_order and t_order_item
spring.shardingsphere.sharding.binding-tables[0]=t_order,t_order_item
# configure broadcast tables
spring.shardingsphere.sharding.broadcast-tables=t_address
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true

 

3.3.2 Test and verification process description

1. DDL operation

JPA automatically creates tables for testing. When Sharding-JDBC routing rules are configured, the client executes DDL, and Sharding-JDBC automatically creates corresponding tables according to the table splitting rules. If t_address is a broadcast table, create a t_address because there is only one master instance. Two physical tables t_order_0 and t_order_1 will be created when creating t_order.

2. Write operation

As shown in the figure below, Logic SQL inserts a record into t_order. When Sharding-JDBC is executed, data will be distributed to t_order_0 and t_order_1 according to the table splitting rules.

When t_order and t_order_item are bound, the records associated with order_item and order are placed on the same physical table.

3. Read operation

As shown in the figure below, perform the join query operations to order and order_item under the binding table, and the physical shard is precisely located based on the binding relationship.

The join query operations on order and order_item under the unbound table will traverse all shards.

3.4 Testing database sharding-only function

3.4.1 Setting up the database environment

Create two instances on Aurora: ds_0 and ds_1

When the sharding-spring-boot-jpa-example project is started, tables t_order, t_order_itemt_address will be created on two Aurora instances.

3.4.2 Configuring Sharding-JDBC

application.properties springboot master profile description

# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true

# Activate sharding-databases configuration items
spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave

application-sharding-databases.properties sharding-jdbc profile description

spring.shardingsphere.datasource.names=ds_0,ds_1
# ds_0
spring.shardingsphere.datasource.ds_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_0.jdbc-url= spring.shardingsphere.datasource.ds_0.username= 
spring.shardingsphere.datasource.ds_0.password=
# ds_1
spring.shardingsphere.datasource.ds_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_1.jdbc-url= 
spring.shardingsphere.datasource.ds_1.username= 
spring.shardingsphere.datasource.ds_1.password=
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
spring.shardingsphere.sharding.default-data-source-name=ds_0

spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true

 

3.4.3 Test and verification process description

1. DDL operation

JPA automatically creates tables for testing. When Sharding-JDBC’s library splitting and routing rules are configured, the client executes DDL, and Sharding-JDBC will automatically create corresponding tables according to table splitting rules. If t_address is a broadcast table, physical tables will be created on ds_0 and ds_1. The three tables, t_address, t_order and t_order_item will be created on ds_0 and ds_1 respectively.

2. Write operation

For the broadcast table t_address, each record written will also be written to the t_address tables of ds_0 and ds_1.

The tables t_order and t_order_item of the slave library are written on the table in the corresponding instance according to the slave library field and routing policy.

3. Read operation

Query order is routed to the corresponding Aurora instance according to the routing rules of the slave library .

Query Address. Since address is a broadcast table, an instance of address will be randomly selected and queried from the nodes used.

As shown in the figure below, perform the join query operations to order and order_item under the binding table, and the physical shard is precisely located based on the binding relationship.

3.5 Verifying sharding function

3.5.1 Setting up the database environment

As shown in the figure below, create two instances on Aurora: ds_0 and ds_1

When the sharding-spring-boot-jpa-example project is started, physical tables t_order_01, t_order_02, t_order_item_01,and t_order_item_02 and global table t_address will be created on two Aurora instances.

3.5.2 Configuring Sharding-JDBC

application.properties springboot master profile description

# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
# Activate sharding-databases-tables configuration items
#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave

application-sharding-databases.properties sharding-jdbc profile description

spring.shardingsphere.datasource.names=ds_0,ds_1
# ds_0
spring.shardingsphere.datasource.ds_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_0.jdbc-url= 306/dev?useSSL=false&characterEncoding=utf-8
spring.shardingsphere.datasource.ds_0.username= 
spring.shardingsphere.datasource.ds_0.password=
spring.shardingsphere.datasource.ds_0.max-active=16
# ds_1
spring.shardingsphere.datasource.ds_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_1.jdbc-url= 
spring.shardingsphere.datasource.ds_1.username= 
spring.shardingsphere.datasource.ds_1.password=
spring.shardingsphere.datasource.ds_1.max-active=16
# default library splitting policy
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
# Tables that do not meet the library splitting policy are placed on ds_0
spring.shardingsphere.sharding.default-data-source-name=ds_0
# t_order table splitting policy
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
# t_order_item table splitting policy
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# sharding-jdbc mdoe
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true

 

3.5.3 Test and verification process description

1. DDL operation

JPA automatically creates tables for testing. When Sharding-JDBC’s sharding and routing rules are configured, the client executes DDL, and Sharding-JDBC will automatically create corresponding tables according to table splitting rules. If t_address is a broadcast table, t_address will be created on both ds_0 and ds_1. The three tables, t_address, t_order and t_order_item will be created on ds_0 and ds_1 respectively.

2. Write operation

For the broadcast table t_address, each record written will also be written to the t_address tables of ds_0 and ds_1.

The tables t_order and t_order_item of the sub-library are written to the table on the corresponding instance according to the slave library field and routing policy.

3. Read operation

The read operation is similar to the library split function verification described in section2.4.3.

3.6 Testing database sharding, table sharding and read/write splitting function

3.6.1 Setting up the database environment

The following figure shows the physical table of the created database instance.

3.6.2 Configuring Sharding-JDBC

application.properties spring boot master profile description

# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true

# activate sharding-databases-tables configuration items
#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
spring.profiles.active=sharding-master-slave

application-sharding-master-slave.properties sharding-jdbc profile description

The url, name and password of the database need to be changed to your own database parameters.

spring.shardingsphere.datasource.names=ds_master_0,ds_master_1,ds_master_0_slave_0,ds_master_0_slave_1,ds_master_1_slave_0,ds_master_1_slave_1
spring.shardingsphere.datasource.ds_master_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0.jdbc-url= spring.shardingsphere.datasource.ds_master_0.username= 
spring.shardingsphere.datasource.ds_master_0.password=
spring.shardingsphere.datasource.ds_master_0.max-active=16
spring.shardingsphere.datasource.ds_master_0_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0_slave_0.jdbc-url= spring.shardingsphere.datasource.ds_master_0_slave_0.username= 
spring.shardingsphere.datasource.ds_master_0_slave_0.password=
spring.shardingsphere.datasource.ds_master_0_slave_0.max-active=16
spring.shardingsphere.datasource.ds_master_0_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0_slave_1.jdbc-url= spring.shardingsphere.datasource.ds_master_0_slave_1.username= 
spring.shardingsphere.datasource.ds_master_0_slave_1.password=
spring.shardingsphere.datasource.ds_master_0_slave_1.max-active=16
spring.shardingsphere.datasource.ds_master_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1.jdbc-url= 
spring.shardingsphere.datasource.ds_master_1.username= 
spring.shardingsphere.datasource.ds_master_1.password=
spring.shardingsphere.datasource.ds_master_1.max-active=16
spring.shardingsphere.datasource.ds_master_1_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1_slave_0.jdbc-url=
spring.shardingsphere.datasource.ds_master_1_slave_0.username=
spring.shardingsphere.datasource.ds_master_1_slave_0.password=
spring.shardingsphere.datasource.ds_master_1_slave_0.max-active=16
spring.shardingsphere.datasource.ds_master_1_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1_slave_1.jdbc-url= spring.shardingsphere.datasource.ds_master_1_slave_1.username=admin
spring.shardingsphere.datasource.ds_master_1_slave_1.password=
spring.shardingsphere.datasource.ds_master_1_slave_1.max-active=16
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
spring.shardingsphere.sharding.default-data-source-name=ds_master_0
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# master/slave data source and slave data source configuration
spring.shardingsphere.sharding.master-slave-rules.ds_0.master-data-source-name=ds_master_0
spring.shardingsphere.sharding.master-slave-rules.ds_0.slave-data-source-names=ds_master_0_slave_0, ds_master_0_slave_1
spring.shardingsphere.sharding.master-slave-rules.ds_1.master-data-source-name=ds_master_1
spring.shardingsphere.sharding.master-slave-rules.ds_1.slave-data-source-names=ds_master_1_slave_0, ds_master_1_slave_1
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true

 

3.6.3 Test and verification process description

1. DDL operation

JPA automatically creates tables for testing. When Sharding-JDBC’s library splitting and routing rules are configured, the client executes DDL, and Sharding-JDBC will automatically create corresponding tables according to table splitting rules. If t_address is a broadcast table, t_address will be created on both ds_0 and ds_1. The three tables, t_address, t_order and t_order_item will be created on ds_0 and ds_1 respectively.

2. Write operation

For the broadcast table t_address, each record written will also be written to the t_address tables of ds_0 and ds_1.

The tables t_order and t_order_item of the slave library are written to the table on the corresponding instance according to the slave library field and routing policy.

3. Read operation

The join query operations on order and order_item under the binding table are shown below.

3. Conclusion

As an open source product focusing on database enhancement, ShardingSphere is pretty good in terms of its community activitiy, product maturity and documentation richness.

Among its products, ShardingSphere-JDBC is a sharding solution based on the client-side, which supports all sharding scenarios. And there’s no need to introduce an intermediate layer like Proxy, so the complexity of operation and maintenance is reduced. Its latency is theoretically lower than Proxy due to the lack of intermediate layer. In addition, ShardingSphere-JDBC can support a variety of relational databases based on SQL standards such as MySQL/PostgreSQL/Oracle/SQL Server, etc.

However, due to the integration of Sharding-JDBC with the application program, it only supports Java language for now, and is strongly dependent on the application programs. Nevertheless, Sharding-JDBC separates all sharding configuration from the application program, which brings relatively small changes when switching to other middleware.

In conclusion, Sharding-JDBC is a good choice if you use a Java-based system and have to to interconnect with different relational databases — and don’t want to bother with introducing an intermediate layer.

Author

Sun Jinhua

A senior solution architect at AWS, Sun is responsible for the design and consult on cloud architecture. for providing customers with cloud-related design and consulting services. Before joining AWS, he ran his own business, specializing in building e-commerce platforms and designing the overall architecture for e-commerce platforms of automotive companies. He worked in a global leading communication equipment company as a senior engineer, responsible for the development and architecture design of multiple subsystems of LTE equipment system. He has rich experience in architecture design with high concurrency and high availability system, microservice architecture design, database, middleware, IOT etc.

Hibernate Naming Under Spring Boot and Embedded JPA Entities

Hibernate Naming Under Spring Boot and Embedded JPA Entities

I encountered a little gotcha when setting up multiple Embedded JPA entities while developing a module for my Spring Core online course. In this video, I show you a simple solution to support multiple Embedded JPA Entities without additional JPA annotations to customize the default column names.

You can learn more about my courses on the Spring Framework here: http://bit.ly/1RPhI2A

#spring boot #spring #jpa #hibernate

Rust  Language

Rust Language

1656924529

Macros in Rust - Everything You Need To Know

Macros in Rust - Everything You Need To Know

Ever wondered what the bang ("!") after "println" means? Not anymore! I will show you exactly how macros work, how to use them, and how to write your own macros.
This is the perfect talk for you if you are using macros, but you always wanted to know how they work and how to implement them yourself.


macro_rules!

Rust provides a powerful macro system that allows metaprogramming. As you've seen in previous chapters, macros look like functions, except that their name ends with a bang !, but instead of generating a function call, macros are expanded into source code that gets compiled with the rest of the program. However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.

Macros are created using the macro_rules! macro.

// This is a simple macro named `say_hello`.
macro_rules! say_hello {
    // `()` indicates that the macro takes no argument.
    () => {
        // The macro will expand into the contents of this block.
        println!("Hello!");
    };
}

fn main() {
    // This call will expand into `println!("Hello");`
    say_hello!()
}

So why are macros useful?

Don't repeat yourself. There are many cases where you may need similar functionality in multiple places but with different types. Often, writing a macro is a useful way to avoid repeating code. (More on this later)

Domain-specific languages. Macros allow you to define special syntax for a specific purpose. (More on this later)

Variadic interfaces. Sometimes you want to define an interface that takes a variable number of arguments. An example is println! which could take any number of arguments, depending on the format string!. (More on this later)


Macros

We’ve used macros like println! throughout this book, but we haven’t fully explored what a macro is and how it works. The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:

  • Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
  • Attribute-like macros that define custom attributes usable on any item
  • Function-like macros that look like function calls but operate on the tokens specified as their argument

We’ll talk about each of these in turn, but first, let’s look at why we even need macros when we already have functions.

The Difference Between Macros and Functions

Fundamentally, macros are a way of writing code that writes other code, which is known as metaprogramming. In Appendix C, we discuss the derive attribute, which generates an implementation of various traits for you. We’ve also used the println! and vec! macros throughout the book. All of these macros expand to produce more code than the code you’ve written manually.

Metaprogramming is useful for reducing the amount of code you have to write and maintain, which is also one of the roles of functions. However, macros have some additional powers that functions don’t.

A function signature must declare the number and type of parameters the function has. Macros, on the other hand, can take a variable number of parameters: we can call println!("hello") with one argument or println!("hello {}", name) with two arguments. Also, macros are expanded before the compiler interprets the meaning of the code, so a macro can, for example, implement a trait on a given type. A function can’t, because it gets called at runtime and a trait needs to be implemented at compile time.

The downside to implementing a macro instead of a function is that macro definitions are more complex than function definitions because you’re writing Rust code that writes Rust code. Due to this indirection, macro definitions are generally more difficult to read, understand, and maintain than function definitions.

Another important difference between macros and functions is that you must define macros or bring them into scope before you call them in a file, as opposed to functions you can define anywhere and call anywhere.

Declarative Macros with macro_rules! for General Metaprogramming

The most widely used form of macros in Rust is declarative macros. These are also sometimes referred to as “macros by example,” “macro_rules! macros,” or just plain “macros.” At their core, declarative macros allow you to write something similar to a Rust match expression. As discussed in Chapter 6, match expressions are control structures that take an expression, compare the resulting value of the expression to patterns, and then run the code associated with the matching pattern. Macros also compare a value to patterns that are associated with particular code: in this situation, the value is the literal Rust source code passed to the macro; the patterns are compared with the structure of that source code; and the code associated with each pattern, when matched, replaces the code passed to the macro. This all happens during compilation.

To define a macro, you use the macro_rules! construct. Let’s explore how to use macro_rules! by looking at how the vec! macro is defined. Chapter 8 covered how we can use the vec! macro to create a new vector with particular values. For example, the following macro creates a new vector containing three integers:

let v: Vec<u32> = vec![1, 2, 3];

We could also use the vec! macro to make a vector of two integers or a vector of five string slices. We wouldn’t be able to use a function to do the same because we wouldn’t know the number or type of values up front.

Listing 19-28 shows a slightly simplified definition of the vec! macro.

Filename: src/lib.rs

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

Listing 19-28: A simplified version of the vec! macro definition

Note: The actual definition of the vec! macro in the standard library includes code to preallocate the correct amount of memory up front. That code is an optimization that we don’t include here to make the example simpler.

The #[macro_export] annotation indicates that this macro should be made available whenever the crate in which the macro is defined is brought into scope. Without this annotation, the macro can’t be brought into scope.

We then start the macro definition with macro_rules! and the name of the macro we’re defining without the exclamation mark. The name, in this case vec, is followed by curly brackets denoting the body of the macro definition.

The structure in the vec! body is similar to the structure of a match expression. Here we have one arm with the pattern ( $( $x:expr ),* ), followed by => and the block of code associated with this pattern. If the pattern matches, the associated block of code will be emitted. Given that this is the only pattern in this macro, there is only one valid way to match; any other pattern will result in an error. More complex macros will have more than one arm.

Valid pattern syntax in macro definitions is different than the pattern syntax covered in Chapter 18 because macro patterns are matched against Rust code structure rather than values. Let’s walk through what the pattern pieces in Listing 19-28 mean; for the full macro pattern syntax, see the reference.

First, a set of parentheses encompasses the whole pattern. A dollar sign ($) is next, followed by a set of parentheses that captures values that match the pattern within the parentheses for use in the replacement code. Within $() is $x:expr, which matches any Rust expression and gives the expression the name $x.

The comma following $() indicates that a literal comma separator character could optionally appear after the code that matches the code in $(). The * specifies that the pattern matches zero or more of whatever precedes the *.

When we call this macro with vec![1, 2, 3];, the $x pattern matches three times with the three expressions 1, 2, and 3.

Now let’s look at the pattern in the body of the code associated with this arm: temp_vec.push() within $()* is generated for each part that matches $() in the pattern zero or more times depending on how many times the pattern matches. The $x is replaced with each expression matched. When we call this macro with vec![1, 2, 3];, the code generated that replaces this macro call will be the following:

{
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
}

We’ve defined a macro that can take any number of arguments of any type and can generate code to create a vector containing the specified elements.

There are some strange edge cases with macro_rules!. In the future, Rust will have a second kind of declarative macro that will work in a similar fashion but fix some of these edge cases. After that update, macro_rules! will be effectively deprecated. With this in mind, as well as the fact that most Rust programmers will use macros more than write macros, we won’t discuss macro_rules! any further. To learn more about how to write macros, consult the online documentation or other resources, such as “The Little Book of Rust Macros” started by Daniel Keep and continued by Lukas Wirth.

Procedural Macros for Generating Code from Attributes

The second form of macros is procedural macros, which act more like functions (and are a type of procedure). Procedural macros accept some code as an input, operate on that code, and produce some code as an output rather than matching against patterns and replacing the code with other code as declarative macros do.

The three kinds of procedural macros (custom derive, attribute-like, and function-like) all work in a similar fashion.

When creating procedural macros, the definitions must reside in their own crate with a special crate type. This is for complex technical reasons that we hope to eliminate in the future. Defining procedural macros looks like the code in Listing 19-29, where some_attribute is a placeholder for using a specific macro variety.

Filename: src/lib.rs

use proc_macro;

#[some_attribute]
pub fn some_name(input: TokenStream) -> TokenStream {
}

Listing 19-29: An example of defining a procedural macro

The function that defines a procedural macro takes a TokenStream as an input and produces a TokenStream as an output. The TokenStream type is defined by the proc_macro crate that is included with Rust and represents a sequence of tokens. This is the core of the macro: the source code that the macro is operating on makes up the input TokenStream, and the code the macro produces is the output TokenStream. The function also has an attribute attached to it that specifies which kind of procedural macro we’re creating. We can have multiple kinds of procedural macros in the same crate.

Let’s look at the different kinds of procedural macros. We’ll start with a custom derive macro and then explain the small dissimilarities that make the other forms different.

How to Write a Custom derive Macro

Let’s create a crate named hello_macro that defines a trait named HelloMacro with one associated function named hello_macro. Rather than making our crate users implement the HelloMacro trait for each of their types, we’ll provide a procedural macro so users can annotate their type with #[derive(HelloMacro)] to get a default implementation of the hello_macro function. The default implementation will print Hello, Macro! My name is TypeName! where TypeName is the name of the type on which this trait has been defined. In other words, we’ll write a crate that enables another programmer to write code like Listing 19-30 using our crate.

Filename: src/main.rs

use hello_macro::HelloMacro;
use hello_macro_derive::HelloMacro;

#[derive(HelloMacro)]
struct Pancakes;

fn main() {
    Pancakes::hello_macro();
}

Listing 19-30: The code a user of our crate will be able to write when using our procedural macro

This code will print Hello, Macro! My name is Pancakes! when we’re done. The first step is to make a new library crate, like this:

$ cargo new hello_macro --lib

Next, we’ll define the HelloMacro trait and its associated function:

Filename: src/lib.rs

pub trait HelloMacro {
    fn hello_macro();
}

We have a trait and its function. At this point, our crate user could implement the trait to achieve the desired functionality, like so:

use hello_macro::HelloMacro;

struct Pancakes;

impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is Pancakes!");
    }
}

fn main() {
    Pancakes::hello_macro();
}

However, they would need to write the implementation block for each type they wanted to use with hello_macro; we want to spare them from having to do this work.

Additionally, we can’t yet provide the hello_macro function with default implementation that will print the name of the type the trait is implemented on: Rust doesn’t have reflection capabilities, so it can’t look up the type’s name at runtime. We need a macro to generate code at compile time.

The next step is to define the procedural macro. At the time of this writing, procedural macros need to be in their own crate. Eventually, this restriction might be lifted. The convention for structuring crates and macro crates is as follows: for a crate named foo, a custom derive procedural macro crate is called foo_derive. Let’s start a new crate called hello_macro_derive inside our hello_macro project:

$ cargo new hello_macro_derive --lib

Our two crates are tightly related, so we create the procedural macro crate within the directory of our hello_macro crate. If we change the trait definition in hello_macro, we’ll have to change the implementation of the procedural macro in hello_macro_derive as well. The two crates will need to be published separately, and programmers using these crates will need to add both as dependencies and bring them both into scope. We could instead have the hello_macro crate use hello_macro_derive as a dependency and re-export the procedural macro code. However, the way we’ve structured the project makes it possible for programmers to use hello_macro even if they don’t want the derive functionality.

We need to declare the hello_macro_derive crate as a procedural macro crate. We’ll also need functionality from the syn and quote crates, as you’ll see in a moment, so we need to add them as dependencies. Add the following to the Cargo.toml file for hello_macro_derive:

Filename: hello_macro_derive/Cargo.toml

[lib]
proc-macro = true

[dependencies]
syn = "1.0"
quote = "1.0"

To start defining the procedural macro, place the code in Listing 19-31 into your src/lib.rs file for the hello_macro_derive crate. Note that this code won’t compile until we add a definition for the impl_hello_macro function.

Filename: hello_macro_derive/src/lib.rs

use proc_macro::TokenStream;
use quote::quote;
use syn;

#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    // that we can manipulate
    let ast = syn::parse(input).unwrap();

    // Build the trait implementation
    impl_hello_macro(&ast)
}

Listing 19-31: Code that most procedural macro crates will require in order to process Rust code

Notice that we’ve split the code into the hello_macro_derive function, which is responsible for parsing the TokenStream, and the impl_hello_macro function, which is responsible for transforming the syntax tree: this makes writing a procedural macro more convenient. The code in the outer function (hello_macro_derive in this case) will be the same for almost every procedural macro crate you see or create. The code you specify in the body of the inner function (impl_hello_macro in this case) will be different depending on your procedural macro’s purpose.

We’ve introduced three new crates: proc_macro, syn, and quote. The proc_macro crate comes with Rust, so we didn’t need to add that to the dependencies in Cargo.toml. The proc_macro crate is the compiler’s API that allows us to read and manipulate Rust code from our code.

The syn crate parses Rust code from a string into a data structure that we can perform operations on. The quote crate turns syn data structures back into Rust code. These crates make it much simpler to parse any sort of Rust code we might want to handle: writing a full parser for Rust code is no simple task.

The hello_macro_derive function will be called when a user of our library specifies #[derive(HelloMacro)] on a type. This is possible because we’ve annotated the hello_macro_derive function here with proc_macro_derive and specified the name, HelloMacro, which matches our trait name; this is the convention most procedural macros follow.

The hello_macro_derive function first converts the input from a TokenStream to a data structure that we can then interpret and perform operations on. This is where syn comes into play. The parse function in syn takes a TokenStream and returns a DeriveInput struct representing the parsed Rust code. Listing 19-32 shows the relevant parts of the DeriveInput struct we get from parsing the struct Pancakes; string:

DeriveInput {
    // --snip--

    ident: Ident {
        ident: "Pancakes",
        span: #0 bytes(95..103)
    },
    data: Struct(
        DataStruct {
            struct_token: Struct,
            fields: Unit,
            semi_token: Some(
                Semi
            )
        }
    )
}

Listing 19-32: The DeriveInput instance we get when parsing the code that has the macro’s attribute in Listing 19-30

The fields of this struct show that the Rust code we’ve parsed is a unit struct with the ident (identifier, meaning the name) of Pancakes. There are more fields on this struct for describing all sorts of Rust code; check the syn documentation for DeriveInput for more information.

Soon we’ll define the impl_hello_macro function, which is where we’ll build the new Rust code we want to include. But before we do, note that the output for our derive macro is also a TokenStream. The returned TokenStream is added to the code that our crate users write, so when they compile their crate, they’ll get the extra functionality that we provide in the modified TokenStream.

You might have noticed that we’re calling unwrap to cause the hello_macro_derive function to panic if the call to the syn::parse function fails here. It’s necessary for our procedural macro to panic on errors because proc_macro_derive functions must return TokenStream rather than Result to conform to the procedural macro API. We’ve simplified this example by using unwrap; in production code, you should provide more specific error messages about what went wrong by using panic! or expect.

Now that we have the code to turn the annotated Rust code from a TokenStream into a DeriveInput instance, let’s generate the code that implements the HelloMacro trait on the annotated type, as shown in Listing 19-33.

Filename: hello_macro_derive/src/lib.rs

fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream {
    let name = &ast.ident;
    let gen = quote! {
        impl HelloMacro for #name {
            fn hello_macro() {
                println!("Hello, Macro! My name is {}!", stringify!(#name));
            }
        }
    };
    gen.into()
}

Listing 19-33: Implementing the HelloMacro trait using the parsed Rust code

We get an Ident struct instance containing the name (identifier) of the annotated type using ast.ident. The struct in Listing 19-32 shows that when we run the impl_hello_macro function on the code in Listing 19-30, the ident we get will have the ident field with a value of "Pancakes". Thus, the name variable in Listing 19-33 will contain an Ident struct instance that, when printed, will be the string "Pancakes", the name of the struct in Listing 19-30.

The quote! macro lets us define the Rust code that we want to return. The compiler expects something different to the direct result of the quote! macro’s execution, so we need to convert it to a TokenStream. We do this by calling the into method, which consumes this intermediate representation and returns a value of the required TokenStream type.

The quote! macro also provides some very cool templating mechanics: we can enter #name, and quote! will replace it with the value in the variable name. You can even do some repetition similar to the way regular macros work. Check out the quote crate’s docs for a thorough introduction.

We want our procedural macro to generate an implementation of our HelloMacro trait for the type the user annotated, which we can get by using #name. The trait implementation has one function, hello_macro, whose body contains the functionality we want to provide: printing Hello, Macro! My name is and then the name of the annotated type.

The stringify! macro used here is built into Rust. It takes a Rust expression, such as 1 + 2, and at compile time turns the expression into a string literal, such as "1 + 2". This is different than format! or println!, macros which evaluate the expression and then turn the result into a String. There is a possibility that the #name input might be an expression to print literally, so we use stringify!. Using stringify! also saves an allocation by converting #name to a string literal at compile time.

At this point, cargo build should complete successfully in both hello_macro and hello_macro_derive. Let’s hook up these crates to the code in Listing 19-30 to see the procedural macro in action! Create a new binary project in your projects directory using cargo new pancakes. We need to add hello_macro and hello_macro_derive as dependencies in the pancakes crate’s Cargo.toml. If you’re publishing your versions of hello_macro and hello_macro_derive to crates.io, they would be regular dependencies; if not, you can specify them as path dependencies as follows:

hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }

Put the code in Listing 19-30 into src/main.rs, and run cargo run: it should print Hello, Macro! My name is Pancakes! The implementation of the HelloMacro trait from the procedural macro was included without the pancakes crate needing to implement it; the #[derive(HelloMacro)] added the trait implementation.

Next, let’s explore how the other kinds of procedural macros differ from custom derive macros.

Attribute-like macros

Attribute-like macros are similar to custom derive macros, but instead of generating code for the derive attribute, they allow you to create new attributes. They’re also more flexible: derive only works for structs and enums; attributes can be applied to other items as well, such as functions. Here’s an example of using an attribute-like macro: say you have an attribute named route that annotates functions when using a web application framework:

#[route(GET, "/")]
fn index() {

This #[route] attribute would be defined by the framework as a procedural macro. The signature of the macro definition function would look like this:

#[proc_macro_attribute]
pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream {

Here, we have two parameters of type TokenStream. The first is for the contents of the attribute: the GET, "/" part. The second is the body of the item the attribute is attached to: in this case, fn index() {} and the rest of the function’s body.

Other than that, attribute-like macros work the same way as custom derive macros: you create a crate with the proc-macro crate type and implement a function that generates the code you want!

Function-like macros

Function-like macros define macros that look like function calls. Similarly to macro_rules! macros, they’re more flexible than functions; for example, they can take an unknown number of arguments. However, macro_rules! macros can be defined only using the match-like syntax we discussed in the section “Declarative Macros with macro_rules! for General Metaprogramming” earlier. Function-like macros take a TokenStream parameter and their definition manipulates that TokenStream using Rust code as the other two types of procedural macros do. An example of a function-like macro is an sql! macro that might be called like so:

let sql = sql!(SELECT * FROM posts WHERE id=1);

This macro would parse the SQL statement inside it and check that it’s syntactically correct, which is much more complex processing than a macro_rules! macro can do. The sql! macro would be defined like this:

#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {

This definition is similar to the custom derive macro’s signature: we receive the tokens that are inside the parentheses and return the code we wanted to generate.

Summary

Whew! Now you have some Rust features in your toolbox that you won’t use often, but you’ll know they’re available in very particular circumstances. We’ve introduced several complex topics so that when you encounter them in error message suggestions or in other peoples’ code, you’ll be able to recognize these concepts and syntax. Use this chapter as a reference to guide you to solutions.


Macros in Rust: A tutorial with examples

In this tutorial, we’ll cover everything you need to know about Rust macros, including an introduction to macros in Rust and a demonstration of how to use Rust macros with examples.

What are Rust macros?

Rust has excellent support for macros. Macros enable you to write code that writes other code, which is known as metaprogramming.

Macros provide functionality similar to functions but without the runtime cost. There is some compile-time cost, however, since macros are expanded during compile time.

Rust macros are very different from macros in C. Rust macros are applied to the token tree whereas C macros are text substitution.

Types of macros in Rust

Rust has two types of macros:

  1. Declarative macros enable you to write something similar to a match expression that operates on the Rust code you provide as arguments. It uses the code you provide to generate code that replaces the macro invocation
  2. Procedural macros allow you to operate on the abstract syntax tree (AST) of the Rust code it is given. A proc macro is a function from a TokenStream (or two) to another TokenStream, where the output replaces the macro invocation

Let’s zoom in on both declarative and procedural macros and explore some examples of how to use macros in Rust.

Declarative macros in Rust

These macros are declared using macro_rules!. Declarative macros are a bit less powerful but provide an easy to use interface for creating macros to remove duplicate code. One of the common declarative macro is println!. Declarative macros provide a match like an interface where on match the macro is replaced with code inside the matched arm.

Creating declarative macros

// use macro_rules! <name of macro>{<Body>}
macro_rules! add{
 // macth like arm for macro
    ($a:expr,$b:expr)=>{
 // macro expand to this code
        {
// $a and $b will be templated using the value/variable provided to macro
            $a+$b
        }
    }
}

fn main(){
 // call to macro, $a=1 and $b=2
    add!(1,2);
}

This code creates a macro to add two numbers. [macro_rules!] are used with the name of the macro, add, and the body of the macro.

The macro doesn’t add two numbers, it just replaces itself with the code to add two numbers. Each arm of the macro takes an argument for functions and multiple types can be assigned to arguments. If the add function can also take a single argument, we add another arm.

macro_rules! add{
 // first arm match add!(1,2), add!(2,3) etc
    ($a:expr,$b:expr)=>{
        {
            $a+$b
        }
    };
// Second arm macth add!(1), add!(2) etc
    ($a:expr)=>{
        {
            $a
        }
    }
}

fn main(){
// call the macro
    let x=0;
    add!(1,2);
    add!(x);
}

There can be multiple branches in a single macro expanding to different code based on different arguments. Each branch can take multiple arguments, starting with the $ sign and followed by a token type:

  • item — an item, like a function, struct, module, etc.
  • block — a block (i.e. a block of statements and/or an expression, surrounded by braces)
  • stmt — a statement
  • pat — a pattern
  • expr — an expression
  • ty — a type
  • ident — an identifier
  • path — a path (e.g., foo, ::std::mem::replace, transmute::<_, int>, …)
  • meta — a meta item; the things that go inside #[...] and #![...] attributes
  • tt — a single token tree
  • vis — a possibly empty Visibility qualifier

In the example, we use the $typ argument with token type ty as a datatype like u8, u16, etc. This macro converts to a particular type before adding the numbers.

macro_rules! add_as{
// using a ty token type for macthing datatypes passed to maccro
    ($a:expr,$b:expr,$typ:ty)=>{
        $a as $typ + $b as $typ
    }
}

fn main(){
    println!("{}",add_as!(0,2,u8));
}

Rust macros also support taking a nonfixed number of arguments. The operators are very similar to the regular expression. * is used for zero or more token types and + for zero or one argument.

macro_rules! add_as{
    (
  // repeated block
  $($a:expr)
 // seperator
   ,
// zero or more
   *
   )=>{
       { 
   // to handle the case without any arguments
   0
   // block to be repeated
   $(+$a)*
     }
    }
}

fn main(){
    println!("{}",add_as!(1,2,3,4)); // => println!("{}",{0+1+2+3+4})
}

The token type that repeats is enclosed in $(), followed by a separator and a * or a +, indicating the number of times the token will repeat. The separator is used to distinguish the tokens from each other. The $() block followed by * or + is used to indicate the repeating block of code. In the above example, +$a is a repeating code.

If you look closely, you’ll notice an additional zero is added to the code to make the syntax valid. To remove this zero and make the add expression the same as the argument, we need to create a new macro known as TT muncher.

macro_rules! add{
 // first arm in case of single argument and last remaining variable/number
    ($a:expr)=>{
        $a
    };
// second arm in case of two arument are passed and stop recursion in case of odd number ofarguments
    ($a:expr,$b:expr)=>{
        {
            $a+$b
        }
    };
// add the number and the result of remaining arguments 
    ($a:expr,$($b:tt)*)=>{
       {
           $a+add!($($b)*)
       }
    }
}

fn main(){
    println!("{}",add!(1,2,3,4));
}

The TT muncher processes each token separately in a recursive fashion. It’s easier to process a single token at a time. The macro has three arms:

  1. The first arms handle the case if a single argument is passed
  2. The second one handles the case if two arguments are passed
  3. The third arm calls the add macro again with the rest of the arguments

The macro arguments don’t need to be comma-separated. Multiple tokens can be used with different token types. For example, brackets can be used with the ident token type. The Rust compiler takes the matched arm and extracts the variable from the argument string.

macro_rules! ok_or_return{
// match something(q,r,t,6,7,8) etc
// compiler extracts function name and arguments. It injects the values in respective varibles.
    ($a:ident($($b:tt)*))=>{
       {
        match $a($($b)*) {
            Ok(value)=>value,
            Err(err)=>{
                return Err(err);
            }
        }
        }
    };
}

fn some_work(i:i64,j:i64)->Result<(i64,i64),String>{
    if i+j>2 {
        Ok((i,j))
    } else {
        Err("error".to_owned())
    }
}

fn main()->Result<(),String>{
    ok_or_return!(some_work(1,4));
    ok_or_return!(some_work(1,0));
    Ok(())
}

The ok_or_return macro returns the function if an operation returns Err or the value of an operation returns Ok. It takes a function as an argument and executes it inside a match statement. For arguments passed to function, it uses repetition.

Often, few macros need to be grouped into a single macro. In these cases, internal macro rules are used. It helps to manipulate the macro inputs and write clean TT munchers.

To create an internal rule, add the rule name starting with @ as the argument. Now the macro will never match for an internal rule until explicitly specified as an argument.

macro_rules! ok_or_return{
 // internal rule.
    (@error $a:ident,$($b:tt)* )=>{
        {
        match $a($($b)*) {
            Ok(value)=>value,
            Err(err)=>{
                return Err(err);
            }
        }
        }
    };

// public rule can be called by the user.
    ($a:ident($($b:tt)*))=>{
        ok_or_return!(@error $a,$($b)*)
    };
}

fn some_work(i:i64,j:i64)->Result<(i64,i64),String>{
    if i+j>2 {
        Ok((i,j))
    } else {
        Err("error".to_owned())
    }
}

fn main()->Result<(),String>{
   // instead of round bracket curly brackets can also be used
    ok_or_return!{some_work(1,4)};
    ok_or_return!(some_work(1,0));
    Ok(())
}

Advanced parsing in Rust with declarative macros

Macros sometimes perform tasks that require parsing of the Rust language itself.

Do put together all the concepts we’ve covered to this point, let’s create a macro that makes a struct public by suffixing the pub keyword.

First, we need to parse the Rust struct to get the name of the struct, fields of the struct, and field type.

Parsing the name and field of a struct

A struct declaration has a visibility keyword at the start (such as pub), followed by the struct keyword and then the name of the struct and the body of the struct.

Parsing Struct Name Field Diagram

 

macro_rules! make_public{
    (
  // use vis type for visibility keyword and ident for struct name
     $vis:vis struct $struct_name:ident { }
    ) => {
        {
            pub struct $struct_name{ }
        }
    }
}

The $vis will have visibility and $struct_name will have a struct name. To make a struct public, we just need to add the pub keyword and ignore the $vis variable.

Make Struct Public with Keyword

A struct may contain multiple fields with the same or different data types and visibility. The ty token type is used for the data type, vis for visibility, and ident for the field name. We’ll use * repetition for zero or more fields.

 macro_rules! make_public{
    (
     $vis:vis struct $struct_name:ident {
        $(
 // vis for field visibility, ident for field name and ty for field data type
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*
    }
    ) => {
        {
            pub struct $struct_name{
                $(
                pub $field_name : $field_type,
                )*
            }
        }
    }
}

Parsing metadata from the struct

Often the struct has some metadata attached or procedural macros, such as #[derive(Debug)]. This metadata needs to stay intact. Parsing this metadata is done using the meta type.

macro_rules! make_public{
    (
     // meta data about struct
     $(#[$meta:meta])* 
     $vis:vis struct $struct_name:ident {
        $(
        // meta data about field
        $(#[$field_meta:meta])*
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*$(,)+
    }
    ) => {
        { 
            $(#[$meta])*
            pub struct $struct_name{
                $(
                $(#[$field_meta:meta])*
                pub $field_name : $field_type,
                )*
            }
        }
    }
}

Our make_public macro is ready now. To see how make_public works, let’s use Rust Playground to expand the macro to the actual code that is compiled.

macro_rules! make_public{
    (
     $(#[$meta:meta])* 
     $vis:vis struct $struct_name:ident {
        $(
        $(#[$field_meta:meta])*
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*$(,)+
    }
    ) => {

            $(#[$meta])*
            pub struct $struct_name{
                $(
                $(#[$field_meta:meta])*
                pub $field_name : $field_type,
                )*
            }
    }
}

fn main(){
    make_public!{
        #[derive(Debug)]
        struct Name{
            n:i64,
            t:i64,
            g:i64,
        }
    }
}

The expanded code looks like this:

// some imports


macro_rules! make_public {
    ($ (#[$ meta : meta]) * $ vis : vis struct $ struct_name : ident
     {
         $
         ($ (#[$ field_meta : meta]) * $ field_vis : vis $ field_name : ident
          : $ field_type : ty), * $ (,) +
     }) =>
    {

            $ (#[$ meta]) * pub struct $ struct_name
            {
                $
                ($ (#[$ field_meta : meta]) * pub $ field_name : $
                 field_type,) *
            }
    }
}

fn main() {
        pub struct name {
            pub n: i64,
            pub t: i64,
            pub g: i64,
    }
}

Limitations of declarative macros

Declarative macros have a few limitations. Some are related to Rust macros themselves while others are more specific to declarative macros.

  • Lack of support for macros autocompletion and expansion
  • Debugging declarative macros is difficult
  • Limited modification capabilities
  • Larger binaries
  • Longer compile time (this applies to both declarative and procedural macros)

Procedural macros in Rust

Procedural macros are a more advanced version of macros. Procedural macros allow you to expand the existing syntax of Rust. It takes arbitrary input and returns valid Rust code.

Procedural macros are functions that take a TokenStream as input and return another Token Stream. Procedural macros manipulate the input TokenStream to produce an output stream.

There are three types of procedural macros:

  1. Attribute-like macros
  2. Derive macros
  3. Function-like macros

We’ll go into each procedural macro type in detail below.

Attribute-like macros

Attribute-like macros enable you to create a custom attribute that attaches itself to an item and allows manipulation of that item. It can also take arguments.

#[some_attribute_macro(some_argument)]
fn perform_task(){
// some code
}

In the above code, some_attribute_macros is an attribute macro. It manipulates the function perform_task.

To write an attribute-like macro, start by creating a project using cargo new macro-demo --lib. Once the project is ready, update the Cargo.toml to notify cargo the project will create procedural macros.

# Cargo.toml
[lib]
proc-macro = true

Now we are all set to venture into procedural macros.

Procedural macros are public functions that take TokenStream as input and return another TokenStream. To write a procedural macro, we need to write our parser to parse TokenStream. The Rust community has a very good crate, syn, for parsing TokenStream.

synprovides a ready-made parser for Rust syntax that can be used to parse TokenStream. You can also parse your syntax by combining low-level parsers providing syn.

Add syn and quote to Cargo.toml:

# Cargo.toml
[dependencies]
syn = {version="1.0.57",features=["full","fold"]}
quote = "1.0.8"

Now we can write an attribute-like a macro in lib.rs using the proc_macro crate provided by the compiler for writing procedural macros. A procedural macro crate cannot export anything else other than procedural macros and procedural macros defined in the crate can’t be used in the crate itself.

// lib.rs
extern crate proc_macro;
use proc_macro::{TokenStream};
use quote::{quote};

// using proc_macro_attribute to declare an attribute like procedural macro
#[proc_macro_attribute]
// _metadata is argument provided to macro call and _input is code to which attribute like macro attaches
pub fn my_custom_attribute(_metadata: TokenStream, _input: TokenStream) -> TokenStream {
    // returing a simple TokenStream for Struct
    TokenStream::from(quote!{struct H{}})
}

To test the macro we added, create an ingratiation test by creating a folder named tests and adding the file attribute_macro.rs in the folder. In this file, we can use our attribute-like macro for testing.

// tests/attribute_macro.rs

use macro_demo::*;

// macro converts struct S to struct H
#[my_custom_attribute]
struct S{}

#[test]
fn test_macro(){
// due to macro we have struct H in scope
    let demo=H{};
}

Run the above test using the cargo test command.

Now that we understand the basics of procedural macros, lets use syn for some advanced TokenStream manipulation and parsing.

To learn how syn is used for parsing and manipulation, let’s take an example from the syn GitHub repo. This example creates a Rust macro that trace variables when value changes.

First, we need to identify how our macro will manipulate the code it attaches.

#[trace_vars(a)]
fn do_something(){
  let a=9;
  a=6;
  a=0;
}

The trace_vars macro takes the name of the variable it needs to trace and injects a print statement each time the value of the input variable i.e a changes. It tracks the value of input variables.

First, parse the code to which the attribute-like macro attaches. syn provides an inbuilt parser for Rust function syntax. ItemFn will parse the function and throw an error if the syntax is invalid.

#[proc_macro_attribute]
pub fn trace_vars(_metadata: TokenStream, input: TokenStream) -> TokenStream {
// parsing rust function to easy to use struct
    let input_fn = parse_macro_input!(input as ItemFn);
    TokenStream::from(quote!{fn dummy(){}})
}

Now that we have the parsed input, let’s move to metadata. For metadata, no inbuilt parser will work, so we’ll have to write one ourselves using syn‘s parse module.

#[trace_vars(a,c,b)] // we need to parse a "," seperated list of tokens
// code

For syn to work, we need to implement the Parse trait provided by syn. Punctuated is used to create a vector of Indent separated by ,.

struct Args{
    vars:HashSet<Ident>
}

impl Parse for Args{
    fn parse(input: ParseStream) -> Result<Self> {
        // parses a,b,c, or a,b,c where a,b and c are Indent
        let vars = Punctuated::<Ident, Token![,]>::parse_terminated(input)?;
        Ok(Args {
            vars: vars.into_iter().collect(),
        })
    }
}

Once we implement the Parse trait, we can use parse_macro_input macro for parsing metadata.

#[proc_macro_attribute]
pub fn trace_vars(metadata: TokenStream, input: TokenStream) -> TokenStream {
    let input_fn = parse_macro_input!(input as ItemFn);
// using newly created struct Args
    let args= parse_macro_input!(metadata as Args);
    TokenStream::from(quote!{fn dummy(){}})
}

We will now modify the input_fn to add println! when the variable changes the value. To add this, we need to filter outlines that have an assignment and insert a print statement after that line.

impl Args {
    fn should_print_expr(&self, e: &Expr) -> bool {
        match *e {
            Expr::Path(ref e) => {
 // variable shouldn't start wiht ::
                if e.path.leading_colon.is_some() {
                    false
// should be a single variable like `x=8` not n::x=0 
                } else if e.path.segments.len() != 1 {
                    false
                } else {
// get the first part
                    let first = e.path.segments.first().unwrap();
// check if the variable name is in the Args.vars hashset
                    self.vars.contains(&first.ident) && first.arguments.is_empty()
                }
            }
            _ => false,
        }
    }

// used for checking if to print let i=0 etc or not
    fn should_print_pat(&self, p: &Pat) -> bool {
        match p {
// check if variable name is present in set
            Pat::Ident(ref p) => self.vars.contains(&p.ident),
            _ => false,
        }
    }

// manipulate tree to insert print statement
    fn assign_and_print(&mut self, left: Expr, op: &dyn ToTokens, right: Expr) -> Expr {
 // recurive call on right of the assigment statement
        let right = fold::fold_expr(self, right);
// returning manipulated sub-tree
        parse_quote!({
            #left #op #right;
            println!(concat!(stringify!(#left), " = {:?}"), #left);
        })
    }

// manipulating let statement
    fn let_and_print(&mut self, local: Local) -> Stmt {
        let Local { pat, init, .. } = local;
        let init = self.fold_expr(*init.unwrap().1);
// get the variable name of assigned variable
        let ident = match pat {
            Pat::Ident(ref p) => &p.ident,
            _ => unreachable!(),
        };
// new sub tree
        parse_quote! {
            let #pat = {
                #[allow(unused_mut)]
                let #pat = #init;
                println!(concat!(stringify!(#ident), " = {:?}"), #ident);
                #ident
            };
        }
    }
}

In the above example, the quote macro is used for templating and writing Rust. # is used for injecting the value of the variable.

Now we’ll do a DFS over input_fn and insert the print statement. syn provides a Fold trait that can be implemented for DFS over any Item. We just need to modify the trait methods that correspond with the token type we want to manipulate.

impl Fold for Args {
    fn fold_expr(&mut self, e: Expr) -> Expr {
        match e {
// for changing assignment like a=5
            Expr::Assign(e) => {
// check should print
                if self.should_print_expr(&e.left) {
                    self.assign_and_print(*e.left, &e.eq_token, *e.right)
                } else {
// continue with default travesal using default methods
                    Expr::Assign(fold::fold_expr_assign(self, e))
                }
            }
// for changing assigment and operation like a+=1
            Expr::AssignOp(e) => {
// check should print
                if self.should_print_expr(&e.left) {
                    self.assign_and_print(*e.left, &e.op, *e.right)
                } else {
// continue with default behaviour
                    Expr::AssignOp(fold::fold_expr_assign_op(self, e))
                }
            }
// continue with default behaviour for rest of expressions
            _ => fold::fold_expr(self, e),
        }
    }

// for let statements like let d=9
    fn fold_stmt(&mut self, s: Stmt) -> Stmt {
        match s {
            Stmt::Local(s) => {
                if s.init.is_some() && self.should_print_pat(&s.pat) {
                    self.let_and_print(s)
                } else {
                    Stmt::Local(fold::fold_local(self, s))
                }
            }
            _ => fold::fold_stmt(self, s),
        }
    }
}

The Fold trait is used to do a DFS of Item. It enables you to use different behavior for various token types.

Now we can use fold_item_fn to inject print statements in our parsed code.

#[proc_macro_attribute]
pub fn trace_var(args: TokenStream, input: TokenStream) -> TokenStream {
// parse the input
    let input = parse_macro_input!(input as ItemFn);
// parse the arguments
    let mut args = parse_macro_input!(args as Args);
// create the ouput
    let output = args.fold_item_fn(input);
// return the TokenStream
    TokenStream::from(quote!(#output))
}

This code example is from the syn examples repo, which is an excellent resource to learn about procedural macros.

Custom derive macros

Custom derive macros in Rust allow auto implement traits. These macros enable you to implement traits using #[derive(Trait)].

syn has excellent support for derive macros.

#[derive(Trait)]
struct MyStruct{}

To write a custom derive macro in Rust, we can use DeriveInput for parsing input to derive macro. We’ll also use the proc_macro_derive macro to define a custom derive macro.

#[proc_macro_derive(Trait)]
pub fn derive_trait(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let input = parse_macro_input!(input as DeriveInput);

    let name = input.ident;

    let expanded = quote! {
        impl Trait for #name {
            fn print(&self) -> usize {
                println!("{}","hello from #name")
           }
        }
    };

    proc_macro::TokenStream::from(expanded)
}

More advanced procedural macros can be written using syn. Check out this example from syn‘s repo.

Function-like macros

Function-like macros are similar to declarative macros in that they’re invoked with the macro invocation operator ! and look like function calls. They operate on the code that is inside the parentheses.

Here’s how to write a function-like macro in Rust:

#[proc_macro]
pub fn a_proc_macro(_input: TokenStream) -> TokenStream {
    TokenStream::from(quote!(
            fn anwser()->i32{
                5
            }
))
}

Function-like macros are executed not at runtime but at compile time. They can be used anywhere in Rust code. Function-like macros also take a TokenStream and return a TokenStream.

Advantages of using procedural macros include:

  • Better error handling using span
  • Better control over output
  • Community-built crates syn and quote
  • More powerful than declarative macros

Conclusion

In this Rust macros tutorial, we covered the basics of macros in Rust, defined declarative and procedural macros, and walked through how to write both types of macros using various syntax and community-built crates. We also outlined the advantages of using each type of Rust macro.

#rust #programming 

Rust  Language

Rust Language

1652510548

The Rust Programming Language - Macros

Rust Macros

We’ve used macros like println! throughout this book, but we haven’t fully explored what a macro is and how it works. The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:

  • Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
  • Attribute-like macros that define custom attributes usable on any item
  • Function-like macros that look like function calls but operate on the tokens specified as their argument

We’ll talk about each of these in turn, but first, let’s look at why we even need macros when we already have functions.

The Difference Between Macros and Functions

Fundamentally, macros are a way of writing code that writes other code, which is known as metaprogramming. In Appendix C, we discuss the derive attribute, which generates an implementation of various traits for you. We’ve also used the println! and vec! macros throughout the book. All of these macros expand to produce more code than the code you’ve written manually.

Metaprogramming is useful for reducing the amount of code you have to write and maintain, which is also one of the roles of functions. However, macros have some additional powers that functions don’t.

A function signature must declare the number and type of parameters the function has. Macros, on the other hand, can take a variable number of parameters: we can call println!("hello") with one argument or println!("hello {}", name) with two arguments. Also, macros are expanded before the compiler interprets the meaning of the code, so a macro can, for example, implement a trait on a given type. A function can’t, because it gets called at runtime and a trait needs to be implemented at compile time.

The downside to implementing a macro instead of a function is that macro definitions are more complex than function definitions because you’re writing Rust code that writes Rust code. Due to this indirection, macro definitions are generally more difficult to read, understand, and maintain than function definitions.

Another important difference between macros and functions is that you must define macros or bring them into scope before you call them in a file, as opposed to functions you can define anywhere and call anywhere.

Declarative Macros with macro_rules! for General Metaprogramming

The most widely used form of macros in Rust is declarative macros. These are also sometimes referred to as “macros by example,” “macro_rules! macros,” or just plain “macros.” At their core, declarative macros allow you to write something similar to a Rust match expression. As discussed in Chapter 6, match expressions are control structures that take an expression, compare the resulting value of the expression to patterns, and then run the code associated with the matching pattern. Macros also compare a value to patterns that are associated with particular code: in this situation, the value is the literal Rust source code passed to the macro; the patterns are compared with the structure of that source code; and the code associated with each pattern, when matched, replaces the code passed to the macro. This all happens during compilation.

To define a macro, you use the macro_rules! construct. Let’s explore how to use macro_rules! by looking at how the vec! macro is defined. Chapter 8 covered how we can use the vec! macro to create a new vector with particular values. For example, the following macro creates a new vector containing three integers:


#![allow(unused)]
fn main() {
let v: Vec<u32> = vec![1, 2, 3];
}

We could also use the vec! macro to make a vector of two integers or a vector of five string slices. We wouldn’t be able to use a function to do the same because we wouldn’t know the number or type of values up front.

Listing 19-28 shows a slightly simplified definition of the vec! macro.

Filename: src/lib.rs

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

Listing 19-28: A simplified version of the vec! macro definition

Note: The actual definition of the vec! macro in the standard library includes code to preallocate the correct amount of memory up front. That code is an optimization that we don’t include here to make the example simpler.

The #[macro_export] annotation indicates that this macro should be made available whenever the crate in which the macro is defined is brought into scope. Without this annotation, the macro can’t be brought into scope.

We then start the macro definition with macro_rules! and the name of the macro we’re defining without the exclamation mark. The name, in this case vec, is followed by curly brackets denoting the body of the macro definition.

The structure in the vec! body is similar to the structure of a match expression. Here we have one arm with the pattern ( $( $x:expr ),* ), followed by => and the block of code associated with this pattern. If the pattern matches, the associated block of code will be emitted. Given that this is the only pattern in this macro, there is only one valid way to match; any other pattern will result in an error. More complex macros will have more than one arm.

Valid pattern syntax in macro definitions is different than the pattern syntax covered in Chapter 18 because macro patterns are matched against Rust code structure rather than values. Let’s walk through what the pattern pieces in Listing 19-28 mean; for the full macro pattern syntax, see the reference.

First, a set of parentheses encompasses the whole pattern. A dollar sign ($) is next, followed by a set of parentheses that captures values that match the pattern within the parentheses for use in the replacement code. Within $() is $x:expr, which matches any Rust expression and gives the expression the name $x.

The comma following $() indicates that a literal comma separator character could optionally appear after the code that matches the code in $(). The * specifies that the pattern matches zero or more of whatever precedes the *.

When we call this macro with vec![1, 2, 3];, the $x pattern matches three times with the three expressions 1, 2, and 3.

Now let’s look at the pattern in the body of the code associated with this arm: temp_vec.push() within $()* is generated for each part that matches $() in the pattern zero or more times depending on how many times the pattern matches. The $x is replaced with each expression matched. When we call this macro with vec![1, 2, 3];, the code generated that replaces this macro call will be the following:

{
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
}

We’ve defined a macro that can take any number of arguments of any type and can generate code to create a vector containing the specified elements.

There are some strange edge cases with macro_rules!. In the future, Rust will have a second kind of declarative macro that will work in a similar fashion but fix some of these edge cases. After that update, macro_rules! will be effectively deprecated. With this in mind, as well as the fact that most Rust programmers will use macros more than write macros, we won’t discuss macro_rules! any further. To learn more about how to write macros, consult the online documentation or other resources, such as “The Little Book of Rust Macros” started by Daniel Keep and continued by Lukas Wirth.

Procedural Macros for Generating Code from Attributes

The second form of macros is procedural macros, which act more like functions (and are a type of procedure). Procedural macros accept some code as an input, operate on that code, and produce some code as an output rather than matching against patterns and replacing the code with other code as declarative macros do.

The three kinds of procedural macros (custom derive, attribute-like, and function-like) all work in a similar fashion.

When creating procedural macros, the definitions must reside in their own crate with a special crate type. This is for complex technical reasons that we hope to eliminate in the future. Using procedural macros looks like the code in Listing 19-29, where some_attribute is a placeholder for using a specific macro.

Filename: src/lib.rs

use proc_macro;

#[some_attribute]
pub fn some_name(input: TokenStream) -> TokenStream {
}

 

Listing 19-29: An example of using a procedural macro

The function that defines a procedural macro takes a TokenStream as an input and produces a TokenStream as an output. The TokenStream type is defined by the proc_macro crate that is included with Rust and represents a sequence of tokens. This is the core of the macro: the source code that the macro is operating on makes up the input TokenStream, and the code the macro produces is the output TokenStream. The function also has an attribute attached to it that specifies which kind of procedural macro we’re creating. We can have multiple kinds of procedural macros in the same crate.

Let’s look at the different kinds of procedural macros. We’ll start with a custom derive macro and then explain the small dissimilarities that make the other forms different.

How to Write a Custom derive Macro

Let’s create a crate named hello_macro that defines a trait named HelloMacro with one associated function named hello_macro. Rather than making our crate users implement the HelloMacro trait for each of their types, we’ll provide a procedural macro so users can annotate their type with #[derive(HelloMacro)] to get a default implementation of the hello_macro function. The default implementation will print Hello, Macro! My name is TypeName! where TypeName is the name of the type on which this trait has been defined. In other words, we’ll write a crate that enables another programmer to write code like Listing 19-30 using our crate.

Filename: src/main.rs

use hello_macro::HelloMacro;
use hello_macro_derive::HelloMacro;

#[derive(HelloMacro)]
struct Pancakes;

fn main() {
    Pancakes::hello_macro();
}

Listing 19-30: The code a user of our crate will be able to write when using our procedural macro

This code will print Hello, Macro! My name is Pancakes! when we’re done. The first step is to make a new library crate, like this:

$ cargo new hello_macro --lib

Next, we’ll define the HelloMacro trait and its associated function:

Filename: src/lib.rs

pub trait HelloMacro {
    fn hello_macro();
}

We have a trait and its function. At this point, our crate user could implement the trait to achieve the desired functionality, like so:

use hello_macro::HelloMacro;

struct Pancakes;

impl HelloMacro for Pancakes {
    fn hello_macro() {
        println!("Hello, Macro! My name is Pancakes!");
    }
}

fn main() {
    Pancakes::hello_macro();
}

However, they would need to write the implementation block for each type they wanted to use with hello_macro; we want to spare them from having to do this work.

Additionally, we can’t yet provide the hello_macro function with default implementation that will print the name of the type the trait is implemented on: Rust doesn’t have reflection capabilities, so it can’t look up the type’s name at runtime. We need a macro to generate code at compile time.

The next step is to define the procedural macro. At the time of this writing, procedural macros need to be in their own crate. Eventually, this restriction might be lifted. The convention for structuring crates and macro crates is as follows: for a crate named foo, a custom derive procedural macro crate is called foo_derive. Let’s start a new crate called hello_macro_derive inside our hello_macro project:

$ cargo new hello_macro_derive --lib

Our two crates are tightly related, so we create the procedural macro crate within the directory of our hello_macro crate. If we change the trait definition in hello_macro, we’ll have to change the implementation of the procedural macro in hello_macro_derive as well. The two crates will need to be published separately, and programmers using these crates will need to add both as dependencies and bring them both into scope. We could instead have the hello_macro crate use hello_macro_derive as a dependency and re-export the procedural macro code. However, the way we’ve structured the project makes it possible for programmers to use hello_macro even if they don’t want the derive functionality.

We need to declare the hello_macro_derive crate as a procedural macro crate. We’ll also need functionality from the syn and quote crates, as you’ll see in a moment, so we need to add them as dependencies. Add the following to the Cargo.toml file for hello_macro_derive:

Filename: hello_macro_derive/Cargo.toml

[lib]
proc-macro = true

[dependencies]
syn = "1.0"
quote = "1.0"

To start defining the procedural macro, place the code in Listing 19-31 into your src/lib.rs file for the hello_macro_derive crate. Note that this code won’t compile until we add a definition for the impl_hello_macro function.

Filename: hello_macro_derive/src/lib.rs

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn;

#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    // that we can manipulate
    let ast = syn::parse(input).unwrap();

    // Build the trait implementation
    impl_hello_macro(&ast)
}

Listing 19-31: Code that most procedural macro crates will require in order to process Rust code

Notice that we’ve split the code into the hello_macro_derive function, which is responsible for parsing the TokenStream, and the impl_hello_macro function, which is responsible for transforming the syntax tree: this makes writing a procedural macro more convenient. The code in the outer function (hello_macro_derive in this case) will be the same for almost every procedural macro crate you see or create. The code you specify in the body of the inner function (impl_hello_macro in this case) will be different depending on your procedural macro’s purpose.

We’ve introduced three new crates: proc_macro, syn, and quote. The proc_macro crate comes with Rust, so we didn’t need to add that to the dependencies in Cargo.toml. The proc_macro crate is the compiler’s API that allows us to read and manipulate Rust code from our code.

The syn crate parses Rust code from a string into a data structure that we can perform operations on. The quote crate turns syn data structures back into Rust code. These crates make it much simpler to parse any sort of Rust code we might want to handle: writing a full parser for Rust code is no simple task.

The hello_macro_derive function will be called when a user of our library specifies #[derive(HelloMacro)] on a type. This is possible because we’ve annotated the hello_macro_derive function here with proc_macro_derive and specified the name, HelloMacro, which matches our trait name; this is the convention most procedural macros follow.

The hello_macro_derive function first converts the input from a TokenStream to a data structure that we can then interpret and perform operations on. This is where syn comes into play. The parse function in syn takes a TokenStream and returns a DeriveInput struct representing the parsed Rust code. Listing 19-32 shows the relevant parts of the DeriveInput struct we get from parsing the struct Pancakes; string:

DeriveInput {
    // --snip--

    ident: Ident {
        ident: "Pancakes",
        span: #0 bytes(95..103)
    },
    data: Struct(
        DataStruct {
            struct_token: Struct,
            fields: Unit,
            semi_token: Some(
                Semi
            )
        }
    )
}

Listing 19-32: The DeriveInput instance we get when parsing the code that has the macro’s attribute in Listing 19-30

The fields of this struct show that the Rust code we’ve parsed is a unit struct with the ident (identifier, meaning the name) of Pancakes. There are more fields on this struct for describing all sorts of Rust code; check the syn documentation for DeriveInput for more information.

Soon we’ll define the impl_hello_macro function, which is where we’ll build the new Rust code we want to include. But before we do, note that the output for our derive macro is also a TokenStream. The returned TokenStream is added to the code that our crate users write, so when they compile their crate, they’ll get the extra functionality that we provide in the modified TokenStream.

You might have noticed that we’re calling unwrap to cause the hello_macro_derive function to panic if the call to the syn::parse function fails here. It’s necessary for our procedural macro to panic on errors because proc_macro_derive functions must return TokenStream rather than Result to conform to the procedural macro API. We’ve simplified this example by using unwrap; in production code, you should provide more specific error messages about what went wrong by using panic! or expect.

Now that we have the code to turn the annotated Rust code from a TokenStream into a DeriveInput instance, let’s generate the code that implements the HelloMacro trait on the annotated type, as shown in Listing 19-33.

Filename: hello_macro_derive/src/lib.rs

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn;

#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    // that we can manipulate
    let ast = syn::parse(input).unwrap();

    // Build the trait implementation
    impl_hello_macro(&ast)
}

fn impl_hello_macro(ast: &syn::DeriveInput) -> TokenStream {
    let name = &ast.ident;
    let gen = quote! {
        impl HelloMacro for #name {
            fn hello_macro() {
                println!("Hello, Macro! My name is {}!", stringify!(#name));
            }
        }
    };
    gen.into()
}

Listing 19-33: Implementing the HelloMacro trait using the parsed Rust code

We get an Ident struct instance containing the name (identifier) of the annotated type using ast.ident. The struct in Listing 19-32 shows that when we run the impl_hello_macro function on the code in Listing 19-30, the ident we get will have the ident field with a value of "Pancakes". Thus, the name variable in Listing 19-33 will contain an Ident struct instance that, when printed, will be the string "Pancakes", the name of the struct in Listing 19-30.

The quote! macro lets us define the Rust code that we want to return. The compiler expects something different to the direct result of the quote! macro’s execution, so we need to convert it to a TokenStream. We do this by calling the into method, which consumes this intermediate representation and returns a value of the required TokenStream type.

The quote! macro also provides some very cool templating mechanics: we can enter #name, and quote! will replace it with the value in the variable name. You can even do some repetition similar to the way regular macros work. Check out the quote crate’s docs for a thorough introduction.

We want our procedural macro to generate an implementation of our HelloMacro trait for the type the user annotated, which we can get by using #name. The trait implementation has one function, hello_macro, whose body contains the functionality we want to provide: printing Hello, Macro! My name is and then the name of the annotated type.

The stringify! macro used here is built into Rust. It takes a Rust expression, such as 1 + 2, and at compile time turns the expression into a string literal, such as "1 + 2". This is different than format! or println!, macros which evaluate the expression and then turn the result into a String. There is a possibility that the #name input might be an expression to print literally, so we use stringify!. Using stringify! also saves an allocation by converting #name to a string literal at compile time.

At this point, cargo build should complete successfully in both hello_macro and hello_macro_derive. Let’s hook up these crates to the code in Listing 19-30 to see the procedural macro in action! Create a new binary project in your projects directory using cargo new pancakes. We need to add hello_macro and hello_macro_derive as dependencies in the pancakes crate’s Cargo.toml. If you’re publishing your versions of hello_macro and hello_macro_derive to crates.io, they would be regular dependencies; if not, you can specify them as path dependencies as follows:

hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }

Put the code in Listing 19-30 into src/main.rs, and run cargo run: it should print Hello, Macro! My name is Pancakes! The implementation of the HelloMacro trait from the procedural macro was included without the pancakes crate needing to implement it; the #[derive(HelloMacro)] added the trait implementation.

Next, let’s explore how the other kinds of procedural macros differ from custom derive macros.

Attribute-like macros

Attribute-like macros are similar to custom derive macros, but instead of generating code for the derive attribute, they allow you to create new attributes. They’re also more flexible: derive only works for structs and enums; attributes can be applied to other items as well, such as functions. Here’s an example of using an attribute-like macro: say you have an attribute named route that annotates functions when using a web application framework:

#[route(GET, "/")]
fn index() {

This #[route] attribute would be defined by the framework as a procedural macro. The signature of the macro definition function would look like this:

#[proc_macro_attribute]
pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream {

Here, we have two parameters of type TokenStream. The first is for the contents of the attribute: the GET, "/" part. The second is the body of the item the attribute is attached to: in this case, fn index() {} and the rest of the function’s body.

Other than that, attribute-like macros work the same way as custom derive macros: you create a crate with the proc-macro crate type and implement a function that generates the code you want!

Function-like macros

Function-like macros define macros that look like function calls. Similarly to macro_rules! macros, they’re more flexible than functions; for example, they can take an unknown number of arguments. However, macro_rules! macros can be defined only using the match-like syntax we discussed in the section “Declarative Macros with macro_rules! for General Metaprogramming” earlier. Function-like macros take a TokenStream parameter and their definition manipulates that TokenStream using Rust code as the other two types of procedural macros do. An example of a function-like macro is an sql! macro that might be called like so:

let sql = sql!(SELECT * FROM posts WHERE id=1);

This macro would parse the SQL statement inside it and check that it’s syntactically correct, which is much more complex processing than a macro_rules! macro can do. The sql! macro would be defined like this:

#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {

Summary

Whew! Now you have some Rust features in your toolbox that you won’t use often, but you’ll know they’re available in very particular circumstances. We’ve introduced several complex topics so that when you encounter them in error message suggestions or in other peoples’ code, you’ll be able to recognize these concepts and syntax. Use this chapter as a reference to guide you to solutions.


Macros By Example

Syntax
MacroRulesDefinition :
   macro_rules ! IDENTIFIER MacroRulesDef

MacroRulesDef :
      ( MacroRules ) ;
   | [ MacroRules ] ;
   | { MacroRules }

MacroRules :
   MacroRule ( ; MacroRule )* ;?

MacroRule :
   MacroMatcher => MacroTranscriber

MacroMatcher :
      ( MacroMatch* )
   | [ MacroMatch* ]
   | { MacroMatch* }

MacroMatch :
      Tokenexcept $ and delimiters
   | MacroMatcher
   | $ ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ ) : MacroFragSpec
   | $ ( MacroMatch+ ) MacroRepSep? MacroRepOp

MacroFragSpec :
      block | expr | ident | item | lifetime | literal
   | meta | pat | pat_param | path | stmt | tt | ty | vis

MacroRepSep :
   Tokenexcept delimiters and MacroRepOp

MacroRepOp :
   * | + | ?

MacroTranscriber :
   DelimTokenTree

macro_rules allows users to define syntax extension in a declarative way. We call such extensions "macros by example" or simply "macros".

Each macro by example has a name, and one or more rules. Each rule has two parts: a matcher, describing the syntax that it matches, and a transcriber, describing the syntax that will replace a successfully matched invocation. Both the matcher and the transcriber must be surrounded by delimiters. Macros can expand to expressions, statements, items (including traits, impls, and foreign items), types, or patterns.

Transcribing

When a macro is invoked, the macro expander looks up macro invocations by name, and tries each macro rule in turn. It transcribes the first successful match; if this results in an error, then future matches are not tried. When matching, no lookahead is performed; if the compiler cannot unambiguously determine how to parse the macro invocation one token at a time, then it is an error. In the following example, the compiler does not look ahead past the identifier to see if the following token is a ), even though that would allow it to parse the invocation unambiguously:


#![allow(unused)]
fn main() {
macro_rules! ambiguity {
    ($($i:ident)* $j:ident) => { };
}

ambiguity!(error); // Error: local ambiguity
}

In both the matcher and the transcriber, the $ token is used to invoke special behaviours from the macro engine (described below in Metavariables and Repetitions). Tokens that aren't part of such an invocation are matched and transcribed literally, with one exception. The exception is that the outer delimiters for the matcher will match any pair of delimiters. Thus, for instance, the matcher (()) will match {()} but not {{}}. The character $ cannot be matched or transcribed literally.

When forwarding a matched fragment to another macro-by-example, matchers in the second macro will see an opaque AST of the fragment type. The second macro can't use literal tokens to match the fragments in the matcher, only a fragment specifier of the same type. The ident, lifetime, and tt fragment types are an exception, and can be matched by literal tokens. The following illustrates this restriction:


#![allow(unused)]
fn main() {
macro_rules! foo {
    ($l:expr) => { bar!($l); }
// ERROR:               ^^ no rules expected this token in macro call
}

macro_rules! bar {
    (3) => {}
}

foo!(3);
}

The following illustrates how tokens can be directly matched after matching a tt fragment:


#![allow(unused)]
fn main() {
// compiles OK
macro_rules! foo {
    ($l:tt) => { bar!($l); }
}

macro_rules! bar {
    (3) => {}
}

foo!(3);
}

Metavariables

In the matcher, $ name : fragment-specifier matches a Rust syntax fragment of the kind specified and binds it to the metavariable $name. Valid fragment specifiers are:

In the transcriber, metavariables are referred to simply by $name, since the fragment kind is specified in the matcher. Metavariables are replaced with the syntax element that matched them. The keyword metavariable $crate can be used to refer to the current crate; see Hygiene below. Metavariables can be transcribed more than once or not at all.

For reasons of backwards compatibility, though _ is also an expression, a standalone underscore is not matched by the expr fragment specifier. However, _ is matched by the expr fragment specifier when it appears as a subexpression.

Repetitions

In both the matcher and transcriber, repetitions are indicated by placing the tokens to be repeated inside $(), followed by a repetition operator, optionally with a separator token between. The separator token can be any token other than a delimiter or one of the repetition operators, but ; and , are the most common. For instance, $( $i:ident ),* represents any number of identifiers separated by commas. Nested repetitions are permitted.

The repetition operators are:

  • * — indicates any number of repetitions.
  • + — indicates any number but at least one.
  • ? — indicates an optional fragment with zero or one occurrences.

Since ? represents at most one occurrence, it cannot be used with a separator.

The repeated fragment both matches and transcribes to the specified number of the fragment, separated by the separator token. Metavariables are matched to every repetition of their corresponding fragment. For instance, the $( $i:ident ),* example above matches $i to all of the identifiers in the list.

During transcription, additional restrictions apply to repetitions so that the compiler knows how to expand them properly:

  1. A metavariable must appear in exactly the same number, kind, and nesting order of repetitions in the transcriber as it did in the matcher. So for the matcher $( $i:ident ),*, the transcribers => { $i }, => { $( $( $i)* )* }, and => { $( $i )+ } are all illegal, but => { $( $i );* } is correct and replaces a comma-separated list of identifiers with a semicolon-separated list.
  2. Each repetition in the transcriber must contain at least one metavariable to decide how many times to expand it. If multiple metavariables appear in the same repetition, they must be bound to the same number of fragments. For instance, ( $( $i:ident ),* ; $( $j:ident ),* ) => (( $( ($i,$j) ),* )) must bind the same number of $i fragments as $j fragments. This means that invoking the macro with (a, b, c; d, e, f) is legal and expands to ((a,d), (b,e), (c,f)), but (a, b, c; d, e) is illegal because it does not have the same number. This requirement applies to every layer of nested repetitions.

Scoping, Exporting, and Importing

For historical reasons, the scoping of macros by example does not work entirely like items. Macros have two forms of scope: textual scope, and path-based scope. Textual scope is based on the order that things appear in source files, or even across multiple files, and is the default scoping. It is explained further below. Path-based scope works exactly the same way that item scoping does. The scoping, exporting, and importing of macros is controlled largely by attributes.

When a macro is invoked by an unqualified identifier (not part of a multi-part path), it is first looked up in textual scoping. If this does not yield any results, then it is looked up in path-based scoping. If the macro's name is qualified with a path, then it is only looked up in path-based scoping.

use lazy_static::lazy_static; // Path-based import.

macro_rules! lazy_static { // Textual definition.
    (lazy) => {};
}

lazy_static!{lazy} // Textual lookup finds our macro first.
self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one.

Textual Scope

Textual scope is based largely on the order that things appear in source files, and works similarly to the scope of local variables declared with let except it also applies at the module level. When macro_rules! is used to define a macro, the macro enters the scope after the definition (note that it can still be used recursively, since names are looked up from the invocation site), up until its surrounding scope, typically a module, is closed. This can enter child modules and even span across multiple files:

//// src/lib.rs
mod has_macro {
    // m!{} // Error: m is not in scope.

    macro_rules! m {
        () => {};
    }
    m!{} // OK: appears after declaration of m.

    mod uses_macro;
}

// m!{} // Error: m is not in scope.

//// src/has_macro/uses_macro.rs

m!{} // OK: appears after declaration of m in src/lib.rs

It is not an error to define a macro multiple times; the most recent declaration will shadow the previous one unless it has gone out of scope.


#![allow(unused)]
fn main() {
macro_rules! m {
    (1) => {};
}

m!(1);

mod inner {
    m!(1);

    macro_rules! m {
        (2) => {};
    }
    // m!(1); // Error: no rule matches '1'
    m!(2);

    macro_rules! m {
        (3) => {};
    }
    m!(3);
}

m!(1);
}

Macros can be declared and used locally inside functions as well, and work similarly:


#![allow(unused)]
fn main() {
fn foo() {
    // m!(); // Error: m is not in scope.
    macro_rules! m {
        () => {};
    }
    m!();
}


// m!(); // Error: m is not in scope.
}

The macro_use attribute

The macro_use attribute has two purposes. First, it can be used to make a module's macro scope not end when the module is closed, by applying it to a module:


#![allow(unused)]
fn main() {
#[macro_use]
mod inner {
    macro_rules! m {
        () => {};
    }
}

m!();
}

Second, it can be used to import macros from another crate, by attaching it to an extern crate declaration appearing in the crate's root module. Macros imported this way are imported into the macro_use prelude, not textually, which means that they can be shadowed by any other name. While macros imported by #[macro_use] can be used before the import statement, in case of a conflict, the last macro imported wins. Optionally, a list of macros to import can be specified using the MetaListIdents syntax; this is not supported when #[macro_use] is applied to a module.

#[macro_use(lazy_static)] // Or #[macro_use] to import all macros.
extern crate lazy_static;

lazy_static!{}
// self::lazy_static!{} // Error: lazy_static is not defined in `self`

Macros to be imported with #[macro_use] must be exported with #[macro_export], which is described below.

Path-Based Scope

By default, a macro has no path-based scope. However, if it has the #[macro_export] attribute, then it is declared in the crate root scope and can be referred to normally as such:


#![allow(unused)]
fn main() {
self::m!();
m!(); // OK: Path-based lookup finds m in the current module.

mod inner {
    super::m!();
    crate::m!();
}

mod mac {
    #[macro_export]
    macro_rules! m {
        () => {};
    }
}
}

Macros labeled with #[macro_export] are always pub and can be referred to by other crates, either by path or by #[macro_use] as described above.

Hygiene

By default, all identifiers referred to in a macro are expanded as-is, and are looked up at the macro's invocation site. This can lead to issues if a macro refers to an item or macro which isn't in scope at the invocation site. To alleviate this, the $crate metavariable can be used at the start of a path to force lookup to occur inside the crate defining the macro.

//// Definitions in the `helper_macro` crate.
#[macro_export]
macro_rules! helped {
    // () => { helper!() } // This might lead to an error due to 'helper' not being in scope.
    () => { $crate::helper!() }
}

#[macro_export]
macro_rules! helper {
    () => { () }
}

//// Usage in another crate.
// Note that `helper_macro::helper` is not imported!
use helper_macro::helped;

fn unit() {
    helped!();
}

Note that, because $crate refers to the current crate, it must be used with a fully qualified module path when referring to non-macro items:


#![allow(unused)]
fn main() {
pub mod inner {
    #[macro_export]
    macro_rules! call_foo {
        () => { $crate::inner::foo() };
    }

    pub fn foo() {}
}
}

Additionally, even though $crate allows a macro to refer to items within its own crate when expanding, its use has no effect on visibility. An item or macro referred to must still be visible from the invocation site. In the following example, any attempt to invoke call_foo!() from outside its crate will fail because foo() is not public.


#![allow(unused)]
fn main() {
#[macro_export]
macro_rules! call_foo {
    () => { $crate::foo() };
}

fn foo() {}
}

Version & Edition Differences: Prior to Rust 1.30, $crate and local_inner_macros (below) were unsupported. They were added alongside path-based imports of macros (described above), to ensure that helper macros did not need to be manually imported by users of a macro-exporting crate. Crates written for earlier versions of Rust that use helper macros need to be modified to use $crate or local_inner_macros to work well with path-based imports.

When a macro is exported, the #[macro_export] attribute can have the local_inner_macros keyword added to automatically prefix all contained macro invocations with $crate::. This is intended primarily as a tool to migrate code written before $crate was added to the language to work with Rust 2018's path-based imports of macros. Its use is discouraged in new code.


#![allow(unused)]
fn main() {
#[macro_export(local_inner_macros)]
macro_rules! helped {
    () => { helper!() } // Automatically converted to $crate::helper!().
}

#[macro_export]
macro_rules! helper {
    () => { () }
}
}

Follow-set Ambiguity Restrictions

The parser used by the macro system is reasonably powerful, but it is limited in order to prevent ambiguity in current or future versions of the language. In particular, in addition to the rule about ambiguous expansions, a nonterminal matched by a metavariable must be followed by a token which has been decided can be safely used after that kind of match.

As an example, a macro matcher like $i:expr [ , ] could in theory be accepted in Rust today, since [,] cannot be part of a legal expression and therefore the parse would always be unambiguous. However, because [ can start trailing expressions, [ is not a character which can safely be ruled out as coming after an expression. If [,] were accepted in a later version of Rust, this matcher would become ambiguous or would misparse, breaking working code. Matchers like $i:expr, or $i:expr; would be legal, however, because , and ; are legal expression separators. The specific rules are:

  • expr and stmt may only be followed by one of: =>, ,, or ;.
  • pat_param may only be followed by one of: =>, ,, =, |, if, or in.
  • pat may only be followed by one of: =>, ,, =, if, or in.
  • path and ty may only be followed by one of: =>, ,, =, |, ;, :, >, >>, [, {, as, where, or a macro variable of block fragment specifier.
  • vis may only be followed by one of: ,, an identifier other than a non-raw priv, any token that can begin a type, or a metavariable with a ident, ty, or path fragment specifier.
  • All other fragment specifiers have no restrictions.

Edition Differences: Before the 2021 edition, pat may also be followed by |.

When repetitions are involved, then the rules apply to every possible number of expansions, taking separators into account. This means:

  • If the repetition includes a separator, that separator must be able to follow the contents of the repetition.
  • If the repetition can repeat multiple times (* or +), then the contents must be able to follow themselves.
  • The contents of the repetition must be able to follow whatever comes before, and whatever comes after must be able to follow the contents of the repetition.
  • If the repetition can match zero times (* or ?), then whatever comes after must be able to follow whatever comes before.


Macros in Rust: A tutorial with examples

In this tutorial, we’ll cover everything you need to know about Rust macros, including an introduction to macros in Rust and a demonstration of how to use Rust macros with examples.

What are Rust macros?

Rust has excellent support for macros. Macros enable you to write code that writes other code, which is known as metaprogramming.

Macros provide functionality similar to functions but without the runtime cost. There is some compile-time cost, however, since macros are expanded during compile time.

Rust macros are very different from macros in C. Rust macros are applied to the token tree whereas C macros are text substitution.

Types of macros in Rust

Rust has two types of macros:

  1. Declarative macros enable you to write something similar to a match expression that operates on the Rust code you provide as arguments. It uses the code you provide to generate code that replaces the macro invocation
  2. Procedural macros allow you to operate on the abstract syntax tree (AST) of the Rust code it is given. A proc macro is a function from a TokenStream (or two) to another TokenStream, where the output replaces the macro invocation

Let’s zoom in on both declarative and procedural macros and explore some examples of how to use macros in Rust.

Declarative macros in Rust

These macros are declared using macro_rules!. Declarative macros are a bit less powerful but provide an easy to use interface for creating macros to remove duplicate code. One of the common declarative macro is println!. Declarative macros provide a match like an interface where on match the macro is replaced with code inside the matched arm.

Creating declarative macros

// use macro_rules! <name of macro>{<Body>}
macro_rules! add{
 // macth like arm for macro
    ($a:expr,$b:expr)=>{
 // macro expand to this code
        {
// $a and $b will be templated using the value/variable provided to macro
            $a+$b
        }
    }
}

fn main(){
 // call to macro, $a=1 and $b=2
    add!(1,2);
}

This code creates a macro to add two numbers. [macro_rules!] are used with the name of the macro, add, and the body of the macro.

 

The macro doesn’t add two numbers, it just replaces itself with the code to add two numbers. Each arm of the macro takes an argument for functions and multiple types can be assigned to arguments. If the add function can also take a single argument, we add another arm.

macro_rules! add{
 // first arm match add!(1,2), add!(2,3) etc
    ($a:expr,$b:expr)=>{
        {
            $a+$b
        }
    };
// Second arm macth add!(1), add!(2) etc
    ($a:expr)=>{
        {
            $a
        }
    }
}

fn main(){
// call the macro
    let x=0;
    add!(1,2);
    add!(x);
}

There can be multiple branches in a single macro expanding to different code based on different arguments. Each branch can take multiple arguments, starting with the $ sign and followed by a token type:

  • item — an item, like a function, struct, module, etc.
  • block — a block (i.e. a block of statements and/or an expression, surrounded by braces)
  • stmt — a statement
  • pat — a pattern
  • expr — an expression
  • ty — a type
  • ident — an identifier
  • path — a path (e.g., foo, ::std::mem::replace, transmute::<_, int>, …)
  • meta — a meta item; the things that go inside #[...] and #![...] attributes
  • tt — a single token tree
  • vis — a possibly empty Visibility qualifier

In the example, we use the $typ argument with token type ty as a datatype like u8, u16, etc. This macro converts to a particular type before adding the numbers.

macro_rules! add_as{
// using a ty token type for macthing datatypes passed to maccro
    ($a:expr,$b:expr,$typ:ty)=>{
        $a as $typ + $b as $typ
    }
}

fn main(){
    println!("{}",add_as!(0,2,u8));
}

Rust macros also support taking a nonfixed number of arguments. The operators are very similar to the regular expression. * is used for zero or more token types and + for zero or one argument.

macro_rules! add_as{
    (
  // repeated block
  $($a:expr)
 // seperator
   ,
// zero or more
   *
   )=>{
       { 
   // to handle the case without any arguments
   0
   // block to be repeated
   $(+$a)*
     }
    }
}

fn main(){
    println!("{}",add_as!(1,2,3,4)); // => println!("{}",{0+1+2+3+4})
}

The token type that repeats is enclosed in $(), followed by a separator and a * or a +, indicating the number of times the token will repeat. The separator is used to distinguish the tokens from each other. The $() block followed by * or + is used to indicate the repeating block of code. In the above example, +$a is a repeating code.

If you look closely, you’ll notice an additional zero is added to the code to make the syntax valid. To remove this zero and make the add expression the same as the argument, we need to create a new macro known as TT muncher.

macro_rules! add{
 // first arm in case of single argument and last remaining variable/number
    ($a:expr)=>{
        $a
    };
// second arm in case of two arument are passed and stop recursion in case of odd number ofarguments
    ($a:expr,$b:expr)=>{
        {
            $a+$b
        }
    };
// add the number and the result of remaining arguments 
    ($a:expr,$($b:tt)*)=>{
       {
           $a+add!($($b)*)
       }
    }
}

fn main(){
    println!("{}",add!(1,2,3,4));
}

The TT muncher processes each token separately in a recursive fashion. It’s easier to process a single token at a time. The macro has three arms:

  1. The first arms handle the case if a single argument is passed
  2. The second one handles the case if two arguments are passed
  3. The third arm calls the add macro again with the rest of the arguments

The macro arguments don’t need to be comma-separated. Multiple tokens can be used with different token types. For example, brackets can be used with the ident token type. The Rust compiler takes the matched arm and extracts the variable from the argument string.

macro_rules! ok_or_return{
// match something(q,r,t,6,7,8) etc
// compiler extracts function name and arguments. It injects the values in respective varibles.
    ($a:ident($($b:tt)*))=>{
       {
        match $a($($b)*) {
            Ok(value)=>value,
            Err(err)=>{
                return Err(err);
            }
        }
        }
    };
}

fn some_work(i:i64,j:i64)->Result<(i64,i64),String>{
    if i+j>2 {
        Ok((i,j))
    } else {
        Err("error".to_owned())
    }
}

fn main()->Result<(),String>{
    ok_or_return!(some_work(1,4));
    ok_or_return!(some_work(1,0));
    Ok(())
}

The ok_or_return macro returns the function if an operation returns Err or the value of an operation returns Ok. It takes a function as an argument and executes it inside a match statement. For arguments passed to function, it uses repetition.

Often, few macros need to be grouped into a single macro. In these cases, internal macro rules are used. It helps to manipulate the macro inputs and write clean TT munchers.

To create an internal rule, add the rule name starting with @ as the argument. Now the macro will never match for an internal rule until explicitly specified as an argument.

macro_rules! ok_or_return{
 // internal rule.
    (@error $a:ident,$($b:tt)* )=>{
        {
        match $a($($b)*) {
            Ok(value)=>value,
            Err(err)=>{
                return Err(err);
            }
        }
        }
    };

// public rule can be called by the user.
    ($a:ident($($b:tt)*))=>{
        ok_or_return!(@error $a,$($b)*)
    };
}

fn some_work(i:i64,j:i64)->Result<(i64,i64),String>{
    if i+j>2 {
        Ok((i,j))
    } else {
        Err("error".to_owned())
    }
}

fn main()->Result<(),String>{
   // instead of round bracket curly brackets can also be used
    ok_or_return!{some_work(1,4)};
    ok_or_return!(some_work(1,0));
    Ok(())
}

Advanced parsing in Rust with declarative macros

Macros sometimes perform tasks that require parsing of the Rust language itself.

Do put together all the concepts we’ve covered to this point, let’s create a macro that makes a struct public by suffixing the pub keyword.

First, we need to parse the Rust struct to get the name of the struct, fields of the struct, and field type.

Parsing the name and field of a struct

A struct declaration has a visibility keyword at the start (such as pub), followed by the struct keyword and then the name of the struct and the body of the struct.

macro_rules! make_public{
    (
  // use vis type for visibility keyword and ident for struct name
     $vis:vis struct $struct_name:ident { }
    ) => {
        {
            pub struct $struct_name{ }
        }
    }
}

The $vis will have visibility and $struct_name will have a struct name. To make a struct public, we just need to add the pub keyword and ignore the $vis variable.


 

A struct may contain multiple fields with the same or different data types and visibility. The ty token type is used for the data type, vis for visibility, and ident for the field name. We’ll use * repetition for zero or more fields.

 macro_rules! make_public{
    (
     $vis:vis struct $struct_name:ident {
        $(
 // vis for field visibility, ident for field name and ty for field data type
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*
    }
    ) => {
        {
            pub struct $struct_name{
                $(
                pub $field_name : $field_type,
                )*
            }
        }
    }
}

Parsing metadata from the struct

Often the struct has some metadata attached or procedural macros, such as #[derive(Debug)]. This metadata needs to stay intact. Parsing this metadata is done using the meta type.

macro_rules! make_public{
    (
     // meta data about struct
     $(#[$meta:meta])* 
     $vis:vis struct $struct_name:ident {
        $(
        // meta data about field
        $(#[$field_meta:meta])*
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*$(,)+
    }
    ) => {
        { 
            $(#[$meta])*
            pub struct $struct_name{
                $(
                $(#[$field_meta:meta])*
                pub $field_name : $field_type,
                )*
            }
        }
    }
}

Our make_public macro is ready now. To see how make_public works, let’s use Rust Playground to expand the macro to the actual code that is compiled.

macro_rules! make_public{
    (
     $(#[$meta:meta])* 
     $vis:vis struct $struct_name:ident {
        $(
        $(#[$field_meta:meta])*
        $field_vis:vis $field_name:ident : $field_type:ty
        ),*$(,)+
    }
    ) => {

            $(#[$meta])*
            pub struct $struct_name{
                $(
                $(#[$field_meta:meta])*
                pub $field_name : $field_type,
                )*
            }
    }
}

fn main(){
    make_public!{
        #[derive(Debug)]
        struct Name{
            n:i64,
            t:i64,
            g:i64,
        }
    }
}

The expanded code looks like this:

// some imports


macro_rules! make_public {
    ($ (#[$ meta : meta]) * $ vis : vis struct $ struct_name : ident
     {
         $
         ($ (#[$ field_meta : meta]) * $ field_vis : vis $ field_name : ident
          : $ field_type : ty), * $ (,) +
     }) =>
    {

            $ (#[$ meta]) * pub struct $ struct_name
            {
                $
                ($ (#[$ field_meta : meta]) * pub $ field_name : $
                 field_type,) *
            }
    }
}

fn main() {
        pub struct name {
            pub n: i64,
            pub t: i64,
            pub g: i64,
    }
}

Limitations of declarative macros

Declarative macros have a few limitations. Some are related to Rust macros themselves while others are more specific to declarative macros.

  • Lack of support for macros autocompletion and expansion
  • Debugging declarative macros is difficult
  • Limited modification capabilities
  • Larger binaries
  • Longer compile time (this applies to both declarative and procedural macros)

Procedural macros in Rust

Procedural macros are a more advanced version of macros. Procedural macros allow you to expand the existing syntax of Rust. It takes arbitrary input and returns valid Rust code.

Procedural macros are functions that take a TokenStream as input and return another Token Stream. Procedural macros manipulate the input TokenStream to produce an output stream.

There are three types of procedural macros:

  1. Attribute-like macros
  2. Derive macros
  3. Function-like macros

We’ll go into each procedural macro type in detail below.

Attribute-like macros

Attribute-like macros enable you to create a custom attribute that attaches itself to an item and allows manipulation of that item. It can also take arguments.

#[some_attribute_macro(some_argument)]
fn perform_task(){
// some code
}

In the above code, some_attribute_macros is an attribute macro. It manipulates the function perform_task.

To write an attribute-like macro, start by creating a project using cargo new macro-demo --lib. Once the project is ready, update the Cargo.toml to notify cargo the project will create procedural macros.

# Cargo.toml
[lib]
proc-macro = true

Now we are all set to venture into procedural macros.

Procedural macros are public functions that take TokenStream as input and return another TokenStream. To write a procedural macro, we need to write our parser to parse TokenStream. The Rust community has a very good crate, syn, for parsing TokenStream.

synprovides a ready-made parser for Rust syntax that can be used to parse TokenStream. You can also parse your syntax by combining low-level parsers providing syn.

Add syn and quote to Cargo.toml:

# Cargo.toml
[dependencies]
syn = {version="1.0.57",features=["full","fold"]}
quote = "1.0.8"

Now we can write an attribute-like a macro in lib.rs using the proc_macro crate provided by the compiler for writing procedural macros. A procedural macro crate cannot export anything else other than procedural macros and procedural macros defined in the crate can’t be used in the crate itself.

// lib.rs
extern crate proc_macro;
use proc_macro::{TokenStream};
use quote::{quote};

// using proc_macro_attribute to declare an attribute like procedural macro
#[proc_macro_attribute]
// _metadata is argument provided to macro call and _input is code to which attribute like macro attaches
pub fn my_custom_attribute(_metadata: TokenStream, _input: TokenStream) -> TokenStream {
    // returing a simple TokenStream for Struct
    TokenStream::from(quote!{struct H{}})
}

To test the macro we added, create an ingratiation test by creating a folder named tests and adding the file attribute_macro.rs in the folder. In this file, we can use our attribute-like macro for testing.

// tests/attribute_macro.rs

use macro_demo::*;

// macro converts struct S to struct H
#[my_custom_attribute]
struct S{}

#[test]
fn test_macro(){
// due to macro we have struct H in scope
    let demo=H{};
}

Run the above test using the cargo test command.

Now that we understand the basics of procedural macros, lets use syn for some advanced TokenStream manipulation and parsing.

To learn how syn is used for parsing and manipulation, let’s take an example from the syn GitHub repo. This example creates a Rust macro that trace variables when value changes.

First, we need to identify how our macro will manipulate the code it attaches.

#[trace_vars(a)]
fn do_something(){
  let a=9;
  a=6;
  a=0;
}

The trace_vars macro takes the name of the variable it needs to trace and injects a print statement each time the value of the input variable i.e a changes. It tracks the value of input variables.

First, parse the code to which the attribute-like macro attaches. syn provides an inbuilt parser for Rust function syntax. ItemFn will parse the function and throw an error if the syntax is invalid.

#[proc_macro_attribute]
pub fn trace_vars(_metadata: TokenStream, input: TokenStream) -> TokenStream {
// parsing rust function to easy to use struct
    let input_fn = parse_macro_input!(input as ItemFn);
    TokenStream::from(quote!{fn dummy(){}})
}

Now that we have the parsed input, let’s move to metadata. For metadata, no inbuilt parser will work, so we’ll have to write one ourselves using syn‘s parse module.

#[trace_vars(a,c,b)] // we need to parse a "," seperated list of tokens
// code

For syn to work, we need to implement the Parse trait provided by syn. Punctuated is used to create a vector of Indent separated by ,.

struct Args{
    vars:HashSet<Ident>
}

impl Parse for Args{
    fn parse(input: ParseStream) -> Result<Self> {
        // parses a,b,c, or a,b,c where a,b and c are Indent
        let vars = Punctuated::<Ident, Token![,]>::parse_terminated(input)?;
        Ok(Args {
            vars: vars.into_iter().collect(),
        })
    }
}

Once we implement the Parse trait, we can use parse_macro_input macro for parsing metadata.

#[proc_macro_attribute]
pub fn trace_vars(metadata: TokenStream, input: TokenStream) -> TokenStream {
    let input_fn = parse_macro_input!(input as ItemFn);
// using newly created struct Args
    let args= parse_macro_input!(metadata as Args);
    TokenStream::from(quote!{fn dummy(){}})
}

We will now modify the input_fn to add println! when the variable changes the value. To add this, we need to filter outlines that have an assignment and insert a print statement after that line.

impl Args {
    fn should_print_expr(&self, e: &Expr) -> bool {
        match *e {
            Expr::Path(ref e) => {
 // variable shouldn't start wiht ::
                if e.path.leading_colon.is_some() {
                    false
// should be a single variable like `x=8` not n::x=0 
                } else if e.path.segments.len() != 1 {
                    false
                } else {
// get the first part
                    let first = e.path.segments.first().unwrap();
// check if the variable name is in the Args.vars hashset
                    self.vars.contains(&first.ident) && first.arguments.is_empty()
                }
            }
            _ => false,
        }
    }

// used for checking if to print let i=0 etc or not
    fn should_print_pat(&self, p: &Pat) -> bool {
        match p {
// check if variable name is present in set
            Pat::Ident(ref p) => self.vars.contains(&p.ident),
            _ => false,
        }
    }

// manipulate tree to insert print statement
    fn assign_and_print(&mut self, left: Expr, op: &dyn ToTokens, right: Expr) -> Expr {
 // recurive call on right of the assigment statement
        let right = fold::fold_expr(self, right);
// returning manipulated sub-tree
        parse_quote!({
            #left #op #right;
            println!(concat!(stringify!(#left), " = {:?}"), #left);
        })
    }

// manipulating let statement
    fn let_and_print(&mut self, local: Local) -> Stmt {
        let Local { pat, init, .. } = local;
        let init = self.fold_expr(*init.unwrap().1);
// get the variable name of assigned variable
        let ident = match pat {
            Pat::Ident(ref p) => &p.ident,
            _ => unreachable!(),
        };
// new sub tree
        parse_quote! {
            let #pat = {
                #[allow(unused_mut)]
                let #pat = #init;
                println!(concat!(stringify!(#ident), " = {:?}"), #ident);
                #ident
            };
        }
    }
}

In the above example, the quote macro is used for templating and writing Rust. # is used for injecting the value of the variable.

Now we’ll do a DFS over input_fn and insert the print statement. syn provides a Fold trait that can be implemented for DFS over any Item. We just need to modify the trait methods that correspond with the token type we want to manipulate.

impl Fold for Args {
    fn fold_expr(&mut self, e: Expr) -> Expr {
        match e {
// for changing assignment like a=5
            Expr::Assign(e) => {
// check should print
                if self.should_print_expr(&e.left) {
                    self.assign_and_print(*e.left, &e.eq_token, *e.right)
                } else {
// continue with default travesal using default methods
                    Expr::Assign(fold::fold_expr_assign(self, e))
                }
            }
// for changing assigment and operation like a+=1
            Expr::AssignOp(e) => {
// check should print
                if self.should_print_expr(&e.left) {
                    self.assign_and_print(*e.left, &e.op, *e.right)
                } else {
// continue with default behaviour
                    Expr::AssignOp(fold::fold_expr_assign_op(self, e))
                }
            }
// continue with default behaviour for rest of expressions
            _ => fold::fold_expr(self, e),
        }
    }

// for let statements like let d=9
    fn fold_stmt(&mut self, s: Stmt) -> Stmt {
        match s {
            Stmt::Local(s) => {
                if s.init.is_some() && self.should_print_pat(&s.pat) {
                    self.let_and_print(s)
                } else {
                    Stmt::Local(fold::fold_local(self, s))
                }
            }
            _ => fold::fold_stmt(self, s),
        }
    }
}

The Fold trait is used to do a DFS of Item. It enables you to use different behavior for various token types.

Now we can use fold_item_fn to inject print statements in our parsed code.

#[proc_macro_attribute]
pub fn trace_var(args: TokenStream, input: TokenStream) -> TokenStream {
// parse the input
    let input = parse_macro_input!(input as ItemFn);
// parse the arguments
    let mut args = parse_macro_input!(args as Args);
// create the ouput
    let output = args.fold_item_fn(input);
// return the TokenStream
    TokenStream::from(quote!(#output))
}

This code example is from the syn examples repo, which is an excellent resource to learn about procedural macros.

Custom derive macros

Custom derive macros in Rust allow auto implement traits. These macros enable you to implement traits using #[derive(Trait)].

syn has excellent support for derive macros.

#[derive(Trait)]
struct MyStruct{}

To write a custom derive macro in Rust, we can use DeriveInput for parsing input to derive macro. We’ll also use the proc_macro_derive macro to define a custom derive macro.

#[proc_macro_derive(Trait)]
pub fn derive_trait(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let input = parse_macro_input!(input as DeriveInput);

    let name = input.ident;

    let expanded = quote! {
        impl Trait for #name {
            fn print(&self) -> usize {
                println!("{}","hello from #name")
           }
        }
    };

    proc_macro::TokenStream::from(expanded)
}

More advanced procedural macros can be written using syn. Check out this example from syn‘s repo.

Function-like macros

Function-like macros are similar to declarative macros in that they’re invoked with the macro invocation operator ! and look like function calls. They operate on the code that is inside the parentheses.

Here’s how to write a function-like macro in Rust:

#[proc_macro]
pub fn a_proc_macro(_input: TokenStream) -> TokenStream {
    TokenStream::from(quote!(
            fn anwser()->i32{
                5
            }
))
}

Function-like macros are executed not at runtime but at compile time. They can be used anywhere in Rust code. Function-like macros also take a TokenStream and return a TokenStream.

Advantages of using procedural macros include:

  • Better error handling using span
  • Better control over output
  • Community-built crates syn and quote
  • More powerful than declarative macros

Conclusion

In this Rust macros tutorial, we covered the basics of macros in Rust, defined declarative and procedural macros, and walked through how to write both types of macros using various syntax and community-built crates. We also outlined the advantages of using each type of Rust macro.


References:
- Please support those who need it the most right now: https://help.gov.ua/en
- Source Code: https://github.com/tsoding/noq
- Macros by Example: https://doc.rust-lang.org/reference/macros-by-example.html
- TT munchers: https://danielkeep.github.io/tlborm/book/pat-incremental-tt-munchers.html

#rust #programming