A full Python implementation of the ROUGE metric, producing the same results as the official Perl implementation.
Important remarks
Differences from the official implementation: <3e-5 for ROUGE-L as well as ROUGE-W, and <4e-5 for ROUGE-N.
-b 665
In case of doubt, please see the implemented tests, which compare outputs between the official ROUGE-1.5.5 and this script.
The package is available on PyPI: https://pypi.org/project/py-rouge
You can install it with pip:
pip install py-rouge
or do it manually:
git clone https://github.com/Diego999/py-rouge
cd py-rouge
python setup.py install
Issues/Pull Requests/Feedback
Don't hesitate to contact me with any feedback or to create issues/pull requests (especially if you want to rewrite the stemmer implemented in ROUGE-1.5.5 in Python ;)).
Example
import rouge


def prepare_results(m, p, r, f):
    return '\t{}:\t{}: {:5.2f}\t{}: {:5.2f}\t{}: {:5.2f}'.format(m, 'P', 100.0 * p, 'R', 100.0 * r, 'F1', 100.0 * f)


for aggregator in ['Avg', 'Best', 'Individual']:
    print('Evaluation with {}'.format(aggregator))
    apply_avg = aggregator == 'Avg'
    apply_best = aggregator == 'Best'

    evaluator = rouge.Rouge(metrics=['rouge-n', 'rouge-l', 'rouge-w'],
                            max_n=4,
                            limit_length=True,
                            length_limit=100,
                            length_limit_type='words',
                            apply_avg=apply_avg,
                            apply_best=apply_best,
                            alpha=0.5,  # Default F1_score
                            weight_factor=1.2,
                            stemming=True)

    hypothesis_1 = "King Norodom Sihanouk has declined requests to chair a summit of Cambodia 's top political leaders , saying the meeting would not bring any progress in deadlocked negotiations to form a government .\nGovernment and opposition parties have asked King Norodom Sihanouk to host a summit meeting after a series of post-election negotiations between the two opposition groups and Hun Sen 's party to form a new government failed .\nHun Sen 's ruling party narrowly won a majority in elections in July , but the opposition _ claiming widespread intimidation and fraud _ has denied Hun Sen the two-thirds vote in parliament required to approve the next government .\n"
    references_1 = ["Prospects were dim for resolution of the political crisis in Cambodia in October 1998.\nPrime Minister Hun Sen insisted that talks take place in Cambodia while opposition leaders Ranariddh and Sam Rainsy, fearing arrest at home, wanted them abroad.\nKing Sihanouk declined to chair talks in either place.\nA U.S. House resolution criticized Hun Sen's regime while the opposition tried to cut off his access to loans.\nBut in November the King announced a coalition government with Hun Sen heading the executive and Ranariddh leading the parliament.\nLeft out, Sam Rainsy sought the King's assurance of Hun Sen's promise of safety and freedom for all politicians.",
                    "Cambodian prime minister Hun Sen rejects demands of 2 opposition parties for talks in Beijing after failing to win a 2/3 majority in recent elections.\nSihanouk refuses to host talks in Beijing.\nOpposition parties ask the Asian Development Bank to stop loans to Hun Sen's government.\nCCP defends Hun Sen to the US Senate.\nFUNCINPEC refuses to share the presidency.\nHun Sen and Ranariddh eventually form a coalition at summit convened by Sihanouk.\nHun Sen remains prime minister, Ranariddh is president of the national assembly, and a new senate will be formed.\nOpposition leader Rainsy left out.\nHe seeks strong assurance of safety should he return to Cambodia.\n",
                    ]

    hypothesis_2 = "China 's government said Thursday that two prominent dissidents arrested this week are suspected of endangering national security _ the clearest sign yet Chinese leaders plan to quash a would-be opposition party .\nOne leader of a suppressed new political party will be tried on Dec. 17 on a charge of colluding with foreign enemies of China '' to incite the subversion of state power , '' according to court documents given to his wife on Monday .\nWith attorneys locked up , harassed or plain scared , two prominent dissidents will defend themselves against charges of subversion Thursday in China 's highest-profile dissident trials in two years .\n"
    references_2 = "Hurricane Mitch, category 5 hurricane, brought widespread death and destruction to Central American.\nEspecially hard hit was Honduras where an estimated 6,076 people lost their lives.\nThe hurricane, which lingered off the coast of Honduras for 3 days before moving off, flooded large areas, destroying crops and property.\nThe U.S. and European Union were joined by Pope John Paul II in a call for money and workers to help the stricken area.\nPresident Clinton sent Tipper Gore, wife of Vice President Gore to the area to deliver much needed supplies to the area, demonstrating U.S. commitment to the recovery of the region.\n"

    all_hypothesis = [hypothesis_1, hypothesis_2]
    all_references = [references_1, references_2]

    scores = evaluator.get_scores(all_hypothesis, all_references)

    for metric, results in sorted(scores.items(), key=lambda x: x[0]):
        if not apply_avg and not apply_best:  # value is a list, as we evaluate each summary vs each reference
            for hypothesis_id, results_per_ref in enumerate(results):
                nb_references = len(results_per_ref['p'])
                for reference_id in range(nb_references):
                    print('\tHypothesis #{} & Reference #{}: '.format(hypothesis_id, reference_id))
                    print('\t' + prepare_results(metric, results_per_ref['p'][reference_id], results_per_ref['r'][reference_id], results_per_ref['f'][reference_id]))
            print()
        else:
            print(prepare_results(metric, results['p'], results['r'], results['f']))
    print()
It produces the following output:
Evaluation with Avg
rouge-1: P: 28.62 R: 26.46 F1: 27.49
rouge-2: P: 4.21 R: 3.92 F1: 4.06
rouge-3: P: 0.80 R: 0.74 F1: 0.77
rouge-4: P: 0.00 R: 0.00 F1: 0.00
rouge-l: P: 30.52 R: 28.57 F1: 29.51
rouge-w: P: 15.85 R: 8.28 F1: 10.87
Evaluation with Best
rouge-1: P: 30.44 R: 28.36 F1: 29.37
rouge-2: P: 4.74 R: 4.46 F1: 4.59
rouge-3: P: 1.06 R: 0.98 F1: 1.02
rouge-4: P: 0.00 R: 0.00 F1: 0.00
rouge-l: P: 31.54 R: 29.71 F1: 30.60
rouge-w: P: 16.42 R: 8.82 F1: 11.47
Evaluation with Individual
Hypothesis #0 & Reference #0:
rouge-1: P: 38.54 R: 35.58 F1: 37.00
Hypothesis #0 & Reference #1:
rouge-1: P: 45.83 R: 43.14 F1: 44.44
Hypothesis #1 & Reference #0:
rouge-1: P: 15.05 R: 13.59 F1: 14.29
Hypothesis #0 & Reference #0:
rouge-2: P: 7.37 R: 6.80 F1: 7.07
Hypothesis #0 & Reference #1:
rouge-2: P: 9.47 R: 8.91 F1: 9.18
Hypothesis #1 & Reference #0:
rouge-2: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #0 & Reference #0:
rouge-3: P: 2.13 R: 1.96 F1: 2.04
Hypothesis #0 & Reference #1:
rouge-3: P: 1.06 R: 1.00 F1: 1.03
Hypothesis #1 & Reference #0:
rouge-3: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #0 & Reference #0:
rouge-4: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #0 & Reference #1:
rouge-4: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #1 & Reference #0:
rouge-4: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #0 & Reference #0:
rouge-l: P: 42.11 R: 39.39 F1: 40.70
Hypothesis #0 & Reference #1:
rouge-l: P: 46.19 R: 43.92 F1: 45.03
Hypothesis #1 & Reference #0:
rouge-l: P: 16.88 R: 15.50 F1: 16.16
Hypothesis #0 & Reference #0:
rouge-w: P: 22.27 R: 11.49 F1: 15.16
Hypothesis #0 & Reference #1:
rouge-w: P: 24.56 R: 13.60 F1: 17.51
Hypothesis #1 & Reference #0:
rouge-w: P: 8.29 R: 4.04 F1: 5.43
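To make the P, R and F1 columns above concrete, here is a minimal pure-Python sketch of ROUGE-N (clipped n-gram overlap) and sentence-level ROUGE-L (longest common subsequence). It is not py-rouge's code: it assumes plain whitespace tokenization and skips stemming, length limits, Avg/Best aggregation and ROUGE-W, so its numbers will not match the table exactly. F is the weighted harmonic mean controlled by alpha; with alpha=0.5, as in the example, it is the ordinary F1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_score(p, r, alpha=0.5):
    """Weighted harmonic mean of precision and recall; alpha=0.5 gives F1."""
    if p == 0.0 or r == 0.0:
        return 0.0
    return p * r / ((1.0 - alpha) * p + alpha * r)


def rouge_n(hypothesis, reference, n=1, alpha=0.5):
    """ROUGE-N precision, recall and F from clipped n-gram overlap."""
    hyp, ref = ngrams(hypothesis.split(), n), ngrams(reference.split(), n)
    overlap = sum((hyp & ref).values())  # '&' keeps the min count of each n-gram
    p = overlap / max(sum(hyp.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    return p, r, f_score(p, r, alpha)


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(hypothesis, reference, alpha=0.5):
    """Sentence-level ROUGE-L: precision and recall of the LCS length."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_length(hyp, ref)
    p = lcs / max(len(hyp), 1)
    r = lcs / max(len(ref), 1)
    return p, r, f_score(p, r, alpha)
```

For example, `rouge_n("the cat sat on the mat", "the cat is on the mat", n=2)` finds 3 shared bigrams out of 5 on each side. The official ROUGE-L works at summary level (a union LCS over sentences), which this sketch deliberately leaves out.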
Author: Diego999
Source Code: https://github.com/Diego999/py-rouge
License: Apache-2.0 license
Amazon Aurora is a relational database management system (RDBMS) developed by AWS (Amazon Web Services). Aurora gives you the performance and availability of commercial-grade databases with full MySQL and PostgreSQL compatibility. In terms of performance, Aurora MySQL and Aurora PostgreSQL have shown throughput increases of up to 5X over stock MySQL and 3X over stock PostgreSQL, respectively, on similar hardware. In terms of scalability, Aurora brings enhancements and innovations to both storage and compute, supporting both horizontal and vertical scaling.
Aurora supports up to 128TB of storage and dynamically scales the storage layer in 10GB units. On the compute side, Aurora supports scaling out with multiple read replicas: each region can have up to 15 additional Aurora replicas. In addition, Aurora provides a multi-primary architecture that supports four read/write nodes. Its Serverless architecture allows vertical scaling and reduces typical latency to under a second, while Global Database enables a single database cluster to span multiple AWS Regions with low latency.
Aurora already scales well as user data volume grows. To handle even more data and support more concurrent access, you may consider sharding across multiple underlying Aurora clusters. To this end, a series of blogs, including this one, provides a reference for choosing between Proxy and JDBC for sharding.
AWS Aurora offers a single relational database. Its primary-secondary, multi-primary, global database, and other hosting architectures can satisfy the various architectural scenarios above. However, Aurora doesn't provide direct support for sharding, and sharding comes in a variety of forms, such as vertical and horizontal. To further increase data capacity, several problems must be solved, such as cross-node database Join, associated queries, distributed transactions, SQL sorting, pagination, function calculation, global primary keys, capacity planning, and secondary capacity expansion after sharding.
It is generally accepted that query time is optimal when a MySQL table holds fewer than 10 million rows, because the height of its BTREE index is then between 3 and 5. Data sharding reduces the amount of data in a single table and, at the same time, distributes read and write loads across different data nodes. Data sharding can be divided into vertical sharding and horizontal sharding.
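As a rough illustration of the two forms, the sketch below splits a hypothetical orders table both ways. Vertically, groups of columns move to different nodes (one common form of vertical sharding), and each fragment keeps the key so rows can be reassembled by application-level aggregation. Horizontally, every shard keeps the same schema and each row goes to shard `key % shard_count`. The table, the column groups and the modulo rule are assumptions for illustration, not ShardingSphere behavior.

```python
def vertical_split(rows, column_groups, key):
    """Vertical sharding: each node stores only its group of columns (plus the key)."""
    return {
        node: [{c: row[c] for c in (key, *cols)} for row in rows]
        for node, cols in column_groups.items()
    }


def horizontal_split(rows, key, shard_count):
    """Horizontal sharding: same schema on every shard, rows spread by key % shard_count."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[row[key] % shard_count].append(row)
    return shards


# Hypothetical sample data.
orders = [
    {"order_id": 1, "user_id": 10, "status": "NEW", "address": "A"},
    {"order_id": 2, "user_id": 11, "status": "PAID", "address": "B"},
    {"order_id": 3, "user_id": 10, "status": "NEW", "address": "C"},
    {"order_id": 4, "user_id": 12, "status": "SHIPPED", "address": "D"},
]

by_columns = vertical_split(
    orders, {"node_a": ("user_id", "status"), "node_b": ("address",)}, key="order_id"
)
by_rows = horizontal_split(orders, key="order_id", shard_count=2)
```

Reading a full order back from `by_columns` means querying both nodes and merging on `order_id`, which is the interface-aggregation cost mentioned for vertical sharding; `by_rows` keeps whole rows together but spreads them over shards.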
1. Advantages of vertical sharding
2. Disadvantages of vertical sharding
Join can only be implemented by interface aggregation, which increases development complexity.
3. Advantages of horizontal sharding
4. Disadvantages of horizontal sharding
Join is poor.
Based on the analysis above and available studies of popular sharding middleware, we selected ShardingSphere, an open source product, and combined it with Amazon Aurora to show how the two products together support various forms of sharding and solve the problems that sharding introduces.
ShardingSphere is an open source ecosystem of distributed database middleware solutions, consisting of three independent products: Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar.
The characteristics of Sharding-JDBC are:
Hybrid Structure Integrating Sharding-JDBC and Applications
Sharding-JDBC’s core concepts
Data node: The smallest unit of a data slice, consisting of a data source name and a data table, such as ds_0.product_order_0.
Actual table: The physical table that really exists in the horizontal sharding database, such as product order tables: product_order_0, product_order_1, and product_order_2.
Logic table: The logical name of horizontally sharded databases (tables) that share the same schema. For instance, the logic table of product_order_0, product_order_1, and product_order_2 is product_order.
Binding table: Primary and joiner tables that share the same sharding rules. For example, product_order and product_order_item are both sharded by order_id, so they are binding tables of each other. Multi-table join queries between binding tables do not produce Cartesian-product correlations, so query efficiency increases greatly.
Broadcast table: A table that exists in every sharding data source, with the same schema and data in each database. It suits small tables that need to be joined with big data tables in queries, such as dictionary tables and configuration tables.
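The naming relationships above can be sketched in a few lines of Python. This is a hypothetical model, not ShardingSphere code: it assumes actual tables are named `<logic_table>_<index>` as in the product_order examples, and shows how data nodes, logic tables and broadcast tables relate.

```python
from itertools import product


def data_nodes(data_sources, logic_table, table_count):
    """Data nodes ("<data source>.<actual table>") for a logic table whose
    actual tables are assumed to be named <logic_table>_0 .. _{table_count-1}."""
    return [f"{ds}.{logic_table}_{i}" for ds, i in product(data_sources, range(table_count))]


def logic_table_of(actual_table):
    """Recover the logic table name by dropping a numeric shard suffix, if any."""
    base, _, suffix = actual_table.rpartition("_")
    return base if suffix.isdigit() else actual_table


def broadcast_nodes(data_sources, table):
    """A broadcast table has a full copy (same schema and data) on every data source."""
    return [f"{ds}.{table}" for ds in data_sources]
```

For instance, `data_nodes(["ds_0"], "product_order", 3)` yields `ds_0.product_order_0` through `ds_0.product_order_2`, and `logic_table_of("product_order_2")` maps any of them back to `product_order`.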
Download the example project code locally. To ensure the stability of the test code, we chose version shardingsphere-example-4.0.0.
git clone https://github.com/apache/shardingsphere-example.git
Project description:
shardingsphere-example
├── example-core
│ ├── config-utility
│ ├── example-api
│ ├── example-raw-jdbc
│ ├── example-spring-jpa #spring+jpa integration-based entity,repository
│ └── example-spring-mybatis
├── sharding-jdbc-example
│ ├── sharding-example
│ │ ├── sharding-raw-jdbc-example
│ │ ├── sharding-spring-boot-jpa-example #integration-based sharding-jdbc functions
│ │ ├── sharding-spring-boot-mybatis-example
│ │ ├── sharding-spring-namespace-jpa-example
│ │ └── sharding-spring-namespace-mybatis-example
│ ├── orchestration-example
│ │ ├── orchestration-raw-jdbc-example
│ │ ├── orchestration-spring-boot-example #integration-based sharding-jdbc governance function
│ │ └── orchestration-spring-namespace-example
│ ├── transaction-example
│ │ ├── transaction-2pc-xa-example #sharding-jdbc sample of two-phase commit for a distributed transaction
│ │ └── transaction-base-seata-example #sharding-jdbc distributed transaction seata sample
│ ├── other-feature-example
│ │ ├── hint-example
│ │ └── encrypt-example
├── sharding-proxy-example
│ └── sharding-proxy-boot-mybatis-example
└── src/resources
└── manual_schema.sql
Configuration file description:
application-master-slave.properties #read/write splitting profile
application-sharding-databases-tables.properties #sharding profile
application-sharding-databases.properties #database sharding profile only
application-sharding-master-slave.properties #sharding and read/write splitting profile
application-sharding-tables.properties #table sharding profile
application.properties #spring boot profile
Code logic description:
The following is the entry class of the Spring Boot application. Execute it to run the project.
The execution logic of the demo is as follows:
As business grows, write and read requests can be split across different database nodes to effectively improve the processing capability of the entire database cluster. Aurora uses a reader/writer endpoint to meet users' requirements for writes and strongly consistent reads, and a read-only endpoint for reads that do not require strong consistency. Aurora's replication latency between writer and readers is within single-digit milliseconds, much lower than MySQL's binlog-based logical replication, so a large share of the load can be directed to the read-only endpoint.
With one primary and multiple secondaries, query requests can be evenly distributed across multiple data replicas, further improving the processing capability of the system. Read/write splitting improves system throughput and availability, but it can also lead to data inconsistency. Aurora provides a fully managed primary/secondary architecture, but upper-layer applications still need to manage multiple data sources when interacting with Aurora, routing SQL requests to different nodes based on the read/write type of each SQL statement and on routing policies.
ShardingSphere-JDBC provides read/write splitting and is integrated with the application, so the complex configuration between the application and the database cluster can be separated from the application code. Developers can manage sharding through configuration files and combine it with ORM frameworks such as Spring JPA and MyBatis to completely separate the duplicated logic from the code, which greatly improves code maintainability and reduces coupling between code and database.
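The routing decision described above can be sketched as follows. This is an illustration of the idea, not ShardingSphere-JDBC's implementation: the class and its first-keyword classification are hypothetical, and the data source names simply reuse the ds_master/ds_slave_* names from the configuration in this section. Writes go to the primary; plain SELECTs rotate round-robin over the replicas, matching the `round_robin` load-balance setting.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical read/write splitting router: writes go to the primary,
    plain SELECTs rotate round-robin over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    @staticmethod
    def is_read(sql):
        # Naive classification by the first keyword. SELECT ... FOR UPDATE
        # takes row locks, so it is treated as a write here.
        words = sql.lower().split()
        return bool(words) and words[0] == "select" and "for update" not in " ".join(words)

    def route(self, sql):
        """Return the name of the data source the statement is sent to."""
        return next(self._replicas) if self.is_read(sql) else self.primary


router = ReadWriteRouter("ds_master", ["ds_slave_0", "ds_slave_1"])
```

A real router parses SQL rather than inspecting the first word, but the split is the same: consecutive reads alternate between `ds_slave_0` and `ds_slave_1`, while every INSERT, UPDATE or DELETE lands on `ds_master`.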
Create a set of Aurora MySQL read/write splitting clusters using db.r5.2xlarge instances. Each cluster has one write node and two read nodes.
application.properties (Spring Boot main profile) description:
Replace the placeholder values with your own environment configuration.
# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create-drop
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#Activate master-slave configuration item so that sharding-jdbc can use master-slave profile
spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave
application-master-slave.properties (Sharding-JDBC profile) description:
spring.shardingsphere.datasource.names=ds_master,ds_slave_0,ds_slave_1
# data source-master
spring.shardingsphere.datasource.ds_master.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master.password=Your master DB password
spring.shardingsphere.datasource.ds_master.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master.jdbc-url=Your primary DB data source url
spring.shardingsphere.datasource.ds_master.username=Your primary DB username
# data source-slave
spring.shardingsphere.datasource.ds_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_slave_0.password= Your slave DB password
spring.shardingsphere.datasource.ds_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_slave_0.jdbc-url=Your slave DB data source url
spring.shardingsphere.datasource.ds_slave_0.username= Your slave DB username
# data source-slave
spring.shardingsphere.datasource.ds_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_slave_1.password= Your slave DB password
spring.shardingsphere.datasource.ds_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_slave_1.jdbc-url= Your slave DB data source url
spring.shardingsphere.datasource.ds_slave_1.username= Your slave DB username
# Routing Policy Configuration
spring.shardingsphere.masterslave.load-balance-algorithm-type=round_robin
spring.shardingsphere.masterslave.name=ds_ms
spring.shardingsphere.masterslave.master-data-source-name=ds_master
spring.shardingsphere.masterslave.slave-data-source-names=ds_slave_0,ds_slave_1
# sharding-jdbc configures the information storage mode
spring.shardingsphere.mode.type=Memory
# Enable the ShardingSphere SQL log to see the conversion from logical SQL to actual SQL
spring.shardingsphere.props.sql.show=true
As shown in the ShardingSphere-SQL log figure below, the write SQL is executed on the ds_master data source.
As shown in the ShardingSphere-SQL log figure below, the read SQL is executed on the ds_slave data sources in round-robin fashion.
[INFO ] 2022-04-02 19:43:39,376 --main-- [ShardingSphere-SQL] Rule Type: master-slave
[INFO ] 2022-04-02 19:43:39,376 --main-- [ShardingSphere-SQL] SQL: select orderentit0_.order_id as order_id1_1_, orderentit0_.address_id as address_2_1_,
orderentit0_.status as status3_1_, orderentit0_.user_id as user_id4_1_ from t_order orderentit0_ ::: DataSources: ds_slave_0
---------------------------- Print OrderItem Data -------------------
Hibernate: select orderiteme1_.order_item_id as order_it1_2_, orderiteme1_.order_id as order_id2_2_, orderiteme1_.status as status3_2_, orderiteme1_.user_id
as user_id4_2_ from t_order orderentit0_ cross join t_order_item orderiteme1_ where orderentit0_.order_id=orderiteme1_.order_id
[INFO ] 2022-04-02 19:43:40,898 --main-- [ShardingSphere-SQL] Rule Type: master-slave
[INFO ] 2022-04-02 19:43:40,898 --main-- [ShardingSphere-SQL] SQL: select orderiteme1_.order_item_id as order_it1_2_, orderiteme1_.order_id as order_id2_2_, orderiteme1_.status as status3_2_,
orderiteme1_.user_id as user_id4_2_ from t_order orderentit0_ cross join t_order_item orderiteme1_ where orderentit0_.order_id=orderiteme1_.order_id ::: DataSources: ds_slave_1
Note: As shown in the figure below, if a transaction contains both reads and writes, Sharding-JDBC routes both the read and write operations to the master database. If the read/write requests are not in the same transaction, the read requests are distributed to different read nodes according to the routing policy.
@Override
@Transactional // Inside a transaction, both reads and writes go to the master database. Without a transaction, reads go to the slave databases and writes go to the master database
public void processSuccess() throws SQLException {
    System.out.println("-------------- Process Success Begin ---------------");
    List<Long> orderIds = insertData();
    printData();
    deleteData(orderIds);
    printData();
    System.out.println("-------------- Process Success Finish --------------");
}
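The transaction rule in the note above can be modeled by extending the read/write idea with a transaction flag. This hypothetical Python sketch (not the Sharding-JDBC implementation) pins every statement to the primary while a transaction is open, so a read inside the transaction sees the transaction's own uncommitted writes. Outside a transaction, reads rotate over the replicas. A nesting counter keeps inner transaction blocks from releasing the pin early.

```python
import itertools
from contextlib import contextmanager


class TransactionAwareRouter:
    """Hypothetical sketch: inside a transaction every statement is pinned to the
    primary; outside one, reads rotate round-robin over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)
        self._depth = 0  # nesting depth of open transactions

    @contextmanager
    def transaction(self):
        self._depth += 1
        try:
            yield self
        finally:
            self._depth -= 1

    def route(self, sql):
        """Return the data source name for a statement given the transaction state."""
        words = sql.lower().split()
        is_read = bool(words) and words[0] == "select"
        if is_read and self._depth == 0:
            return next(self._replicas)
        return self.primary
```

Inside `with router.transaction():` the insertData/printData/deleteData sequence from `processSuccess()` would all hit the primary; remove `@Transactional` and the `printData()` reads would be spread over the replicas instead.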
The Aurora database environment adopts the configuration described in Section 2.2.1.
3.2.4.1 Verification process description
1. Start the Spring-Boot project.
2. Perform a failover in the Aurora console.
3. Execute the REST API request.
4. Repeatedly execute POST (http://localhost:8088/save-user) until the API call fails to write to Aurora and then eventually recovers.
5. The following figure shows the code failover process. It takes about 37 seconds from the last successful SQL write before the failover to the next successful SQL write. That is, the application recovers automatically from the Aurora failover, with a recovery time of about 37 seconds.
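The measurement in step 5, the time between the last successful write before the failover and the first successful write after it, can be reproduced with a small polling loop. In this sketch `write_once` is a hypothetical callable (for example, one that sends the POST to http://localhost:8088/save-user and raises on failure), and the clock and sleep functions are injectable so the logic can be tested without a real cluster.

```python
import time


def measure_write_outage(write_once, attempts, interval_s=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call write_once() `attempts` times, `interval_s` apart, and return the
    longest gap in seconds between two consecutive successful writes."""
    last_success = None
    longest_gap = 0.0
    for _ in range(attempts):
        try:
            write_once()
        except Exception:
            pass  # write failed (e.g. during failover); keep polling
        else:
            now = clock()
            if last_success is not None:
                longest_gap = max(longest_gap, now - last_success)
            last_success = now
        sleep(interval_s)
    return longest_gap
```

The result is only as precise as `interval_s`: in normal operation the gap equals the polling interval, so a recovery time like the 37 seconds above shows up as a single gap far larger than the interval.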
application.properties (Spring Boot main profile) description
# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create-drop
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
#spring.profiles.active=sharding-databases
#Activate sharding-tables configuration items
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
# spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave
application-sharding-tables.properties (Sharding-JDBC profile) description
## configure primary-key policy
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# configure the binding relation of t_order and t_order_item
spring.shardingsphere.sharding.binding-tables[0]=t_order,t_order_item
# configure broadcast tables
spring.shardingsphere.sharding.broadcast-tables=t_address
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true
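The `actual-data-nodes` and `algorithm-expression` values above use ShardingSphere's inline expression syntax. The sketch below imitates only the two pieces this profile relies on: `$->{a..b}` range expansion, and modulo routing on `order_id`. Real inline expressions are evaluated as Groovy and support much more, so treat this as an illustration of the semantics, not a parser for them.

```python
import re
from itertools import product

# Matches a $->{a..b} integer range inside an inline expression.
RANGE = re.compile(r"\$->\{(\d+)\.\.(\d+)\}")


def expand_ranges(expression):
    """Expand every $->{a..b} range into all concrete names, in order."""
    parts = RANGE.split(expression)  # [literal, a, b, literal, a, b, ..., literal]
    literals = parts[0::3]
    ranges = [range(int(a), int(b) + 1) for a, b in zip(parts[1::3], parts[2::3])]
    names = []
    for combo in product(*ranges):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += str(value) + literal
        names.append(name)
    return names


def route_table(logic_table, order_id, table_count=2):
    """Equivalent of <logic_table>_$->{order_id % table_count}."""
    return f"{logic_table}_{order_id % table_count}"
```

With this profile, `expand_ranges("ds.t_order_item_$->{0..1}")` gives the two physical tables, and `route_table("t_order_item", 7)` picks `t_order_item_1`, because 7 % 2 == 1.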
1. DDL operation
JPA automatically creates tables for testing. With Sharding-JDBC routing rules configured, when the client executes DDL, Sharding-JDBC automatically creates the corresponding tables according to the table sharding rules. Since t_address is a broadcast table and there is only one master instance, a single t_address table is created. Creating t_order produces two physical tables, t_order_0 and t_order_1.
2. Write operation
As shown in the figure below, the logic SQL inserts a record into t_order. When Sharding-JDBC executes it, the data is distributed to t_order_0 and t_order_1 according to the table sharding rules.
Because t_order and t_order_item are bound, associated order and order_item records are routed to physical tables with the same shard index.
3. Read operation
As shown in the figure below, for join queries on order and order_item under the binding tables, the physical shard is located precisely based on the binding relationship.
Join queries on order and order_item without a binding relationship traverse all shards.
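Why binding matters for joins can be shown by counting the physical table pairs a join between t_order and t_order_item has to visit. This hypothetical sketch assumes both tables are split into the same number of shards by `order_id % n`, as in the profile above; it is an illustration, not ShardingSphere's router.

```python
from itertools import product


def join_targets(shard_count, bound):
    """Physical (t_order, t_order_item) table pairs a join has to query."""
    orders = [f"t_order_{i}" for i in range(shard_count)]
    items = [f"t_order_item_{i}" for i in range(shard_count)]
    if bound:
        # Same sharding rule on both tables: rows of t_order_i can only
        # match rows of t_order_item_i, so pairs line up one-to-one.
        return list(zip(orders, items))
    # Without the binding rule every combination must be checked.
    return list(product(orders, items))
```

With 2 shards the bound join visits 2 pairs and the unbound one 4; with 4 shards it is 4 versus 16. The unbound cost grows with the square of the shard count, which is the Cartesian-product correlation that binding tables avoid.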
Create two instances on Aurora: ds_0 and ds_1.
When the sharding-spring-boot-jpa-example project is started, tables t_order, t_order_item, and t_address will be created on both Aurora instances.
application.properties (Spring Boot main profile) description
# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
# Activate sharding-databases configuration items
spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave
application-sharding-databases.properties (Sharding-JDBC profile) description
spring.shardingsphere.datasource.names=ds_0,ds_1
# ds_0
spring.shardingsphere.datasource.ds_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_0.jdbc-url=
spring.shardingsphere.datasource.ds_0.username=
spring.shardingsphere.datasource.ds_0.password=
# ds_1
spring.shardingsphere.datasource.ds_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_1.jdbc-url=
spring.shardingsphere.datasource.ds_1.username=
spring.shardingsphere.datasource.ds_1.password=
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
spring.shardingsphere.sharding.default-data-source-name=ds_0
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true
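Both sharded tables above use `key-generator.type=SNOWFLAKE` with `worker.id=123`. The sketch below shows how a Snowflake-style generator packs a millisecond timestamp, a worker id, and a per-millisecond sequence into one 64-bit integer. This packing is why IDs from different workers do not collide and stay roughly time-ordered, which matters when rows of one logic table are spread across shards. The 41/10/12 bit widths follow the classic Snowflake layout and the epoch is an arbitrary choice; both are assumptions here, and ShardingSphere's own generator may use a different epoch or details.

```python
import threading
import time


class SnowflakeIdGenerator:
    """Snowflake-style 64-bit IDs: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, epoch_ms=1_577_836_800_000, clock_ms=None):
        # epoch_ms defaults to 2020-01-01T00:00:00Z, an arbitrary choice for this sketch.
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate an ID")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.SEQUENCE_MASK
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self.epoch_ms) << (self.WORKER_BITS + self.SEQUENCE_BITS)
                    | self.worker_id << self.SEQUENCE_BITS
                    | self._sequence)
```

`SnowflakeIdGenerator(worker_id=123).next_id()` returns an ID whose bits 12 to 21 hold 123. Two application instances must be configured with different worker ids, otherwise they can issue the same ID in the same millisecond.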
1. DDL operation
JPA automatically creates tables for testing. With Sharding-JDBC's database sharding and routing rules configured, when the client executes DDL, Sharding-JDBC automatically creates the corresponding tables according to the sharding rules. Since t_address is a broadcast table, physical t_address tables are created on both ds_0 and ds_1. The three tables t_address, t_order, and t_order_item are each created on both ds_0 and ds_1.
2. Write operation
For the broadcast table t_address, each record written is written to the t_address tables of both ds_0 and ds_1.
Records of the t_order and t_order_item tables are written to the tables on the corresponding instance according to the database sharding column and routing policy.
3. Read operation
Queries on order are routed to the corresponding Aurora instance according to the database sharding rules.
For queries on address, since address is a broadcast table, one of the nodes holding it is selected at random and queried.
As shown in the figure below, for join queries on order and order_item under the binding tables, the physical shard is located precisely based on the binding relationship.
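The write and read behavior described above can be summarized in a small hypothetical router: sharded tables are sent to `ds_{user_id % n}`, mirroring `algorithm-expression=ds_$->{user_id % 2}`, while the broadcast table is written to every data source and read from any one of them. This is a sketch of the routing rules, not ShardingSphere's implementation; it also assumes the sharding column is always supplied for sharded tables.

```python
import random


class DatabaseShardingRouter:
    """Hypothetical router: sharded tables go to ds_{user_id % n};
    broadcast tables are written everywhere and read anywhere."""

    def __init__(self, data_sources, broadcast_tables):
        self.data_sources = list(data_sources)
        self.broadcast_tables = set(broadcast_tables)

    def _shard(self, user_id):
        # Mirrors ds_$->{user_id % 2} when there are two data sources.
        return self.data_sources[user_id % len(self.data_sources)]

    def write_targets(self, table, user_id=None):
        if table in self.broadcast_tables:
            return list(self.data_sources)  # every copy must receive the write
        return [self._shard(user_id)]

    def read_target(self, table, user_id=None, rng=random):
        if table in self.broadcast_tables:
            return rng.choice(self.data_sources)  # any copy holds the full data
        return self._shard(user_id)
```

This also shows the trade-off of broadcast tables: reads get cheaper (any node can serve them and joins stay local), while every write is multiplied by the number of data sources.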
As shown in the figure below, create two instances on Aurora: ds_0 and ds_1.
When the sharding-spring-boot-jpa-example project is started, physical tables t_order_0, t_order_1, t_order_item_0, and t_order_item_1, as well as the global table t_address, will be created on both Aurora instances.
application.properties (Spring Boot main profile) description
# Jpa automatically creates and drops data tables based on entities
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
# Activate sharding-databases-tables configuration items
#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
#spring.profiles.active=sharding-master-slave
application-sharding-databases-tables.properties (Sharding-JDBC profile) description
spring.shardingsphere.datasource.names=ds_0,ds_1
# ds_0
spring.shardingsphere.datasource.ds_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_0.jdbc-url= 306/dev?useSSL=false&characterEncoding=utf-8
spring.shardingsphere.datasource.ds_0.username=
spring.shardingsphere.datasource.ds_0.password=
spring.shardingsphere.datasource.ds_0.max-active=16
# ds_1
spring.shardingsphere.datasource.ds_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_1.jdbc-url=
spring.shardingsphere.datasource.ds_1.username=
spring.shardingsphere.datasource.ds_1.password=
spring.shardingsphere.datasource.ds_1.max-active=16
# default database sharding strategy
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
# Tables without a database sharding rule are placed on ds_0
spring.shardingsphere.sharding.default-data-source-name=ds_0
# t_order table splitting policy
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
# t_order_item table splitting policy
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# start shardingsphere log
spring.shardingsphere.props.sql.show=true
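With both levels configured, a statement is routed first to a database by `user_id % 2` and then to a table by `order_id % 2`. The hypothetical helper below lists the data nodes a statement on a sharded table would touch. A known sharding value narrows its level to one choice, and a missing one fans out to every choice at that level, which is why queries that omit the sharding columns are more expensive. It illustrates the rules in this profile and is not ShardingSphere's routing engine.

```python
def candidate_nodes(user_id=None, order_id=None, logic_table="t_order",
                    db_count=2, table_count=2):
    """Data nodes a statement on a sharded table must visit. A known sharding
    value selects one choice at its level; a missing one fans out to all."""
    dbs = [user_id % db_count] if user_id is not None else range(db_count)
    tables = [order_id % table_count] if order_id is not None else range(table_count)
    return [f"ds_{d}.{logic_table}_{t}" for d in dbs for t in tables]
```

For instance, `candidate_nodes(user_id=3, order_id=8)` resolves to the single node `ds_1.t_order_0`, while `candidate_nodes()` with no sharding values must visit all four data nodes.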
1. DDL operation
JPA automatically creates tables for testing. With Sharding-JDBC's sharding and routing rules configured, when the client executes DDL, Sharding-JDBC automatically creates the corresponding tables according to the table sharding rules. Since t_address is a broadcast table, t_address is created on both ds_0 and ds_1. The three tables t_address, t_order, and t_order_item are created on both ds_0 and ds_1.
2. Write operation
For the broadcast table t_address, each record written is also written to the t_address tables of both ds_0 and ds_1.
Records of the sharded tables t_order and t_order_item are written to the physical table on the corresponding instance according to the sharding columns (user_id for the database, order_id for the table) and the routing policy.
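The routing arithmetic behind the inline expressions ds_$->{user_id % 2} and t_order_$->{order_id % 2} can be sketched in plain Rust. This is only an illustration of how the configured expressions pick a data node, assuming two databases and two table shards as in the profile above; route and broadcast_targets are hypothetical helpers, not ShardingSphere APIs.

```rust
/// Number of database shards (ds_0, ds_1) and table shards per database
/// (e.g. t_order_0, t_order_1), matching the `{0..1}` ranges in the profile.
const DB_SHARDS: u64 = 2;
const TABLE_SHARDS: u64 = 2;

/// Hypothetical helper: physical data node for a sharded table
/// (t_order or t_order_item). Database from `user_id % 2`,
/// table suffix from `order_id % 2`.
fn route(logical_table: &str, user_id: u64, order_id: u64) -> String {
    format!(
        "ds_{}.{}_{}",
        user_id % DB_SHARDS,
        logical_table,
        order_id % TABLE_SHARDS
    )
}

/// Hypothetical helper: a broadcast table such as t_address is written
/// to every database.
fn broadcast_targets(table: &str) -> Vec<String> {
    (0..DB_SHARDS).map(|ds| format!("ds_{}.{}", ds, table)).collect()
}

fn main() {
    println!("{}", route("t_order", 7, 10)); // ds_1.t_order_0
    println!("{}", route("t_order_item", 7, 10)); // ds_1.t_order_item_0
    println!("{:?}", broadcast_targets("t_address")); // ["ds_0.t_address", "ds_1.t_address"]
}
```

Because t_order and t_order_item use the same column and the same expression, an order and its items always land in the same database and under the same table suffix.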
3. Read operation
The read operation is similar to the database sharding verification described in section 2.4.3.
The following figure shows the physical tables created on each database instance.
application.properties: Spring Boot main profile description
# JPA drops and re-creates the tables from the entities at startup
spring.jpa.properties.hibernate.hbm2ddl.auto=create
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
spring.jpa.properties.hibernate.show_sql=true
# activate one profile's configuration items (here: sharding-master-slave)
#spring.profiles.active=sharding-databases
#spring.profiles.active=sharding-tables
#spring.profiles.active=sharding-databases-tables
#spring.profiles.active=master-slave
spring.profiles.active=sharding-master-slave
application-sharding-master-slave.properties: Sharding-JDBC profile description
The url, name and password of the database need to be changed to your own database parameters.
spring.shardingsphere.datasource.names=ds_master_0,ds_master_1,ds_master_0_slave_0,ds_master_0_slave_1,ds_master_1_slave_0,ds_master_1_slave_1
spring.shardingsphere.datasource.ds_master_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0.jdbc-url=
spring.shardingsphere.datasource.ds_master_0.username=
spring.shardingsphere.datasource.ds_master_0.password=
spring.shardingsphere.datasource.ds_master_0.max-active=16
spring.shardingsphere.datasource.ds_master_0_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0_slave_0.jdbc-url=
spring.shardingsphere.datasource.ds_master_0_slave_0.username=
spring.shardingsphere.datasource.ds_master_0_slave_0.password=
spring.shardingsphere.datasource.ds_master_0_slave_0.max-active=16
spring.shardingsphere.datasource.ds_master_0_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_0_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_0_slave_1.jdbc-url=
spring.shardingsphere.datasource.ds_master_0_slave_1.username=
spring.shardingsphere.datasource.ds_master_0_slave_1.password=
spring.shardingsphere.datasource.ds_master_0_slave_1.max-active=16
spring.shardingsphere.datasource.ds_master_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1.jdbc-url=
spring.shardingsphere.datasource.ds_master_1.username=
spring.shardingsphere.datasource.ds_master_1.password=
spring.shardingsphere.datasource.ds_master_1.max-active=16
spring.shardingsphere.datasource.ds_master_1_slave_0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1_slave_0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1_slave_0.jdbc-url=
spring.shardingsphere.datasource.ds_master_1_slave_0.username=
spring.shardingsphere.datasource.ds_master_1_slave_0.password=
spring.shardingsphere.datasource.ds_master_1_slave_0.max-active=16
spring.shardingsphere.datasource.ds_master_1_slave_1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds_master_1_slave_1.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds_master_1_slave_1.jdbc-url=
spring.shardingsphere.datasource.ds_master_1_slave_1.username=admin
spring.shardingsphere.datasource.ds_master_1_slave_1.password=
spring.shardingsphere.datasource.ds_master_1_slave_1.max-active=16
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds_$->{user_id % 2}
spring.shardingsphere.sharding.binding-tables=t_order,t_order_item
spring.shardingsphere.sharding.broadcast-tables=t_address
spring.shardingsphere.sharding.default-data-source-name=ds_master_0
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds_$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order_item.actual-data-nodes=ds_$->{0..1}.t_order_item_$->{0..1}
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order_item.table-strategy.inline.algorithm-expression=t_order_item_$->{order_id % 2}
spring.shardingsphere.sharding.tables.t_order_item.key-generator.column=order_item_id
spring.shardingsphere.sharding.tables.t_order_item.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order_item.key-generator.props.worker.id=123
# master-slave rules: the master and slave data sources behind the logical data sources ds_0 and ds_1
spring.shardingsphere.sharding.master-slave-rules.ds_0.master-data-source-name=ds_master_0
spring.shardingsphere.sharding.master-slave-rules.ds_0.slave-data-source-names=ds_master_0_slave_0, ds_master_0_slave_1
spring.shardingsphere.sharding.master-slave-rules.ds_1.master-data-source-name=ds_master_1
spring.shardingsphere.sharding.master-slave-rules.ds_1.slave-data-source-names=ds_master_1_slave_0, ds_master_1_slave_1
# sharding-jdbc mode
spring.shardingsphere.mode.type=Memory
# print the SQL routed by ShardingSphere
spring.shardingsphere.props.sql.show=true
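The master-slave rules above split reads from writes inside each logical data source. The sketch below models one of them (ds_0) in plain Rust: writes go to the master, and reads rotate over the two slaves. The round-robin choice is an assumption for illustration, since ShardingSphere lets you configure the load-balancing algorithm; MasterSlave, for_write and for_read are hypothetical names.

```rust
use std::cell::Cell;

/// Hypothetical model of one logical master-slave data source such as ds_0.
struct MasterSlave {
    master: &'static str,
    slaves: Vec<&'static str>,
    next: Cell<usize>, // index of the slave that serves the next read
}

impl MasterSlave {
    fn new(master: &'static str, slaves: Vec<&'static str>) -> Self {
        assert!(!slaves.is_empty(), "at least one slave is required");
        MasterSlave { master, slaves, next: Cell::new(0) }
    }

    /// Writes (INSERT/UPDATE/DELETE) always go to the master.
    fn for_write(&self) -> &'static str {
        self.master
    }

    /// Reads rotate over the slaves (round-robin assumed for this sketch).
    fn for_read(&self) -> &'static str {
        let i = self.next.get();
        self.next.set((i + 1) % self.slaves.len());
        self.slaves[i]
    }
}

fn main() {
    let ds_0 = MasterSlave::new(
        "ds_master_0",
        vec!["ds_master_0_slave_0", "ds_master_0_slave_1"],
    );
    println!("INSERT -> {}", ds_0.for_write()); // INSERT -> ds_master_0
    for _ in 0..3 {
        // slave_0, then slave_1, then slave_0 again
        println!("SELECT -> {}", ds_0.for_read());
    }
}
```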
1. DDL operation
JPA automatically creates the tables for testing. Once Sharding-JDBC's database sharding and routing rules are configured and the client executes DDL, Sharding-JDBC automatically creates the corresponding tables according to the table sharding rules. Because t_address is a broadcast table, it is created on both ds_0 and ds_1. The three logical tables t_address, t_order and t_order_item are therefore created on each of ds_0 and ds_1.
2. Write operation
For the broadcast table t_address, each record written is also written to the t_address tables of both ds_0 and ds_1.
Records of t_order and t_order_item are written to the physical table on the corresponding instance according to the sharding columns and routing policy; writes always go to the master data sources.
3. Read operation
The join queries on t_order and t_order_item, which are configured as binding tables, are shown below.
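Why binding tables matter for these joins can be shown with a small count. Because t_order and t_order_item share the sharding column order_id and the same % 2 rule, rows with equal order_id always sit in tables with the same suffix, so only matching suffixes need to be joined instead of every combination. This is a simplified model of the routing outcome, not ShardingSphere's planner; join_pairs is a hypothetical helper.

```rust
/// Physical (t_order, t_order_item) pairs that a join must touch inside one
/// database. With binding tables, only equal suffixes are paired; without
/// them, every combination (a Cartesian product of the shards) is needed.
fn join_pairs(table_shards: u64, binding: bool) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for i in 0..table_shards {
        for j in 0..table_shards {
            if binding && i != j {
                continue; // binding tables never join across different suffixes
            }
            pairs.push((format!("t_order_{}", i), format!("t_order_item_{}", j)));
        }
    }
    pairs
}

fn main() {
    for (o, i) in join_pairs(2, true) {
        println!("{} JOIN {}", o, i); // t_order_0 JOIN t_order_item_0, t_order_1 JOIN t_order_item_1
    }
    println!("without binding: {} pairs", join_pairs(2, false).len()); // 4 pairs
}
```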
3. Conclusion
As an open-source product focused on database enhancement, ShardingSphere does well in terms of community activity, product maturity and documentation richness.
Among its products, ShardingSphere-JDBC is a client-side sharding solution that supports all sharding scenarios. It needs no intermediate layer such as ShardingSphere-Proxy, which reduces the complexity of operation and maintenance, and without that extra hop its latency is in theory lower than the Proxy's. In addition, ShardingSphere-JDBC supports a variety of SQL-standard relational databases, such as MySQL, PostgreSQL, Oracle and SQL Server.
However, because Sharding-JDBC is integrated into the application, it currently supports only Java and is tightly coupled to the application. Nevertheless, Sharding-JDBC keeps all sharding configuration separate from the application code, so switching to other middleware requires relatively small changes.
In conclusion, Sharding-JDBC is a good choice if you run a Java-based system that has to interconnect with different relational databases and you don't want to introduce an intermediate layer.
Author
Sun Jinhua
A senior solution architect at AWS, Sun is responsible for providing customers with cloud architecture design and consulting services. Before joining AWS, he ran his own business, specializing in building e-commerce platforms and designing the overall architecture of e-commerce platforms for automotive companies. He previously worked as a senior engineer at a leading global communications equipment company, responsible for the development and architecture design of multiple subsystems of an LTE equipment system. He has rich experience in the design of high-concurrency, high-availability systems, microservice architecture, databases, middleware, IoT, etc.
A new Cumulus-based Substrate node, ready for hacking :cloud:
This project is a fork of the Substrate Node Template modified to include dependencies required for registering this node as a parathread or parachain to an established relay chain.
👉 Learn more about parachains here, and parathreads here.
Follow these steps to prepare a local Substrate development environment :hammer_and_wrench:
If necessary, refer to the setup instructions at the Substrate Developer Hub.
Once the development environment is set up, build the Cumulus Parachain Template. This command will build the Wasm Runtime and native code:
cargo build --release
NOTE: In the following two sections, we document how to manually start a few relay chain nodes, start a parachain node (collator), and register the parachain with the relay chain.
We also have the polkadot-launch CLI tool, which automates the following steps and helps you easily launch relay chains and parachains. However, it is still worth going through the following procedures once to understand the mechanism for running and registering a parachain.
To operate a parathread or parachain, you must connect to a relay chain. Typically you would test on a local Rococo development network, then move to the testnet, and finally launch on the mainnet. Keep in mind that you need to configure the specific relay chain you will connect to in your collator chain_spec.rs. In the following examples, we will use rococo-local as the relay network.
Clone and build Polkadot (beware of the version tag we used):
# Get a fresh clone, or `cd` to where you have polkadot already:
git clone -b v0.9.7 --depth 1 https://github.com/paritytech/polkadot.git
cd polkadot
cargo build --release
First, we create the chain specification file (chainspec). Note the chainspec file must be generated on a single node and then shared among all nodes!
👉 Learn more about chain specification here.
./target/release/polkadot build-spec \
--chain rococo-local \
--raw \
--disable-default-bootnode \
> rococo_local.json
We need n + 1 full validator nodes running on a relay chain to accept n parachain / parathread connections. Here we will start two relay chain nodes so we can have one parachain node connecting in later.
From the Polkadot working directory:
# Start Relay `Alice` node
./target/release/polkadot \
--chain ./rococo_local.json \
-d /tmp/relay/alice \
--validator \
--alice \
--port 50555
Open a new terminal, same directory:
# Start Relay `Bob` node
./target/release/polkadot \
--chain ./rococo_local.json \
-d /tmp/relay/bob \
--validator \
--bob \
--port 50556
Add more nodes as needed, with non-conflicting ports, DB directories, and validator keys (--charlie, --dave, etc.).
To connect to a relay chain, you must first reserve a ParaId for your parathread that will become a parachain. To do this, you need a sufficient amount of currency in the network account to reserve the ID.
In this example, we will use the Charlie development account, which has funds available. Once you submit this extrinsic successfully, you can start your collators.
The easiest way to reserve your ParaId is via the Polkadot Apps UI, under the Parachains -> Parathreads tab, using the + ParaID button.
To operate your parachain, you need to specify the correct relay chain you will connect to in your collator chain_spec.rs. Specifically, you set the network you need in the Extensions of your ChainSpec::from_genesis() call:
Extensions {
relay_chain: "rococo-local".into(), // You MUST set this to the correct network!
para_id: id.into(),
},
You can choose from any pre-set runtime chainspec in the Polkadot repo by referring to the cli/src/command.rs and node/service/src/chain_spec.rs files, or generate your own and use that. See the Cumulus Workshop for how.
In the following examples, we will use the rococo-local relay network we set up in the last section.
We first generate the genesis state and genesis wasm needed for the parachain registration.
# Build the parachain node (from its top-level dir)
cd substrate-parachain-template
cargo build --release
# Folder to store resource files needed for parachain registration
mkdir -p resources
# Build the chainspec
./target/release/parachain-collator build-spec \
--disable-default-bootnode > ./resources/template-local-plain.json
# Build the raw chainspec file
./target/release/parachain-collator build-spec \
--chain=./resources/template-local-plain.json \
--raw --disable-default-bootnode > ./resources/template-local-raw.json
# Export genesis state to `./resources`, using 2000 as the ParaId
./target/release/parachain-collator export-genesis-state --parachain-id 2000 > ./resources/para-2000-genesis
# Export the genesis wasm
./target/release/parachain-collator export-genesis-wasm > ./resources/para-2000-wasm
NOTE: we have set the para_ID to 2000 here. This must be unique for all parathreads/chains on the relay chain you register with. You must reserve this first on the relay chain for the testnet or mainnet.
From the parachain template working directory:
# NOTE: this command assumes the chain spec is in a directory named `polkadot`
# that is at the same level of the template working directory. Change as needed.
#
# It also assumes a ParaId of 2000. Change as needed.
./target/release/parachain-collator \
-d /tmp/parachain/alice \
--collator \
--alice \
--force-authoring \
--ws-port 9945 \
--parachain-id 2000 \
-- \
--execution wasm \
--chain ../polkadot/rococo_local.json
Output:
2021-05-30 16:57:39 Parachain Collator Template
2021-05-30 16:57:39 ✌️ version 3.0.0-acce183-x86_64-linux-gnu
2021-05-30 16:57:39 ❤️ by Anonymous, 2017-2021
2021-05-30 16:57:39 📋 Chain specification: Local Testnet
2021-05-30 16:57:39 🏷 Node name: Alice
2021-05-30 16:57:39 👤 Role: AUTHORITY
2021-05-30 16:57:39 💾 Database: RocksDb at /tmp/parachain/alice/chains/local_testnet/db
2021-05-30 16:57:39 ⛓ Native runtime: template-parachain-1 (template-parachain-0.tx1.au1)
2021-05-30 16:57:41 Parachain id: Id(2000)
2021-05-30 16:57:41 Parachain Account: 5Ec4AhPUwPeyTFyuhGuBbD224mY85LKLMSqSSo33JYWCazU4
2021-05-30 16:57:41 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000000a96f42b5cb798190e5f679bb16970905087a9a9fc612fb5ca6b982b85783c0d03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2021-05-30 16:57:41 Is collating: yes
2021-05-30 16:57:41 [Parachain] 🔨 Initializing Genesis block/state (state: 0x0a96…3c0d, header-hash: 0xd42b…f271)
2021-05-30 16:57:41 [Parachain] ⏱ Loaded block-time = 12s from block 0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271
2021-05-30 16:57:43 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xace1…1b62, header-hash: 0xfa68…cf58)
2021-05-30 16:57:43 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.
2021-05-30 16:57:44 [Relaychain] ⏱ Loaded block-time = 6s from block 0xfa68f5abd2a80394b87c9bd07e0f4eee781b8c696d0a22c8e5ba38ae10e1cf58
2021-05-30 16:57:44 [Relaychain] 👶 Creating empty BABE epoch changes on what appears to be first startup.
2021-05-30 16:57:44 [Relaychain] 🏷 Local node identity is: 12D3KooWBjYK2W4dsBfsrFA9tZCStb5ogPb6STQqi2AK9awXfXyG
2021-05-30 16:57:44 [Relaychain] 📦 Highest known block at #0
2021-05-30 16:57:44 [Relaychain] 〽️ Prometheus server started at 127.0.0.1:9616
2021-05-30 16:57:44 [Relaychain] Listening for new connections on 127.0.0.1:9945.
2021-05-30 16:57:44 [Parachain] Using default protocol ID "sup" because none is configured in the chain specs
2021-05-30 16:57:44 [Parachain] 🏷 Local node identity is: 12D3KooWADBSC58of6ng2M29YTDkmWCGehHoUZhsy9LGkHgYscBw
2021-05-30 16:57:44 [Parachain] 📦 Highest known block at #0
2021-05-30 16:57:44 [Parachain] Unable to listen on 127.0.0.1:9945
2021-05-30 16:57:44 [Parachain] Unable to bind RPC server to 127.0.0.1:9945. Trying random port.
2021-05-30 16:57:44 [Parachain] Listening for new connections on 127.0.0.1:45141.
2021-05-30 16:57:45 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.42.204/tcp/30334/ws/p2p/12D3KooWBjYK2W4dsBfsrFA9tZCStb5ogPb6STQqi2AK9awXfXyG
2021-05-30 16:57:45 [Parachain] 🔍 Discovered new external address for our node: /ip4/192.168.42.204/tcp/30333/p2p/12D3KooWADBSC58of6ng2M29YTDkmWCGehHoUZhsy9LGkHgYscBw
2021-05-30 16:57:48 [Relaychain] ✨ Imported #8 (0xe60b…9b0a)
2021-05-30 16:57:49 [Relaychain] 💤 Idle (2 peers), best: #8 (0xe60b…9b0a), finalized #5 (0x1e6f…567c), ⬇ 4.5kiB/s ⬆ 2.2kiB/s
2021-05-30 16:57:49 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 2.0kiB/s ⬆ 1.7kiB/s
2021-05-30 16:57:54 [Relaychain] ✨ Imported #9 (0x1af9…c9be)
2021-05-30 16:57:54 [Relaychain] ✨ Imported #9 (0x6ed8…fdf6)
2021-05-30 16:57:54 [Relaychain] 💤 Idle (2 peers), best: #9 (0x1af9…c9be), finalized #6 (0x3319…69a2), ⬇ 1.8kiB/s ⬆ 0.5kiB/s
2021-05-30 16:57:54 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0.2kiB/s ⬆ 0.2kiB/s
2021-05-30 16:57:59 [Relaychain] 💤 Idle (2 peers), best: #9 (0x1af9…c9be), finalized #7 (0x5b50…1e5b), ⬇ 0.6kiB/s ⬆ 0.4kiB/s
2021-05-30 16:57:59 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 16:58:00 [Relaychain] ✨ Imported #10 (0xc9c9…1ca3)
You can see messages from both a relay chain node and a parachain node. This is because a relay chain light client runs alongside the parachain collator.
Now that you have two relay chain nodes and a parachain node (accompanied by a relay chain light client) running, the next step is to register the parachain on the relay chain with the following steps (for details, refer to the Substrate Cumulus Workshop):
1. Go to the Developer -> sudo page in the Polkadot Apps UI.
2. Pick paraSudoWrapper -> sudoScheduleParaInitialize(id, genesis) as the extrinsic type, shown below.
3. Set id: ParaId to 2,000 (or whatever ParaId you used above), and set the parachain: Bool option to Yes.
4. For genesisHead, drag in the genesis state file exported above, para-2000-genesis.
5. For validationCode, drag in the genesis wasm file exported above, para-2000-wasm.
Note: When registering to the public Rococo testnet, ensure you set a unique paraId larger than 1,000. Values below 1,000 are reserved exclusively for system parachains.
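Before registering, the settings that must line up (the relay_chain in Extensions and a unique ParaId larger than 1,000) can be checked locally. The sketch below is a hypothetical pre-flight check in plain Rust: the Extensions struct only mirrors the two fields set in chain_spec.rs above and is not the template's actual type, and check is not part of Cumulus or Polkadot.

```rust
/// Plain mirror of the two `Extensions` fields set in chain_spec.rs above
/// (hypothetical; the template's real type is not reproduced here).
struct Extensions {
    relay_chain: String,
    para_id: u32,
}

/// Hypothetical pre-flight check based on the notes in this section:
/// the relay chain must match, and the ParaId must be larger than 1,000
/// and not already used on that relay chain.
fn check(ext: &Extensions, expected_relay: &str, used_ids: &[u32]) -> Result<(), String> {
    if ext.relay_chain != expected_relay {
        return Err(format!(
            "relay_chain is {:?}, expected {:?}",
            ext.relay_chain, expected_relay
        ));
    }
    if ext.para_id <= 1000 {
        return Err(format!("ParaId {} is not larger than 1,000", ext.para_id));
    }
    if used_ids.contains(&ext.para_id) {
        return Err(format!("ParaId {} is already in use", ext.para_id));
    }
    Ok(())
}

fn main() {
    let ext = Extensions { relay_chain: "rococo-local".to_string(), para_id: 2000 };
    match check(&ext, "rococo-local", &[]) {
        Ok(()) => println!("ParaId {} on {} looks fine", ext.para_id, ext.relay_chain),
        Err(e) => println!("config problem: {}", e),
    }
}
```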
The collator node may need to be restarted to get it functioning as expected. After a new epoch starts on the relay chain, your parachain will come online. Once this happens, you should see the collator start reporting parachain blocks:
# Notice the relay epoch change! Only then do we start parachain collating!
#
2021-05-30 17:00:04 [Relaychain] 💤 Idle (2 peers), best: #30 (0xfc02…2a2a), finalized #28 (0x10ff…6539), ⬇ 1.0kiB/s ⬆ 0.3kiB/s
2021-05-30 17:00:04 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:06 [Relaychain] 👶 New epoch 3 launching at block 0x68bc…0605 (block slot 270402601 >= start slot 270402601).
2021-05-30 17:00:06 [Relaychain] 👶 Next epoch starts at slot 270402611
2021-05-30 17:00:06 [Relaychain] ✨ Imported #31 (0x68bc…0605)
2021-05-30 17:00:06 [Parachain] Starting collation. relay_parent=0x68bcc93d24a31a2c89800a56c7a2b275fe9ca7bd63f829b64588ae0d99280605 at=0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271
2021-05-30 17:00:06 [Parachain] 🙌 Starting consensus session on top of parent 0xd42bb78354bc21770e3f0930ed45c7377558d2d8e81ca4d457e573128aabf271
2021-05-30 17:00:06 [Parachain] 🎁 Prepared block for proposing at 1 [hash: 0xf6507812bf60bf53af1311f775aac03869be870df6b0406b2969784d0935cb92; parent_hash: 0xd42b…f271; extrinsics (2): [0x1bf5…1d76, 0x7c9b…4e23]]
2021-05-30 17:00:06 [Parachain] 🔖 Pre-sealed block for proposal at 1. Hash now 0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae, previously 0xf6507812bf60bf53af1311f775aac03869be870df6b0406b2969784d0935cb92.
2021-05-30 17:00:06 [Parachain] ✨ Imported #1 (0x80fc…ccae)
2021-05-30 17:00:06 [Parachain] Produced proof-of-validity candidate. block_hash=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae
2021-05-30 17:00:09 [Relaychain] 💤 Idle (2 peers), best: #31 (0x68bc…0605), finalized #29 (0xa6fa…9e16), ⬇ 1.2kiB/s ⬆ 129.9kiB/s
2021-05-30 17:00:09 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:12 [Relaychain] ✨ Imported #32 (0x5e92…ba30)
2021-05-30 17:00:12 [Relaychain] Moving approval window from session 0..=2 to 0..=3
2021-05-30 17:00:12 [Relaychain] ✨ Imported #32 (0x8144…74eb)
2021-05-30 17:00:14 [Relaychain] 💤 Idle (2 peers), best: #32 (0x5e92…ba30), finalized #29 (0xa6fa…9e16), ⬇ 1.4kiB/s ⬆ 0.2kiB/s
2021-05-30 17:00:14 [Parachain] 💤 Idle (0 peers), best: #0 (0xd42b…f271), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:18 [Relaychain] ✨ Imported #33 (0x8c30…9ccd)
2021-05-30 17:00:18 [Parachain] Starting collation. relay_parent=0x8c30ce9e6e9867824eb2aff40148ac1ed64cf464f51c5f2574013b44b20f9ccd at=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae
2021-05-30 17:00:19 [Relaychain] 💤 Idle (2 peers), best: #33 (0x8c30…9ccd), finalized #30 (0xfc02…2a2a), ⬇ 0.7kiB/s ⬆ 0.4kiB/s
2021-05-30 17:00:19 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:22 [Relaychain] 👴 Applying authority set change scheduled at block #31
2021-05-30 17:00:22 [Relaychain] 👴 Applying GRANDPA set change to new set [(Public(88dc3417d5058ec4b4503e0c12ea1a0a89be200fe98922423d4334014fa6b0ee (5FA9nQDV...)), 1), (Public(d17c2d7823ebf260fd138f2d7e27d114c0145d968b5ff5006125f2414fadae69 (5GoNkf6W...)), 1)]
2021-05-30 17:00:22 [Relaychain] 👴 Imported justification for block #31 that triggers command Changing authorities, signaling voter.
2021-05-30 17:00:24 [Relaychain] ✨ Imported #34 (0x211b…febf)
2021-05-30 17:00:24 [Parachain] Starting collation. relay_parent=0x211b3c53bebeff8af05e8f283d59fe171b7f91a5bf9c4669d88943f5a42bfebf at=0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae
2021-05-30 17:00:24 [Parachain] 🙌 Starting consensus session on top of parent 0x80fc151d7ccf228b802525022b6de257e42388ec7dc3c1dd7de491313650ccae
2021-05-30 17:00:24 [Parachain] 🎁 Prepared block for proposing at 2 [hash: 0x10fcb3180e966729c842d1b0c4d8d2c4028cfa8bef02b909af5ef787e6a6a694; parent_hash: 0x80fc…ccae; extrinsics (2): [0x4a6c…1fc6, 0x6b84…7cea]]
2021-05-30 17:00:24 [Parachain] 🔖 Pre-sealed block for proposal at 2. Hash now 0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0, previously 0x10fcb3180e966729c842d1b0c4d8d2c4028cfa8bef02b909af5ef787e6a6a694.
2021-05-30 17:00:24 [Parachain] ✨ Imported #2 (0x5087…b5a0)
2021-05-30 17:00:24 [Parachain] Produced proof-of-validity candidate. block_hash=0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0
2021-05-30 17:00:24 [Relaychain] 💤 Idle (2 peers), best: #34 (0x211b…febf), finalized #31 (0x68bc…0605), ⬇ 1.0kiB/s ⬆ 130.1kiB/s
2021-05-30 17:00:24 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:29 [Relaychain] 💤 Idle (2 peers), best: #34 (0x211b…febf), finalized #32 (0x5e92…ba30), ⬇ 0.2kiB/s ⬆ 0.1kiB/s
2021-05-30 17:00:29 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #0 (0xd42b…f271), ⬇ 0 ⬆ 0
2021-05-30 17:00:30 [Relaychain] ✨ Imported #35 (0xee07…38a0)
2021-05-30 17:00:34 [Relaychain] 💤 Idle (2 peers), best: #35 (0xee07…38a0), finalized #33 (0x8c30…9ccd), ⬇ 0.9kiB/s ⬆ 0.3kiB/s
2021-05-30 17:00:34 [Parachain] 💤 Idle (0 peers), best: #1 (0x80fc…ccae), finalized #1 (0x80fc…ccae), ⬇ 0 ⬆ 0
2021-05-30 17:00:36 [Relaychain] ✨ Imported #36 (0xe8ce…4af6)
2021-05-30 17:00:36 [Parachain] Starting collation. relay_parent=0xe8cec8015c0c7bf508bf3f2f82b1696e9cca078e814b0f6671f0b0d5dfe84af6 at=0x5087fd06b1b73d90cfc3ad175df8495b378fffbb02fea212cc9e49a00fd8b5a0
2021-05-30 17:00:39 [Relaychain] 💤 Idle (2 peers), best: #36 (0xe8ce…4af6), finalized #33 (0x8c30…9ccd), ⬇ 0.6kiB/s ⬆ 0.1kiB/s
2021-05-30 17:00:39 [Parachain] 💤 Idle (0 peers), best: #2 (0x5087…b5a0), finalized #1 (0x80fc…ccae), ⬇ 0 ⬆ 0
Note the delay here! It may take some time for your relay chain to enter a new epoch.
Is this Cumulus Parachain Template compatible with the Rococo and Westend testnets? Yes!
See the Cumulus Workshop for the latest instructions to register a parathread/parachain on a relay chain.
NOTE: When running the relay chain and parachain, you must use the same tagged version of Polkadot and Cumulus so that the collator registers successfully with the relay chain. You should test registering your parachain locally before attempting to connect to any running relay chain network!
Find chainspec files to connect to live networks here. Be sure to use the correct git release tag in these files, as they change from time to time and must match the live network!
These networks are under constant development, so please follow their progress and update your parachains in lockstep with the testnet changes if you wish to connect to the network. Do join the Parachain Technical matrix chat room to ask questions and connect with the parachain building teams.
Download Details:
Author: aresprotocols
Source Code: https://github.com/aresprotocols/substrate-parachain-template
License: Unlicense License
REST stands for “REpresentational State Transfer”, an architectural style designed to take advantage of the existing HTTP protocol when used for web APIs. It is flexible in that it can handle different kinds of calls and data formats. Because a REST API is not constrained to XML like SOAP, it can return other formats (such as JSON) depending on what is needed. A service that adheres to this style is considered a “RESTful” application. REST allows components to access and manage functions within another application.
REST was first defined by Roy Fielding in his doctoral dissertation in 2000. He proposed these standards as an alternative to SOAP (the Simple Object Access Protocol, a standard for accessing objects and exchanging structured messages within a distributed computing environment). REST (or RESTful) defines the general rules that regulate the interactions between web apps using the HTTP protocol for CRUD (create, retrieve, update, delete) operations.
An API (or Application Programming Interface) provides a method of interaction between two systems.
A RESTful API (or application program interface) uses HTTP requests to GET, PUT, POST, and DELETE data following the REST standards. This allows two pieces of software to communicate with each other. In essence, REST API is a set of remote calls using standard methods to return data in a specific format.
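The CRUD-to-HTTP mapping described above can be sketched in Rust. The /users resource is a hypothetical example, and using PUT for updates follows the common convention (PATCH is also used for partial updates); the code only builds request lines and does not send anything.

```rust
/// The four CRUD operations named in the text.
#[derive(Debug, Clone, Copy)]
enum Crud {
    Create,
    Retrieve,
    Update,
    Delete,
}

/// Conventional HTTP method for each CRUD operation.
fn method(op: Crud) -> &'static str {
    match op {
        Crud::Create => "POST",
        Crud::Retrieve => "GET",
        Crud::Update => "PUT",
        Crud::Delete => "DELETE",
    }
}

/// Build the request line for a hypothetical `/users` resource:
/// the collection URI when no id is given, otherwise a single item.
fn request_line(op: Crud, id: Option<u32>) -> String {
    let path = match id {
        Some(id) => format!("/users/{}", id),
        None => String::from("/users"),
    };
    format!("{} {} HTTP/1.1", method(op), path)
}

fn main() {
    println!("{}", request_line(Crud::Create, None)); // POST /users HTTP/1.1
    println!("{}", request_line(Crud::Retrieve, Some(42))); // GET /users/42 HTTP/1.1
    println!("{}", request_line(Crud::Update, Some(42))); // PUT /users/42 HTTP/1.1
    println!("{}", request_line(Crud::Delete, Some(42))); // DELETE /users/42 HTTP/1.1
}
```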
The systems that interact in this manner can be very different: each app may use a different programming language, operating system, database, etc. So how do we create a system that can easily communicate with and understand other apps? This is where a REST API is used as an interaction system.
When using a RESTful API, we should determine in advance what resources we want to expose to the outside world. Typically, the RESTful API service is implemented, keeping the following ideas in mind:
The features of the REST API design style state:
For REST to fit this model, we must adhere to the following rules:
A symbolic natural language parsing library for Rust, inspired by HPSG.
This is a library for parsing natural or constructed languages into syntax trees and feature structures. There's no machine learning or probabilistic models, everything is hand-crafted and deterministic.
You can find out more about the motivations of this project in this blog post.
I'm using this to parse a constructed language for my upcoming xenolinguistics game, Themengi.
Using a simple 80-line grammar, introduced in the tutorial below, we can parse a simple subset of English, checking reflexive pronoun binding, case, and number agreement.
$ cargo run --bin cli examples/reflexives.fgr
> she likes himself
Parsed 0 trees
> her likes herself
Parsed 0 trees
> she like herself
Parsed 0 trees
> she likes herself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: she))
(1..2: TV (1..2: likes))
(2..3: N (2..3: herself)))
[
child-2: [
case: acc
pron: ref
needs_pron: #0 she
num: sg
child-0: [ word: herself ]
]
child-1: [
tense: nonpast
child-0: [ word: likes ]
num: #1 sg
]
child-0: [
child-0: [ word: she ]
case: nom
pron: #0
num: #1
]
]
Low resource language? Low problem! No need to train on gigabytes of text, just write a grammar using your brain. Let's hypothesize that in American Sign Language, topicalized nouns (expressed with raised eyebrows) must appear first in the sentence. We can write a small grammar (18 lines), and plug in some sentences:
$ cargo run --bin cli examples/asl-wordorder.fgr -n
> boy sit
Parsed 1 tree
(0..2: S
(0..1: NP ((0..1: N (0..1: boy))))
(1..2: IV (1..2: sit)))
> boy throw ball
Parsed 1 tree
(0..3: S
(0..1: NP ((0..1: N (0..1: boy))))
(1..2: TV (1..2: throw))
(2..3: NP ((2..3: N (2..3: ball)))))
> ball nm-raised-eyebrows boy throw
Parsed 1 tree
(0..4: S
(0..2: NP
(0..1: N (0..1: ball))
(1..2: Topic (1..2: nm-raised-eyebrows)))
(2..3: NP ((2..3: N (2..3: boy))))
(3..4: TV (3..4: throw)))
> boy throw ball nm-raised-eyebrows
Parsed 0 trees
As an example, let's say we want to build a parser for English reflexive pronouns (himself, herself, themselves, themself, itself). We'll also support number ("He likes X" vs. "They like X") and simple embedded clauses ("He said that they like X").
Grammar files are written in a custom language, similar to BNF, called Feature GRammar (.fgr). There's a VSCode syntax highlighting extension for these files, available as fgr-syntax.
We'll start by defining our lexicon. The lexicon is the set of terminal symbols (symbols in the actual input) that the grammar will match. Terminal symbols must start with a lowercase letter, and non-terminal symbols must start with an uppercase letter.
// pronouns
N -> he
N -> him
N -> himself
N -> she
N -> her
N -> herself
N -> they
N -> them
N -> themselves
N -> themself
// names, lowercase as they are terminals
N -> mary
N -> sue
N -> takeshi
N -> robert
// complementizer
Comp -> that
// verbs -- intransitive, transitive, and clausal
IV -> falls
IV -> fall
IV -> fell
TV -> likes
TV -> like
TV -> liked
CV -> says
CV -> say
CV -> said
Next, we can add our sentence rules (they must be added at the top, as the first rule in the file is assumed to be the top-level rule):
// sentence rules
S -> N IV
S -> N TV N
S -> N CV Comp S
// ... previous lexicon ...
Assuming this file is saved as examples/no-features.fgr (which it is :wink:), we can test this file with the built-in CLI:
$ cargo run --bin cli examples/no-features.fgr
> he falls
Parsed 1 tree
(0..2: S
(0..1: N (0..1: he))
(1..2: IV (1..2: falls)))
[
child-1: [ child-0: [ word: falls ] ]
child-0: [ child-0: [ word: he ] ]
]
> he falls her
Parsed 0 trees
> he likes her
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: likes))
(2..3: N (2..3: her)))
[
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: likes ] ]
child-0: [ child-0: [ word: he ] ]
]
> he likes
Parsed 0 trees
> he said that he likes her
Parsed 1 tree
(0..6: S
(0..1: N (0..1: he))
(1..2: CV (1..2: said))
(2..3: Comp (2..3: that))
(3..6: S
(3..4: N (3..4: he))
(4..5: TV (4..5: likes))
(5..6: N (5..6: her))))
[
child-0: [ child-0: [ word: he ] ]
child-2: [ child-0: [ word: that ] ]
child-1: [ child-0: [ word: said ] ]
child-3: [
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: likes ] ]
child-0: [ child-0: [ word: he ] ]
]
]
> he said that he
Parsed 0 trees
This grammar already parses some correct sentences, and blocks some trivially incorrect ones. However, it doesn't care about number, case, or reflexives right now:
> she likes himself // unbound reflexive pronoun
Parsed 1 tree
(0..3: S
(0..1: N (0..1: she))
(1..2: TV (1..2: likes))
(2..3: N (2..3: himself)))
[
child-0: [ child-0: [ word: she ] ]
child-2: [ child-0: [ word: himself ] ]
child-1: [ child-0: [ word: likes ] ]
]
> him like her // incorrect case on the subject pronoun, should be nominative
// (he) instead of accusative (him)
Parsed 1 tree
(0..3: S
(0..1: N (0..1: him))
(1..2: TV (1..2: like))
(2..3: N (2..3: her)))
[
child-0: [ child-0: [ word: him ] ]
child-1: [ child-0: [ word: like ] ]
child-2: [ child-0: [ word: her ] ]
]
> he like her // incorrect verb number agreement
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: like))
(2..3: N (2..3: her)))
[
child-2: [ child-0: [ word: her ] ]
child-1: [ child-0: [ word: like ] ]
child-0: [ child-0: [ word: he ] ]
]
To fix this, we need to add features to our lexicon, and restrict the sentence rules based on features.
Features are added with square brackets, and are key: value pairs separated by commas. **top** is a special feature value, which basically means "unspecified" -- we'll come back to it later. Features that are unspecified are also assumed to have a **top** value, but sometimes explicitly stating **top** is clearer.
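The following hypothetical Rust sketch shows how a missing feature and an explicit **top** end up equivalent. It parses a bracketed feature list into a map and treats absent keys as Top. It handles only flat key: value pairs. Tags like #1, which appear later, would just be read as plain values here.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Top,          // "unspecified": written **top**, or the key is simply absent
    Atom(String), // a concrete value such as sg, nom, or he
}

type Features = BTreeMap<String, Value>;

/// Parse a bracketed feature list like "[ num: sg, case: **top** ]" (hypothetical helper).
fn parse_features(src: &str) -> Result<Features, String> {
    let inner = src
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("expected [ ... ] but got {:?}", src))?;
    let mut feats = Features::new();
    for pair in inner.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once(':')
            .ok_or_else(|| format!("expected key: value but got {:?}", pair))?;
        let value = match value.trim() {
            "**top**" => Value::Top,
            v => Value::Atom(v.to_string()),
        };
        feats.insert(key.trim().to_string(), value);
    }
    Ok(feats)
}

/// Look up a feature; a missing key behaves exactly like an explicit **top**.
fn get(feats: &Features, key: &str) -> Value {
    feats.get(key).cloned().unwrap_or(Value::Top)
}

fn main() {
    let mary = parse_features("[ num: sg, case: **top**, pron: she ]").unwrap();
    println!("case = {:?}", get(&mary, "case"));             // written explicitly as **top**
    println!("needs_pron = {:?}", get(&mary, "needs_pron")); // absent, so also Top
}
```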
/// Pronouns
// The added features are:
// * num: sg or pl, whether this noun wants a singular verb (likes) or
// a plural verb (like). note this is grammatical number, so for example
// singular they takes plural agreement ("they like X", not *"they likes X")
// * case: nom or acc, whether this noun is nominative or accusative case.
// nominative case goes in the subject, and accusative in the object.
// e.g., "he fell" and "she likes him", not *"him fell" and *"her likes he"
// * pron: he, she, they, or ref -- what type of pronoun this is
// * needs_pron: whether this is a reflexive that needs to bind to another
// pronoun.
N[ num: sg, case: nom, pron: he ] -> he
N[ num: sg, case: acc, pron: he ] -> him
N[ num: sg, case: acc, pron: ref, needs_pron: he ] -> himself
N[ num: sg, case: nom, pron: she ] -> she
N[ num: sg, case: acc, pron: she ] -> her
N[ num: sg, case: acc, pron: ref, needs_pron: she] -> herself
N[ num: pl, case: nom, pron: they ] -> they
N[ num: pl, case: acc, pron: they ] -> them
N[ num: pl, case: acc, pron: ref, needs_pron: they ] -> themselves
N[ num: sg, case: acc, pron: ref, needs_pron: they ] -> themself
// Names
// The added features are:
// * num: sg, as people are singular ("mary likes her" / *"mary like her")
// * case: **top**, as names can be both subjects and objects
// ("mary likes her" / "she likes mary")
// * pron: whichever pronoun the person uses for reflexive agreement
// mary pron: she => mary likes herself
// sue pron: they => sue likes themself
// takeshi pron: he => takeshi likes himself
N[ num: sg, case: **top**, pron: she ] -> mary
N[ num: sg, case: **top**, pron: they ] -> sue
N[ num: sg, case: **top**, pron: he ] -> takeshi
N[ num: sg, case: **top**, pron: he ] -> robert
// Complementizer doesn't need features
Comp -> that
// Verbs -- intransitive, transitive, and clausal
// The added features are:
// * num: sg, pl, or **top** -- to match the noun numbers.
// **top** will match either sg or pl, as past-tense verbs in English
// don't agree in number: "he fell" and "they fell" are both fine
// * tense: past or nonpast -- this won't be used for agreement, but will be
// copied into the final feature structure, and the client code could do
// something with it
IV[ num: sg, tense: nonpast ] -> falls
IV[ num: pl, tense: nonpast ] -> fall
IV[ num: **top**, tense: past ] -> fell
TV[ num: sg, tense: nonpast ] -> likes
TV[ num: pl, tense: nonpast ] -> like
TV[ num: **top**, tense: past ] -> liked
CV[ num: sg, tense: nonpast ] -> says
CV[ num: pl, tense: nonpast ] -> say
CV[ num: **top**, tense: past ] -> said
Now that our lexicon is updated with features, we can update our sentence rules to constrain parsing based on those features. This uses two new concepts: tags and unification. Tags (written #1, #2, ...) allow features to be associated between nodes in a rule, and unification controls how those features are compatible. The rules for unification are:
* **top** unifies with anything, and takes on the other value.
* Two identical values (e.g. sg and sg) unify.
* Two different values (e.g. sg and pl) fail to unify.
If unification fails anywhere, the parse is aborted and the tree is discarded. This allows the programmer to discard trees if features don't match.
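Unification for plain values fits in a few lines. **top** unifies with anything and takes on the other side's value, identical values unify, and different values fail. The Rust sketch below is a hypothetical simplification over flat feature maps with no tags, nesting, or structure sharing. It illustrates the rule and is not treebender's implementation.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Top,
    Atom(String),
}

/// Unify two atomic values. Returns None when they are incompatible.
fn unify_value(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        // **top** is compatible with anything and takes on the other value.
        (Value::Top, other) | (other, Value::Top) => Some(other.clone()),
        (Value::Atom(x), Value::Atom(y)) if x == y => Some(a.clone()),
        _ => None, // two different concrete values
    }
}

/// Unify two flat feature maps; a key missing on one side counts as **top**.
fn unify(a: &BTreeMap<String, Value>, b: &BTreeMap<String, Value>) -> Option<BTreeMap<String, Value>> {
    let mut out = a.clone();
    for (key, bv) in b {
        let merged = match a.get(key) {
            Some(av) => unify_value(av, bv)?, // any failure aborts the whole unification
            None => bv.clone(),
        };
        out.insert(key.clone(), merged);
    }
    Some(out)
}

/// Hypothetical helper: build a feature map, reading "**top**" as Top.
fn feats(pairs: &[(&str, &str)]) -> BTreeMap<String, Value> {
    pairs
        .iter()
        .map(|&(k, v)| {
            let value = if v == "**top**" { Value::Top } else { Value::Atom(v.to_string()) };
            (k.to_string(), value)
        })
        .collect()
}

fn main() {
    let subject_slot = feats(&[("case", "nom")]);
    let mary = feats(&[("num", "sg"), ("case", "**top**"), ("pron", "she")]);
    println!("{:?}", unify(&mary, &subject_slot)); // mary can fill a subject slot
    let him = feats(&[("num", "sg"), ("case", "acc"), ("pron", "he")]);
    println!("{:?}", unify(&him, &subject_slot)); // acc vs nom fails
}
```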
// Sentence rules
// Intransitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
S -> N[ case: nom, num: #1 ] IV[ num: #1 ]
// Transitive verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #2)
// * If there's a reflexive in the object position, make sure its `needs_pron`
// feature matches the subject's `pron` feature. If the object isn't a
// reflexive, then its `needs_pron` feature will implicitly be `**top**`, so
// will unify with anything.
S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
// Clausal verb:
// * Subject must be nominative case
// * Subject and verb must agree in number (copied through #1)
// * Reflexives can't cross clause boundaries (*"He said that she likes himself"),
// so we can ignore reflexives and delegate to inner clause rule
S -> N[ case: nom, num: #1 ] CV[ num: #1 ] Comp S
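Tags are where features get shared between nodes. The Rust sketch below is a rough, hypothetical model of the transitive rule. It binds each tag number to a value the first time it's seen and requires every later occurrence to unify with that value. Treebender itself shares nodes in a feature DAG rather than keeping a side table like this. The lexicon is a hand-copied subset of the one above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Top,                // unspecified
    Atom(&'static str), // a concrete value like sg or he
}

/// One entry in a rule's feature bracket: a literal value, or a tag such as #1.
enum Constraint {
    Is(&'static str),
    Tag(u32),
}

fn unify_value(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Top, other) | (other, Value::Top) => Some(other.clone()),
        (Value::Atom(x), Value::Atom(y)) if x == y => Some(a.clone()),
        _ => None,
    }
}

/// Hypothetical lookup table: a subset of the featured lexicon above.
fn lexical_features(word: &str) -> Option<Vec<(&'static str, &'static str)>> {
    Some(match word {
        "he" => vec![("num", "sg"), ("case", "nom"), ("pron", "he")],
        "him" => vec![("num", "sg"), ("case", "acc"), ("pron", "he")],
        "himself" => vec![("num", "sg"), ("case", "acc"), ("pron", "ref"), ("needs_pron", "he")],
        "she" => vec![("num", "sg"), ("case", "nom"), ("pron", "she")],
        "her" => vec![("num", "sg"), ("case", "acc"), ("pron", "she")],
        "herself" => vec![("num", "sg"), ("case", "acc"), ("pron", "ref"), ("needs_pron", "she")],
        "mary" => vec![("num", "sg"), ("pron", "she")], // case omitted, i.e. **top**
        "likes" => vec![("num", "sg"), ("tense", "nonpast")],
        "like" => vec![("num", "pl"), ("tense", "nonpast")],
        _ => return None,
    })
}

/// Check S -> N[ case: nom, pron: #1, num: #2 ] TV[ num: #2 ] N[ case: acc, needs_pron: #1 ]
fn transitive_ok(subj: &str, verb: &str, obj: &str) -> bool {
    use Constraint::{Is, Tag};
    let rule: [Vec<(&str, Constraint)>; 3] = [
        vec![("case", Is("nom")), ("pron", Tag(1)), ("num", Tag(2))],
        vec![("num", Tag(2))],
        vec![("case", Is("acc")), ("needs_pron", Tag(1))],
    ];
    // Each tag number maps to the value that every node carrying that tag must share.
    let mut tags: HashMap<u32, Value> = HashMap::new();
    for (word, constraints) in [subj, verb, obj].iter().zip(rule.iter()) {
        let feats = match lexical_features(word) {
            Some(f) => f,
            None => return false, // unknown word
        };
        for (key, constraint) in constraints {
            // A feature the word doesn't mention is **top**.
            let actual = feats
                .iter()
                .find(|(k, _)| k == key)
                .map_or(Value::Top, |&(_, v)| Value::Atom(v));
            let ok = match constraint {
                Is(v) => unify_value(&actual, &Value::Atom(*v)).is_some(),
                Tag(t) => {
                    let bound = tags.get(t).cloned().unwrap_or(Value::Top);
                    match unify_value(&bound, &actual) {
                        Some(v) => {
                            tags.insert(*t, v);
                            true
                        }
                        None => false,
                    }
                }
            };
            if !ok {
                return false; // unification failed: discard this parse
            }
        }
    }
    true
}

fn main() {
    for (s, v, o) in [("he", "likes", "himself"), ("he", "likes", "herself"), ("he", "like", "him")].iter() {
        println!("{} {} {}: {}", s, v, o, transitive_ok(s, v, o));
    }
}
```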
Now that we have this augmented grammar (available as examples/reflexives.fgr), we can try it out and see that it rejects illicit sentences that were previously accepted, while still accepting valid ones:
> he fell
Parsed 1 tree
(0..2: S
(0..1: N (0..1: he))
(1..2: IV (1..2: fell)))
[
child-1: [
child-0: [ word: fell ]
num: #0 sg
tense: past
]
child-0: [
pron: he
case: nom
num: #0
child-0: [ word: he ]
]
]
> he like him
Parsed 0 trees
> he likes himself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: he))
(1..2: TV (1..2: likes))
(2..3: N (2..3: himself)))
[
child-1: [
num: #0 sg
child-0: [ word: likes ]
tense: nonpast
]
child-2: [
needs_pron: #1 he
num: sg
child-0: [ word: himself ]
pron: ref
case: acc
]
child-0: [
child-0: [ word: he ]
pron: #1
num: #0
case: nom
]
]
> he likes herself
Parsed 0 trees
> mary likes herself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: mary))
(1..2: TV (1..2: likes))
(2..3: N (2..3: herself)))
[
child-0: [
pron: #0 she
num: #1 sg
case: nom
child-0: [ word: mary ]
]
child-1: [
tense: nonpast
child-0: [ word: likes ]
num: #1
]
child-2: [
child-0: [ word: herself ]
num: sg
pron: ref
case: acc
needs_pron: #0
]
]
> mary likes themself
Parsed 0 trees
> sue likes themself
Parsed 1 tree
(0..3: S
(0..1: N (0..1: sue))
(1..2: TV (1..2: likes))
(2..3: N (2..3: themself)))
[
child-0: [
pron: #0 they
child-0: [ word: sue ]
case: nom
num: #1 sg
]
child-1: [
tense: nonpast
num: #1
child-0: [ word: likes ]
]
child-2: [
needs_pron: #0
case: acc
pron: ref
child-0: [ word: themself ]
num: sg
]
]
> sue likes himself
Parsed 0 trees
If this is interesting to you and you want to learn more, you can check out my blog series, the excellent textbook Syntactic Theory: A Formal Introduction (2nd ed.), and the DELPH-IN project, whose work on the LKB inspired this simplified version.
I need to write this section in more detail, but if you're comfortable with Rust, I suggest looking through the codebase. It's not perfect: it started as one of my first Rust projects (after migrating through F# -> TypeScript -> C in search of the right performance/ergonomics tradeoff), and it could use more tests, but overall it's not too bad.
Basically, the processing pipeline is:
1. Build a Grammar struct. Grammar is defined in rules.rs. A Grammar can be created with Grammar::parse_from_file, which is mostly a hand-written recursive descent parser in parse_grammar.rs. Yes, I recognize the irony here.
2. Parse input, either with Grammar::parse, which does everything for you, or with Grammar::parse_chart, which just does the chart.
3. Chart parsing lives in earley.rs.
4. A parse forest is built from the chart in forest.rs, using an algorithm I found in a very useful blog series I forget the URL for, because the algorithms in the academic literature for this are... weird.

The most interesting thing you can do via code and not via the CLI is probably getting at the raw feature DAG, as that would let you do things like pronoun coreference. The DAG code is in featurestructure.rs, and should be fairly approachable -- there's a lot of Rust ceremony around Rc<RefCell<...>> because using an arena allocation crate seemed like overkill, but that is somewhat mitigated by the NodeRef type alias. Hit me up at https://vgel.me/contact if you need help with anything here!
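The following hypothetical and heavily simplified sketch of a feature DAG node suggests why Rc<RefCell<...>> shows up. In this version, unification doesn't copy values. It redirects one node to the other, so every parent holding either handle sees the merged value. That mirrors the shared num: #0 sg entry in the "he fell" output above. The Node enum, the forwarding-pointer strategy, and this definition of NodeRef are assumptions for illustration. Treebender's featurestructure.rs may be organized differently.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// One node of a (heavily simplified) feature DAG.
enum Node {
    Top,                // unspecified
    Str(String),        // an atomic value like "sg"
    Forwarded(NodeRef), // merged into another node by unification
}

/// Shared, mutable handle to a node (assumed shape of the alias).
type NodeRef = Rc<RefCell<Node>>;

fn new_node(node: Node) -> NodeRef {
    Rc::new(RefCell::new(node))
}

/// Follow forwarding pointers until reaching the node that holds the real value.
fn dereference(node: &NodeRef) -> NodeRef {
    let next = match &*node.borrow() {
        Node::Forwarded(target) => Some(target.clone()),
        _ => None,
    };
    match next {
        Some(target) => dereference(&target),
        None => node.clone(),
    }
}

/// Unify two atomic nodes in place; on success every handle to either sees one value.
fn unify(a: &NodeRef, b: &NodeRef) -> Result<(), String> {
    let (a, b) = (dereference(a), dereference(b));
    if Rc::ptr_eq(&a, &b) {
        return Ok(()); // already the same node
    }
    let a_top = matches!(*a.borrow(), Node::Top);
    let b_top = matches!(*b.borrow(), Node::Top);
    if a_top {
        *a.borrow_mut() = Node::Forwarded(b.clone());
        return Ok(());
    }
    if b_top {
        *b.borrow_mut() = Node::Forwarded(a.clone());
        return Ok(());
    }
    let same = match (&*a.borrow(), &*b.borrow()) {
        (Node::Str(x), Node::Str(y)) => x == y,
        _ => false,
    };
    if same {
        *b.borrow_mut() = Node::Forwarded(a.clone());
        Ok(())
    } else {
        Err("values do not unify".to_string())
    }
}

/// Read the atomic value behind a handle, if any.
fn value(node: &NodeRef) -> Option<String> {
    let node = dereference(node);
    // Bind the result so the Ref guard is dropped before `node` goes out of scope.
    let result = match &*node.borrow() {
        Node::Str(s) => Some(s.clone()),
        _ => None,
    };
    result
}

fn main() {
    // "he fell": the subject's num is sg, while fell has num: **top**.
    let subject_num = new_node(Node::Str("sg".to_string()));
    let verb_num = new_node(Node::Top);
    unify(&subject_num, &verb_num).unwrap();
    println!("verb num is now {:?}", value(&verb_num));
}
```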
Download Details:
Author: vgel
Source Code: https://github.com/vgel/treebender
License: MIT License