Motivation

I read a lot of online news sites to help me catch up with the latest stories. From time to time, I found that some news stories from different sources are simply rephrasing each other. This happens sometimes with technical articles too, but I digress.

This love of reading news stories lead me into the idea of text summarization. Wouldn’t it be great if I can summarize across multiple news sources and let me gather all the information?

I have another goal in mind. I love computer languages. Rust has been on my radar for a long time but I have never come up with a good excuse to learn to use it. I figured that a data process program like text summarization could be a good fit to learn about the gory details of memory ownership.

Text Summarization in Short

There are basically two techniques when it comes to text summarization: Abstractive and Extractive.

Abstraction-based summarization takes words from the original article based on semantic meaning. It can sometimes pick words from another source if the words fit the meaning. The idea is not unlike how a human would have summarize a piece of text. As you can imagine, this is not an easy problem and would almost require some form of machine understanding.

Extraction-based summarization takes a different approach. Instead of trying to understand the underlying text. It uses some mathematical formulas to rank each sentence from the article and output only sentences that are above certain score. This way, the meaning of the original text is mostly preserved without coding the machine to understand.

In this article, we will use an extraction based summarization technique. It’s perfect for individual developer, like me (or you), to experiment with and to appreciate the problem.

#rust

A Simple Text Summarizer written in Rust
2.05 GEEK