How to Convert PDF to Text in Java

Utilize Optical Character Recognition (OCR) technology to convert a PDF to text using an API in Java.

Without the ability to copy, paste, or edit within a PDF document, it can be a frustrating task to manually transcribe a PDF to text. Fortunately for us, we have Optical Character Recognition (OCR) technology to help us out. We have discussed this a bit in previous articles, but to clarify, optical character recognition or optical character reader is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text.

OCR is most popular as a form of data entry for printed paper data records, but it is also frequently used to digitize printed texts so that they can be edited, stored compactly, or displayed online. This technology has been refined and trained to recognize patterns, and now with the additional assistance of AI, can provide a high degree of accuracy with little effort.

In the following tutorial, we will provide instructions on how to utilize an OCR API to scan a PDF document and convert it to text, automating what would normally be a long and drawn-out process. The operation supports various quality levels and a wide array of languages, so you can customize it to fit your project’s needs.
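As a rough illustration of what such a call could look like, the sketch below builds an HTTP request for an OCR service using Java 11's built-in HttpClient. The endpoint URL, the `Apikey` header name, the `language` and `quality` query parameters, and the `OCR_API_KEY` environment variable are all hypothetical placeholders rather than any particular provider's API, so substitute the values from your own provider's documentation.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfOcrSketch {

    // Hypothetical endpoint and header name; replace with your OCR provider's values.
    static final String ENDPOINT = "https://ocr.example.com/v1/pdf/to-text";
    static final String API_KEY_HEADER = "Apikey";

    // Builds the POST request without sending it, so it can be inspected first.
    // The "language" and "quality" parameter names are assumptions for illustration.
    static HttpRequest buildRequest(byte[] pdfBytes, String apiKey,
                                    String language, String quality) {
        String query = "?language=" + URLEncoder.encode(language, StandardCharsets.UTF_8)
                     + "&quality=" + URLEncoder.encode(quality, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + query))
                .header(API_KEY_HEADER, apiKey)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("OCR_API_KEY"); // hypothetical variable name
        if (args.length != 1 || apiKey == null) {
            System.err.println("Usage: set OCR_API_KEY, then pass the path to a PDF file");
            return;
        }
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        HttpRequest request = buildRequest(pdf, apiKey, "ENG", "high");

        // Send the PDF and print whatever text the service returns.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Keeping the request construction separate from the send step makes it possible to check the URL and headers before any document leaves your machine.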

#java #api #pdf

Tyrique Littel

1600135200

How to Install OpenJDK 11 on CentOS 8

What is OpenJDK?

OpenJDK, or the Open Java Development Kit, is a free, open-source implementation of the Java Platform, Standard Edition (Java SE). It contains the virtual machine, the Java Class Library, and the Java compiler. The difference between OpenJDK and the Oracle JDK is that OpenJDK is the open-source reference implementation, while the Oracle JDK is Oracle's own build based on OpenJDK, which is not open source and, depending on the version and how it is used, may require a license.

In this article, we will be installing OpenJDK on CentOS 8.

#tutorials #alternatives #centos #centos 8 #configuration #dnf #frameworks #java #java development kit #java ee #java environment variables #java framework #java jdk #java jre #java platform #java sdk #java se #jdk #jre #open java development kit #open source #openjdk #openjdk 11 #openjdk 8 #openjdk runtime environment

Samanta Moore

1621118940

How to Convert ODT Files to PDF in Java

Convert OpenDocument Text (ODT) files to standard PDF, DOCX, PNG, and JPG using Java.

Microsoft has maintained its position in the spotlight for formatted document creation and editing for good reason. Its ease of use and gentle learning curve have made the Microsoft Office Suite a household name for most computer users in the United States as well as globally. This is reinforced by its almost ubiquitous use in education, as students are raised and taught using these applications.

The issue that arises with these programs, however, is their cost. For Apple and other non-Windows-based operating systems, the purchasing fees for Office can be steep. This creates a paywall separating potential users from programs to which they are already accustomed. As an answer to this problem, many users turn to OpenOffice (now maintained as Apache OpenOffice), a free, open-source office suite that is an alternative to Microsoft Office rather than a Microsoft product. Within this application, you can perform almost all of the same functions as in the Office Suite, including creating text documents as you would with Microsoft Word. These text documents are made using OpenOffice Writer and are saved using the .ODT file type. While this file type can be opened and saved using OpenOffice Writer and Word, converting the file to a different format such as PDF requires running it through a conversion process.

The following APIs will allow you to convert your ODT documents to PDF, DOCX, PNG, and JPG for use in whatever way you need. The goal of this tutorial is to provide a simple and efficient means for instantly converting your ODT files without needing to find or download any additional software.

#java #tutorial #api #pdf #java api #pdf converter #api access keys #api tutorial #java api tutorials #java apis

Mitchel Carter

1603519200

How to Convert a PDF to PNG or JPG in Java

For sharing documents both in hardcopy and digitally, the PDF file format is the preferred choice. Because of its versatility and compatibility across operating systems, the PDF format allows users to create, edit, encrypt, and lock important documents for viewing in any browser or with any PDF viewing application such as Adobe Acrobat. Furthermore, its flexibility means that almost any other file type can be converted into PDF format without loss of quality or corruption of formatting. This means that complex file types such as DOCX and XLSX documents can be converted and shared easily in a protected format that will limit the chance of accidental edits or formatting errors.

However, if you are planning to display examples or insert an image of a PDF document in a separate file or web page, converting your PDF files to an image array will be more useful. For example, if you are creating a PowerPoint showing the on-boarding process for your organization and need to include images of different documents or contracts associated with the process, converting your PDF files into JPG will allow you to quickly and effortlessly insert the image and scale or crop it according to the needs of your presentation. When performing a similar process for a web page, having your document available as a PNG image will optimize it for viewing online within your website. It also prevents users from downloading and editing the document as would be possible with a PDF.

In this article, we will review three Conversion APIs that will allow you to convert any PDF document into an image. This includes conversion to a PNG or JPG array with one image created per page in your document. We will also discuss how you can merge and stack your PDF pages for conversion into a single PNG, or “tall” image.

Our goal for this tutorial is to simplify document display and sharing and to give you more options for both. Furthermore, as most documents can be converted to PDF, you can apply these APIs to any file once it has been converted to PDF. This will greatly increase your deliverable scope and production value.

The first API that we will discuss here provides an automated process for converting a PDF document to a PNG array. As an array, one PNG image will be created of each page within the PDF document, meaning you can choose specific pages for use without need for extra input commands.

#java #tutorial #api #images #pdf #png #api access keys #jpg #convert pdf into images #convert pdf to jpeg

Navigating Between DOM Nodes in JavaScript

In the previous chapters you've learnt how to select individual elements on a web page. But there are many occasions where you need to access a child, parent or ancestor element. See the JavaScript DOM nodes chapter to understand the logical relationships between the nodes in a DOM tree.

A DOM node provides several properties and methods that allow you to navigate or traverse through the tree structure of the DOM and make changes very easily. In the following section we will learn how to navigate up, down, and sideways in the DOM tree using JavaScript.

Accessing the Child Nodes

You can use the firstChild and lastChild properties of a DOM node to access its first and last direct child nodes, respectively. If the node has no child nodes, these properties return null.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
console.log(main.firstChild.nodeName); // Prints: #text

var hint = document.getElementById("hint");
console.log(hint.firstChild.nodeName); // Prints: SPAN
</script>

Note: The nodeName is a read-only property that returns the name of the current node as a string. For example, it returns the tag name for an element node, #text for a text node, #comment for a comment node, #document for the document node, and so on.

Notice that in the above example, the nodeName of the first child node of the main DIV element is #text instead of H1. This is because whitespace such as spaces, tabs, and newlines are valid characters: they form #text nodes and become part of the DOM tree. Since the <div> tag contains a newline before the <h1> tag, that newline creates a #text node.
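Since most of the articles on this page are about Java, here is a small sketch of the same behavior using Java's built-in org.w3c.dom API instead of a browser. It parses the markup from the example as XML (an assumption made so it can run without a browser; an XML parser keeps tag names in their original lowercase form). The class name `WhitespaceNodes` and the helper `firstElementChild` are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceNodes {

    // Same markup as the example above, including the newlines and indentation.
    static final String MARKUP =
        "<div id=\"main\">\n" +
        "    <h1 id=\"title\">My Heading</h1>\n" +
        "    <p id=\"hint\"><span>This is some text.</span></p>\n" +
        "</div>";

    // Parses the string as XML and returns the root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Walks the siblings and returns the first child that is an element, or null.
    static Node firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element main = parseRoot(MARKUP);
        System.out.println(main.getFirstChild().getNodeName());    // prints: #text
        System.out.println(firstElementChild(main).getNodeName()); // prints: h1
        System.out.println(main.getChildNodes().getLength());      // prints: 5
    }
}
```

The count of 5 comes from three whitespace text nodes (before the h1, between the h1 and the p, and after the p) plus the two elements.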

To avoid the issue with firstChild and lastChild returning #text or #comment nodes, you could alternatively use the firstElementChild and lastElementChild properties to return only the first and last element node, respectively. However, these properties are not supported in IE 8 and earlier.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
alert(main.firstElementChild.nodeName); // Outputs: H1
main.firstElementChild.style.color = "red";

var hint = document.getElementById("hint");
alert(hint.firstElementChild.nodeName); // Outputs: SPAN
hint.firstElementChild.style.color = "blue";
</script>

Similarly, you can use the childNodes property to access all child nodes of a given element, where the first child node is assigned index 0. Here's an example:

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.childNodes;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

The childNodes property returns all child nodes, including non-element nodes like text and comment nodes. To get a collection of only elements, use the children property instead.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.children;
    
    // Loop through the element collection and display each element's name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

#javascript 

Samanta Moore

1620458875

Going Beyond Java 8: Local Variable Type Inference (var)

According to some surveys, such as JetBrains’s great survey, Java 8 is currently the most used version of Java, despite being a 2014 release.

What you are reading is one in a series of articles titled ‘Going beyond Java 8,’ inspired by the contents of my book, Java for Aliens. These articles will guide you step-by-step through the most important features introduced to the language, starting from version 9. The aim is to make you aware of how important it is to move forward from Java 8, explaining the enormous advantages that the latest versions of the language offer.

In this article, we will talk about the most important new feature introduced with Java 10. Officially called local variable type inference, this feature is better known as the introduction of the word var. Despite the complicated name, it is actually quite a simple feature to use. However, some observations need to be made before we can see the impact that the introduction of the word var has on other pre-existing characteristics.
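To give a feel for the feature, here is a minimal sketch (requires Java 10 or later). The class `VarDemo` and the method `groupByParity` are illustrative names, not taken from the article or the book. The compiler infers each local variable's type from its initializer, so the variables remain statically typed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VarDemo {

    static Map<String, List<Integer>> groupByParity(int... numbers) {
        // Without var: HashMap<String, List<Integer>> groups = new HashMap<String, List<Integer>>();
        var groups = new HashMap<String, List<Integer>>();
        for (var n : numbers) {                       // n is inferred as int
            var key = (n % 2 == 0) ? "even" : "odd";  // key is inferred as String
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(n);
        }
        return groups;
    }

    public static void main(String[] args) {
        var result = groupByParity(1, 2, 3, 4);       // inferred as Map<String, List<Integer>>
        System.out.println(result.get("even"));       // prints: [2, 4]
        System.out.println(result.get("odd"));        // prints: [1, 3]
    }
}
```

Note that var only works for local variables with an initializer; fields, method parameters, and return types still need explicit types, which is why `groupByParity` declares its return type in full.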

#java #java 11 #java 10 #java 12 #var #java 14 #java 13 #java 15 #verbosity