How to Extract Text From PDFs and Images on Linux Using gImageReader

by Staff

If you’re a student, or your work involves handling lots of images and PDFs, you’ve probably needed to extract text from an image or a document at some point.

Luckily, optical character recognition (OCR) makes this possible, and several tools can do it. gImageReader is one of them. It’s free to use and works with both image files and PDF documents.

Let’s dive in to check out gImageReader in detail and see how you can use it to extract text from images and PDFs.

What Is gImageReader?

gImageReader is an app that lets you extract text from images and PDFs on Linux. It’s essentially a GUI front-end to Tesseract, an open-source OCR engine originally developed by Hewlett-Packard and later sponsored by Google, which is widely regarded as one of the best free OCR engines available.

With gImageReader, you can easily and quite accurately extract text from images or PDF documents with a few simple clicks. You can then export the extracted text to a text or PDF file for further use.


Features of gImageReader

gImageReader packs the following features:

  • Import PDF documents and images from different sources (disk, scanning devices, clipboard, and screenshot)
  • Batch process images or documents, i.e., extract text from multiple images or documents at once
  • Recognize text snippets as plain text or hOCR documents
  • Built-in spell checker
  • Automatic text area detection
  • Basic image/document editing
  • Save output as a text file
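
As a rough illustration of batch processing and the plain text versus hOCR choice, here is a minimal shell sketch that calls the Tesseract command-line tool directly instead of going through gImageReader. The function names ocr_batch and out_base and the example paths are hypothetical, and the sketch assumes the tesseract binary is already on your PATH.

```shell
# out_base FILE: strip the last ".ext" from a path, e.g. scans/page1.png -> scans/page1
out_base() {
    printf '%s\n' "${1%.*}"
}

# ocr_batch MODE FILE...: run Tesseract on each file in turn.
# MODE "hocr" writes <name>.hocr; any other value writes plain text to <name>.txt.
ocr_batch() {
    mode="$1"
    shift
    for img in "$@"; do
        if [ "$mode" = "hocr" ]; then
            tesseract "$img" "$(out_base "$img")" hocr
        else
            tesseract "$img" "$(out_base "$img")"
        fi
    done
}

# Example calls (hypothetical paths):
#   ocr_batch text scans/*.png
#   ocr_batch hocr scans/*.png
```

Tesseract appends the file extension to the output base itself, which is why the helper strips it from the image name first.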

How to Install gImageReader on Linux

gImageReader is available on most major Linux distros. But before you proceed with its installation, you need to install the Tesseract OCR engine on your system.

To do this, open the Software Manager on your system and search for tesseract. When it returns a list of results, install the tesseract-ocr and tesseract-ocr-eng packages. If you’re more comfortable with the terminal, you can install the same packages with your distro’s command-line package manager instead.
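
If you go the terminal route, a quick check like the following can confirm whether Tesseract is installed and which languages it knows about. This is a minimal sketch: the helper name have_cmd is made up for illustration, and the apt line uses the Debian/Ubuntu package names mentioned above, which other distros may name differently.

```shell
# have_cmd NAME: succeed if NAME resolves to a command in the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
    # Older releases print the version on stderr, so merge both streams.
    tesseract --version 2>&1 | head -n 1
    tesseract --list-langs 2>&1
else
    echo "tesseract not found. On Debian or Ubuntu, try:"
    echo "  sudo apt install tesseract-ocr tesseract-ocr-eng"
fi
```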

After this, check out the installation instructions in the following sections to install gImageReader on your computer.

If you’re on Debian or Ubuntu, open the terminal and run the below commands to install gImageReader:

sudo add-apt-repository ppa:sandromani/gimagereader
sudo apt-get update
sudo apt install gimagereader

On Fedora, CentOS, or Red Hat Enterprise Linux (RHEL):

sudo dnf install gimagereader-qt

On Arch Linux or Manjaro:

sudo pacman -S gimagereader

openSUSE users can install gImageReader using:

sudo zypper install gimagereader

If you’re using any other Linux distro, you can build gImageReader from source by following the instructions on gImageReader’s GitHub page.
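
If you’re not sure which of the commands above applies to your system, the ID and ID_LIKE fields in /etc/os-release usually tell you. The sketch below is one way to read them. The function name distro_family and the labels it prints are invented for this example, and the matching is deliberately loose.

```shell
# distro_family [FILE]: print a rough distro family based on an os-release file.
# Runs in a subshell so ID and ID_LIKE do not leak into the calling shell.
distro_family() {
    (
        f="${1:-/etc/os-release}"
        [ -r "$f" ] || { echo "unknown"; exit 0; }
        ID=""
        ID_LIKE=""
        . "$f"
        case " $ID $ID_LIKE " in
            *" debian "*|*" ubuntu "*)            echo "debian-ubuntu" ;;
            *" fedora "*|*" rhel "*|*" centos "*) echo "fedora-rhel" ;;
            *" arch "*|*" manjaro "*)             echo "arch" ;;
            *suse*)                               echo "opensuse" ;;
            *)                                    echo "unknown" ;;
        esac
    )
}

echo "Detected family: $(distro_family)"
```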

How to Use gImageReader on Linux

gImageReader is pretty easy to use and works with all kinds of image files as well as PDF documents. Follow the instructions below to extract text from images or PDFs on Linux.


Open the applications menu, search for gImageReader, and launch the app. Hit the Maximize button in the gImageReader window to open it in full-screen view.

Now, click the Add images button on the left pane under the toolbar and use the file browser to select the image(s) or PDF(s) from which you want to extract text.

Click Ok to import the image(s) or PDF(s) to gImageReader. Or, if you want to extract text from what’s displayed on the screen, click on the dropdown beside the Add images button and select Take Screenshot. gImageReader will take a screenshot of the screen’s content.

Once you’ve added the image to gImageReader, click the Toggle output pane button (one with the notepad icon) to bring up the output pane. This is where the text you extract from images or PDFs appears.

Depending on how you want to proceed, you now have the option to identify the text in the image or PDF automatically or manually. To do this automatically, click on the Autodetect layout button, and it will highlight all the text blocks in the selected image or PDF document.


After this, click Recognize selection > Current Page to begin the text extraction process.

Alternatively, to select the text manually, move the cross-hair cursor over the image and draw a box around the area you want to extract text from. Then, hit the Recognize selection button to proceed.

If it’s a PDF document and you want to extract text from other pages, click the Plus (+) button to move to the next page.

To go back, hit the Minus (-) button. Then select the text you want to extract and hit the Recognize selection button to extract it.
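
For comparison, here is roughly how a single PDF page could be run through OCR from the terminal. The Tesseract command-line tool does not read PDFs itself, so this sketch assumes pdftoppm (from poppler-utils) is installed to turn the page into an image first. The function name pdf_page_to_text and the file name in the example are hypothetical.

```shell
# pdf_page_to_text PDF PAGE: render one page at 300 DPI and print its OCR text.
pdf_page_to_text() {
    pdf="$1"
    page="$2"
    tmpdir=$(mktemp -d) || return 1
    # -singlefile writes exactly one image, named <prefix>.png with no page digits.
    pdftoppm -f "$page" -l "$page" -r 300 -png -singlefile "$pdf" "$tmpdir/page" &&
        tesseract "$tmpdir/page.png" stdout
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example (hypothetical file): pdf_page_to_text report.pdf 2 > page2.txt
```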

Although rare, gImageReader may sometimes return the extracted text in a language other than English. When this happens, click the dropdown beside the Recognize selection button and select one of the English options.
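
The dropdown lists the language data that Tesseract has installed. As a sketch, the same check can be done from the terminal with tesseract --list-langs, and the language can be forced with the -l option. The helper name lang_available and the file names in the comment are made up for this example.

```shell
# lang_available CODE: read `tesseract --list-langs` output on stdin and
# succeed if CODE (for example "eng") appears as a line of its own.
lang_available() {
    grep -qxF -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
    if tesseract --list-langs 2>&1 | lang_available eng; then
        echo "English language data is installed"
    else
        echo "English language data is missing"
    fi
fi

# Forcing English for one image (hypothetical file names):
#   tesseract scan.png scan -l eng
```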

Finally, to save the extracted text, click on the Save output button. This will bring up the Save window. Here, give a name to the file and hit Ok.

What Else Can You Do With gImageReader?

As mentioned earlier, gImageReader also lets you adjust certain aspects of imported images or documents, such as their brightness, contrast, and resolution. You can also invert colors or rotate the images or documents if required.

These options are useful when the text in an image or document isn’t legible enough for gImageReader to recognize.

To access any of these editing options, click the Image Controls button, and it will reveal a mini toolbar below the main toolbar. From here, select the appropriate buttons to perform your desired editing operation on the image or document.
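
If you need the same adjustments for many files, they can also be applied before importing. The sketch below uses ImageMagick, which is a separate tool and not part of gImageReader. The function name prep_image and the 10x25 brightness and contrast values are arbitrary choices for illustration. On ImageMagick 7 the command is usually called magick rather than convert.

```shell
# prep_image SRC DST [DEGREES]: rotate the image and raise brightness and contrast.
# Add -negate before "$dst" to invert colors as well.
prep_image() {
    src="$1"
    dst="$2"
    deg="${3:-0}"
    convert "$src" -rotate "$deg" -brightness-contrast 10x25 "$dst"
}

# Example (hypothetical files): prep_image scan.png scan-fixed.png 90
```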

Text extraction depends on the right tool: one built on a reliable, accurate OCR engine that can identify the text in an image or document, so you can extract it without any hassle.

gImageReader accomplishes this nicely, thanks to the Tesseract OCR engine it uses in the background. Considering its ease of use, gImageReader is undoubtedly one of the best text extraction tools available for Linux.

Alternatively, if you’re looking for a simpler solution, you can check out TextSnatcher, which is fast and pretty easy to use.

© 2022 Iman Hearts. All rights reserved.