April 28, 2024

DocEngines

Nerds Without Borders – Technology for Everyone

OCR App Development Node.js: A Journey Through OCR App Development with Node.js, pdf.js, & Tesseract.js

Full-Stack OCR web application using Node.js and React

This guide walks you through building a full-stack Optical Character Recognition (OCR) web application with Node.js and React. Users upload PDFs or images, the application extracts their text with Tesseract.js, and a separate page is reserved for listing processed documents. The tutorial is structured into several parts, covering the setup of the server and client sides and the integration of OCR functionality.

Text Recognition Software Development

Furthermore, the tutorial delves into best practices for efficiently handling file uploads and managing the asynchronous nature of OCR processing in a web environment. Emphasis is placed on creating a user-friendly interface with React, ensuring a smooth user experience from document upload to text extraction. The guide also explores error handling and performance optimization, crucial for maintaining a responsive application. By the end of this comprehensive tutorial, readers will have gained valuable insights into not only the technical integration of OCR with Node.js and React but also the practical considerations involved in developing a high-performing, full-stack web application.

Prerequisites

Setting Up the Project

Server-Side Setup

Initialize a Node.js Project

Create a project directory and initialize a Node.js project:

mkdir ocr-web-app
cd ocr-web-app
npm init -y

Install Dependencies

Install Express for the server framework and other necessary packages:

npm install express multer pdfjs-dist tesseract.js

Server Implementation

In the root of your project, create a file named server.js. This will be our simple Express server setup.

const express = require('express');
const app = express();
const PORT = process.env.PORT || 3001;

app.get('/', (req, res) => {
  res.send('OCR Web App Backend Running');
});

app.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});

Run the Server

Start your server with Node:

node server.js

Client-Side Setup

Create a React Application

In the same project directory, create a React application named client.

The server is still running in the first terminal, so enter the following command in a new, separate terminal session (for example, a second cmd window). Take care not to close any session that still needs to run.

npx create-react-app client

Install Client Dependencies

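Navigate to the client folder and install the necessary packages. React Router handles navigation, while tesseract.js and pdfjs-dist are imported by the OCR component later on, so installing them here keeps the client self-contained:

cd client
npm install react-router-dom tesseract.js pdfjs-dist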

Configure React Router

In the src directory of your React app, modify App.js to include React Router for navigation:

import React from 'react';
import { BrowserRouter as Router, Route, Routes } from 'react-router-dom';
import DocumentUploadPage from './components/DocumentUploadPage';
import DocumentListPage from './components/DocumentListPage';
import { OCRProvider } from './context/OCRContext';

function App() {
  return (
    <OCRProvider>
      <Router>
        <Routes>
          <Route path="/upload" element={<DocumentUploadPage />} />
          <Route path="/documents" element={<DocumentListPage />} />
          <Route path="/" element={<DocumentUploadPage />} />
        </Routes>
      </Router>
    </OCRProvider>
  );
}

export default App;

Implement OCRContext

Create a context directory under src and add OCRContext.js:

import React, { createContext, useContext, useState } from 'react';

const OCRContext = createContext();

export function useOCR() {
  return useContext(OCRContext);
}

export const OCRProvider = ({ children }) => {
  const [ocrResult, setOcrResult] = useState('');

  const value = {
    ocrResult,
    setOcrResult,
  };

  return <OCRContext.Provider value={value}>{children}</OCRContext.Provider>;
};

Adding Components

OCRComponent

In src/components, add OCRComponent.js:

import React, { useState } from 'react';
import Tesseract from 'tesseract.js';
import { getDocument, GlobalWorkerOptions } from 'pdfjs-dist/legacy/build/pdf';
import './OCRComponent.css'; // Import a CSS file for styling

GlobalWorkerOptions.workerSrc = `${process.env.PUBLIC_URL}/pdf.worker.min.js`;

const OCRComponent = () => {
  const [ocrText, setOcrText] = useState('');
  const [isLoading, setIsLoading] = useState(false);
  const [progress, setProgress] = useState(0); // New state for tracking progress

  const handleFileChange = async (event) => {
    const file = event.target.files[0];
    setOcrText('');
    setProgress(0); // Reset progress on new file upload

    if (!file) {
      return;
    }

    setIsLoading(true);
    try {
      if (file.type === 'application/pdf') {
        const pdf = await getDocument(URL.createObjectURL(file)).promise;
        let allText = '';

        // Render each page to a canvas, then run OCR on that canvas
        for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
          const page = await pdf.getPage(pageNum);
          const viewport = page.getViewport({ scale: 2 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.height = viewport.height;
          canvas.width = viewport.width;

          await page.render({ canvasContext: context, viewport }).promise;
          const { data: { text } } = await Tesseract.recognize(
            canvas,
            'eng',
            {
              logger: m => {
                if (m.status === 'recognizing text') {
                  // Completed pages plus the current page's share of the total
                  setProgress((pageNum - 1 + m.progress) / pdf.numPages);
                }
              }
            }
          );

          allText += text + '\n\n';
        }

        setOcrText(allText);
      } else {
        // Await recognition so the loading state stays on until OCR finishes
        const { data: { text } } = await Tesseract.recognize(
          file,
          'eng',
          {
            logger: m => {
              if (m.status === 'recognizing text') {
                setProgress(m.progress);
              }
            }
          }
        );
        setOcrText(text);
      }
    } catch (error) {
      console.error('OCR failed:', error);
    } finally {
      setIsLoading(false); // Runs on success and on failure
    }
  };

  return (
    <div className="ocr-container">
      <input type="file" onChange={handleFileChange} accept="image/*,application/pdf" className="file-input"/>
      {isLoading && <div className="progress-bar" style={{ width: `${progress * 100}%` }}></div>}
      <textarea value={ocrText} readOnly className="ocr-result"></textarea>
      {isLoading && <p>Processing, please wait...</p>}
    </div>
  );
};

export default OCRComponent;
  • File Input: The <input type="file" /> element allows users to select files from their device. The accept attribute limits the file selection to image files and PDFs.
  • handleFileChange Function: This function is triggered when a user selects a file. It checks the file type and processes it accordingly.
  • For PDFs, it uses pdf.js to read and convert each page into a canvas, which is then processed by Tesseract.js for OCR.
  • For images, it directly uses Tesseract.js to extract text.
  • State Management: The ocrText state stores the extracted text, isLoading tracks whether processing is under way so the user gets feedback, and progress holds the completed fraction (0 to 1) that drives the progress bar.

This setup provides a basic yet functional approach to handling file uploads for OCR processing in a React application.
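The branch on file type described above can be pulled into a small helper, which also makes the "neither image nor PDF" case explicit. This is only a sketch: classifyUpload is a hypothetical name that does not appear in the component, and it relies solely on the MIME type the browser reports in file.type.

```javascript
// Hypothetical helper: decide how an uploaded file should be processed.
// Accepts any object with a `type` string, such as a browser File.
function classifyUpload(file) {
  if (!file || typeof file.type !== 'string') {
    return 'none'; // nothing was selected
  }
  if (file.type === 'application/pdf') {
    return 'pdf'; // render each page with pdf.js, then OCR the canvas
  }
  if (file.type.startsWith('image/')) {
    return 'image'; // pass the file straight to Tesseract.js
  }
  return 'unsupported'; // the accept attribute is a hint, not a guarantee
}

console.log(classifyUpload({ type: 'application/pdf' }));
console.log(classifyUpload({ type: 'image/png' }));
console.log(classifyUpload({ type: 'text/plain' }));
```

Inside handleFileChange, the result could drive a switch statement, with the 'unsupported' case showing a message instead of handing the file to Tesseract.js.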

Create a CSS file named OCRComponent.css in the same directory as your OCRComponent.js to add styling to your component.

OCRComponent.js with Styling and Progress Indicator

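In src/components, next to OCRComponent.js, create the following file named “OCRComponent.css”. The component imports it as './OCRComponent.css', so the two files must sit in the same folder: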

.ocr-container {
  display: flex;
  flex-direction: column;
  align-items: center;
  margin-top: 20px;
}

.file-input {
  margin-bottom: 20px;
  border: 1px solid #ccc;
  display: inline-block;
  padding: 6px 12px;
  cursor: pointer;
}

.progress-bar {
  height: 5px;
  background-color: #4CAF50;
  margin-bottom: 20px;
  width: 0%; /* Initial width */
  transition: width 0.5s ease-in-out;
}

.ocr-result {
  width: 90%;
  height: 300px;
  margin-bottom: 20px;
  border: 1px solid #ddd;
  font-family: monospace;
}
  • Styling: The CSS provides a more appealing visual appearance. The .ocr-container styles the container for better alignment, .file-input styles the file input button, .progress-bar creates a visual progress indicator, and .ocr-result styles the textarea for displaying OCR results.
  • Progress Indicator: The progress indicator dynamically updates as OCR processing progresses, offering users real-time feedback. The width of the .progress-bar element is updated based on the OCR processing progress, providing a visual cue to the user about the current status of the process.
  • Dynamic Progress Update: The progress state is updated during the OCR process to reflect the current progress. This is particularly useful for PDF files with multiple pages, as it gives users an idea of the processing progress across all pages.
  • User Feedback: By updating the progress indicator and displaying a processing message, the application provides immediate feedback to the user, enhancing the user experience by making the application feel more responsive and interactive.

These enhancements make the OCR component more user-friendly and visually appealing, providing a better overall user experience.
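The progress arithmetic for multi-page PDFs can be checked in isolation. The sketch below restates the calculation used in the logger callback as a standalone function; overallProgress is a hypothetical name, and the clamping step is an added assumption rather than something the component does.

```javascript
// Hypothetical helper mirroring the logger arithmetic in OCRComponent:
// pages before the current one count as done, and the current page
// contributes its own fraction. All values are in the range 0..1.
function overallProgress(pageNum, numPages, pageProgress) {
  if (numPages <= 0) return 0;
  const completedPages = pageNum - 1; // pageNum is 1-based
  const value = (completedPages + pageProgress) / numPages;
  return Math.min(1, Math.max(0, value)); // clamp against stray logger values
}

// Page 2 of 4, halfway through recognition: (1 + 0.5) / 4
console.log(overallProgress(2, 4, 0.5)); // 0.375
console.log(`${overallProgress(4, 4, 1) * 100}%`); // 100%
```

The returned fraction is what the component multiplies by 100 to set the width of the .progress-bar element.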

For the next two files, navigate to src/components and create them as described below:

DocumentUploadPage

In the same directory, add DocumentUploadPage.js:

import React from 'react';
import OCRComponent from './OCRComponent';

const DocumentUploadPage = () => {
  return (
    <div>
      <h1>Upload Image or PDF - OCR Tool</h1>
      <OCRComponent />
    </div>
  );
};

export default DocumentUploadPage;

DocumentListPage

Similarly, add DocumentListPage.js:

import React from 'react';

const DocumentListPage = () => {
  return (
    <div>
      <h1>Document List Page</h1>
    </div>
  );
};

export default DocumentListPage;

Before running the application, follow the next step.

Verify the Worker Script Location:

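Ensure that the PDF.js worker script is placed in the public folder of your React application, under the same file name that GlobalWorkerOptions.workerSrc points to (pdf.worker.min.js in OCRComponent.js). The public folder is the correct place for static assets that the browser must be able to fetch directly. The script ships inside the pdfjs-dist package in node_modules, so you will most likely have to copy it into public yourself. Some pdfjs-dist releases name the file pdf.worker.min.mjs instead; if yours does, change the workerSrc value to match.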
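Copying the worker by hand is easy to forget after reinstalling dependencies, so the step can be scripted with Node's built-in fs module. The following is a sketch under assumptions: copyPdfWorker is a hypothetical helper, the legacy/build location inside pdfjs-dist is assumed from the import path used in OCRComponent.js and may differ between releases, and the default file name matches the workerSrc value set in that component.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copy the PDF.js worker from an installed pdfjs-dist
// package into the React app's public folder and return the target path.
function copyPdfWorker(nodeModulesDir, publicDir, workerFile = 'pdf.worker.min.js') {
  // Assumed location inside the package; it can differ between releases.
  const source = path.join(nodeModulesDir, 'pdfjs-dist', 'legacy', 'build', workerFile);
  if (!fs.existsSync(source)) {
    throw new Error(`Worker script not found: ${source}`);
  }
  fs.mkdirSync(publicDir, { recursive: true });
  const target = path.join(publicDir, workerFile);
  fs.copyFileSync(source, target);
  return target;
}

// Example call from the client directory (paths are assumptions):
// copyPdfWorker('node_modules', 'public');
```

If the worker file in your installation has a different name, pass it as the third argument and keep the workerSrc value in the component in sync.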

Run the React Application

Navigate back to the client directory and start the React development server:

npm start

Folder Structure

Check your folder structure. Your files should be laid out as shown in the diagram below:

ocr-web-app/
│
├── client/                   # React front-end
│   ├── public/               # Public files
│   │   ├── index.html        # Entry HTML file
│   │   ├── pdf.worker.min.js # PDF.js worker script
│   │   └── ...               # Other static files
│   │
│   ├── src/                  # Source files
│   │   ├── components/       # React components
│   │   │   ├── DocumentListPage.js  # Component for listing documents
│   │   │   ├── DocumentUploadPage.js # Component for uploading documents
│   │   │   ├── OCRComponent.css      # Styles for the OCRComponent
│   │   │   └── OCRComponent.js       # Component for handling OCR
│   │   │
│   │   ├── context/          # Context for global state management
│   │   │   └── OCRContext.js # OCR context definition
│   │   │
│   │   ├── App.js            # Main React application component
│   │   ├── index.js          # Entry point for React application
│   │   └── ...               # Other source files
│   │
│   └── package.json          # Client-side dependencies and scripts
│
├── server.js                 # Node.js server setup
└── package.json              # Server-side dependencies and scripts

This structure organizes your application into a clear separation of concerns, with the client directory containing all front-end React code, including components, context, and styles. The public folder inside the client directory holds static assets, such as the PDF.js worker script required for processing PDF files in the browser.

The server.js file in the root directory sets up the Express server for your application. At this stage it only answers a status request on the root route; it is the place to add backend logic, API routes, or serving of the built React application in a production environment.

Each JavaScript file within the components and context directories serves a specific purpose, from managing global state with OCRContext.js to handling user interactions and displaying content in DocumentListPage.js, DocumentUploadPage.js, and OCRComponent.js.
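A quick way to compare your project against the diagram is a small Node script that reports which of the expected files are missing. This is a sketch only: findMissingFiles and EXPECTED_FILES are hypothetical names, and the list covers the JavaScript files, package manifests, and worker script named in the diagram.

```javascript
const fs = require('fs');
const path = require('path');

// Paths taken from the folder structure diagram, relative to ocr-web-app/.
const EXPECTED_FILES = [
  'server.js',
  'package.json',
  'client/package.json',
  'client/public/index.html',
  'client/public/pdf.worker.min.js',
  'client/src/App.js',
  'client/src/index.js',
  'client/src/context/OCRContext.js',
  'client/src/components/DocumentListPage.js',
  'client/src/components/DocumentUploadPage.js',
  'client/src/components/OCRComponent.js',
];

// Hypothetical helper: return the expected files that are missing under rootDir.
function findMissingFiles(rootDir, expected = EXPECTED_FILES) {
  return expected.filter(
    (relativePath) => !fs.existsSync(path.join(rootDir, ...relativePath.split('/')))
  );
}

// Example call from the project root:
// console.log(findMissingFiles(process.cwd()));
```

An empty array means every listed file was found; any entries returned are the paths to create or move before starting the application.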
