Full-Stack OCR web application using Node.js and React
This guide walks you through building a full-stack Optical Character Recognition (OCR) web application using Node.js and React. The application lets users upload PDFs or images, extracts their text with Tesseract.js, and provides a page for listing processed documents. The tutorial is structured into several parts, covering the setup of both the client and server sides and the integration of OCR functionality.
Furthermore, the tutorial delves into best practices for efficiently handling file uploads and managing the asynchronous nature of OCR processing in a web environment. Emphasis is placed on creating a user-friendly interface with React, ensuring a smooth user experience from document upload to text extraction. The guide also explores error handling and performance optimization, crucial for maintaining a responsive application. By the end of this comprehensive tutorial, readers will have gained valuable insights into not only the technical integration of OCR with Node.js and React but also the practical considerations involved in developing a high-performing, full-stack web application.
Prerequisites
- Basic knowledge of JavaScript, Node.js, and React.
- Node.js and npm installed on your system. See https://nodejs.org/en/download and https://nodejs.org/docs/latest/api/
- An understanding of client-server architecture.
Setting Up the Project
Server-Side Setup
Initialize a Node.js Project
Create a project directory and initialize a Node.js project:
mkdir ocr-web-app
cd ocr-web-app
npm init -y
Install Dependencies
Install Express for the server framework and other necessary packages:
npm install express multer pdfjs-dist tesseract.js
Server Implementation
In the root of your project, create a file named server.js. This will be our simple Express server setup.
const express = require('express');

const app = express();
const PORT = process.env.PORT || 3001;

app.get('/', (req, res) => {
  res.send('OCR Web App Backend Running');
});

app.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
Run the Server
Start your server with Node:
node server.js
Client-Side Setup
Create a React Application
In the same project directory, create a React application named client. Run the following command in a new, separate terminal session (for example, a second cmd window), because the server is still running in the first one. Take care not to close any session that still needs to run.
npx create-react-app client
Install Client Dependencies
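Navigate to the client folder and install the necessary packages. The OCR component imports tesseract.js and pdfjs-dist in the browser, so install them in the client project as well:
cd client
npm install react-router-dom tesseract.js pdfjs-dist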
Configure React Router
In the src directory of your React app, modify App.js to include React Router for navigation:
import React from 'react';
import { BrowserRouter as Router, Route, Routes } from 'react-router-dom';
import DocumentUploadPage from './components/DocumentUploadPage';
import DocumentListPage from './components/DocumentListPage';
import { OCRProvider } from './context/OCRContext';

function App() {
  return (
    <OCRProvider>
      <Router>
        <Routes>
          <Route path="/upload" element={<DocumentUploadPage />} />
          <Route path="/documents" element={<DocumentListPage />} />
          <Route path="/" element={<DocumentUploadPage />} />
        </Routes>
      </Router>
    </OCRProvider>
  );
}

export default App;
Implement OCRContext
Create a context directory under src and add OCRContext.js:
import React, { createContext, useContext, useState } from 'react';

const OCRContext = createContext();

// Convenience hook so components can read the shared OCR state
export function useOCR() {
  return useContext(OCRContext);
}

export const OCRProvider = ({ children }) => {
  const [ocrResult, setOcrResult] = useState('');
  const value = {
    ocrResult,
    setOcrResult,
  };
  return <OCRContext.Provider value={value}>{children}</OCRContext.Provider>;
};
Adding Components
OCRComponent
In src/components, add OCRComponent.js:
import React, { useState } from 'react';
import Tesseract from 'tesseract.js';
import { getDocument, GlobalWorkerOptions } from 'pdfjs-dist/legacy/build/pdf';
import './OCRComponent.css'; // Styles for the container, progress bar, and result area

// pdf.js parses PDFs in a web worker; the script is served from the public folder
GlobalWorkerOptions.workerSrc = `${process.env.PUBLIC_URL}/pdf.worker.min.js`;

const OCRComponent = () => {
  const [ocrText, setOcrText] = useState('');
  const [isLoading, setIsLoading] = useState(false);
  const [progress, setProgress] = useState(0); // OCR progress as a fraction from 0 to 1

  const handleFileChange = async (event) => {
    const file = event.target.files[0];
    setOcrText('');
    setProgress(0); // Reset progress on new file upload
    if (!file) {
      return;
    }
    setIsLoading(true);
    try {
      if (file.type === 'application/pdf') {
        const fileUrl = URL.createObjectURL(file);
        const pdf = await getDocument(fileUrl).promise;
        let allText = '';
        for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
          // Render the page to an off-screen canvas so Tesseract can read it as an image
          const page = await pdf.getPage(pageNum);
          const viewport = page.getViewport({ scale: 2 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.height = viewport.height;
          canvas.width = viewport.width;
          await page.render({ canvasContext: context, viewport }).promise;
          const { data: { text } } = await Tesseract.recognize(canvas, 'eng', {
            logger: (m) => {
              if (m.status === 'recognizing text') {
                // Completed pages plus the current page's share of the total
                setProgress((pageNum - 1 + m.progress) / pdf.numPages);
              }
            },
          });
          allText += text + '\n\n';
        }
        URL.revokeObjectURL(fileUrl); // Release the temporary blob URL
        setOcrText(allText);
      } else {
        // Await the result so the loading state is not cleared before OCR finishes
        const { data: { text } } = await Tesseract.recognize(file, 'eng', {
          logger: (m) => {
            if (m.status === 'recognizing text') {
              setProgress(m.progress);
            }
          },
        });
        setOcrText(text);
      }
    } catch (error) {
      console.error('OCR failed:', error);
      setOcrText('Could not extract text from this file.');
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <div className="ocr-container">
      <input type="file" onChange={handleFileChange} accept="image/*,application/pdf" className="file-input"/>
      {isLoading && <div className="progress-bar" style={{ width: `${progress * 100}%` }}></div>}
      <textarea value={ocrText} readOnly className="ocr-result"></textarea>
      {isLoading && <p>Processing, please wait...</p>}
    </div>
  );
};

export default OCRComponent;
- File Input: The <input type="file" /> element allows users to select files from their device. The accept attribute limits the file selection to image files and PDFs.
- handleFileChange Function: This function is triggered when a user selects a file. It checks the file type and processes it accordingly.
  - For PDFs, it uses pdf.js to read and convert each page into a canvas, which is then processed by Tesseract.js for OCR.
  - For images, it directly uses Tesseract.js to extract text.
- State Management: The ocrText state stores the extracted text, and isLoading manages the loading state to provide user feedback during processing.
This setup provides a basic yet functional approach to handling file uploads for OCR processing in a React application.
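The file-type check described above can be sketched as a small standalone helper. The accept attribute only filters what the file dialog shows, so a user can still select another kind of file, and the browser may report an empty type when it cannot determine one. The function name classifyUpload and the 'unsupported' label are hypothetical, introduced here for illustration only.

```javascript
// Classify a selected file as 'pdf', 'image', or 'unsupported'.
// `file` is a browser File object (or any object with `type` and `name`).
function classifyUpload(file) {
  const type = (file.type || '').toLowerCase();
  const name = (file.name || '').toLowerCase();

  // Fall back to the extension only when the browser reports no MIME type.
  if (type === 'application/pdf' || (type === '' && name.endsWith('.pdf'))) {
    return 'pdf';
  }
  if (type.startsWith('image/')) {
    return 'image';
  }
  return 'unsupported';
}
```

Inside handleFileChange, a result of 'unsupported' could be used to show a message and return early instead of handing the file to Tesseract.js.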
OCRComponent.css: Styling and Progress Indicator
To style the component, create a file named OCRComponent.css in src/components, the same directory as OCRComponent.js, so that the import of './OCRComponent.css' resolves:
.ocr-container {
  display: flex;
  flex-direction: column;
  align-items: center;
  margin-top: 20px;
}

.file-input {
  margin-bottom: 20px;
  border: 1px solid #ccc;
  display: inline-block;
  padding: 6px 12px;
  cursor: pointer;
}

.progress-bar {
  height: 5px;
  background-color: #4CAF50;
  margin-bottom: 20px;
  width: 0%; /* Initial width */
  transition: width 0.5s ease-in-out;
}

.ocr-result {
  width: 90%;
  height: 300px;
  margin-bottom: 20px;
  border: 1px solid #ddd;
  font-family: monospace;
}
- Styling: The CSS provides a more appealing visual appearance. The .ocr-container rule styles the container for better alignment, .file-input styles the file input button, .progress-bar creates a visual progress indicator, and .ocr-result styles the textarea for displaying OCR results.
- Progress Indicator: The progress indicator dynamically updates as OCR processing progresses, offering users real-time feedback. The width of the .progress-bar element is updated based on the OCR processing progress, providing a visual cue to the user about the current status of the process.
- Dynamic Progress Update: The progress state is updated during the OCR process to reflect the current progress. This is particularly useful for PDF files with multiple pages, as it gives users an idea of the processing progress across all pages.
- User Feedback: By updating the progress indicator and displaying a processing message, the application provides immediate feedback to the user, enhancing the user experience by making the application feel more responsive and interactive.
These enhancements make the OCR component more user-friendly and visually appealing, providing a better overall user experience.
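The multi-page progress calculation can be isolated as a pure function, which makes the arithmetic easier to check. This is a minimal sketch: the name overallProgress is hypothetical, pageNum is assumed to be 1-based (as in the PDF loop), and pageProgress is assumed to be the 0 to 1 value that Tesseract.js reports for the current page.

```javascript
// Combine finished pages with the current page's progress into one 0..1 value.
function overallProgress(pageNum, numPages, pageProgress) {
  if (numPages <= 0) {
    return 0; // Nothing to process
  }
  // (pageNum - 1) pages are complete; the current page contributes its fraction.
  const fraction = (pageNum - 1 + pageProgress) / numPages;
  // Clamp so a stray value never pushes the bar outside 0% to 100%.
  return Math.min(1, Math.max(0, fraction));
}
```

For a four-page PDF, being halfway through page 2 gives (1 + 0.5) / 4 = 0.375, so the bar would sit at 37.5%.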
For the next two files, navigate to src/components and create them as described below:
DocumentUploadPage
In the same directory, add DocumentUploadPage.js:
import React from 'react';
import OCRComponent from './OCRComponent';

const DocumentUploadPage = () => {
  return (
    <div>
      <h1>Upload Image or PDF - OCR Tool</h1>
      <OCRComponent />
    </div>
  );
};

export default DocumentUploadPage;
DocumentListPage
Similarly, add DocumentListPage.js:
import React from 'react';

const DocumentListPage = () => {
  return (
    <div>
      <h1>Document List Page</h1>
    </div>
  );
};

export default DocumentListPage;
Before running the application, complete the following step.
Verify the Worker Script Location:
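Ensure that pdf.worker.min.js, the file name that OCRComponent.js assigns to GlobalWorkerOptions.workerSrc, is placed in the public folder of your React application. The public folder is the correct place for static assets that the browser must be able to fetch directly. The worker ships inside the pdfjs-dist package under node_modules, so you will most likely have to copy it into public yourself. Newer pdfjs-dist releases name the file pdf.worker.min.mjs; if your installed version does, update workerSrc (and, if needed, the import path) to match.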
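Copying the worker can be scripted so the step is repeatable. The following is a minimal sketch using only Node's built-in modules. The function name copyPdfWorker is hypothetical, and the source directory in the example comment is an assumption about where your installed pdfjs-dist version keeps its legacy build, so check node_modules before relying on it.

```javascript
const fs = require('fs');
const path = require('path');

// Copy the pdf.js worker script into the folder that the React app serves statically.
function copyPdfWorker(sourceDir, publicDir, fileName = 'pdf.worker.min.js') {
  const source = path.join(sourceDir, fileName);
  const target = path.join(publicDir, fileName);

  if (!fs.existsSync(source)) {
    throw new Error(`Worker script not found at ${source}`);
  }
  fs.mkdirSync(publicDir, { recursive: true }); // No-op if the folder already exists
  fs.copyFileSync(source, target);
  return target;
}

// Example, run from the client directory (paths are assumptions, adjust as needed):
// copyPdfWorker('node_modules/pdfjs-dist/legacy/build', 'public');
```

The worker should come from the same pdfjs-dist version and build as the library import, because mismatched versions cause pdf.js to fail when it loads a document.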
Run the React Application
Navigate back to the client directory and start the React development server:
npm start
Folder Structure
Check your folder structure. Your files should be laid out as follows:
ocr-web-app/
│
├── client/ # React front-end
│ ├── public/ # Public files
│ │ ├── index.html # Entry HTML file
│ │ ├── pdf.worker.min.js # PDF.js worker script
│ │ └── ... # Other static files
│ │
│ ├── src/ # Source files
│ │ ├── components/ # React components
│ │ │ ├── DocumentListPage.js # Component for listing documents
│ │ │ ├── DocumentUploadPage.js # Component for uploading documents
│ │ │ ├── OCRComponent.js # Component for handling OCR
│ │ │ └── OCRComponent.css # Styles for the OCRComponent
│ │ │
│ │ ├── context/ # Context for global state management
│ │ │ └── OCRContext.js # OCR context definition
│ │ │
│ │ ├── App.js # Main React application component
│ │ ├── index.js # Entry point for React application
│ │ └── ... # Other source files
│ │
│ └── package.json # Client-side dependencies and scripts
│
├── server.js # Node.js server setup
└── package.json # Server-side dependencies and scripts
This structure organizes your application into a clear separation of concerns, with the client directory containing all front-end React code, including components, context, and styles. The public folder inside the client directory holds static assets, such as the PDF.js worker script required for processing PDF files in the browser.
The server.js file in the root directory sets up the Express server. In this tutorial it only answers requests to / with a status message, but it is the place for backend logic, API routes, or serving the built React application in a production environment.
Each JavaScript file within the components and context directories serves a specific purpose, from managing global state with OCRContext.js to handling user interactions and displaying content in DocumentListPage.js, DocumentUploadPage.js, and OCRComponent.js.
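The server.js shown earlier does not yet serve the React build. A minimal sketch of how that could look is given here, assuming the client has been built with npm run build so that client/build exists. The helper name serveClientBuild is hypothetical; it receives the Express app and module as arguments so it can be added to server.js without changing the existing setup.

```javascript
const path = require('path');

// `app` is the Express application and `express` the imported module from server.js.
// `buildDir` must be an absolute path, because res.sendFile requires one.
function serveClientBuild(app, express, buildDir) {
  // Serve the compiled JavaScript, CSS, and other static assets.
  app.use(express.static(buildDir));

  // Fall back to index.html so React Router can handle paths such as /documents.
  app.use((req, res) => {
    res.sendFile(path.join(buildDir, 'index.html'));
  });
}

// In server.js, before app.listen (assumed layout: client/build next to server.js):
// serveClientBuild(app, express, path.join(__dirname, 'client', 'build'));
```

Because Express matches routes in registration order, the status route at / would still answer first. To let index.html load at the root, that route would need to be removed or moved to another path, and any API routes should be registered before this helper is called.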