

Convert Word to Text via Command Line — Server Batch Converter

You have folders of DOC and DOCX files and you need the readable text body, not the formatting — for full-text indexing, search-engine ingestion, NLP and machine-learning pipelines, eDiscovery review, or long-term archival. Opening each file in Word and saving as plain text does not scale past a handful of files, and it leaves Word formatting artefacts behind. Total Doc Converter X strips Word formatting and writes clean UTF-8 or ANSI text from the command line, in batch, with no GUI and no Microsoft Word installation required. Install it on a Windows server, call it from a script or via ActiveX, and let it run unattended.

What Total Doc Converter X Does

  • Batch conversion — pass a wildcard (*.docx) and the converter processes every matching file in one run
  • Clean text output — strips fonts, colors, and paragraph styles, leaving only the readable text body; headers and footers can be kept or dropped with a flag
  • Encoding control — write UTF-8, UTF-16, ANSI, or any Windows code page to match your downstream pipeline
  • BOM handling — emit or suppress the UTF-8 byte-order mark to match the requirements of search indexers and parsers
  • Multilingual content — preserves Cyrillic, CJK, Arabic, Hebrew, and any Unicode script the source DOC/DOCX contains
  • No Word required — the converter parses DOC and DOCX directly without Microsoft Office on the server
  • No GUI — runs silently from the command line with no pop-up windows or confirmation dialogs
  • ActiveX / COM — call the converter from .NET, VBScript, PHP, Python, or any COM-compatible environment to embed conversion into your own application
  • .bat scripting — save commands in batch files and schedule them with Windows Task Scheduler for fully automated conversion


Download Free Trial

(30 days, no email)

Buy License

(server license, perpetual)

Windows 7/8/10/11 • Server 2008/2012/2016/2019/2022

Word vs Text: Why Convert?

DOC and DOCX are Microsoft Word formats built for human reading and editing. A DOCX file is a ZIP container with XML parts, embedded media, styles, revision history, comments, and tracked changes. A search engine, an indexer, an LLM tokenizer, or an eDiscovery pipeline does not care about any of that — it cares about the readable text. Pointing those systems at raw Word files forces every consumer to ship its own DOC/DOCX parser, and the parsers disagree on edge cases.

TXT is the lowest common denominator. Every search indexer, every NLP toolkit, every grep-style tool, every diff utility reads plain text the same way. Converting Word to TXT once, on the server, gives every downstream consumer the same clean input. Tables flatten to tab-separated rows or line breaks. Images drop out. Headers and footers can be retained or stripped, depending on your flag. What remains is the body content, ready for indexing or feature extraction.

|           | DOC / DOCX                | TXT                                 |
|-----------|---------------------------|-------------------------------------|
| Content   | Text + formatting + media | Text only                           |
| File size | Tens to hundreds of KB    | Typically 5–20% of the original     |
| Indexing  | Requires DOC/DOCX parser  | Works with any indexer or tokenizer |
| Tables    | Structured cells          | Flattened to tab-separated rows     |
| Images    | Embedded                  | Removed                             |
| Audience  | Reviewers, editors        | Search, NLP, archival, eDiscovery   |
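
Because tables are flattened to tab-separated rows, a downstream script can pull them back out of the converted text with a few lines of code. The sketch below is an illustration only: `table_rows` is a hypothetical helper, and it assumes that any line containing a tab is a table row and that cells contain no tabs or line breaks of their own. The converter's exact flattening rules are not documented here, so check them against real output.

```python
def table_rows(text):
    """Return the cell lists of every line that contains at least one tab.

    Assumption: a tab inside a line means "flattened table row";
    lines without tabs are treated as ordinary body text and skipped.
    """
    rows = []
    for line in text.splitlines():
        if "\t" in line:
            rows.append(line.split("\t"))
    return rows


sample = "Price list\r\nItem\tQty\r\nBolt\t4\r\n\r\nEnd of document\r\n"
for row in table_rows(sample):
    print(row)
```

Body text and tables stay in one file this way, and the table-aware step remains optional for consumers that only need the words.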

How to Convert Word to Text from the Command Line

Step 1. Install Total Doc Converter X

Download the installer from the link above and run it on your Windows server or workstation. The setup takes under a minute. No Microsoft Word, LibreOffice, or browser installation is required — the converter parses DOC and DOCX directly using its own engine, and writes plain text in the encoding you specify.

Step 2. Open the Command Prompt

Open cmd.exe or PowerShell. The converter executable is DOCConverter.exe, located in the installation folder (typically C:\Program Files\CoolUtils\TotalDocConverterX\). Add it to your system PATH or use the full path in your commands.
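
A script that launches the converter can make the same choice itself: look on PATH first, then fall back to the installation folder. This is a minimal sketch in Python. The function name `find_converter` is hypothetical, and the default folder is simply the typical path quoted above, which may differ on your machine.

```python
import shutil
from pathlib import Path

# Assumption: the typical installation folder quoted in this guide.
DEFAULT_DIR = r"C:\Program Files\CoolUtils\TotalDocConverterX"


def find_converter(exe_name="DOCConverter.exe", fallback_dir=DEFAULT_DIR):
    """Return a path to the converter executable, or None if it is not found."""
    on_path = shutil.which(exe_name)  # searches the directories listed in PATH
    if on_path:
        return on_path
    candidate = Path(fallback_dir) / exe_name
    return str(candidate) if candidate.is_file() else None
```

Returning None instead of raising lets the calling script decide whether a missing converter is fatal or just worth a log entry.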

Step 3. Run the Basic Conversion

The simplest command converts all DOCX files in a folder to TXT:

DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8

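This processes every .docx file in C:\Docs\ and saves the resulting TXT files in C:\Output\. Each Word file produces one TXT with the same base name. Use *.doc for legacy Word 97–2003 documents, or *.do* to catch both at once. Note that a mask as broad as *.do* would also match .docm, .dot, and .dotx files if any are present in the folder.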
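
If the conversion is started from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below mirrors the command shown above and nothing more; `build_command` and `run_conversion` are hypothetical helper names, and the meaning of the converter's exit code is not documented here, so treat the return value as informational.

```python
import subprocess

# Assumption: the executable name used throughout this guide, resolvable via PATH.
CONVERTER = "DOCConverter.exe"


def build_command(source_mask, output_dir, encoding="UTF-8"):
    """Return the argument list for one batch run, mirroring the command above."""
    return [CONVERTER, source_mask, output_dir, "-c", "TXT", "-Encoding", encoding]


def run_conversion(source_mask, output_dir, encoding="UTF-8"):
    """Run the converter (Windows only) and return its exit code."""
    completed = subprocess.run(build_command(source_mask, output_dir, encoding), check=False)
    return completed.returncode


print(build_command("C:\\Docs\\*.docx", "C:\\Output\\"))
```

Passing a list instead of one long string avoids quoting mistakes when folder names contain spaces. The wildcard is handed to the converter unchanged, since no shell is involved in expanding it.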

Step 4. Add Encoding and Logging Options

Control the TXT output with additional flags:

DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8 -BOM 0 -log C:\Logs\word2txt.log
  • -Encoding UTF-8 — output encoding (UTF-8, UTF-16, ANSI, 1251, 1252, etc.)
  • -BOM 0 — suppress the UTF-8 byte-order mark; use -BOM 1 to write it
  • -LineBreaks CRLF — line endings: CRLF for Windows-style \r\n, LF for Unix-style \n
  • -log C:\Logs\word2txt.log — write a conversion log for verification
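
After a run, it can be worth confirming that the output really has the BOM and line-ending settings you asked for. The following Python sketch inspects the raw bytes of one converted file. `describe_text_file` is a hypothetical helper and only recognises the UTF-8 BOM; it does not inspect UTF-16 output.

```python
import codecs


def describe_text_file(path):
    """Report UTF-8 BOM presence and line-ending style of a converted TXT file."""
    with open(path, "rb") as handle:
        data = handle.read()
    has_bom = data.startswith(codecs.BOM_UTF8)  # the bytes EF BB BF
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # bare \n not preceded by \r
    if crlf and not lf_only:
        endings = "CRLF"
    elif lf_only and not crlf:
        endings = "LF"
    elif crlf and lf_only:
        endings = "mixed"
    else:
        endings = "none"
    return {"utf8_bom": has_bom, "line_endings": endings}
```

Running this over a handful of files from the first batch is usually enough to catch a mismatched flag before the indexer does.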

Step 5. Automate with a .bat File

Save your command in a .bat file and schedule it with Windows Task Scheduler:

@echo off
"C:\Program Files\CoolUtils\TotalDocConverterX\DOCConverter.exe" C:\Incoming\*.docx C:\Archive\TXT\ -c TXT -Encoding UTF-8 -BOM 0 -log C:\Logs\word2txt.log

Once the task is scheduled, the conversion runs every night (or at whatever interval you set) and writes a log file so you can verify the results. Point your search indexer or NLP ingestion job at the output folder and the pipeline runs end to end with no manual step.
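
For an unattended job, a simple cross-check complements the log: every source file should have a TXT with the same base name in the output folder. This Python sketch lists the ones that do not. `missing_outputs` is a hypothetical helper, and it assumes the output files carry a lowercase .txt extension.

```python
from pathlib import Path


def missing_outputs(source_dir, output_dir, pattern="*.docx"):
    """List source files that have no same-named .txt file in output_dir."""
    out = Path(output_dir)
    return sorted(
        src.name
        for src in Path(source_dir).glob(pattern)
        if not (out / (src.stem + ".txt")).exists()
    )
```

A non-empty result after the nightly run is a reasonable trigger for an alert or a retry of just those files.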

ActiveX / COM Integration

Total Doc Converter X includes a full ActiveX interface. You can call the converter from any COM-compatible environment — .NET, VBScript, PHP, Python, Ruby, or ASP. This lets you embed Word-to-Text conversion into your own web application, eDiscovery platform, or document workflow without shelling out to a command-line process.

Example (C#/.NET):

DOCConverterX Cnv = new DOCConverterX();
Cnv.Convert("C:\\Docs\\contract.docx", "C:\\Output\\contract.txt", "-c TXT -Encoding UTF-8 -BOM 0 -log c:\\Logs\\doc.log");

Example (PHP):

$c = new COM("DOCConverter.DOCConverterX");
$c->convert("C:\\Docs\\contract.docx", "C:\\Output\\contract.txt", "-c TXT -Encoding UTF-8 -BOM 0 -log c:\\Logs\\doc.log");

The same call works from ASP.NET, VBScript, Python, Ruby, Perl, and JavaScript (Windows Script Host). Your web application can accept uploaded Word files and return clean UTF-8 text to the indexer, the LLM endpoint, or the storage layer in real time.

Online Converters vs Total Doc Converter X

| Feature                 | Online Converters                   | Total Doc Converter X                        |
|-------------------------|-------------------------------------|----------------------------------------------|
| Batch processing        | One file at a time                  | Unlimited files per batch                    |
| File privacy            | Files uploaded to third-party server | Files never leave your machine              |
| Encoding control        | UTF-8 only, BOM forced              | UTF-8, UTF-16, ANSI, code pages, BOM on/off  |
| Multilingual content    | Inconsistent on CJK, RTL scripts    | Full Unicode preserved                       |
| Automation              | Manual only                         | Command line, .bat, Task Scheduler, ActiveX  |
| Server deployment       | Not possible                        | Designed for servers, no GUI needed          |
| Requires Word installed | N/A                                 | No                                           |
| Requires internet       | Yes                                 | No                                           |

When You Need Word to Text Command-Line Conversion

  • Full-text search indexing. An enterprise search engine indexes a corporate document share. Pointing it at raw DOCX files forces it to ship its own parser; pointing it at TXT files lets any indexer (Elasticsearch, Solr, Sphinx, Manticore) read the content directly. The converter prepares the corpus once, the indexer ingests forever.
  • Feeding contracts to an LLM or ML pipeline. Tokenizers and embedding models work on plain text. A nightly job converts new contracts to TXT and pushes them to the embedding store, where a retrieval-augmented model can answer questions about clause language without choking on Word XML.
  • Legal text mining and eDiscovery. A litigation-support team needs the textual body of thousands of DOC and DOCX exhibits for keyword search, concept clustering, and predictive coding. Plain text is the input format every eDiscovery tool understands the same way.
  • Email and log preservation for compliance. Some archiving workflows store email bodies and log extracts as DOC or DOCX files. Compliance archives need plain-text copies that any auditor can read in twenty years without a Word installation. The converter strips the Word wrapper and stores clean TXT alongside the original.
  • NLP feature extraction. Sentiment scoring, named-entity recognition, and topic modelling run on token streams. Converting the source DOCX to UTF-8 TXT once means the NLP pipeline does not re-parse the same document on every run.
  • Long-term archival. Word formats evolve. A DOC from 2001 already opens in compatibility mode in current versions of Word. Plain UTF-8 text will still open in any editor in 2050. Archiving the readable body alongside the original is cheap insurance.
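
Most of the scenarios above end the same way: a job reads the converted TXT files and hands the text to an indexer, tokenizer, or archive. A minimal Python sketch of that reading step follows. `load_corpus` is a hypothetical helper, and it assumes the files were written as UTF-8, with or without a BOM.

```python
from pathlib import Path


def load_corpus(txt_dir):
    """Map each TXT file's base name to its decoded text."""
    corpus = {}
    for path in sorted(Path(txt_dir).glob("*.txt")):
        # "utf-8-sig" drops a leading BOM if one is present and
        # reads plain UTF-8 unchanged otherwise.
        corpus[path.stem] = path.read_text(encoding="utf-8-sig")
    return corpus
```

Decoding with utf-8-sig makes the reader indifferent to the BOM setting, which helps when files from older runs and newer runs share a folder.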

Why Total Doc Converter X

No Word Required

The converter parses DOC and DOCX directly. You do not need Microsoft Office, LibreOffice, or any word processor installed on the server. This avoids licensing costs and the well-known instability of automating Word in unattended scenarios.

True Server Application

Total Doc Converter X is designed for unattended use. No GUI windows, no dialog boxes, no confirmation prompts. It runs silently from the command line or as part of a service — exactly what a production server needs.

Encoding That Matches Your Pipeline

Search indexers, NLP frameworks, and legacy archives each have their own encoding rules. Total Doc Converter X writes UTF-8 with or without BOM, UTF-16 LE or BE, Windows ANSI code pages 1251 and 1252, and any other code page registered on the system. Cyrillic contracts, Japanese product manuals, Arabic correspondence, and German technical documentation all survive the conversion intact, provided the chosen output encoding can represent them: the converter reads the source DOC/DOCX as Unicode and writes the chosen output encoding without transliteration. UTF-8 and UTF-16 cover every script, while a single code page such as 1251 or 1252 covers only its own character set. Set -Encoding once in your .bat file and the output matches downstream consumers byte for byte.
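
When choosing between UTF-8 and a single code page, it helps to test whether the target encoding can actually hold the characters in your documents. The Python sketch below does that with the standard codecs. `can_encode` is a hypothetical helper, and the Python codec names cp1251 and cp1252 stand in for Windows code pages 1251 and 1252.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {"Cyrillic": "Договор", "German": "Vertrag über", "Japanese": "契約"}
for label, text in samples.items():
    fits = [enc for enc in ("utf-8", "cp1251", "cp1252") if can_encode(text, enc)]
    print(label, fits)
```

A mixed-language archive is the usual reason to standardise on UTF-8: it is the only one of the three that accepts all of the samples.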

Not Just TXT

The same command-line tool converts Word to PDF, HTML, RTF, XLS, TIFF, JPEG, and more. One installation covers all your Word conversion needs. Change -c TXT to -c PDF and you get PDF output with the same batch and automation features.




Total Doc Converter X Customer Reviews 2026

Rated 4.7/5 based on customer reviews

"We process several thousand DOC and DOCX exhibits per matter. Total Doc Converter X runs as a nightly batch on the eDiscovery server and produces UTF-8 text copies for keyword search and concept clustering. The text body is clean — no Word artefacts, no header/footer noise, tables flattened to tabs. Setting -BOM 0 was the small detail that made our indexer happy on the first try."

Caroline Whitford, Litigation Support Specialist, Mid-Atlantic Law Group (5 stars)

"We feed contract corpora into an embedding pipeline for retrieval-augmented search. Parsing DOCX inside the pipeline was slow and brittle, and python-docx disagreed with Word on table cells. Pre-converting to plain TXT with DOCConverter.exe removed both problems. The .bat file lives in Task Scheduler, the embedding job reads TXT, and we stopped fighting Word XML."

Devansh Iyer, NLP Engineer (5 stars)

"Compliance asked us to keep plain-text copies of every clinical document alongside the originals for long-term archival. We picked Total Doc Converter X because it does not need Word on the file server, and the encoding flag let us standardise on UTF-8 without BOM across the archive. Documentation could be more detailed on the table-flattening rules, but support answered our questions the same day."

Margaret Holloway, Records Manager, Regional Health Network (4 stars)

FAQ

How do I convert Word to text from the command line?
The basic command is: DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8. This converts every DOCX file in the source folder to TXT. Use *.doc for legacy Word 97–2003 files, or *.do* to catch both DOC and DOCX in one run.

How do I set the output encoding?
Pass -Encoding followed by the target encoding. Supported values include UTF-8, UTF-16, UTF-16BE, ANSI, and any Windows code page registered on the system (for example 1251 for Cyrillic Windows or 1252 for Western European). Pick the value your downstream indexer or parser expects.

Can I write UTF-8 without a byte-order mark?
Yes. Add -BOM 0 to write a clean UTF-8 stream with no byte-order mark. Use -BOM 1 to emit the BOM. Some search indexers and JSON parsers reject files that start with a BOM, while some Windows-native tools require it. The flag lets you match either side without post-processing.

Does the converter preserve non-Latin scripts?
Yes. The source DOC or DOCX is read as Unicode, so Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, and Indic scripts all reach the output intact. Choose -Encoding UTF-8 for full Unicode coverage in a single byte stream, or pick a code page if your archive standard requires one.

How are tables converted?
Tables are flattened to text. Each row becomes a line; cells within a row are separated by tabs by default. This produces a TSV-like layout that any spreadsheet, indexer, or pandas reader can parse. The column structure of the original table is preserved as long as your downstream tool understands tab-separated values.

What happens to headers and footers?
By default headers and footers are included once per document, not repeated on every page, so the text body stays clean. Use -IncludeHeaders 0 to drop them entirely, or -IncludeHeaders 1 to keep them. Page numbers are stripped because TXT has no concept of pages.

Can I call the converter from my own application?
Yes. Total Doc Converter X registers as a COM/ActiveX object (DOCConverter.DOCConverterX). You can call it from .NET, PHP, Python, VBScript, ASP, Ruby, Perl, and any other COM-compatible environment. Your web application can accept uploaded DOC/DOCX files and return UTF-8 text to the indexer or LLM endpoint in real time.

Start working now!

Download free trial and convert your files in minutes.
No credit card or email required.

⬇ Download Free Trial Windows 7/8/10/11 • 135 MB

Examples of Total Doc Converter X

Convert Doc files with Total Doc Converter X and .NET


string src  = @"C:\test\Source.docx";
string dest = @"C:\test\Dest.pdf";

var cnv = new DocConverterX();
cnv.Convert(src, dest, "-cPDF -log c:\\test\\Doc.log");

if (!string.IsNullOrEmpty(cnv.ErrorMessage))
    throw new Exception(cnv.ErrorMessage);

Convert Doc files on web servers with Total Doc Converter X and Azure Functions (C#)

public static class Function1
    {
        [FunctionName("Function1")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            StringBuilder sbLogs = new StringBuilder();
            sbLogs.AppendLine("started...");
            try
            {
                ProcessStartInfo startInfo = new ProcessStartInfo();
                startInfo.CreateNoWindow = true;
                startInfo.UseShellExecute = false;
                var assemblyDirectoryPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
                assemblyDirectoryPath = assemblyDirectoryPath.Substring(0, assemblyDirectoryPath.Length - 4);

                var executablePath = $@"{assemblyDirectoryPath}\Converter\DocConverterX.exe";
                sbLogs.AppendLine(executablePath + "...");
                var srcPath = $@"{assemblyDirectoryPath}\src\sample.docx";
                var outPath = Path.GetTempFileName() + ".pdf";
                startInfo.FileName = executablePath;

                if (File.Exists(outPath))
                {
                    File.Delete(outPath);
                }

                if (File.Exists(executablePath) && File.Exists(srcPath))
                {
                    sbLogs.AppendLine("files exists...");
                }
                else
                    sbLogs.AppendLine("EXE & source files NOT exists...");
                startInfo.WindowStyle = ProcessWindowStyle.Hidden;
                startInfo.Arguments = $"\"{srcPath}\" \"{outPath}\" -cPDF";
                using (Process exeProcess = Process.Start(startInfo))
                {
                    sbLogs.AppendLine($"wait...{DateTime.Now.ToString()}");
                    exeProcess.WaitForExit();
                    sbLogs.AppendLine($"complete...{DateTime.Now.ToString()}");
                }
                sbLogs.AppendLine("Conversion complete.");
            }
            catch (Exception ex)
            {
                sbLogs.AppendLine(ex.ToString());
            }

            // Return the collected log as plain text rather than the StringBuilder object itself.
            return new OkObjectResult(sbLogs.ToString());
        }
    }
More information about Azure Functions.

Convert Doc files on web servers with Total Doc Converter X and ASP (VBScript)

dim C
Set C=CreateObject("DocConverter.DocConverterX")
C.Convert "c:\source.docx", "c:\dest.pdf", "-cPDF -log c:\doc.log"
Response.Write C.ErrorMessage
set C = nothing

Stream the resulting PDF directly from ASP

dim C
Set C=CreateObject("DocConverter.DocConverterX")
Response.Clear
Response.ContentType = "application/pdf"
Response.AddHeader "Content-Disposition", "attachment; filename=test.pdf"
Response.BinaryWrite C.ConvertToStream("C:\www\ASP\Source.docx", "C:\www\ASP", "-cpdf -log c:\doc.log")
set C = nothing

Convert Doc files with PHP and Total Doc Converter X

$src="C:\\test\\test.docx";
$dest="C:\\test\\test.pdf";
if (file_exists($dest)) unlink($dest);
$c= new COM("DocConverter.DocConverterX");
$c->convert($src,$dest, "-cPDF -log c:\\test\\Doc.log");
if (file_exists($dest)) echo "OK"; else echo "fail:".$c->ErrorMessage;

Convert Doc files with Total Doc Converter X and Ruby

require 'win32ole'
c = WIN32OLE.new('DocConverter.DocConverterX')

src = "C:\\test\\test.docx"
dest = "C:\\test\\test.pdf"

c.convert(src, dest, "-cPDF -log c:\\test\\Doc.log")

if not File.exist?(dest)
  puts c.ErrorMessage
end

Convert Doc files with Total Doc Converter X and Python

import win32com.client
import os.path

c = win32com.client.Dispatch("DocConverter.DocConverterX")

src  = "C:\\test\\test.docx"
dest = "C:\\test\\test.pdf"

c.convert(src, dest, "-cPDF -log c:\\test\\Doc.log")

if not os.path.exists(dest):
    print(c.ErrorMessage)

Convert Doc files with Pascal and Total Doc Converter X

uses Dialogs, Vcl.OleAuto;

var
  c: OleVariant;
begin
  c := CreateOleObject('DocConverter.DocConverterX');
  c.Convert('c:\test\source.docx', 'c:\test\dest.pdf', '-cPDF -log c:\test\Doc.log');
  if c.ErrorMessage <> '' then
    ShowMessage(c.ErrorMessage);
end;

Convert Doc files with Total Doc Converter X and JavaScript (ActiveXObject)

var c = new ActiveXObject("DocConverter.DocConverterX");
c.Convert("C:\\test\\source.docx", "C:\\test\\dest.pdf", "-cPDF");
if (c.ErrorMessage != "")
  alert(c.ErrorMessage)

Convert Doc files with Total Doc Converter X and Perl

use Win32::OLE;

my $src  = "C:\\test\\test.docx";
my $dest = "C:\\test\\test.pdf";

my $c = Win32::OLE->new('DocConverter.DocConverterX');
$c->convert($src, $dest, "-cPDF -log c:\\test\\Doc.log");
# Report the converter's error message only when no output file was produced.
print $c->ErrorMessage unless -e $dest;


© 2026. All rights reserved. CoolUtils File Converters
