You have folders of DOC and DOCX files and you need the readable text body, not the formatting — for full-text indexing, search-engine ingestion, NLP and machine-learning pipelines, eDiscovery review, or long-term archival. Opening each file in Word and saving as plain text does not scale past a handful of files, and it leaves Word formatting artefacts behind. Total Doc Converter X strips Word formatting and writes clean UTF-8 or ANSI text from the command line, in batch, with no GUI and no Microsoft Word installation required. Install it on a Windows server, call it from a script or via ActiveX, and let it run unattended.
Pass a file mask (for example *.docx) and the converter processes every matching file in one run.
(30 days, no email)
(server license, perpetual)
Windows 7/8/10/11 • Server 2008/2012/2016/2019/2022
DOC and DOCX are Microsoft Word formats built for human reading and editing. A DOCX file is a ZIP container with XML parts, embedded media, styles, revision history, comments, and tracked changes. A search engine, an indexer, an LLM tokenizer, or an eDiscovery pipeline does not care about any of that — it cares about the readable text. Pointing those systems at raw Word files forces every consumer to ship its own DOC/DOCX parser, and the parsers disagree on edge cases.
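The container structure is easy to see with any ZIP reader. The sketch below uses only Python's standard zipfile module to list the parts of a DOCX-style archive. The helper names and the tiny in-memory archive are illustrative assumptions (the archive mimics typical part names and is not a valid Word file); none of this is part of Total Doc Converter X.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of the parts stored in a DOCX (ZIP) container."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(archive.namelist())


def make_demo_container() -> bytes:
    """Build a stand-in archive with DOCX-like part names (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/styles.xml", "<w:styles/>")
        archive.writestr("word/media/image1.png", b"\x89PNG")
    return buffer.getvalue()


parts = list_parts(make_demo_container())
```

Only one of those parts carries the body text, which is why each consumer otherwise needs its own parser to reach it.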
TXT is the lowest common denominator. Every search indexer, every NLP toolkit, every grep-style tool, every diff utility reads plain text the same way. Converting Word to TXT once, on the server, gives every downstream consumer the same clean input. Tables flatten to tab-separated rows or line breaks. Images drop out. Headers and footers can be retained or stripped, depending on your flag. What remains is the body content, ready for indexing or feature extraction.
| | DOC / DOCX | TXT |
|---|---|---|
| Content | Text + formatting + media | Text only |
| File size | Tens to hundreds of KB | Typically 5–20% of the original |
| Indexing | Requires DOC/DOCX parser | Works with any indexer or tokenizer |
| Tables | Structured cells | Flattened to tab-separated rows |
| Images | Embedded | Removed |
| Audience | Reviewers, editors | Search, NLP, archival, eDiscovery |
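If a downstream job needs the table cells back, tab-separated rows are simple to split again. The following is a minimal sketch that assumes each flattened row arrives as one line with tabs between cells, as the table above describes; the exact flattening rules are not specified here, and the function name and sample text are made up for illustration.

```python
def split_table_rows(text: str) -> list[list[str]]:
    """Return the cells of every line that contains at least one tab."""
    rows = []
    for line in text.splitlines():
        if "\t" in line:
            rows.append([cell.strip() for cell in line.split("\t")])
    return rows


# Hypothetical converter output: a heading, two table rows, and a closing sentence.
sample = "Quarterly summary\nRegion\tUnits\tRevenue\nNorth\t120\t4800\nSee notes below.\n"
rows = split_table_rows(sample)
```

Lines without tabs are treated as ordinary body text and skipped, so prose around the table does not end up in the row list.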
Download the installer from the link above and run it on your Windows server or workstation. The setup takes under a minute. No Microsoft Word, LibreOffice, or browser installation is required — the converter parses DOC and DOCX directly using its own engine, and writes plain text in the encoding you specify.
Open cmd.exe or PowerShell. The converter executable is DOCConverter.exe, located in the installation folder (typically C:\Program Files\CoolUtils\TotalDocConverterX\). Add it to your system PATH or use the full path in your commands.
The simplest command converts all DOCX files in a folder to TXT:
DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8
This processes every .docx file in C:\Docs\ and saves the resulting TXT files in C:\Output\. Each Word file produces one TXT with the same base name. Use *.doc for legacy Word 97–2003 documents, or *.do* to catch both at once.
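Before running a broad mask on a large folder, it can help to preview what it selects. The sketch below approximates the mask with Python's fnmatch module and assembles the same argument list as the command above. Both helper names are hypothetical, and the converter's own mask matching may differ from fnmatch in edge cases, so treat the preview as an estimate.

```python
import fnmatch


def preview_mask(file_names, mask):
    """Return the names a wildcard mask would select, ignoring case as Windows does."""
    pattern = mask.lower()
    return [name for name in file_names if fnmatch.fnmatchcase(name.lower(), pattern)]


def build_command(exe, source_mask, output_dir, encoding="UTF-8"):
    """Assemble the argument list for the batch command shown above."""
    return [exe, source_mask, output_dir, "-c", "TXT", "-Encoding", encoding]


# Made-up folder listing used only to illustrate the masks.
names = ["brief.docx", "memo.doc", "letterhead.dotx", "scan.pdf"]
selected = preview_mask(names, "*.do*")
```

Under plain wildcard rules, *.do* also matches template extensions such as .dotx, so check the preview if templates live in the same folder. The list returned by build_command can be passed directly to subprocess.run.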
Control the TXT output with additional flags:
DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8 -BOM 0 -log C:\Logs\word2txt.log
- -Encoding UTF-8: output encoding (UTF-8, UTF-16, ANSI, 1251, 1252, etc.)
- -BOM 0: suppress the UTF-8 byte-order mark; use -BOM 1 to write it
- -LineBreaks CRLF: Windows-style \r\n; use LF for Unix-style \n
- -log C:\Logs\word2txt.log: write a conversion log for verification

Save your command in a .bat file and schedule it with Windows Task Scheduler:
@echo off
"C:\Program Files\CoolUtils\TotalDocConverterX\DOCConverter.exe" C:\Incoming\*.docx C:\Archive\TXT\ -c TXT -Encoding UTF-8 -BOM 0 -log C:\Logs\word2txt.log
This runs the conversion every night (or at whatever interval you set) and writes a log file so you can verify the results. Pair the output folder with your search indexer or your NLP ingestion job and the pipeline runs end to end with no manual step.
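Alongside the log, a quick folder comparison can confirm that the nightly run did what was expected. This is a minimal sketch that assumes each Word file yields a TXT with the same base name and a lowercase .txt extension, and that the job ran with -BOM 0; the function name is hypothetical and the script is independent of the converter.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"
WORD_SUFFIXES = {".doc", ".docx"}


def verify_outputs(source_dir, output_dir):
    """Return (missing, with_bom): sources lacking a TXT, and TXT files that start with a BOM."""
    missing, with_bom = [], []
    for source in sorted(Path(source_dir).iterdir()):
        if source.suffix.lower() not in WORD_SUFFIXES:
            continue  # skip anything that is not a Word document
        target = Path(output_dir) / (source.stem + ".txt")
        if not target.is_file():
            missing.append(source.name)
        elif target.read_bytes().startswith(UTF8_BOM):
            with_bom.append(target.name)
    return missing, with_bom
```

For example, `verify_outputs(r"C:\Incoming", r"C:\Archive\TXT")` returns two lists; both being empty suggests that every source produced a BOM-free text file.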
Total Doc Converter X includes a full ActiveX interface. You can call the converter from any COM-compatible environment — .NET, VBScript, PHP, Python, Ruby, or ASP. This lets you embed Word-to-Text conversion into your own web application, eDiscovery platform, or document workflow without shelling out to a command-line process.
Example (C#/.NET):
DocConverterX Cnv = new DocConverterX();
Cnv.Convert("C:\\Docs\\contract.docx", "C:\\Output\\contract.txt", "-c TXT -Encoding UTF-8 -BOM 0 -log c:\\Logs\\doc.log");
Example (PHP):
$c = new COM("DOCConverter.DOCConverterX");
$c->convert("C:\\Docs\\contract.docx", "C:\\Output\\contract.txt", "-c TXT -Encoding UTF-8 -BOM 0 -log c:\\Logs\\doc.log");
The same call works from ASP.NET, VBScript, Python, Ruby, Perl, and JavaScript (Windows Script Host). Your web application can accept uploaded Word files and return clean UTF-8 text to the indexer, the LLM endpoint, or the storage layer in real time.
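From Python, the same call can be wrapped so that failures surface as exceptions. The sketch below relies on the Convert method and the ErrorMessage property used in this page's samples; the wrapper names are hypothetical, and the dispatch helper assumes Windows, the pywin32 package, and a registered Total Doc Converter X installation.

```python
def convert_to_text(converter, src, dest, encoding="UTF-8"):
    """Call Convert on a converter COM object and raise if it reports an error."""
    options = f"-c TXT -Encoding {encoding} -BOM 0"
    converter.Convert(src, dest, options)
    error = converter.ErrorMessage
    if error:
        raise RuntimeError(f"Conversion of {src} failed: {error}")
    return dest


def dispatch_converter():
    """Create the COM object; imported lazily because pywin32 exists only on Windows."""
    import win32com.client

    return win32com.client.Dispatch("DOCConverter.DOCConverterX")
```

A typical call would be `convert_to_text(dispatch_converter(), r"C:\Docs\contract.docx", r"C:\Output\contract.txt")`. Passing the converter in as a parameter also makes the wrapper easy to exercise with a stand-in object on machines where the COM server is not installed.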
| Feature | Online Converters | Total Doc Converter X |
|---|---|---|
| Batch processing | One file at a time | Unlimited files per batch |
| File privacy | Files uploaded to third-party server | Files never leave your machine |
| Encoding control | UTF-8 only, BOM forced | UTF-8, UTF-16, ANSI, code pages, BOM on/off |
| Multilingual content | Inconsistent on CJK, RTL scripts | Full Unicode preserved |
| Automation | Manual only | Command line, .bat, Task Scheduler, ActiveX |
| Server deployment | Not possible | Designed for servers, no GUI needed |
| Requires Word installed | N/A | No |
| Requires internet | Yes | No |
The converter parses DOC and DOCX directly. You do not need Microsoft Office, LibreOffice, or any word processor installed on the server. This avoids licensing costs and the well-known instability of automating Word in unattended scenarios.
Total Doc Converter X is designed for unattended use. No GUI windows, no dialog boxes, no confirmation prompts. It runs silently from the command line or as part of a service — exactly what a production server needs.
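When the converter is launched from a service or scheduler script, it is prudent to close standard input and set a timeout so that a stuck process cannot block the job. Here is a minimal sketch using Python's subprocess module; the function name and the one-hour default are assumptions, and since this page does not document the converter's exit codes, the -log file remains the authoritative record.

```python
import subprocess


def run_unattended(command, timeout_seconds=3600):
    """Run a command with no console interaction; return (exit_code, combined_output)."""
    try:
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,   # nothing can wait for keyboard input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge error output into one stream
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None, f"timed out after {timeout_seconds} s"
    return completed.returncode, completed.stdout
```

The command argument is a plain list, for example the DOCConverter.exe path followed by the source mask, output folder, and flags shown earlier.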
Search indexers, NLP frameworks, and legacy archives each have their own encoding rules. Total Doc Converter X writes UTF-8 with or without BOM, UTF-16 LE or BE, Windows ANSI code pages 1251 and 1252, and any other code page registered on the system. Cyrillic contracts, Japanese product manuals, Arabic correspondence, and German technical documentation all survive the conversion intact — the converter reads the source DOC/DOCX as Unicode and writes the chosen output encoding without lossy transliteration. Set -Encoding once in your .bat file and the output matches downstream consumers byte for byte.
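The practical difference between these encodings can be checked with Python's built-in codecs. The sketch below reports how many bytes a string needs in each encoding, or None when the encoding cannot represent it; the function names and sample words are illustrative and independent of the converter.

```python
import codecs


def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "cp1251", "cp1252")):
    """Map each encoding to the byte length of text, or None if it cannot represent it."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


def starts_with_utf8_bom(data: bytes) -> bool:
    """Report whether a byte string begins with the UTF-8 byte-order mark."""
    return data.startswith(codecs.BOM_UTF8)


cyrillic = encoded_sizes("Договор")
german = encoded_sizes("Größe")
```

A single-byte code page covers only its own script: the Cyrillic word fits cp1251 but not cp1252, and the German word is the reverse. UTF-8 and UTF-16 represent both, which is why UTF-8 is the safer default for mixed-language archives.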
The same command-line tool converts Word to PDF, HTML, RTF, XLS, TIFF, JPEG, and more. One installation covers all your Word conversion needs. Change -c TXT to -c PDF and you get PDF output with the same batch and automation features.
"We process several thousand DOC and DOCX exhibits per matter. Total Doc Converter X runs as a nightly batch on the eDiscovery server and produces UTF-8 text copies for keyword search and concept clustering. The text body is clean — no Word artefacts, no header/footer noise, tables flattened to tabs. Setting -BOM 0 was the small detail that made our indexer happy on the first try."
Caroline Whitford, Litigation Support Specialist, Mid-Atlantic Law Group
"We feed contract corpora into an embedding pipeline for retrieval-augmented search. Parsing DOCX inside the pipeline was slow and brittle, and python-docx disagreed with Word on table cells. Pre-converting to plain TXT with DOCConverter.exe removed both problems. The .bat file lives in Task Scheduler, the embedding job reads TXT, and we stopped fighting Word XML."
Devansh Iyer, NLP Engineer
"Compliance asked us to keep plain-text copies of every clinical document alongside the originals for long-term archival. We picked Total Doc Converter X because it does not need Word on the file server, and the encoding flag let us standardise on UTF-8 without BOM across the archive. Documentation could be more detailed on the table-flattening rules, but support answered our questions the same day."
Margaret Holloway, Records Manager, Regional Health Network
Run DOCConverter.exe C:\Docs\*.docx C:\Output\ -c TXT -Encoding UTF-8. This converts every matching Word file in the source folder to TXT. Use *.doc for legacy Word 97–2003 files, or *.do* to catch both DOC and DOCX in one run.

Pass -Encoding followed by the target encoding. Supported values include UTF-8, UTF-16, UTF-16BE, ANSI, and any Windows code page registered on the system (for example 1251 for Cyrillic Windows or 1252 for Western European). The output bytes match exactly what the downstream indexer or parser expects.

Use -BOM 0 to write a clean UTF-8 stream with no byte-order mark. Use -BOM 1 to emit the BOM. Some search indexers and JSON parsers reject files that start with a BOM, while some Windows-native tools require it; the flag lets you match either side without post-processing.

Use -Encoding UTF-8 for full Unicode coverage in a single byte stream, or pick a code page if your archive standard requires one.

Use -IncludeHeaders 0 to drop headers and footers entirely, or -IncludeHeaders 1 to keep them. Page numbers are stripped because TXT has no concept of pages.

Total Doc Converter X includes an ActiveX interface (DOCConverter.DOCConverterX). You can call it from .NET, PHP, Python, VBScript, ASP, Ruby, Perl, and any other COM-compatible environment. Your web application can accept uploaded DOC/DOCX files and return UTF-8 text to the indexer or LLM endpoint in real time.
Download free trial and convert your files in minutes.
No credit card or email required.
Example (C#/.NET):
string src = @"C:\test\Source.docx";
string dest = @"C:\test\Dest.pdf";
var cnv = new DocConverterX();
cnv.Convert(src, dest, "-cPDF -log c:\\test\\Doc.log");
if (!string.IsNullOrEmpty(cnv.ErrorMessage))
    throw new Exception(cnv.ErrorMessage);
Example (C#, Azure Functions):
using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Text;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class Function1
{
    [FunctionName("Function1")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        StringBuilder sbLogs = new StringBuilder();
        sbLogs.AppendLine("started...");
        try
        {
            ProcessStartInfo startInfo = new ProcessStartInfo();
            startInfo.CreateNoWindow = true;
            startInfo.UseShellExecute = false;
            var assemblyDirectoryPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
            // Strip the trailing "\bin" (4 characters) to reach the function app root.
            assemblyDirectoryPath = assemblyDirectoryPath.Substring(0, assemblyDirectoryPath.Length - 4);
            var executablePath = $@"{assemblyDirectoryPath}\Converter\DocConverterX.exe";
            sbLogs.AppendLine(executablePath + "...");
            var srcPath = $@"{assemblyDirectoryPath}\src\sample.docx";
            var outPath = Path.GetTempFileName() + ".pdf";
            startInfo.FileName = executablePath;
            if (File.Exists(outPath))
            {
                File.Delete(outPath);
            }
            if (File.Exists(executablePath) && File.Exists(srcPath))
            {
                sbLogs.AppendLine("files exist...");
            }
            else
            {
                sbLogs.AppendLine("EXE and/or source file not found...");
            }
            startInfo.WindowStyle = ProcessWindowStyle.Hidden;
            startInfo.Arguments = $"\"{srcPath}\" \"{outPath}\" -cPDF";
            using (Process exeProcess = Process.Start(startInfo))
            {
                sbLogs.AppendLine($"wait...{DateTime.Now.ToString()}");
                exeProcess.WaitForExit();
                sbLogs.AppendLine($"complete...{DateTime.Now.ToString()}");
            }
            sbLogs.AppendLine("Conversion complete.");
        }
        catch (Exception ex)
        {
            sbLogs.AppendLine(ex.ToString());
        }
        // Return the collected log as plain text.
        return new OkObjectResult(sbLogs.ToString());
    }
}
Example (ASP/VBScript):
dim C
Set C=CreateObject("DocConverter.DocConverterX")
C.Convert "c:\source.docx", "c:\dest.pdf", "-cPDF -log c:\doc.log"
Response.Write C.ErrorMessage
set C = nothing
Example (ASP/VBScript, streaming the result to the browser):
dim C
Set C=CreateObject("DocConverter.DocConverterX")
Response.Clear
Response.AddHeader "Content-Type", "binary/octet-stream"
Response.AddHeader "Content-Disposition", "attachment; filename=test.pdf"
Response.BinaryWrite C.ConvertToStream("C:\www\ASP\Source.docx", "C:\www\ASP", "-cpdf -log c:\doc.log")
set C = nothing
Example (PHP):
$src="C:\\test\\test.docx";
$dest="C:\\test\\test.pdf";
if (file_exists($dest)) unlink($dest);
$c= new COM("DocConverter.DocConverterX");
$c->convert($src,$dest, "-cPDF -log c:\\test\\Doc.log");
if (file_exists($dest)) echo "OK"; else echo "fail:".$c->ErrorMessage;
Example (Ruby):
require 'win32ole'
c = WIN32OLE.new('DocConverter.DocConverterX')
src = "C:\\test\\test.docx"
dest = "C:\\test\\test.pdf"
c.convert(src, dest, "-cPDF -log c:\\test\\Doc.log")
unless File.exist?(dest)
  puts c.ErrorMessage
end
Example (Python):
import os.path
import win32com.client

c = win32com.client.Dispatch("DocConverter.DocConverterX")
src = "C:\\test\\test.docx"
dest = "C:\\test\\test.pdf"
c.convert(src, dest, "-cPDF -log c:\\test\\Doc.log")
# Report the converter's error message if no output file was produced.
if not os.path.exists(dest):
    print(c.ErrorMessage)
Example (Delphi):
uses Dialogs, Vcl.OleAuto;
var
  c: OleVariant;
begin
  c := CreateOleObject('DocConverter.DocConverterX');
  c.Convert('c:\test\source.docx', 'c:\test\dest.pdf', '-cPDF -log c:\test\Doc.log');
  if c.ErrorMessage <> '' then
    ShowMessage(c.ErrorMessage);
end;
Example (JavaScript):
var c = new ActiveXObject("DocConverter.DocConverterX");
c.Convert("C:\\test\\source.docx", "C:\\test\\dest.pdf", "-cPDF");
if (c.ErrorMessage != "")
    alert(c.ErrorMessage);
Example (Perl):
use Win32::OLE;
my $src = "C:\\test\\test.docx";
my $dest = "C:\\test\\test.pdf";
my $c = Win32::OLE->new('DocConverter.DocConverterX');
$c->convert($src, $dest, "-cPDF -log c:\\test\\Doc.log");
# Print the error message only when no output file was produced.
print $c->ErrorMessage unless -e $dest;