We have a task: extract all email addresses from a Word document of more than 800 pages. The .docx file is 250 MB, but the size does not matter. To extract the email addresses, we need to:
1. Convert the .docx file into a plain text file.
2. Transfer the text file to a Linux machine.
3. Use grep to extract the email addresses and save the result:
grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" email_list.txt > output.txt
This command comes from http://enure.net
4. Open the output text file (output.txt) in Excel and save it as a workbook.
5. Run the mailshot using the Excel file.
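
A long document usually repeats the same address many times, so it may help to deduplicate the grep output before opening it in Excel. The following is only a sketch, assuming GNU grep, tr and sort are available on the Linux machine. The file names sample_text.txt, unique_emails.txt and unique_emails.csv and the sample addresses are made up for illustration, and the pattern is a slightly widened variant of the one above (it also accepts _, % and + before the @), not the original command.

```shell
# Hypothetical sample input standing in for the converted document text.
printf '%s\n' \
  'Contact: Alice.Smith@example.com, bob_jones+news@mail.example.org' \
  'Duplicate: alice.smith@example.com and no address on this part.' \
  > sample_text.txt

# -E: extended regex, -o: print only the matches, -i: ignore case.
# Unlike the original pattern, the local part here also accepts _ % +.
# Lowercase everything so sort -u can drop case-variant duplicates.
grep -Eoi '\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b' sample_text.txt \
  | tr 'A-Z' 'a-z' \
  | sort -u > unique_emails.txt

# Prepend a header row so Excel opens the list as a one-column sheet.
{ echo 'email'; cat unique_emails.txt; } > unique_emails.csv

cat unique_emails.csv
```

Lowercasing before sort -u is a design choice: it treats Alice.Smith@example.com and alice.smith@example.com as one recipient, which is normally what a mailshot wants.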