You will be looking for all the capitalized words in the document. The regular expression for this is something like
(\b[A-Z][A-Za-z]{1,19}\b)
This will find all names like Robert, Paul, McKeskey, VanRecklinghouse. We will assume that names are between 2 and 20 characters in length.
But, as you allude to, the pattern is not specific to actual people’s names: NATO, January, April, and words at the start of sentences will also match. [This, We, and But from the previous few sentences, for example.]
You have not really told us how the documents that you work with are actually structured. But assuming the basic worst case scenario that these documents have the general character of a book or newspaper, false positives will actually dominate. Most of these regex matches will not actually be names. Assume also (as it would be in a book or newspaper) that context is important in determining if a word is actually a name. For example, “The Trojan War started on April 2 in Washington.” Without context, how are you to know whether Trojan, War, April, Washington are people’s names or not?
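To make the false-positive problem concrete, here is a minimal Python sketch that runs the pattern over the example sentence. The variable names are my own, and the quantifier is written as {1,19} so that one capital letter plus 1 to 19 further letters gives a total length of 2 to 20 characters.

```python
import re

# One capital letter followed by 1 to 19 more letters: 2 to 20 characters in total.
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]{1,19}\b")

sentence = "The Trojan War started on April 2 in Washington."
matches = CAPITALIZED.findall(sentence)

# Every capitalized word matches, whether or not it is a person's name.
print(matches)  # ['The', 'Trojan', 'War', 'April', 'Washington']
```

Nothing in the match itself says which of these, if any, is a person; that judgment has to come from the context.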
So I would approach this as a problem of extracting the text with the page numbers from the PDF. I happen to use PDFpenPro, but I imagine that PDF Expert would have a similar feature. You have to make sure that you have the page number.
In PDFpenPro, you can create headers for your PDF that contain the page number. Those headers have left, center, and right fields. I would put page__ on the left and the page number in the center.
Then I would select the entire document and paste it into a text editor such as BBEdit or TextEdit. This process will create a text file in which every line of the PDF becomes a line in the text editor, terminated by a \n (new line). Crucially, the page numbers will be included. This will be an easy thing to process with a program.
It will look something like
page__1
CHAPTER 1. Loomings.
Call me Ishmael. Some years ago—never mind how long precisely—having little or
no money in my purse, and nothing particular to interest me on shore, I thought I would
sail about a little and see the watery part of the world. It is a way I have of driving off the
spleen and regulating the circulation. Whenever I find myself growing grim about the
mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself
involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral
page__2
I meet; and especially whenever my hypos get such an upper hand of me, that it requires
a strong moral principle to prevent me from deliberately stepping into the street, and
methodically knocking people’s hats off—then, I account it high time to get to sea as
soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato
throws himself upon his sword; I quietly take to the ship. There is nothing surprising in
this. If they but knew it, almost all men in their degree, some time or other, cherish very
nearly the same feelings towards the ocean with me.
page__3
etc.
The regex pattern provided at the top of this reply will find:
CHAPTER; Loomings; Call; Ishmael; Some; It; Whenever; November; This; With; Cato; There; If and so on.
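As a rough sketch of how a program could keep track of the page while scanning, assuming the pasted text really does contain marker lines of the form page__N as in the sample above (the function name and the optional whitespace in the marker pattern are my own assumptions):

```python
import re

# Assumed marker format: a line consisting of "page__" and the page number.
PAGE_MARKER = re.compile(r"^page__\s*(\d+)\s*$")
# One capital letter plus 1 to 19 more letters (2 to 20 characters in total).
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]{1,19}\b")

def capitalized_words_by_page(text):
    """Yield (word, page) for every capitalized word, skipping marker lines."""
    page = None  # stays None until the first marker line is seen
    for line in text.splitlines():
        marker = PAGE_MARKER.match(line)
        if marker:
            page = int(marker.group(1))
            continue
        for word in CAPITALIZED.findall(line):
            yield word, page

sample = """page__1
CHAPTER 1. Loomings.
Call me Ishmael. Some years ago
page__2
With a philosophical flourish Cato
throws himself upon his sword; I quietly take to the ship.
"""

pairs = list(capitalized_words_by_page(sample))
for word, page in pairs:
    print(f"{word}, {page}")
```

The single letter I is not reported because the pattern requires at least two characters.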
Then frankly, I would write code in a language that has regex capabilities built in, to just march through the document highlighting each of these capitalized words in context and providing the user with two buttons to press: Is A Name and Is Not a Name. A very simple interface. The program would know what the page was and every time the user clicked on Is A Name, that capitalized word would be appended to another text file in the fashion that you want:
Ishmael, 1
Cato, 2
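The review loop itself could look something like the following. This is only a sketch: it uses a yes/no question on the console rather than real buttons, and the names review and ask_on_console, as well as the bracket highlighting, are hypothetical choices of mine. The demonstration at the bottom uses a scripted answer function in place of a person so that it runs without input.

```python
import io
import re

PAGE_MARKER = re.compile(r"^page__\s*(\d+)\s*$")   # assumed marker format
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]{1,19}\b")

def ask_on_console(context):
    """Stand-in for the two buttons: 'y' means Is A Name, anything else means not."""
    return input(f"{context}\nIs this a name? [y/n] ").strip().lower() == "y"

def review(text, ask, out):
    """Show each capitalized word in context; write 'Word, page' for accepted ones."""
    page = None
    for line in text.splitlines():
        marker = PAGE_MARKER.match(line)
        if marker:
            page = int(marker.group(1))
            continue
        for m in CAPITALIZED.finditer(line):
            # Mark the candidate with brackets so it stands out in its line.
            context = line[:m.start()] + "[" + m.group() + "]" + line[m.end():]
            if ask(context):
                out.write(f"{m.group()}, {page}\n")

sample = """page__1
Call me Ishmael. Some years ago
page__2
With a philosophical flourish Cato
"""

# Scripted reviewer for the demonstration: accepts only Ishmael and Cato.
def scripted(context):
    return "[Ishmael]" in context or "[Cato]" in context

out = io.StringIO()
review(sample, scripted, out)
print(out.getvalue(), end="")
```

For real use, one would pass ask_on_console (or a GUI callback) as ask and an opened output file as out.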
The task could be made considerably less onerous by giving the program some elementary intelligence so as to skip over things that are NOT names and commonly occur at the beginning of sentences: Some, It, This, There, etc. You could give the program an additional button, This Is Never a Name, and it could quickly learn and remember these common false positives.
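A minimal sketch of that memory, assuming the candidates arrive as (word, page) pairs and the reviewer answers with one of three strings, "name", "not", or "never" (these labels and the function name are made up for illustration):

```python
def review_with_memory(pairs, ask, never_names):
    """Ask about each (word, page) pair, skipping words already marked 'never'."""
    accepted = []
    for word, page in pairs:
        if word in never_names:
            continue                      # learned false positive: do not ask again
        answer = ask(word, page)          # expected: "name", "not", or "never"
        if answer == "name":
            accepted.append((word, page))
        elif answer == "never":
            never_names.add(word)         # remember for the rest of the run
    return accepted

# Scripted reviewer for the demonstration; records which words were shown.
asked = []
def scripted_ask(word, page):
    asked.append(word)
    return {"Ishmael": "name", "Cato": "name", "Some": "never"}.get(word, "not")

never_names = {"It", "This", "There"}     # seed list of common sentence starters
pairs = [("Some", 1), ("Ishmael", 1), ("It", 1), ("Some", 2), ("Cato", 2)]
accepted = review_with_memory(pairs, scripted_ask, never_names)

print(accepted)  # [('Ishmael', 1), ('Cato', 2)]
print(asked)     # ['Some', 'Ishmael', 'Cato']
```

Saving never_names to a small text file between sessions would let the program keep what it has learned; that part is left out here.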
This is not a complex program to write. But it does require knowledge of some programming language. I do not think that trying to do this in Keyboard Maestro/Automator etc. would get you even close to an efficient workflow.
If you are spending hours doing this almost mindless task, getting such a program written has to be worth it.