This may be a good example of the current limits of artificial intelligence...
I don't think there is a reliable way to identify the title of a research paper without actually opening the file and using human reasoning.
Things like subtitles, and the journal's own title appearing alongside the paper's, make this task very difficult for an algorithm.
There are other contexts where this could be a lot easier, for example electronic filing systems used by the courts, or maybe a system like EDGAR, which may have strict rules governing file names.
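To illustrate why strict naming rules would help, here is a minimal sketch. The convention it parses (`<case-id>_<document-type>_<YYYY-MM-DD>.pdf`) is made up for this example; it is not the actual scheme used by EDGAR or any court system, and `parse_filename` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical convention, invented purely for illustration:
#   <case-id>_<document-type>_<YYYY-MM-DD>.pdf
FILENAME_RULE = re.compile(
    r"^(?P<case_id>[A-Za-z0-9-]+)"
    r"_(?P<doc_type>[a-z-]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})\.pdf$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the fields encoded in the file name, or None if it breaks the rule."""
    match = FILENAME_RULE.match(name)
    return match.groupdict() if match else None


# A name that follows the made-up rule yields structured fields.
print(parse_filename("2021-cv-0042_motion-to-dismiss_2021-03-15.pdf"))
# A typical publisher download name carries no usable structure.
print(parse_filename("1-s2.0-S0001-main.pdf"))
```

When the file name is governed by a rule like this, one regular expression does the job. Research papers downloaded from publishers usually have no such rule, which is why you end up having to look inside the file.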
Sometimes the downloaded files are a mess, too: the first page is some unrelated junk and the article doesn't actually start until page 2.
