Any ideas for extracting the title of an article?

blueraincap · Mar 14, 2024

BMK said:
This may be a good example of the current limits of artificial intelligence...

I don't think there is a reliable way to identify the title of a research paper without actually opening the file and using human reasoning.

There are variables such as a subtitle, and the title of the journal, which make this task very difficult for an algorithm.

There are other contexts, for example... geez, I dunno, maybe electronic filing systems used by the courts, or maybe a system like EDGAR, where they may have strong rules that govern file names, and that would potentially make the task a lot easier.

Sometimes the downloaded files are messed up, where the first pages are random junk and the article starts from p2
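
A rough sketch of one way to cope with junk leading pages, purely as an assumption on my part: look at the text of the first few pages and pick the first one that looks like the start of an article rather than a download cover sheet. The hint phrases, `first_article_page` and `load_first_pages` are all hypothetical names and guesses, not anything from a library, and a real article page with a "downloaded from" footer would fool it. The loader assumes pdfminer.six is installed.

```python
# Phrases that often appear on publisher/archive cover sheets (guesses only).
COVER_HINTS = ("downloaded from", "terms and conditions", "terms of use", "accessed:")
# Phrases that often appear on the first real page of a paper (guesses only).
ARTICLE_HINTS = ("abstract", "keywords", "introduction")


def first_article_page(page_texts):
    """Return the index of the first page that looks like the article start.

    `page_texts` is a list of plain-text strings, one per page.
    Falls back to page 0 when nothing matches.
    """
    for index, text in enumerate(page_texts):
        lowered = text.lower()
        if any(hint in lowered for hint in COVER_HINTS):
            continue  # looks like a cover sheet, skip it
        if any(hint in lowered for hint in ARTICLE_HINTS):
            return index
    return 0


def load_first_pages(pdf_path, count=3):
    """Extract the text of the first `count` pages with pdfminer.six."""
    from pdfminer.high_level import extract_text  # imported lazily

    return [extract_text(pdf_path, page_numbers=[i]) for i in range(count)]
```

Something like `first_article_page(load_first_pages("paper.pdf"))` would then give the page to run the title heuristic on, instead of always using page 0.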

blueraincap · Mar 14, 2024

BMK said:
That was my first thought, that maybe the metadata, or file properties, would contain the title. And PDF properties do indeed contain a field called title. But in many, many cases, the data in that field is completely unrelated to the title of the paper.

There is no standard across academia for how you name a file.

Many people do not include metadata, and I think the file name shown in the properties is just the name you give the file
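
Since the title field is often empty or unrelated to the paper, one option is to read it anyway and only trust it when it passes a few sanity checks. A minimal sketch, assuming pdfminer.six; `plausible_title`, `metadata_title` and the junk patterns are hypothetical and just my guesses at what bad metadata tends to look like (an echo of the file name, "Microsoft Word - ...", a value ending in a file extension).

```python
import os
import re

# Guessed patterns for metadata that is probably not a real paper title.
JUNK_PATTERNS = [
    r"^untitled$",
    r"^microsoft word\b",
    r"\.(pdf|docx?|tex|dvi|indd)$",
]


def plausible_title(candidate, pdf_path):
    """Return True if `candidate` looks like a real title, not junk metadata."""
    if not candidate:
        return False
    text = candidate.strip()
    if len(text.split()) < 3:  # very short values are rarely paper titles
        return False
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    if text.lower() == stem.lower():  # metadata just repeats the file name
        return False
    return not any(re.search(p, text, re.IGNORECASE) for p in JUNK_PATTERNS)


def metadata_title(pdf_path):
    """Return the PDF's Title metadata if it looks usable, else None."""
    from pdfminer.pdfdocument import PDFDocument  # imported lazily
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import resolve1
    from pdfminer.utils import decode_text

    with open(pdf_path, "rb") as fh:
        doc = PDFDocument(PDFParser(fh))
        for info in doc.info:  # a PDF can carry several info dictionaries
            raw = resolve1(info.get("Title"))
            if isinstance(raw, bytes):
                title = decode_text(raw)
                if plausible_title(title, pdf_path):
                    return title.strip()
    return None
```

When `metadata_title` returns None, you would fall back to whatever text-based heuristic you are using on the page itself.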

Baron · Mar 14, 2024

blueraincap said:
Sometimes the downloaded files are messed up, where the first pages are random junk and the article starts from p2

How many papers are you trying to organize?

murray t turtle · Mar 14, 2024

blueraincap said:
I have a bunch of academic papers on the computer that I need to organise.
I need to extract their titles, but have not found a reliable method yet.
Any ideas?
Usually, the title has the largest font on the first page, so I used Python (with the pdfminer module) to pick it out, but it only works 50-60% of the time.
======================================================
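
For reference, the largest-font idea in the quoted post could look roughly like this. It is only a sketch assuming pdfminer.six, not the original poster's script; `pick_title`, `page_lines` and the `tolerance` parameter are hypothetical. It keeps every line whose font size is within `tolerance` points of the biggest one, so a title that wraps over two lines stays together.

```python
def pick_title(lines, tolerance=0.5):
    """Return the text of the largest-font lines, joined in reading order.

    `lines` is a list of (text, font_size) pairs for a single page.
    Returns None when the page has no text.
    """
    cleaned = [(text.strip(), size) for text, size in lines if text.strip()]
    if not cleaned:
        return None
    biggest = max(size for _, size in cleaned)
    parts = [text for text, size in cleaned if biggest - size <= tolerance]
    return " ".join(parts)


def page_lines(pdf_path, page_number=0):
    """Collect (text, average font size) for each text line on one page."""
    from pdfminer.high_level import extract_pages  # imported lazily
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    lines = []
    for page in extract_pages(pdf_path, page_numbers=[page_number]):
        for box in page:
            if not isinstance(box, LTTextContainer):
                continue  # skip figures, rectangles, curves, etc.
            for line in box:
                if not isinstance(line, LTTextLine):
                    continue
                sizes = [ch.size for ch in line if isinstance(ch, LTChar)]
                if sizes:
                    lines.append((line.get_text(), sum(sizes) / len(sizes)))
    return lines
```

Usage would be `pick_title(page_lines("paper.pdf"))`. It will still pick the wrong thing whenever a journal name, banner or drop cap is set larger than the title, which fits with it only working part of the time.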

S2007S said:

Isn't the AI that's now part of our daily routine, and completely the talk of Wall Street, able to handle this with a few voice prompts?


%%
Sure, in theory, LOL.
I organize my trade notebook by time: MARCH 3-9-2024, MARCH 10-16-2024. Blue, black, red, green, purple ink.
I also put US + UK where they are easily seen.
I use a time stop+ plan. So least ,low value junk never gets done-read + fine+ good

blueraincap · Mar 14, 2024

Baron said:
How many papers are you trying to organize?

Way more than the number of cocks you have sucked

BMK · Mar 14, 2024

blueraincap said:
Way more than the number of cocks you have sucked

WTF, dude. How did this thread go off the rails so quickly?

lindq · Mar 14, 2024

blueraincap said:
Way more than the number of cocks you have sucked

Welcome to my ignore list, where you'll join a large group of fools I've collected over the past 20 years.

Baron · Mar 14, 2024

blueraincap said:
Way more than the number of cocks you have sucked

After 27 years of running this site, I thought I had seen it all, but then you come along and get the award for being the most disrespectful, lowest-vibration dumbass of them all.

I actually feel sorry for you more than anything else because the universe is never going to reward you for putting out negative energy like that. As the last post you're ever going to make on this site, I wish you good luck with your search and your life moving forward because you're damn sure going to need it.

ondafringe · Mar 14, 2024

blueraincap said:
Way more than the number of cocks you have sucked

You are one dumb mofo. lol

Quanto · Mar 14, 2024

Man, like @Baron himself, I too was shocked when I read what this idiot named @blueraincap wrote to Baron...

First I thought, he is maybe a buddy of Baron, and they just joke with such hard words... but nope, it was real!!!
