
zacb
Veteran

Joined: 7 May 2012
Age: 29
Gender: Male
Posts: 1,158

17 Feb 2018, 8:16 pm

I have been writing Python scripts lately to find decent web content, but it has become harder to sort out the chaff because "branded" domains have been pushed to the top of the results, and many niche topics get only rudimentary discussion. All the sites I used to go to have gone to s**t, and even torrent sites have gone downhill (except for one I go to). Library Genesis is about the only place left with somewhat complex knowledge. Reddit can be OK, but it is hit or miss, and so is Facebook. My solution is to build tools to mine data and collate it, but beyond that I am kind of at a loss. What have you done to alleviate this?



Mudboy
Veteran

Joined: 19 May 2007
Age: 62
Gender: Male
Posts: 1,441
Location: Hiding in plain sight

17 Feb 2018, 9:02 pm

In the beginning (before search engines) we had printed books and websites that were nothing but lists of internet sites. (The books also had the phone numbers of public BBSs we could dial into.) We called our searches "surfing the web" because we would wander for hours from site to site using hyperlinks. There was even a book called Zen and the Art of the Internet that helped people use subliminal clues in their travels across the web. It took relaxation and patience to find what we were looking for.
Now I use alternative words, or search modifiers, or ignore the first 100 sites the search engine suggests.

Here are some sites that may help:

http://airodump.net/complete-google-hacks-list/

https://www.untangle.com/inside-untangl ... ebfilters/

https://www.ghacks.net/2016/02/26/read- ... googlebot/

https://blog.kissmetrics.com/alternativ ... h-engines/


_________________
When I lose an obsession, I feel lost until I find another.
Aspie score: 155 of 200
NT score: 49 of 200


zacb
Veteran

Joined: 7 May 2012
Age: 29
Gender: Male
Posts: 1,158

17 Feb 2018, 11:06 pm

I am working on a tool that applies search modifiers without my having to specify them each time. It will also collate result lists so you don't have to run 5 different searches to find the same thing. There is also a search engine called Million Short, but it is slightly broken, so I am trying to work around that and integrate it into my software. Also, is there any way to find indie sites? Apart from filtering out sites from after 2010, I have not found anything.
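
For anyone following along, here is a rough sketch of what such a tool could look like in Python. It is not the actual script, only an illustration under assumptions: the function names (build_query, normalize, collate) and the default exclusion list are made up for the example, and fetching results from any particular search engine is left out entirely. It only shows how modifiers such as -site: and filetype: can be appended automatically and how several result lists can be merged without duplicates.

```python
from urllib.parse import urlsplit

# Hypothetical defaults; replace with the domains you want to filter out.
DEFAULT_EXCLUDED_SITES = ["pinterest.com", "quora.com"]


def build_query(terms, excluded_sites=DEFAULT_EXCLUDED_SITES, filetype=None):
    """Append common search modifiers to a plain query string."""
    parts = [terms]
    parts += [f"-site:{site}" for site in excluded_sites]
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def normalize(url):
    """Reduce a URL to a comparable key.

    Drops the scheme, a leading 'www.', the query string, and any
    trailing slash, so near-identical links compare as equal.
    """
    split = urlsplit(url)
    host = split.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + split.path.rstrip("/")


def collate(*result_lists):
    """Merge result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged
```

The query string from build_query can be pasted into whichever engine accepts those operators, and collate takes the URL lists from each search in the order you trust them most.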



Mudboy
Veteran

Joined: 19 May 2007
Age: 62
Gender: Male
Posts: 1,441
Location: Hiding in plain sight

18 Feb 2018, 12:29 am

To find independent sites I look for bookmarks from similar sites. I find the bookmarks by using search modifiers to do sub-directory mining on poorly secured servers. This is still old school web surfing since the modified search engine is only a doorway.

I hope you are successful with your software modifications.




zacb
Veteran

Joined: 7 May 2012
Age: 29
Gender: Male
Posts: 1,158

18 Feb 2018, 12:48 am

Care to point me to a tutorial that explains this? I had also thought about finding sites that don't use JavaScript or CSS, but that seems like too much of a pain to do on Google. I am intrigued by your method, if you don't mind sharing. I have found directories, PDFs, and the like, but I am curious about the bookmark stuff. Thanks.



Mudboy
Veteran

Joined: 19 May 2007
Age: 62
Gender: Male
Posts: 1,441
Location: Hiding in plain sight

18 Feb 2018, 2:49 am

It's still searching for file types; instead of pdf or mp3, it's cookies, history, and stored webpages. You already know the target files and hierarchies that you need to substitute. Unfortunately, I am old school. Due to the grey nature of ethical hacking, I am not willing to share more. You must "pay your dues for entry" through deep thought and study. The easy path leads to the dark side...




Ichinin
Veteran

Joined: 3 Apr 2009
Gender: Male
Posts: 3,653
Location: A cold place with lots of blondes.

18 Feb 2018, 2:54 am

You should start by classifying content by its reliability and reputation, then scrape the sources that pass. Identify good authors and search for their articles; the same can be done with words you are interested in (as well as their synonyms), or even subject lines.

Generic OSINT/information gathering requires you to do quality control to get any value out of it, or else it's just copying random crap.
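
A minimal sketch of that kind of quality control in Python, under stated assumptions: the reputation table, its weights, the default weight for unknown domains, and the function names are all hypothetical and would have to be built by hand or from your own judgment of sources. Keyword matching here is a naive whitespace split, so punctuation and word forms are not handled.

```python
from urllib.parse import urlsplit

# Hypothetical reputation table: domain -> weight between 0 and 1.
REPUTATION = {"example.org": 0.9, "example.com": 0.2}
DEFAULT_REPUTATION = 0.5  # assumed weight for domains not in the table


def keyword_hits(text, keywords):
    """Count how many keywords (synonyms included in the list) occur in text."""
    words = set(text.lower().split())
    return sum(1 for kw in keywords if kw.lower() in words)


def score(url, text, keywords, reputation=REPUTATION):
    """Weight the keyword count by the reputation of the page's domain."""
    domain = urlsplit(url).netloc.lower()
    weight = reputation.get(domain, DEFAULT_REPUTATION)
    return weight * keyword_hits(text, keywords)


def rank(pages, keywords, min_score=0.5):
    """Return URLs scoring at least min_score, best first.

    pages is an iterable of (url, text) pairs.
    """
    scored = [(score(url, text, keywords), url) for url, text in pages]
    kept = [(s, u) for s, u in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [u for _, u in kept]
```

Author names can be treated the same way as keywords: add them to the list, or keep a second table of trusted authors and multiply it into the weight.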


_________________
"It is far better to grasp the Universe as it really is than to persist in delusion, however satisfying and reassuring" (Carl Sagan)


Mudboy
Veteran

Joined: 19 May 2007
Age: 62
Gender: Male
Posts: 1,441
Location: Hiding in plain sight

18 Feb 2018, 3:06 am

Ichinin, those are good methods.

http://saiethicalhacking.blogspot.com/2 ... cking.html

