|
|
|
|
I bet I could scrape the images using ScrapeBox with free proxies to save on costs. The only reason I used paid proxies for the data is that I want to be sure it's US data, so I get US results for each product id. And they're more reliable.
|
|
|
|
cauz |
May 21, 2017, 10:03 p.m. |
|
|
|
|
This list of 400k product ids includes lots of copies of the same product with a different tracking number. I'm only getting maybe 30k off that list total. I was gonna scrape more after, so on my next id-gathering run I'll find better ways to remove redundancy and save some money. I've used almost 30 GB of bandwidth through those proxies the past few days, but I'm also downloading huge high-res images too.
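One way the redundancy could be removed before spending proxy bandwidth is to collapse URLs onto their product id. This is only a sketch: it assumes the id is a 10-character alphanumeric token following `/dp/` or `/gp/product/`, and that everything after it (the `ref=` segment, query string) is tracking noise. The sample URLs and the helper names `extract_product_id` and `dedupe_urls` are made up for illustration.

```python
import re

# Assumption: the product id is a 10-character token after /dp/ or /gp/product/.
PRODUCT_ID_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_product_id(url):
    """Return the product id embedded in the URL, or None if none is found."""
    match = PRODUCT_ID_RE.search(url)
    return match.group(1) if match else None


def dedupe_urls(urls):
    """Keep the first URL seen for each product id, preserving input order."""
    seen = {}
    for url in urls:
        product_id = extract_product_id(url)
        if product_id is not None and product_id not in seen:
            seen[product_id] = url
    return seen


# Hypothetical sample: two products, each listed twice with different tracking.
sample = [
    "https://www.amazon.com/dp/B000000001/ref=sr_1_1?tag=aaa",
    "https://www.amazon.com/dp/B000000001/ref=sr_1_7?tag=bbb",
    "https://www.amazon.com/gp/product/B000000002?psc=1",
    "https://www.amazon.com/some-title/dp/B000000002/ref=xyz",
]
unique = dedupe_urls(sample)
print(len(sample), "urls ->", len(unique), "unique product ids")
```

Running this over the full list before the scrape starts means each product page is only requested once, no matter how many tracking variants were collected.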
|
|
|
|
So I scraped 450k Amazon product URLs, and now I've finally finished writing my scraper and kicked it off with some fresh proxies. Downloading massive amounts of data and images from the big A hole.
|
|
|
|
I have more than half a million product URLs (which is really the hard part with Amazon; they make it extremely difficult for scrapers to crawl their entire site). After cleaning up this list and potentially trying to get even more products, I will continue to modify my PHP scraper, this time for use with Amazon. It rotates through proxies and user agents, and it has worked well on Google Maps, Yelp, and your university's student directories, so it should bypass Amazon's no problem. My scraper nowadays saves all the data into XML so I can import it through certain plugins, but I also have a super easy way to convert it to any form I need. Originally my scraper rotated through Tor proxies and saved all data directly into MySQL. Over time I created SQL files for importing, and now that WordPress is used so extensively and doesn't receive penalties in the search engines like it used to, I can just throw all the data in there and make as many copies and variations of the sites as I want. and make it loo...
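The "save as XML, convert to anything" step could look roughly like the sketch below. It is written in Python rather than PHP, and the element names (`products`, `product`), the field names, and the sample records are all assumptions for illustration, not the scraper's real schema. Field names have to be valid XML tag names for this to work.

```python
import csv
import io
import xml.etree.ElementTree as ET


def products_to_xml(products):
    """Serialize a list of flat dicts into one <products> XML document."""
    root = ET.Element("products")
    for product in products:
        node = ET.SubElement(root, "product", id=product["id"])
        for field, value in product.items():
            if field != "id":
                ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Read the XML back into flat dicts, the starting point for any export."""
    rows = []
    for node in ET.fromstring(xml_text).findall("product"):
        row = {"id": node.get("id")}
        row.update({child.tag: child.text for child in node})
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """One example target format: CSV with a column for every field seen."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records standing in for scraped product data.
products = [
    {"id": "B000000001", "title": "Example widget", "price": "9.99"},
    {"id": "B000000002", "title": "Another widget", "price": "19.50"},
]
xml_text = products_to_xml(products)
rows = xml_to_rows(xml_text)
print(rows_to_csv(rows))
```

Keeping XML as the single stored form and treating everything else (CSV, SQL files, a plugin import) as a conversion from the parsed rows means a new output format only needs one more small function.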
|
|
|
|
Now that my list of product ids is in the millions and I've used about 40 GB of proxy bandwidth scraping maybe 50k pages from that data, I have to carefully weigh how much I want to spend on proxies (I've spent about $30) for this experiment, which could end in just a simple takedown notice that stops the method. Granted, I can always reuse and modify this data. But I guarantee that if you had a million-page site built directly around real ecommerce products, you would make good money if it stays up.
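To put rough numbers on that trade-off, here is a back-of-the-envelope projection from the figures in the post. It assumes the $30 paid for the whole 40 GB, that 1 GB is 1000 MB, and that cost scales linearly with page count. All three are simplifications, and the inputs themselves are approximate ("about", "maybe").

```python
# Figures taken from the post; all are rough estimates.
GB_USED = 40
PAGES_SCRAPED = 50_000
SPENT_USD = 30
TARGET_PAGES = 1_000_000  # the "million page site" scenario

# Assumption: 1 GB = 1000 MB, and the $30 covered all 40 GB.
mb_per_page = GB_USED * 1000 / PAGES_SCRAPED
usd_per_gb = SPENT_USD / GB_USED

# Assumption: bandwidth and cost grow linearly with the number of pages.
projected_gb = GB_USED * TARGET_PAGES / PAGES_SCRAPED
projected_usd = SPENT_USD * TARGET_PAGES / PAGES_SCRAPED

print(f"{mb_per_page:.2f} MB per page, ${usd_per_gb:.2f} per GB")
print(f"{TARGET_PAGES:,} pages: about {projected_gb:,.0f} GB, about ${projected_usd:,.0f}")
```

Under those assumptions a million pages comes to roughly 800 GB and $600. Skipping the high-res image downloads, or pulling them over free proxies, would bring the per-page figure down.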
|
|
|
|
I had this idea one night for creating a decentralized search engine. It would pull data from other search engines (through proxies or from a single server, so no personal user data is involved) and then re-display it to the user.
The next thought I had was to make it into a 'roll your own' search engine, so users could deploy it on their own server and have further control of the traffic, since you obviously can't trust shit like DuckDuckGo (fishy).
Then you could m...
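The "pull from other engines and re-display" part boils down to merging several ranked result lists into one. Here is a minimal sketch of that merge step only. The two backends are hypothetical stand-ins that return canned URLs; real ones would query an upstream engine from the server side, which is not shown here.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked lists round-robin, dropping URLs already seen."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def search(query, backends):
    """Ask every backend for its ranked URLs and merge them into one list."""
    return merge_results([backend(query) for backend in backends])


# Hypothetical backends returning canned results for illustration.
def backend_a(query):
    return [f"https://a.example/{query}/1", "https://shared.example/page"]


def backend_b(query):
    return ["https://shared.example/page", f"https://b.example/{query}/1"]


results = search("proxies", [backend_a, backend_b])
print(results)
```

Round-robin interleaving is just one simple choice. It treats every backend's ranking as equally trustworthy, which a real deployment might want to weight differently.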
|
|
|
|
Netflix Has Saved Every Choice You've Ever Made In 'Black Mirror: Bandersnatch'
According to a technology policy researcher, Netflix records all the choices you make in Black Mirror's Bandersnatch episode. "Michael Veale, a technology policy researcher at University College London, wanted to know what data Netflix was collecting from Bandersnatch," reports Motherboard. "People had been speculating a lot on Twitter about Netflix's motivations," Veale told Motherboard in an email. "I thought it would be a fun test to show people how you can use data protection law to ask real questions you have." From the report: The law Veale used is Europe's General Data Protection Regulation (GDPR). The ...
|
|
|
|
Oh yeah, then you wrap it all in Tor and proxies, of course.
|
|
|
|
Amazon Opens Up Its Internal Machine Learning Training To Everyone
Amazon announced Monday that it's making the machine learning courses it uses to train its engineers available to everybody for free. The course is tailored to four major groups -- developers, data scientists, data platform engineers and business professionals -- and it offers both foundational level lessons as well as more advanced instruction.
https://aws.amazon.com/blogs/machine-learning/amazons-own-machine-learning-unive...
|
|
|
|
Human CAPTCHA services cost money, don't they? Not a lot, but like, pennies. Reed is saying you could have your program generate a new model by taking the word of the thing they ask you to identify and scraping Google Images to get training data.
|
|
|
|
Maybe I should make some YouTube videos. I would make one about the real concerns of AI, one about basic data science and data analysis, and one that is an introduction to neural networks.
|
|