|
|
|
|
have more than half a million product URLs (which is really the hard part with Amazon; they make it extremely difficult for scrapers to crawl their entire site). After cleaning up this list and potentially trying to get even more products, I will continue to modify my PHP scraper, this time for use on Amazon. It rotates through proxies and user agents, and it has worked well on Google Maps, Yelp, and your university's student directories, so it should get past Amazon's blocking no problem. My scraper nowadays saves all the data into XML so I can import it through certain plugins and also have a super easy way to convert it to any form I need. Originally my scraper rotated through Tor proxies and saved all data directly into MySQL. Over time I created SQL files for importing, and now that WordPress is used so extensively and doesn't receive penalties in the search engines like it used to, I can just throw all the data in there and make as many copies and variations of the sites as I want, and make it look responsive and beautiful. fuck wit me nerrrds |
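For anyone wondering what "save to XML, convert to anything" can look like, here is a minimal sketch in Python (not the PHP scraper described above). The element names `products`, `product`, `title`, and `price`, and the sample records, are assumptions made up for illustration; whatever import plugin you use will dictate its own schema.

```python
import xml.etree.ElementTree as ET


def products_to_xml(products):
    """Serialize a list of product dicts into one XML string.

    The element and attribute names here are hypothetical.
    """
    root = ET.Element("products")
    for product in products:
        item = ET.SubElement(root, "product", id=product["id"])
        ET.SubElement(item, "title").text = product["title"]
        ET.SubElement(item, "price").text = product["price"]
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Read the same XML back into plain tuples, ready for CSV, SQL, etc."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("id"), item.findtext("title"), item.findtext("price"))
        for item in root.findall("product")
    ]


# Made-up sample records, only to show the round trip.
sample = [
    {"id": "X000000001", "title": "Example Widget", "price": "9.99"},
    {"id": "X000000002", "title": "Example Gadget", "price": "19.50"},
]
xml_text = products_to_xml(sample)
rows = xml_to_rows(xml_text)
```

Keeping XML as the intermediate format means the scraper only has to write one thing, and each target (plugin import, SQL file, CSV) is just another small reader like `xml_to_rows`.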
|
|
|
|
cauz |
May 13, 2017, 11:25 a.m. |
|
|
|
|
|
So I scraped 450k Amazon product URLs, and now I've finally finished writing my scraper and kicked it off with some fresh proxies. Downloading massive amounts of data and images from the big A hole
|
|
|
|
Now that my list of product IDs is in the millions and I've used about 40 GB of proxy bandwidth scraping maybe 50k pages from that data, I have to carefully weigh how much I want to spend on proxies (about $30 so far) for an experiment that could be shut down by a simple takedown notice. Granted, I can always reuse and modify this data. But I guarantee that if you had a million-page site based directly around real ecommerce products, you would make good money if it stays up.
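A rough back-of-the-envelope version of that weighing, as a Python sketch. It assumes the roughly $30 spent paid for the roughly 40 GB used, and that per-page bandwidth stays about the same for the rest of the crawl; neither is guaranteed. The function name `estimate_crawl` is made up for this example.

```python
def estimate_crawl(target_pages, sample_pages=50_000, sample_gb=40.0,
                   sample_cost_usd=30.0):
    """Extrapolate bandwidth and proxy cost from the numbers in the post.

    Assumes cost scales linearly with bandwidth and pages.
    """
    gb_per_page = sample_gb / sample_pages      # about 0.8 MB per page
    usd_per_gb = sample_cost_usd / sample_gb    # about $0.75 per GB
    total_gb = gb_per_page * target_pages
    return total_gb, total_gb * usd_per_gb


total_gb, total_usd = estimate_crawl(1_000_000)
print(f"{total_gb:.0f} GB, about ${total_usd:.0f}")
```

Under those assumptions a full million pages works out to roughly 800 GB and $600 in proxy bandwidth, so cutting image downloads or duplicate pages moves the total a lot.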
|
|
|
|
Referring a user to Amazon through your affiliate link gets you a tracking cookie lasting anywhere from 24 hours to 90 days, during which you can earn commission on anything the user purchases from Amazon, the biggest online store in the world. One million product pages will bring long-tail search traffic, and careful analytics will reveal the most promising niches and products, around which new hyper-focused niche sites can be created.
|
|
|
|
Scraping Every Product on Amazon to Make a Million Page Affiliate Site
|
|
|
|
|
|
|
|
Oh yes. Also, nowadays I run my scripts from a server or even my localhost through wget and remove the output I use for testing. Another reason I use XML and import into WordPress is that it can manage a database of that size way more efficiently than I can. I tried to make a million-page site a long time ago, and it would take forever to load the data I put into MySQL directly off the scraper.
|
|
|
|
|
|
|
|
This list of 400k product IDs includes lots of copies of the same product with a different tracking number. I'm only getting maybe 30k off that list total. I was gonna scrape more after, so on my next run of ID gathering I'll find better ways to remove redundancy and save some money. I've used almost 30 GB of bandwidth through those proxies the past few days, but I also download huge high-res images too.
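One way to strip that redundancy before spending bandwidth is to reduce every URL to its product ID and keep each ID once. The Python sketch below assumes the ID is the 10-character code that follows `/dp/` or `/gp/product/` in the URL and that the tracking bits live in the rest of the path or query string; the function name and sample URLs are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: the product ID is 10 uppercase letters/digits after /dp/ or
# /gp/product/, followed by "/", "?", "#", or the end of the URL.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def unique_product_ids(urls: Iterable[str]) -> List[str]:
    """Return each product ID once, in first-seen order."""
    seen = set()
    ordered = []
    for url in urls:
        match = ASIN_RE.search(url)
        if match is None:
            continue  # not a product page, skip it
        asin = match.group(1)
        if asin not in seen:
            seen.add(asin)
            ordered.append(asin)
    return ordered


# Hypothetical URLs: the first two are the same product with different tracking.
sample_urls = [
    "https://www.amazon.com/dp/B000000001/ref=sr_1_1?qid=111",
    "https://www.amazon.com/Some-Title/dp/B000000001?tag=abc",
    "https://www.amazon.com/gp/product/B000000002/",
    "https://www.amazon.com/help",
]
ids = unique_product_ids(sample_urls)
```

Running the dedup on the ID list before the page scrape means each product is fetched once, which is where the proxy savings come from.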
|
|
|
|
|
|
|
|
It's official. Amazon manually reviewed my site and determined I had no original content lol. But it did make a few bucks (that unfortunately I won't be seeing).
|
|