I have been using proxies with scrapy-splash for a while. Until now I used a single static proxy; the problem I have now concerns rotating proxies. I have the following code:
proxies = ['82.209.49.196:8080', '217.9.91.88:8080', '85.142.158.45:8080', '134.209.115.223:3128']
for i in range(0, 3):
    yield SplashRequest(
        callback=self.parse, endpoint='execute', meta={'dont_retry': False},
        args={'lua_source': self.luaScripts['checkIP'],
              'proxy': 'http://' + proxies[i], 'timeout': 90},
        dont_filter=True)
This is what is in my Lua script:
function main(splash, args)
    assert(splash:go('https://httpbin.org/ip'))
    local _linksToBeFixed = 0
    return {mypng = splash:png(),}
end
The proxy used is always the first one specified (in this case, 82.209.49.196:8080), which seems quite strange to me. Note that self.luaScripts['checkIP'] is a Lua script that goes to https://httpbin.org/ip.
Why is only the first proxy specified used for ALL SplashRequests? How can I specify a different proxy per request, as with plain Scrapy requests (i.e. meta['proxy'])?
Even request.body contains a different proxy for each request (as set per request), which makes it even stranger that only the proxy set in the first request is used for all subsequent SplashRequests.
It also happens if I use splash:set_proxy in the script.
You could use multiple Splash instances, and use one proxy with each, for concurrency.
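As a rough sketch of that suggestion, the helper below pins each proxy to its own Splash instance so that concurrent requests never share an instance. The Splash URLs and the function name pair_proxies_with_splash are hypothetical, chosen only for illustration; the proxies are the ones from the question. It assumes one Splash instance is already running per proxy and relies on the splash_url keyword that SplashRequest accepts.

```python
# Hypothetical endpoints: one running Splash instance per proxy (assumed setup).
SPLASH_URLS = [
    "http://localhost:8050",
    "http://localhost:8051",
    "http://localhost:8052",
    "http://localhost:8053",
]

# Proxies taken from the question.
PROXIES = [
    "82.209.49.196:8080",
    "217.9.91.88:8080",
    "85.142.158.45:8080",
    "134.209.115.223:3128",
]


def pair_proxies_with_splash(proxies, splash_urls):
    """Return one kwargs dict per proxy, each bound to a distinct Splash instance."""
    if len(splash_urls) < len(proxies):
        # Sharing an instance between proxies would reintroduce the problem.
        raise ValueError("need at least one Splash instance per proxy")
    return [
        {
            "splash_url": splash_url,
            "args": {"proxy": "http://" + proxy, "timeout": 90},
        }
        for proxy, splash_url in zip(proxies, splash_urls)
    ]


# Possible use inside a spider method (not executed here):
#
#     for kwargs in pair_proxies_with_splash(PROXIES, SPLASH_URLS):
#         kwargs["args"]["lua_source"] = self.luaScripts["checkIP"]
#         yield SplashRequest(callback=self.parse, endpoint="execute",
#                             dont_filter=True, **kwargs)
```

The helper deliberately refuses to reuse an instance for two proxies, since the point of the workaround is to keep each proxy isolated from concurrent requests on the same instance.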
Nonetheless, since concurrency was the problem, could you close this issue?