Can we skip the files that don't exist in the HTTP remote but are present in the files.txt passed to --files-from?
I've extracted all of the files into a text file, but some of the files no longer exist in the remote.
So rclone doesn't do anything and reports this error:
Failed to lsf: Stat failed: failed to stat: HTTP Error 404: 404 Not Found
Can I suppress this error and still continue with the files that are present both in the remote and in files.txt?
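One workaround, until rclone can skip these itself, is to list the remote once with rclone lsf and intersect that listing with files.txt before handing it to --files-from. A rough sketch, where the remote names (http:, gdrive:) and the destination path are placeholders for your setup:

```bash
#!/bin/bash
# List every file currently present on the remote, one path per line.
rclone lsf http: --files-only -R | sort > remote-files.txt

# Keep only the entries of files.txt that actually exist on the remote.
# comm -12 prints lines common to both (sorted) inputs.
sort files.txt > files.sorted.txt
comm -12 files.sorted.txt remote-files.txt > files.filtered.txt

# Copy using only paths known to exist, so no per-file 404s.
rclone copy http: gdrive:backup --files-from files.filtered.txt
```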
I've done just that: a Scrapy spider crawls the website, lists all the links, and stores them in a file for the remote I am trying to fetch with rclone.
Both the crawler and rclone run on a schedule, once every 6 hours,
but rclone takes very long (about 2-4 hours, depending on the number of entries in the --files-from file) just to start copying.
Without --files-from, rclone performs much better.
Is it supposed to take this long to start?
(The file has around 4k-5k entries)
Rclone version I've tried this on:
rclone: Version “v1.45-031-ge7684b7e-beta”
Hmm, yes, rclone checks that each file in --files-from exists. However, it does this in a very inefficient way: one at a time! Rclone should really parallelise this across --checkers threads, as it does with everything else.
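Until that is parallelised inside rclone, the same idea can be approximated outside it. A sketch (the base URL is a placeholder) that pre-checks the paths with parallel HEAD requests, much like --checkers 8 would:

```bash
#!/bin/bash
# Probe each path with an HTTP HEAD request, 8 at a time, and keep the
# ones that answer 200 in files.ok.txt for use with --files-from.
# Note: paths containing spaces or special characters need URL-encoding.
BASE_URL="https://example.com"   # placeholder for the real http remote

check_one() {
    code=$(curl -s -o /dev/null -w '%{http_code}' -I "$BASE_URL/$1")
    [ "$code" = "200" ] && echo "$1"
}
export -f check_one
export BASE_URL

xargs -a files.txt -P 8 -I{} bash -c 'check_one "$1"' _ {} > files.ok.txt
```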
I don't think I would be much help right now, as I don't know Go at all.
I'm currently learning the basics of it,
and will try to help once I understand what's going on in the project.
I've encountered a new problem that I really need your help with.
There is an HTTP remote with many files that I need to copy into a Google Drive remote.
I've extracted all the links and stored them in a file named files.txt.
Now the issue is that the server requires those jtokens to authenticate; if I pass the entire link, including the jtoken, to a downloader (IDM), the file downloads fine.
Is there a way for rclone to tackle this problem?
Using a downloader or browser gives ~9-10 MB/s; if I can achieve that, transferring the files one by one might become feasible.
My rclone version is:
rclone v1.45-056-g95e52e1a-beta
os/arch: windows/amd64
go version: go1.11
PS: I really appreciate you taking the time to reply to my queries. Thanks for your help.
This will stop rclone having to make and remake the drive remote.
You can also run them in parallel by supplying "_async=true" with the command. (You might want to pace them a little, otherwise it will start all of them at once!)
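A minimal sketch of that, assuming the rc server is already running (rclone rcd) on the default localhost:5572 and the destination remote is called gdrive: (both placeholders):

```bash
#!/bin/bash
# Queue one async copyurl job per link in files.txt, with crude pacing
# so the rcd doesn't start every job at once.
while IFS= read -r url; do
    name=$(basename "$url")   # note: keeps any ?jtoken=... query string
    rclone rc operations/copyurl \
        fs="gdrive:" remote="incoming/$name" url="$url" _async=true
    sleep 2   # tune to taste
done < files.txt
```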
I have written a script that does just that.
A few issues I am facing:
Extremely slow download rate for each file (10-15 KB/s)
copyurl copies the content even if it already exists in the destination (the rcd server gets interrupted at times, so the script ends up re-copying files that were already copied)
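One way to work around the second issue from the script side is to fetch the destination listing once and skip links whose file name already appears in it. A sketch, using the same placeholder remote as above:

```bash
#!/bin/bash
# Skip links whose target file already exists in the destination folder.
existing=$(rclone lsf "gdrive:incoming")   # one file name per line

while IFS= read -r url; do
    name=$(basename "$url")
    if grep -qxF "$name" <<< "$existing"; then
        echo "skipping $name (already copied)"
        continue
    fi
    rclone rc operations/copyurl \
        fs="gdrive:" remote="incoming/$name" url="$url" _async=true
    sleep 2
done < files.txt
```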
It turns out the problem I was facing before still persists: rclone's total bandwidth usage is around ~2 Mbit/s upload and ~2 Mbit/s download,
and that is with 10-20 jobs running asynchronously at a time.
Is there a way to increase the download rate for each rclone copyurl?
Also, is it possible to check whether a file already exists in the destination for rclone copyurl?
Basically, can the --ignore-existing flag be used with this method?
EDIT:
I checked the logs: out of 200 files that the script tried to copy using copyurl,
180 transfers terminated with this error after rclone had sent a chunk of data:
Can you try using copyurl to the local disk - how fast does it run then?
How big are the files you are copying?
I think the expectation here should be that rclone does some sort of check on the file, so if the remote file is the same length then it doesn't copy it.
Implementing --ignore-existing is probably a good idea too.
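Until something like that lands, the length check can be approximated from the script. A sketch (URL, file name, and remote are placeholders) comparing the server's Content-Length with the size rclone reports for the already-copied file:

```bash
#!/bin/bash
url="https://example.com/path/file.bin"   # placeholder
dest="gdrive:incoming/file.bin"           # placeholder

# Size the server advertises for the link.
src_size=$(curl -sI "$url" | tr -d '\r' | awk 'tolower($1)=="content-length:" {print $2}')
# Size of the copy in the destination (empty if the file is missing).
dst_size=$(rclone lsjson "$dest" 2>/dev/null | grep -o '"Size":[0-9]*' | cut -d: -f2)

if [ -n "$src_size" ] && [ "$src_size" = "$dst_size" ]; then
    echo "sizes match ($src_size bytes) - skipping"
else
    rclone rc operations/copyurl fs="gdrive:" remote="incoming/file.bin" \
        url="$url" _async=true
fi
```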
That appears to be an HTTP2 error.
Can you run one copyurl (ideally with a small text file) which demonstrates the problem, with -vv --dump bodies?
It would be helpful to have a log to look at (with -vv) of the transfers.
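Something along these lines, with the URL and destination as placeholders:

```bash
rclone copyurl -vv --dump bodies \
    "https://example.com/small.txt" gdrive:debug/small.txt > copyurl.log 2>&1
```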
--ignore-existing doesn't seem to work with rclone copyurl. If there is a way to make it work, I would be grateful if you could tell me how to send the --ignore-existing flag to the rcd.
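(Note: later rclone releases, well after the v1.45 betas in this thread, added a _config parameter for passing global options to rc calls; assuming such a version, it would look something like this:)

```bash
rclone rc operations/copyurl fs="gdrive:" remote="incoming/file.bin" \
    url="https://example.com/file.bin" \
    _config='{"IgnoreExisting": true}' _async=true
```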
About 200~300 MB each; some are larger, around 600 MB.
It has the same effect when saving to local disk.
(The website seems to allow only ~150 KB/s per connection, while IDM manages 16 connections, downloading many parts and joining them afterwards.)
(The ~150 KB/s gets distributed among all the files downloading at a time.)
I tried with a few text files and the problem didn't occur.
I think it only happens when I have scheduled too many async jobs on the rcd.
PS: I still appreciate you taking the time to reply.