Internet Archive: md5 tag in [id]_files.xml interpreted incorrectly

What is the problem you are having with rclone?

With the Internet Archive remote, the md5 tag in the entry that [id]_files.xml holds for itself is interpreted incorrectly, presumably as the md5 of the [id]_files.xml file itself. This causes rclone to report that the [id]_files.xml download was corrupted on transfer. The actual value of this tag, as suggested by the presence of <summation>md5</summation>, is "generated by hashing a concatenated string of all the filenames and their md5 strings" (according to Jonah from the Internet Archive).

Example:

https://archive.org/download/dictionaryofprin00drozrich/dictionaryofprin00drozrich_files.xml

<file name="dictionaryofprin00drozrich_files.xml" source="original">
<format>Metadata</format>
<md5>95836025c7a0fe6a7fd4db679481c5ba</md5>
<summation>md5</summation>
</file>
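
To illustrate the distinction, here is a minimal sketch of how a client could tell the two kinds of md5 entries apart. It assumes that a <summation> child marks the md5 as a summary value rather than a hash of the file's own bytes, which is what the example above suggests but is not confirmed beyond the quote from Jonah. The function name content_md5s, the wrapping <files> root, and the second entry (hypothetical_example.txt) are made up for illustration; this is not rclone's code.

```python
import xml.etree.ElementTree as ET

# First entry is copied from the item above; the second entry is hypothetical
# and exists only to show an ordinary file without a <summation> tag.
SAMPLE = """<files>
  <file name="dictionaryofprin00drozrich_files.xml" source="original">
    <format>Metadata</format>
    <md5>95836025c7a0fe6a7fd4db679481c5ba</md5>
    <summation>md5</summation>
  </file>
  <file name="hypothetical_example.txt" source="original">
    <format>Text</format>
    <md5>d41d8cd98f00b204e9800998ecf8427e</md5>
  </file>
</files>"""


def content_md5s(files_xml: str) -> dict:
    """Map file name -> md5, or None when the md5 is marked as a summation."""
    result = {}
    for entry in ET.fromstring(files_xml).iter("file"):
        md5 = entry.findtext("md5")
        # Assumption: a <summation> child means the md5 does not describe
        # the file's own contents, so it must not be used for verification.
        if entry.find("summation") is not None:
            md5 = None
        result[entry.get("name")] = md5
    return result


if __name__ == "__main__":
    for name, md5 in content_md5s(SAMPLE).items():
        print(name, md5 or "(no content hash)")
```

With a check like this, the [id]_files.xml entry would simply have no hash to verify against, instead of a hash that can never match.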

Run the command 'rclone version' and share the full output of the command.

stable: rclone v1.59.0
beta: rclone v1.60.0-beta.6361.140af43c2

Which cloud storage system are you using? (eg Google Drive)

Internet Archive (internetarchive)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy ia:dictionaryofprin00drozrich/dictionaryofprin00drozrich_files.xml .

The rclone config contents with secrets removed.

# other remotes censored

[ia]
type = internetarchive
access_key_id = # censored
secret_access_key = # censored

# other remotes censored

A log from the command with the -vv flag

2022/07/15 15:51:39 DEBUG : rclone: Version "v1.59.0" starting with parameters ["rclone" "-vv" "copy" "ia:dictionaryofprin00drozrich/dictionaryofprin00drozrich_files.xml" "."]
2022/07/15 15:51:39 DEBUG : Creating backend with remote "ia:dictionaryofprin00drozrich/dictionaryofprin00drozrich_files.xml"
2022/07/15 15:51:39 DEBUG : Using config file from "/home/user/.config/rclone/rclone.conf"
2022/07/15 15:51:40 DEBUG : fs cache: adding new entry for parent of "ia:dictionaryofprin00drozrich/dictionaryofprin00drozrich_files.xml", "ia:dictionaryofprin00drozrich"
2022/07/15 15:51:40 DEBUG : Creating backend with remote "."
2022/07/15 15:51:40 DEBUG : fs cache: renaming cache item "." to be canonical "/home/user/downloads/temp"
2022/07/15 15:51:40 DEBUG : dictionaryofprin00drozrich_files.xml: Need to transfer - File not found at Destination
2022/07/15 15:51:41 DEBUG : Local file system at /home/user/downloads/temp: File to upload is small (9510 bytes), uploading instead of streaming
2022/07/15 15:51:41 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 OK
2022/07/15 15:51:41 INFO  : dictionaryofprin00drozrich_files.xml: Copied (new)
2022/07/15 15:51:41 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 95836025c7a0fe6a7fd4db679481c5ba (Internet Archive item dictionaryofprin00drozrich)
2022/07/15 15:51:41 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 (Local file system at /home/user/downloads/temp)
2022/07/15 15:51:41 ERROR : dictionaryofprin00drozrich_files.xml: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:41 INFO  : dictionaryofprin00drozrich_files.xml: Removing failed copy
2022/07/15 15:51:41 ERROR : Attempt 1/3 failed with 1 errors and: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:41 DEBUG : dictionaryofprin00drozrich_files.xml: Need to transfer - File not found at Destination
2022/07/15 15:51:42 DEBUG : Local file system at /home/user/downloads/temp: File to upload is small (9510 bytes), uploading instead of streaming
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 OK
2022/07/15 15:51:42 INFO  : dictionaryofprin00drozrich_files.xml: Copied (new)
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 95836025c7a0fe6a7fd4db679481c5ba (Internet Archive item dictionaryofprin00drozrich)
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 (Local file system at /home/user/downloads/temp)
2022/07/15 15:51:42 ERROR : dictionaryofprin00drozrich_files.xml: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:42 INFO  : dictionaryofprin00drozrich_files.xml: Removing failed copy
2022/07/15 15:51:42 ERROR : Attempt 2/3 failed with 1 errors and: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: Need to transfer - File not found at Destination
2022/07/15 15:51:42 DEBUG : Local file system at /home/user/downloads/temp: File to upload is small (9510 bytes), uploading instead of streaming
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 OK
2022/07/15 15:51:42 INFO  : dictionaryofprin00drozrich_files.xml: Copied (new)
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 95836025c7a0fe6a7fd4db679481c5ba (Internet Archive item dictionaryofprin00drozrich)
2022/07/15 15:51:42 DEBUG : dictionaryofprin00drozrich_files.xml: md5 = 1ac94347bfa2e8802ac666d31af6f659 (Local file system at /home/user/downloads/temp)
2022/07/15 15:51:42 ERROR : dictionaryofprin00drozrich_files.xml: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:42 INFO  : dictionaryofprin00drozrich_files.xml: Removing failed copy
2022/07/15 15:51:42 ERROR : Attempt 3/3 failed with 1 errors and: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"
2022/07/15 15:51:42 INFO  : 
Transferred:   	   55.723 KiB / 55.723 KiB, 100%, 18.564 KiB/s, ETA 0s
Errors:                 1 (retrying may help)
Transferred:            3 / 3, 100%
Elapsed time:         3.5s

2022/07/15 15:51:42 DEBUG : 6 go routines active
2022/07/15 15:51:42 Failed to copy: corrupted on transfer: md5 hash differ "95836025c7a0fe6a7fd4db679481c5ba" vs "1ac94347bfa2e8802ac666d31af6f659"

It looks like the same thing is also available from /metadata/, so filtering it out should not be that hard.
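
A rough sketch of what such filtering could look like, assuming the /metadata/ response is JSON with a top-level "files" list whose entries carry the same fields as [id]_files.xml ("name", "md5", "summation"). That shape is an assumption here, and strip_summation_hashes is a hypothetical helper, not the actual backend code. The sample is built inline from the values shown earlier rather than fetched.

```python
import json

# Assumed shape of the /metadata/ response, reduced to the one entry
# discussed in this thread. Field values are taken from the XML above.
SAMPLE_METADATA = json.loads("""
{"files": [
  {"name": "dictionaryofprin00drozrich_files.xml",
   "source": "original",
   "format": "Metadata",
   "md5": "95836025c7a0fe6a7fd4db679481c5ba",
   "summation": "md5"}
]}
""")


def strip_summation_hashes(metadata: dict) -> list:
    """Return copies of the file entries, dropping md5 where summation is set."""
    cleaned = []
    for entry in metadata.get("files", []):
        entry = dict(entry)  # copy so the input is left untouched
        if "summation" in entry:
            # Assumption: this md5 is a summary over other files,
            # not a hash of this file's contents.
            entry.pop("md5", None)
        cleaned.append(entry)
    return cleaned


if __name__ == "__main__":
    for entry in strip_summation_hashes(SAMPLE_METADATA):
        print(entry["name"], entry.get("md5", "(no content hash)"))
```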

Fun fact: there is also nothing to compare [id]_files.xml against by size, since the listing reports its size as unknown (-1):

$ rclone --fast-list ls :internetarchive:dictionaryofprin00drozrich/ | grep files
       -1 dictionaryofprin00drozrich_files.xml

Thanks. I realise now I probably should have just opened this as a GitHub issue. Maybe I should recreate it there?

I think making an issue would be a good idea.

Done: Internet Archive: md5 tag in [id]_files.xml interpreted incorrectly (issue #6321 in rclone/rclone on GitHub).
