Uploading from Google Drive to Internet Archive with Rclone + Flags Template (Optimized & Troubleshooted)

This guide covers how to efficiently upload files and folders from Google Drive to Internet Archive using rclone, focusing on common scenarios like "Shared with me" content and optimizing performance, especially for users with limited RAM.

Prerequisites

Before you start, ensure you have:

  1. Rclone Installed: If not, follow the official rclone installation guide.
  2. Google Drive Remote Configured:
  • You should have an rclone remote configured for Google Drive (e.g., named Gdrive).
  • Highly Recommended: Configure your own Google API Client ID and Client Secret for your Google Drive remote. Rclone's default client ID is shared among all rclone users and is heavily rate-limited, so using your own reduces throttling and usually improves transfer speeds.
  3. Internet Archive Remote Configured:
  • You should have an rclone remote configured for Internet Archive (e.g., named InternetA). This requires your IA access key and secret.
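
Before going further, it can help to confirm that both remotes actually exist. The following is a minimal sketch, assuming the example remote names Gdrive and InternetA from this guide; has_remote and check_remotes are hypothetical helper names, and the only rclone command used is rclone listremotes, which prints one "name:" per line.

```shell
#!/bin/sh
# has_remote NAME: reads a remote list on stdin (one "name:" per line, the
# format printed by `rclone listremotes`) and succeeds if NAME is in it.
has_remote() {
  grep -qxF "$1:"
}

# check_remotes NAME...: reports which of the named remotes are configured.
check_remotes() {
  for name in "$@"; do
    if rclone listremotes | has_remote "$name"; then
      echo "$name: configured"
    else
      echo "$name: missing (run 'rclone config')"
    fi
  done
}

# Usage (remote names are the examples from this guide):
#   check_remotes Gdrive InternetA
```

If either remote is reported missing, run rclone config and create it before continuing.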

Understanding Your Source Path (Google Drive)

The way you specify your Google Drive source path depends on where the file/folder is located:

  1. "My Drive" Files/Folders (Standard):
  • If your file/folder is directly in "My Drive" or its subfolders, use a path such as "Gdrive:My Folder/Subfolder/MyFile.ext" (a file) or "Gdrive:My Regular Folder" (a folder).
  2. "Shared with me" Files/Folders (Direct Access):
  • If you want to access items directly from your "Shared with me" tab without adding shortcuts to "My Drive," you use the --drive-shared-with-me flag.
  • Crucial: When using this flag, Gdrive: becomes the root of your "Shared with me" view. You do not include Shared with me/ in the path.
  • How to find the exact path:

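rclone lsf "Gdrive:" --drive-shared-with-me

    • This lists the top-level items in your "Shared with me" (rclone ls would instead list every file recursively). Folders are shown with a trailing slash, which you can leave off. Use the exact names from this output.
    • Example: If you see MySharedProject, the path is "Gdrive:MySharedProject".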
    • Command: rclone copy "Gdrive:MySharedProject" InternetA:MySharedItem --drive-shared-with-me ...
  3. "Shared with me" via "My Drive" Shortcut (the scenario used in the template at the end of this guide):
  • This is the most common and often simplest approach for shared items.
  • In the Google Drive web interface, right-click on the shared item in "Shared with me" and select "Add shortcut to Drive" (or "Add to My Drive"). Place it in a folder like Shortcuts or directly in your My Drive root.
  • Once a shortcut is in "My Drive," rclone treats it like any other item in "My Drive." You do not need the --drive-shared-with-me flag.
  • Example: If you created a shortcut to my_shared_document.pdf inside Gdrive:Shortcuts/: "Gdrive:Shortcuts/my_shared_document.pdf"
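
Whichever of the three path styles you use, it is worth checking that rclone can resolve the source before starting a long upload. This is a small sketch under a few assumptions: source_exists is a hypothetical helper name, the paths are the examples from this guide, and it relies on rclone lsf returning a non-zero exit status when the path cannot be found.

```shell
#!/bin/sh
# source_exists PATH [FLAGS...]: succeeds if rclone can list PATH.
# Extra arguments (e.g., --drive-shared-with-me) are passed through to rclone.
source_exists() {
  src=$1
  shift
  rclone lsf "$src" "$@" >/dev/null 2>&1
}

# Usage (paths are the examples from this guide):
#   source_exists "Gdrive:Shortcuts/my_shared_document.pdf" && echo "found"
#   source_exists "Gdrive:MySharedProject" --drive-shared-with-me && echo "found"
```

If the check fails for a shared item, re-run the listing command with --drive-shared-with-me and compare the names character by character.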

Internet Archive Destination & Metadata

When uploading to Internet Archive, you define an "Item Identifier" and associate metadata with it.

  1. Item Identifier (The "Bucket Name"):
  • This is the unique name of the item on Internet Archive.
  • Critical Rule: It must be 5 to 101 characters long in total (the first character counts toward the length) and can only contain:
    • Alphanumeric characters (a-zA-Z0-9)
    • Underscores (_)
    • Hyphens (-)
    • Periods (.)
  • Example of Valid: "InternetA:My_Cool_Archive_Item_2023"
  • Example of Invalid: "InternetA:E" (too short), "InternetA:My Item!" (invalid characters).
  • All files copied to this path will be placed inside this Internet Archive item.
  2. Metadata (--header flags):
  • These flags set the display information for your item on Internet Archive. They do not affect the item identifier's naming rules.
  • --header "X-Archive-Meta-Mediatype: movies": Sets the media type (e.g., audio, movies, texts, software, image). Note that video content uses the value movies.
  • --header "X-Archive-Meta-title: My Item Display Title": The human-readable title shown on IA.
  • --header "X-Archive-Meta-Collection: opensource_movies": The collection you're uploading to (e.g., opensource_audio, archiveofourlives).
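  3. --internetarchive-wait-archive 15m0s:
  • This flag tells rclone to wait up to the specified duration (e.g., 15 minutes) for Internet Archive's server-side processing to finish after the upload completes, so the uploaded files are reflected in the item before rclone exits. If the timeout is reached, rclone stops waiting without raising an error.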
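
The identifier rule above is easy to check before uploading. Here is a minimal sketch that encodes only the rule as stated in this guide (5 to 101 characters; letters, digits, underscores, hyphens, periods). valid_ia_identifier is a hypothetical helper name, and Internet Archive may enforce additional constraints that this check does not cover. Pass only the part after "InternetA:".

```shell
#!/bin/sh
# valid_ia_identifier ID: succeeds if ID is 5-101 characters long and uses
# only letters, digits, underscores, hyphens, and periods (the rule above).
valid_ia_identifier() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]{5,101}$'
}

# Try the valid and invalid examples from this guide.
for id in "My_Cool_Archive_Item_2023" "E" "My Item!"; do
  if valid_ia_identifier "$id"; then
    echo "valid:   $id"
  else
    echo "invalid: $id"
  fi
done
```

With the three sample identifiers, the first is reported as valid and the other two as invalid (too short, and containing a space and "!").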

Optimizing Upload Speed & Resource Usage

These flags are crucial for maximizing throughput and managing RAM, especially with limited resources (e.g., 1.34 GB available out of 8 GB total).

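  • --drive-chunk-size <SIZE>: (e.g., 128M, 256M)
    • Purpose: Sets the chunk size rclone uses when uploading to Google Drive (must be a power of 2, at least 256k). Larger chunks generally mean fewer API calls and better speed for large files.
    • Note: In this guide Google Drive is the source, so this flag does not change the Drive-to-IA transfer itself; it matters only when a command writes to Google Drive.
    • RAM Impact: One chunk is buffered in memory per transfer.
    • Recommendation for 1.34 GB RAM: Start with 128M or 64M. You can try 256M if you're only doing 1-2 concurrent transfers, but be cautious.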
  • --transfers <N>: (e.g., 4)
    • Purpose: Number of files to upload concurrently.
    • RAM Impact: drive-chunk-size * transfers.
    • Recommendation for 1.34 GB RAM: Start with 2 or 3. 4 is often a good balance if drive-chunk-size is 128M or less. Monitor your RAM usage.
  • --checkers <N>: (e.g., 8)
    • Purpose: Number of concurrent checks (e.g., for file existence/differences). Less RAM-intensive than transfers.
    • Recommendation: Can be higher than --transfers, e.g., 8 or 16.
  • --buffer-size 0:
    • Purpose: Disables rclone's in-memory read-ahead buffer (the default is 16M per transfer).
    • RAM Impact: Saves RAM by streaming directly. Transfers might be slightly slower without the buffer.
    • Recommendation: Use 0 with limited RAM.
  • --drive-pacer-burst <N> & --drive-pacer-min-sleep <DURATION>: (e.g., 200 and 10ms)
    • Purpose: Control how aggressively rclone makes API calls to Google Drive to avoid rate limits.
    • Recommendation:
      • --drive-pacer-burst 200 is a good balance. Higher values (e.g., 2000) are more aggressive but can lead to more 403 rate-limit errors (e.g., "User Rate Limit Exceeded") if you hit quotas without a personal API key.
      • --drive-pacer-min-sleep 10ms is more aggressive than rclone's default of 100ms; raise it again if you start seeing rate-limit errors.
    • Crucial: These flags are most effective when using your own Google API Client ID and Secret.
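
The "drive-chunk-size * transfers" rule of thumb above can be worked out quickly. This is a rough sketch of that arithmetic only: chunk_ram_mib is a hypothetical helper name, sizes are in MiB, and rclone's real memory use will be higher because of its other buffers and overhead.

```shell
#!/bin/sh
# chunk_ram_mib CHUNK_MIB TRANSFERS: prints the approximate memory, in MiB,
# held by chunk buffers (one chunk per concurrent transfer).
chunk_ram_mib() {
  echo $(( $1 * $2 ))
}

chunk_ram_mib 128 3   # prints 384
chunk_ram_mib 256 4   # prints 1024
chunk_ram_mib 64 4    # prints 256
```

Note that 256M with 4 transfers comes to about 1 GiB, which would consume most of the 1.34 GB available in the scenario described here, while 128M with 3 transfers stays under 400 MiB.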

Essential Safety for Internet Archive Uploads

  • --ignore-existing (Recommended over sync):
    • Purpose: Tells rclone copy to skip files if a file with the same name already exists at the destination. It will upload new files but will not overwrite existing ones.
    • Why NOT rclone sync for IA? Internet Archive generates its own metadata files (e.g., .xml) after uploads. rclone sync attempts to delete any files at the destination that are not in the source, which would include these IA-generated files. This results in constant errors and retries because IA prevents deletion of these specific files.
    • Always use rclone copy with --ignore-existing for adding files to an existing IA item.
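
To see what a copy would do before touching an existing item, rclone's --dry-run flag can be combined with --ignore-existing. The following is a minimal sketch; preview_copy is a hypothetical helper name and the paths in the usage lines are the examples from this guide.

```shell
#!/bin/sh
# preview_copy SRC DST [FLAGS...]: shows what `rclone copy --ignore-existing`
# would transfer, without uploading anything.
preview_copy() {
  src=$1
  dst=$2
  shift 2
  rclone copy "$src" "$dst" --ignore-existing --dry-run "$@"
}

# Usage (names are the examples from this guide):
#   preview_copy "Gdrive:Shortcuts/my_shared_document.pdf" "InternetA:my_pdf_collection_item"
```

Once the preview looks right, run the real command without --dry-run.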

Monitoring Progress & Troubleshooting

  • -P (Progress) & --stats <DURATION> (Statistics):
    • rclone -P displays a live progress bar.
    • --stats 1s (or 5s) shows updated transfer statistics every second (or 5 seconds).
    • If stats/progress don't show: Ensure your command is terminated correctly. A common mistake is a trailing backslash \ on the very last line of a multi-line command, which makes the shell wait for more input. Remove it!
# Correct (no backslash on the last line)
rclone copy \
"Source" \
"Destination" \
-P
  • -vv (Very Verbose Logging) & --log-file <PATH>:
    • Add -vv to your command for detailed output in the terminal, useful for debugging.
    • Use --log-file rclone_upload.log to save all output to a file for later review.
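
If you keep your command in a script file, the trailing-backslash mistake described above can be caught automatically. This is a small sketch under stated assumptions: ends_with_backslash is a hypothetical helper name, upload.sh is a hypothetical file name, and the check looks only at the final line of the file.

```shell
#!/bin/sh
# ends_with_backslash FILE: succeeds if the last line of FILE ends with a
# backslash, i.e. the command is left waiting for a continuation line.
ends_with_backslash() {
  tail -n 1 "$1" | grep -q '\\$'
}

# Usage (upload.sh is a hypothetical script holding your rclone command):
#   if ends_with_backslash upload.sh; then
#     echo "Remove the trailing backslash from the last line of upload.sh"
#   fi
```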

Complete Example Command

This example assumes:

  • Your Google Drive remote is Gdrive.
  • Your Internet Archive remote is InternetA.
  • You are uploading a specific PDF file from a shortcut in Gdrive:Shortcuts/ to a new Internet Archive item.
  • You have limited RAM and want a balanced optimization.

My Example Below (a whole-folder upload; the template after it matches the assumptions above)

rclone copy \
  "Gdrive:Space 1 New" \
  "InternetA:4000_Scripts" \
  --internetarchive-wait-archive 15m0s \
  --metadata \
  --header "X-Archive-Meta-Mediatype: movies" \
  --header "X-Archive-Meta-title: 4000 funScripts" \
  --header "X-Archive-Meta-Collection: opensource_movies" \
  --drive-chunk-size 256M \
  --transfers 4 \
  --checkers 8 \
  --buffer-size 0 \
  --drive-pacer-burst 200 --drive-pacer-min-sleep 10ms \
  --stats 1s \
  -P
  
# No backslash after -P above: it is the last line of the command.
# To add logging for troubleshooting, end the -P line with a backslash and append:
#   --log-file rclone_upload.log -vv
# --order-by is mostly useful for large syncs; leave it out when copying a single folder/file:
#   --order-by size,mixed,75

Template Here

# Adjust before running:
#   - Destination: choose your own unique, valid item identifier (min 5 chars).
#   - Mediatype: 'texts' suits a PDF.
#   - Title: set the display title you want.
#   - Collection: change if needed (e.g., opensource_software).
#   - --drive-chunk-size 128M and --transfers 3 are sized for limited RAM; --buffer-size 0 saves more.
#   - --ignore-existing safely adds to an existing IA item without overwriting.
#   - --log-file and -vv record very verbose output for debugging.
# Keep comments out of the command itself: anything after a trailing backslash breaks the line continuation.
rclone copy \
  "Gdrive:Shortcuts/my_shared_document.pdf" \
  "InternetA:my_pdf_collection_item" \
  --internetarchive-wait-archive 15m0s \
  --metadata \
  --header "X-Archive-Meta-Mediatype: texts" \
  --header "X-Archive-Meta-title: My Important Shared PDF Document" \
  --header "X-Archive-Meta-Collection: opensource_texts" \
  --drive-chunk-size 128M \
  --transfers 3 \
  --checkers 8 \
  --buffer-size 0 \
  --drive-pacer-burst 200 --drive-pacer-min-sleep 10ms \
  --stats 1s \
  -P \
  --ignore-existing \
  --log-file rclone_upload.log \
  -vv

By following this guide, you should be able to effectively manage your rclone uploads from Google Drive to Internet Archive, optimize for your system, and troubleshoot common issues. Happy uploading!
