Box getting 403 errors when using chunker (working fine without it)

I'm pretty sure this is a config problem with the client_id. I don't know exactly what, though. Maybe Box support can help? If we do find out, we should put it in the docs!

@ncw @kapitainsky

Box API documentation is crap so I asked ChatGPT for some help and think I finally made some progress. I also think the Rclone documentation for Box remotes is a bit lacking so maybe the info below will help others. I had to go through a completely convoluted way to get OAuth functional so I'm sure this can be made simpler but here goes.


Creating the Box App

  1. Go to the Box Developer Console and login, then click "My Apps" on the sidebar. Click "Create New App" and select "Custom App".
  2. On the first screen of the dialog that pops up, you can enter pretty much whatever you want. The "App Name" can be anything. For "Purpose", choose "Automation" to avoid having to fill out anything else. Click "Next".
  3. On the second screen of the creation dialog, you'll be asked to select an authentication method. The Box API does NOT provide refresh tokens when using "Server Authentication (with JWT)" or "Server Authentication (Client Credentials Grant)", so select "User Authentication (OAuth 2.0)". Then click "Create App".

Configuring the Box App

  1. You should now be on the "Configuration" tab of your new app. If not, click on it at the top of the webpage.
  2. Copy down "Client ID" and "Client Secret", you'll need those for rclone.
  3. Under "OAuth 2.0 Redirect URI", add http://localhost:8080.
  4. For "Application Scopes", select "Read all files and folders stored in Box" and "Write all files and folders stored in Box" (assuming you want to do both). Leave the others unchecked.
  5. Click "Save Changes" at the top right.

All the stuff below is unnecessary and I was (unknowingly) bypassing a different error that I can no longer pinpoint. Leaving it here for reference but going through the normal Rclone prompt for remote creation should work as expected.

Getting the Access/Refresh Tokens

This is the janky part that I was unable to complete through rclone, so I did it through a python script which you can get here.

  1. Replace YOUR_CLIENT_ID and YOUR_CLIENT_SECRET on lines 8-9 with the info collected in the previous step. Then run the script.
  2. Once you get a URL, either click on it if your shell allows or copy and paste it into your browser. It should take you to Box's authorization page where you can click "Grant Access".
  3. You will be redirected to a webpage that does not exist but the URL will look like this: http://localhost:8080/?state=random_string&code=[TEMPORARY_TOKEN]. Copy the token and paste it into your shell that's running my script.
  4. The script will spit out a JSON string with your access token and refresh token, which looks something like this: {"access_token": "[ACCESS_TOKEN]", "refresh_token": "[REFRESH_TOKEN]"}. Copy this down, you'll need it for rclone.
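
For reference, the manual flow in the steps above can be sketched roughly like this. This is not the script linked above (it isn't reproduced in this thread), just a minimal sketch assuming Box's standard OAuth 2.0 endpoints (`https://account.box.com/api/oauth2/authorize` and `https://api.box.com/oauth2/token`). The function names and placeholder credentials are made up for illustration.

```python
import json
import secrets
import urllib.parse
import urllib.request

# Assumed Box OAuth 2.0 endpoints; check the Box developer docs before relying on them.
AUTHORIZE_URL = "https://account.box.com/api/oauth2/authorize"
TOKEN_URL = "https://api.box.com/oauth2/token"
REDIRECT_URI = "http://localhost:8080"  # must match the URI saved in the Box app


def build_authorize_url(client_id, state):
    """Return the URL to open in a browser to grant access."""
    query = urllib.parse.urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(redirected_url, expected_state):
    """Pull the temporary code out of the URL the browser was redirected to."""
    params = urllib.parse.parse_qs(urllib.parse.urlsplit(redirected_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; do not use this code")
    return params["code"][0]


def exchange_code(client_id, client_secret, code):
    """Trade the temporary code for access and refresh tokens (network call)."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    request = urllib.request.Request(TOKEN_URL, data=body)  # data= makes it a POST
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    """Interactive walk-through of steps 1 to 4 (hypothetical helper)."""
    state = secrets.token_urlsafe(16)
    print(build_authorize_url("YOUR_CLIENT_ID", state))
    redirected = input("Paste the full URL you were redirected to: ")
    code = extract_code(redirected, state)
    tokens = exchange_code("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", code)
    print(json.dumps({key: tokens[key] for key in ("access_token", "refresh_token")}))
```

Calling `main()` runs the whole exchange. Pasting the full redirected URL instead of just the code lets the sketch check the `state` value as well.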

rclone Config

My entire configuration for the box remote is:

[box]
type = box
token = {"access_token": "[ACCESS_TOKEN]", "refresh_token": "[REFRESH_TOKEN]"}
root_folder_id = [FOLDER_ID]
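
If you do end up with tokens obtained by hand, the section above can be generated rather than typed. A minimal sketch, assuming the config file is plain INI with the token stored as a JSON string as shown above; `render_box_section` is a hypothetical helper, not part of rclone. Tokens written by rclone itself usually carry extra fields such as `expiry`, which this sketch leaves out because the config above doesn't include them.

```python
import configparser
import io
import json


def render_box_section(name, access_token, refresh_token, root_folder_id):
    """Return rclone.conf text for one Box remote (hypothetical helper)."""
    # interpolation=None so any '%' characters in tokens are written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[name] = {
        "type": "box",
        "token": json.dumps({"access_token": access_token,
                             "refresh_token": refresh_token}),
        "root_folder_id": root_folder_id,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_box_section("box", "[ACCESS_TOKEN]", "[REFRESH_TOKEN]", "[FOLDER_ID]"))
```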

I think this is the important bit and explains why the tokens didn't last more than 1 hour.

Why not choose http://127.0.0.1:53682 which is where rclone will be listening? Then when you auth from box, rclone will receive the token and configure your backend straight away.

This URL is printed when you set your own client_id as part of the rclone config process.

Yep, that would have saved me about half a day worth of doing random things and hoping for the best :sweat_smile: ...

When I first tried to follow the prompts via rclone, it was throwing errors that it was unable to launch a webserver at http://127.0.0.1:53682 despite opening the right ports and troubleshooting network issues. I just tried again and of course it seems to be working now. Leaving my post up in case it's of any use but I was overly complicating things.

I have used @vashp2029's client_id creation description to add a relevant section to our Box documentation:

Just a heads-up for anyone doing this the way I did: make sure your redirect URI has a "/" at the end or it won't work.

http://127.0.0.1:53682/
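
A tiny sketch of why the slash matters, assuming (this is a guess, not something confirmed in this thread) that the redirect URI is compared as an exact string. `matches_rclone_redirect` is a made-up helper name.

```python
from urllib.parse import urlsplit

RCLONE_REDIRECT = "http://127.0.0.1:53682/"  # value given in this thread


def matches_rclone_redirect(configured_uri):
    """True only if the configured URI is character-for-character identical."""
    return configured_uri == RCLONE_REDIRECT


for uri in ("http://127.0.0.1:53682", "http://127.0.0.1:53682/"):
    # The two URIs differ only in their path component: '' versus '/'.
    print(repr(urlsplit(uri).path), matches_rclone_redirect(uri))
```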

Does chunker waste api calls on box.com?

Every time I upload a file smaller than 4.9Gi

rclone says:

2023/08/02 18:44:09 INFO  : FILENAME.rclone_chunk.001_blahblahblah: Moved (server-side) to: FILENAME

99.99% of my files are smaller than 4.9Gi, so while I appreciate the convenience of chunker if it's doubling my api calls maybe that's problematic? Or am I totally misunderstanding?

I suppose I could personally check if any local file is above 5Gi and if not upload to box without chunker, and then in other situations I could upload to box with chunker....

like
rclone -v copy --max-size 4.9Gi local: remote:
and
rclone -v copy --min-size 4.8Gi local: chunkerremote:
? Although making the copy twice would also potentially double my number of API calls. So this solution would... not solve anything at all? (As is usual for solutions when I'm the one who thinks of them.)
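
To make the split idea concrete, here is a minimal sketch that assigns each file to exactly one of the two uploads. The 5 GiB cut-off and the sample file names are assumptions for illustration, and `split_by_size` is a hypothetical helper.

```python
GIB = 1024 ** 3
THRESHOLD = 5 * GIB  # assumed cut-off; use the size limit that applies to your plan


def split_by_size(files, threshold=THRESHOLD):
    """Split {name: size_in_bytes} into (direct, chunked) lists of names."""
    direct = sorted(name for name, size in files.items() if size < threshold)
    chunked = sorted(name for name, size in files.items() if size >= threshold)
    return direct, chunked


sample = {"photo.jpg": 3 * 1024 ** 2, "backup.tar": 7 * GIB, "video.mkv": 4 * GIB}
direct, chunked = split_by_size(sample)
print("upload without chunker:", direct)
print("upload through chunker:", chunked)
```

Note that the two commands above overlap between 4.8Gi and 4.9Gi, so a file in that range would be picked up by both. A single cut-off avoids that.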

I imagine it does "waste" API calls, but as far as I know, there isn't any clear indication of the number of API calls allotted by Box or how strictly they enforce it. I'm syncing about 250 TB, of which about 1 TB is images of a few MB each, so I synced the image folders first since that would likely be the most API-intensive part. I checked my platform usage the following day, and either Box did not count the API calls or it takes longer than 24 hours to update API usage. Note that I'm uploading at about 50-100 MB/s, so if there is a hard cap on calls, it's not something that we mortals are going to need to worry about.
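
For a rough sense of scale, here is a back-of-the-envelope sketch. Everything in it is an assumption: an average image size of 3 MB (the post only says "a few MB"), one API call per plain upload, and two per upload through chunker (the upload plus the server-side move seen in the log earlier in the thread). Real counts will be higher because listings and checks also cost calls.

```python
TB = 10 ** 12
MB = 10 ** 6


def estimate_calls(total_bytes, avg_file_bytes, calls_per_file):
    """Return (file_count, api_calls) under the stated assumptions."""
    files = total_bytes // avg_file_bytes
    return files, files * calls_per_file


files, plain_calls = estimate_calls(1 * TB, 3 * MB, calls_per_file=1)
_, chunker_calls = estimate_calls(1 * TB, 3 * MB, calls_per_file=2)
print(f"{files} files, {plain_calls} calls without chunker, {chunker_calls} with chunker")
```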

What command flags are you using? (The ones I chose slowed down my speed it feels like.)

rclone sync gdrive: box_chunker: -vv --log-file ./gdrive2box.log -P --transfers 30 --checkers 64 --fast-list --cache-dir ./cache

I should note that I don't know if cache directory is actually useful or doing anything in this case.

I thought Box didn't support --fast-list. Hmm, 30 transfers and no --tpslimit feels like a recipe for hitting an API limit, but heck, it can't hurt me to try.

I'd also imagine --no-check-dest would be even faster than --checkers 64 but again, I could always try it your way and see what I get :slight_smile:

Are you using your own client_id with that flag? I would assume Yes? (based on what thread we're in.)

Edit: I did it your way. 98.169 MiB/s, a vast improvement. No idea which of the 5 or 6 changes I made caused this, though :slight_smile:

Edit2: --fast-list I know specifically makes googledrive perform worse, because it isn't supported, IIRC, according to another thread.

I'm actually not using my own client ID. I scrapped that whole idea and went through the user auth flow through rclone. I'm not exactly sure which flags helped me get to 100 MB/s but I'm happy with it and until I hit any API caps, sticking to it lol

In my testing it seems like setting my --tpslimit to 64 instead of 12 is what gives me the 100MB/s for the most part, as well of course as increased --transfers

I am using my own client_id, this thread literally taught me how to do it! I couldn't've figured it out otherwise.

Edit: Wait, no. I've set up my own client_id, but it's on the computer getting 20 MiB/s; the remote server getting 100 MiB/s is running without a client_id (the slower computer was in the middle of a session/transfer, so I didn't want to interrupt it).

So how did you go about setting up your own client ID? What app type and config did you use on the box dev console?

I literally followed your instructions.

1,2,3,4,5 the second section was entirely irrelevant.

I just made the correction the dev said to make to your step 3: I used http://127.0.0.1:53682/

I've been testing it though, there appears to be no benefit whatsoever. I ran one server without a client_id and one with it.

Even if there's no benefit now, there would potentially be a benefit if box.com saw an uptick in usage and altered their policies in some manner.

I reckon this might belong here, since the instructions on how to set up your own client_id are contained in this thread.

Studying my api usage on a platform activity report, it appears if client_id and client_secret are empty you use the rclone app appid in the report, which is listed as "Chargeable - No" and if you use your own custom client_id and client_secret it's listed as "Chargeable - Yes"....

But I can find no mention of any charges or prices for your own custom apps.

Also interestingly this page Platform Activity Report – Box Support says that non-chargeable api calls are what is limited per month, so that would mean that in theory the 50,000 or 100,000 of the business or enterprise plan applied to api calls made via rclone with blank client_id and blank client_secret...

But I cannot tell if this is because api calls made using my own client_id and my own client_secret are MORE limited/expensive or LESS limited/expensive.

So, yeah, what the heck is a chargeable custom app on box.com, and where does it mention anything at all about their limits or prices?

Edit: this is clear as mud to me

"Monthly Platform API Calls" is any API call made by a Platform Application to the Service within a monthly calendar period on behalf of: (a) a Platform Application User; (b) a User; or (c) a Platform Service Account, not to exceed your allotted amount. Except as otherwise set forth in an order, excluded from Monthly Platform API Calls are API calls made on behalf of: (i) third party software application integrations that are permitted with your use of the Service; (ii) Box provided applications (e.g., the Box Web App, Box Desktop, Box Notes, Box Capture); (iii) Box provided services (e.g. Box Shuttle), if applicable. For clarity, any API calls resulting from a Platform Application will be considered chargeable if 95% of its Monthly Platform API Calls are used by you or on your behalf.

TLDR: Should I be using my own custom client_id or the public rclone one? It's tough to say.

I currently have over 400k API hits, part of them with OAuth and the larger part with the public API. So far nothing has been limited and everything runs normally. I'm curious how this will play out in the future.

I'd love to know what chargeable means and whether there are overage fees for API usage. But I'm afraid to ask them, because if one of us asks, maybe they'll notice :slight_smile:

I'm not sure but afaik, there was nothing in the pricing about charges over the subscription fee. If anything, they would likely temporarily discontinue API access for your account for some amount of time for going over so I guess we'll see.

400k is a lot more than I've done so far, so that's reassuring. But are you on Business, Business Plus, or Enterprise? Just asking in case it ends up mattering.