Use-server-modtime flag has no effect on Azure Data Lake Gen2 Storage

What is the problem you are having with rclone?

The flag: --use-server-modtime has no effect on a remote of type azure blob
Note: I am connecting to an azure storage account of type "Datalake Gen2".

When I upload a file "report1.txt" from my Linux server to an Azure Data Lake Gen2 storage account, the
rclone lsl command always shows the time when the file was created on Linux (metadata key Mtime).
I was expecting to see the time when the file was uploaded to the storage container, i.e. the server modtime (Last Modified time in the screenshot).

( see also attached screenshot)

Both commands provide the same result:
rclone lsl datalake2:sbdatalake/reports/ --use-server-modtime
rclone lsl datalake2:sbdatalake/reports/

1048576 2022-01-26 09:55:21.588528368 report1.txt

Note:
2022-01-26 09:55 is the time the file was created on Linux.
I uploaded the file to the container today, 28 Jan.
With the --use-server-modtime flag I was expecting to see the upload date, 2022-01-28.

In AWS S3 it works without any issues. When I provide the --use-server-modtime flag it shows the date of the upload to the bucket.

Run the command 'rclone version' and share the full output of the command.

rclone v1.57.0

  • os/version: Microsoft Windows 10 Pro 2009 (64 bit)
  • os/kernel: 10.0.19042.1466 (x86_64)
  • os/type: windows
  • os/arch: amd64
  • go/version: go1.17.2
  • go/linking: dynamic
  • go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

Azure Blob Storage, Data Lake Gen2 ( basically Azure Blob with support for directories)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Both commands provide the same result:
rclone lsl datalake2:sbdatalake/reports/ --use-server-modtime
rclone lsl datalake2:sbdatalake/reports/ 

 1048576 2022-01-26 09:55:21.588528368 report1.txt
[Screenshot: reports/report1.txt - Microsoft Azure]

The rclone config contents with secrets removed.

[datalake2]
type = azureblob
account = stonebranchsd
service_principal_file = C:\coding\datalake\azure-principal.json

azure-principal.json:
{
  "appId": "xyz",
  "displayName": "abc",
  "password": "klm",
  "tenant": "abc"
}

The connection to Azure Data Lake Gen2 is via a service principal.

A log from the command with the -vv flag

C:\coding\datalake\rclone>rclone lsl -vv datalake2:sbdatalake/reports/ --use-server-modtime
2022/01/28 17:05:54 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "lsl" "-vv" "datalake2:sbdatalake/reports/" "--use-server-modtime"]
2022/01/28 17:05:54 DEBUG : Creating backend with remote "datalake2:sbdatalake/reports/"
2022/01/28 17:05:54 DEBUG : Using config file from "C:\\Users\\nils.buer\\AppData\\Roaming\\rclone\\rclone.conf"
2022/01/28 17:05:54 DEBUG : fs cache: renaming cache item "datalake2:sbdatalake/reports/" to be canonical "datalake2:sbdatalake/reports"
  1048576 2022-01-26 09:55:21.588528368 report1.txt
2022/01/28 17:05:54 DEBUG : 6 go routines active

C:\coding\datalake\rclone>rclone lsl -vv datalake2:sbdatalake/reports/
2022/01/28 17:06:11 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "lsl" "-vv" "datalake2:sbdatalake/reports/"]
2022/01/28 17:06:11 DEBUG : Creating backend with remote "datalake2:sbdatalake/reports/"
2022/01/28 17:06:11 DEBUG : Using config file from "C:\\Users\\nils.buer\\AppData\\Roaming\\rclone\\rclone.conf"
2022/01/28 17:06:11 DEBUG : fs cache: renaming cache item "datalake2:sbdatalake/reports/" to be canonical "datalake2:sbdatalake/reports"
  1048576 2022-01-26 09:55:21.588528368 report1.txt
2022/01/28 17:06:11 DEBUG : 6 go routines active

hi,
can you copy a single one byte file?

post a full debug log
or
for a deeper look
--dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log

Hi,
thanks for supporting me.
what I noticed is that when I upload a file from my local disc to an azure container using the azure console everything works fine (No Metadata.Mtime property is set -> see screenshot).

The command: rclone lsf -vv datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime
returns the modification time, when I uploaded the file to the container ( and not the time, when I last modified it on my local file system) -> this is what I want
=> 2022-01-29 01:55:48;104;report2.txt

When I upload the same file using the rclone copy command:

rclone copy -vv windows:C:\demo\report2.txt datalake2:sbdatalake/reports/

then a Metadata.Mtime property is set and the lsf command:
rclone lsf -vv datalake2:sbdatalake/reports/ --format "tsp" --use-server-modtime
always returns the time when I last modified the file on my local disk (which was last year).
The flag "--use-server-modtime" does not change this behaviour. The command always returns the value of the Metadata.Mtime property.

Note:
See below the requested log-file from the copy command:
rclone copy windows:C:\demo\report1.txt datalake2:sbdatalake/reports/ --dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log
rclone.log (9.4 KB)

sure, but this is getting very confusing,

each example seems to use a different source file with a different set of dates?
so going forward, let's just use the latest example, reports/report2.txt

and using both rclone lsl and rclone lsf?
since your latest example uses rclone lsf, let's use that.

please post the output of these commands

rclone lsf C:\demo\report2.txt --format "tsp"
rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp"
rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime

and for each command, enclose the command and console output with three backticks so it looks like

rclone lsf file.txt --format="tsp" 
2000-01-01 00:00:00;1;file.txt

File uploaded via explorer using the Azure console:

C:\coding\datalake\rclone>rclone lsf C:\demo\report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp"
2022-01-29 01:55:48;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime
2022-01-29 01:55:48;104;report2.txt

File uploaded using the rclone copy command: rclone copy -vv windows:C:\demo\report2.txt datalake2:sbdatalake/reports/

C:\coding\datalake\rclone>rclone lsf C:\demo\report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime
2021-04-08 15:43:14;104;report2.txt

ok, that last example, can see the problem, this output is not correct

rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime
2021-04-08 15:43:14;104;report2.txt

azure data lake is built on top of azure blob storage.
i would test using azure blob storage, using that same local report2.txt

Hi,
I tested it on AWS S3, Azure Blob storage, and Azure Blob storage with ADLS Gen2 (Datalake) enabled. On AWS S3 it works fine: when using the flag --use-server-modtime I see the date when I uploaded the file to the bucket.
On Azure the flag has no effect, regardless of whether ADLS Gen2 is enabled on the Azure Blob storage account or not.

Azure Blob storage with ADLSGen2 (Datalake) enabled

File uploaded using the rclone copy command: rclone copy -vv windows:C:\demo\report2.txt datalake2:sbdatalake/reports/
C:\coding\datalake\rclone>rclone lsf C:\demo\report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf datalake2:sbdatalake/reports/report2.txt --format "tsp" --use-server-modtime
2021-04-08 15:43:14;104;report2.txt

Azure Blob storage

File uploaded using the rclone copy command: rclone copy -vv windows:C:\demo\report2.txt azure:sbzure/reports/

C:\coding\datalake\rclone>rclone lsf C:\demo\report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf azure:sbzure/reports/report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf azure:sbzure/reports/report2.txt --format "tsp" --use-server-modtime
2021-04-08 15:43:14;104;report2.txt

AWS S3 Blob storage

File uploaded using the rclone copy command: rclone copy -vv windows:C:\demo\report2.txt awss3:sbdatalakes3/reports/

C:\coding\datalake\rclone>rclone lsf C:\demo\report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf awss3:sbdatalakes3/reports/report2.txt --format "tsp"
2021-04-08 15:43:14;104;report2.txt

C:\coding\datalake\rclone>rclone lsf awss3:sbdatalakes3/reports/report2.txt --format "tsp" --use-server-modtime
2022-02-01 12:32:18;104;report2.txt
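The comparison in the listings above can also be automated. The following is a minimal sketch in Go, assuming the default `;` separator and the `YYYY-MM-DD hh:mm:ss` time layout seen in the `--format "tsp"` output above. The names `lsfEntry`, `parseLsfLine` and `flagHasEffect` are hypothetical helpers for illustration and are not part of rclone.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// lsfEntry is one line of `rclone lsf --format "tsp"` output:
// modification time, size in bytes and path, separated by ";".
// (Hypothetical type for this sketch.)
type lsfEntry struct {
	ModTime time.Time
	Size    int64
	Path    string
}

// parseLsfLine parses a line such as "2021-04-08 15:43:14;104;report2.txt".
func parseLsfLine(line string) (lsfEntry, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ";", 3)
	if len(parts) != 3 {
		return lsfEntry{}, fmt.Errorf("expected 3 fields, got %d", len(parts))
	}
	modTime, err := time.Parse("2006-01-02 15:04:05", parts[0])
	if err != nil {
		return lsfEntry{}, fmt.Errorf("bad time %q: %w", parts[0], err)
	}
	size, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return lsfEntry{}, fmt.Errorf("bad size %q: %w", parts[1], err)
	}
	return lsfEntry{ModTime: modTime, Size: size, Path: parts[2]}, nil
}

// flagHasEffect reports whether the listing taken with --use-server-modtime
// shows a different time than the listing taken without it. Equal times are
// only a strong hint that the flag was ignored when the file was uploaded
// later than it was last modified locally, as in the tests above.
func flagHasEffect(plain, withFlag string) (bool, error) {
	a, err := parseLsfLine(plain)
	if err != nil {
		return false, err
	}
	b, err := parseLsfLine(withFlag)
	if err != nil {
		return false, err
	}
	return !a.ModTime.Equal(b.ModTime), nil
}

func main() {
	// Lines copied from the listings above.
	azure, _ := flagHasEffect(
		"2021-04-08 15:43:14;104;report2.txt",
		"2021-04-08 15:43:14;104;report2.txt",
	)
	s3, _ := flagHasEffect(
		"2021-04-08 15:43:14;104;report2.txt",
		"2022-02-01 12:32:18;104;report2.txt",
	)
	fmt.Println("azure: flag has effect:", azure)
	fmt.Println("s3: flag has effect:", s3)
}
```

With the lines from this thread, the Azure pair compares equal while the S3 pair differs, which matches what I observed manually.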

It looks like this isn't implemented for azure blob storage.

It would be relatively straight forward to implement if you wanted to have a go?

Hi Nick,
I had a quick look. It seems azure blob should work similarly to AWS S3.
When the flag "--use-server-modtime" is set we need to ignore the value in the metadata key: mtime and use the LastModified time ( hopefully returned in the http headers - need to check that).
When the flag "--use-server-modtime" is not set we will first try to read the metadata key mtime and if that isn't present the LastModified returned is used.
I think I need to add the following function to the azureblob.go file:
func (o *Object) ModTime(ctx context.Context) time.Time {
    if o.fs.ci.UseServerModTime {
        return o.lastModified
    }
    ...

I have asked our Software Architect (rclone user: asaglam0) to help me implement "--use-server-modtime" for Azure and, if possible, also for Google GCS.
Is there a "How-to" document we could read upfront on the development process or should I just send you the changes we made after we tested everything?

Sounds like you are on the right track :slight_smile:

The contributing doc is here: rclone/CONTRIBUTING.md at master · rclone/rclone · GitHub

That should contain everything you need to know.

Feel free to ask more questions if you need more help :slight_smile:

Thanks a lot. This is what I was looking for.
We are planning to work on this next week.

Hi Nick,
Abdullah (rclone user: asaglam0) our Software Architect fixed the issue with the --use-server-modtime flag when using Azure Storage buckets as remote. When the flag --use-server-modtime is provided, the server modified time is used instead of the object metadata. The behavior is now the same as it is currently for the AWS S3 remote. Note: Once you accept our fix, we will also implement it for Google GCS. Abdullah plans to upload our changes within the next week.

That sounds great - thanks :slight_smile: Look forward to the pull request.

Hi,
I will take it up next week. fyi.
Regards
