How I deleted all of a client's archive files in Azure Storage by mistake
Hello Reader,
(If you want to cut to the chase, jump to the section "So, long story short, and lessons learned the hard way:")
I am writing this to share a nightmare experience I had recently, one that put me in an embarrassing situation at work with clients and colleagues.
Databricks:
I really like this product and I really hate this product, because you can spend hours with no clue why a command or a block of script doesn't work, and then when it finally does, you love it.
Mount:
Mounting an Azure storage account using a managed identity is a cool concept: once mounted, you can read from and write to it like any other data source. I was trying to set up a mount but could not get it working with one of the managed identities I had, which only had storage reader access.
I got help from a colleague to mount a storage container that was already mounted elsewhere (a POC subscription, but one treated as production.. one of those cases where a client starts with a POC and then starts treating the box like production). The mount worked, it was a Eureka! moment for me, and I was able to read.
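For context, a Databricks mount looks roughly like the sketch below. This is a minimal reconstruction using OAuth with a service principal, not the exact notebook we ran; the tenant, application, secret scope, container, and account names are all placeholders.

# Minimal sketch of mounting an ADLS Gen2 container in Databricks.
# All names below (tenant, app id, scope, container, account) are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<secret-scope>", key="<sp-secret-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Once mounted, the container is visible under /mnt/archive like a local path.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/archive",
    extra_configs=configs,
)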
The disaster:
I was working on optimizing a mapping data flow in Data Factory (another MS product I have this mixed love/hate feeling about.. MS took years to give us an Excel connector. How ironic that MS Excel couldn't be connected to using MS ADF, while non-MS products like Talend had supported it for years..).
I had 10 million rows in a data frame (the equivalent of the mapping data flow's final output). I decided to write them to the root container location in the so-called POC subscription to get the files in CSV format, so that I could do a BULK INSERT (an awesome Azure SQL feature not explored by many). The write finished in a few minutes, and I was able to download and view the CSV file. It was a success! Hours later a colleague pinged me and asked what I had done to the storage account, as he was seeing only two files (as in the image below) in every folder and nothing else.
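The write itself was something along these lines. This is a reconstruction, not the exact notebook code, but the key detail is real: Spark's "overwrite" save mode deletes everything under the target path before writing, and the target path here was the container root.

# Reconstruction of the fatal write (df stands in for the real 10M-row frame).
df = spark.range(10_000_000).toDF("id")

# mode("overwrite") first DELETES everything under the target path.
# Pointed at the mounted container root, that meant every archive folder.
df.write.mode("overwrite").option("header", True).csv("/mnt/archive/")

Had the target been a dedicated subfolder such as /mnt/archive/exports/, only that folder's contents would have been wiped instead of the whole container.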
That was the moment I freaked out and checked the storage account's data protection options:
- Soft delete enabled? No
- Versioning? No
- Snapshots? No
- Redundancy option? Poor LRS
The worst part is that there is a user drop zone in a shared folder where users drop their files. ADF picks files up from this location, moves them to the archive location in the storage account, and deletes them from the shared folder.
I raised a ticket with MS at critical severity. MS duly responded after investigation, almost a day later (I wonder why it took so long..), just to say the files can't be recovered: with LRS, deleted data is retained for only about an hour, after which the garbage values are cleared and you are left to hue and cry.
So, long story short, and lessons learned the hard way:
- Do not mount a storage container holding important data from a POC Databricks notebook in order to write to it.
- Even if you plan to do so, do not use the "overwrite" save mode in the Spark write construct; use "append" (see the guard-rail sketch after this list).
- Enable the stronger data protection features: soft delete and versioning, take frequent snapshots, and use ZRS or GRS as a minimum. Enable change feed (a preview feature available in a few regions).
- Reach out to MS ASAP if you end up deleting or overwriting contents accidentally.
- I used to be sure that LRS kept 3 copies and could get the files back. In fact it only keeps 3 copies of the current version, all in the same region, so a delete or overwrite hits all of them.
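To make the write-mode lesson concrete, here is a small guard-rail helper along the lines of what I now use. It is my own sketch, nothing official: it refuses to write into a path that already contains files, so an accidental overwrite of an archive can't happen silently.

# Sketch of a defensive write helper for Databricks (dbutils and spark
# are provided by the runtime). The path below is illustrative.
def safe_write_csv(df, path):
    try:
        existing = dbutils.fs.ls(path)
    except Exception:
        existing = []  # path does not exist yet, safe to create
    if existing:
        raise ValueError(f"Refusing to write: {path} already contains files")
    # "append" never deletes what is already under the target path.
    df.write.mode("append").option("header", True).csv(path)

safe_write_csv(df, "/mnt/archive/exports/bulk_insert_staging/")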