Automatically backing up your files on a web server and uploading them to Google Drive using a Python script
Creating an archive
To archive our directory, we will use the bz2 algorithm, which gives us a good compression ratio. Python has a module called tarfile that helps us archive files and directories.
First, we need to import it into our script.
import tarfile
Let’s now create a function that accepts a directory path as an argument and returns the name of the backup archive it creates.
def archive(dir):
    print("Archiving directory %s." % dir)
    now = datetime.datetime.now().isoformat().replace(':', '_').split(".")[0]
    fileName = "backup_" + now + ".tar.bz2"
    with tarfile.open(fileName, "w:bz2") as tar:
        tar.add(dir)
    print("Directory successfully archived. Archive name: %s." % fileName)
    return fileName
Since the backup file will have its creation date and time in its name, we also need to import the datetime module.
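That import is simply:

import datetime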
Here, we store the current date and time in a variable and concatenate it with “backup_” to get the name of the archive file. Then, using Python’s with statement, we open a tar file with bz2 compression (hence the .tar.bz2 extension) and add the directory to it. Once the archiving is successful, the function returns the name of the file.
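For example, a call might look like the following sketch; the directory path is only a placeholder for whatever you want to back up, and it assumes the archive() function defined above.

# Assumes the archive() function above; the path is a placeholder.
backupFile = archive("/var/www/html")
# backupFile now follows the pattern backup_<date>T<time>.tar.bz2
print(backupFile)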
Now, we have our backup file.
Creating a Google Drive Service
We need to create a Google Drive service to upload our file to Google Drive. So, let’s create a function that accepts the path to the service account token file we downloaded from the Google Cloud Console as an argument and returns the created service object.
First, you need to install the Google API Python Client using pip.
$ pip install --upgrade google-api-python-client
Then, import the following modules into the script.
from google.oauth2 import service_account
from googleapiclient.discovery import build
def createDriveService(token):
    SCOPES = ['https://www.googleapis.com/auth/drive']
    SERVICE_ACCOUNT_FILE = token
    credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)
    return build('drive', 'v3', credentials=credentials)
Here, we specify the scope of our service as drive so that we can access our Google Drive. Then we create a credentials object from the token file and pass it into the build function to get our service object. We can now access our Google Drive programmatically.
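As a quick sketch of how this might be used; the key-file name below is only a placeholder for the path of your own service account JSON file.

# Assumes the createDriveService() function above; the file name is a placeholder.
service = createDriveService("service-account-key.json")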
Removing old backups
As you may have already guessed, we are going to create another function, which will accept the service object as an argument.
We will search our Drive for archives whose names begin with “backup”. We will then store the IDs of these files as keys and their names as values in a dictionary. If there is more than one file, we will sort the list of names in ascending order. Since every file name contains its creation date and time, the last member of the sorted list will be the latest backup. We will then delete all the files except the latest one.
def clearPastBackups(service):
    page = None
    filesObj = {}
    # Collect every matching file across all result pages: id -> name
    while True:
        files = service.files().list(
            q="mimeType='application/gzip' and name contains 'backup'",
            pageToken=page,
            fields="nextPageToken, files(id,name)").execute()
        for file in files.get('files', []):
            filesObj[file.get('id')] = file.get('name')
        page = files.get('nextPageToken', None)
        if page is None:
            break
    if len(filesObj) >= 2:
        print("Two or more previous backups found.")
        # The names embed the creation timestamp, so the last name in
        # sorted order is the most recent backup.
        latest = sorted(filesObj.values())[-1]
        for name in sorted(filesObj.values()):
            print(name)
        print("Backup to be kept: %s." % latest)
        print("Deleting all but the latest backup...")
        for fileId in filesObj:
            if filesObj[fileId] != latest:
                service.files().delete(fileId=fileId).execute()
                print("Backup named %s deleted." % filesObj[fileId])
We use service.files().list() to search for files in our Drive. The mimeType application/gzip specifies that we are searching for archive files. If there are many matching files, Google might split the list of files into pages and return only the first page. The while loop uses the page token to iterate over all the pages, so that at the end of the loop we have all the matching files.
The rest of the code, I believe, is self-explanatory.
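To give a rough idea of how the pieces shown so far might fit together, here is a minimal sketch; the directory and key-file paths are placeholders, and the upload step from the rest of the article is omitted.

# Minimal sketch assuming the archive(), createDriveService() and
# clearPastBackups() functions above; the paths are placeholders.
if __name__ == "__main__":
    backupFile = archive("/var/www/html")
    service = createDriveService("service-account-key.json")
    clearPastBackups(service)
    # ...upload backupFile to Google Drive here (covered later in the article)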