Wednesday, April 16, 2014

How to use pycurl to show a progress bar and percentage in Python

Today I was tasked with writing a Python script that uses pycurl to upload a 30GB binary file to a cloud storage portal. The problem is that, used this way, curl doesn't provide useful information such as progress and percentage completed. That information is very useful if you need to re-upload parts of the artifact. Since our cloud storage supports partitioned uploads, it becomes all the more important to upload in parts and report the percentage uploaded.

I used the pycurl documentation to figure out most of the options I needed to set for the upload. However, the part I was most pleased with was the progress bar. I hooked it up to pycurl using the callback mechanism that pycurl provides.

Disclaimer: This code has been tested on a Red Hat Linux v6 machine.
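To make the callback mechanism concrete before the full script: pycurl invokes the function registered with PROGRESSFUNCTION repeatedly during a transfer, passing four byte counts (download total, downloaded so far, upload total, uploaded so far). The totals may be zero while the size is not yet known, which is why the script guards with `if total_to_upload:`. The sketch below only imitates that calling pattern, with no network access. `fake_transfer` and `record` are hypothetical names made up for this illustration and are not part of pycurl; the real call schedule is decided by libcurl.

```python
# Illustration only: a stand-in for the way a transfer library drives a
# progress callback. Nothing here is part of pycurl.

def fake_transfer(total_bytes, chunk_size, callback):
    """Pretend to upload total_bytes, reporting to callback after each chunk."""
    # Mimic a call made before any size is known (all counts zero).
    callback(0, 0, 0, 0)
    sent = 0
    while sent < total_bytes:
        sent = min(sent + chunk_size, total_bytes)
        # Same argument order as the progress() function in the script:
        # download total, downloaded, upload total, uploaded.
        callback(0, 0, total_bytes, sent)
    return sent

calls = []

def record(total_to_download, total_downloaded, total_to_upload, total_uploaded):
    # Skip calls where the upload size is still unknown to avoid dividing by zero.
    if total_to_upload:
        calls.append(100.0 * total_uploaded / total_to_upload)

fake_transfer(1000, 250, record)
print(calls)  # [25.0, 50.0, 75.0, 100.0]
```

In the real script the library plays the role of `fake_transfer`, so the only code you write is the callback itself.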
import pycurl
import os, sys

# pretty print progress and percentage completed
# pycurl calls this repeatedly during the transfer with four byte counts
def progress(total_to_download, total_downloaded, total_to_upload, total_uploaded):
  if total_to_upload:                                               # totals are 0 until the transfer size is known
    fraction_completed = float(total_uploaded)/total_to_upload      # fraction of the file uploaded so far (0.0 to 1.0)
    rate = round(fraction_completed * 100, 2)                       # convert the completed fraction to a percentage
    completed = "#" * int(rate)                                     # one '#' per whole percent completed
    spaces = " " * (100 - len(completed))                           # pad the rest of the 100-character bar
    sys.stdout.write('\r[%s%s] %s%%' %(completed, spaces, rate))    # the pretty progress [####     ] 34%, redrawn in place
    sys.stdout.flush()


def upload_to_cloud(url, filename, is_proxy=False):
  if not os.path.exists(filename):
    raise Exception('did not find file: %s' % filename)

  # initialize pycurl
  c = pycurl.Curl()
  c.setopt(pycurl.UPLOAD, 1)
  if is_proxy:
    c.setopt(pycurl.PROXY, 'XXX')
    c.setopt(pycurl.PROXYPORT, 80)
  
  #For authenticated cloud store
  c.setopt(pycurl.USERPWD, 'XXXX' + ':' + 'XXXX')
  c.setopt(pycurl.READFUNCTION, open(filename, 'rb').read)
  c.setopt(pycurl.VERBOSE, 0)
  c.setopt(pycurl.URL, url)
  c.setopt(pycurl.NOPROGRESS, 0)
  c.setopt(pycurl.PROGRESSFUNCTION, progress)

  #Set size of the file to be uploaded
  filesize = os.path.getsize(filename)      # os.path.getsize avoids opening the file and counting bytes by hand, which is what I did initially
  c.setopt(pycurl.INFILESIZE, filesize)
  
  # Start transfer
  print('Uploading file %s to url %s' % (filename, url))
  c.perform()         # this kicks off the pycurl module with all options set.
  c.close()


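if __name__=='__main__':
  # sys.argv[0] is the script name, so both URL and FILE_PATH require at least 3 entries
  if len(sys.argv) < 3:
    print('Usage: python upload_to_cloud URL FILE_PATH')
    sys.exit(1)

  upload_to_cloud(sys.argv[1], sys.argv[2])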
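
One optional refinement, sketched here under my own assumptions rather than taken from the script above: the bar-drawing logic can be pulled out of the callback into a pure function that returns a string, which makes it easy to check without starting a transfer. `render_bar` and its `width` parameter are hypothetical names introduced for this example.

```python
import sys

def render_bar(uploaded, total, width=50):
    """Return a bar such as '[#####     ] 50.0%' for the given byte counts."""
    if total <= 0:
        # Size not known yet: show an empty bar rather than dividing by zero.
        return '[%s] 0.0%%' % (' ' * width)
    fraction = min(float(uploaded) / total, 1.0)   # clamp in case of overshoot
    filled = int(fraction * width)                 # number of '#' characters
    percent = round(fraction * 100, 2)
    return '[%s%s] %s%%' % ('#' * filled, ' ' * (width - filled), percent)

def progress(total_to_download, total_downloaded, total_to_upload, total_uploaded):
    # Same four-argument shape the script registers with PROGRESSFUNCTION.
    # '\r' returns to the start of the line so the bar is redrawn in place.
    sys.stdout.write('\r' + render_bar(total_uploaded, total_to_upload))
    sys.stdout.flush()

print(render_bar(5, 10, width=10))  # [#####     ] 50.0%
```

A width smaller than 100 keeps the bar on a single line in a standard 80-column terminal, which matters because the `'\r'` redraw only works when the bar does not wrap.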