| Google Cloud Certified Professional Data Engineer with 7+ years of experience across industry and academia | UIUC, JNU, VIT alum | 🇺🇸 🇮🇳 |
Building scalable data pipelines on hybrid AWS + GCP cloud infrastructure using Apache Airflow and dbt (data build tool), processing up to 10 TB of data daily.
Automating operational workloads to assess data completeness in the data lake.
Developing de-identification pipelines to remove PHI at scale using AWS Batch and Google Kubernetes Engine (GKE).
Machine learning/IoT implementation for one of the largest electric utilities in the Midwest.
Building end-to-end data infrastructure for three live applications.
CI/CD deployment of data-definition packages through Jenkins.
Developed deep learning (CNN) models (82% accuracy) using AWS SageMaker for auto-moderation of images, replacing manual moderation and saving an estimated $92k–$100k per year.
Developed an NLTK-based Flask API running on AWS to extract insights from thousands of reviews posted on client websites.
Ported an R script for review analysis to Python to fit solutions already integrated within the network.
Automated Python scripting using the Boto3 library to retrieve data from AWS S3 buckets.
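A minimal sketch of what such a Boto3 retrieval script could look like; the bucket name, prefix, and `fetch_csv_keys` helper are hypothetical, not the actual production script:

```python
def fetch_csv_keys(s3_client, bucket, prefix=""):
    """List .csv object keys under a prefix (hypothetical helper)."""
    keys = []
    # Paginate so buckets with more than 1000 objects are handled correctly.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".csv"):
                keys.append(obj["Key"])
    return keys

if __name__ == "__main__":
    # boto3 is imported here so the helper above stays testable without AWS.
    import boto3
    s3 = boto3.client("s3")
    print(fetch_csv_keys(s3, "my-data-bucket", "raw/"))
```

Keeping the client as a parameter makes the listing logic testable without live AWS credentials.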
Used Pandas and Matplotlib for statistical analysis of Salesforce customer-feedback data.
Used Docker images to keep workflows from breaking due to dependency updates.
Used JIRA to assign and track stories and tasks.
Developed RESTful microservices using Flask and deployed a single point of entry on AWS using EC2 instances.
Automated most daily tasks using Python scripting.
Developing dashboards from data sources fed by 10,000 data points at the plant.
Estimating and analyzing production costs and improving efficiency with predictive modeling; also calculating emissions per EPA norms.
Analyzed generated logs and used various Python libraries to predict/forecast the next occurrence of an event, with notifications.
Built a Monte Carlo simulation to predict the behavior of data points and evaluate errors using Pandas and Matplotlib.
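A simplified sketch of the Monte Carlo pattern described above; the normal sampling process and `monte_carlo_mean` helper are illustrative assumptions, and plotting with Matplotlib is omitted:

```python
import numpy as np
import pandas as pd

def monte_carlo_mean(sample_fn, n_trials=1000, n_points=500, seed=0):
    """Run repeated sampling trials and tabulate estimates and their errors."""
    rng = np.random.default_rng(seed)
    estimates = [sample_fn(rng, n_points).mean() for _ in range(n_trials)]
    df = pd.DataFrame({"estimate": estimates})
    # Error of each trial relative to the overall Monte Carlo mean.
    df["error"] = df["estimate"] - df["estimate"].mean()
    return df

if __name__ == "__main__":
    # Hypothetical data points drawn from a normal process (mean 10, sd 2).
    df = monte_carlo_mean(lambda rng, n: rng.normal(loc=10.0, scale=2.0, size=n))
    print(df["estimate"].describe())
```

The resulting DataFrame can be passed straight to `df["error"].hist()` for the Matplotlib error plot.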
Used OpenRefine and Python to clean the data, and created a relational model of the dataset using SQL on a SQLite database. Generated prospective provenance information for the process using YesWorkflow and Datalog.
Built various graphs for business decision-making using the Python Matplotlib library.
Designed and developed various analytical reports from data sources by blending data onto a single worksheet.
Worked on web-scraping webpages using modules such as urllib2, Beautiful Soup, and pandas.
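As a hedged sketch of the scraping step, here is a link extractor built on the standard-library `html.parser` (standing in for Beautiful Soup so the example is self-contained); in Python 3, `urllib.request` replaces the `urllib2` module named above:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

if __name__ == "__main__":
    # Fetching a real page would use urllib.request (urllib2 in Python 2):
    # import urllib.request
    # html = urllib.request.urlopen("https://example.com").read().decode()
    print(extract_links('<a href="/about">About</a>'))
```

The extracted links (or table cells, by the same pattern) can then be loaded into pandas for analysis.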
Automated the existing scripts for performance calculations using NumPy.
Debugging and testing applications and fine-tuning performance.
CS598 (Theory & Practice of Data Cleaning)
Improved logistics operational efficiency of a major retail company by 7% through inventory management and analysis of GIS data.
Ran A/B tests to optimize churn rate, and improved client website traffic by up to 20% through web analytics, SEO, and web-portal optimizations.
Composed moderate-to-complex SQL queries to analyze large and complex data sets.
Wrote MapReduce code to convert unstructured data into semi-structured data and loaded it into Hive tables.
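A minimal sketch of the MapReduce shape in Python, in the Hadoop Streaming style; the word-count job is illustrative only, not the actual structuring logic:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit (key, 1) pairs from raw text lines (Hadoop Streaming map step)."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum counts per key; Hadoop delivers reducer input grouped by key."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # As a streaming job, mapper and reducer would run as separate processes
    # reading stdin; chaining them here mimics the full pipeline locally.
    for key, count in reducer(mapper(sys.stdin)):
        print(f"{key}\t{count}")
```

Writing the two steps as plain generator functions keeps them unit-testable outside the cluster.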
Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
Played a key role in supporting and deploying cloud computing services, including IaaS, PaaS, and SaaS deployments.
Worked with application teams to install operating-system and Hadoop updates, patches, and version upgrades as required.
Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC); used Agile methodology for developing the application.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Made recommendations to the team on appropriate testing techniques and shared testing tasks.
Participated in the requirement-gathering and analysis phase of the project, documenting business requirements by conducting workshops/meetings with various business users.
Statistical analysis of microarray and next-generation sequencing data using computational approaches and open-source tools.
Developed scripts for automating tasks using Python and UNIX shell scripting.
Worked on Python scripts to parse JSON documents and load the data into a database.
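A self-contained sketch of that parse-and-load pattern using the standard-library `json` and `sqlite3` modules (the `assets` table, field names, and helper are hypothetical; the production target was a different database):

```python
import json
import sqlite3

def load_json_records(db_path, records_json):
    """Parse a JSON array of documents and load selected fields into SQLite."""
    records = json.loads(records_json)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Named placeholders pull matching keys straight from each parsed dict.
    conn.executemany(
        "INSERT INTO assets (id, name) VALUES (:id, :name)", records
    )
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = load_json_records(":memory:", '[{"id": 1, "name": "pump"}]')
    print(conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0])
```

Using parameterized `executemany` keeps the load fast and avoids SQL-injection issues from untrusted JSON values.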
Maintained technical documentation of resolved issues for future reference.
Worked on Python OpenStack APIs and used NumPy for numerical analysis.
Set up automated cron jobs to upload data into the database, generate graphs and bar charts, upload the charts to the wiki, and back up the database.
Ensured high-quality data collection and maintained the integrity of the healthcare data.
Resolved complexity in the website's scripts arising from intricate logic and correlations.
Designed and developed a data-management system using MySQL.
Wrote Python modules to extract and load asset data from the MySQL source database.
Used PyUnit, the Python unit-testing framework, for all Python applications.
Successfully implemented the application in a Linux environment.
Carried out various mathematical operations for calculations using the Python library NumPy.
Preprocessed and analyzed X-ray diffraction data using HKL2000 and XDS, collected at the home source (JNUCAR) and at synchrotrons (Indus-1/2, RRCAT), for drug design and protein-structure prediction.
Determined structures by experimental phasing using the CCP4 and Phenix suites, with visualization and refinement in Coot (Linux).
Responsible for handling the integration of the database system.
Developed rich user-interface guidelines and standards throughout the development and maintenance of the website using CSS, HTML, and JavaScript.
Upgraded existing UI with HTML, CSS, jQuery and Bootstrap.
Involved in the design, development, deployment, testing, and implementation of the application.
Designed Interface using Bootstrap framework.
Coding and execution of scripts in Python/Unix/VB.
Design, develop, test, deploy and maintain the website.
Used UML Tools to develop Use Case diagrams, Class diagrams, Collaboration and Sequence Diagrams, State Diagrams and Data Modeling.
Operating Systems | Unix, Linux (Ubuntu, Kali, CentOS), Windows, and macOS |
Programming Languages | Python, JavaScript, Octave, Scala (intermediate), R (intermediate), Shell Scripting |
Databases | MySQL, PostgreSQL, Hadoop HDFS |
Analytical Tools | NumPy, Pandas, SciPy, Matplotlib, Tableau |
Cloud Technologies | Amazon Web Services (AWS), Google Cloud Platform (GCP), Databricks |
Machine Learning Tools | TensorFlow, scikit-learn, MLlib, Keras, and Weka |
Deployment Tools | Heroku, Jenkins |
Data Cleaning | OpenRefine |
Tools | Spyder, Visual Studio, Tableau Analytics |
Defect Tracking | JIRA and VersionOne |
Frameworks | Flask and Django |
Version Control Systems | Git |
IDEs/Development Tools | PyCharm, IntelliJ, Atom, Eclipse, Sublime Text, Jupyter Notebooks |