Niteesh Kanungo

Chicago, IL, USA | (217) 778-3469 | niteeshkanungo@gmail.com

| Google Cloud Certified Professional Data Engineer with 7+ years of experience in Industry/Academia. | UIUC, JNU, VIT alum | 🇺🇸 🇮🇳 |


Experience

Senior Data Engineer

Building scalable data pipelines on hybrid AWS + GCP cloud infrastructure using Apache Airflow and dbt (data build tool), processing up to 10 TB of data daily.

Automating operational workloads to assess data completeness in the data lake.

Developing de-identification pipelines that remove PHI at scale using AWS Batch and Google Kubernetes Engine (GKE).

December 2019 - Present

Data Engineer

Machine learning/IoT implementation for one of the largest electric companies in the Midwest.

Building end-to-end data infrastructure for 3 live applications.

CI/CD deployment of data definition packages through Jenkins.

February 2019 - December 2019

Data Scientist

Developed deep learning (CNN) models (82% accuracy) using AWS SageMaker for automated image moderation, saving up to $92k-$100k a year by replacing manual moderation.

Developed a script using NLTK and a Flask API running on AWS to extract insights from thousands of reviews posted on client websites.

Refactored an R script for review analysis into Python so it could integrate with solutions already in place within the network.

Automated data retrieval from AWS S3 buckets with Python scripts using the Boto3 library.

Used pandas and Matplotlib for statistical analysis of Salesforce customer feedback data.

Used Docker images to keep dependency updates from breaking the workflow.

Used JIRA to assign and track stories and tasks.

Developed RESTful microservices using Flask and deployed them behind a single point of entry on AWS EC2 instances.

Automated most daily tasks with Python scripts.

May 2018 - August 2018

Data Analyst

Developed dashboards from data sources generated by 10,000 data points at the plant.

Estimated and analyzed production cost and improved efficiency with predictive modeling; calculated emissions against EPA norms.

Analyzed generated logs and used Python libraries to forecast the next occurrence of an event, with notifications.

Built a Monte Carlo simulation for predicting the behavior of data points and evaluating errors using pandas and Matplotlib.

Used OpenRefine and Python to clean the data and created a relational model of the dataset using SQL on a SQLite database. Generated prospective provenance information for the process using YesWorkflow and Datalog.

Built various graphs for business decision-making using Python's Matplotlib library.

Designed and developed various analytical reports from data sources by blending data onto a single worksheet.

Scraped web pages using modules such as urllib2, Beautiful Soup, and pandas.

Automated the existing scripts for performance calculations using NumPy.

Debugged and tested applications and fine-tuned their performance.

June 2017 - May 2018

Teaching Assistant (T.A.)

CS598 (Theory & Practice of Data Cleaning)

June 2017 - September 2017

Software Engineer

Improved the logistics operational efficiency of a major retail company by 7% through inventory management and analysis of GIS data.

Ran A/B tests to optimize churn rate and improved traffic to a client’s website by up to 20% through web analytics, SEO, and web portal optimizations.

Composed moderate to complex SQL queries to analyze large and complex data sets.

Wrote MapReduce code to convert unstructured data into semi-structured data and loaded it into Hive tables.

Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.

Played a key role in supporting and deploying cloud computing services, including IaaS, PaaS, and SaaS deployments.

Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC), using Agile methodology for application development.

Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.

Recommended appropriate testing techniques to the team and shared testing tasks.

Participated in the requirements gathering and analysis phase of the project, documenting business requirements through workshops and meetings with business users.

June 2015 - December 2016

Computational Biologist

Statistical analysis of microarray and next-generation sequencing data using computational approaches and open-source tools.

Developed scripts for automating tasks using Python and UNIX shell scripting.

Worked on Python scripts to parse JSON documents and load the data into a database.

Maintained technical documentation for resolved issues for future reference.

Worked on Python OpenStack APIs and used NumPy for numerical analysis.

Set up automated cron jobs to upload data into the database, generate graphs and bar charts, upload these charts to the wiki, and back up the database.

Ensured high-quality data collection and maintained the integrity of healthcare data.

Resolved complexity in the website's scripts arising from intricate logic and correlations.

Designed and developed a data management system using MySQL.

Wrote Python modules to extract/load asset data from the MySQL source database.

Used PyUnit, a Python unit testing framework, for all Python applications.

Successfully implemented the application in a Linux environment.

Carried out mathematical calculations using the Python library NumPy.

September 2014 - May 2015

Graduate Research Fellow

Preprocessed and analyzed X-ray diffraction data using HKL2000 and XDS; data were collected at source (JNUCAR) and synchrotron (Indus-1,2 and RRCAT) for drug design and protein structure prediction.

Structure determination by experimental phasing using the CCP4 and Phenix suites; visualization and refinement using Coot (Linux).

Responsible for handling the integration of the database system.

Developed rich user interface guidelines and standards throughout the development and maintenance of the website using CSS, HTML, and JavaScript.

Upgraded existing UI with HTML, CSS, jQuery and Bootstrap.

Involved in Design, Development, Deployment, Testing, and Implementation of the application.

Designed Interface using Bootstrap framework.

Coded and executed scripts in Python/Unix/VB.

Designed, developed, tested, deployed, and maintained the website.

Used UML Tools to develop Use Case diagrams, Class diagrams, Collaboration and Sequence Diagrams, State Diagrams and Data Modeling.

December 2013 - August 2014

Education

University of Illinois at Urbana-Champaign, IL, USA

Master's (M.S.) – Information Management/Data Science
(Data Science Track) – Machine Learning, Data Mining, Data Stats and Information, Data Visualization, Information Storage and Retrieval, Programming Analytics, Information Modelling and Business Analytics.
December 2018

VIT University, Vellore, TN, India

Master's (M.S.) – Bioinformatics and Biotechnology
August 2014

Christian Eminent College, Indore, MP, India

Post Graduate Diploma in Computer Application (PGDCA)
July 2012

Institute of Professional Studies, Indore, MP, India

Bachelor's (B.S.) - Computer Science and Biotechnology
August 2011

Skills

Programming Languages & Tools

Operating Systems: Unix, Linux (Ubuntu, Kali, CentOS), Windows, and macOS
Programming Languages: Python, JavaScript, Octave, Scala (intermediate), R (intermediate), Shell Scripting
Databases: MySQL, PostgreSQL, Hadoop HDFS
Analytical Tools: NumPy, pandas, SciPy, Matplotlib, Tableau
Cloud Technologies: Amazon Web Services (AWS), Google Cloud Platform (GCP), Databricks
Machine Learning Tools: TensorFlow, scikit-learn, MLlib, Keras, and Weka
Deployment Tools: Heroku, Jenkins
Data Cleaning: OpenRefine
Tools: Spyder, Visual Studio, Tableau Analytics
Defect Tracking: JIRA and VersionOne
Frameworks: Flask and Django
Version Control Systems: Git
IDEs/Development Tools: PyCharm, IntelliJ, Atom, Eclipse, Sublime Text, Jupyter Notebooks

Projects

  • Deep Learning: Automated rating system for image classification, running on Flask/AWS and trained on moderated image data.

  • Natural Language Processing: Review analytics platform for product ratings and reviews, in addition to CGC (consumer-generated content).

  • Programming Analytics/Predictive Modelling: Built a Monte Carlo simulation for predicting the prices of cryptocurrencies, with evaluation for the next 2 years, using pandas and Matplotlib.

  • Google Analytics: Developed a dashboard for the “Knowledge base” website of a major B2B company, helping the team take necessary actions for improvement.

  • Data Visualization: Developed data visualizations for the CU-MTD transportation system using the NumPy, pandas, Matplotlib, and bqplot libraries in Python. Created interactive modules by writing callback functions for component interaction.

  • IoT Project/Supply Chain Management: Developed a GPS tracker device from scratch using a Raspberry Pi 3 and a u-blox NEO-6M GPS module. Used AWS for storing the data and Python to implement the geo-fencing module.

  • Data Cleaning: Working in a team of 4, used OpenRefine and Python to clean the NYPL dataset and created a relational model of the dataset using SQL on a SQLite database. Generated prospective provenance information for the process using YesWorkflow and Datalog.

  • Hackathons – 1) Ranked in top 5 @Data Synchrony Financials, UIUC, Illinois. 2) Ranked in top 8 @St. Mary’s Health Care Datathon, South Bend, Indiana.


Publications


  • Suresh Gudala, Uzma Khan, Niteesh Kanungo, Srinivas Bandaru, Anuraj Nayarisseri, MS Parihar, Hema Prasad Mundluru, 2015. “Identification and Pharmacological Analysis of High Efficacy Small Molecule Inhibitors of EGF-EGFR Interactions in Clinical Treatment of Non-Small Cell Lung Carcinoma: A Computational Approach.” Asian Pacific Journal of Cancer Prevention (2015), Volume 16, Issue 4, 8191-8196. PubMed ID: 26745059.

  • Kanungo Niteesh, Sharma Ruby, Kashyap Vipin, Saxena K. Ajay., 2014. "Expression, Purification and Structural Analysis with Computational Aspects of Plasmodium Cysteine Protease Inhibitor - Falstatin." School of Life Science, Jawaharlal Nehru University, New Delhi- 110067.

  • Roy B. Upasana, Kanungo Niteesh, Sudhakaran R., 2013. “Identification of Hemocyanin gene in freshwater crab, Paratelaphusa hydrodomous.” 7th International Conference on “Science, Engineering and Technology”, VIT University (2013)

  • Kanungo Niteesh, Sivakumar A., 2013. “In-silico Analysis of Novel compounds against Mycobacterium tuberculosis.” 6th International Conference on “Science, Engineering and Technology”, VIT University (2013)

  • Pallavali Amrutha, Kanungo Niteesh, Nikita Fiji, Jabez Osborne., 2012. “Bioaccumulation and Bio-sorption of Lead by indigenous Bacterial Population of Tannery Sludge.” 5th International Conference on “Science, Engineering and Technology”, VIT University (2012)

Certifications & Awards


  • Computer Vision and OpenCV

  • Django Full Stack Web Developer

  • Google Data Engineer and Cloud Architect

  • Coursera - Machine Learning (Andrew Ng)

  • Apache Drill Essentials

  • Apache Spark and Scala Advance

  • Pyspark and Python for Big Data

  • Python for Data Science (DataCamp)

  • Python for Financial Analysis and Algorithmic Trading

  • Tensorflow for Deep Learning

  • SAS Programming and Statistics Certificate

  • Python Data Science and Machine Learning

  • PostgreSQL - SQL Bootcamp

  • Scala and Spark for Big Data and Machine Learning

  • Python Dashboard and Plotly


*For certification licenses, research publications, and validation, please see my LinkedIn account.


