Progress So Far

I just realized that I never wrote a retrospective on my data science work in 2018. With the pressure of graduation, I kind of got lost in a heavy workload. Still, I think I need to wrap up 2018 and look ahead to the coming year. I decided to become a data scientist at the end of 2017, after I found I had no interest in software engineering. I love analytics and finding the mystery behind numbers.

My background is in statistics, but to be honest, before going to graduate school, I had little knowledge of data science. Data science was not even a popular term in 2015 when I stepped into Carnegie Mellon. However, the world changes fast. Before 2015, I was an undergraduate student with linear regression and statistical inference in mind, without any knowledge of machine learning or data-focused Python skills. Going to Carnegie Mellon was a turning point in my life and helped me become a data science practitioner in all aspects, though I am still on the way.

Achievements in 2018

I officially started preparing for my data science career in 2018. At the beginning of the year, I learned deep learning on Coursera and finished four courses in the specialization. I also started learning relational database management with SQL in class. Later, in mid-March, I took a data science bootcamp in Python on Udemy. Meanwhile, I took a class in Machine Learning Foundation. Data Mining, taken at CMU, helped me practice my skills with the pandas and scikit-learn packages as well as Tableau. A summer internship from May to August gave me a chance to manage the entire data science pipeline, from data cleaning and visualization to modeling and communication, especially in cluster analysis.

In the new semester, I came to Pittsburgh, the main campus of Carnegie Mellon, which offers more advanced, high-quality courses. I took Introduction to Machine Learning, Data Science for Product Managers, A/B Testing, and Data Structure for Application Programmers. Introduction to Machine Learning required building classifiers from scratch. Some of the homework was really tough, but the course was worth taking because I felt I understood the algorithms better. Meanwhile, I took courses on DataCamp to review Python and R.

Overall, I accomplished:

  • Data-focused Python
  • SQL
  • A/B Testing
  • Machine Learning
  • Visualization using Tableau, Matplotlib, Seaborn, NetworkX
  • Intro-level deep learning
  • Cluster analysis
  • Review of basic statistics
  • Data Science Pipeline

So far in 2019

I started applying for data scientist roles at the beginning of September but did not get any offers. I finished some online assessments but never heard back. I started to believe that it is really hard for a new grad to land a job offer. From the beginning of 2019, I focused on preparing for data science interviews. During winter break, I reviewed some machine learning topics I was not familiar with before, like gradient boosting and some advanced techniques in scikit-learn. I also started learning business content to improve my business acumen. I read Lean Analytics, a great book that explains the important metrics in the business world. To make sure I am ready for data challenges, I purchased a data challenge collection to practice with in my free time.

This new semester, I took NoSQL Database, Intro to Deep Learning, Big Data and Large Scale Computing, and R for Data Science. I also worked as a teaching assistant for a machine learning course at Heinz College. I want to prepare myself for the really large datasets found in practice, so understanding distributed systems and cloud computing is essential. Intro to Deep Learning is a really tough course and worth the time. Moreover, I used AWS to train a model for the first time. NoSQL Database is also a good course that introduced me to the key-value, column-family, document, and graph models. I took R for Data Science because I want to refresh my R skills. I used R a lot back at Mizzou, but not really for data science, and I want to know how to use it for that as well.

Overall, I accomplished:

  • NoSQL Database
  • AWS deployment
  • Product content (a little)

To be honest, job seeking is time-consuming. Different positions have different requirements, which forces me to pause my current plan and prepare for each interview. Unfortunately, that has interrupted my learning process.

My goals from March to May are:

  • Know R and complete projects in R
  • Use Spark and Hadoop
  • Use AWS
  • Understand RNN Models and other topics in deep learning

As graduation approaches, I keep thinking about what kind of data scientist I want to be, that is, which industry I want to step into. In my heart, two fields attract me: health care and entertainment. This semester, I feel honored to join a capstone group with three other fellow students to work for Highmark Inc. Highmark Inc. is one of the largest healthcare providers across the U.S. I can really see the potential and promising future of the healthcare AI industry.

The second industry I am interested in is entertainment. I like the movie industry and celebrities. Entertainment is the main way people relax. Understanding audiences and pleasing them through analytics is really attractive.

My long-term career goal is to be a full-stack data scientist. Data scientist sounds like a great title, but it requires a lot of work. I know I am not perfect now and never will be. However, I am willing to keep improving myself. Data science is a lifelong pursuit. In the future, I wish to gain these skills:

  • ETL (more data engineering work) with practical SQL experience and other tools
  • Big data platforms (Airflow, Redshift)
  • Deep understanding of machine learning and deep learning
  • Production-level programming
  • Product sense with analytical skills to help business make better decisions
  • Some software and hardware knowledge

To reach these goals, I need to land a good project to work on after graduation. With this in mind, I will not pursue jobs only for the brand but for the actual projects I would work on. As a new grad, it is important for me to gain hands-on experience with real-world datasets.

The future is great!

Conclusion

At the end of 2018, I started reading The Count of Monte Cristo. It is a great novel and I could not put it down. I read it on New Year's Eve and finished it on the first morning of 2019. I was shocked and totally immersed. No words could describe what I gained and felt at that time.

“All human wisdom is contained in these two words - Wait and Hope”

I might not be a good data scientist now, but I will be. I believe in myself. I believe one day a company will find me and recognize my talent and skills. Meanwhile, I just need to make progress and gradient boost every day. Goodness, writing this reminds me of that great book. Highly recommended.

Keep up the good work!