This article was authored by General Assembly’s Divya Venkatraman and originally published by GA here.
Data is everywhere
The amount of data captured and recorded in 2020 is approximately 50 zettabytes, i.e., 50 followed by 21 zeros(!) and it’s constantly growing. Other than data captured from social media platforms, as individuals, we are constantly using devices that measure our health by tracking the number of footsteps, heart rate, sleep, and other physiological signals more regularly. Data analytics has helped greatly to discover patterns in our day-to-day activities and gently nudge us towards better health via everyday exercise and improving our quality of sleep. Just like how we track our health, internet sensors are used on everyday devices such as refrigerators, washing machines, internet routers, lights etc., to not only operate them remotely but also to monitor their functional health and provide analytics that help with troubleshooting in case of failure.
Organizations are capturing data to better understand their products and help their consumers. Industrial plants today are installed with a variety of sensors (accelerometers, thermistors, pressure gauges) that constantly monitor high-valued equipment in order to track their performance and better predict downtime. As internet users, we’ve experienced the convenience that results from capturing our browsing data — better search results on search engines, personalized recommendation on ecommerce websites, structured and organized inboxes, etc. Each of these features is an outcome of data science techniques of information retrieval and machine learning applied on big data.
On the enterprise side, digital transformation such as digital payments and ubiquitous use of software and apps has propelled data generation. With a smart computer in every palm and a plethora of sensors both on commercial and industrial scale, the amount of data generated and captured will continue to explode. This constant generation of data drives new and innovative possibilities for organizations and their consumers through approaches and toolsets rooted in data science.
Data science drives new possibilities
Data science is the study of data aimed towards making informed decisions.
On the one hand, monitoring health data and data analytics is guiding individuals to make better decisions towards their health goals. On the other hand, aggregation of health data at the community level in a convenient and accessible way sets the stage to conduct interdisciplinary research towards answering questions like, Does the amount of physical activity relate to our heart health? Can changes in heart rate over a period of time help predict heart disorders? Is weight loss connected with the quality of our sleep? In the past it was unimaginable to support such research with significant data points. However, today, a decade worth of such big data enables us to drive research on the parameters connected to different aspects of our health. It’s significant that this research is not restricted to laboratories and academic institutions but are instead driven by collaborative efforts between industry and academia.
Due to the infusion of such data, many traditional industries like insurance are getting disrupted. Previously, insurance premiums were calculated based on age and a single medical test that was performed at sign up. Now, there are efforts taken by life insurance providers to lower premiums through regular monitoring of their customers fitness trackers. With access to this big data, insurance providers are trying to understand and quantify health risks. The research efforts described above would drive quantifiable ways to measure overall health risk by fusing a variety of health metrics. All these new products will heavily rely on the use of advanced analytics that uses artificial intelligence and machine learning (AI/ML) techniques to develop models that predict personalized premiums. In order to drive these new possibilities for insights, the application of data science toolsets approaches goes through a rigorous process.
Data science is an interdisciplinary process
A data science process typically starts up with a business problem. Data required to solve the problem can come from multiple sources. Social media data such as text and images from social media platforms like Facebook and Instagram would be compartmentalized from enterprise data such as customer info and their transactions. However, depending on the problem to be solved, all relevant data are collected and can be fused across social media and enterprise domains to gain unique insights to solve the business problem.
A data science generalist works on different data formats and systematically analyses the data to extract insights from it. Data science can be subdivided into several specialized areas based on data format used to extract insights: (1) computer vision, i.e., field of study of image data, (2) natural language processing, i.e. analysis of textual data, (3) time-series processing, i.e. analysis of data varying in time such as stock market, sensor data, etc.
A data scientist specialist is capable of applying advanced machine learning techniques, to convert unstructured data to structured format by extracting the relevant attributes of an entity from unstructured data with great accuracy. No other area has seen the impact of the data science generalist or the specialist more than in the product development lifecycle, across a gamut of organizations of all sizes.
Data scientist as a unifier in the product development lifecycle
The role of a data scientist spans across multiple stages of the product development process. Typically, a product development goes through the stages of envisioning, choosing different features to build and finally, designing those specific features. A data scientist is a unifier across all of these stages in the modern world. Even during the envisioning part, data analysis on the marketing data enables the decision on what features need to be built in terms of the need from the maximal number of customers and from a competitive standpoint.
Once the feature list has been decided, the next step is designing those specific features. Typically, such design activities have been in the realm of designers and to a lesser extent developers. Traditionally, the designer designs features and then makes a judgment call based on user experience studies with a small sample size. However, what might be a good design for 10 users might not be a good design for 90 other users. In such situations, the designers’ judgment cannot necessarily address the entire user base.
Organizations run different experiments to gather systematic data to audit the progress of the product. With data science toolsets, deriving the ground truth no longer needs to be constrained by such traditional design approaches. Based on the nature of the feature design, data from A/B experiment testing can provide input to both developers and designers alike on design options and product decisions that are optimal for the user base.
Data science is the future
The spectrum of the data scientist’s role and contribution is vast. On one end, the data scientist can drive new possibilities through data-backed insights in areas like healthcare, suggest personalization options for users based on their needs, etc. On the other end, the data scientist can drive a cost-based discussion on which feature to design or what optimal option to choose. Data scientists are now the voices of customers throughout the product development process, and the unifiers through an interdisciplinary approach.
Just like making a presentation, editing documents and composing emails have become ubiquitous skills today, data science skills will pervasively be used across different functional roles to make business decisions. With the explosion in the amount of data, the demand for data scientists, data analysts, and big data engineers in the job market will only rise. Organizations are constantly looking for data professionals who can convert data into insights to make better decisions. A career in data science is simulating — the dynamic and ever-evolving nature of the field tied closely with current research keeps one young!