What’s Data Science?: (Some of) The Basics

Data science is one of those topics that you may have heard mentioned umpteen times without ever getting a clear idea of what it is. In this article, I’ll give you a general overview of data science and what it looks like in healthcare. Many topics in this article could be full-fledged articles themselves, so please forgive me if I favor a shallow introduction to many topics over a deep explanation of a single one. These topics will be expanded on in other pieces for the Pharmacy Informatics Academy, so please give a shout-out for the topics you want to hear about most!

What Is Data Science (DS)?

Data science is the science of using data. Case closed. Unhelpful? Certainly. Defining DS precisely is tricky because of how broad a topic it is and how much it overlaps with similar fields you may have heard of. I’d be willing to bet that in addition to DS, you’re familiar with the terms artificial intelligence (AI) and machine learning (ML). The field of statistics is also frequently mentioned alongside these other buzzwords. In my opinion, for the majority of people, there’s little to gain from a deep understanding of the nuances between traditional statistics, DS, AI, and ML. The common thread running through these fields is the use of mathematical and statistical methods to gain insight from data.

[Image: data science basics]

Instead of a labored comparison between the 4 fields, I will share how they work together in a data science project. The image above highlights how the 4 fields intersect. One way to conceptualize the fields is to see how some of them serve as building blocks for the others. In the grand scheme of things, math and statistics are the building blocks of machine learning, and machine learning is a building block of data science and AI. This description is simplistic, but it’s an important one to understand before you dive deeper into any one of the fields. ML is also not the only building block of DS and AI. Another is computational resources such as cloud computing, which allow machine learning models to run faster by using cloud servers or specialized processing hardware (like GPUs).

These “steps” came together in a recent project my team completed: information gain (math/statistics) was used in a Random Forest algorithm (ML) to predict a patient’s likelihood of going to the hospital in the next 12 months, and a full tool was built from those predictions (DS/AI). As you can see, statistics was a foundational component of the ML model, which in turn powered the whole tool for our users, who were Care Managers.

I would like to mention that AI and DS are often talked about as one and the same. They are not technically the same, but they are the two most closely related of the 4 fields. AI seems to be the more popular term these days, but from my personal experience on a data science team at a health system, the work we do could appropriately be labeled as either DS or AI. I’m on the side of spending more time creating new tools and less time worrying about what we call a certain technology. I hope you’ll join me!
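To make the “information gain” piece of the project described above a little more concrete, here is a minimal Python sketch of how the measure is calculated for a single yes/no split. This is not my team’s actual model code. The patient data, the feature names (`prior_ed_visit`, `has_pcp`), and the helper functions are all made up for illustration. A tree-based algorithm such as a Random Forest repeats a calculation like this many times to decide which feature to split on next.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature_values):
    """Drop in entropy after splitting `labels` by the values of one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by the size of each group.
    remaining = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remaining


# Made-up toy data: 1 = hospitalized within 12 months, 0 = not hospitalized.
hospitalized = [1, 1, 1, 0, 0, 0, 0, 0]
prior_ed_visit = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
has_pcp = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]

gain_ed = information_gain(hospitalized, prior_ed_visit)
gain_pcp = information_gain(hospitalized, has_pcp)
print(f"prior_ed_visit: {gain_ed:.3f}")
print(f"has_pcp:        {gain_pcp:.3f}")
```

In this toy data, splitting on `prior_ed_visit` yields a larger information gain than splitting on `has_pcp`, so a tree would prefer it. In other words, it is the feature that tells us more about who ends up in the hospital.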

How is DS Used in Healthcare?

The use cases for DS in healthcare are continually growing. Few days go by that I don’t see a new advancement in DS and AI tools in our industry. You might be thinking: why are so many DS resources being pulled into our industry right now? Healthcare is one of a few industries that usually lag behind others in technological advancement. While other industries have been using DS for decades, healthcare on a broad scale is just catching up. If you feel like you need a new podcast recommendation (I promise this is related), Pivot is a podcast about technology, business, and society hosted by a legendary technology journalist (Kara Swisher) and an entrepreneur and marketing professor at NYU (Scott Galloway). Recently, the two have made comments about which industries will produce the world’s first trillionaire. A common theme among the industries they have focused on is that they have been late to adopt technology: specifically climate change, education, and (relevant to this conversation) healthcare. The size and breadth of the healthcare industry give it a unique opportunity to see a massive expansion of DS and AI companies that could grow into the likes of Google, Microsoft, and Amazon.

There are numerous resources available already that can give you an idea of specific use cases of DS in healthcare. For the sake of brevity, I would encourage you to check out some of the good work by Christy Cheung, PharmD & Whitley Yi, PharmD, BCPS, who also write for the Pharmacy Informatics Academy. Their recent “A Roadmap to Learning AI for Healthcare” series contains many ideas about how DS (and AI) is being used. One word of warning, though: I often see a disconnect between the potential use cases of DS and what is actually being done on the ground. Something that you’re pretty likely to have come across relating to DS and healthcare is the idea of automating or replacing the jobs of radiologists with AI. Great strides have been made in technologies like this, but for boots-on-the-ground data scientists who work at health systems, this use case is vastly different from the type of work we do day-to-day. There’s no doubt that making radiologists more effective in their job is important, but there are many general use cases for DS that are a higher priority for healthcare organizations. A few examples from my organization include predicting which patients will be readmitted to the hospital within 30 days of discharge, which patients won’t show up to their upcoming appointment, and which patients in our many ACO contracts are most likely to go to the ED or hospital in the next 12 months. These use cases are less glamorous than computer vision problems like radiology but are nevertheless impactful. The predictions allow healthcare professionals to prioritize which patients need more of their attention. In future articles, I would love to write more about the specific use cases I work on, so stay tuned!

What Does Working in Data Science Look Like?

Working in healthcare data science is an extremely rewarding job, especially as a pharmacist. Daily, I get to combine two things I love – programming and healthcare – all while making an impact on how well other HCPs can take care of their patients. The most common ways that data science professionals can work in healthcare are for startups (companies like Human API and Livongo – two of my favorites), legacy technology companies (IBM Watson Health and Microsoft’s healthcare ventures), and health systems (like my employer). These certainly are not the only healthcare companies that employ data science professionals. Data science is much like pharmacy in the way that the skills you have as a professional can translate to a multitude of different roles and companies.

“Data Scientist” is one of the most common job titles for a data science professional. Other job titles that signify a similar type of position include Data Science Analyst (my title) and Machine Learning Engineer. Companies will often have their own variations of these titles, which is something to be on the lookout for. For example, a position like mine would more often than not be labeled Junior Data Scientist elsewhere, a quirk of my employer that can make comparisons tricky. A title of Data Analyst traditionally sits not in the data science realm but in the data analytics and reporting realm. Data analytics tends to focus on analyzing data from the past to quantify progress (on clinical measures, perhaps), whereas data science uses the past to predict the future. The two are commonly mistaken for one another, but the work that both groups do is highly important.

The day-to-day work of data science professionals generally comes from the needs of the organization. Since data science is predictive in nature, leaders can identify an outcome (usually bad) that is occurring in the organization and task the team with identifying how to stop it from happening. If the bad outcome can be predicted and turned into a good one, you can see the power that a team can have in shifting the success of an organization. Health systems like mine that participate in Accountable Care Organizations (ACOs) are often great examples of places where data science teams thrive. In an ACO, a health system’s financial health depends on providing great care efficiently and cost-effectively. Encounters like hospitalizations and ED visits can add up and put a health system over its allocated reimbursements.

A number of the projects that we work on are related to these financial arrangements. One project involves predicting which Medicare Next-Gen (an ACO) patients are most likely to go to the hospital or ED in the next 12 months. The 5% of patients with the highest predicted risk are labeled as high-risk and are served up to care managers to enroll in an intensive care management program. Another project predicts hospital readmissions within 30 days of discharge for all patients in the hospital. As you may know, too many readmissions can negatively impact future reimbursement. The readmissions initiative (of which data science was a part) has significantly reduced readmissions at my organization.

While these examples may seem to focus only on making money for the health system, it’s important to understand that they also lead to better quality of life for patients when multiple teams come together for these projects. By giving better care to patients, who then spend less time in the hospital and less time acutely sick, the health system and the patient both win. It’s gratifying to be able to use my training as a healthcare professional to achieve these two beneficial aims.
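The “top 5%” step in the ACO project described above is simple enough to sketch in a few lines of Python. This is only an illustration and not our production code. The `flag_high_risk` function, the patient IDs, and the risk scores are all hypothetical, and I’m assuming the model has already produced one predicted probability per patient.

```python
import math


def flag_high_risk(risk_scores, top_percent=5):
    """Return the patient IDs in the top `top_percent` percent by predicted risk.

    `risk_scores` maps a patient ID to a predicted probability. At least one
    patient is flagged whenever the input is non-empty.
    """
    n_flagged = max(1, math.ceil(len(risk_scores) * top_percent / 100))
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return ranked[:n_flagged]


# Hypothetical model output for 40 patients: ID -> predicted probability.
risk_scores = {f"patient_{i:02d}": i / 40 for i in range(40)}

high_risk = flag_high_risk(risk_scores)
print(high_risk)
```

With 40 patients, 5% works out to the 2 highest-scoring patients. In practice the list would be handed to care managers through a tool rather than printed.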

What Skills are Needed in Data Science?

The skills needed by health data science professionals fall under a few different categories, including math-based knowledge, technical skills, and soft skills. In order to be a high-performing data science professional and a leader on your team, you will need skills in all 3 arenas.

Math and statistics form the foundation of machine learning and data science, as previously discussed. While data scientists can have a wide range of backgrounds in math and statistics, it’s important for a data science professional to have a basic understanding of the math and statistics underlying different types of machine learning models. These skills can be learned on a wide variety of platforms. Khan Academy is a well-known educational website that has classes on statistics and calculus, the main underlying components of machine learning models. Machine learning classes on platforms like DataCamp and Coursera also generally have lessons on the underlying math and statistics of each model as they teach you how to use the model in R and Python.

R, SQL, Python, and data visualization platforms like Tableau (the platform we use at my employer) are, in no particular order, the most important technical skills in my day-to-day work. I’d argue that the most underrated technical skill in a data science job is SQL, although it’s probably the least “buzz-wordy” skill of the bunch. SQL is used to get the data we work with out of databases in Epic (our EHR). SQL can be an easier language to pick up than R and Python, but it is vitally important to implement the correct logic in SQL to pull the correct data. Without careful attention to how you write SQL, you can badly degrade the quality of your tools and machine learning models. R and Python are the programming languages we use to take the data returned by SQL and build our ML models that predict an outcome.

After we have our models built, we use Tableau to create tools for HCPs to use in their normal workflow. It’s important to note that some data scientists in healthcare might use data visualization platforms less than others. In a role on a health system data science team, data viz platforms are vitally important. The tools that I make on a daily basis are used by frontline HCPs and system leaders and require a ton of planning and visualization work. If your tool (created in the data viz platform) is not intuitive, you can have little hope that it will meaningfully change how HCPs do their job. To help us make tools that are intuitive, we interview our future end users about what they need and what can be improved in their current processes.
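Here is a small, hedged sketch of the SQL-then-Python hand-off described above, using Python’s built-in `sqlite3` module as a stand-in for a real EHR database. Nothing here reflects Epic’s actual schema: the `encounters` table, its columns, and the sample rows are invented for illustration. I’m also assuming that a “readmission” means a later admission for the same patient that starts within 30 days after a discharge.

```python
import sqlite3

# In-memory database standing in for the EHR reporting database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (
        patient_id     INTEGER,
        admit_date     TEXT,  -- ISO dates, so text comparison sorts correctly
        discharge_date TEXT
    );
    INSERT INTO encounters VALUES
        (1, '2023-01-01', '2023-01-05'),
        (1, '2023-01-20', '2023-01-22'),
        (2, '2023-02-01', '2023-02-03'),
        (2, '2023-04-15', '2023-04-18'),
        (3, '2023-03-10', '2023-03-12');
""")

# For every discharge, flag whether the same patient was admitted again
# within 30 days. The strict ">" keeps a stay from matching itself, and it
# also means a same-day readmission would not be counted.
query = """
    SELECT idx.patient_id,
           idx.discharge_date,
           EXISTS (
               SELECT 1
               FROM encounters AS nxt
               WHERE nxt.patient_id = idx.patient_id
                 AND nxt.admit_date > idx.discharge_date
                 AND julianday(nxt.admit_date)
                     - julianday(idx.discharge_date) <= 30
           ) AS readmitted_30d
    FROM encounters AS idx
    ORDER BY idx.patient_id, idx.discharge_date
"""
rows = conn.execute(query).fetchall()

# These labeled rows are what would be passed on to model building.
for patient_id, discharge_date, readmitted_30d in rows:
    print(patient_id, discharge_date, readmitted_30d)
```

Small choices in the `WHERE` clause change the labels the model learns from. Forgetting the patient match, or using `<` instead of `<=`, are two examples. This is why careful SQL matters so much.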

Aside from the technical skills needed to work in data science, there are a number of soft skills that will make someone a more effective data science professional. A number of these are the same skills that make you a successful pharmacist. Just as all of the medication knowledge in the world is useless to a patient unless you can help the patient make sense of it, data science requires solid communication skills. The ability to communicate effectively with an end user about how the tool was made and what underlying data was used will get you far in the field. Interdisciplinary communication has proven indispensable in my work. My healthcare training has given me a unique trust and relationship with our HCP users, to whom I can relate more easily than if I had no prior healthcare knowledge. The critical thinking and problem-solving skills of pharmacy education and pharmacy practice are also transferable to data science. Much like you dig into issues that patients are having with their medication, you have to dig into data and assess how accurate data points are or what they truly represent.

How Can I Learn More?

There are more data science resources available in the world than any one person could possibly have time for. The sheer number of outlets and blogs creates bountiful opportunities for “analysis paralysis.” Therefore, I think it’s important to find a few sources that you trust and focus most of your learning on them. I will share a few of my favorite resources to get you started, and you can build on them as you get deeper into your journey with DS. Two blogs that may be helpful to start with are Towards Data Science and the Health Catalyst company blog. Towards Data Science is the premier blog for data science professionals and tends to focus on individual techniques or coding problems within the data science community. The Health Catalyst (disclosure: my organization uses this company’s software) company blog tends to focus on how health systems can create solutions for their most pressing data problems and also has case studies on how health systems have used data to make services more effective. Their blog highlights data science work as well as traditional data analytics work, both equally impactful and important to familiarize yourself with.

Websites that are tailored to data science education are excellent places to start learning the technical skills needed in data science. These websites include DataCamp (sponsor me!), Coursera, Udemy, Udacity, and edX. Each platform has its pros and cons, but all are excellent resources. My personal favorite is DataCamp because of the reasonable monthly price and the guided career and skill tracks.

The last way to learn more about data science is to connect with people who do data science and talk to them about their jobs! The data science professionals that I know (myself included) are excited to talk about the work that they do. Reach out to these people on LinkedIn to learn more about the projects that they work on and their thoughts about the future of the field. An easy way to connect with them is by searching LinkedIn for people with the job title Data Scientist. You can place this in the search bar along with the name of your local health system(s) or the name(s) of companies you would like to work for. Find individuals in the search results and send them a connection request (always send a message along with the request, since they are much more likely to respond). I believe you’ll find that most people enjoy sharing their work.

Closing

Thank you for taking the time to read my introduction to data science! DS is an incredibly broad field, and due to its nature, I’m not able to capture it all in a single article. I hope you’ll reach out to me with any questions you have or ideas for data science articles that you’d like to see in the future. Until next time!