Extracting Data from the National Inpatient Sample


Patricia Cerrito (University of Louisville, USA) and John Cerrito (Kroger Pharmacy, USA)
DOI: 10.4018/978-1-61520-905-7.ch005


In the other type of health care database that we discuss in this chapter, there are multiple columns for each patient observation. This makes it more difficult both to find the most frequently occurring codes and to find patients with specific codes for the purpose of extraction. For this reason, many studies focus only on the primary diagnosis or procedure. We will provide the programming necessary to find the most frequent codes and to find the patients who have a specific condition. Another aspect of preprocessing that we will explore in this chapter, using the National Inpatient Sample, is propensity scoring. When it is not possible to perform a randomized, controlled trial, an attempt is made to emulate such a trial by comparing two observational subgroups. The two groups are matched based upon demographic factors and related patient conditions. It is possible to define a level of patient severity and then to match patients on that severity level as part of the propensity score.
Chapter Preview


The National Inpatient Sample (NIS) is frequently used to examine health outcomes. While it is very complete concerning inpatient events, it is limited in that there is no longitudinal follow-up. For this reason, the NIS is used primarily to examine national trends and patterns (Berthelsen, 2000; Charytan, Kuntz, Charytan, & Kuntz, 2007; Kabir, et al., 2005; C. G. Patil, et al., 2007; Poleshuck & Poleshuck, 2005). The NIS also contains information identified by geographic region, by hospital, and by hospital type. It contains several patient severity scores that can be used to create a propensity score for matched samples. In addition, it contains detailed information concerning the patient's condition and the procedures performed while in the hospital. There are up to fifteen columns available to list ICD9 diagnosis codes and another fifteen columns to list ICD9 procedure codes. These columns can be used to define a propensity score. They can also be used to extract subsamples of patients with specific conditions or procedures.
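Because a code of interest can appear in any of the fifteen columns, both tasks require scanning every column of every record. The following is a minimal Python sketch of that idea, not the programming used in the chapter. The column names `DX1` through `DX15`, the `ID` field, the helper names, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

# Assumed column names: DX1..DX15 hold up to fifteen ICD9 diagnosis codes.
DX_COLUMNS = [f"DX{i}" for i in range(1, 16)]


def code_frequencies(records, columns=DX_COLUMNS):
    """Count how many records list each code in any of the columns."""
    counts = Counter()
    for record in records:
        # A set ensures a code repeated within one record is counted once.
        codes = {record.get(col) for col in columns}
        codes.discard(None)
        codes.discard("")
        counts.update(codes)
    return counts


def records_with_code(records, prefix, columns=DX_COLUMNS):
    """Return records in which any column holds a code starting with prefix."""
    return [
        r for r in records
        if any((r.get(col) or "").startswith(prefix) for col in columns)
    ]


# Toy records invented for illustration; these are not NIS data.
sample = [
    {"ID": 1, "DX1": "25000", "DX2": "4019"},
    {"ID": 2, "DX1": "4019", "DX2": "25000", "DX3": "4280"},
    {"ID": 3, "DX1": "486"},
]

top_codes = code_frequencies(sample).most_common(2)
diabetes_ids = [r["ID"] for r in records_with_code(sample, "250")]
```

Matching on a code prefix (here `250`) rather than a full code is one possible design choice; it picks up every subcode of a condition regardless of which column it was recorded in. The same functions could be applied to the procedure columns by passing a different `columns` list.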

Propensity scores are used for many reasons. The most common is to define patient risk scores for death, for disease progression, or for heart disease risk (Ankle Brachial Index, et al., 2008; Beale, et al., 2008; Eichler, et al.; Karim, et al., 2008; Kulkarni, et al., 2007; May, et al., 2009; Pulitano, et al., 2007; Qin, et al., 2008; Tleyjeh, et al., 2008). However, they are also used to perform case-matching (Austin & Austin, 2007, 2008b; Falagas, Mourtzoukou, Ntziora, Peppas, & Rafailidis, 2008). Unfortunately, many studies do not provide information as to how the propensity score is defined. Moreover, there are often errors in the definition of such a propensity score (Austin, 2008; Blackstone, 2002; Rosenbaum & Rubin, 1984). These errors can give results that are highly misleading (Austin, Grootendorst, & Anderson, 2006; Austin, Grootendorst, Normand, & Anderson, 2006). The National Inpatient Sample contains several severity scores that can be used for comparison purposes.

One of the biggest problems is that any propensity score should match the severity of the patient’s condition in addition to the basic demographic factors. However, such matching depends upon how the severity level is defined, which often depends upon a separate propensity (or severity) score. As we will explain in detail in Chapter 11, this is no trivial matter (Patricia B Cerrito, 2009b).
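To make the matching step concrete, the sketch below pairs patients from two observational subgroups on a propensity score that is assumed to have been estimated already from demographic and severity variables. Greedy one-to-one nearest-neighbor matching with a caliper is only one of several possible approaches and is not necessarily the method used in this book. The function name, the caliper of 0.05, and the toy scores are assumptions for illustration.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Pair each treated patient with the nearest unused control.

    treated and controls map a patient id to a propensity score in [0, 1].
    A treated patient whose nearest control is farther away than the
    caliper is left unmatched.
    """
    available = dict(controls)  # copy, so the caller's dict is untouched
    pairs = []
    # Process treated patients in ascending score order for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda k: abs(available[k] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Toy scores invented for illustration; these are not NIS data.
treated_scores = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
control_scores = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.10}
matched = greedy_match(treated_scores, control_scores)
```

In this toy example, `T3` has no control within the caliper and is dropped. That illustrates why the definition of the score matters: patients whose severity is poorly represented in the comparison group fall out of the matched sample.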
