Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2020 will feature over 200 paper presentations, posters, and hands-on workshops. Papers are organized into 15 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 06-Jun-2020.
Artificial Intelligence (Machine Learning)
|Paper No.||Author(s)||Paper Title (click for abstract)|
|AI-025||Karen Walker||Gradient Descent: Using SAS for Gradient Boosting|
|AI-058||Surabhi Dutta||Pattern Detection for Monitoring Adverse Events in Clinical Trials - Using Real Time, Real World Data|
& Yuichi Nakajima
|How to let Machine Learn Clinical Data Review as it can Support Reshaping the Future of Clinical Data Cleaning Process|
& Ajith Baby Sadasivan
& Limna Salim
|Automate your Safety tables using Artificial Intelligence & Machine Learning|
Data Visualization and Reporting
|Paper No.||Author(s)||Paper Title (click for abstract)|
|HT-111||Troy Hughes||YO.Mama is Broke 'Cause YO.Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS® Data Sets Through Exception Handling Routines|
& Peter Flom
|Why You are Using PROC GLM Too Much (and What You Should Be Using Instead)|
|Paper No.||Author(s)||Paper Title (click for abstract)|
& Louise Hadden
|Are you Ready? Preparing and Planning to Make the Most of your Conference Experience|
|LS-016||Carey Smoak||One Boys’ Dream: Hitting a Homerun in the Bottom of the Ninth Inning|
|LS-037||Jeff Xia||Microsoft OneNote: A Treasure Box for Managers and Programmers|
& Jeff Xia
|An Effective Management Approach for a First-Time Study Lead|
|LS-185||Siva Ramamoorthy||Leadership Lessons from Start-ups|
& Prasanna Sondur
|Building a Strong Remote Working Culture – Statistical Programmers Viewpoint|
|LS-297||Darpreet Kaur||The Art of Work Life Balance.|
|LS-359||Janette Garner||Leading without Authority: Leadership At All Levels|
|LS-364||Jian Hua (Daniel) Huang & Rajan Vohra & Andy Chopra||Project Metrics- a powerful tool that supports workload management and resource planning for Biostats & Programming department.|
|Paper No.||Author(s)||Paper Title (click for abstract)|
|MD-020||Carey Smoak||CDISC Standards for Medical Devices: Historical Perspective and Current Status|
|MD-041||Phil Hall||Successful US Submission of Medical Device Clinical Trial using CDISC|
|MD-360||Shilpakala Vasudevan||An Overview of Medical Device Data Standards|
Real World Evidence and Big Data
|Paper No.||Author(s)||Paper Title (click for abstract)|
|RW-053||Jayanth Iyengar||NHANES Dietary Supplement component: a parallel programming project|
|RW-113||Troy Hughes||Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS® Software|
|RW-192||Tabassum Ambia||Natural History Study – A Gateway to Treat Rare Disease|
Software Demonstrations (Tutorials)
|Paper No.||Author(s)||Paper Title (click for abstract)|
& Justin Slattery
& Hans Gutknecht
|A Single, Centralized, Biometrics Team Focused Collaboration System for Analysis Projects|
Statistics and Analytics
|Paper No.||Author(s)||Paper Title (click for abstract)|
& Sara Shoemaker
& Robert Kleeman & Kate Ostbye
|Moving A Hybrid Organization Towards CDISC Standardization|
& Susan Kramlik
& Suhas Sanjee
|PROC Future Proof;|
& Bill Coar
|Assessing Performance of Risk-based Testing|
& Jake Gallagher
|You down to QC? Yeah, You know me!|
Advanced Programming
AP-005 : %MVMODELS: a Macro for Survival and Logistic Analysis
Jeffrey Meyers, Mayo Clinic
The research field of clinical oncology heavily relies on the methods of survival analysis and logistic regression. Analyses involve one or more variables within a model, and multiple models are often compared within subgroups. Results are prominently displayed within either a table or graphically with a forest plot. The MVMODELS macro performs every step for a univariate or multivariate analysis: running the analysis, organizing the results into datasets for printing or plotting, and creating the final output as a table or graph. MVMODELS is capable of running and extracting statistics from multiple models at once, performing subgroup analyses, outputting to most file formats, and contains a large variety of options to customize the final output. The macro MVMODELS is a powerful tool for analyzing and visualizing one or more statistical models.
AP-007 : Quick, Call the "FUZZ": Using Fuzzy Logic
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.
SAS® practitioners are frequently called upon to do a comparison of data between two different data sets and find that the values in synonymous fields do not line up exactly. A second quandary occurs when there is one data source to search for particular values, but those values are contained in character fields in which the values can be represented in myriad different ways. This paper discusses robust, if not warm and fuzzy, techniques for comparing data between, and selecting data in, SAS data sets in not so ideal conditions.
AP-018 : Like, Learn to Love SAS® Like
Louise Hadden, Abt Associates Inc.
How do I like SAS®? Let me count the ways.... There are numerous instances where LIKE or LIKE statements can be used in SAS - and all of them are useful. This paper will walk through such uses of LIKE as: searches and joins with that smooth LIKE operator (and the NOT LIKE operator); the SOUNDS LIKE operator; using the LIKE condition to perform pattern-matching and create variables in PROC SQL; and PROC SQL CREATE TABLE LIKE to create empty data sets with appropriate metadata.
AP-022 : It’s All about the Base—Procedures, Part 2
Jane Eslinger, SAS Institute
“It’s All about the Base—Procedures” (a PharmaSUG 2019 paper) explored the strengths and challenges of commonly used Base SAS® procedures. It also compared each procedure to others that could accomplish similar tasks. This paper takes the comparison further, focusing on the FREQ, MEANS, TABULATE, REPORT, PRINT, and SQL procedures. As a programmer, whether novice or advanced, it is helpful to know when to choose which procedure. The first paper provided best-use cases, and this paper takes it a step further in its discussion of when to choose one procedure over another. It also provides example code to demonstrate how to get the most out of the procedure that you choose.
AP-073 : Fuzzy Matching Programming Techniques Using SAS® Software
Stephen Sloan, Accenture
Kirk Paul Lafler, Software Intelligence Corporation
Data comes in all forms, shapes, sizes and complexities. Stored in files and data sets, SAS® users across industries know all too well that data can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or “keys”, and need to be joined by character values, like names or E-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues, popular data cleaning and user-defined validation techniques, data transformation techniques, traditional merge and join techniques, the introduction to the application of different SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED, and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or “messy” data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
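The phonetic and edit-distance ideas named in this abstract can be illustrated outside SAS. The following is a minimal Python sketch, not the authors' code: `levenshtein` is a plain edit distance of the kind COMPLEV reports, and `soundex` is the common four-character American Soundex, which may differ in detail from the SAS SOUNDEX function. Both function names are hypothetical.

```python
# Illustrative analogues of edit-distance and phonetic matching (assumed names).

def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code; returns '' for input without letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue                 # H and W do not separate repeated codes
        code = _CODES.get(c, "")     # vowels map to "" and reset prev
        if code and code != prev:
            digits.append(code)
        prev = code
    return (letters[0] + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    print(levenshtein("Jon Smith", "John Smith"))
    print(soundex("Robert"), soundex("Rupert"))
```

A typical use is to treat two identifiers as candidate matches when their Soundex codes agree or their edit distance falls below a chosen threshold, then review the candidates manually.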
AP-083 : Sometimes SQL Really Is Better: A Beginner's Guide to SQL Coding for DATA Step Users
Brett Jepson, Rho Inc.
Structured Query Language (SQL) in SAS® provides not only a powerful way to manipulate your data, it enables users to perform programming tasks in a clean and concise way that would otherwise require multiple DATA steps, SORT procedures, and other summary statistical procedures. Often, SAS users use SQL for only specific tasks with which they are comfortable. They do not explore its full capabilities due to their unfamiliarity with SQL. This presentation introduces SQL to the SQL novice in a way that attempts to overcome this barrier by comparing SQL with more familiar DATA step and PROC SORT methods, including a discussion of tasks that are done more efficiently and accurately using SQL and tasks that are best left to DATA steps.
AP-090 : Dating for SAS Programmers
Josh Horstman, Nested Loop Consulting
Every SAS programmer needs to know how to get a date... no, not that kind of date. This paper will cover the fundamentals of working with SAS date values, time values, and date/time values. Topics will include constructing date and time values from their individual pieces, extracting their constituent elements, and converting between various types of dates. We'll also explore the extensive library of built-in SAS functions, formats, and informats for working with dates and times using in-depth examples. Finally, you'll learn how to answer that age-old question... when is Easter next year?
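For readers who want to see the underlying arithmetic, here is a small Python sketch rather than SAS code from the paper. It relies on the fact that a SAS date value is the number of days since 1 January 1960, and it computes Easter with the widely published anonymous Gregorian algorithm; the paper may well use a different approach. The function names are assumptions made for illustration.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_date_to_date(n: int) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a calendar date."""
    return SAS_EPOCH + timedelta(days=n)


def date_to_sas_date(d: date) -> int:
    """Convert a calendar date to a SAS date value."""
    return (d - SAS_EPOCH).days


def easter(year: int) -> date:
    """Gregorian (Western) Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


if __name__ == "__main__":
    print(date_to_sas_date(date(2020, 1, 1)))
    print(easter(2021))
```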
AP-093 : One Macro to create more flexible Macro Arrays and simplify coding
Siqi Huang, Boehringer Ingelheim
The purpose of using a macro array is to make it easier to repetitively execute SAS code. A macro array is defined as a list of macro variables sharing the same prefix and a numeric suffix, such as A1, A2, A3, etc., plus an additional macro variable with a suffix of “N” containing the length of the array. In this paper, I will introduce the %MAC_ARRAY macro, which provides a more flexible way to create a macro array from any given list of values, or from any selected variable in a dataset, either character or numeric. The application of this macro array is broad as well, including but not limited to: 1) creating similar datasets; 2) stacking multiple datasets; 3) repeating the same calculation among multiple variables in the datasets; 4) automatically updating parameters used in other macros. To sum up, the %MAC_ARRAY macro can keep your code neat and improve program efficiency.
AP-129 : Shell Script automation for SDTM/ADaM and TLGs
Pradeep Acharya, Ephicacy Lifescience Analytics
Aniket Patil, Pfizer Inc
In clinical trials, every study has multiple datasets (SDTM/ADaM) and a number of tables, listings and figures (TLFs). Throughout the study, there are many instances where you are required to re-run the datasets due to new incoming data or updates in key datasets such as the subject-level dataset (ADSL), which results in a refresh of all other datasets and TLFs to maintain timestamp consistency. One approach is to run the datasets and TLFs one after the other, which consumes a significant amount of time for program execution. Another commonly used method is to manually create an executable batch script file that includes all the program names. The drawback of both is that a dataset or TLG program can inadvertently be missed through manual error, and it is cumbersome to go through the logs and outputs to identify which program has been missed. To overcome such situations and reduce repetitive tasks, we have developed an automated program which creates the shell script by reading the program names directly from the designated study folder for all the datasets and/or TLGs, thereby eliminating any chance of missing a program. This program also provides flexibility to decide the order in which your programs will be run, as is generally needed for SDTM/ADaM programs. If there is a one-to-one relation between programs and outputs for TLFs, more functionality can be added to cross-check the numbers and confirm a smooth run. Having this facility frees up programmer time for more productive work.
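The idea of generating the run script from the folder contents can be sketched in a few lines of Python. This is an illustration only, not the authors' program: the function name `build_run_script`, the `priority` argument, and the `sas -sysin ... -log ...` batch invocation are all assumptions that would need adapting to a real environment.

```python
from pathlib import Path


def build_run_script(study_dir, priority=("adsl",)):
    """Return shell script text that runs every .sas program found in study_dir.

    Programs whose stem appears in `priority` run first, in the order given;
    all remaining programs follow alphabetically. (Hypothetical helper.)
    """
    rank = {name.lower(): i for i, name in enumerate(priority)}
    programs = sorted(
        Path(study_dir).glob("*.sas"),
        key=lambda p: (rank.get(p.stem.lower(), len(rank)), p.name.lower()),
    )
    lines = ["#!/bin/sh", "set -e"]  # set -e: stop at the first failing program
    for p in programs:
        # Assumed batch invocation; the real command depends on the installation.
        lines.append(f'sas -sysin "{p.name}" -log "{p.stem}.log"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_run_script("."))
```

Because the program list comes from the directory listing rather than from a hand-maintained file, a newly added program is picked up on the next run without anyone editing the script.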
AP-136 : Transpose Procedure: Turning it Around Again
Janet Stuelpner, SAS
In the life science industry, CDISC standards dictate that we keep the data for several domains in a vertical format. Storing the data this way is very efficient, as there is less wasted space. In order to create our tables, listings and figures, we need to transform or transpose the data into a horizontal format, which is a more efficient way to analyze the data. There are many ways in which this can be done. The purpose of the Transpose Procedure is to reshape the data so that it can be stored as needed and then analyzed in the easiest way possible. PROC TRANSPOSE is the easiest and most complex procedure in SAS. It has only five options. This paper will revisit how to change the format of the data. You will be taken from the easiest way of doing the transformation without any options to a more complex manner using many options.
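The vertical-to-horizontal reshaping described here can be shown with a small Python sketch. It is a rough analogue of transposing with BY and ID variables, not PROC TRANSPOSE itself; the function name and the sample vital-signs rows are made up for illustration.

```python
def transpose_long_to_wide(rows, by, id_col, value_col):
    """Turn one-row-per-measurement data into one row per BY group.

    rows      : list of dicts in vertical (long) format
    by        : list of key columns that identify an output row
    id_col    : column whose values become the new column names
    value_col : column whose values fill the new columns
    """
    wide = {}
    for row in rows:
        key = tuple(row[b] for b in by)
        record = wide.setdefault(key, dict(zip(by, key)))
        record[row[id_col]] = row[value_col]
    return list(wide.values())


if __name__ == "__main__":
    vs = [
        {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSSTRESN": 120},
        {"USUBJID": "001", "VSTESTCD": "DIABP", "VSSTRESN": 80},
        {"USUBJID": "002", "VSTESTCD": "SYSBP", "VSSTRESN": 135},
    ]
    for rec in transpose_long_to_wide(vs, ["USUBJID"], "VSTESTCD", "VSSTRESN"):
        print(rec)
```

Subjects with a missing test simply lack that key in the output record, which is the counterpart of a missing value in the transposed data set.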
AP-240 : Next Level Programming-Reusability and Importance of Custom Checks
Akhil Vijayan, Genpro Research Inc.
Limna Salim, Genpro Life Sciences
SDTM domain structures and relationships are similar across studies within a therapeutic area, which leads to code standardization and reusability, especially within ISS/ISE submissions. Interim data transfers also come with changes in data, leading to reruns of existing programs with minor updates. The possibility of errors in such scenarios is large: truncation in data, new data issues going unidentified, attribute changes, etc. This paper details the importance of using standard macros and alternative programming approaches, such as enabling custom checks/warnings, that make reusability of programs a much smoother process. Not all data issues are identified at the initial stage of Source Data Validation; they tend to surface during the development of CDISC datasets. In addition, data issues identified at the initial run might not necessarily be resolved in the next data transfer. This is where custom errors/warnings play a significant role. Similarly, in statistical programming, standard code might be replicated for various TFLs with changes only to the parameters considered. In such cases, specific custom checks based on parameters also become important. This paper discusses various situations, with examples, where user-defined errors/warnings can be implemented, such as:
- Validation of subjects included alongside DM data
- Verifying the length of source variables (ensuring data after 200 characters are successfully mapped to SUPP)
- Verifying the baseline flags populated after derivation
The paper also discusses the need for standard macros which support custom checks, such as macros for:
- Source data variable length check
- Automating formats
- Attribute generation
AP-251 : How to Achieve More with Less Code
Timothy Harrington, Navitas Data Sciences
One of the goals of a SAS programmer should be to achieve a required result with the minimal amount of code. The two reasons for "code complexity" are: one, too many lines of code; and two, code which is unduly difficult to decipher, for example a large number of nested operations, pairs of parentheses, operators, and symbols packed closely together. This paper describes methods for reducing the amount and complexity of SAS code, and for avoiding repetition of code. Included are examples of SAS code using v9.2 or later functions such as IFN, IFC, CHOOSE, WHICH, and LAG. This content and discussion is primarily intended for beginner and intermediate SAS users.
AP-255 : Python-izing the SAS Programmer 2: Objects, Data Processing, and XML
Mike Molter, PRA Health Sciences
As a long-time SAS programmer curious about what other languages have to offer, I cannot deny that the leap from SAS to the object-oriented world is not a small one to be taken lightly. Anyone looking for superficial differences in syntax and keywords will soon see that something more fundamental is at play. Have no fear though, for languages such as Python have plenty of similarities to give the SAS programmer a strong base of knowledge from which to start their education. In this sequel to an earlier paper I wrote, we will explore Python approaches to programming tasks common in our industry, taking every opportunity to expose their similarities to SAS approaches. After an introduction to objects, we’ll see the many ways that Python can manipulate data, all of which will look familiar to SAS programmers. With a solid working knowledge of objects, we’ll then see how easy object-oriented programming makes the generation of common industry XML. This paper is intended for SAS programmers of all levels with a curiosity about, and an open mind to something slightly beyond our everyday world.
AP-264 : Hidden Gems in SAS Editor: Old Wine in New Bottle
Ramki Muthu, Senior SAS Programmer
Clinical SAS programmers tend to spend more than 90% of their work time using the SAS editor, and it is imperative to make the most of the available functionality for their routine tasks. While most programmers are already familiar with the SAS editor, and the topic has been reviewed before, this paper attempts to hunt down some hidden (or underutilized) features in the enhanced SAS editor window for Windows. Briefly, starting with a few common etiquettes, this paper discusses: a summary of keyboard shortcuts that programmers should be aware of; window management to make a split view of a program and to use the tile vertical options; and using command bars to display required variables from a dataset, subset the records, and move precisely to the desired record (e.g. from the 1st to the 28474th record) using the command prompt. With a few examples, it also discusses how we could utilize keyboard macros for common tasks and to learn new (or unfamiliar) SAS procedures.
AP-278 : SAS Formats: Same Name, Different Definitions FORMAT-ters of Inconvenience
Jackie Fitzpatrick, SCHARP
It’s always best to catch data discrepancies early in the analysis, especially with the format library. Background: Our organization receives behavioral questionnaire data from outside sources as SPSS sav files. Typically, they use the question number as their variable name. Now, this seems like a great idea, but question number one in section B at Baseline may not be the same for the following study visits. For example, the question could have 1 as Yes, 2 as No, but the following visits use 1 as Once, 2 as Twice, and 3 as Three or More. To find discrepancies between formats in a quick and efficient manner, my solution consists of two programs: one that finds discrepancies and creates an Excel spreadsheet to show them, and a second program that helps you fix the discrepancies efficiently. In conclusion, this process will save time, resources, and hair pulling.
Applications Development
AD-046 : Normal is Boring, Let’s be Shiny: Managing Projects in Statistical Programming Using the RStudio® Shiny® App
Girish Kankipati, Seattle Genetics Inc
Hao Meng, Seattle Genetics Inc
The ability to deliver projects on schedule, within budget, and aligned with business goals is key to gaining an edge in today’s fast-paced pharmaceutical and biotechnology industry. Project management plays an important role to achieve key milestones with high quality and optimal efficiency. While successful project management is an application of processes, methods, skills, and expertise, tools like SAS®, RStudio®, and Python™ can help track project status and better position the lead programmer to allocate appropriate resources. One suitable app that fits the specific needs for status tracking in Statistical Programming is R Shiny®, a flexible user-friendly tool that can present an instant study status overview in graphical format with minimal coding and maintenance effort. This paper introduces an innovative design to track study status through an RStudio® Shiny® app that is interactive and reusable and can present status on demand. Based on simple server metadata, we can display a graphical representation of not only the total number of SDTM, ADaM, and TLFs that have been programmed and validated, but also the trend in progress to date to help lead programmers and statisticians determine any resource adjustments based on timely and effective status reporting that refreshes on a monthly, weekly, or daily basis to monitor ongoing study progress. Sample project tracker metadata, the high-level inner workings of our Shiny® app, and Shiny® graphs will be discussed in depth in this paper.
AD-052 : ‘This Is Not the Date We Need. Let’s Backdate’: An Approach to Derive First Disease Progression Date in Solid Tumor Trials
Girish Kankipati, Seattle Genetics Inc
Boxun Zhang, Seattle Genetics
A time-to-event analysis, such as duration of response or progression-free survival, is an important component of assessments in oncology clinical trials. Derivation of true response dates is a key feature in developing ADRS and ADTTE ADaM data sets and a solid understanding of how such dates are derived is therefore critical for statistical programmers. This paper discusses hands-on examples in solid tumor trials based on investigator assessment using RECIST 1.1 and presents SAS® programming techniques that are usually implemented in ADRS. Response date derivations are categorized as below:
1. Overall response date derivation:
a. When an overall response is disease progression (PD): when at least one target, non-target, or new lesion shows PD, the date of PD is derived from the earliest of all scan dates that showed PD.
b. When an overall response is equivocal progression: when at least one non-target or new lesion shows equivocal progression, the date of PD will be the earliest of all scan dates that showed equivocal progression.
c. When an overall response is other than PD or equivocal progression: the response date is derived from the latest of all radiologic scan dates for the given response assessments.
2. Progression backdating: If a new lesion progressed to unequivocal right after equivocal assessments, or a non-target lesion progressed to disease progression right after equivocal progression assessments, backdate the progression event to the earliest scan where the new or non-target lesion was assessed as equivocal progression.
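The two rules above can be sketched in Python to make the date logic concrete. This is a simplified illustration under stated assumptions, not the ADRS code from the paper: results are coded as the hypothetical strings "PD" and "EQUIVOCAL", backdating is applied at the overall-response level rather than per lesion, and the function names are invented.

```python
from datetime import date


def response_date(overall, lesions):
    """Rule 1: derive the response date for one assessment.

    lesions: list of (scan_date, result) pairs for that assessment.
    PD or equivocal progression -> earliest scan date showing that result;
    any other overall response  -> latest of all scan dates.
    """
    if overall in ("PD", "EQUIVOCAL"):
        hits = [d for d, result in lesions if result == overall]
        if not hits:
            raise ValueError(f"overall response {overall} but no lesion shows it")
        return min(hits)
    return max(d for d, _ in lesions)


def backdated_pd_date(visits):
    """Rule 2: backdate the first PD over immediately preceding equivocal visits.

    visits: chronological list of (response_date, overall_response).
    Returns None when no PD occurs.
    """
    for i, (_, response) in enumerate(visits):
        if response == "PD":
            j = i
            while j > 0 and visits[j - 1][1] == "EQUIVOCAL":
                j -= 1  # walk back through the run of equivocal assessments
            return visits[j][0]
    return None


if __name__ == "__main__":
    visits = [
        (date(2020, 1, 10), "SD"),
        (date(2020, 3, 12), "EQUIVOCAL"),
        (date(2020, 5, 14), "EQUIVOCAL"),
        (date(2020, 7, 16), "PD"),
    ]
    print(backdated_pd_date(visits))
```

A production derivation would also need to track which lesion was equivocal, since the rule in the abstract is tied to the same new or non-target lesion progressing.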
AD-055 : A Set of VBA Macros to Compare RTF Files in a Batch
Jeff Xia, Merck
Shunbing Zhao, Merck & Co.
Post database lock changes in a clinical trial are impactful and can result in significant rework. Statistical programmers must access the updated data and regenerate the TLFs for a CSR to maintain data and result traceability. This paper briefly discusses the challenges in comparing two sets of RTF files before and after the post database lock changes (each set might contain tens or hundreds of files) and provides an elegant solution based on VBA technology. Three standalone VBA macros were developed to perform this essential task: 1) Compare: programmatically compare each RTF file in two different versions and record the track change(s); 2) Find_Change: scan every RTF file to see whether there are one or more changes between the two versions, and produce a report showing whether each file has changed or remains the same; 3) Change_details: scan every RTF file and, for all RTFs with an update, provide a report with all the details in the track changes, i.e., what text was inserted, or deleted, etc. Each VBA macro will be described in this paper and reviewed with examples.
AD-071 : Using SAS® to Create a Build Combinations Tool to Support Modularity
Stephen Sloan, Accenture
With SAS® procedures we can use a combination of a manufacturing Bill of Materials and a sales specification document to calculate the total number of configurations of a product that are potentially available for sale. This will allow the organization to increase modularity and reduce cost while focusing on the most profitable configurations with maximum efficiency. Since some options might require or preclude other options, the result is more complex than a straight multiplication of the numbers of available options. Through judicious use of SAS procedures, we can maintain accuracy while reducing the time, space, and complexity involved in the calculations.
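To see why the count is not a straight multiplication, consider the following brute-force Python sketch. It is an illustration of the counting problem only, not the SAS approach in the paper; the option names and the `requires` and `precludes` rule format are invented, and a real tool would need something more efficient than full enumeration for large option sets.

```python
from itertools import product


def valid_configurations(options, requires=(), precludes=()):
    """Enumerate configurations that satisfy require/preclude rules.

    options   : dict of feature -> list of possible choices
    requires  : pairs (a, b) meaning choice a is only allowed together with b
    precludes : pairs (a, b) meaning choices a and b cannot be combined
    """
    features = list(options)
    configs = []
    for combo in product(*(options[f] for f in features)):
        chosen = set(combo)
        if any(a in chosen and b not in chosen for a, b in requires):
            continue
        if any(a in chosen and b in chosen for a, b in precludes):
            continue
        configs.append(dict(zip(features, combo)))
    return configs


if __name__ == "__main__":
    options = {
        "engine": ["V6", "V8"],
        "trim": ["base", "sport"],
        "towing": ["none", "tow_pkg"],
    }
    configs = valid_configurations(
        options,
        requires=[("tow_pkg", "V8")],       # towing package needs the V8
        precludes=[("sport", "tow_pkg")],   # sport trim excludes towing package
    )
    print(len(configs), "of", 2 * 2 * 2, "combinations are valid")
```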
AD-084 : Metadata-driven Modular Macro Design for SDTM and ADaM
Ellen Lin, Seattle Genetics, Inc.
Aditya Tella, Seattle Genetics, Inc.
Yeshashwini Chenna, Seattle Genetics, Inc.
Michael Hagendoorn, Seattle Genetics, Inc.
In clinical trial data analyses, macros are often used in a modular or building-block fashion to standardize common data transformations in SDTM or derivations in ADaM to heighten efficiency and quality across studies. A traditional design of such macros is the parameter-driven approach, which provides users with many parameters to control differences on input, output, and data processing from study to study. There are limitations with this kind of design, for example the macro may stop working when unexpected differences arise beyond the scope controlled by parameters, or a lengthy revision and documentation cycle is needed to rework parameter-driven code once more variations are introduced by new studies. Metadata-driven programming is a much more dynamic approach for macro design, especially when targeting areas where differences between studies are less predictable. This design allows key portions of logic and processing to be driven by study-specific metadata maintained outside the macro instead of being exclusively controlled by parameters. It also greatly simplifies user input through parameters and opens the door for a robust and stable macro library at department level. This paper will describe the metadata-driven approach in detail, discuss important considerations on design of such metadata to allow sufficient flexibility, and explain several optimal programming techniques for it. We will also provide real-world data processing examples where a traditional parameter-driven macro would be challenging and the metadata-driven approach fits much better.
AD-088 : Demographic Table and Subgroup Summary Macro %TABLEN
Jeffrey Meyers, Mayo Clinic
Clinical trial publications frequently allocate at least one of the allotted tables to summarize the demographics, stratification factors, and other variables of interest of the patients involved with the study. These tables generally include basic distribution information such as frequencies, means, medians, and ranges while also comparing these distributions across a key factor, such as treatment arm, to determine whether there was any imbalance in patient populations when doing analysis. These distributions are not difficult to compute and combine into a table, but as treatments become more specific to patient characteristics such as genetic biomarkers and tumor stages there is a need to be able to display the distributions in subgroups. The macro %TABLEN is a tool developed to compute distribution statistics for continuous variables, date variables, discrete variables, univariate survival time-to-event variables, and univariate logistic regression variables and combine them into a table for publication. %TABLEN has multiple features for comparisons, subgrouping, and outputting to multiple destinations such as RTF and PDF. The macro %TABLEN is a valuable tool for any programmer summarizing patient level data.
AD-106 : Macro to Produce SAS®-Readable Table of Content from TLF Shells
Igor Goldfarb, Accenture
Ella Zelichonok, Naxion
The goal of this work is to develop a macro that automates the process of reading the shells for tables, listings and figures (TLF) and transforming them into a SAS®-readable ordered table of content (TOC). The proposed tool can save significant time for biostatisticians and lead programmers who repeatedly have to create, revise, and update shell documents for TLF. Development of the shells for TLF for any clinical trial is a time-consuming task. Copying titles and footnotes from the shell document (typically an MS Word file) into a SAS® program or an external source of titles and footnotes (e.g., an Excel file) is a tedious process requiring scrupulous work and is subject to human error. The proposed macro (developed in Excel VBA) automates this process. It reads the shell document (Word) and creates/updates a SAS®-readable ordered TOC (Excel) in a matter of seconds. The macro identifies the common part of the titles and footnotes for all TLF and detects differences for specific outputs. The developed tool analyzes all the requests for repeating outputs, updates their sequential numbers and titles, and adds them to the TOC. It also allows the population to be changed if required for repeating tables/figures. Finally, the macro generates an Excel file containing the ordered TOC that is immediately ready to be used for the final run of SAS® programs to output the planned TLF. Any further updates in the shell document can be incorporated in the TOC simply by rerunning this macro.
AD-114 : Chasing Master Data Interoperability: Facilitating Master Data Management Objectives Through CSV Control Tables that Contain Data Rules that Support SAS® and Python Data-Driven Software Design
Troy Hughes, Datmesis Analytics
Control tables are the tabular data structures that contain control data—the data that direct software execution and which can prescribe dynamic software functionality. Control tables offer a preferred alternative to hardcoded conditional logic statements, which require code customization to modify. Thus, control tables can dramatically improve software maintainability and configurability by empowering developers and, in some cases, nontechnical end users to alter software functionality without modifying code. Moreover, when control tables are maintained within canonical data structures such as comma-separated values (CSV) files, they furthermore facilitate master data interoperability by enabling one control table to drive not only SAS software but also non-SAS applications. This text introduces a reusable method that preloads CSV control tables into SAS temporary arrays to facilitate the evaluation of business rules and other data rules within SAS data sets. To demonstrate the interoperability of canonical data structures, including CSV control tables, functionally equivalent Python programs also ingest these control tables. Master data management (MDM) objectives are facilitated because only one instance of the master data—the control table, and single source of the truth—is maintained, yet it can drive limitless processes across varied applications and software languages. Finally, when data rules must be modified, the control data within the control table must be changed only once to effect corresponding changes in all derivative uses of those master data.
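The control-table idea can be illustrated with a short Python sketch. The abstract says Python programs ingest the same CSV control tables, but the code below is not taken from the paper: the column layout (`variable,min,max`), the range-check rule type, and the function names are assumptions chosen to keep the example small.

```python
import csv
import io

# Hypothetical control table: one data rule per row, editable without touching code.
CONTROL_CSV = """variable,min,max
AGE,18,65
WEIGHT,40,200
"""


def load_rules(csv_text):
    """Read the control table into {variable: (min, max)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["variable"]: (float(row["min"]), float(row["max"])) for row in reader}


def violations(record, rules):
    """Return the variables in `record` whose values fall outside their range."""
    return [
        var
        for var, (low, high) in rules.items()
        if var in record and not low <= record[var] <= high
    ]


if __name__ == "__main__":
    rules = load_rules(CONTROL_CSV)
    print(violations({"AGE": 70, "WEIGHT": 80}, rules))
```

Changing a limit means editing one CSV row; any program that reads the same file, in SAS or elsewhere, picks up the new rule on its next run.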
AD-130 : A Novel Solution for Converting Case Report Form Data to SDTM using Configurable Transformations
Sara Shoemaker, Fred Hutch / SCHARP
Matthew Martin, Fred Hutch / SCHARP
Robert Kleemann, Fred Hutch / SCHARP
David Costanzo, Fred Hutch / SCHARP
Tobin Stelling, Fred Hutch / SCHARP
Converting case report form (CRF) data to SDTM is a complicated process, even when data are collected in CDASH format. Conversion requires many dataset manipulations and demands flexibility in the order of execution. In addition, different domain types require different actions on the source data, e.g. Findings domains require transposition of data records. Many conversion solutions address this complexity by performing data pre-processing, mapping, and post-processing as disparate pipeline sections. Most include programming in a language such as SAS where blocks of code can obscure the details of the transformation from data customers. This paper describes a solution for SDTM conversion that uses a method termed “Configurable Transformations”. This model both achieves conversion using one consistent pipeline for all phases from CRF data to SDTM and provides visibility into the data transformations for non-programmers. This is achieved by a human readable configuration that uses a small set of simple transformation step types to produce derived datasets. These resulting configurations can be defined by data analysts and can be understood by data customers. Our group was able to successfully map and convert all CRF data for an HIV prevention study using this model with no need for procedural code. This paper will go into the details of the Configurable Transformation model and discuss our use of it in converting data for a study.
AD-142 : Data Library Comparison Macro %COMPARE_ALL
Jeffrey Meyers, Mayo Clinic
Reproducible research and sharing of data with repositories are becoming more standard, and so the freezing of data for specific analyses is more crucial than ever before. Maintaining multiple data freezes requires knowing what changed within the data from one version to another. In SAS there is the COMPARE procedure that allows the user to compare two datasets to see potential new variables, lost variables, and changes in values. Relying on the COMPARE procedure can be tedious and cumbersome when maintaining a database containing several datasets. The COMPARE_ALL macro was written to ease this burden by generating a Microsoft Excel report of a comparison of two data libraries instead of just two datasets. The report indicates any new or lost datasets, variables or observations and checks for changed data values within all variables. Multiple ID variables can be specified and the macro will determine which variables are relevant with each dataset for comparison. The COMPARE_ALL macro is a fantastic tool for managing multiple versions of the same SAS database.
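The kind of library-level comparison described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions (each "library" is a dict of dataset name to a list of row dicts, with a single ID variable); it does not reproduce the COMPARE_ALL macro or its Excel report.

```python
def compare_libraries(old, new):
    """Report new/lost datasets and, for shared datasets, new/lost variables."""
    report = {
        "new_datasets": sorted(set(new) - set(old)),
        "lost_datasets": sorted(set(old) - set(new)),
        "variables": {},
    }
    for name in sorted(set(old) & set(new)):
        old_vars = {v for row in old[name] for v in row}
        new_vars = {v for row in new[name] for v in row}
        report["variables"][name] = {
            "new": sorted(new_vars - old_vars),
            "lost": sorted(old_vars - new_vars),
        }
    return report

def changed_values(old_rows, new_rows, id_var):
    """List (id, variable, old value, new value) for rows present in both versions."""
    old_by_id = {r[id_var]: r for r in old_rows}
    changes = []
    for row in new_rows:
        before = old_by_id.get(row[id_var])
        if before is None:
            continue  # new observation, not a changed value
        for var in sorted(set(row) & set(before)):
            if row[var] != before[var]:
                changes.append((row[id_var], var, before[var], row[var]))
    return changes

# Made-up data freezes for illustration.
old_lib = {"dm": [{"id": 1, "age": 40}], "ae": [{"id": 1}]}
new_lib = {"dm": [{"id": 1, "age": 41, "sex": "F"}], "vs": [{"id": 1}]}
report = compare_libraries(old_lib, new_lib)
changes = changed_values(old_lib["dm"], new_lib["dm"], "id")
```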
AD-146 : MKADRGM: A Macro to Transform Drug-Level SDTM Data into Traceable, Regimen-Level ADaM Data Sets
Sara McCallum, Harvard T.H. Chan School of Public Health, Center for Biostatistics in AIDS Research (CBAR)
In our support of the NIH funded AIDS Clinical Trials Group (ACTG) and International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) networks, participants concurrently take multiple medications for the treatment of HIV and other diseases in many of our studies. For analysis and presentation to investigators, our statisticians find an ADaM data set that aggregates all drugs in a regimen at a given time into a single record, where that record represents the period of time those drugs were taken, is needed to summarize regimens, identify regimen changes or treatment gaps, and to aid in explaining other events on study (e.g., occurrence of adverse events or emergence of drug resistance). This paper will present our organization’s SAS macro, MKADRGM. The macro accepts WHO Drug coded CDISC domains recorded at the drug level, and outputs an ADaM data set at the regimen level with start and stop dates. Other variables in the analysis data set include the regimen duration, drug counts, and drug class information. An intermediate data set is used to facilitate traceability to the source SDTM domain(s). This macro was developed for HIV regimens, but also works with other therapeutic areas our organization is involved with, including TB, HCV, and HBV. This paper will highlight some of the inner workings of the macro, and demonstrate the novel usage of the SRC* triplicate (SRCVAR, SRCDOM, and SRCSEQ) to provide traceability even when multiple source SDTM records and domains are mapped to a single regimen in the analysis data set.
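The core step of collapsing drug-level intervals into regimen-level records can be sketched as below. This is a simplified Python illustration, not the MKADRGM macro: it uses integer study days instead of dates, treats start and end days as inclusive, and leaves out the traceability variables. The drug abbreviations and days are invented sample data.

```python
def build_regimens(drug_records):
    """Collapse (drug, start_day, end_day) records into regimen periods.

    Days are inclusive. A new regimen starts whenever the set of drugs
    being taken changes; periods with no drug are treated as gaps.
    """
    # Every start day and every day after an end day is a possible change point.
    boundaries = sorted({s for _, s, _ in drug_records} |
                        {e + 1 for _, _, e in drug_records})
    regimens = []
    for seg_start, next_start in zip(boundaries, boundaries[1:]):
        active = tuple(sorted({d for d, s, e in drug_records if s <= seg_start <= e}))
        if not active:
            continue  # treatment gap
        seg_end = next_start - 1
        last = regimens[-1] if regimens else None
        if last and last["drugs"] == active and last["end"] + 1 == seg_start:
            last["end"] = seg_end  # same regimen continues; extend it
        else:
            regimens.append({"drugs": active, "start": seg_start, "end": seg_end})
    return regimens

records = [("TDF", 1, 30), ("FTC", 1, 30), ("EFV", 1, 14), ("DTG", 20, 30)]
regimens = build_regimens(records)
```

With this sample input the result has three regimen records: days 1-14 (three drugs), days 15-19 (two drugs), and days 20-30 (three drugs).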
AD-171 : Clinical Database Metadata Quality Control: Two Approaches using SAS and Python
Craig Chin, Fred Hutch
Lawrence Madziwa, Fred Hutch
A well-designed clinical database requires well-defined specifications for Case Report Forms, Field Attributes, and Data Dictionaries. The specifications are passed on to the Electronic Data Capture Programmers, who program the clinical database for data collection. How can a study team ensure that the source specifications are complete, and the resulting clinical database metadata match the source specifications? This paper presents two approaches for comparing metadata between source specifications and resultant clinical database. Initially, SAS was used to read in the specifications and clinical database metadata and to provide comparison checks. Then, a new project in Python was initiated in order to build a more user-friendly tool that allows customers to run the checks themselves. This project made the study database development a more efficient process and improved the quality of our clinical databases.
AD-208 : Programming Patient Narratives Using Microsoft Word XML files
Yuping Wu, PRA Health Science
Jan Skowronski, Genmal A/S
Patient narratives are an important component of a clinical study report (CSR). The ICH-E3 guidelines require a brief narrative to describe each death, each serious adverse event and other significant adverse events that are judged to be of special interest because of clinical importance. Unlike individual tables and listings in a CSR, patient profiles present most, if not all, of the information collected for a subject over the course of a clinical trial. Such documents typically integrate the various sources of data into one cohesive output, with each part presented in a different format, which poses challenging problems for both programmers and medical writers. This paper introduces a new method that uses SAS together with Office Open XML. The layout of the narrative Word template is created separately with properly tagged cells. The Word file is then broken up into its XML parts, which are read and updated by SAS with the clinical data and finally saved back into the original format, which now contains a populated narrative.
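The unpack, update, and repack cycle can be illustrated in Python, since a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The sketch below is not the authors' SAS method: the `{{KEY}}` placeholder convention is an assumption for this example, and the in-memory "template" is a bare stand-in rather than a complete Word document.

```python
import io
import zipfile
from xml.sax.saxutils import escape

def fill_placeholders(docx_bytes, values):
    """Return new archive bytes with {{KEY}} tokens in word/document.xml replaced."""
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    out_buf = io.BytesIO()
    with zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, <, > so the result remains well-formed XML.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            dst.writestr(item.filename, data)  # all other parts are copied unchanged
    return out_buf.getvalue()

# Stand-in template holding only the one part this sketch touches.
template = io.BytesIO()
with zipfile.ZipFile(template, "w") as z:
    z.writestr("word/document.xml", "<w:t>Subject {{SUBJID}} had {{AETERM}}</w:t>")

filled = fill_placeholders(
    template.getvalue(), {"SUBJID": "001", "AETERM": "nausea & vomiting"}
)
result_xml = zipfile.ZipFile(io.BytesIO(filled)).read("word/document.xml").decode("utf-8")
```

In a real template, Word may split a placeholder across several text runs, so plain string replacement is not always enough.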
AD-296 : SAS Migration from Unix to Windows and Back
Valerie Williams, ICON Clinical Research
SAS® software has been a mainstay for performing analysis and reporting (A&R) in the pharmaceutical industry for the past four decades. SAS occasionally releases new versions of its software with code enhancements and bug fixes to SAS® functions, procedures, and/or elements. In order to exploit those changes, SAS users may need to upgrade their programming environment and migrate their SAS programs and data to a new platform. Migrating A&R directories across operating systems has proven to be the most challenging of any migration process, but much can be learned from migrating to a different operating system twice. Resources from SAS, IT, and all business units that use or plan to use SAS must work together to perform the appropriate level of testing and documentation, to ensure that proper directory permissions are in place and that the new environment is fit for purpose, without affecting business unit delivery timelines. This paper will endeavor to describe, for a mid-level SAS programmer, the process and pitfalls of migrating SAS programs and data across SAS versions, SAS platforms (native SAS to SAS GRID), and operating systems, and will provide a few helpful tips to avoid problems when migrating.
AD-298 : RStats: A R-Shiny application for statistical analysis
Sean Yang, Syneos Health Clinical
Hrideep Antony, Syneos Health USA
Aman Bahl, Syneos Health
This paper introduces the RStats application, an interactive and dynamic R-Shiny based application that can perform popular statistical analysis models frequently used in clinical trials. The application, as shown in Figure 1, can perform even advanced analytics such as logistic regression, survival analysis, bootstrap confidence intervals, ANOVA/ANCOVA, etc., along with summary plots, for users with no prior R programming experience or with limited statistical knowledge. This application also eliminates the need for numerous lines of programming effort to create a similar statistical analysis. As with any software program, there is usually more than one way to do things in R; the RStats application in this paper is one way to perform these statistical analyses. The process flow diagram in Figure 1 explains the steps to perform the statistical analysis using this application: Step 1: Import the dataset for the analysis. Step 2: Select the dependent, independent, and treatment variables needed for the analysis. Step 3: Choose the right statistical model. Step 4: The output and plots are displayed, with the option to save the outputs in an external folder location. The advantage of using R-Shiny is that the interface can be easily accessed using the Web. Shiny allows R users to put data insights into the hands of the decision-makers while providing a user-friendly framework that does not require any additional toolsets.
AD-308 : Using Data-Driven Python to Automate and Monitor SAS Jobs
Julie Stofel, Fred Hutchinson Cancer Research Center
This paper describes how to integrate Python and SAS to run, evaluate, and report on multiple SAS programs with a single Python script. It discusses running SAS programs in multiple environments (latin-1 or UTF-8 encoding; command-line or cron submission) and ways to avoid potential Python version issues. A handy SAS-to-Python function guide is provided to help SAS programmers new to Python find the appropriate Python method for a variety of common tasks. Methods to find and search files, run SAS code, read SAS data sets and formats, return program status, and send formatted emails are demonstrated in Step-by-Step instructions. The full Python script is provided in the Appendix.
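One part of such a workflow, evaluating a SAS log and returning a program status, might look like the Python sketch below. It is not the script from the paper's Appendix: the status labels (`OK`, `WARN`, `FAIL`) and the sample log text are assumptions, and the pattern only looks for lines that begin with `ERROR` or `WARNING`.

```python
import re

# Matches log lines such as "ERROR: ..." and "ERROR 22-322: ..." at line start.
ISSUE = re.compile(r"^(ERROR|WARNING)(?: \d+-\d+)?:", re.MULTILINE)

def summarize_log(log_text):
    """Count ERROR and WARNING lines and derive an overall status."""
    counts = {"ERROR": 0, "WARNING": 0}
    for match in ISSUE.finditer(log_text):
        counts[match.group(1)] += 1
    if counts["ERROR"]:
        status = "FAIL"
    elif counts["WARNING"]:
        status = "WARN"
    else:
        status = "OK"
    return status, counts

# Illustrative log excerpt, not output from a real run.
sample_log = """NOTE: The data set WORK.A has 10 observations and 2 variables.
WARNING: The data set WORK.A may be incomplete.
ERROR 22-322: Syntax error, expecting one of the following: ;.
ERROR: File WORK.B.DATA does not exist.
"""
status, counts = summarize_log(sample_log)
```

The same function can be applied to each program's log in a loop, and the statuses collected into a summary email.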
AD-341 : Opening Doors for Automation with Python and REST: A SharePoint Example
A large part of increasing the efficiency of a process is finding new ways to automate. If manual components of a process can be replaced with automation, those components can then move to the background where they can be validated and trusted to be reliable. Large swaths of data sit within company intranets or other services that may seem ‘disconnected’ – but they may be more connected than one might think. This paper will explore extending automation capabilities with Python and REST APIs, using SharePoint as a primary example. Some topics covered will include using Python to make web requests, the basics of authentication, and how to interact with REST APIs. The paper will demonstrate how data repositories that may seem disconnected can be integrated into automated processes, opening doors for new data pipelines and data sources to pave the way for process improvements, efficiency increases, and better information.
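As a small illustration of a Python web request against a REST API, the sketch below builds (but does not send) a request for the items of a SharePoint list. The site URL, list title, and token are placeholders, and bearer-token authentication is an assumption; the paper's actual authentication approach is not stated in the abstract and will differ between deployments.

```python
import urllib.parse
import urllib.request

def build_list_items_request(site_url, list_title, token):
    """Build a GET request for the items of a SharePoint list by title."""
    quoted = urllib.parse.quote(list_title)
    url = f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{quoted}')/items"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json;odata=verbose",  # ask for JSON output
            "Authorization": f"Bearer {token}",          # assumed auth scheme
        },
        method="GET",
    )

# Placeholder values; nothing is sent over the network here.
req = build_list_items_request(
    "https://example.sharepoint.com/sites/demo", "Study Documents", "PLACEHOLDER_TOKEN"
)
```

Passing `req` to `urllib.request.urlopen` would send it; the JSON response could then be parsed with the standard `json` module.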
AD-347 : A Tool to Organize SAS Programs, Output and More for a Clinical Study
Yang Gao, Merck & Co., Inc.
Depending on the complexity of a study, there can be many programs and outputs to manage, review, or even rerun due to new data or requirements. Programmers must navigate to different program subfolders to select and run the affected programs and revalidate as required. This paper presents a utility macro that can substantially reduce the resources and time spent on this task. The utility macro captures all SAS programs or output from a specified folder and saves them in a permanent Excel file. The resulting Excel file provides the path and associated file link. The embedded hyperlinks for these SAS programs and outputs save the time otherwise spent manually navigating to individual folders and subfolders. In addition, the macro includes functionality to list the output folders, such as rtf and log, with the corresponding SAS programs in the same spreadsheet, so programmers can quickly find the corresponding programs and outputs. The hyperlinks facilitate output review for programmers and statisticians when there is a need for updates. The utility can also generate a SAS program to enable a batch run of the programs without needing to hard code the program names. This utility macro improves programming efficiency by reducing manual effort and decreasing errors during the process. It is also useful for supporting multiple studies and reviewing work completed by a partner.
Artificial Intelligence (Machine Learning)
AI-025 : Gradient Descent: Using SAS for Gradient Boosting
Karen Walker, Walker Consulting LLC
Much of applied mathematics is centered around the evaluation of ERROR in an effort to reduce it during computations. This paper will illustrate mathematical equations applied to thousands of calculations, then show algorithms that are proven to optimize these calculations. It will show ways to reduce the error through a gradient descent algorithm and render it as a SAS program. Next, this paper will use the gradient descent program to find the smallest ERROR when matching two models and subsequently introduce the SAS Viya GRADBOOST procedure. Finally, it will show how more precision is achieved using gradient descent with gradient boosting and why this is so important to convolutional neural network models.
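For readers unfamiliar with the technique, a minimal gradient descent can be sketched in Python as below. It fits a straight line by repeatedly stepping against the gradient of the mean squared error. This is a generic textbook illustration, not the paper's SAS program, and it does not cover gradient boosting or the GRADBOOST procedure; the learning rate, step count, and data are arbitrary choices for the example.

```python
def gradient_descent(xs, ys, rate=0.05, steps=2000):
    """Fit y = a + b*x by minimizing mean squared error with gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2 * sum(residuals) / n
        grad_b = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        # Move against the gradient to reduce the error.
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x
a, b = gradient_descent(xs, ys)  # a approaches 1 and b approaches 2
```

If the learning rate is too large, the updates overshoot and the error grows instead of shrinking, so the rate usually has to be tuned to the data.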
AI-058 : Pattern Detection for Monitoring Adverse Events in Clinical Trials - Using Real Time, Real World Data
Surabhi Dutta, EG Life Sciences
Patient care involves data capture from disparate sources of care delivery, including clinical trials data, data from wearable sensors and handheld devices, and Electronic Health Records. We are accustomed to devices that generate health indicator data in large volumes and at a rapid rate. This paper discusses the benefits, challenges, and methods of utilizing this real-time data for pattern detection using machine learning algorithms. This will be done using real world data that has been standardized and integrated during clinical trials. Our paper from the previous year, “Merging Sensor Data in Clinical Trials” (https://www.pharmasug.org/proceedings/2019/PO/PharmaSUG-2019-PO-305.pdf), dealt with standardizing clinical trials data and making it ready to use for analysis. This year we will delve deeper into using the sensor data for pattern recognition and AE monitoring by classifying and segregating specific groups of high-risk patients participating in clinical trials, right from first subject first data point. This kind of pattern detection will also be used in patient profiling and in predicting risk factors for high-risk patients in trials. Challenges with the current method of patient monitoring in clinical trials: drug-related AEs are usually documented at the end of the episode. This poses significant time and monetary risks for sponsors in ensuring patient safety, achieving drug efficacy and conducting the trials in a timely manner. This paper explores patient profiling using machine learning techniques such as clustering and PCA to segregate high-risk patients for close monitoring, and using predictive analytics for visit-based monitoring.
AI-224 : How to let Machine Learn Clinical Data Review as it can Support Reshaping the Future of Clinical Data Cleaning Process
Mirai Kikawa, Novartis Pharma K.K.
Yuichi Nakajima, Novartis
Technology utilized in the pharmaceutical industry has been evolving. There are many innovative new technologies, such as Artificial Intelligence, Machine Learning, Digitization, Blockchain, Big Data, Open Source Software, etc., which can build a new era of clinical drug development. Manual data review is one of the required processes to ensure that clinical data are clean and ready for analysis, which is essential for patient safety and the reliability of the submission documents. The manual data review process involves several roles, such as Data Manager, Clinical and/or Medical Reviewer, Safety Reviewer, etc. Since it requires complicated logical thinking and clinical and medical knowledge and expertise, it has to be “manual”. That has been the common understanding, and thus the traditional approach. However, does it have to remain true? In recent years, clinical data collected during clinical trials have been structured and standardized by industry efforts such as the introduction of CDISC and standard operational processes at each pharmaceutical company. Structured and standardized data across clinical trials increase the compatibility of data utilization, which enables a more robust approach to data review. The data can be fed into Machine Learning using Python, which is one idea for breaking with the traditional approach and reshaping the future of clinical data cleaning. This paper proposes a potential way to let a machine learn clinical data review using Python.
AI-242 : Automate your Safety tables using Artificial Intelligence & Machine Learning
Roshan Stanly, Genpro Research Inc.
Ajith Baby Sadasivan, Genpro Research
Limna Salim, Genpro Life Sciences
For FDA submissions, a common reporting strategy for analysis is to create tables which specify the output of each statistical analysis. One such set of tables displays the counts and statistics of subjects' safety parameters such as adverse events, laboratory results, vital signs etc. With the advent of big data technologies and high throughput computing resources, large complex data sets can be easily analyzed and safety table generation can be automated. This paper explores the possibilities of developing a dynamic software framework using Angular JS, SAS®, R, Python® and a Named Entity Recognition (NER) model for easy and effective analysis of safety data. The only input files needed are the standardized ADaM datasets. The system has standardized templates for most of the safety tables, which vary depending upon the study design (such as single arm, multiple arm, cross over etc). As a first step, the table shells are selected by the user from the various templates offered. Next, we extract contents such as Titles, Headers, Parameters & Sub-Parameters, Statistics, Footnotes etc. as specific entities from the table shell and create a CSV file using Python. A map file is also created alongside using Artificial Intelligence and Machine Learning, which specifies which variables are to be considered for each parameter, header or descriptive statistic generated from the ADaM datasets. Standard macros written in SAS then take both the CSV files and the ADaM datasets as input and generate the final table in RTF format.
Data Standards
DS-023 : Data Transformation: Best Practices for When to Transform Your Data
Janet Stuelpner, SAS
Olivier Bouchard, SAS
Mira Shapiro, Analytic Designers LLC
When is the best time to create the CDISC standard data? This has been debated for many, many years. Some say that it should be done at the very end of the study before the protocol is submitted. Some say to transform the data at the very beginning of the study as subjects start to enroll. And some do it as needed as the study is enrolling, the data is being cleaned and the shells of the tables, listings and figures are in the process of creation. This is a great forum for experts in the field to give their opinion as to how and when to perform the transformation into CDISC format.
DS-031 : Ensuring Consistency Across CDISC Dataset Programming Processes
Jennifer Fulton, Westat
Whether you work for a small start-up, a mid-level CRO, or a Fortune 500 biopharma company, CDISC compliance is a daunting prospect at the start of any project and requires creative solutions and teamwork. Consistency is the hallmark of a CDISC project and was the impetus for the formation of the consortium. Each CDISC project should be approached consistently as well, leading to improved accuracy and productivity, regardless of staff skill and experience. This approach leads to a quality final product and FDA approval, the ultimate goal. Westat approached this goal by developing an overarching CDISC development and delivery checklist. It provides a visual of the scope of a CDISC project, organizes work instructions and templates for users, and helps assure critical steps are not missed. Like the 26 miles in a marathon, our paper will lay out 26 steps to CDISC compliance, along with tools and techniques we have developed, to help others reach the finish line.
DS-062 : Untangling the Subject Elements Domain
Christine McNichol, Covance
The Subject Elements (SE) domain is unique and challenging in its sources and mapping. Without much encouragement, a map of the sources, derivations and interaction between records in SE can start to look more like a tangled mass of spaghetti than common linear SDTM mapping. Why is this? SE can have multiple sources for the data points needed to derive each element’s start and end. Compared to other domains, SE also has a good deal more direct correlation to values in other domains. For successful implementation, it is critical to understand SE’s purpose and requirements, the unique mapping path from source to SE and how this differs from other common domains, the steps needed to successfully derive SE, and programming methods that can be used. This paper will explore the inner workings of SE and explain how to successfully create SE one manageable bite at a time.
DS-080 : Standardised MedDRA Queries (SMQs): Programmers Approach from Statistical Analysis Plan (SAP) to Analysis Dataset and Reporting
Sumit Pratap Pradhan, Syneos Health
Standardised MedDRA Queries (SMQs) are groupings of MedDRA terms, ordinarily at the Preferred Term level, that relate to a defined medical condition. SMQ information is provided by MedDRA in the form of SMQ files (SMQ_LIST, SMQ_CONTENT) and the Production SMQ Spreadsheet. One of these should be used to create a look-up table or SAS dataset, which is merged with the adverse event dataset to derive SMQ information in the analysis dataset, which in turn is used for reporting. The question is how this SMQ information should be implemented at the study level: How many SMQs will be involved? Is a Customized Query also involved? Which reports should be generated based on SMQs? Answers to these questions can be found in the SAP. This paper covers all of these topics step by step, with examples that will help even a novice programmer understand and implement SMQs at the study level. First, the basics of SMQs (Narrow and Broad Scope, SMQ Category, Algorithm Search, Hierarchical Structure) are explained. Next, detailed guidelines describe how a programmer can create a look-up table from the SMQ Spreadsheet or SMQ files. Then the variables capturing SMQ details as per the Occurrence Data Structure (OCCDS) version 1.0 are explained, followed by scenarios that help a user decide whether to use a Standardised MedDRA Query, a Customized Query, or both. Finally, an example SAP is shown to explain how to decode the SAP along with the SMQ implementation at the analysis dataset (ADAE) and reporting levels.
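The look-up-and-merge step can be sketched in Python as below. This is an illustration only, not the paper's SAS code: the SMQ names and terms are placeholders rather than real MedDRA content, the look-up layout is hypothetical, and the output variable names are modeled on the OCCDS SMQzzNAM / SMQzzSC naming pattern, which should be checked against the OCCDS guide.

```python
# Placeholder look-up table; real content would come from the SMQ files
# or the Production SMQ Spreadsheet.
SMQ_LOOKUP = [
    {"smq_name": "Example SMQ A", "pt": "TERM ONE", "scope": "NARROW"},
    {"smq_name": "Example SMQ A", "pt": "TERM TWO", "scope": "BROAD"},
    {"smq_name": "Example SMQ B", "pt": "TERM TWO", "scope": "NARROW"},
]

def add_smq(ae_rows, lookup, smq_name, index="01"):
    """Add SMQ name and scope variables to AE rows whose preferred term is in the SMQ."""
    scope_by_term = {
        r["pt"].upper(): r["scope"] for r in lookup if r["smq_name"] == smq_name
    }
    out = []
    for ae in ae_rows:
        scope = scope_by_term.get(ae["AEDECOD"].upper(), "")
        out.append({
            **ae,
            f"SMQ{index}NAM": smq_name if scope else "",  # blank when not in the SMQ
            f"SMQ{index}SC": scope,
        })
    return out

ae = [
    {"USUBJID": "001", "AEDECOD": "Term One"},
    {"USUBJID": "002", "AEDECOD": "Term Three"},
]
adae = add_smq(ae, SMQ_LOOKUP, "Example SMQ A")
```

A narrow-scope analysis would then keep only the rows where the scope variable equals `NARROW`.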
DS-082 : Implementation of Immune Response Evaluation Criteria in Solid Tumors (iRECIST) in Efficacy Analysis of Oncology Studies
Weiwei Guo, Merck
In recent years, immunotherapy has gained attention as one of the most promising types of cancer treatment on the horizon. Immunotherapy, also called biologic therapy, is a type of cancer treatment that boosts the body's natural defenses to fight cancer. While conventional RECIST criteria have served us well in evaluating chemotherapeutic agents, in immuno-oncology a small percentage of patients manifest a new response pattern termed pseudoprogression, in which, after the initial increase in tumor burden or after the discovery of new lesions, a response or at least a prolonged stabilization of the disease can occur. Tumors respond differently to immunotherapies compared with chemotherapeutic drugs, raising questions about the analysis of efficacy. Therefore, a novel set of anti-tumor assessment criteria, iRECIST, was published to standardize response assessment among immunotherapy clinical trials. In this paper, the difference between the RECIST and iRECIST assessment criteria is described first; then a step-by-step implementation of iRECIST in efficacy analysis in solid tumor oncology studies using investigator assessment (INV) is provided, starting from the data collection up to the final statistical analysis.
DS-109 : Impact of WHODrug B3/C3 Format on Coding of Concomitant Medications
Lyma Faroz, Seattle Genetics
Jinit Mistry, Seattle Genetics
The WHODrug dictionary is the industry standard for coding concomitant medications. As CDISC becomes more prevalent and strongly recommended by regulatory authorities for submission, the dictionary has evolved over time to ensure full compliance. It is maintained by the Uppsala Monitoring Centre (UMC), with updates provided to industry users twice every year, on 1st March and 1st September. The previous WHODrug B2/C formats have now been up-versioned to B3/C3, which makes WHODrug-coded data fully compliant with the expectations of regulatory authorities and brings heightened efficiency and other benefits to the industry. The length updates between the older and newer formats affect the mapping of coded concomitant medication data according to SDTM CM guidelines. In addition, per a notice in the Federal Register published by FDA in October 2017, the use of the B3 format is required in submissions of studies starting after 15th March 2019. Hence, it is critical for statistical programmers to learn and be aware of these updates and apply them in new studies. In this paper, we will describe how the WHODrug B3 and C3 formats relate to the U.S. FDA Data Standards Catalog, shed light on aspects relevant to statistical programmers receiving concomitant medication data in these formats, and illustrate efficient ways of handling them in the SDTM CM domain in full compliance with CDISC standards and regulatory submission expectations.
DS-110 : Demystifying SDTM OE, MI, and PR Domains
Lyma Faroz, Seattle Genetics
Sruthi Kola, SVU
CDISC’s SDTM IG is an extensive repository of domain metadata that helps organize clinical trial data into relevant and detailed classifications. With rapid advancements in new drug development, patients now have superior and expansive options for treatment. These innovations in medicine necessitate continual updates to the SDTM IG. However, study implementations may not always keep pace with these updates, thereby not fully utilizing valuable resources available through the IG. This paper highlights three such lesser-known SDTM domains which allow statistical programmers to more efficiently structure study data for downstream analysis and submission. We will also share sample CRFs as part of our case study on these domains.
Ophthalmic Examinations (OE): Added in SDTM IG v3.3, this is part of the Findings class. It contains assessments that measure ocular health and visual status to detect abnormalities in the components of the visual system and determine how well the person can see.
Microscopic Findings (MI): Also part of the Findings class, it holds results from the microscopic examination of tissue samples performed on a specimen which is prepared with some type of stain. An example is biomarkers assessed by histopathological examination.
Procedures (PR): This is part of the Interventions class. This domain stores details of a subject’s therapeutic and diagnostic procedures such as disease screening (e.g., mammogram), diagnostic tests (e.g., biopsy), imaging techniques (e.g., CT scan), therapeutic procedures (e.g., radiation therapy), and surgical procedures (e.g., diagnostic surgery).
DS-117 : CDISC-compliant Implementation of iRECIST and LYRIC for Immunomodulatory Therapy Trials
Kuldeep Sen, Seattle Genetics
Sumida Urval, Seattle Genetics
Yang Wang, Seattle Genetics
The current RECIST and LUGANO criteria are designed to assess efficacy of traditional chemotherapeutic regimens in solid tumor and lymphoma trials, respectively. They are less suitable to assess efficacy of regimens studied in immunotherapy trials, as these may cause tumor flares during treatment which can be associated with clinical and imaging findings suggestive of progressive disease (PD). As a result, without a more flexible interpretation, some patients in such trials might be prematurely removed from a potentially beneficial treatment, leading to underestimation of the true magnitude of the clinical benefit of the agent under investigation. For this reason, iRECIST and LYRIC guidelines were introduced in solid tumor and lymphoma immunotherapy trials, respectively. This paper focuses on the implementation of the additional response criteria introduced by these guidelines, namely “Unconfirmed PD (iUPD)” and “Confirmed PD (iCPD)” by iRECIST and “Indeterminate Response (IR)” by LYRIC. This paper will demonstrate how iUPD, iCPD, and IR data can be collected on the CRF, mapped into SDTM and ADaM, and reported efficiently. We will share our experience with challenges and solutions from an implementation perspective along this entire data flow, with emphasis on CDISC compliance and effective reporting.
DS-133 : Is Your Dataset Analysis-Ready?
Kapila Patel, Syneos Health
Nancy Brucken, Clinical Solutions Group
One of the fundamental principles of ADaM is that analysis datasets should be analysis-ready, which means that each item displayed on an output table, listing or figure can be generated directly from the dataset. The ADaM datasets produced by assuming a 1:1 relationship between SDTM and ADaM datasets may not be analysis-ready, especially if the actual analysis requirements have not been considered. This paper will provide several examples of how an SDTM domain can be split into more than one ADaM dataset to meet analysis needs, and show that the SDTM domain class does not have to drive the class of the resulting ADaM datasets.
DS-195 : Standardizing Patient Reported Outcomes (PROs)
Charumathy Sreeraman, Ephicacy Lifescience Analytics
Any health outcome directly reported by the subject in the trial is referred to as a Patient Reported Outcome (PRO). It is an addendum to the data reported by the investigator and/or study staff who are conducting the trial. Patient-reported data help in better understanding the subject’s perspective. In addition to describing physiological effects, they are critical in evaluating the safety and efficacy of the drug administered. Patient-reported data are typically collected through a subject diary. A subject diary, often called a patient diary, is a tool used during a clinical trial or a disease treatment to assess the patient's condition or to measure treatment compliance. The use of digitized patient-reported data is on the rise in today's health research setting. A subject diary can collect information about daily symptoms, daily activities, safety assessments, usage of the study medication to measure compliance, usage of concomitant medication, and disease episodes on a frequent basis. In this presentation, we will explore the standardisation of diary data with standard SDTM domains in different therapeutic areas.
DS-196 : Simplifying PGx SDTM Domains for Molecular biology of Disease data (MBIO).
Sowmya Srinivasa Mukundan, Ephicacy
Charumathy Sreeraman, Ephicacy Lifescience Analytics
Pharmacogenomics/genetics examines how the genetic makeup of an individual affects his/her response to drugs. It deals with the influence of acquired and inherited genetic variation on drug response in patients by correlating genetic expression with pharmacokinetics (drug absorption, distribution, metabolism and elimination) and pharmacodynamics (effects mediated through a drug's biological targets). The purpose of the SDTMIG-PGx is to provide guidance on the implementation of the SDTM for biospecimen and genetics-related data. The domains presented in the SDTMIG-PGx are intended to hold data that fall into one of three general categories: data about biospecimens, data about genetic observations, and data that define a genetic biomarker or assign it to a subject. The paper will shed some light on the mapping challenges encountered in MBIO data, illustrated with sample CRF pages.
DS-235 : Tackle Oncology Dose Intensity Analysis from EDC to ADaM
Song Liu, BeiGene
Cindy Song, BeiGene Corporation
Mijun Hu, BeiGene Corporation
Jieli Fang, BeiGene Corporation
Exposure analysis in oncology can be complicated when the dose is adjusted based on body weight changes from baseline, or when the treatment is administered on 2-5 consecutive days at different frequencies rather than as one infusion within one treatment cycle. This paper demonstrates how we mapped exposure data collected in EDC to SDTM.EX and how we built the ADaM exposure analysis data. Dose intensity involves some critical parameters: the planned dose intensity (mg/m2/cycle), which is the planned dose per cycle as defined in the protocol; and the actual dose intensity (ADI), defined as the total cumulative dose taken divided by the treatment duration. Without BSA being derived in EDC, it becomes complicated to standardize the dose taken within one cycle with EXDOSE (mg/m2/cycle) for a treatment that is given on several days and then paused for the remainder of the 21-day cycle. It becomes more challenging to check dose adjustment when weight changes by >=10% from the baseline, or from the previous weight used as the new baseline, without BSA in EDC. We requested BSA from the IRT system and merged it with the exposure data by visit to solve the problem. Since chemotherapy includes different treatments in this study, the treatment duration, total actual dose taken, and total treatment cycles are all unique and critical to derive dose intensity. For table programming efficiency, instead of using PARCAT for each treatment, we created five ADEXmed datasets, where med represents each treatment name, keeping the ADaM structure and variable names the same for each treatment's data.
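The two calculations named in the abstract, actual dose intensity and the 10% weight-change check, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' derivation: treatment duration is expressed in cycles, each administered dose is normalized by the BSA recorded for that administration, and all numbers are invented.

```python
def actual_dose_intensity(doses_mg, bsa_m2, duration_cycles):
    """Total cumulative BSA-normalized dose divided by treatment duration.

    Returns mg/m2 per cycle when duration is given in cycles (an assumption
    made for this sketch).
    """
    total = sum(dose / bsa for dose, bsa in zip(doses_mg, bsa_m2))
    return total / duration_cycles

def weight_change_flag(baseline_kg, current_kg, threshold=0.10):
    """True when weight differs from the reference weight by the threshold or more."""
    return abs(current_kg - baseline_kg) / baseline_kg >= threshold

# Three administrations over three cycles, with BSA taken from another source.
adi = actual_dose_intensity([200.0, 200.0, 150.0], [2.0, 2.0, 2.0], duration_cycles=3)
needs_review = weight_change_flag(baseline_kg=70.0, current_kg=78.0)
```

Here the cumulative dose is 275 mg/m2 over 3 cycles, so `adi` is about 91.7 mg/m2/cycle, and the weight gain of roughly 11% sets the flag.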
DS-248 : Best practices for annotated CRFs
Amy Garrett, Pinnacle 21
There is no doubt that the SDTM annotated CRF (aCRF) is one of the most cumbersome submission documents to create. Once a purely manual task, the extreme burden required to create the aCRF has led to several novel methods to automate or partially automate the process. As the industry moves away from manually annotating CRFs and towards automation, it’s more important than ever to truly understand the properties of a high-quality aCRF. This paper reviews published guidance from regulatory agencies and provides best practices for CRF annotations. Following these best practices will ensure your aCRF fulfills current regulatory requirements, as well as meets the needs of internal users and programs.
DS-261 : SUPPQUAL datasets: good bad and ugly
Sergiy Sirichenko, Pinnacle 21
SUPPQUAL datasets were designed to represent non-standard variables in SDTM tabulation data. There have been many recent discussions about whether the SDTM Model should allow the addition of non-standard variables directly to General Observation Class domains instead of using SUPPQUAL datasets. However, there is still a lack of implementation metrics across the industry to understand actual utilization of SUPPQUAL datasets. In this presentation we will summarize metrics from many studies and different sponsors to produce an overall picture of how the industry uses SUPPQUAL datasets. We will analyze commonly used SUPPQUAL information as candidates for promotion to standard SDTM variables. We will also provide and discuss examples of correct and incorrect utilization of SUPPQUAL datasets in submission data to understand whether the industry is ready to switch from SUPPQUAL datasets to non-standard variables.
DS-329 : Overcoming Pitfalls of DS: Shackling 'the Elephant in the Room'
Soumya Rajesh, SimulStat
Michael Wise, Syneos Health
Disposition (DS) is a standard SDTM domain that has been around since the inception of SDTM. Although familiar, it has often been misinterpreted or misused. Unlike other SDTM domains, direct mapping from CRF pages presents challenges within DS. For example, CRF values may not be a perfect fit for the terms defined in controlled terminology code-lists, especially as seen in 'End of Study' or 'End of Treatment' pages. When a code-list does not have an exact match with the CRF text, you may need to request NCI to extend the code-list or add new terms. This, however, may create problems because not every 'new term' should extend a code-list. Also, it’s important to understand the differences between criteria so that DSCAT / DSDECOD values are assigned appropriately. From an annotation perspective, this means that if the values in a variable are 'assigned', the variable should not be annotated on the CRF. This paper will guide you through the mapping of CRF pages to DS, and illustrate how to choose appropriate controlled terminology for variables like DSCAT, DSDECOD and EPOCH, so that you can overcome the pitfalls of DS and shackle that 'elephant in the room'.
DS-344 : Trial Sets in Human Clinical Trials
Fred Wood, Data Standards Consulting Group
The Trial Sets table has been included in the SDTM since Version 1.3, published in 2012. The only implementation guide in which it appears, however, is the SEND Implementation Guide (SENDIG). The Trial Sets dataset (TX) allows for the subsetting of subjects within an Arm (treatment path) and facilitates the “grouping” of multiple Arms together. A Trial Set represents the most granular subdivision of all the experimental factors, treatment factors, inherent characteristics, and distinct sponsor designations as specified in the design of the study. Within a nonclinical trial, each animal is assigned to a Set in addition to an Arm. The Set Code (SETCD) variable is Required in the SEND DM dataset. While there is no such requirement in the SDTMIG DM dataset, Trial Sets has potential uses in human clinical trials, particularly when the randomization or the study design is based on factors other than treatment (e.g., subjects who have undergone previous heart surgery vs. those who have not). This presentation will provide an introduction to Trial Sets as it’s used in nonclinical studies as well as examples of how this dataset could be used in human clinical trials.
DS-365 : Creating SDTMs and ADaMs CodeList Lookup Tables
Sunil Gupta, TalentMine
Do you need to review and confirm codelist values for variables in SDTMs and ADaMs? Codelists are lists of unique values for key variables such as LBTEST, AVISIT and AVISITN. Codelists need to be cross-referenced with controlled terms. Codelist dictionary compliance checks are very important but are often neglected. Since all raw data is now standardized to controlled terms, there are many opportunities for cross-checking SDTMs and ADaMs against codelist dictionary tables. In addition, the define.xml file must have a correct and updated codelist section. This presentation shows how to automatically create a codelist dictionary across all ADaMs and SDTMs, and how to compare codelist dictionaries from SDTMs against define.xml specifications. Both examples are essential to meet CDISC compliance. Note that without an automated process to create codelist dictionaries, the alternative method of applying PROC FREQ to all categorical variables is very time consuming.
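The idea of a codelist dictionary and its comparison against a specification can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS process the presentation describes. The function names, the in-memory data layout (dataset name mapped to a list of row dictionaries), and the sample values are all hypothetical.

```python
from collections import defaultdict


def build_codelist_dictionary(datasets, variables):
    """Collect the sorted unique non-missing values of each variable
    across all datasets (dataset name -> list of row dicts)."""
    found = defaultdict(set)
    for rows in datasets.values():
        for row in rows:
            for var in variables:
                value = row.get(var)
                if value not in (None, ""):
                    found[var].add(value)
    return {var: sorted(values) for var, values in found.items()}


def compare_to_spec(observed, spec):
    """Return, per variable, the observed values missing from the
    specification codelist."""
    issues = {}
    for var, values in observed.items():
        extra = sorted(set(values) - set(spec.get(var, [])))
        if extra:
            issues[var] = extra
    return issues


# Hypothetical data and specification codelists.
datasets = {
    "LB": [{"LBTEST": "Glucose"}, {"LBTEST": "Insulin"}],
    "ADLB": [{"LBTEST": "Glucose", "AVISIT": "Week 1"}],
}
spec = {"LBTEST": ["Glucose"], "AVISIT": ["Week 1"]}

observed = build_codelist_dictionary(datasets, ["LBTEST", "AVISIT"])
issues = compare_to_spec(observed, spec)
```

Here `issues` flags `Insulin` as a value present in the data but absent from the specification's LBTEST codelist.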
Data Visualization and Reporting
DV-004 : Library Datasets Summary Macro %DATA_SPECS
Jeffrey Meyers, Mayo Clinic
The field of clinical research often involves sharing data with other research groups and receiving data from other research groups. This creates the need to have a quick and concise way to summarize incoming or outgoing data that allows the user to get a grasp of the number of datasets, number of variables, and number of observations included in the library as well as the specifics of each variable within each dataset. The CONTENTS procedure can fulfill this role to an extent, but the DATA_SPECS macro uses the REPORT procedure along with the Excel Output Delivery System (ODS) destination to create a report that is fine tuned to summarize a library. The macro produces a one page overview of the datasets included in the specified library, and then creates a new worksheet for each dataset that lists all of the variables within that dataset along with their labels, formats, and a short distribution summary that varies depending on variable type. This gives the user an overview of the data that can be used in documents such as data dictionaries, and demonstrates an example of the powerful reports that can be generated with the ODS Excel destination.
DV-006 : What's Your Favorite Color? Controlling the Appearance of a Graph
Richann Watson, DataRich Consulting
The appearance of a graph produced by the Graph Template Language (GTL) is controlled by Output Delivery System (ODS) style elements. These elements include fonts, line and marker properties, and colors. A number of procedures, including the Statistical Graphics (SG) procedures, produce graphics using a specific ODS style template. This paper provides a very basic background of the different style templates and the elements associated with them. Sometimes the default style associated with a particular destination does not produce the desired appearance. Instead of using the default style, you can control which style is used by indicating the desired style on the ODS destination statement. However, even the 50-plus styles provided by SAS® may not achieve the desired look. Luckily, you can modify an ODS style template to meet your own needs. One such style modification is to control what colors are used in the graph. Different approaches to modifying a style template to specify the colors used are discussed in depth in the paper.
DV-009 : Great Time to Learn GTL: A Step-by-Step Approach to Creating the Impossible
Richann Watson, DataRich Consulting
Output Delivery System (ODS) graphics, produced by SAS® procedures, are the backbone of the Graph Template Language (GTL). Procedures such as the Statistical Graphics (SG) procedures dynamically generate GTL templates based on the plot requests made through the procedure syntax. For this paper, these templates will be referenced as procedure-driven templates. GTL generates graphs using a template definition that provides extensive control over output formats and appearance. Would you like to learn how to build your own template and make customized graphs and how to create that one highly desired, unique graph that at first glance seems impossible? Then it’s a Great Time to Learn GTL! This paper guides you through the GTL fundamentals while walking you through creating a graph that at first glance appears too complex but is truly simple once you understand how to build your own template.
DV-050 : Oncology Graphs-Creation (Using SAS and R), Interpretation and QA
Taniya Muliyil, Bristol Myers Squibb
Data visualization plays a key role in analyzing and interpreting data. Oncology graphs help to visualize, interpret and analyze trends in data from a statistical perspective. Graphical outputs help in exploring data and identifying issues with data, and in turn help to improve data quality. The statistical software most commonly used for creating oncology graphs in the pharmaceutical industry is SAS and R. Programmers create complex graphical outputs using these statistical tools, but many are unable to interpret the results that these graphs display. The ability to interpret results helps to identify issues with the graphs or the data used to generate these outputs. This in turn also helps to verify these graphical outputs. This paper focuses on creating some common oncology graphs, such as the spider plot, swimmer plot and waterfall plot, using R and SAS, along with interpreting the results displayed by these graphs. It also discusses common QA findings; awareness of them reduces issues while generating these outputs and in turn helps with statistical interpretation and analysis.
DV-057 : R for Clinical Reporting, Yes – Let's Explore It!
Hao Meng, Seattle Genetics Inc.
Yating Gu, Seattle Genetics, Inc.
Yeshashwini Chenna, Seattle Genetics
RStudio® is an interpreted programming language-based software application which can be an ideal platform for statistical analysis and data visualization. For biostatisticians and programmers in the pharmaceutical and biotech industry, it offers a wide and rapidly growing range of user-developed packages containing functions which can efficiently manipulate complex data sets and create tables, figures, and listings. While SAS® remains a critical tool in these industries, the popularity of RStudio® in both academia and the clinical industry increased exponentially over the last decade due to its free availability, easy access, flexibility, and efficiency. As RStudio® is gaining popularity and given that regulatory agencies have not endorsed any particular software for clinical trial analysis and submission, understanding the competency of RStudio® and being well-positioned to use it in a clinical data reporting environment is a worthy endeavor. This paper introduces a pragmatic application of RStudio® by describing an actual use case of clinical trial data manipulation and export (e.g., SDTM and ADaM to XPORT format), creation of tables, figures, and listings, and a simulation use case. We will share the RStudio® packages used as well as pros and cons between RStudio® and SAS®. This paper also provides a brief background to the RStudio® platform and our programming environment set-up, along with relevant statistical programming details.
DV-066 : Simplifying the Derivation of Best Overall Response per RECIST 1.1 and iRECIST in Solid Tumor Clinical Studies
Xiangchen (Bob) Cui, Alkermes, Inc
Sri Pavan Vemuri, Alkermes
The objective tumor response rate (ORR) is one of the endpoints in solid tumor clinical studies per the FDA guideline. It plays a critical role in earlier-phase oncology studies. It is based on the best overall response (BOR), which is defined as the best response across all time-point responses. Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST 1.1) has been quickly adopted since 2009, and iRECIST has been recommended for use in trials testing immune therapeutics by the RECIST working group since March 2017. These two guidelines are applied to derive the overall response at each time point (both on-treatment and follow-up). For non-randomized trials, both RECIST 1.1 and iRECIST require the confirmation of complete response (CR) and partial response (PR). Moreover, progressive disease (iUPD) needs confirmation under iRECIST. These confirmations pose challenges in statistical programming during the derivation of BOR. This paper presents a new approach to overcome these challenges by illustrating the logic and data flow for the derivation of BOR. The new technique simplifies the process and ensures technical accuracy and quality. Furthermore, traceability for the derivation of the date of Best Overall Response (BOR) and its result is also built into this “simple” process. The hands-on experience shared in this paper is intended to help readers apply this methodology to prepare an ADaM dataset for the reporting of ORR to further support the clinical development of cancer drugs and biologics.
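The definition of BOR as "the best response across all time-point responses" can be illustrated with a deliberately simplified Python sketch. It is not the authors' method and does not implement the confirmation rules for CR, PR, or iUPD that the paper addresses. The ranking CR > PR > SD > PD > NE, the rule of ignoring assessments after the first PD, and the function name are assumptions made for the example.

```python
# Assumed ordering of time-point responses, best to worst.
RESPONSE_RANK = {"CR": 4, "PR": 3, "SD": 2, "PD": 1, "NE": 0}


def best_overall_response(timepoint_responses):
    """Return the best unconfirmed response among the time points up to
    and including the first PD. Confirmation rules are NOT applied."""
    considered = []
    for resp in timepoint_responses:
        if resp not in RESPONSE_RANK:
            raise ValueError(f"unknown response: {resp!r}")
        considered.append(resp)
        if resp == "PD":
            break  # assessments after progression are ignored in this sketch
    if not considered:
        return "NE"
    return max(considered, key=RESPONSE_RANK.get)


# Hypothetical subject: SD, then PR, then progression.
bor = best_overall_response(["SD", "PR", "PD", "CR"])
```

A production derivation would add the confirmation windows and minimum SD duration defined in the study's statistical analysis plan.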
DV-132 : Automation of Flowchart using SAS
Xingshu Zhu, Merck
Bo Zheng, Merck
Flowchart diagrams are commonly used in clinical trials because they provide a direct overview of the process, including participant screening, recruited patient enrollment status, demographic information, lab test results, etc. Flowcharts are particularly useful when working on pharmacoepidemiology studies, which require dynamic study designs due to the unpredictability and variation in data sources. In this paper, we introduce a simple SAS macro that allows users to create customized flowchart diagrams that fit a customer’s individual needs by selecting between two different methods and applying them to various types of source data.
DV-135 : Making Customized ICH Listings with ODS RTF
Huei-Ling Chen, Merck
William Wei, Merck & Co, Inc.
ICH (International Council for Harmonisation) data listings are common reports prepared by pharmaceutical companies for regulatory submissions. Such listings are often submitted in RTF format. This paper presents a technique that can be used to efficiently produce a customized ICH Abnormal Lab Listing based on a company’s uniform RTF output standards. The technique has five components: data preparation, plus four SAS® features (SYSTEM OPTIONS, ODS TAGSETS.RTF, PROC TEMPLATE style, and PROC REPORT) that are used to define the layout and render a data listing table. This paper will first describe the problem that we are trying to solve and then give details on the ODS TAGSETS.RTF options that we have chosen to solve this problem with mock data.
DV-148 : Butterfly Plot for Comparing Two Treatment Responses
Raghava Pamulapati, Merck
A butterfly plot is an effective graphical representation for comparing two treatment responses for the same subject across different time points. Butterfly plots are drawn across a centered axis to separate two treatment responses in one picture. Butterfly plots accommodate a subject’s two treatment responses in one plot, such as current treatment response with prior study/non-study treatment response, active treatment response with control treatment response, or monotherapy response with combination therapy response. Butterfly plots can be created using SAS® PROC SGPLOT. The inclusion of the HBAR statement creates horizontal bars on the Y-axis. Each bar represents one subject. The RESPONSE option displays the duration of study medications on the X-axis. The FILLATTRS option categorizes the response in colors by using variables that are derived for each treatment and response type, with the corresponding description displayed in the plot legend using the KEYLEGEND statement. More specific details on the PROC SGPLOT syntax and plot options will be presented in the body of the paper. Furthermore, steps to derive required variables and dataset pre-processing to categorize response will be discussed.
DV-157 : Automating of Two Key Components in Analysis Data Reviewer’s Guide
Shunbing Zhao, Merck & Co.
Jeff Xia, Merck
The Analysis Data Reviewer’s Guide (ADRG) provides reviewers with additional context for analysis datasets received as part of a regulatory submission. It is crucial to submit a clear, concise and precise ADRG. The ADRG consists of seven sections (1. Introduction, 2. Protocol Description, 3. Analysis Considerations Related to Multiple Analysis Datasets, 4. Analysis Data Creation and Processing Issues, 5. Analysis Dataset Descriptions, 6. Data Conformance Summary, 7. Submission of Programs) and optional appendices. In sections 4 and 5, FDA reviewers highly recommend including two key components: a graph showing data dependencies and a table that identifies efficacy and primary outcome datasets. This paper presents three useful SAS macros to automatically create these two key components for the ADRG: the first macro generates the data dependency graph by using SAS Graph Template Language (GTL); the second macro generates the table of “Analysis Dataset Descriptions” in the ADRG; and the third macro automatically inserts the generated data dependency graph and the table of “Analysis Dataset Descriptions” into the right place in the ADRG. This approach removes a few trivial steps from the ADRG generation process that previously had to be done manually. It helps to create a clear, concise and precise ADRG.
DV-158 : Enhanced Visualization of Clinical Pharmacokinetics Analysis by SAS GTL
Min Xia, PPD
Graphs are essential tools to comprehensively represent data and improve the readability of an analysis. They are an integral part of Pharmacokinetic (PK) analysis, and high-quality PK graphs are very important for the Clinical Study Report (CSR) and a key task for regulatory submissions. This paper provides instructions to create PK analysis graphs powered by SAS Graph Template Language (GTL), such as multi-panel series plots and multi-column forest plots. In addition, DYNAMIC variables and other advanced techniques are introduced to increase the flexibility of GTL and the accuracy of the data visualization.
DV-163 : plots and story with diabetes data
Yida Bao, Auburn University
Zheran Rachel Wang, Auburn University
Jingping Guo, Conglomerate company
Philippe Gaillard, Auburn University
A graph has always been more intuitive than cold statistics. SAS gives us quite powerful graphing capabilities. In this project, we use diabetes datasets to do data visualization research. Diabetes can generally be detected from several indicators: glucose, insulin, BMI and so on. Our research contains three parts. First, we apply the SAS® procedure PROC GPLOT to create line diagrams, which help us to explore the inner structure between different factors. We will introduce two different methods to generate overlay plots based on the binary detection result. We will also apply SAS Enterprise Miner to perform principal component analysis, which helps to illuminate the project. Next, we’ll use a typical discriminant method to convert the dataset into several canonical variables and generate a suitable plot to express the result. Finally, we’ll summarize the results to establish an information hierarchy and share our understanding of the diabetes dataset with other researchers.
DV-164 : Using R Markdown to Generate Clinical Trials Summary Reports
Radhika Etikala, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Xuehan Zhang, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports describing the results of data analyses. There should be a clear and automatic path from data and code to the final report. R Markdown is ideal for this, since it is a system for combining code and text into a single document. I’ve found that R Markdown is an efficient, user-friendly tool for producing reports that do not need constant updating. RStudio is often used in the pharmaceutical industry and health care for analysis and data visualization, but it can also be successfully used for creating reports and datasets for submission to regulatory agencies. This paper presents an RStudio program that demonstrates how to use R Markdown to generate a statistical table showing adverse events (AE) by system organ class (or preferred term) and severity grade, along with text that explains the table. Collecting AE data and performing analysis of AEs is a common and critical part of the clinical trial world. A well-developed reporting system like the one generated with R Markdown provides a solid foundation and an efficient approach towards a better understanding of what the data represent.
DV-166 : An Introduction to the ODS Destination for Word
David Kelley, SAS
The SAS® Output Delivery System (ODS) destination for Word enables customers to deliver SAS® reports as native Microsoft Word documents. The ODS WORD statement generates reports in the Office Open XML Document (.docx) format, which has been standard in Microsoft Word since 2007. The .docx format uses ZIP compression, which makes for a smaller storage footprint and speedier downloading. ODS WORD is preproduction in the sixth maintenance release of SAS 9.4. This paper shows you how to make a SAS report with ODS WORD. You learn how to create your report's content (images, tables, and text). You also learn how to customize aspects of your report's presentation (theme and styles). And you learn how to enhance your report with reader-friendly features such as a table of contents and custom page numbering. If you're cutting and pasting your SAS output into Microsoft Word documents, then this paper is especially for you!
DV-198 : r2rtf – an R Package to Produce Rich Text Format Tables and Figures
Siruo Wang, Johns Hopkins Bloomberg School of Public Health
Keaven Anderson, Merck & Co., Inc., Kenilworth, NJ, USA
Yilong Zhang, Merck & Co., Inc., Kenilworth, NJ, USA
In drug discovery, research and development, the use of open-source R is evolving for study design, data analysis, visualization, and report generation across many fields. The ability to produce customized rich text format (RTF) tables in the R platform becomes crucial to complement analyses. We developed an R package, r2rtf, that standardizes the approach to generating highly customized tables, listings, and figures (TLFs) in RTF format. The r2rtf R package provides flexibility to customize table appearance for table title, subtitle, column header, footnote, and data source. The table size, border type, color, and line width can be adjusted in detail, as well as column width, row height, text format, font size, text color, alignment, etc. The control of the format can be row or column vectorized by leveraging the vectorization in R. Furthermore, r2rtf provides pagination, section grouping, and concatenation of multiple tables for complicated table layouts. In this paper, we give an overview of the r2rtf workflow with three required and four optional easy-to-use functions. Code examples are provided to create customized RTF tables with highlighted features in drug development.
DV-283 : Programming Technique for Line Plots with Superimposed Data Points
Chandana Sudini, Merck & Co., Inc
Bindya Vaswani, Merck & Co., Inc
Line graphs are mainly plotted by connecting associated data points over a specified time interval to portray an overall trend. Despite the existence of many visualization methods and techniques, line plots continue to be a simple way of displaying quantitative data patterns in exploratory analyses. Line plots can be used to present either summary statistics, such as the mean and standard deviation of a population, or individual subject-level data over a time period. In a scenario where we need to understand the impact of concomitant medication (CM) on the laboratory measurements (LB) for each subject, presenting line plots with subject-level data from those two sources can be extremely challenging, since the y-axis value at the time of CM administration may be unknown. In order to accomplish this, we propose using the properties of a straight line to predict the potential y-value at the time of the CM administration. We can derive these predicted values by programmatically fitting a coordinate between the LB data points before and after the CM administration (using the mathematical concepts of the slope and intercept of a straight line). Once the predicted values are calculated, plugging those values into the annotation data step will enable us to generate line plots with superimposed data points, resulting in a meaningful representation of the data being analyzed. This paper details the SAS logic required to generate these line plots.
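The straight-line prediction described in this abstract amounts to linear interpolation between the two lab measurements that bracket the CM administration time. The Python sketch below illustrates that arithmetic only; the paper itself uses SAS, and the function name, argument names, and sample values here are assumptions.

```python
def interpolate_lab_value(t_before, y_before, t_after, y_after, t_cm):
    """Predict the lab value at CM time `t_cm` on the straight line through
    (t_before, y_before) and (t_after, y_after)."""
    if t_after == t_before:
        raise ValueError("the two lab time points must differ")
    if not (t_before <= t_cm <= t_after):
        raise ValueError("CM time must lie between the two lab time points")
    slope = (y_after - y_before) / (t_after - t_before)
    intercept = y_before - slope * t_before
    return slope * t_cm + intercept


# Hypothetical subject: lab values 4.0 on day 0 and 6.0 on day 4, CM on day 2.
predicted = interpolate_lab_value(0, 4.0, 4, 6.0, 2)
```

The predicted value gives the y-coordinate at which the CM marker can be superimposed on the subject's lab line.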
DV-299 : Effective Exposure-Response Data Visualization and Report by Combining the Power of R, SAS programming and VBScript
Shuozhi Zuo, Regeneron Pharmaceuticals
Hong Yan, Regeneron Pharmaceuticals
Understanding the relationship between exposure and response is critical to finding a dose that optimally strikes a balance between drug efficacy and adverse events; therefore, comprehensive Exposure-Response (ER) analyses are needed throughout all phases of clinical trials. This type of analysis may be planned, ad hoc or exploratory, and it requires high-quality data visualization and fast turnaround for dose selection, phase 2 decisions, regulatory submissions, etc. This paper introduces an efficient way to produce ER analysis figures by combining the power of the R language in data visualization with SAS programming in data processing, dynamic data exchange and statistical procedures. We will use logistic regression analysis as an example, explaining how data quality and accuracy are maintained in SAS and how R functions and code are connected to generate multiple figures by different exposure parameters and endpoints. This paper will also describe the process in detail from input datasets to final outputs, including reading SAS datasets into R, implementing R functions like SAS macros, combining the ggplot2 package with statistical analysis, reading titles and footnotes from an Excel sheet for each output automatically, applying the RTF package and VBScript in R for both RTF and PDF format, and batch running all the R programs at once to update the multiple results.
DV-350 : Dressing Up your SAS/GRAPH® and SG Procedural Output with Templates, Attributes and Annotation
Louise Hadden, Abt Associates Inc.
Enhancing output from SAS/GRAPH® has been the subject of many a SAS® paper over the years, including my own and those written with co-authors. The more recent graphic output from PROC SGPLOT and the recently released PROC SGMAP is often "camera-ready" without any user intervention, but occasionally there is a need for additional customization. SAS/GRAPH is a separate SAS product for which a specific license is required, and newer SAS maps (GfK Geomarketing) are available with a SAS/GRAPH license. In the past, along with SAS/GRAPH maps, all mapping procedures associated with SAS/GRAPH were only available to those with a SAS/GRAPH license. As of SAS 9.4 M6, all relevant mapping procedures have been made available in BASE SAS, which is a rich resource for SAS users. This paper and presentation will explore new opportunities within BASE SAS for creating remarkable graphic output, and compare and contrast techniques in both SAS/GRAPH such as PROC TEMPLATE, PROC GREPLAY, PROC SGRENDER, and GTL, SAS-provided annotation macros and the concept of "ATTRS" in SG procedures. Discussion of the evolution of SG procedures and the myriad possibilities offered by PROC GEOCODE's availability in BASE SAS will be included.
Hands-On Training
HT-111 : YO.Mama is Broke 'Cause YO.Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS® Data Sets Through Exception Handling Routines
Troy Hughes, Datmesis Analytics
Exception handling routines describe the processes that can autonomously, proactively, and consistently identify and respond to threats to software reliability, by dynamically shifting process flow and by often notifying stakeholders of perceived threats or failures. Especially where software (including its resultant data products) supports critical infrastructure, has downstream processes, supports dependent users, or must otherwise be robust to failure, comprehensive exception handling can greatly improve software quality and performance. This text introduces Base SAS® defensive programming techniques that identify when data sets are missing, incorrectly formatted, incompletely populated, or otherwise invalid. The use of &SYSCC (system current condition), &SYSERR (system error), and other SAS automatic macro variables is demonstrated, including the programmatic identification of warnings and runtime errors that eliminate the necessity for SAS practitioners to routinely and repeatedly check the SAS log to evaluate software status.
HT-139 : Why You are Using PROC GLM Too Much (and What You Should Be Using Instead)
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Peter Flom, Peter Flom Consulting
It is common knowledge that the general linear model (linear regression and ANOVA) is one of the most commonly used statistical methods. However, the analytical problems that we encounter often violate the assumptions of this model type, leading to its inappropriate implementation. Lucky for us, modern modeling techniques have been created to overcome these violations and provide better results, which has resulted in the development of specialized SAS PROCs to assist with their implementation. These include: Quantile regression, Robust regression, Cubic splines and other forms of splines, Multivariate adaptive regression splines (MARS), Regression trees, Multilevel models, Ridge Regression, LASSO, and Elastic Nets, among other methods. Covered PROCs include QUANTREG, ROBUSTREG, ADAPTIVEREG and MIXED. This workshop will begin with a brief refresher on regression, including a discussion of the assumptions of the GLM and ways of diagnosing violations. It is designed with the assumption that attendees have a working knowledge of linear regression with PROC GLM.
Leadership Skills
LS-008 : Are you Ready? Preparing and Planning to Make the Most of your Conference Experience
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.
Whether you are a first-time conference attendee or an experienced conference attendee, this paper can help you in getting the most out of your conference experience. As long-standing conference attendees and volunteers, we have found that there are some things that people just don’t think about when planning their conference attendance. In this paper we will discuss helpful tips such as making the appropriate travel arrangements, what to bring, networking and meeting up with friends and colleagues, and how to prepare for your role at the conference. We will also discuss maintaining a workplace presence with your paying job while at the conference.
LS-016 : One Boys’ Dream: Hitting a Homerun in the Bottom of the Ninth Inning
Carey Smoak, S-Cubed
One boy’s dream of hitting a homerun in the bottom of the ninth inning has been realized in my career. My career started out as an epidemiologist in academia. My SAS® skills were pretty basic back then. My SAS skills advanced tremendously as I transitioned to working as a statistical SAS programmer in the pharmaceutical and medical device industries. My career has varied from strictly working as a statistical SAS programmer to managing statistical SAS programmers. My interest in statistics began with my interest in baseball. Little did I realize that my interest in statistics as a teenager would lead to a fulfilling career and, thus, fulfill my childhood dream.
LS-037 : Microsoft OneNote: A Treasure Box for Managers and Programmers
Jeff Xia, Merck
Microsoft OneNote has many functionalities that are helpful for managers and lead programmers, who have increasing responsibility for managing project deliverables as well as understanding staff availability to ensure quality deliverables that comply with department SOPs. Additionally, managers and leads have the responsibility to keep upper management informed of the overall status of the group's operation, including project progress, success stories, challenges and potential problems within the group. To perform their daily operations effectively with so many responsibilities in different directions, it is essential for managers and programming leads to find an efficient way to organize the necessary information so that they can locate files and information quickly to resolve any unexpected issues. Microsoft OneNote is a tool that can serve this purpose. This paper briefly introduces some key features of Microsoft OneNote as well as the hierarchy of Notebook, Section and Page within OneNote. It also provides three examples of using OneNote to organize information in different categories as the manager or lead programmer of a statistical SAS programming group in the pharmaceutical industry.
LS-059 : An Effective Management Approach for a First-Time Study Lead
Himanshu Patel, Merck & Co.
Jeff Xia, Merck
Becoming a first-time study lead from a programming study team member is a significant step that comes with various challenges. Perseverance plays a substantial role in the development process and helps to overcome these challenges. Programming of tables, listings, graphs, and datasets is mostly a logical and technical task, whereas leading the study involves additional management tasks. The successful accomplishment of any project is determined by the technical and management approach used by the study lead. A lead can adopt different management approaches, which include technical and leadership skills, effective communication, team motivation, stakeholder relationships, proper planning, and risk management. These factors allow study leads to create a committed and robust team that is dedicated to achieving the best result. This paper illustrates certain key features of an effective management approach that can be implemented to ensure effective and robust study management. This may be relevant and helpful to those hoping to be a study lead or those who recently took on study lead responsibilities for the first time.
LS-185 : Leadership Lessons from Start-ups
Siva Ramamoorthy, Ephicacy Lifesciences Analytics
Over the last two decades, we have seen an extraordinary growth of start-ups and start-up ecosystems. In countries like the US, China, and India, ‘billion dollar plus valuation’ unicorns, successful entrepreneur role models, and enormous media buzz are center stage. Start-ups are characterized by enormous zeal about bringing a new idea to market and making a difference in the world. Such start-ups often begin with a single idea or a single solution to an existing problem and with zero investment. Still, they attract hundreds and thousands of dreamy-eyed employees who work for free with a promise of deferred future wealth. The failure rate of such start-ups is as high as 90%, yet employees are happy to take the risk. Besides being strongly motivated, start-up employees are enormously productive and bring innovative ideas at rapid speeds. How is it that start-ups have such passionate, committed employees? How is it that their employees excel in applying innovation? In the life sciences space, we have an equal and perhaps larger opportunity to make a difference in people’s lives. We create solutions that enable better medicines to be created. Like start-ups, we have an opportunity to get our employees passionately involved in the work by communicating clearly the enormous impact of their work. In this paper I articulate three categories of leadership lessons from the start-up ecosystem for the life sciences industry, namely: (i) Systemic Innovation Management, following a winning way of qualifying innovative ideas and building them into products/solutions; (ii) People Management Lessons; (iii) Velocity.
LS-265 : Building a Strong Remote Working Culture – Statistical Programmers Viewpoint
Ravi Kankipati, Nola Services
Prasanna Sondur, Ephicacy LifeScience Analytics
Remote working, or telecommuting, has become more of a norm than an exception. For an offshore Functional Service Provider, extending this option to employees whilst maintaining the status quo is a challenge in itself. Companies situated offshore are embracing this form of flexible hiring to cast a wider net or to retain existing talent. Only a handful of companies in the statistical programming sphere have been successful in having a fully remote workforce. Statistical programming is a diverse field involving frequent discussions with fellow programmers and client Points of Contact (PoCs) within the Statement of Work (SOW). Strong liaison with data management staff, statisticians and medical writers is expected. Remote working thus poses a challenge for effective communication within and between global teams, for example when communicating timelines of each deliverable, data issues, validation comments and work transitions. The authors provide their thoughts on overcoming such challenges and on building a strong, transparent working culture. Other common issues like upgrading skill sets, time management and work-life balance are also discussed. The authors, one with most experience onsite and the other offshore, share their own candid experiences of the statistical programmer journey in the current remote working realm.
LS-297 : The Art of Work Life Balance.
Darpreet Kaur, Statistical Programmer
We are the healthcare industry, engaged in putting the pieces together to make this world a little healthier, a little happier. In the process, we develop lifestyles that might not be the best for our own health and happiness. What can we, as professionals, do and what steps can employers take to ensure a healthy and happy workforce? How can employers offer better work life balance? 1-Access to exercise/pay for gym memberships. 2-Remote work/flexible hours. 3-Create opportunities for casual mingling. 4-Offer good healthcare and retirement benefits. 5-Continuing education and learning opportunities. 6-Offer volunteering opportunities/community engagements. 7-Team building events. 8-Mandatory vacation policy. 9-Gratitude towards the employees and encouragement to do better. 10-Regular one on one meetings to address any challenges at work. How can employees ensure better work life balance? 1-Set realistic goals and priorities. 2-Take time off for physical and mental needs. 3-Make physical, mental and emotional health a priority. 4-Exercise and eat healthy. Drink enough water. 5-Take short breaks to improve productivity. 6-Give social media and technology a break. 7-Set boundaries and work hours. 8-Ask for help if you are stuck at work. 9-Invest in kitchen gadgets that save time. 10-Reconsider your job profile. There is no definition of an ideal work life balance. We all have different goals and different expectations from our lives and work, so my idea of work life balance is different from yours. Balance is a very personal thing and only you can decide what lifestyle choices suit you the best.
LS-359 : Leading without Authority: Leadership At All Levels
Janette Garner, MyoKardia, Inc.
The idea of what it means to be a leader takes on different meanings for each person and can vary across situations. One image is that of a manager directing a team under tight timelines. Another example would be a team member who proposes a solution to a novel problem. At its core, leadership is the process of driving influence from others to achieve a goal. This does not require seniority, organizational hierarchy, titles or personal attributes, though these are often conflated. Anyone can be a leader, including those without direct reports. This paper examines different leadership styles and contains numerous examples of how an individual contributor can make an impact in an organization, both large and small.
LS-364 : Project Metrics- a powerful tool that supports workload management and resource planning for Biostats & Programming department.
Jian Hua (Daniel) Huang, BMS
Rajan Vohra, BMS
Andy Chopra, BMS
For a pharmaceutical company that has a large pipeline (i.e. several hundred ongoing studies) and a large Biostats & Programming team (i.e. > 200 people), it can be challenging for the management team to do workload management and resource planning efficiently. My team created a “project metrics” tool that collects Biostats & Programming workload information in a centralized location. The metrics contain information on work deliverables, organized under the following categories: Function, TA Group, Lead, Compound, Study and Deliverable. The tool provides pre-defined pull-down lists (e.g. compound list, type of deliverable) to ensure the consistency of data entry. In addition, it creates customized dashboard reports that give the management team both a comprehensive overview and the details of related information; those reports include a study summary, resource prediction, a 360-degree dashboard report and a Sankey diagram of workload distribution. Finally, some new feature development is discussed at the end of the paper. In summary, the project metrics provide a centralized location for data collection and a powerful tool for summary reports. They increase the feasibility and efficiency of workload management and resource planning across the Biostats & Programming department.
Medical Devices
MD-020 : CDISC Standards for Medical Devices: Historical Perspective and Current Status
Carey Smoak, S-Cubed
Work on SDTM domains for medical devices began in 2006. Seven SDTM domains were published in 2012 to accommodate medical device data. Minor updates to these seven SDTM domains were published in 2018. These seven SDTM domains are intended for use by medical device companies in getting their products approved/cleared by regulatory authorities and for use by pharmaceutical companies to hold ancillary device data. As evidenced by the Therapeutic Area User Guides, pharmaceutical companies are using these seven SDTM domains for ancillary devices. However, adoption of these seven SDTM domains by medical device companies seems to be happening rather slowly. In 2014, a statistician at the Center for Devices and Radiological Health (CDRH) presented the issues that CDRH has with medical device submissions, and in 2015 the CDISC medical device team presented how the CDRH issues could be solved with CDISC standards. Recently CDRH published a document titled ‘Providing Regulatory Submissions for Medical Devices in Electronic Format.’ In 1999, the FDA published a similar document for pharmaceutical products, which was the beginning of the development of CDISC standards for the pharmaceutical industry. While CDRH has not stated that it is moving towards requiring CDISC standards for medical device submissions, the publication of this document is a step in the right direction.
MD-041 : Successful US Submission of Medical Device Clinical Trial using CDISC
Phil Hall, Edwards Lifesciences
It is not yet mandatory for medical device trial data to be submitted using CDISC, but the Center for Devices and Radiological Health (CDRH) accepts clinical trial data in any format, including CDISC. This paper serves as a case study of the successful FDA submission of the Edwards Lifesciences’ PARTNER 3 trial, which utilized SDTM and ADaM datasets. There will be a review of the SDTM domains used for medical device-specific data and a general discussion of the submission approach.
MD-360 : An Overview of Medical Device Data Standards
Shilpakala Vasudevan, Ephicacy Lifescience Analytics
Medical device trials are now widely conducted for different therapeutic and diagnostic reasons. The nature of device trials is different from that of traditional clinical trials, and they involve different study designs, ways of collecting data, and data standards. Over the last few years, SDTM domains have been identified to accommodate data for devices, which can be used alongside the regular domains usually found in clinical trials. Additionally, there have been analysis data set proposals that are currently in use.
In this paper, we will see what medical devices are, what data standards exist for devices, and how they differ, with examples.
Quick Tips
QT-035 : A SAS Macro for Calculating Confidence Limits and P-values Under Simon’s Two-Stage Design
Alex Karanevich, EMB Statistical Solutions
Michael Ames, EMB Statistical Solutions
Simon’s two-stage designs are popular single-arm, binary-endpoint clinical trial designs that include a single interim analysis for futility. One is typically interested in the proportion of successes (response rate), but SAS does not provide p-values or confidence limits for this statistic. Calculating these is not straightforward due to the interim analysis: one is required to make a multiplicity adjustment for any trial that proceeds beyond the first stage. This quick tip provides derivations (along with a SAS macro) for calculating such a trial’s p-value and associated confidence limits. The macro requires the user to input the design parameters for the trial, as well as the total observed successes.
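To illustrate why the interim analysis changes the calculation, the sketch below computes one commonly used p-value for a trial that continues to the second stage: the probability, under the null response rate, of passing the first stage and finishing with at least the observed total number of successes. It is written in Python rather than SAS, it is not the derivation or macro from the paper, and the function names, parameter names and design values are assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); returns 0 when k > n."""
    return sum(binom_pmf(j, n, p) for j in range(max(k, 0), n + 1))

def simon_p_value(total_successes, r1, n1, n, p0):
    """P-value for a trial that proceeded to stage two (hypothetical helper).

    r1 : the trial stops for futility if stage-one successes <= r1
    n1 : stage-one sample size
    n  : total sample size across both stages
    p0 : response rate under the null hypothesis
    """
    n2 = n - n1
    p_value = 0.0
    # Only stage-one outcomes that pass the futility rule can reach stage two.
    for x1 in range(r1 + 1, n1 + 1):
        p_value += binom_pmf(x1, n1, p0) * binom_sf(total_successes - x1, n2, p0)
    return p_value

# Illustrative design values (assumed, not taken from the paper).
print(simon_p_value(total_successes=7, r1=1, n1=10, n=29, p0=0.10))
```

A naive one-sample binomial test on the pooled data would ignore the chance of stopping early, which is the adjustment the abstract refers to.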
QT-049 : PROC COMPARE: Misnomer of the statement “NOTE: No unequal values were found. All values compared are exactly equal.”
Alex Ostrowski, Pfizer
When validating SAS code and datasets, it is common to use PROC COMPARE and receive the following statement: “NOTE: No unequal values were found. All values compared are exactly equal.” The logical interpretation of this statement is that the datasets are “exactly equal” and there are “no unequal values” between the two datasets. However, this is not always the case. As such, if this is the primary method of proving equivalence between data sources, there is a risk that key differences may be overlooked. This paper will explore how to detect and fix these differences by comparing datasets with PROC SQL. With this method, the datasets are extensively compared and an “exactly equal” result is generated only if the datasets are indeed equivalent. If the datasets are not equivalent, differences are logged into “major” and “minor” groups and are summarized in the output. This method also lists the detailed differences between datasets by individual variable, indicating changes in label, type, order, length, format, and informat. This provides a solution to avoid being misled by PROC COMPARE while helping make an improved assessment of “data equivalence” by performing additional comparisons and clearly stating the comparison results, allowing programmers to compare datasets accurately and with confidence.
QT-128 : A Solution to Look-Ahead Observations
Yongjiang (Jerry) Xu, CSL Behring
Yanhua (Katie) Yu, PRA Health Sciences
In clinical trials, it is common to compare certain laboratory test results across a period or at various time points. Examples include the confirmation of drug responses in multiple myeloma studies and hematology indicators returning to a safe level from nadir. These data are usually mapped into findings domains in SDTM and programmed into BDS-structure ADaM datasets. However, it is difficult to look ahead to the values of future observations or look back at the values from previous observations in SAS® because SAS reads one observation at a time into the PDV. It is a known issue in SAS data management that comparisons across observations are not straightforward. This paper will present a practical solution that transposes the data and performs comparisons and calculations across observations using a DO loop and an array. Software used: SAS v9.4. Operating system: Windows. Audience: Intermediate skill level.
QT-183 : A Brief Understanding of DOSUBL beyond CALL EXECUTE
Ajay Sinha, Novartis
M Chanukya Samrat, Novartis
SAS® is one of the most widely used programming languages in the clinical space. SAS® 9.3 R2 and later releases offer a function, DOSUBL, that can help programmers shorten code and make it more efficient. This paper explains the use of the DOSUBL function through examples that show how it can make code more robust, and it presents the merits and pitfalls of the function and where it is most useful. DOSUBL enables immediate execution of SAS code after a text string is passed. It is somewhat similar to the CALL EXECUTE routine but differs significantly. The DOSUBL function has a single argument, which is a string value. In a DATA step, the function submits code to SAS for immediate execution. DOSUBL should be used in a DATA step. SAS documentation states that this function can also be used with %SYSFUNC outside a step boundary. DOSUBL executes code in a different SAS executive (known as a side session). This function is comparatively slower than the CALL EXECUTE routine and uses more CPU resources; however, the main advantage of using DOSUBL is immediate execution of the code in the side session.
QT-209 : Automation of Conversion of SAS Programs to Text files
Sachin Aggarwal, Rang Technologies
Sapan Shah, Rang Technologies
Statistical programmers in clinical trials develop SAS programs to produce SDTM and ADaM datasets, tables, listings and graphs according to study requirements. Apart from these SAS programs, other programs are developed for various ad-hoc and post-hoc requests. These programs have the same file extension, .SAS, and thus can be used on various SAS platforms, e.g. SAS desktop, SAS Enterprise Guide, etc. The .SAS extension is a requirement for a program to be functional on these platforms. However, the FDA requires all SAS programs in an NDA submission to be text files rather than files with the .SAS extension. The macro presented in this paper can convert multiple SAS programs saved in one folder to text files in a single run without making any changes to the original SAS programs. This SAS-to-text macro will save programmers an enormous amount of time by eliminating manual conversion of every single SAS file. It will also remove the errors that may occur when SAS files are converted to text files manually. The macro reads and copies .SAS files from the original folder into a second folder, where it converts the copied .SAS files to .TXT files. After this, it deletes the copied .SAS files in the second folder and leaves only the SAS programs in .TXT format as output.
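The copy-and-rename workflow described above can be sketched in a few lines. The example below is a Python illustration, not the authors' SAS macro: it writes each program directly under a .txt name in a second folder instead of copying and then renaming, and the function name and folder arguments are hypothetical.

```python
from pathlib import Path
import shutil

def copy_sas_as_text(source_dir, target_dir):
    """Copy every .sas program in source_dir to target_dir with a .txt extension.

    The original files are only read, never modified or moved.
    Returns the list of text files that were written.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for program in sorted(source.iterdir()):
        # Match .sas and .SAS alike; skip folders and other file types.
        if program.is_file() and program.suffix.lower() == ".sas":
            destination = target / (program.stem + ".txt")
            shutil.copyfile(program, destination)
            written.append(destination)
    return written
```

Calling `copy_sas_as_text("programs", "programs_txt")` on a folder of programs would leave `programs` untouched and fill `programs_txt` with the text copies.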
QT-213 : A SAS Macro for Dynamic Assignment of Page Numbers
Manohar Modem, Cytel
Bhavana Bommisetty, Vita Data Sciences
In the clinical domain we usually create many safety and efficacy tables with various statistics. While creating these tables, the dataset with statistics is passed to PROC REPORT to create RTF output. Using the BREAK statement with the PAGE option in PROC REPORT, we can make sure that each unique value of a parameter starts on a new page. If we want to make sure that a group of statistics does not break abruptly between pages, we may need to use conditional statements to assign page numbers. Whenever the data or table shell gets updated, the number of records in the dataset with statistics may change, which in turn requires an update to the conditional statements to prevent abrupt breaks in the output. This led to an effort to create a macro that can be used for any table with simple modifications to macro parameters. The purpose of this paper is to describe how the page numbers were dynamically assigned using a SAS macro.
QT-249 : Text Wrangling with Regular Expressions: A Short Practical Introduction
Noory Kim, SDC
Regular expressions provide a powerful way to find and replace patterns in text, but their syntax can seem intimidating at first. This paper presents a few simple practical examples of adapting text from data set or TLF specifications for insertion into SAS programs, using the text editor Notepad++. This paper is intended for SAS users of any skill level. No prior knowledge of regular expressions is needed.
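As a small illustration of the kind of find-and-replace the abstract describes, the sketch below turns specification lines of the form "variable name, spaces, label" into SAS LABEL statements. It uses Python's `re` module rather than Notepad++, so the exact syntax in the editor may differ slightly, and the specification text is invented for the example.

```python
import re

# Hypothetical specification text: variable name, whitespace, then its label.
spec_lines = """\
USUBJID  Unique Subject Identifier
AGE      Age
TRT01P   Planned Treatment for Period 01
"""

# Group 1 captures the variable name, group 2 the rest of the line.
pattern = re.compile(r"^(\w+)\s+(.+)$", flags=re.MULTILINE)

# \1 and \2 in the replacement refer back to the captured groups.
label_statements = pattern.sub(r'label \1 = "\2";', spec_lines)
print(label_statements)
```

The two capture groups do all the work: the pattern describes the shape of a line once, and the replacement rearranges the captured pieces for every line that matches.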
QT-253 : Implementing a LEAD Function for Observations in a SAS DATA Step
Timothy Harrington, Navitas Data Sciences
A common situation in DATA step processing is the need to reference the value of a variable (column) in a prior or later observation. The SAS system provides the functions LAG and DIFF to return the value of the variable in the prior observation or the difference between the current value of the variable, in the Program Data Vector (PDV), and the prior value. LAGn and DIFFn refer to the nth prior value, where 1<=n<=100, and must not refer to before the first observation (_N_=1) in the dataset. However, there is no corresponding LEAD function that looks at values in observations still to be read into the PDV. This paper demonstrates three different methods of implementing LEAD functionality. The modus operandi of each method is illustrated with examples of SAS code, and the advantages and disadvantages of each method are discussed, as is the suitability of each method for specific types of programming situations.
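To make the idea of a LEAD concrete, here is a minimal Python sketch of the behaviour being described: for each position, return the value n observations later, or a fill value once the data run out. It is not one of the three SAS methods from the paper; it only works because a Python list holds every value in memory at once, which the DATA step's one-observation-at-a-time PDV does not. The function name and sample values are made up.

```python
def lead(values, n=1, fill=None):
    """Return a list whose i-th item is values[i + n], or `fill` past the end."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return [values[i + n] if i + n < len(values) else fill
            for i in range(len(values))]

# Hypothetical lab results at four consecutive visits.
results = [5.1, 4.8, 3.9, 4.4]
next_result = lead(results)

# Pair each visit with the following one, e.g. to flag a drop at the next visit.
for current, upcoming in zip(results, next_result):
    dropped = upcoming is not None and upcoming < current
    print(current, upcoming, dropped)
```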
QT-289 : Highlight changes: An extension to PROC COMPARE
Abhinav Srivastva, Gilead Sciences
Although version control of files, datasets or documents can be challenging, the COMPARE® procedure provides an easy way to compare two files and indicate the differences between them. The paper takes the comparison results from PROC COMPARE® and builds them into a SAS® macro that highlights changes between files, in terms of additions, deletions or updates to a record, in a convenient Excel format. Some common examples where this utility can be useful are comparing CDISC Controlled Terminology (CT) release versions, comparing medical dictionary versions such as MedDRA and WHODrug, or comparing certain Case Report Form (CRF) data such as Adverse Events (AE) to review new events being reported at various timepoints for data monitoring purposes.
QT-313 : PROC REPORT – Land of the Missing OBS Column
Ray Pass, Forma Therapeutics
So whatever happened to the OBS column in PROC REPORT, the one that you get for free in PROC PRINT? Well, stop looking for the option to turn it on because it’s not there. The missing column can however be generated pretty painlessly, and learning how to do so also serves to teach you about an important distinction between two very basic types of variables used in PROC REPORT, namely “report variables” and “DATA step variables”. It’ll only take a few lines of code and a few minutes of time, so pay attention and don’t blink.
QT-315 : A SAS macro for tracking the Status of Table, Figure and Listing (TFL) Programming
Yuping Wu, PRA Health Science
Sayeed Nadim, PRA Health Science
A clinical study report (CSR) typically involves the generation of hundreds of TFLs. To manage such a large batch of outputs efficiently, lead programmers and managers often need to know the current status of each program and output. Many organizations use an Excel-based tracking document where production and validation programmers can enter the TFL status. However, for a CSR delivery that usually takes several months to prepare, such a document is often out of date due to ADaM updates, miscommunication between production and validation programmers, etc. This paper introduces a tool that can efficiently monitor the current status of the TFL outputs based on their logs, PROC COMPARE results, and dataset and output creation.
QT-349 : Macro for controlling Page Break options for Summarizing Data using Proc Report
Sachin Aggarwal, Rang Technologies
Sapan Shah, Rang Technologies
When statistical programmers use PROC REPORT to generate summary tables and listings, they face a number of challenges: (1) printing a certain number of non-missing lines on one page; (2) a new parameter starting at the end of a page; (3) starting different parameters on different pages; (4) keeping groups and sub-groups together on one page; (5) continuing the main group heading when splitting the main group and subgroup is unavoidable, for example in adverse event tables; (6) adding a suffix when a group is split onto a different page. This paper presents the Page Break macro, which provides a solution to these challenges. The macro uses basic programming logic to generate SAS datasets, which are then used in PROC REPORT to generate summary reports as per the clinical study requirements. Used in combination with the existing reporting macros in pharmaceutical companies, this macro can cut down the programming time spent on formatting tables and listings.
Real World Evidence and Big Data
RW-053 : NHANES Dietary Supplement component: a parallel programming project
Jayanth Iyengar, Data Systems Consultants LLC
The National Health and Nutrition Examination Survey (NHANES) contains many sections and components which report on and assess the nation's health status. A team of IT specialists and computer systems analysts handles data processing, quality control, and quality assurance for the survey. The most complex section of NHANES is dietary supplements, from which five publicly released data sets are derived. Because of its complexity, the Dietary Supplements section is assigned to two SAS programmers who are responsible for completing the project independently. This paper reviews the process for producing the Dietary Supplements section of NHANES, a parallel programming project conducted by the National Center for Health Statistics, a center of the Centers for Disease Control and Prevention (CDC).
RW-113 : Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS® Software
Troy Hughes, Datmesis Analytics
Data mocking refers to the practice of manufacturing data that can be used in software functional and performance testing, including both load testing and stress testing. Mocked data are not production or “real” data, in that they do not abstract some real-world construct, but are considered to be sufficiently similar (to production data) to demonstrate how software would function and perform in a production environment. Data mocking is commonly employed during software development and testing phases and is especially useful where production data may be sensitive or where it may be infeasible to import production data into a non-production environment. This text introduces the MOCKDATA SAS® macro, which creates mock data sets and/or text files for which SAS practitioners can vary (through parameterization) the number of observations, number of unique observations, randomization of observation order, number of character variables, length of character variables, number of numeric variables, highest numeric value, percentage of variables that have data, and whether character and/or numeric index variables (which cannot be missing) exist. An example implements MOCKDATA to compare the input/output (I/O) processing performance of SAS data sets and flat files, demonstrating the clear performance advantages of processing SAS data sets in lieu of text files.
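The parameterization described above can be pictured with a small generator. The sketch below is a Python illustration under assumed names, not the MOCKDATA macro itself; it covers only a few of the listed options (number of observations, number and length of character variables, number of numeric variables, highest numeric value, and percentage of populated values).

```python
import random
import string

def mock_rows(n_obs, n_char_vars=2, char_length=8, n_num_vars=2,
              max_num=100, pct_populated=100, seed=0):
    """Build a list of mock records (dicts) from a few size parameters.

    `id` is an index variable that is never missing; every other value is
    populated with probability pct_populated / 100.
    """
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    rows = []
    for obs in range(1, n_obs + 1):
        row = {"id": obs}
        for i in range(1, n_char_vars + 1):
            populated = rng.random() * 100 < pct_populated
            row[f"char{i}"] = (
                "".join(rng.choices(string.ascii_uppercase, k=char_length))
                if populated else ""
            )
        for i in range(1, n_num_vars + 1):
            populated = rng.random() * 100 < pct_populated
            row[f"num{i}"] = rng.randint(1, max_num) if populated else None
        rows.append(row)
    return rows

sample = mock_rows(n_obs=5, pct_populated=80)
print(sample[0])
```

Scaling `n_obs` up or down is what makes the same generator usable for both functional tests (a handful of rows) and load tests (millions of rows).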
RW-192 : Natural History Study – A Gateway to Treat Rare Disease
Tabassum Ambia, Alnylam Pharmaceuticals, Inc
A natural history study follows a group of people over time who have, or are at risk of developing, a specific medical condition or disease. Natural history studies are of significant importance in the discovery, marketing and post-marketing phases of a drug. There are different types of natural history studies, which may help to determine the need for adequate treatment in a target population or to assess the outcome of a treatment in real life. Real world evidence (RWE) data are the primary source of health information used in a natural history study. Statistical analysis focuses on both the incidence and prevalence of the disease, involving different procedures to determine the distribution of characteristics or events over a certain time and their correlation with covariates. A rare disease is one that affects a very small percentage of the population. Recently the FDA has emphasized the importance of natural history studies for the development of orphan drugs for rare diseases. Natural history studies can also help research into treatments for non-rare diseases. This paper will discuss the basics of natural history studies in relation to rare diseases and orphan drugs, real-world evidence data for natural history studies, types of natural history study with their analysis techniques and limitations, difficulties during the development of orphan drugs that can be mitigated through natural history studies, and the FDA recommendation for using natural history studies in RCTs to develop treatments for rare diseases.
Software Demonstrations (Tutorials)
SD-281 : A Single, Centralized, Biometrics Team Focused Collaboration System for Analysis Projects
Chris Hardwick, Zeroarc
Justin Slattery, Zeroarc
Hans Gutknecht, Zeroarc
Statisticians and programmers work in concert with each other to complete analysis projects and typically resort to using Excel and email to store metadata, track project milestones and collaborate with each other. See how using a single, centralized system to store TFL/Dataset metadata, communicate key analysis project milestones, report on QC efforts, conduct collaborative TFL reviews, and track TFL change requests can dramatically reduce data entry and reap big productivity benefits.
Statistics and Analytics
SA-013 : Diaries and Questionnaires: Challenges and Solutions
Marina Komaroff, Noven Pharmaceuticals
Sandeep Byreddy, Noven Pharmaceuticals, Inc.
In the PharmaSUG 2019 conference opening session, there was a question about the most unfavorable data set for programmers and statisticians to work with. Diaries and questionnaires (QS) were named among the first five! The rationale was the complexity of QS: too many items to work with, and responses that are hard to compare across time points, within and between subjects. Clustering and/or categorization of diary items using clinical judgement is a known approach that helps with analyses. However, comparing the responses to the questions across multiple time points still requires a deep understanding of the research question and strong programming skills. The goal of this paper is to convert diary-haters into diary-lovers and explain how an appropriate algorithm should be developed and programmed. As an example, the research question was to detect fraud in filling out the diaries by checking whether subjects repeat the same responses (possibly randomly changing a couple of points) across different time points of the study. The authors suggest an algorithm and provide a SAS® macro to answer this research question; this program can be easily adapted for other needs.
SA-034 : A Doctor's Dilemma: How Propensity Scores Can Help Control For Selection Bias in Medical Education
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
An important strength of observational studies is the ability to estimate a key behavior or treatment’s effect on a specific health outcome. This is a crucial strength as most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. Keeping this in mind, one drawback of observational studies (that experimental studies naturally control for) is that they lack the ability to randomize their participants into treatment groups. This can result in the unwanted inclusion of a selection bias. One way to adjust for a selection bias is through the utilization of a propensity score analysis. In this paper we explore an example of how to utilize these types of analyses. In order to demonstrate this technique, we will seek to explore whether clerkship order has an effect on NBME and USMLE exam scores for 3rd year military medical students. In order to conduct this analysis, a selection bias was identified and adjustment was sought through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed as to its effectiveness in improving the model. Data for this study was gathered between the years of 2014 and 2019 from students attending USUHS. This presentation is designed for any level of statistician, SAS® programmer, or data scientist/analyst with an interest in controlling for selection bias.
SA-051 : Calculation of Cochran–Mantel–Haenszel Statistics for Objective Response and Clinical Benefit Rates and the Effects of Stratification Factors
Girish Kankipati, Seattle Genetics Inc
Chia-Ling Ally Wu, Seattle Genetics
In oncology clinical trials, primary and secondary endpoints are analyzed using different statistical models based on the study design. Objective response rate (ORR) and clinical benefit rate (CBR) are commonly used as key endpoints in oncology studies, in addition to overall survival (OS) and progression-free survival (PFS). The use of ORR and CBR as an endpoint in these trials is widespread as objective response to therapy is usually an early indication of treatment activity and it can be assessed in smaller samples compared to OS; furthermore, FDA considers ORR and CBR as clinical and surrogate endpoints in traditional and accelerated approvals. Bringing new therapies to market based on ORR and CBR requires specialized statistical methodology that not only accurately analyzes these key endpoints but can also accommodate stratified study designs aimed at controlling for confounding factors. The Cochran-Mantel-Haenszel (CMH) test provides a solution to address these many needs. This paper introduces CMH test concepts, describes how to interpret its statistics, and shares insights into SAS® procedure settings to use it correctly. The calculation of ORR and CBR with 95% confidence intervals using the Clopper-Pearson method and relative risk and strata-adjusted p-values using the CMH test are discussed with sample data and example table shells, along with examples of how to use the FREQ procedure to calculate these values.
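To make the stratified test concrete, here is a minimal sketch of the CMH chi-square statistic for a set of 2x2 tables (treatment arm by responder status, one table per stratum), computed without a continuity correction. It is written in Python rather than with the FREQ procedure the paper uses, and the counts are hypothetical.

```python
import math

def cmh_test(tables):
    """CMH chi-square (1 df, no continuity correction) for a list of 2x2 tables.

    Each table is ((a, b), (c, d)): rows are treatment arms, columns are
    responder / non-responder, and there is one table per stratum.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Observed minus expected responders in arm 1 under no association.
        diff_sum += a - row1 * col1 / n
        # Hypergeometric variance of the arm-1 responder count.
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical responder counts in two strata.
strata = [((12, 8), (6, 14)), ((9, 11), (5, 15))]
stat, p = cmh_test(strata)
print(round(stat, 4), round(p, 4))
```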
SA-072 : Assigning agents to districts under multiple constraints using PROC CLP
Stephen Sloan, Accenture
Kevin Gillette, Accenture Federal Services
The Challenge: assigning outbound calling agents in a telemarketing campaign to geographic districts. The districts have a variable number of leads and each agent needs to be assigned entire districts with the total number of leads being as close as possible to a specified number for each of the agents (usually, but not always, an equal number). In addition, there are constraints concerning the distribution of assigned districts across time zones, in order to maximize productivity and availability. Our Solution: uses the SAS/OR ® procedure PROC CLP to formulate the challenge as a constraint satisfaction problem (CSP), since the objective is not necessarily to minimize a cost function, but rather to find a feasible solution to the constraint set. The input consists of the number of agents, the number of districts, the number of leads in each district, the desired number of leads per agent, the amount by which the actual number of leads can differ from the desired number, and the time zone for each district.
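The feasibility formulation can be illustrated with a small backtracking search. This is only a sketch of the constraint satisfaction idea, not PROC CLP and not the authors' model: it handles the leads-per-agent constraint only, omits the time zone constraints, and uses hypothetical names and numbers throughout.

```python
def assign_districts(leads, n_agents, target, tolerance):
    """Return a list mapping each district index to an agent, or None if infeasible.

    Every agent's total leads must lie within target +/- tolerance.
    """
    totals = [0] * n_agents
    assignment = [None] * len(leads)
    # Place large districts first so infeasible branches are pruned early.
    order = sorted(range(len(leads)), key=lambda d: -leads[d])

    def backtrack(pos):
        if pos == len(order):
            return all(abs(t - target) <= tolerance for t in totals)
        d = order[pos]
        for agent in range(n_agents):
            if totals[agent] + leads[d] > target + tolerance:
                continue  # this agent would exceed the upper limit
            totals[agent] += leads[d]
            assignment[d] = agent
            if backtrack(pos + 1):
                return True
            totals[agent] -= leads[d]  # undo and try the next agent
            assignment[d] = None
        return False

    return assignment if backtrack(0) else None

# Hypothetical campaign: six districts, two agents, 80 leads each (+/- 5).
leads = [40, 35, 30, 25, 20, 10]
print(assign_districts(leads, n_agents=2, target=80, tolerance=5))
```

As in the abstract, the search stops at the first feasible assignment rather than optimizing a cost function.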
SA-103 : Risk-based and Exposure-based Adjusted Safety Incidence Rates
Qiuhong Jia, Seattle Genetics
Fang-Ting Kuo, Seattle Genetics
Chia-Ling Ally Wu, Seattle Genetics
Ping Xu, Seattle Genetics
In clinical trials, safety event incidences are summarized to help analyze the safety profile of investigational drugs. The most common and straightforward method is the crude rate, which is the total number of subjects with at least one event of interest within a given population. However, if the average duration of exposure differs significantly between treatment groups within a trial or between trials included in an analysis due to differential drop-out rates or study design, such incidence rates may need statistical adjustment to make the comparison meaningful. The analysis of exposure-adjusted incidence rates is often found useful in such cases. This paper introduces a simplified exposure-adjusted rate where sum of treatment duration of a population is used as the denominator, as well as a time-at-risk adjusted rate where sum of person-time at risk for each event of interest is used as the denominator. Person-time at risk for each subject is usually defined as the time from treatment start to the first onset of an event or to the end of follow-up if the event does not occur. Step-by-step SAS® code to derive these adjusted rates is examined using hypothetical adverse event data, and the statistical implications are discussed in detail with comparison to the crude incidence rates. Further examples of exposure-adjusted analysis considerations such as presenting results (e.g., exposure-adjusted rate differences) in forest plots with confidence intervals will also be demonstrated.
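The three rates can be compared in a short sketch. It is written in Python rather than SAS, takes the crude rate as the percentage of subjects with at least one event, and assumes that follow-up for subjects without an event ends when their treatment exposure ends. The field names and data are hypothetical.

```python
def incidence_rates(subjects, per=100.0):
    """Crude, exposure-adjusted, and time-at-risk adjusted incidence rates.

    subjects: list of dicts with 'exposure_years' (treatment duration) and
    'event_time_years' (time to first event, or None if no event occurred).
    Adjusted rates are expressed per `per` person-years.
    """
    n = len(subjects)
    n_events = sum(s["event_time_years"] is not None for s in subjects)
    total_exposure = sum(s["exposure_years"] for s in subjects)
    # Time at risk stops at the first event; otherwise it runs to end of exposure.
    time_at_risk = sum(
        s["event_time_years"] if s["event_time_years"] is not None else s["exposure_years"]
        for s in subjects
    )
    return {
        "crude_pct": 100.0 * n_events / n,
        "exposure_adjusted": per * n_events / total_exposure,
        "time_at_risk_adjusted": per * n_events / time_at_risk,
    }

# Hypothetical adverse event data for four subjects.
subjects = [
    {"exposure_years": 1.0, "event_time_years": 0.5},
    {"exposure_years": 2.0, "event_time_years": None},
    {"exposure_years": 0.5, "event_time_years": 0.25},
    {"exposure_years": 1.5, "event_time_years": None},
]
print(incidence_rates(subjects))
```

The time-at-risk rate is never lower than the simplified exposure-adjusted rate, because the time after a first event is removed from the denominator.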
SA-112 : Should I Wear Pants? And Where Should I Travel in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures
Troy Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.
Decision tables operationalize one or more contingencies and the respective actions that should be taken when contingencies are true. Decision tables capture conditional logic in dynamic control tables rather than hardcoded programs, facilitating maintenance and modification of the business rules and decision rules they contain—without the necessity to modify the underlying code (that interprets and operationalizes the decision tables). This text introduces a flexible, data-driven SAS® macro that ingests decision tables—maintained as comma-separated values (CSV) files—into SAS to dynamically write conditional logic statements that can subsequently be applied to SAS data sets. This metaprogramming technique relies on SAS temporary arrays that can accommodate limitless contingency groups and contingencies of any content. To illustrate the extreme adaptability and reusability of the software solution, several decision tables are demonstrated, including those that separately answer the questions Should I wear pants and Where should I travel in the Portuguese expanse? The DECISION_TABLE SAS macro is included and is adapted from the author’s text: SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality.
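The data-driven idea can be sketched as follows. This is not the DECISION_TABLE macro: it is a Python illustration that evaluates the rules directly instead of generating conditional statements, and the CSV columns, the `*` wildcard, and the rule contents are all assumptions.

```python
import csv
import io

# Hypothetical decision table; in practice this would be a maintained CSV file.
DECISION_CSV = """weather,occasion,action
cold,*,wear pants
*,work,wear pants
hot,beach,wear shorts
"""

def load_rules(csv_text):
    """Parse the decision table into a list of dicts, one per rule."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def decide(rules, facts, default=None):
    """Return the action of the first rule whose conditions all match facts."""
    for rule in rules:
        conditions = {k: v for k, v in rule.items() if k != "action"}
        # "*" matches any value; otherwise the fact must equal the rule value.
        if all(v == "*" or facts.get(k) == v for k, v in conditions.items()):
            return rule["action"]
    return default

rules = load_rules(DECISION_CSV)
print(decide(rules, {"weather": "hot", "occasion": "beach"}))
```

Changing a business rule then means editing a row of the table, not the code that interprets it.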
SA-149 : Sample size and HR confidence interval estimation through simulation of censored survival data controlling for censoring rate
Giulia Tonini, Menarini Ricerche
Letizia Nidiaci, Menarini Ricerche
Simona Scartoni, Menarini Ricerche
Time-to-event variables are commonly used in clinical trials where survival data are collected. When planning a trial and calculating the sample size, it is important to estimate not only the expected sample size but also the minimum value of the hazard ratio (HR) for which the trial can be considered successful (i.e., accepting the alternative hypothesis). The confidence interval for the HR when the difference between treatments is statistically significant can be calculated by simulating censored survival data. Simulating censoring can be tricky, especially when the trial does not have a fixed duration or when a high drop-out rate is expected. This is often the case in oncology trials, where patients exit the study prematurely (e.g., in case of AE or disease progression). In this work we present a case study where a dataset is simulated controlling for the resulting censoring rate. In particular, we simulate data from an oncology trial comparing two treatments. We give an example of how to implement that simulation in SAS. We compare results coming from different assumptions regarding drop-out rate and methods to model censoring. We investigate the effect of the censoring rate on potential bias in estimating the treatment effect using a Cox model.
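One simple way to control the censoring rate, shown here as a sketch rather than the authors' method, is to assume independent exponential event and drop-out times. The probability of being censored is then c / (h + c) for event hazard h and drop-out hazard c, so a target censoring rate p is obtained with c = h * p / (1 - p). There is no administrative censoring in this sketch, and the parameter values are hypothetical.

```python
import random

def simulate_arm(n, event_hazard, censor_rate, rng):
    """Simulate n subjects as (observed_time, event_observed) pairs.

    Event and drop-out times are independent exponentials; the drop-out
    hazard is chosen so that the expected censoring proportion is censor_rate.
    """
    censor_hazard = event_hazard * censor_rate / (1.0 - censor_rate)
    data = []
    for _ in range(n):
        t_event = rng.expovariate(event_hazard)
        t_censor = rng.expovariate(censor_hazard) if censor_hazard > 0 else float("inf")
        data.append((min(t_event, t_censor), t_event <= t_censor))
    return data

rng = random.Random(2020)
arm = simulate_arm(20000, event_hazard=0.1, censor_rate=0.3, rng=rng)
observed = 1 - sum(event for _, event in arm) / len(arm)
print(round(observed, 3))  # should be close to the 0.3 target
```

Simulating a second arm with a different event hazard and the same target rate gives a two-arm dataset to which a Cox model could then be fitted.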
SA-207 : A Brief Introduction to Performing Statistical Analysis in SAS, R & Python
Erica Goodrich, Brigham and Women's Hospital
Daniel Sturgeon, Priority Health
Statisticians and data scientists may utilize a variety of programs available to them to solve specific analytical questions at hand. Popular programs include commercial products like SAS and open source products including R and Python. Reasons why a user may want to use differing programs will be discussed. This presentation aims to present a brief primer into the coding and output provided within these programs to perform data exploration and fit commonly used statistical models in a healthcare or clinical space.
SA-225 : Principal Stratum strategy for handle intercurrent events: a causal estimand to avoid biased estimates
Andrea Nizzardo, Menarini Ricerche
Giovanni Marino Merlo, Menarini Ricerche
Simona Scartoni, Menarini Ricerche
The ICH E9 (R1) addendum “Estimands and Sensitivity Analysis in Clinical Trials” emphasizes the necessity to better quantify treatment effects by addressing the occurrence of intercurrent events that could lead to ambiguity in the estimates. One approach proposed in the addendum is the “Principal Stratum strategy”, where the target population for the analysis is a sub-population composed of patients free from intercurrent events. The main concern is that it is not possible to identify the stratum in advance, and the analysis is then not causal and liable to confounding. Moreover, the occurrence of intercurrent events is not predictable, and each subject is observed on one treatment only and could experience different intercurrent events on different treatments. The FDA Missing Data Working Group also recommends the use of a causal estimand for the evaluation of primary interest endpoints. A causal estimand based on principal stratum and a related tipping point sensitivity analysis is then proposed for a clinical superiority study. A case study is shown. An evaluation by simulations of the power obtained with this approach is also presented, comparing the ideal situation where all patients are free from intercurrent events with scenarios where different percentages of patients with intercurrent events are assumed.
SA-262 : Using SAS Simulations to determine appropriate Block Size for Subject Randomization Lists
Kevin Venner, Almac Clinical Technologies
Jennifer Ross, Almac Clinical Technologies
Kyle Huber, Almac Clinical Technologies
Noelle Sassany, Almac Clinical Technologies
Graham Nicholls, Almac Clinical Technologies
The ICH E9 (Regulatory) guideline specifies that a minimum block size should not be used for the generation of Randomization lists to avoid predictability/selection bias and to avoid full or partial unblinding to treatment assignment. Additionally, knowledge of the block size used within the Randomization list should be restricted (typically known by Sponsor Biostatistician only). However, in reality Sponsor Biostatisticians repeatedly utilize the same (default) block size for list generation (e.g. always use a block size of 4 for 2 treatments with equal allocation ratio) and/or utilize the minimum block size. Thus, Sponsor (and Site) personnel are not really ‘blinded’ to the utilized block size, potentially compromising the entire study’s validity. This paper will illustrate how SAS is an effective simulation tool to provide evidence that different block size designs/parameters can yield acceptable balance without having to use the minimum or most obvious/default block size. Case study examples will demonstrate how simulations can evaluate expected treatment balance for alternative block size designs. Through use of SAS macro programming, different randomization block size designs can be efficiently simulated through minor macro updates, allowing for quick delivery of statistically robust results. While Treatment balance for a clinical trial can be critical for establishing effectiveness, perfect Treatment balance is not required. The goal of randomization is to maintain the integrity of the study (keeping study blind, avoiding bias) and to achieve acceptable balance. Simulation results can provide data-driven results to show that acceptable balance can be achieved with alternative block size designs.
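A simulation of this kind can be sketched in a few lines. The example below is in Python rather than SAS and is not the authors' macro: it assumes two arms with 1:1 allocation, a single stratum, and enrollment that may stop partway through the last block. All parameter values are hypothetical.

```python
import random

def blocked_list(n_subjects, block_size, rng):
    """Permuted-block randomization list for two arms (A, B), 1:1 allocation."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    schedule = []
    while len(schedule) < n_subjects:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    # Enrollment may stop before the last block is complete.
    return schedule[:n_subjects]

def mean_imbalance(n_subjects, block_size, n_sims=2000, seed=1):
    """Average absolute difference in arm sizes over simulated lists."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        s = blocked_list(n_subjects, block_size, rng)
        total += abs(s.count("A") - s.count("B"))
    return total / n_sims

# Compare a default block size with a larger alternative for 42 subjects.
for size in (4, 8):
    print(size, mean_imbalance(42, size))
```

Only the incomplete final block can contribute imbalance, so a larger block size changes the expected imbalance modestly while making the next assignment harder to predict.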
SA-284 : Implementing Quality Tolerance Limits at a Large Pharmaceutical Company
Steven Gilbert, Pfizer
Predefined quality tolerance limits (QTLs) were introduced in the revised ICH E6 (R2) Section 5 update to help identify systematic issues that can impact subject safety or reliability of trial results. This paper will focus on Pfizer’s implementation of this requirement. The key focus will concern the approach with respect to loss of evaluable subjects, patient discontinuation and inclusion/exclusion errors, that is, easily measurable attribute data. We discuss a team approach in setting tolerance limits as well as lessons learned in monitoring the progress and the important role of simulations in defining best practices for monitoring trials. Examples of simulation methods, signal detection through control charts suitable for short-run attribute data such as variable life adjusted displays and other graphical methods will be demonstrated along with example code. We reflect on preferred methodology and challenges inherent in various clinical trial designs, ending with a look at future work needed to maximize the use of QTLs in mitigating trial risk and ensuring the integrity of published results.
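As a small illustration of one chart type named above, a variable life adjusted display (VLAD) is commonly built as the running total of expected minus observed events. The sketch below assumes a single fixed expected event rate for every subject and uses hypothetical data; it is not Pfizer's implementation.

```python
def vlad(outcomes, expected_rate):
    """Cumulative expected-minus-observed events after each subject.

    outcomes: 0/1 flags in enrollment order (1 = event, e.g. discontinuation).
    A downward drift means more events are occurring than expected.
    """
    path, running = [], 0.0
    for y in outcomes:
        running += expected_rate - y
        path.append(running)
    return path

# Hypothetical discontinuation flags with an expected rate of 25%.
print(vlad([0, 0, 1, 0, 1, 1], expected_rate=0.25))
```

A tolerance limit would then be a boundary on this path, with its placement typically informed by simulation.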
Strategic Implementation
SI-170 : Moving A Hybrid Organization Towards CDISC Standardization
Kobie O'Brian, SCHARP Fred Hutchinson
Sara Shoemaker, SCHARP Fred Hutchinson
Robert Kleemann, SCHARP Fred Hutchinson
Kate Ostbye, SCHARP Fred Hutchinson
This paper discusses the experience of implementing standardization of data collection and data set development of submission-ready data sets at a unique organization at the intersection of Academia and Industry Partners. SCHARP (Statistical Center for HIV/AIDS Research and Prevention) at Fred Hutchinson is an academic center with a nonprofit business model. It is in a unique position requiring a balance of standard regulatory reporting requirements as well as specific sponsor needs with stakeholders including the National Institutes of Health (NIH), academic centers, nonprofit foundations, and pharmaceutical manufacturers. This requires a tailored approach using Clinical Data Interchange Standards Consortium (CDISC) standards for collecting, submitting, and analyzing data across the organization. Governance of the different CDISC implementation strategies for organization-wide data collection, storage, and analysis is discussed as well.
SI-173 : PROC Future Proof;
Amy Gillespie, Merck & Co., Inc.
Susan Kramlik, Merck & Co., Inc.
Suhas Sanjee, Merck & Co., Inc.
Clinical trial programmers are key contributors to regulatory submissions, manuscripts, and statistical analyses. They operationalize analysis plans by creating high-quality, innovative and compliant reporting deliverables to address stakeholders’ needs. Clinical trial programmers are experts in authoring programming code and developing or leveraging programming standards to produce deliverables in a validated, efficient, and reproducible manner. However, the job role and work process for clinical trial programmers have stayed relatively constant in the past decades. This paper evaluates recent advances in technology and the skillsets of clinical trial programmers to identify opportunities for improved compliance and work efficiency while ultimately optimizing the programming function and potentially transforming the clinical trial programming role for continued success. Use cases leveraging natural language processing (NLP) and linked data will be explored to evaluate whether digital solutions are applicable within clinical trial programming processes. The use of different software tools and methods will also be evaluated. We expect this paper to be the first of a series of publications on this topic.
SI-206 : Assessing Performance of Risk-based Testing
Amber Randall, SCHARP - Fred Hutch Cancer Research Center
Bill Coar, Axio Research
Trends in the regulatory landscape point to risk-based approaches to ensure high quality data and reporting for clinical trials. Risk-based methods for validation of production programming code which assign testing methods of varying robustness based on an assessment of risk have been evaluated and accepted by some industry leaders, yet they have not been fully adopted. Some view risk-based testing as simply an attempt to save money or compensate for limited resources while claiming a minimal impact on overall quality. While that may sometimes be the case, the intent should rather be to focus finite resources on what matters most. Even with the robust gold standard of full independent reproduction, mistakes still happen. Humans make errors. Therefore, risk and consequence should be considered in choosing verification methods with a resource emphasis on those areas with greatest impact. However, the assessment of these decisions must be regularly and consistently evaluated to ensure that they are appropriate and effective. Metrics both within and across projects can be implemented to aid in this evaluation. They can report the incidence, type, and method of identification of issues found at various timepoints including internally prior to the completion of output verification, internally during final package review, and during external review. These data are crucial for the effective evaluation of the performance of risk-based testing methods and decisions.
SI-241 : You down to QC? Yeah, You know me!
Vaughn Eason, Catalyst Clinical Research, LLC
Jake Gallagher, Catalyst Clinical Research, LLC
The successful delivery of clinical trial-related analysis datasets and outputs is heavily dependent on an efficient and fluid relationship between the Production Programmer, QC Programmer, and Lead Statistician. Given the increasing complexity and rapidity of project delivery, coupled with multiple regulatory standards that must be adhered to, a sound strategy and clear model of communication are paramount to a study’s overall quality. Understanding the pressure and stress associated with project deadlines and nuanced sponsor requirements will help navigate communication. Further understanding of each role’s subjective nature will help outline a succinct archetype for achieving high-quality results with minimal headaches. The three main barriers to communication between Production Programmer, QC Programmer, and Lead Statistician are a vague understanding of client expectations, overall study comprehension, and the rarely considered ego factor. Tackling these issues can seem unattainable; however, we have constructed a roadmap to alleviate the friction that may occur during the delivery and quality assurance process. Some of these points include thorough ongoing communication with study leadership, purposeful and engaging meetings to eliminate opacity of overall objectives between team members, and finally, an effective way to universally communicate with any type of team member you may encounter.
Submission Standards
SS-030 : End-to-end Prostate-Specific Antigen (PSA) Analysis in Clinical Trials: From Mock-ups to ADPSA
Joy Zeng, Pfizer
Varaprasad Ilapogu, Ephicacy Consultancy Group
Xinping Cindy Wu, Pfizer
Prostate-specific antigen (PSA) level is a key biomarker in prostate cancer that has been used in standard guidelines as a measurement of clinical outcomes for patients with prostate cancer. This paper aims to provide an end-to-end overview of the programming aspects of PSA-related trials. We describe the concepts of PSA response and time to PSA progression, two important end points in assessing efficacy of prostate cancer trials, along with the statistical methods involved in estimating the distribution of time to PSA progression. The paper also addresses the design of metadata from PSA-related mock-up tables and presents the considerations involved in the creation of CDISC-compliant ADPSA dataset based on the metadata. Programming in the oncology therapeutic area is highly specialized and we hope this paper serves as a one-stop shop for providing the necessary tools to navigate through it.
SS-045 : Updates in SDTM IG V3.3: What Belongs Where – Practical Examples
Lucas Du, Vertex Pharmaceuticals INC
William Paget, Vertex Pharmaceuticals INC
Lingyun Chen, Vertex Pharmaceuticals INC
Todd Case, Vertex Pharmaceuticals Inc
The CDISC SDTM Implementation Guide (IG) Version 3.3 was released on 11/20/2018. New domains and implementation rules have been added to standardize SDTM implementation within the industry. Compared to version 3.2, a lot of information was updated during the 5 years between releases. This also brings a great challenge for people working in Pharma/Biotech to figure out all the details. For example, what are the new domains, and how should we use them? Furthermore, the same information may map to different domains depending on its purpose; how should we decide which domain the information should go to? Also, in the Trial Summary (TS), Comments (CO), Trial Inclusion/Exclusion Criteria (TI) or other general observational classes, variables with content of more than 200 characters will be mapped to the corresponding SUPP domain, but the labels for variables in the SUPP domain vary within the industry. In addition, it’s not clear how to populate the EPOCH variables in events, findings and interventions domains or how to deal with subjects in the DM domain who are randomized but never dosed. In this paper, updates will be highlighted, and examples will be provided.
SS-054 : RTOR: Our Side of the Story
Shefalica Chand, Seattle Genetics, Inc.
Eric Song, Seattle Genetics, Inc.
FDA’s Real-Time Oncology Review (RTOR) pilot program was initially introduced in June 2018 for supplemental New Drug Applications (NDA) and supplemental Biologics License Applications (BLA) and more recently extended to original NDAs and BLAs. This presents a new ray of hope for cancer patients, as the program aims to expedite review of oncological submissions with improved efficiency and quality by allowing FDA earlier access to clinical safety and efficacy data and results, especially those related to Biometrics. This in turn may help expedite availability of novel treatments to cancer patients. Seattle Genetics participated in the RTOR pilot program for a supplemental BLA for ADCETRIS ECHELON-2 in CD30-expressing PTCL, which received approval in an unprecedented 11 days from sBLA submission. Against the backdrop of our positive RTOR experience, this paper will provide a background of the program, its eligibility criteria, and its success so far. We will give you insights into:
• How and why a submission can be accepted into this program
• FDA’s RTOR expectations and how they evolved since our sBLA to now
• Effective communication and collaboration within our organization and with FDA
• Seamless preparation and planning to enable rapid submission and review
• Post-submission activities and efficient handling of regulatory questions
• The pivotal role of Statistical Programming and best practices towards perpetual submission-readiness
We are excited to share our story as well as insights into more recent RTOR developments to help colleagues in industry be optimally prepared to get drugs to cancer patients faster!
SS-081 : Why Are There So Many ADaM Documents, and How Do I Know Which to Use?
Sandra Minjoe, PRA Health Sciences
As of this writing, the CDISC website has the following ADaM documents for download: a model document, three versions of the implementation guide, an adverse event data structure, an occurrence data structure, a time-to-event document, a document with examples in commonly used statistical analysis methods, an analysis result metadata document, conformance rules, and an important considerations document. Additionally, you can download three release packages, each containing a subset of these documents. This paper describes why there are so many documents, walks through basic information contained in each, and makes recommendations of which set of documents to use in which circumstances.
SS-097 : Data Review: What’s Not Included in Pinnacle 21?
Jinit Mistry, Seattle Genetics
Lyma Faroz, Seattle Genetics
Hao Meng, Seattle Genetics Inc.
Many pharmaceutical and biotechnology companies outsource statistical programming activities and submission package preparation to CROs. Still, for all programming deliverables the sponsor remains responsible for quality, completeness, and compliance with published standards and regulatory guidance. This makes it critical for the sponsor to implement efficient vendor oversight that touches on sufficient detail to ensure quality of the product provided by the CRO. Sponsors are widely using the Pinnacle 21 toolset to ensure SDTM and ADaM compliance with CDISC guidance. However, by itself this is not sufficient, and additional review of how the CRO adopted CDISC implementation guides to assign or derive variables in alignment with study design, protocol, and SAP needs to be conducted beyond Pinnacle 21 reports. For example, Pinnacle 21 checks whether variable values are present and runs several logic and interdependency checks, but it doesn’t validate the correctness and accuracy of such values in relation to study documents and other specifications or constraints. This paper will share various CDISC data validation checks that can be performed outside of Pinnacle 21 to significantly heighten the quality of any submission and help mitigate review questions and technical rejections.
SS-140 : Pinnacle 21 Community v3.0 - A Users Perspective
Ajay Gupta, PPD Inc
Pinnacle 21, previously known as OpenCDISC Validator, provides great compliance checks against CDISC outputs like SDTM, ADaM, SEND and Define.xml. This validation tool provides a report in Excel or CSV format which contains information categorized as errors, warnings, and notices. In May 2019, the Pinnacle 21 team released Community v3.0. This paper will provide an overview of all major updates in Pinnacle 21 Community v3.0, e.g., new validation checks, ADaM IG v1.1 support, and the latest Controlled Terminology support. Later, this paper will cover the SNOMED and MedDRA dictionary installation process, which is no longer supported by Pinnacle 21.
SS-150 : Challenges and solutions for e-data submission to PMDA even after submission to FDA
Akari Kamitani, Shionogi
HyeonJeong An, Shionogi Inc.
Yura Suzuki, Shionogi & Co., Ltd.
Malla Reddy Boda, Shionogi Inc.
Yoshitake Kitanishi, Shionogi & Co., Ltd.
Electronic data submission with an NDA to FDA is already mandatory, and it has also been mandatory for applications to PMDA since April 2020. The data to be submitted must be CDISC compliant. As a result, it tends to be assumed that submission to PMDA is easy once e-data have been submitted to FDA. However, this is not in fact true. Our company (SHIONOGI) has its headquarters in Japan and group companies in the US and Europe, and promotes drug development globally. Therefore, it is assumed that we will submit to PMDA after we submit to FDA. In this paper, we verify the differences by taking a specific submission as an example of this situation. Specifically, there were differences in validation rules, target dates, documents for consultation, and so on. Using this verification, we aim to submit data efficiently, regardless of whether the application goes first to FDA or to PMDA. For this reason, while developing deliverables for each clinical trial, it is necessary to move toward preparing one package that meets the rules of both authorities. Intended Audience: Anyone in the industry who is interested in data package preparation for PMDA and FDA
SS-156 : Analysis Package e-Submission – Planning and Execution
Abhilash Chimbirithy, Merck & Co.
Saigovind Chenna, Merck & Co., Inc
Majdoub Haloui, Merck & Co. Inc.
The analysis package is one of the e-Submission components submitted to regulatory agencies as part of INDs, NDAs, ANDAs, BLAs and sBLAs. The analysis package contains the study analysis data and related files following a standardized electronic format. Proper planning and resources are required to create this package, which has several individual components such as datasets in XPT format, define, analysis results metadata (ARM), data reviewers guide, and programs. These components are separate deliverables but interrelated. Drawing on frequently identified challenges, questions and issues from previous studies, we have provided guidance related to planning, setting deliverable timelines, identifying team members' responsibilities and ensuring compliance with regulatory required data standards for the components submitted. In this paper, we will present details for best practices, proper planning and checklists that can help teams efficiently create an analysis package. The paper also highlights ways to achieve effective cross-functional collaboration and consistently meet regulatory compliance.
SS-159 : Automating CRF Annotations using Python
Hema Muthukumar, Statistical Center For HIV/AIDS Research and Prevention(SCHARP) at Fred Hutch
Kobie O'Brian, SCHARP, Fred Hutch
When data are submitted to the FDA, an Annotated Case Report Form (aCRF) is to be provided as a PDF document, to help reviewers find the origin of data included in submitted datasets. These annotations should be simple, clean, and should respect appearance and format (color, font) recommendations. aCRFs are traditionally done manually. This involves using a text editor in PDF and working variable by variable across many pages. This is a time-consuming process that can take many hours. In addition, maintaining consistency across pages requires substantial effort. This paper talks about an effective way to automate the entire aCRF process using Python. This approach automatically annotates the variables on the CRF next to their related questions on the appropriate pages. In this method, we use the following: a Study Design Specification which is an excel sheet of the study details as built by an Electronic Data Capture (EDC) system; an SDTM mapping specification, which is also an excel sheet; and the study case report form in PDF format. The output for this method is an FDF file, which is used to automatically create the final aCRF. This method significantly reduces the time and effort required to create aCRF while eliminating inconsistent annotations. This method is very useful since it is flexible and can be implemented to annotate CRFs for different types of trials and organizations.
SS-197 : Preparing a Successful BIMO Data Package
Elizabeth Li, PharmaStat, LLC
Carl Chesbrough, PharmaStat, LLC
Inka Leprince, PharmaStat, LLC
In order to shorten the time for regulatory review of a new drug application (NDA) or biologic license application (BLA), more and more biotech and pharmaceutical companies prepare their Bioresearch Monitoring Program (BIMO) packages as part of their initial submissions. In this paper, we walk the reader through a process of producing BIMO information, particularly the subject-level data line listings by clinical site (by-site listings) and the summary-level clinical site (CLINSITE) dataset. This paper concludes with methods of preparing electronic Common Technical Document (eCTD) documentation, such as data definition (define.xml) and the reviewer’s guide, to support the CLINSITE dataset. In addition, we discuss challenges as we share our experience in planning, producing, and quality control (QC) for a successful BIMO package.
SS-317 : Improving the Quality of Define.xml: A Comprehensive Checklist Before Submission
Ji Qi, BioPier Inc.
Yan Li, BioPier Inc.
Lixin Gao, BioPier Inc.
The define.xml is the cover letter in Module 5 of the electronic Common Technical Document (eCTD) submission to the U.S. Food and Drug Administration (FDA) which provides a high-level summary of the metadata for all the data submitted. A functioning, complete and informative define.xml is required by FDA regulation. A high-quality define.xml will not only aid in the ease of FDA review but also convey the attentiveness of the sponsor’s work attitude to the reviewers and augment their trust in the results. In this paper, we will provide a list of details to check and fix before submission of the define.xml, focusing on those for clinical tabulation data (SDTM) and analysis data (ADaM), based on a review of common issues reported in papers as well as our own experience generating define.xml for clients.
SS-325 : Getting It Right: Refinement of SEND Validation Rules
Kristin Kelly, Pinnacle 21
The CDISC SDTM metadata, outlined in the SDTM Model, are used for submission of data from both clinical trials and nonclinical studies. Until recently, many of the Pinnacle 21 validation rules were assigned for both SDTM and SEND domains when in some cases, a specific rule did not apply for SEND data as outlined in the SENDIG. Over the past year, the SEND rule set has been refined through the modification of existing rules, removal of others and creation of new rules. All rules are based on either an FDA Business rule, an FDA Validator rule, or CDISC rules. This paper will discuss some of the changes that have been made in an effort to ‘get the rules right’ for SEND.
ePosters
EP-064 : A Guide for the Guides: Implementing SDTM and ADaM standards for parallel and crossover studies
Azia Tariq, GlaxoSmithKline
Janaki Chintapalli, GlaxoSmithKline
In clinical studies, dataset structures are heavily impacted by the study design and how treatment groups are compared. The two most common study designs used in clinical research are parallel and crossover. In a parallel study, participants are randomly assigned to a single treatment. Each treatment can be a placebo, a specific dose of the drug being investigated, or a standard-of-care treatment. A crossover study design, on the other hand, randomly assigns participants to a specified sequence of treatments. When one treatment is completed, the subject will then “cross over” to another treatment during the course of the trial, resulting in each subject acting as his or her own control. Typically, all subjects will receive the same number of treatments and be involved in the same number of periods. This means that even if participants are initially put into a placebo group, they will also eventually receive the study drug or standard of care during the trial. Usually, a crossover study also includes a washout period, which allows the effects of the preceding treatment to dissipate and eliminates any carry-over effect. The washout period is a predetermined amount of time during which patients receive no treatment. Parallel studies are straightforward when assigning treatments and deriving other analysis variables. Crossover studies require some additional work when creating treatment variables and other analysis variables. This paper will examine both study designs and explain how CDISC implementation differs between parallel and crossover studies.
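To make the difference concrete, here is a minimal sketch of deriving planned treatment variables for the two designs. It follows the ADaM naming pattern of `TRTxxP` for the planned treatment in period xx and `TRTSEQP` for the planned sequence. The function name and the convention that a crossover sequence arrives as a hyphen-separated string are assumptions for this example and are not taken from the paper. Treatment names that themselves contain a hyphen would need a different separator.

```python
def planned_treatments(arm, separator="-"):
    """Split a planned arm or sequence into per-period treatment variables.

    A parallel arm such as "Drug A" yields only TRT01P.
    A crossover sequence such as "Placebo-Drug A" yields TRT01P, TRT02P,
    and TRTSEQP (the full planned sequence).
    """
    periods = [part.strip() for part in arm.split(separator) if part.strip()]
    row = {f"TRT{index:02d}P": trt for index, trt in enumerate(periods, start=1)}
    if len(periods) > 1:
        # Only crossover subjects carry a sequence variable in this sketch.
        row["TRTSEQP"] = arm
    return row


# Hypothetical usage for one parallel and one crossover subject.
parallel_row = planned_treatments("Drug A")
crossover_row = planned_treatments("Placebo-Drug A")
```

The crossover subject ends up with one treatment variable per period, which is the additional derivation work the abstract refers to.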
EP-099 : Color Data Listings and Color Patient Profiles
Charley Wu, Atara Biotherapeutics
During clinical trials, there are frequent datacuts for safety data review, interim data analysis, conference presentations, CSRs, etc. Medical Monitors, Statisticians, Clinical Data Managers, and Pharmacovigilance staff usually need to review the data carefully to ensure data accuracy and integrity. These functions frequently complain that they have already reviewed the same data many times before, and they do not like to review the same data repeatedly. They would rather pay more attention to new and updated data. However, most data listings, patient profiles, and reports cannot distinguish new data from old data. To solve this issue, we developed color data listings and color patient profiles. The idea is to set the first datacut as the benchmark and then highlight all later data changes in different colors. For example, updated data are colored yellow, new data are colored green, and deleted data are colored grey. Unchanged data are not colored. By doing so, reviewers can easily identify any new, updated, or deleted data since the last datacut. Though we have implemented this only in regular data listings and patient profiles, we have already received very positive feedback from data reviewers. It usually takes them 1-2 weeks to finish reviewing all data listings and patient profiles. They can now finish data review in a couple of days. It also makes data review an enjoyable process, as colored data changes stand out to the reviewer’s eye. Please see sample output attached.
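The comparison behind such a listing can be sketched as follows. This is only an illustration of the idea, not the author's code. It assumes each datacut is held as a dictionary keyed by a record identifier (here a hypothetical subject and sequence number pair), and it uses the color scheme given in the abstract.

```python
# Color scheme taken from the abstract; None means "leave uncolored".
COLORS = {"new": "green", "updated": "yellow", "deleted": "grey", "unchanged": None}


def classify_changes(benchmark, current):
    """Label every record key as new, updated, deleted, or unchanged.

    benchmark and current map a record key to that record's values.
    """
    status = {}
    for key, record in current.items():
        if key not in benchmark:
            status[key] = "new"
        elif record != benchmark[key]:
            status[key] = "updated"
        else:
            status[key] = "unchanged"
    # Records present in the benchmark but missing now were deleted.
    for key in benchmark:
        if key not in current:
            status[key] = "deleted"
    return status


# Hypothetical adverse event records keyed by (subject, sequence number).
first_cut = {
    ("001", 1): {"AETERM": "HEADACHE", "AESEV": "MILD"},
    ("001", 2): {"AETERM": "NAUSEA", "AESEV": "MILD"},
    ("002", 1): {"AETERM": "RASH", "AESEV": "MODERATE"},
}
second_cut = {
    ("001", 1): {"AETERM": "HEADACHE", "AESEV": "MODERATE"},
    ("002", 1): {"AETERM": "RASH", "AESEV": "MODERATE"},
    ("003", 1): {"AETERM": "FATIGUE", "AESEV": "MILD"},
}
changes = classify_changes(first_cut, second_cut)
row_colors = {key: COLORS[label] for key, label in changes.items()}
```

Deleted records only exist in the benchmark, so a colored listing has to be built from the union of both datacuts rather than from the current one alone.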
EP-172 : TDF – Overview and Status of the Test Data Factory Project, Standard Analyses & Code Sharing Working Group
Nancy Brucken, Clinical Solutions Group
Peter Schaefer, VCA-Plus, Inc.
Dante Di Tommaso, Omeros
Test Data Factory is one of six projects within PhUSE’s Standard Analyses and Code Sharing Working Group. Suitable test data are an essential part of software development and testing. The objective of the TDF Project is to provide up-to-date CDISC-compliant data sets to empower statistical programmers and software developers. Users should be able to customize fundamental aspects of test databases. The TDF Project team have published two data packages based on SDTM and ADaM data sets that CDISC published in a pilot. Now the TDF team have begun to implement SAS and R code to simulate a clinical trial database based on user configuration. PhUSE is a volunteer organization that relies on community contribution to progress initiatives such as TDF. This poster and paper inform the community of TDF history, current activities and future plans, and have the secondary intent of inspiring community members to join our efforts and to contribute their expertise.
EP-174 : Standard Analyses and Code Sharing Working Group Update
Nancy Brucken, Clinical Solutions Group, Inc.
Dante Di Tommaso, Omeros
Jane Marrer, Merck
Mary Nilsson, Eli Lilly and Company
Jared Slain, MPI Research
Hanming Tu, Frontage
This paper updates the community on the efforts of the six project teams in the PHUSE Standard Analyses and Code Sharing Working Group. The Working Group publishes recommended analyses of clinical data suitable across therapeutic areas. These publications include presentations (tables, listings and figures) of the results from those analyses. The Working Group's GitHub repository contains a wealth of scripts that have been written by PHUSE members, or developed and contributed by the FDA and other organizations. The collaborative efforts of this group improve our collective efforts to design and implement transparent and robust analyses of our clinical data for regulatory decision making. Crowd-sourcing code development of these recommended analyses can promote access to and adoption of these analyses, and bring efficiencies and savings to our drug development and review processes.
EP-176 : 10 things you need to know about PMDA eSubmission
Yuichi Nakajima, Novartis
From April 2020, PMDA mandates electronic submission for new drug applications in Japan. PMDA has published the Basic Principles, Notification on Practical Operations, Technical Conformance Guide, and FAQs on electronic submission for applicants. Although these guidance documents cover general topics, many operational and technical challenges were found during the transitional period from October 2016 to March 2020. FDA and PMDA are different. Needless to say, an electronic Case Report Tabulation (eCRT) package accepted by FDA is not always accepted by PMDA. It is difficult to define a “gold standard” for electronic submission because submission scenarios vary. This poster will provide several tips and points of awareness, drawn from actual experience during the transitional period, to support your smooth PMDA submission.
EP-177 : Detecting Side Effects and Evaluating the Effectiveness of Drugs from Customers’ Online Reviews using Text Analytics, Sentiment Analysis and Machine Learning Models
Thu Dinh, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Drug reviews play a very significant role in providing crucial medical care information for both healthcare professionals and consumers. Customers are increasingly using online review sites, discussion boards, and forums to voice their opinions and express their sentiments about drugs they have used. However, a potential buyer typically finds it very hard to go through all comments before making a purchase decision. Another big challenge is the unstructured, qualitative, and textual nature of the reviews, which makes it difficult for readers to distill the comments into meaningful insights. In light of that, this paper primarily aims at classifying the side effect level and effectiveness level of prescribed drugs by using text analytics and predictive models within SAS® Enterprise Miner™. Additionally, the paper explores the specific effectiveness and potential side effects of each prescription drug through sentiment analysis and text mining within SAS® Sentiment Analysis Studio and SAS® Visual Text Analytics. The study’s preliminary results show that the best performing model for side effect level classification is the rule-based model, with a validation misclassification rate of 27.1%. For effectiveness level classification, the text rule builder model also works best, with a 22.4% validation misclassification rate. These models are further validated using a transfer learning algorithm to evaluate performance and generalization. The results can act as practical guidelines and useful references to help prospective patients make better informed purchase decisions.
EP-337 : Generating ADaM compliant ADSL Dataset by Using R
Vipin Kumpawat, Eliassen Group
Lalitkumar Bansal, Statum Analytics LLC
SAS has been widely used for generating clinical trial data sets, whereas R is used for data analysis and has gained some popularity among statisticians and programmers. R can be considered a viable alternative to SAS for generating specialized clinical trial deliverables such as SDTM and ADaM data sets, tables, and figures. In this work we generate an ADaM-compliant ADSL data set (Subject-Level Analysis Data set) by using R. R packages such as sas7bdat, dplyr, tidyr, parsedate, and Hmisc are used and compared to SAS functions in terms of their computational efficiency. This paper details the typical steps used to create the ADSL data set, which begin with reading the various SDTM data sets, followed by a procedure to transpose the SUPPDM data set and merge it with the DM data set. A procedure to extract the EX and DS variables from the EX and DS data sets, respectively, and then merge them with the final DM data set is detailed. We also demonstrate how to derive numeric variables, flags, treatment variables, and trial dates for the ADSL data set. Finally, the R procedures to attach labels to the variables are discussed; this procedure then culminates in the export of the final ADSL data set. A side-by-side comparison between R and SAS procedures is outlined. Certain weaknesses in R, such as attaching labels to variables, have been resolved in this work. Finally, the challenges encountered in generating the ADSL data set using R are discussed and compared to SAS.
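The transpose-and-merge step can be sketched as below. The paper itself works in R. This illustration uses plain Python so it runs without extra packages, and the function name and sample values are assumptions for this example. It relies only on the standard SUPPQUAL columns `USUBJID`, `QNAM`, and `QVAL`.

```python
def merge_suppdm(dm_rows, suppdm_rows):
    """Transpose SUPPDM (one row per qualifier) and left-join it onto DM.

    dm_rows and suppdm_rows are lists of dicts, one dict per record.
    Each distinct QNAM becomes a new column; subjects without a value
    for a qualifier get an empty string.
    """
    wide = {}
    for row in suppdm_rows:
        wide.setdefault(row["USUBJID"], {})[row["QNAM"]] = row["QVAL"]
    qnams = sorted({row["QNAM"] for row in suppdm_rows})

    merged = []
    for dm in dm_rows:
        extra = wide.get(dm["USUBJID"], {})
        merged.append({**dm, **{qnam: extra.get(qnam, "") for qnam in qnams}})
    return merged


# Hypothetical sample data.
dm = [
    {"USUBJID": "01-001", "SEX": "F"},
    {"USUBJID": "01-002", "SEX": "M"},
]
suppdm = [
    {"USUBJID": "01-001", "QNAM": "RACEOTH", "QVAL": "SAMOAN"},
    {"USUBJID": "01-001", "QNAM": "COMPLT", "QVAL": "Y"},
]
adsl_base = merge_suppdm(dm, suppdm)
```

Keeping every DM subject, even those with no supplemental records, mirrors a left join, which is what a subject-level data set such as ADSL needs.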
EP-353 : Visually Exploring Proximity Analyses Using SAS® PROC GEOCODE and SGMAP and Public Use Data Sets
Louise Hadden, Abt Associates Inc.
Numerous international and domestic governments provide free public access to downloadable databases containing health data. Two examples are the Demographic and Health Surveys, which include data from Afghanistan to Zimbabwe, and the Centers for Medicare and Medicaid Services' Compare data and Part D Prescriber public use files. This paper and presentation will describe the process of downloading data and creating an analytic database that includes geographic data; running SAS® PROC GEOCODE (part of Base SAS®) with TIGER street-address-level data to obtain latitude and longitude at a finer level than ZIP code; and finally using PROC SGMAP (part of Base SAS®) with annotation to create a visualization of a proximity analysis.
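Once addresses have been geocoded to latitude and longitude, a proximity analysis comes down to computing distances between points. The sketch below is not the paper's SAS code. It uses the haversine great-circle formula as one common choice, and the function names, the mean Earth radius constant, and the sample coordinates are assumptions for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(origin, points, radius_km):
    """Return the names of points that lie within radius_km of origin."""
    return [
        name
        for name, (lat, lon) in points.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= radius_km
    ]


# Hypothetical geocoded facilities around an origin at (0, 0).
facilities = {"near": (0.5, 0.0), "far": (2.0, 0.0)}
nearby = within_radius((0.0, 0.0), facilities, radius_km=100)
```

The resulting list of nearby points is the kind of input a mapping step would then plot and annotate.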
EP-362 : SDSP: Sponsor and FDA Liaison
Bhanu Bayatapalli, University of Thiruvalluvar at INDIA
The discussion between a sponsor and FDA on data standards for statistical programming deliverables in an electronic submission should start at the early stages of product development and continue along the way to filing. This discussion will involve data standards, structures, and versions to be used for each study submitted with an NDA or BLA. The Study Data Standardization Plan (SDSP) is used as a tool to communicate with FDA on these aspects. Sponsors and applicants are encouraged to utilize established FDA-sponsor meetings (e.g., pre-IND, end of phase 2, Type B/C) to share and discuss the SDSP.