Paper presentations are the heart of a PharmaSUG conference. Below is the current list of confirmed paper selections, including the most recent batch. Papers are organized into 12 academic sections and cover a variety of topics and experience levels. The list will be updated as the remaining paper selections are finalized.
Note: This information is subject to change. Last updated 05-May-2023.
Data Visualization and Reporting
|Paper No.||Author(s)||Paper Title|
|DV-024||Jeffrey Meyers||Methods of a Fully Automated CONSORT Diagram Macro %CONSORT|
| ||Weiming Du & Toshio Kimura||Divergent Nested Bar Plot and SAS Implementation|
|DV-073||Kirk Paul Lafler||Ten Rules for Better Charts, Figures and Visuals|
|DV-116||Marek Solak||Post-surgical opioid pain medications usage evaluation using SAS and Excel|
|DV-134||Abhinav Srivastva||Life Table Analysis for Time to First Event Onset|
| ||Aakar Shah||Amazing Graph Series: Butterfly Graph Using SGPLOT and GTL|
| ||Kaijun Zhang||The Flexible Ways to Create Graphs for Clinical Trial|
| ||Rayce Wiggins||Complementary Overlay - A Programmatic Approach to Figure Output Validation|
|DV-287||Lin Gu||Meta-Analysis in R|
|DV-310||Ilan Carmeli||Revolutionizing Statistical Outputs Validation: a Product Demonstration of Verify, an ML-powered Automation Solution Streamlining the Validation Process|
|Paper No.||Author(s)||Paper Title|
| ||Matthew Slaughter||Commit early, commit often! A gentle introduction to the joy of Git and GitHub|
|HT-093||Troy Hughes||Undo SAS® Fetters with Getters and Setters-Supplanting Macro Variables with More Flexible, Robust PROC FCMP User-Defined Functions That Perform In-Memory Lookup and Initialization Operations|
|HT-105||Jayanth Iyengar||Understanding Administrative Healthcare Datasets using SAS programming tools.|
| ||Endri Elnadav||Generating Clinical Graphs in SAS and R - A Comparison of the Two Languages|
|HT-265||Phil Bowsher||Blastula for Communicating Clinical Insights with R via Email|
|HT-355||Bhavin Busa||Introducing TFL Designer: Community Solution to Automate Development of Mock-up Shells and Analysis Results Metadata|
|HT-356||Charu Shankar||SAS SQL 101|
|HT-357||Ajay Gupta||SAS Enterprise Guide: 8.x (What is new!)|
|HT-358||Magnus Mengelbier||R package validation|
|Paper No.||Author(s)||Paper Title|
|LD-004||Daryna Yaremchuk||Are you a great team player?|
|LD-028||Stephen Sloan||Developing and running an in-house SAS Users Group|
|LD-038||Laura Needleman||The Interview Process: An Autistic Perspective|
| ||Richann Watson||Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream|
|LD-095||Carey Smoak||Lessons Learned from a Retired SAS® Programmer|
| ||Lisa Pyle||Planning Your Next Career Move - Resume Tips for the Statistical Programmer|
|LD-178||Priscilla Gathoni||Practical Tips for Effective Coaching for Leaders and Managers in Organizations|
| ||Ershlena McDaniel & Lisa Stetler||What's the F in specialized people? Let's talk FSP - the models, variations, and what it takes to be successful|
| ||Lisa Mendez||Get a GPS to navigate your skills to find career purpose|
|Paper No.||Author(s)||Paper Title|
| ||Aman Sharma & Lili Li & Durga Prasad Chinthapalli||Gear up the Metadata - Automating Patient Profile Generation, a Metadata Driven Programming Approach|
| ||Ran Li & Mimi Vigil & Shan Yang||Masters of the Table Universe: Creating Table Shells Consistently and Efficiently Across All Studies|
|MM-205||Keith Hibbetts||Automating SDTM: A metadata-driven journey.|
| ||Reema Baweja||Challenges with Metadata Repository System Implementation: A Sponsor's Perspective|
| ||Kavitha Mullela||Traceability: Not just about Data|
| ||Abhishek Dabral||Better CDISC Standards with Metadata Programming|
|MM-315||Karen Walker||Macro to Automate Creation and Sync of Shell Document and TOC|
| ||Bess LeRoy||All You Need to Know about the New CDISC Analysis Results Standards!|
|Paper No.||Author(s)||Paper Title|
| ||Rohit Alluri & Melanie Besculides & Jeffrey Lavenberg & Ji-Hyn Lee & Fanni Natanegara & Lilia Rodriguez & Dr. Yang Veronica Pei & Liping Sun & Helena Sviglin & Matt Becker & Mike McDevitt & Mike Stackhouse|| |
Quick Programming Tips
Real World Evidence and Big Data
|Paper No.||Author(s)||Paper Title|
| ||Denis Nyongesa & John Dickerson & Jennifer Kuntz||Real World Evidence in Distributed Data Networks: Lessons from a Post-Marketing Safety Study|
| ||Iryna Kotenko||Exploring the spread of COVID-19 in the United States using unsupervised graph-based machine learning|
|RW-113||Kevin Lee||Patient's Journey using Real World Data and its Advanced Analytics|
| ||Bo Zheng & Li Ma||Automating Non-Standard New Study Set Up with a SAS Based Work Process|
|RW-163||Zeke Torres||CMS VRDC - A simplified overview. What to expect in terms of data, system, code.|
| ||Samiul Haque||Novel Applications of Real World Data in Clinical Trial Operations: Clinical Trial Feasibility|
| ||Samiul Haque||Novel Applications of Real World Data in Clinical Trials: External Control Arms|
Statistics and Analytics
Strategic Implementation & Innovation
Advanced Programming
AP-009 : A Configuration File Companion: testing and using environment variables and options; templates for startup-only options initstmt and termstmt
Ronald Fehd, Fragile-Free Software
Monday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 1
The startup process of SAS software reads one or more configuration files, *.cfg, which have allocations of environment variables, the values of which are used in SAS startup-only options to provide access to libraries, lists of directories that contain files that SAS uses for functions, macros, and procedures. This paper provides programmers and advanced users programs to review the default configuration files; procedures, options, and sql to discover options; and a suite of programs to use in Test-Driven Development (TDD) to trace and verify user-written configuration files.
AP-015 : Using SQL Dictionaries to Research the Global Symbol Table
Ronald Fehd, Fragile-Free Software
Monday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 1
The sql procedure in SAS software provides a number of dictionaries that can be used to research entries in the global symbol table. These dictionaries include lists of dataset and variable names, option values, and catalog entries for format values and macro definitions. This paper provides example programs to research values in the global symbol table assigned by the global statement options, procedure output from the format procedure, and macro definitions. The sql procedure can also be used to create lists of objects for list processing: list of variable names, or dataset names.
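The dictionary idea the abstract describes is not unique to SAS: most SQL engines expose queryable catalog metadata. As a loose, hedged analog (Python/sqlite3, not the paper's PROC SQL code; table and column names here are invented for illustration):

```python
import sqlite3

# Hypothetical illustration: like PROC SQL's DICTIONARY.TABLES and
# DICTIONARY.COLUMNS, SQLite exposes catalog metadata that can be
# queried with ordinary SELECT statements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographics (usubjid TEXT, age INTEGER)")
conn.execute("CREATE TABLE vitals (usubjid TEXT, sysbp REAL)")

# List table names, as DICTIONARY.TABLES would in PROC SQL
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['demographics', 'vitals']

# List column names for one table, as DICTIONARY.COLUMNS would
columns = [row[1] for row in conn.execute("PRAGMA table_info(vitals)")]
print(columns)  # ['usubjid', 'sysbp']
```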
AP-021 : Have a Date with ISO®? Using PROC FCMP to Convert Dates to ISO 8601
Richann Watson, DataRich Consulting
Monday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 1
Programmers frequently have to deal with dates and date formats. At times, determining whether a date is in a day-month or month-day format can leave us confounded. Clinical Data Interchange Standards Consortium (CDISC) has implemented the use of the International Organization for Standardization (ISO) format, ISO® 8601, for datetimes in SDTM domains, to alleviate the confusion. However, converting "datetimes" from the raw data source to the ISO 8601 format is no picnic. While SAS® has many different functions and CALL subroutines, there is not a single magic function to take raw datetimes and convert them to ISO 8601. Fortunately, SAS allows us to create our own custom functions and subroutines. This paper illustrates the process of building a custom function with custom subroutines that takes raw datetimes in various states of completeness and converts them to the proper ISO 8601 format.
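To give a flavor of the partial-date handling the abstract describes, here is a minimal sketch in Python rather than the paper's FCMP code, assuming the common CDISC convention of keeping a component only while everything to its left is known:

```python
def iso8601_partial(year=None, month=None, day=None):
    """Build an ISO 8601 date string from possibly incomplete parts.

    Components are kept only while everything to their left is known,
    mirroring the usual CDISC handling of partial dates (a sketch,
    not the paper's actual FCMP implementation).
    """
    parts = []
    if year is not None:
        parts.append(f"{year:04d}")
        if month is not None:
            parts.append(f"{month:02d}")
            if day is not None:
                parts.append(f"{day:02d}")
    return "-".join(parts)

print(iso8601_partial(2023, 5, 5))   # '2023-05-05'
print(iso8601_partial(2023, 5))      # '2023-05'
print(iso8601_partial(2023))         # '2023'
```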
AP-026 : A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
Stephen Sloan, Accenture
Kirk Paul Lafler, sasNerd
Monday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 1
Data comes in all forms, shapes, sizes and complexities. Stored in files and data sets, SAS® users across industries know all too well that data can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or "keys", and need to be joined by character values, like names or E-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues, popular data cleaning and user-defined validation techniques, data transformation techniques, traditional merge and join techniques, the introduction to the application of different SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED, and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or "messy" data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
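The phonetic and edit-distance ideas behind SOUNDEX, COMPLEV, and COMPGED carry over to any language. A stdlib-only Python sketch (a simplified Soundex, which can differ from SAS's variant in edge cases, plus a similarity ratio standing in for the edit-distance functions):

```python
import difflib

def soundex(name):
    """Very small Soundex sketch (simplified; SAS's SOUNDEX differs slightly)."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result, last = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "HW":        # H/W do not reset the previous code
            last = code
    return (result + "000")[:4]

def similarity(a, b):
    """Edit-distance-style similarity, in the spirit of COMPLEV/COMPGED."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(soundex("Robert"), soundex("Rupert"))   # R163 R163
print(round(similarity("Catherine", "Katherine"), 2))
```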
AP-033 : Twenty Ways to Run Your SAS® Program Faster and Use Less Space
Stephen Sloan, Accenture
Monday, 10:30 AM - 10:50 AM, Location: LVL Ballroom: Continental 1
When we run SAS® programs that use large amounts of data or have complicated algorithms, we often are frustrated by the amount of time it takes for the programs to run and by the large amount of space required for the program to run to completion. Even experienced SAS programmers sometimes run into this situation, perhaps through the need to produce results quickly, through a change in the data source, through inheriting someone else's programs, or for some other reason. This paper outlines twenty techniques that can reduce the time and space required for a program without requiring an extended period of time for the modifications. The twenty techniques are a mixture of space-saving and time-saving techniques, and many are a combination of the two approaches. They do not require advanced knowledge of SAS, only a reasonable familiarity with Base SAS® and a willingness to delve into the details of the programs. By applying some or all of these techniques, people can gain significant reductions in the space used by their programs and the time it takes them to run. The two concerns are often linked, as programs that require large amounts of space often require more paging to use the available space, and that increases the run time for these programs.
AP-039 : A utility to combine study outputs into all-in-one PDF for DSUR
Wei Shao, Bristol Myers Squibb
Monday, 11:00 AM - 11:20 AM, Location: LVL Ballroom: Continental 1
During clinical development, periodic analysis of safety information is crucial for assessing risk to trial participants, and it provides important information for health authorities (HAs) to evaluate the safety profile of the investigational drug on a regular basis. As one of the important safety aggregate reports, the development safety update report (DSUR) provides a periodic update on drug safety information. Statistical programming teams are often responsible for generating cumulative and within-reporting-period outputs to support the DSUR. This paper focuses on the two region-specific listings provided with a DSUR. It then introduces a SAS macro that generates consolidated PDF files from sets of individual study SAS listings. The resulting PDF file(s) contain all the converted listings with self-extracted and properly sorted bookmarks. The macro package turns individual listings in a study into one or more submission ready compliant (SRC) PDF files for DSUR submission.
AP-048 : Documenting your SAS programs with Doxygen and automatically generated diagrams.
Philip Mason, Wood Street Consultants
Monday, 1:30 PM - 2:20 PM, Location: LVL Ballroom: Continental 1
Doxygen has been used to document programs for over 25 years. It involves using tags in comments to generate HTML, RTF, PDF, and other forms of high-quality documentation. It supports the DOT language for making diagrams from simple text directives. PROC SCAPROC can be used to generate a trace of a SAS® program's execution. My SAS® code can then analyze the trace and produce DOT language directives to make a diagram of the execution of that SAS program. Those directives can then be put into the Doxygen tags to add the diagram to your documentation. The analysis can also show the performance of the SAS program for tuning purposes. This paper shows how to use Doxygen with SAS and provides the code to automatically produce diagrams for documentation or tuning purposes.
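The trace-to-diagram step reduces to emitting DOT text. A hedged sketch of the idea (Python here, not the paper's SAS code; the step dependencies are hand-written where the paper parses them from PROC SCAPROC output):

```python
def to_dot(edges, title="sas_program"):
    """Emit Graphviz DOT directives for a simple data-flow graph."""
    lines = [f"digraph {title} {{", "  rankdir=LR;"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical dataset lineage recovered from an execution trace
steps = [("raw.dm", "work.dm_clean"), ("work.dm_clean", "sdtm.dm"),
         ("raw.vs", "sdtm.vs")]
print(to_dot(steps))
```

The resulting text can be pasted inside a Doxygen `\dot ... \enddot` comment block so the diagram renders with the documentation.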
AP-049 : Friends are better with Everything: A User's Guide to PROC FCMP Python Objects in Base SAS
Matthew Slaughter, Kaiser Permanente Center for Health Research
Isaiah Lankham, Kaiser Permanente Center for Health Research
Monday, 2:30 PM - 3:20 PM, Location: LVL Ballroom: Continental 1
Flexibly combining the strengths of SAS and Python allows programmers to choose the best tool for the job and encourages programmers working in different languages to share code and collaborate. Incorporating Python into everyday SAS code opens up SAS users to extensive libraries developed and maintained by the open-source Python community. The Python object in PROC FCMP embeds Python functions within SAS code, passing parameters and code to the Python interpreter and returning the results to SAS. User-defined SAS functions or call routines executing Python code can be called from the DATA step or any context where built-in SAS functions and routines are available. This paper provides an overview of the syntax of FCMP Python objects and practical examples of useful applications incorporating Python functions into SAS processes. For example, we will demonstrate incorporating Python packages into SAS code for leveraging complex API calls such as validating email addresses, geocoding street addresses, and importing a YAML file from the web into SAS. All examples from this paper are available at https://github.com/saspy-bffs/pharmasug-2023-proc-fcmp-python
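As a taste of the kind of small Python function one might register through a PROC FCMP Python object (this sketch is mine, not from the paper's repository; the paper's real examples live at the GitHub link above):

```python
import re

# A deliberately simple email pattern; production validation would use a
# dedicated library, as the paper's API-based example does.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def valid_email(address):
    """Return 1/0 so the result maps cleanly onto a numeric SAS variable."""
    return 1 if EMAIL_RE.match(address) else 0

print(valid_email("stats@example.org"))  # 1
print(valid_email("not-an-address"))     # 0
```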
AP-057 : Best Practices for Efficiency and Code Optimization in SAS programming
Jayanth Iyengar, Data Systems Consultants LLC
Monday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 1
There are multiple ways to measure efficiency in SAS programming: programmers' time, processing or execution time, memory, input/output (I/O), and storage space considerations. As data sets grow larger, efficiency techniques play a larger and larger role in the programmer's toolkit. This need has been compounded by the need to access and process data stored in the cloud and, since the pandemic, by programmers finding themselves working remotely in distributed teams. As a criterion for evaluating code, efficiency has become as important as producing a clean log or the expected output. This paper explores best practices in efficiency from a processing standpoint, as well as others.
AP-061 : Going Command(o): Power(Shell)ing Through Your Workload
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.
Monday, 4:30 PM - 5:20 PM, Location: LVL Ballroom: Continental 1
Simplifying and streamlining workflows is a common goal of most programmers. The most powerful and efficient solutions may require practitioners to step outside of normal operating procedures and outside of their comfort zone. Programmers need to be open to finding new (or old) techniques to achieve efficiency and elegance in their code: SAS® by itself may not provide the best solutions for such challenges as ensuring that batch submits preserve appropriate log and lst files; documenting and archiving projects and folders; and unzipping files programmatically. In order to adhere to such goals as efficiency and portability, there may be times when it is necessary to utilize other resources, especially if colleagues may need to perform these tasks without the use of SAS software. These and other data management tasks may be performed via the use of tools such as command-line interpreters and Windows PowerShell (if available to users), used externally and within SAS software sessions. We will also discuss the use of additional tools, such as WinZip, used in conjunction with the Windows command-line interpreter.
AP-063 : RESTful Thinking: Using R Shiny and Python to streamline REST API requests and visualize REST API responses
Laura Elliott, SAS Institute Inc.
Crystal Cheng, SAS Institute Inc.
Tuesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 1
REST APIs are a popular way to make HTTP requests to access and use data due to their simplicity. They can be used to carry out several types of actions in a statistical computing environment. Even though they are simple, some limitations have been observed such as lack of detail in responses, difficulty in debugging the failure of certain actions, and execution requires the user to have some basic HTTP request knowledge. This research focuses on mitigating these limitations by utilizing the strengths of both R and Python to build a user interface that executes REST APIs easily and displays responses with more detail. R Shiny was used to create an easy-to-use interface that contains several embedded HTTP requests, written using the Python requests module, that can be easily executed regardless of a user's previous knowledge. These requests perform specified actions in a statistical computing environment and return detailed results to be viewable in the R Shiny dashboard. This paper will explain the concept and build process of the dashboard, will discuss techniques used to integrate R and Python programming languages, and will introduce the resulting dashboard. In the end the paper will discuss challenges faced during development and some considerations for the future enhancement of REST APIs. The products used for development include R and Python programming languages and the statistical computing environment SAS Life Sciences Analytics Framework. This paper is intended for individuals with R and Python experience, and those who have knowledge of REST APIs.
AP-079 : Using the R Interface in SAS® to Call R Functions and Transfer Data
Bruce Gilsen, Federal Reserve Board of Governors
Tuesday, 8:30 AM - 9:20 AM, Location: LVL Ballroom: Continental 1
Starting in SAS® 9.3, the R interface enables SAS users on Windows and Linux who license SAS/IML® software to call R functions and transfer data between SAS and R from within SAS. Potential users include SAS/IML users and other SAS users who can use PROC IML just as a wrapper to transfer data between SAS and R and call R functions. This paper provides a basic introduction and some simple examples. The focus is on SAS users who are not PROC IML users, but who want to take advantage of the R interface.
AP-086 : An Introduction to Obtaining Test Statistics and P-Values from SAS® and R for Clinical Reporting
Brian Varney, Experis
Tuesday, 11:00 AM - 11:50 AM, Location: LVL Ballroom: Continental 1
Getting values of test statistics and p-values out of SAS® and R is quite easy in each of the software packages but also quite different from each other. This paper intends to compare and contrast the SAS and R methods for obtaining these values from tests involving Chi-Square and Linear Models such that they can be leveraged in tables, listings, and figures. This paper will include but not be limited to the following topics: * SAS ODS trace * SAS PROC FREQ * SAS PROC GLM * R stats::chisq.test() function * R stats::aov() function * R sasLM package functions * R broom package functions The audience for this paper is intended to be programmers familiar with SAS and R but not necessarily at an advanced level.
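For orientation, the Pearson chi-square the abstract refers to can be computed by hand and checked against PROC FREQ or R's `stats::chisq.test()` (run without continuity correction). A stdlib-only Python sketch for a 2x2 table:

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table, with a 1-df p-value via erfc."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    # For 1 df, X ~ Z^2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(round(stat, 2), round(p, 4))  # 6.67 0.0098
```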
AP-090 : Top 5 gotchas for getting cloud ready in SAS Viya
Charu Shankar, SAS Institute
Tuesday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 1
As a SAS®9 programmer, you may be interested to learn about the cloud and why SAS® Viya is such a big leap forward. In addition to hearing about the big picture and getting cloud-savvy, you are also interested in a deeper dive into how things are going to be in SAS® Viya vs SAS®9. In this session, I will discuss the top 5 gotchas for getting cloud ready in SAS® Viya. From database connections to data types, exploring metadata, loading tables in memory, and new engines, this is a must-attend session. All levels are welcome. Some awareness of SAS®9 would be beneficial.
AP-094 : Sorting a Bajillion Variables-When SORTC and SORTN Subroutines Have Stopped Satisfying, User-Defined PROC FCMP Subroutines Can Leverage the Hash Object to Reorder Limitless Arrays
Troy Hughes, Datmesis Analytics
Tuesday, 10:30 AM - 10:50 AM, Location: LVL Ballroom: Continental 1
The SORTC and SORTN subroutines sort character and numeric data, respectively, and are sometimes referred to as "horizontal sorts" because they sort variables rather than observations. That is, all elements within a SORTC or SORTN sort must be maintained in a single observation. A limitation of SORTC and SORTN is their inability to sort more than 800 variables when called inside the FCMP procedure. To overcome this disagreeable, arbitrary threshold, user-defined subroutines can be engineered that leverage the hash object to sort limitless variables. The hash object orders values that are ingested into it using the ORDERED argument, which can specify either ASCENDING or DESCENDING. This text demonstrates three failure patterns that occur when the OF operator specifies an array inside the FCMP procedure, which affect both character and numeric arrays, and which cause all built-in functions and subroutines to fail with runtime errors.
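The "horizontal sort" concept itself is compact: order the values of one wide observation across its variables. A minimal Python sketch of the idea (not the paper's FCMP/hash implementation, which is what removes the 800-variable ceiling in SAS):

```python
def horizontal_sort(record, descending=False):
    """Sort the values of one wide record across its variables.

    Like CALL SORTN over var1-varN within a single observation: the
    variable names keep their (stable) order, the values are reordered.
    """
    names = sorted(record)                      # keep a stable column order
    values = sorted(record.values(), reverse=descending)
    return dict(zip(names, values))

wide = {f"var{i}": v for i, v in enumerate([42, 7, 19, 3], start=1)}
print(horizontal_sort(wide))  # {'var1': 3, 'var2': 7, 'var3': 19, 'var4': 42}
```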
AP-126 : Facilitating Complex String Manipulations Using SAS PRX Functions
John LaBore, SAS Institute
Tuesday, 1:30 PM - 2:20 PM, Location: LVL Ballroom: Continental 1
SAS programmers learn to use many basic SAS functions within the DATA step, but surprisingly few learn about the SAS PRX (Perl Regular Expression) functions and call routines. The SAS PRX functions provide a powerful means to handle complex string manipulations, by enabling the same end result with fewer lines of code, or by enabling the analysis of data previously out of reach of the basic string manipulation functions. The PRX functions and call routines became available in SAS version 9, are accessible within the DATA step, and are tools that every advanced SAS programmer should have in their toolkit. Examples are provided to give a quick introduction to the syntax, along with a review of the resources available to the programmer getting started with SAS PRX functions.
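Because the PRX functions use Perl-style regular expressions, the same patterns translate almost verbatim to other regex engines. A short Python illustration of the two most common idioms (PRXMATCH-style searching and PRXCHANGE-style substitution); the sample strings are invented:

```python
import re

# PRXMATCH-style test: find ISO 8601 dates embedded in free text
text = "Visit on 2023-05-05, follow-up on 2023-06-12."
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
print(dates)  # ['2023-05-05', '2023-06-12']

# PRXCHANGE-style substitution: reorder a "Last, First" name with capture
# groups, much like s/(\w+), (\w+)/$2 $1/ in SAS PRX syntax
name = re.sub(r"(\w+), (\w+)", r"\2 \1", "Watson, Richann")
print(name)  # 'Richann Watson'
```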
AP-130 : Strategies for Code Validation at Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Marie Vendettuoli, Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Center
Xuehan (Emily) Zhang, Statistical Center for HIV/AIDS Research and Prevention, Fred Hutch Cancer Center
Rodger Zou, Statistical Center for HIV/AIDS Research and Prevention, Fred Hutch Cancer Center
Tuesday, 2:30 PM - 2:50 PM, Location: LVL Ballroom: Continental 1
Code validation presents a common challenge to programming teams embedded in the pharmaceutical data sphere. How much effort is enough? Is it possible to have a lightweight and nimble approach to validation that accommodates the variety that statistical programmers see in our daily codebase? At Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Center (SCHARP) we have adopted the R Validation Framework (PHUSE, 2021) as a language-agnostic paradigm for validating code at various stages of maturity, during both internal development efforts and when adopting community-authored resources. We will share examples from three major use categories: (1) Validation of community-authored resources (2) Validation integrated into development, and (3) Validation separate from R packages. A brief review of the R Validation Framework will be provided. Examples are drawn from the R language environment with additional discussion exploring the intersection of a risk-based strategy for adoption and development of an R codebase, with a particular emphasis to empower reproducibility and R package development. While examples shown in this paper and accompanying talk are in the R statistical programming language, the concepts are not dependent on programming language and no audience fluency is expected.
AP-151 : Real Projects, Real Transition, Really Revolutionary - Transitioning to R for Biometrics Work
Danielle Stephenson, Atorus Research
Alyssa Wittle, Atorus Research
Rebekah Oster, Atorus Research
Tuesday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 1
Technology is developing quickly and staying on the cutting edge has its challenges. Often, diving into something new is an intimidating idea for programmers, companies, and sponsors - especially when the day-to-day work must continue. The pharmaceutical industry has been entrenched in SAS® for decades, and the time has come to explore open-source and dedicate the time to figure this out. What are the ups and downs of this transition? How can the learning curve become a bit less curvy? Can TFLs and CDISC-compliant datasets be created using R? We will dive into what it looks like for a team to go from SAS-fluent to multi-lingual in real time, with real projects, and ways to ensure quality while making the transition.
AP-189 : You can REST easy generating Synthetic Data
Ben Howell, SAS
Ben Bocchicchio, SAS Institute
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 1
Synthetic data is a promising approach to replace or supplement the control arm of a clinical trial. Effective use of synthetic data could theoretically cut costs, time, and patient burden of clinical trials in half. Machine learning models trained with existing standard-of-care data are key to generation of relevant, reliable, unbiased synthetic control data. The necessary machine learning and statistical capabilities for this purpose may not come standard in a clinical data repository (CDR) and statistical computing environment (SCE). In this case, it is necessary to integrate the CDR and SCE with a platform used to run machine learning models in order to keep sufficient electronic records for the regulatory compliance of clinical trial data during this process. Additionally, CDRs and SCEs can be accessed remotely through the use of REST APIs, which are simple HTTP requests that can be made from SAS, R, Python, curl commands, or the programming language of your choice. This paper demonstrates how a user with knowledge of REST APIs could run machine learning models to generate synthetic control data for clinical trials while keeping an audit trail of the process without requiring any knowledge of the CDR and SCE or machine learning platform. The products used in this research include PC SAS, the CDR and SCE SAS Life Science Analytics Framework, and SAS Viya. This paper is intended for individuals with knowledge of REST APIs.
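The "simple HTTP requests" point can be made concrete with a few lines of stdlib Python. This is a hedged sketch only: the endpoint, token, and payload below are hypothetical, and real SAS Life Science Analytics Framework URLs and authentication will differ.

```python
import json
import urllib.request

def build_job_request(base_url, token, payload):
    """Prepare (but do not send) a POST request submitting a job to a
    hypothetical SCE endpoint. Any HTTP-capable language can do this."""
    req = urllib.request.Request(
        url=f"{base_url}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return req

req = build_job_request("https://sce.example.com/api", "TOKEN123",
                        {"program": "gen_synthetic_arm.sas"})
print(req.method, req.full_url)  # POST https://sce.example.com/api/jobs
```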
AP-191 : Survival Methods for Crossover in Oncology Trials
Brian Mosier, EMB Statistical Solutions
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 1
In oncology trials, patients are often allowed to switch from one treatment arm to the other when their disease progresses. This crossover of patients, which is typically not at random, may lead to bias in the estimates of overall survival for the study. As such, when patient crossover is allowed, the intent-to-treat population is no longer appropriate for analysis. Various methods have been developed to handle the crossover of patients. In this paper we present two SAS macros to implement methods that account for crossover in oncology trials. The methods included are (1) the rank preserving structural failure time (RPSFT) model, and (2) the inverse probability of censoring weighting (IPCW) model. Both models have previously been implemented in SAS, but we expand upon the implementations of both and provide corrections to code for the IPCW model.
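For readers unfamiliar with the first method, the core of the RPSFT model can be stated compactly (a standard textbook formulation, not the authors' macro code). Each patient's observed time is split into time off and on the experimental treatment, and a counterfactual untreated survival time is constructed as

```latex
U_i(\psi) \;=\; T_i^{\mathrm{off}} + e^{\psi}\, T_i^{\mathrm{on}},
```

where $e^{\psi}$ is the treatment acceleration factor and $\psi$ is found by g-estimation so that $U_i(\psi)$ is balanced across the randomized arms. The IPCW approach instead censors patients at crossover and re-weights the remaining observations by the inverse of their estimated probability of remaining uncensored given their covariate history $\bar{X}_i(s)$, roughly

```latex
w_i(t) \;=\; \prod_{s \le t} \frac{1}{P\bigl(\text{uncensored at } s \mid \bar{X}_i(s)\bigr)}.
```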
AP-218 : Automation of Validation of QC Plan Entries and Managing SAS Programs
Venkatesh Nemalipuri, Vertex Pharmaceuticals Inc.
Ateet Shah, Vertex Pharmaceuticals Inc.
Monday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 1
A clinical trial data computing environment can be part of an audit by regulatory agencies. Validation of QC plan dates against the date stamps of production SAS programs is often a manual, or even non-existent, process; manual checking can take a considerable amount of time and resources, not to mention the risk of errors. The first macro compares date stamps of production SAS programs against the QC plan and, if there are any mismatches, emails the list of issues to their authors automatically. It can also alert programmers and the study lead if a program is present in the QC plan but not in the folders (or vice versa), and can write the date QC passed into the QC plan based on the date stamps of the QC SAS programs. Manually creating hundreds of standardized SAS programs in a short time period can be a tedious task. The second macro generates SAS programs with a standard header for SDTM, ADaM, TFLs, and ad hoc outputs based on QC plan information and populates the Purpose field for TFLs using titles in the QC plan. Both macros help automate these quality control and validation processes in a clinical trial data computing environment.
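The date-stamp comparison at the heart of the first macro is straightforward to sketch. This Python version is an illustration only (the paper's macro reads the QC plan from its actual spreadsheet and sends email; here the plan is a plain dict):

```python
import os
from datetime import date

def qc_date_mismatches(qc_plan, program_dir):
    """Flag programs whose on-disk modification date is later than the
    date QC passed, plus plan/folder mismatches.

    `qc_plan` maps program name -> date QC passed (hypothetical input).
    """
    issues = []
    on_disk = {f for f in os.listdir(program_dir) if f.endswith(".sas")}
    for prog, qc_date in qc_plan.items():
        if prog not in on_disk:
            issues.append(f"{prog}: in QC plan but not in folder")
            continue
        mtime = date.fromtimestamp(
            os.path.getmtime(os.path.join(program_dir, prog)))
        if mtime > qc_date:
            issues.append(f"{prog}: modified {mtime} after QC passed {qc_date}")
    issues += [f"{p}: in folder but not in QC plan"
               for p in sorted(on_disk - set(qc_plan))]
    return issues
```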
AP-252 : Program It Forward! - Thinking Downstream when Coding
Frank Canale, SoftwaRx, LLC
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 1
Within the pharmaceutical industry, we as SAS users are often faced with short timelines, demands for quick turnarounds, and having to create recurring or multiple deliverables, all while doing so with decreasing resources. Couple this with ever-increasing standards and standardization efforts, and we sometimes find there's less of a need to create programs from scratch. Instead, we try to harvest reusable code, copied from a prior study and modified to run on the current study. However, in many instances this process causes unexpected headaches as one tries to understand what the original programmer wanted to do, why, and whether it will even work for the next project!
AP-253 : Creating The cxtf Native SAS Test Framework
Magnus Mengelbier, Limelogic AB
Tuesday, 3:00 PM - 3:20 PM, Location: LVL Ballroom: Continental 1
The natural evolution of business processes will eventually gravitate towards standardization and further automation to not only improve upon time to delivery but to further quality and standards. SAS macro libraries play an integral part to drive the process forward. As the use of macros evolve from repetitive processing to utilities and shareable process components, the effort of testing requirements and validation approaches change. Although test frameworks that support SAS exist, they frequently require command line or external access, which is not always possible in advanced solutions or embedded systems. The design, evolution, and learnings from creating a native SAS test framework is discussed through practical examples, including the benefits and drawbacks of testing of SAS code and macros in situ.
AP-259 : No More Manual PDF Bookmarks! An Automated Approach to Converting RTF Files to a Consolidated PDF with Bookmarks
Tyler Plevney, Emanate Biostats, Inc.
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 1
Preparing output documents for data review meetings and presentations can be a tedious process. What if I told you it doesn't have to be? Instead of opening and closing multiple RTF or Excel files with various sorts and filters to review, sometimes hundreds of outputs, why not consolidate them in an easy-to-review PDF document with all outputs easily separated with linked bookmarks? Doing this manually by converting RTF output files to PDF can take a long time. Add on the time to type out the bookmarks for each individual output and it doesn't seem worth it. In this paper I will present an automated way to convert an entire folder of RTF outputs to PDF while simultaneously changing the filenames to the bookmark text once consolidated into a combined PDF file. This is all done via a SAS program containing some VBScript code for file moving and converting in conjunction with an Excel spreadsheet containing the RTF output filenames and the title of the outputs.
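One piece of the workflow, renaming each output file to the text that should become its PDF bookmark, is easy to sketch without VBScript. A hedged Python illustration (the CSV columns `filename,title` are my assumption, standing in for the paper's Excel spreadsheet):

```python
import csv
import os
import shutil

def rename_to_titles(folder, mapping_csv):
    """Rename RTF outputs to their display titles before PDF conversion."""
    renamed = []
    with open(mapping_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            src = os.path.join(folder, row["filename"])
            if os.path.exists(src):
                dst = os.path.join(folder, row["title"] + ".rtf")
                shutil.move(src, dst)
                renamed.append(dst)
    return renamed
```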
AP-268 : Display Layout Specifications to Flexibly Design and Generate Tables
Songgu Xie, Regeneron Pharmaceuticals
Michael Pannucci, Regeneron Pharmaceuticals
Toshio Kimura, Regeneron Pharmaceuticals
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Ballroom: Continental 1
Currently, tables are created in SAS by manually processing the data and defining PROC REPORT to generate a predefined fixed table. If different layouts are required, extensive data processing will be required to create alternative table layouts. We propose a new process in which the statistical analysis procedure is decoupled from the display layout generation process. The first step is to perform the statistical analysis and store the analysis results in a standard results dataset (RDS), which is conceptually similar to the CDISC analysis results standard (ARS). The RDS concept will be presented separately and thus is out of scope for this paper. The second step is to process the RDS to generate the display. We propose a new specification called the display layout specification (DLS) to drive the display generation process. Through the DLS, we will be able to flexibly design various table layouts. This includes defining the row and column specifications, nesting of row/column headers, display formats and decimals, etc. The DLS will serve as an input into a flexible display generation macro that will process the RDS according to the DLS to create the desired table. Since the display generation process has been separated from the statistical analysis process, additional display layouts that were not predefined can be easily created by the display generation macro using the stored RDS and a new DLS as inputs, thereby greatly facilitating the creation of alternative table layouts.
AP-291 : Make You Holla' Tikka Masala: Creating User-Defined Informats Using the PROC FORMAT OTHER Option To Call User-Defined FCMP Functions That Facilitate Data Ingestion Data Quality
Troy Hughes, Datmesis Analytics
Tuesday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 1
You can't just let any old thing inside your tikka masala; you need to carefully curate the ingredients of this savory, salty, sometimes spicy delicacy! Thus, when reviewing a data set that contains potential tikka masala ingredients, an initial data quality evaluation should differentiate approved from unapproved ingredients. Cumin, yes please; chicken, the more meat the merrier; coriander, of course; turmeric, naturally; yeast, are you out of your naan-loving mind?! Too often, SAS practitioners first ingest a data set in one DATA step and rely on a subsequent DATA step to clean, standardize, and format those data. This text demonstrates how user-defined informats can be designed to ingest, validate, clean, and standardize data in a single DATA step. Moreover, it demonstrates how the FORMAT procedure can leverage the OTHER option to create a user-defined informat that calls a user-defined FCMP function to perform complex data evaluation and transformation. Control what you put inside your tikka masala with this straightforward solution that leverages the flexibility of the FORMAT and FCMP procedures!
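The general pattern the abstract describes (a PROC FORMAT informat whose OTHER= label routes values through a user-defined FCMP function) can be sketched as follows. This is a minimal illustrative example, not code from the paper: the function name, ingredient list, and data set names are invented for demonstration.

```sas
/* Hypothetical sketch of the abstract's pattern: ingest + validate in one step. */
proc fcmp outlib=work.funcs.ingest;
   function clean_ingredient(raw $) $32;
      length std $32;
      std = propcase(strip(raw));           /* standardize case and whitespace */
      if std in ('Cumin','Chicken','Coriander','Turmeric') then return (std);
      else return ('INVALID');              /* flag unapproved ingredients */
   endfunc;
run;

options cmplib=work.funcs;                  /* make the FCMP function visible */

proc format;
   /* OTHER= sends every incoming value through the FCMP function */
   invalue $ingred (default=32) other = [clean_ingredient()];
run;

data masala;
   input raw_item $20.;
   item = input(raw_item, $ingred32.);      /* clean during ingestion */
   datalines;
cumin
YEAST
turmeric
;
run;
```

With this approach the later "cleaning" DATA step disappears: validation happens at read time, which is the point the abstract is making.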
AP-309 : Macro To Automate Crossover Review In Produced Outputs
Igor Goldfarb, Accenture
Ella Zelichonok, Naxion
Wednesday, 10:15 AM - 10:35 AM, Location: LVL Ballroom: Continental 1
The goal of this work is to develop a macro that automates an important and time-consuming part of the final review process of produced tables, listings, and figures (TLFs): crossover review. This type of validation is a well-accepted practice, typically performed manually by biostatisticians as the final stage of the multi-step quality check (QC) process. The proposed tool can significantly simplify review work for biostatisticians, who have to verify that TLFs were generated correctly and are consistent across the study and its different sections. Final review of the produced TLFs is an important task in the flow from raw data to final outputs ready for submission. Comparing and analyzing outputs and making sure they are all consistent across the study is a tedious procedure requiring scrupulous manual work that is subject to human error. The proposed macro (developed in Excel VBA) automates this process. It reads the titles and corresponding content of the produced outputs (e.g., big N values of the safety population by study group, by subgroup, etc.) and in a matter of seconds creates a SAS®-readable ordered table of contents (TOC, Excel) that also includes the data of interest. In the next step the macro analyzes the data it has read and verifies that all produced outputs are consistent across the study (e.g., safety population counts in all outputs by age subgroup add up correctly to the corresponding values in the study groups). If any inconsistency is found, the macro marks the discrepancies. Any further updates to the created TLFs can easily be reviewed again by rerunning the macro.
Data Standards
DS-035 : Sound SDTM, Sound ADaM - Orchestrating SDTM and ADaM Harmonization
Nancy Brucken, IQVIA
David Neubauer, IQVIA
Soumya Rajesh, IQVIA
Monday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 2
Sound SDTM data is integral to having sound ADaM data. The ADaM model says, "Whereas ADaM is optimized to support data derivation and analysis, CDISC's Study Data Tabulation Model (SDTM) is optimized to support data tabulation". Oftentimes those who implement SDTM are not implementing ADaM and vice versa, and the two may not be working in harmony. If how the data will be analyzed has not been considered, tasks such as the definition of treatment arms and elements and the assignment of collected data to SDTM domains can present various challenges during analysis and with traceability. This paper will cover some of the situations the authors have encountered and discuss how SDTM and ADaM implementation can be brought better in tune.
DS-036 : Why FDA Medical Queries (FMQs) for Adverse Events of Special Interest? Implementation and Case Study
Clio Wu, Chinook Therapeutics Inc.
Monday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 2
There are many challenges associated with safety analyses and reporting of adverse events (AEs) in clinical trials, including, but not limited to, study design issues, coding of the AEs, selection of the AEs of special interest (AESIs), inadequate grouping of likely or potentially related AEs, events that present in different ways or are reported with different terms, and AEs defined so specifically that an event is underestimated. To standardize the NDA/BLA safety data review process, the U.S. FDA/CDER published two documents on 05 September 2022 and collaborated with the Duke-Margolis Center for Health Policy to host a public workshop on 14 September 2022 introducing the FDA Medical Queries (FMQs) and the Standard Safety Tables and Figures Integrated Guide. The author has actively reviewed and promoted the implementation of FMQs at the author's company to resolve AESI issues with unidentifiable legacy-study-defined Customized MedDRA Queries (CMQs), which led to the official implementation of this newly released AE grouping. This paper will share the experience of promoting and implementing FMQs; evaluating the FDA-published FMQ docket for potential issues and providing feedback to enhance future releases; developing efficient, standardized, end-to-end FMQ data pulling, AESI data analysis, and reporting processes; and incorporating Standardized MedDRA Queries (SMQs), FMQs, and potential company-defined CMQs to standardize the medical monitoring process and ensure consistent implementation within the company. The paper will also share an FMQ case study for NDA ISS analysis and CSR reporting.
DS-041 : Leverage and Enhance CDISC TAUGs to Build More Traceability for and Streamline Development of Efficacy ADaM in Oncology Studies
Xiangchen Cui, Crisprtx Therapeutics
Monday, 10:00 AM - 10:50 AM, Location: LVL Ballroom: Continental 2
The CDISC Breast Cancer Therapeutic Area User Guide and Prostate Cancer Therapeutic Area User Guide presented 'ADEVENT' and 'ADDATES' independently, in 2016 and 2017. One of the primary reasons for creating these intermediate datasets is to support traceability, built into the event dataset and/or date dataset through the triplet of SRCDOM, SRCVAR, and SRCSEQ variables; all potential dates from them are used as inputs to generate a time-to-event (TTE) analysis. ADEVENT can also support another analysis dataset, 'ADRESP', for best overall response, etc. The derivation of dates from tumor assessments is far from straightforward and much more complex, especially when Response Evaluation Criteria in Solid Tumors (RECIST 1.1) is applied, where confirmation of a complete response (CR) or partial response (PR) is required. Hence the traceability of these derivations is also critical to building confidence in the analysis. The triplet from ADEVENT is not sufficient for the traceability of events derived from tumor assessments due to the complexity of the derivation, and the triplet from ADDATES only provides traceability for the derivation of dates independent of tumor assessments. This paper explains the pros and cons of each and introduces a new approach to enhance them so that they can be applied broadly to other areas of oncology studies to build more traceability, further streamline the development of the efficacy datasets ADEVENT, ADRESP, ADDATES, and ADTTE for both categorical analysis of tumor response and TTE analysis, and follow best programming practice.
DS-051 : The Phantom of the ADaM: adding missing records to BDS datasets
Anastasiia Drach, Intego Group LLC
Monday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 2
Missing data is a 'pain' of any study. There are many imputation techniques available, but sometimes all we need to know is simply that the data is missing. In these cases, it is useful to add derived records to your ADaM datasets with missing AVAL/AVALC to indicate missed visits or timepoints. Such records are called phantom records. In this paper, we discuss how to add them to a BDS ADaM dataset using PRO data as an example. We will start with an overview of different ways to represent missing data in SDTM. The paper will present several types of analysis that require the inclusion of phantom records to account for missing data. It will cover various scenarios of adding such records, from the most straightforward to more complex ones. Finally, we will provide some ready-to-use solutions for the creation of phantom records, which can easily be adjusted to your individual needs.
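The core idea, derived records with missing AVAL/AVALC for visits that did not occur, can be sketched with a shell merge. This is a minimal illustrative example, not the paper's code: the dataset names (adqs, expected_visits) and the DTYPE convention are assumptions for demonstration.

```sas
/* Hypothetical sketch: add phantom records for missed visits to a BDS dataset. */

/* 1. Build a shell of every expected subject/visit combination */
proc sql;
   create table shell as
   select a.usubjid, b.avisit, b.avisitn
   from (select distinct usubjid from adqs) as a,
        (select distinct avisit, avisitn from expected_visits) as b
   order by usubjid, avisitn;
quit;

/* 2. Merge the shell with observed records; shell rows with no match
      become phantom records with missing AVAL/AVALC */
data adqs_phantom;
   merge shell(in=in_shell) adqs(in=in_obs);
   by usubjid avisitn;
   if in_shell;
   if not in_obs then do;
      call missing(aval, avalc);
      dtype = 'PHANTOM';   /* flag the derived record */
   end;
run;
```

The shell-and-merge pattern keeps the added records traceable: everything flagged with the derived-record indicator came from the expected-visit schedule rather than collected data.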
DS-054 : SDTM Variables You Might Forget About
Nadiia Pukhliar, Intego Group, LLC.
Dariia Tsyhanenko, Intego Group, LLC.
Iryna Kotenko, Intego Group
Monday, 11:00 AM - 11:20 AM, Location: LVL Ballroom: Continental 2
With every new version of the SDTM/SDTMIG, more and more examples of raw data mapping are presented and more details on specific variables are described. In practice, however, the same mapping rules are carried from study to study with no changes, and anything less common is either mapped to the Supplemental Qualifiers (SUPP--) datasets or the Findings About Events or Interventions (FA) domain, or is not submitted at all. This paper collects several cases of SDTM mapping that provide a more coherent and detailed representation of collected data. Special attention is given to the Supplemental Qualifiers datasets, examining the standard supplemental qualifier name codes per the SDTMIG. Further, we share tricks for using the ADaM IG to get standard qualifier names in SUPP-- domains. An additional focus is on using accompanying text in the CRF and the protocol to provide more context in SDTM datasets by creating standard variables from the model that are not described in the implementation guide. The examples provided represent CRF pages and studies from our practice; they are a great testament to the versatility of the SDTM, which covers various study and data collection designs.
DS-062 : From Sea to shining sea - End to End discussion on PR and LB data
Abraham Yeh, Bayer Pharma
Monday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 2
Many big pharma and healthcare companies have standard clinical case report form (CRF) sets that can fit various purposes, from the study design to achieving the study's goal. The statistical analysis depends heavily on these selections fitting the purpose of the CRF. On the other hand, we have standard tables that present the results using the data collected and the analysis performed, shared with the cross-functional study team. First, consider two options for the prior radiotherapy pages with specific drug dosages: one is a single text field containing all information, from which a listing can be created for the team; the other is a separate field containing the actual numerical value and a unit that can be converted, so that further analysis can be performed and evaluated. Second, three types of urinalysis were picked from the laboratory data; I will show different layouts for each and their implications. From one, frequency tables can be summarized, while on another, further analysis using CTCAE version 5.0 can be utilized. I will share live examples the team encountered on multiple SDTM panels to show how important it is to select the correct CRF as per the needs of the analysis. Furthermore, we can brainstorm on other cases. "From sea to shining sea" is possible if we collect the best-fit selections tailored for the study/project; "to shining sea" means a shining future in which we fully understand the options we select.
DS-114 : CDISC SDTM IG v3.4: Subject Visits
Ajay Gupta, Daiichi Sankyo
Monday, 1:30 PM - 1:50 PM, Location: LVL Ballroom: Continental 2
The Study Data Tabulation Model Implementation Guide for Human Clinical Trials (SDTMIG) Version 3.4 has been prepared by the Submissions Data Standards (SDS) team of the Clinical Data Interchange Standards Consortium (CDISC). Like its predecessors, v3.4 is intended to guide the organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority, and it supersedes all prior versions of the SDTMIG. In this presentation, I will do a quick walk-through of the updates in SDTMIG v3.4 relative to its predecessor. I will then go over the updated Subject Visits (SV) domain with examples, e.g., the new proposed mapping to include missed visits and how to use the additional variables in SV.
DS-125 : Questionnaires in ADaM: ADSF36
Keith Shusterman, Reata
Megan O'Grady, Reata Pharmaceuticals, Inc.
Mario Widel, Reata Pharmaceuticals Inc.
Monday, 2:00 PM - 2:20 PM, Location: LVL Ballroom: Continental 2
The 36-Item Short Form Survey (SF-36) is a questionnaire used as a quality-of-life outcome measure in clinical trials. The responses to its 36 questions are used to derive two values that are analyzed: a physical component summary and a mental component summary. The process of deriving these two summary values from the raw scores is complicated. It is common for an independent party to derive the summary scores from the raw scores, to be included as a source in the SDTM QS domain. However, there may be a compelling reason to derive these values in-house; for example, vendor algorithms are typically proprietary and cannot be incorporated into submission data and documentation. In that case, the process of deriving the summary scores needs to be appropriately documented in ADaM in a clear and traceable way. In this paper, we will outline the process of deriving and documenting in metadata both the physical and mental component summary scores in a way that is conformant to the ADaM BDS structure, traceable, and analysis-ready.
DS-129 : Have Meaningful Relationships: An Example of Implementing SDTM Special Purpose Data Set RELREC with a Many-to-Many Relationship
Kaleigh Ragan, Crinetics Pharmaceuticals
Richann Watson, DataRich Consulting
Monday, 2:30 PM - 2:50 PM, Location: LVL Ballroom: Continental 2
The Related Records (RELREC) special purpose dataset is a tool, provided in the Study Data Tabulation Model (SDTM), for conveying relationships between records housed in different domains. Most SDTM users are familiar with the one-to-one relationship type, where a single record in one domain is related to a single record in a separate domain, or even the one-to-many relationship type, where a single record in one domain is related to a group of records in another. But what if there are two groups of records related to one another? How do you properly convey the relationship between these sets of data points? This paper aims to provide a clearer understanding of when and how to utilize the not-often-encountered many-to-many relationship type within the RELREC special purpose dataset.
DS-148 : Automation of the SDSP CBER Appendix for Vaccine Studies
Nicole Jones, Merck
Pritesh Solanki, Merck
Monday, 3:00 PM - 3:20 PM, Location: LVL Ballroom: Continental 2
The Study Data Standardization Plan (SDSP) is an important means of communication between the sponsor and the FDA. The SDSP describes the data standards used by the sponsor in non-clinical and clinical studies across the drug development program. Vaccine studies that are submitted to the FDA Center for Biologics Evaluation and Research (CBER) require an additional appendix. This appendix includes four sections, two of which are SDTM sections: (i) a list of all SDTM datasets with all the expected or permissible variables used or planned for each study in the analysis, and (ii) a list of all SUPPQUAL variables used or planned for use. Manually generating these tables can take weeks if no data are available, and the manual process is also prone to errors in the form of missing variables or domains. In this paper, we present a Shiny application developed to semi-automate the generation of these tables. The application can use SDTM mapping specs or SDTM datasets as input for generating the two appendix sections. Although some manual review is still required, this tool greatly reduces the table generation time from weeks to days and improves accuracy.
DS-149 : Integrated Trial Design Model Datasets?
Christine McNichol, Labcorp
Monday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 2
Creating integrated datasets for an Integrated Study of Safety or Efficacy can be a complicated process. Aside from the complexities of integrating dissimilar studies and the sheer size of some of the datasets, it can be confusing how much of the defined study-level SDTM and ADaM requirements are applicable. ADaM guidance for integration has not yet been finalized, and the source for those integrated ADaMs is not fixed. Because of this, many of the decisions on the path to integrated analysis datasets need to be made by those working on the individual submission. Among the many considerations in creating integrated datasets, one area that has not had much attention is integrated Trial Design Model datasets. The first decision to be made is whether or not they are necessary. From there, if it is decided that they will be created, there are different methods that could be used to create them, as well as unique considerations for each domain. Once created, they will need special care when interpreting and supplementing any traditional compliance check output, since checks are focused on single studies. The overall approach taken with Trial Design Model datasets differs between sponsors. This paper will discuss cases in which they might be created, methods to create them, and special considerations for each domain, along with an example of generating integrated Trial Design Model datasets, including the assumptions and decisions made in their creation, special issues that arose, and the process of checking and submitting them.
DS-150 : ADaM Design and Programming Standardization for Oncology Efficacy Endpoints Based on RECIST 1.1
Ellen Lin, Seagen Inc.
Matt Ness, Seagen Inc.
Yinghui Wang, Seagen Inc.
Monday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 2
RECIST 1.1 tumor response criteria are commonly used in clinical trials to evaluate treatment response of solid tumors, such as breast and colorectal cancer. Common oncology efficacy endpoints, such as the best overall response (BOR), duration of response (DOR), and progression-free survival (PFS), are often based on RECIST 1.1 and usually involve complex data collection and derivations. While sponsors strive to make such collection and derivation consistent across study protocols, there are often study-specific considerations that further challenge the programming of these efficacy endpoints. In this paper, we will share an innovative, intuitive, highly structured ADaM design Seagen used successfully in several recent filings to support the analyses of such common oncology efficacy endpoints based on RECIST 1.1. We will explain how we standardize these endpoint derivations into parameterized sections and data sets that build upon one another in an easily understood progressive data flow to significantly improve programming efficiency and quality. With the resulting analysis-ready ADaM structures and step-by-step data flows, we have succeeded in separating the study-specific from the common derivations, simplified the programming of these complex endpoints, and created opportunities to develop standards-driven departmental macros to further elevate speed and quality for several components embedded within this design.
DS-157 : Hard Coding: What is it? Why Does It Matter? What Should You Do When Faced With Such A Request?
Michael Nessly, ICON PLC
Tuesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 2
A clear operating definition of hard coding: data that are not present in the collected source data, or that are present but incorrect, are inserted or corrected programmatically in the dataflow. Hard coding is the software development practice of embedding data directly into the source code of a program or other executable object, as opposed to obtaining the data from external sources. Outside of regulated activities such as the analysis and reporting of clinical trials, this appears to be a common and non-controversial practice. To understand why hard coding is very poor practice, not to be taken lightly in clinical trials, one must be familiar with the concept of traceability. Traceability is, in essence, the provenance of data: the property that enables understanding of the data's lineage and/or the relationship between an element and its predecessor(s). Traceability facilitates transparency, an essential component in building confidence in results or conclusions. Traceability is not just good practice, it is required. Hard coding breaks traceability. This is risky in present use and can be disastrous in future use of data subjected to hard coding. While many statistical groups have strong policies on hard coding, it is still common to receive requests for hard coding today. This presentation aims to increase knowledge of the issues, foster engagement in seeking alternative approaches, and provide instruction on how to effectively document those cases where it is determined that hard coding must be done.
DS-161 : Handling Anti-Drug Antibody (ADA) Data for Efficient Analysis
Sabarinath Sundaram, Seagen Inc
Johnny Maruthavanan, Seagen
Tuesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 2
Large molecules have revolutionized the pharmaceutical industry. The complex nature of these therapeutics can be mistaken by the human body for foreign substances, and their interactions with various endogenous proteins may induce an immunogenicity effect that produces anti-drug antibodies (ADAs). Based on their interaction with antigen binding sites, ADAs are classified as non-neutralizing antibodies (non-NAbs) and neutralizing antibodies (NAbs). These can impair the functionality of the drug by interfering with PK performance, decrease drug efficacy, and trigger serious hypersensitivity reactions. Monitoring ADA is key to evaluating safety, post-marketing surveillance, and defining risk mitigation strategies. High-quality programming support with a solid understanding of ADA data is critical for programmers to map it to the relevant CDISC standard tests that serve as a base for efficient and impactful ADA analysis. This paper will illustrate the mapping of unique raw data such as ADA screening, ADA confirmation, NAbs data, and titer results from various sources into the Immunogenicity Specimen Assessments (IS) SDTM domain, the derivation of relevant ADA variables at the ADaM level, and highlights of standard ADA reporting. Moreover, a few unique scenarios, such as how to handle baseline-positive and post-baseline-positive results in relation to their titer values in a summary report, will be demonstrated with oncology example data. Additionally, this paper briefly touches on the foundational mechanics of ADA, its impact in clinical trials, and relevant regulatory guidelines.
DS-186 : 'TUFF' LOVE: SDTM Pain Points EXPLAINED!
Kristin Kelly, ICON
Michael Beers, Pinnacle 21
Tuesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 2
Though the industry is very familiar with preparing SDTM data, there are still many nuances of which one can't always be sure: Trial Design, relative timing variables, PK, Exposure, validation rules... and the list goes on. In this paper, hot takes and topics that are crucial to 'getting it right' for submission will be discussed.
DS-193 : Can We Do It Better? Real-Time Validation of SDTM Mapping Is Superior to Double Programming
Daniel Rolo, Bioforum The Data Masters
Bremer Louw, Bioforum The Data Masters
Monday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 2
Accurate, complete, and reliable clinical trial data is paramount to robust decision-making by regulatory authorities, yet the methodologies used to validate the data in question are resource and time intensive. The clinical industry typically relies on dependable but antiquated methods of validation to ensure that clinical subject data is robust, the most common being double programming, predominantly using the SAS programming language. This paper introduces a systematic approach to the real-time validation of SDTM mapping of clinical trial data. The approach is module-based, comprising a thorough review of the mapping logic, verification that all source data points have been converted to SDTM (SDTM completeness), and maintenance of datapoint traceability from SDTM to the source data. Early and frequent validation of SDTM mapping, by demonstrating SDTM completeness, traceability, and mapping logic correctness in a technology-enabled manner, breathes fresh life into the SDTM mapping process and eliminates the resource drain associated with double programming. This paper implores the industry to embrace this evolution in SDTM mapping and paves the way to a quality-by-design approach that empowers professionals to focus on the non-repetitive, decision-making aspects of clinical data handling. Keywords: SDTM Mapping, SAS, Validation, Data Completeness, Data Traceability
Data Visualization and Reporting
DV-024 : Methods of a Fully Automated CONSORT Diagram Macro %CONSORT
Jeffrey Meyers, Regeneron Pharmaceuticals
Monday, 10:30 AM - 11:20 AM, Location: LVL Ballroom: Continental 3
The CONSORT diagram is commonly used in clinical trials to visually display the patient flow through the different phases of the trial and to describe the reasons patients dropped out of the protocol schedule. These diagrams are very tedious to make and update, as they typically require different software, such as Microsoft Visio or Microsoft PowerPoint, to manually create, align, and enter the values for each textbox. Several previous papers have explained methods of creating a CONSORT diagram in SAS, but those methods still required a great deal of manual adjustment to align all of the components. The %CONSORT macro removes these manual adjustments and creates a fully automated yet flexible CONSORT diagram entirely from data. This presentation describes the methods used to create the macro.
DV-037 : Divergent Nested Bar Plot and SAS Implementation
Brian Lin, Regeneron Pharmaceuticals
Weiming Du, Regeneron Pharmaceuticals
Toshio Kimura, Independent Consultant
Monday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 3
Bar plots are commonly used to visualize categorical outcomes. However, multiple nested categorical outcomes in two directions (e.g., improvement vs. worsening) pose unique challenges in interpretation and implementation. One example in ophthalmology is the proportion of patients improving or worsening by at least 5, 10, or 15 ETDRS letters. In this scenario the outcomes are nested: patients with a 10-letter gain inherently have a 5-letter gain. Existing solutions include 1) multiple independent or cumulative bars and 2) stacked bars. The most significant downside of these solutions is the loss of the inherent nested relationship between outcomes, making interpretation difficult. We propose a novel figure, the divergent nested bar plot, to solve this problem: two sets of nested bars placed on opposing sides of the y-axis (positive: improvement; negative: worsening). This figure is intuitive to interpret. It provides comparisons of percentages while maintaining both the nested structure and the improvement/worsening directionality. Drawing a divergent nested bar plot in SAS involves multiple specific steps and settings, and this publication aims to provide a roadmap for implementation. The divergent nested bar plot is a marked improvement over existing solutions for visualizing multiple nested binary outcomes with two directions of response.
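One plausible way to realize this idea in SGPLOT, not necessarily the authors' implementation, is to overlay HBARPARM statements of decreasing bar width, with worsening percentages stored as negative values so the bars diverge around a zero reference line. The data values and color choices below are invented for illustration.

```sas
/* Hypothetical sketch: divergent nested bars via overlaid HBARPARM.      */
/* Improvement percentages are positive; worsening percentages negative. */
data nested;
   input grp $ imp5 imp10 imp15 wor5 wor10 wor15;
   datalines;
Drug    62 41 25 -12  -6 -2
Placebo 35 18  9 -25 -14 -7
;
run;

proc sgplot data=nested;
   /* widest bar = >=5 letters; narrower overlays = >=10 and >=15 */
   hbarparm category=grp response=imp5  / barwidth=0.8  fillattrs=(color=lightgreen);
   hbarparm category=grp response=imp10 / barwidth=0.55 fillattrs=(color=green);
   hbarparm category=grp response=imp15 / barwidth=0.3  fillattrs=(color=darkgreen);
   hbarparm category=grp response=wor5  / barwidth=0.8  fillattrs=(color=mistyrose);
   hbarparm category=grp response=wor10 / barwidth=0.55 fillattrs=(color=salmon);
   hbarparm category=grp response=wor15 / barwidth=0.3  fillattrs=(color=firebrick);
   refline 0 / axis=x;
   xaxis label='Percent of patients (worsening < 0 < improvement)';
run;
```

Because each narrower bar is drawn on top of the wider one, the nesting (every 15-letter responder is also a 5- and 10-letter responder) is visible directly rather than implied.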
DV-073 : Ten Rules for Better Charts, Figures and Visuals
Kirk Paul Lafler, sasNerd
Monday, 8:00 AM - 8:50 AM, Location: LVL Ballroom: Continental 3
The production of charts, figures, and visuals should follow a process of displaying data in the best way possible. However, this process is far from direct or automatic. There are many ways to represent the same data: histograms, scatter plots, bar charts, and pie charts, to name just a few. Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure. A more inclusive definition of producing charts, figures, and visuals would be a graphical interface between people and data. This paper and ePoster highlight the work of Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne by sharing ten rules to improve the production of charts, figures, and visuals.
DV-116 : Post-surgical opioid pain medications usage evaluation using SAS and Excel
Marek Solak, Pacira BioSciences Inc.
Monday, 1:30 PM - 1:50 PM, Location: LVL Ballroom: Continental 3
Post-surgical pain management is often based on the administration of opioid medications. Opioid medications offer efficient pain control, but their use may, in some cases, lead to patient dependency. Opioid usage data for exploratory and clinical studies needs to be collected accurately and monitored frequently. To address the potential dependence problem, the paper provides a systematic method using Excel (with a well-designed structure for data collection) and SAS (for periodic evaluation of patient opioid medication usage). Reports are provided in the form of tables and boxplots (with doses converted to morphine-equivalent doses). If higher-than-expected usage is detected, medical staff may offer alternate pain-reduction protocols.
DV-134 : Life Table Analysis for Time to First Event Onset
Abhinav Srivastva, Exelixis Inc
Monday, 2:00 PM - 2:20 PM, Location: LVL Ballroom: Continental 3
Life table analysis is a useful way to study a subject's survival with respect to an event of interest over a time period. It provides a good indicator of drug safety or toxicity over the course of a clinical trial based on the occurrence of a related event. For example, a life table can be used to study the relation between a drug that is highly immunogenic in nature and the types of events it can trigger over time, such as increased liver events indicating signs of liver disease. In this paper we take a graphical approach to represent this information, which is then enhanced with exploratory and interactive features for the reviewer. Data preprocessing is done in SAS®, while all the plots are created in Python using open-source libraries such as matplotlib, seaborn, plotly, and dash.
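For orientation, the calculation underlying a life table can be sketched generically. This pure-Python sketch (hypothetical data, not the paper's code) uses the standard actuarial convention of an effective denominator of n minus half the interval's censored count:

```python
def life_table(n0, intervals):
    """Actuarial life-table event-free probabilities.

    n0: initial number at risk; intervals: list of (events, censored)
    counts per successive interval. Returns the cumulative event-free
    probability at the end of each interval.
    """
    at_risk, surv, curve = n0, 1.0, []
    for events, censored in intervals:
        effective = at_risk - censored / 2.0   # actuarial convention
        surv *= 1.0 - events / effective       # conditional survival
        curve.append(surv)
        at_risk -= events + censored
    return curve

# Hypothetical cohort of 100 subjects over two intervals:
curve = life_table(100, [(10, 5), (8, 4)])
```

Values like these are what the paper's matplotlib/plotly layers would then render and make interactive.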
DV-190 : Amazing Graph Series: Butterfly Graph Using SGPLOT and GTL
Tracy Sherman, Ephicacy
Aakar Shah, Nektar Therapeutics
Monday, 2:30 PM - 2:50 PM, Location: LVL Ballroom: Continental 3
Have you been tasked to create a butterfly graph? If so, you might be asking yourself: what is a butterfly graph, and which SAS procedure should I use to create one? Which procedure will be the most straightforward to learn yet also offer the flexibility for custom modifications? The two most common methods used to create these graphs are the SGPLOT procedure and GTL (Graph Template Language). The documentation for enhancing the visual appearance of these graphs with these procedures is lengthy and cumbersome. This paper will help you learn the required syntax and shorten the time it takes to produce high-quality, amazing butterfly graphs that can be shared with upper management or in a conference presentation. In addition, the paper compares the SAS 9.4 SGPLOT procedure and GTL so that you can choose which method fits well with your programming requirements.
DV-226 : The Flexible Ways to Create Graphs for Clinical Trial
Hong Zhang, Merck & Co
Kaijun Zhang, Merck
Monday, 3:00 PM - 3:20 PM, Location: LVL Ballroom: Continental 3
Without a doubt, SAS programming can create high-quality graphics. In the SAS language, PROC TEMPLATE with the SGRENDER procedure is a powerful and commonly used tool for creating customized figures. PROC TEMPLATE allows any figure type to be overlaid in one plotting space, lattices figure types side by side, and controls every visual aspect of the graphical field. However, more flexible tools in the R language have recently gained popularity, built on the ggplot2 package, especially when combined with the patchwork package. Together, patchwork and ggplot2 provide a more powerful and effective way to create all kinds of graphs for clinical trial reports. This paper will introduce the patchwork package combined with ggplot2, using typical clinical trial graphs as examples, and will include PROC TEMPLATE with the SGRENDER procedure in SAS as a comparison. In summary, patchwork combined with ggplot2 opens a new window for customizing your graphs, especially for combining different graphs or text content on one page or across pages.
DV-256 : Complementary Overlay - A Programmatic Approach to Figure Output Validation
Jesse Pratt, PPD, part of Thermo Fisher Scientific
Rayce Wiggins, PPD, part of Thermo Fisher Scientific
Monday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 3
Effective and efficient validation of figures for a clinical trial can be a challenging process. Common techniques for validating outputs, such as PROC COMPARE and RTF parsing, are not viable for figure validation. Without these tools, programmers must rely on double programming the input data set and manually inspecting the output against corresponding tables and listings. Many times, the input data sets are not "one PROC away," and manual inspection of the output is prone to human error. This paper illustrates a novel programmatic approach to figure validation by leveraging the combined usage of complementary colors and the DRAWIMAGE statement within the SAS Graph Template Language (GTL).
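The color principle behind the overlay can be illustrated generically (hypothetical Python, not the paper's GTL code): if the production figure is drawn in a color and the QC figure in its RGB complement, pixels that match blend to neutral gray, so only discrepancies stand out.

```python
def complement(rgb):
    """Complementary RGB color: invert each channel."""
    return tuple(255 - v for v in rgb)

def blend(c1, c2):
    """Simple 50/50 overlay blend of two RGB colors."""
    return tuple((a + b) // 2 for a, b in zip(c1, c2))

steel_blue = (70, 130, 180)
overlay = blend(steel_blue, complement(steel_blue))
# Where the two figures agree, the overlay collapses to neutral gray;
# any mismatch leaves a visibly colored region.
```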
DV-287 : Meta-Analysis in R
Lin Gu, Duke University
Monday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 3
With the accumulation of a large body of information and data from independent clinical trial studies, questions often arise about estimating the combined effect size across multiple similarly designed studies on a particular research topic. Meta-analysis is a statistical technique that not only compares the outcomes of each individual study but also estimates the pooled effect size for this purpose. Over the past decade, meta-analysis has become a universally accepted research tool. In drug development, the FDA has published a draft guidance on how to conduct meta-analyses of randomized clinical trials to evaluate safety risks associated with the use of human drugs or biological products within the framework of regulatory decision making. SAS has been a widely used tool for performing a wide range of statistical analyses, but SAS procedures for meta-analysis have been lacking. R is a powerful and increasingly popular environment not only for statistical analysis but also for visualization. In this paper, I use R to demonstrate, end to end, how to perform a meta-analysis: defining the research topic, discussing the eligibility criteria for selecting studies, extracting data in a reproducible way, and selecting the right R packages and statistical models for the analysis. The pros and cons of the statistical models are also discussed. I also describe how to use R Markdown to produce publication-quality forest plots for meta-analysis in different formats, including PDF, Word, and PNG.
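For readers new to the method, the core fixed-effect (inverse-variance) pooling step can be sketched in a few lines. This is a generic illustration, not the paper's R code; a real analysis would typically use R packages such as meta or metafor:

```python
import math

def fixed_effect_meta(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its SE.

    effects: per-study effect sizes; ses: their standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]            # precision weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Two hypothetical studies: effect sizes 0.2 and 0.4, SEs 0.1 and 0.2.
pooled, pooled_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.2])
```

The pooled estimate is pulled toward the more precise study, which is exactly what a forest plot's summary diamond displays.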
DV-310 : Revolutionizing Statistical Outputs Validation: a Product Demonstration of Verify, an ML-powered Automation Solution Streamlining the Validation Process
Ilan Carmeli, Beaconcure
Monday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 3
Validation of statistical outputs can present significant challenges within the pharmaceutical industry. However, the integration of automation and machine learning has been shown to provide a comprehensive and effective solution. Verify, an automated output validation solution designed specifically for the pharmaceutical industry, utilizes machine learning to convert clinical data into a semantic and dynamic database, enabling the identification of errors and anomalies with high accuracy and efficiency. In this presentation we will demonstrate the following features of Verify: project creation for validation; study delivery history; comparison between outputs from different delivery versions; within-table automated checks; cross-table automated checks; automated figure checks; management and resolution of discrepancies; and maintenance of an audit trail.
Hands-On Training
HT-042 : Commit early, commit often! A gentle introduction to the joy of Git and GitHub
Isaiah Lankham, Kaiser Permanente Center for Health Research
Matthew Slaughter, Kaiser Permanente Center for Health Research
Tuesday, 8:00 AM - 9:30 AM, Location: LVL Lobby: Golden Gate 3-4
In recent years, the social coding platform GitHub has become synonymous with open-source software development. Behind the scenes, GitHub uses software called Git, which was originally developed as a distributed version control system for managing contributions of thousands of developers to the Linux kernel. In this hands-on workshop, we'll introduce you to the joy of Git and GitHub for managing changes to codebases of any size, whether working alone or as part of a team. We'll also practice using the GitHub website, and we'll illustrate how to use Git from the command line within a web-based Google Colab environment. Topics will include basic Git/GitHub concepts, like forking and cloning code repositories, as well as best practices for using branches and commits to create a well-organized history of code changes. We'll also talk about how to sync code changes between local and remote environments like GitHub. Finally, we'll use the GitHub web interface for pull requests, which are the standard mechanism for contributing to open-source projects. No knowledge of Git or GitHub will be assumed, and no software will need to be installed. In order to work through interactive examples, accounts will be needed for GitHub and Google. Complete setup steps will be provided at https://github.com/saspy-bffs/pharmasug-2023-how
HT-093 : Undo SAS® Fetters with Getters and Setters-Supplanting Macro Variables with More Flexible, Robust PROC FCMP User-Defined Functions That Perform In-Memory Lookup and Initialization Operations
Troy Hughes, Datmesis Analytics
Monday, 10:15 AM - 11:45 AM, Location: LVL Lobby: Golden Gate 3-4
Bring your laptop to this HANDS-ON GAMING, and come play the first-ever interactive text adventure written in the SAS language! Installed as a first-time conference attendee, you will have to navigate conference obstacles and intrigue, and collect ribbons for your conference badge to ensure you can gain entry to the infamous "after party." In addition to real-time game play, attendees will receive a behind-the-scenes view of the underlying SAS software, which leverages PROC FCMP (and the RUN_MACRO built-in function) to model "getter" and "setter" user-defined functions that update state/lookup tables. Participation requires the SAS OnDemand for Academics (SODA) platform, which must be downloaded for free 24 hours in advance: https://www.sas.com/en_us/software/on-demand-for-academics.html.
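The getter/setter pattern the game is built on translates directly to other languages. A minimal Python analogue (hypothetical names; the paper implements this with PROC FCMP user-defined functions and RUN_MACRO) keeps state in an in-memory lookup table behind accessor functions instead of global macro variables:

```python
class GameState:
    """In-memory state/lookup table accessed only through getters and
    setters, analogous in spirit to supplanting global macro variables
    with user-defined functions."""

    def __init__(self):
        self._state = {}

    def set(self, key, value):
        """Setter: initialize or update a state entry."""
        self._state[key] = value

    def get(self, key, default=None):
        """Getter: in-memory lookup with an optional default."""
        return self._state.get(key, default)

badge = GameState()
badge.set("ribbons", 3)                      # update game state
ribbons = badge.get("ribbons")               # look it up later
has_pass = badge.get("after_party_pass", False)  # safe default
```

Encapsulating state this way avoids the name collisions and scoping surprises that shared global variables invite.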
HT-105 : Understanding Administrative Healthcare Datasets using SAS programming tools.
Jayanth Iyengar, Data Systems Consultants LLC
Monday, 4:00 PM - 5:30 PM, Location: LVL Lobby: Golden Gate 3-4
Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies, is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, including healthcare claims (Medicare, commercial insurance, and pharmacy), hospital inpatient, and hospital outpatient data. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data and the SAS programming constructs for working with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.
HT-250 : Generating Clinical Graphs in SAS and R - A Comparison of the Two Languages
Kriss Harris, SAS Specialists Ltd.
Endri Elnadav, EE Statistical Analysis & Consulting
Monday, 8:00 AM - 9:30 AM, Location: LVL Lobby: Golden Gate 3-4
These days there are many R packages available which support the needs of clinical programming, but SAS is still a major player, so which language should we be using? This question is not easy to answer, as both R and SAS have their merits. In this hands-on workshop we will concentrate on combining these two programming languages. We will show you how to produce frequently requested clinical graphs such as Kaplan-Meier plots in SAS and R, and you will see how to produce a simple clinical table in SAS and R too. During the workshop we will also show you how R Shiny can be used to produce interactive plots and tables, and as a bonus demonstrate how code generators can be used in conjunction with template technology to create SAS (or R) code from Shiny.
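As a reference point for what both languages compute under the hood, the Kaplan-Meier product-limit estimate itself is simple. This pure-Python sketch (hypothetical data, not the workshop's SAS or R code) returns the survival curve at each event time:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimates.

    times: follow-up times; events: 1 = event, 0 = censored.
    Returns a list of (event_time, survival_probability).
    """
    surv, curve = 1.0, []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk   # conditional survival at t
        curve.append((t, surv))
    return curve

# Five hypothetical subjects; two are censored (event = 0):
km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

PROC LIFETEST in SAS and survival::survfit in R both produce this same step function, which the workshop then renders as a Kaplan-Meier plot.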
HT-265 : Blastula for Communicating Clinical Insights with R via Email
Phil Bowsher, RStudio Inc.
Tuesday, 10:15 AM - 11:45 AM, Location: LVL Lobby: Golden Gate 3-4
The blastula package makes it easy to produce and send HTML email from R. This session will show how you can use blastula to embed reports and plots into emails, customize the email body, and send emails automatically or conditionally. No prior knowledge of R/RStudio is needed. This short hands-on session will provide an introduction to blastula for sending emails with R.
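For comparison, the same idea (an HTML email with a plain-text fallback) can be expressed with Python's standard library. This is a generic sketch with hypothetical addresses and content, not blastula's R API:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Weekly enrollment summary"   # hypothetical report
msg["From"] = "stats@example.com"
msg["To"] = "team@example.com"
msg.set_content("Plain-text fallback: 42 subjects enrolled.")
msg.add_alternative(                            # HTML body for rich clients
    "<html><body><h2>Enrollment</h2>"
    "<p><b>42</b> subjects enrolled this week.</p></body></html>",
    subtype="html",
)
# smtplib.SMTP(host).send_message(msg) would deliver it; omitted here.
```

blastula wraps this kind of MIME assembly (plus R Markdown rendering and scheduling helpers) behind a much friendlier interface.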
HT-355 : Introducing TFL Designer: Community Solution to Automate Development of Mock-up Shells and Analysis Results Metadata
Bhavin Busa, Independent
Tuesday, 4:00 PM - 5:30 PM, Location: LVL Lobby: Golden Gate 3-4
Our industry spends countless hours manually designing TFL shells and writing analysis dataset specifications. Once they are available, programmers write SAS code to generate the analysis datasets and TFLs. In most companies, the generation of analysis deliverables is very manual (sometimes aided by macros or reused code). The CDISC 360 proof-of-concept (POC) project demonstrated end-to-end automation of study specification, data processing, and analysis using a metadata-driven approach. As a result, CDISC is working on adding a conceptual layer to the analysis standards (e.g., CDISC Analysis Results Standards) and is developing a Safety User Guide. There is a need for a tool that can leverage available CDISC analysis results standards/templates, accelerate generation of TFL shells, and support metadata-driven automation in the development of ADaM and TFLs.
HT-356 : SAS SQL 101
Charu Shankar, SAS Institute
Tuesday, 1:30 PM - 3:30 PM, Location: LVL Lobby: Golden Gate 3-4
This hands-on training is for users wishing to learn PROC SQL in a step-by-step approach. PROC SQL is a powerful query language that can sort, subset, join, and print results all in one step. Users who are continuously improving their analytical processing will benefit from this seminar. Participants will learn the following elements to master PROC SQL: understand the syntax order in which to submit queries to PROC SQL; select and calculate columns; filter rows; and join tables using join conditions such as inner and outer joins.
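These concepts carry over to any SQL dialect. A minimal sketch of the same operations (select, calculated column, row filter, inner join) using Python's built-in sqlite3, with hypothetical clinical table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE demog (usubjid TEXT, age INTEGER);
    CREATE TABLE dose  (usubjid TEXT, dose_mg REAL);
    INSERT INTO demog VALUES ('001', 34), ('002', 61), ('003', 47);
    INSERT INTO dose  VALUES ('001', 10.0), ('003', 20.0);
""")
rows = con.execute("""
    SELECT d.usubjid,
           d.age,
           s.dose_mg * 2 AS daily_mg      -- calculated column
    FROM demog AS d
    INNER JOIN dose AS s
        ON d.usubjid = s.usubjid          -- join condition
    WHERE d.age < 50                      -- row filter
    ORDER BY d.usubjid
""").fetchall()
```

In PROC SQL the query body is nearly identical; the main differences are SAS-specific conveniences such as CALCULATED column references and automatic report output.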
HT-357 : SAS Enterprise Guide: 8.x (What is new!)
Ajay Gupta, Daiichi Sankyo
Wednesday, 8:00 AM - 9:00 AM, Location: LVL Lobby: Golden Gate 3-4
SAS Enterprise Guide is a powerful Microsoft Windows client application that provides a guided mechanism to exploit the power of SAS and publish dynamic results throughout your organization. SAS Enterprise Guide 8.x has been redesigned to include a modern and flexible user interface with tab-based organization of content and flexible window management. This training will provide a quick walk-through of the newly added functionality and features in Enterprise Guide 8.x, e.g., the updated user interface, new start page, general enhancements, running multiple items without locking the interface, functionality to unlock data, changing the appearance theme, quickly accessing recent items, the ability to manage keyboard shortcuts, the new Git integration user interface, and the new search pane.
HT-358 : R package validation
Magnus Mengelbier, Limelogic AB
Monday, 1:30 PM - 3:30 PM, Location: LVL Lobby: Golden Gate 3-4
The R Package Validation hands-on training discusses different approaches to R package validation, covering strategies, methods and tools and how those are impacted by business drivers, compliance requirements, risk, package managers, IT architectures and simply R in itself.
Leadership Skills
LD-004 : Are you a great team player?
Daryna Yaremchuk, Intego Group LLC
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 7
It is well known that being a team player is essential for managing projects successfully and that great collaboration within a team brings fruitful results. No matter the role or position, everyone is important in achieving common goals, meeting timelines, providing exemplary service, and reaching the best customer satisfaction. I am ready to bet that at least once during their career, every person, whether a junior programmer or a team lead, has had doubts about their productive contribution to a project; after that, it is normal to reflect on your teamwork performance. Moreover, it is so pleasant to receive an email saying: "It was a great pleasure to work with you and I hope there will be a new opportunity for our collaboration next time". Also, excellent team-player skills are an essential part of the job description for all positions within the statistical programming community. Surely, this is not something that can be easily measured, in contrast to programming skills; nevertheless, it is a trait that is highly valued. In this paper the author discusses what it means to be a good team player depending on your role and position, describes common characteristics of good team players, suggests tips on how to improve your must-have teamwork skills, and shares her own experience in becoming a good team player, from "let sleeping dogs lie" to "a true role model".
LD-028 : Developing and running an in-house SAS Users Group
Stephen Sloan, Accenture
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 7
Starting an in-house SAS® Users Group can pose a daunting challenge in a large worldwide organization. However, once formed, the SAS Users Group can also provide great value to the enterprise. SAS users (and those interested in becoming SAS users) are often scattered and unaware of the reservoirs of talent and innovation within their own organization. Sometimes they are Subject Matter Experts (SMEs); other times they are new to SAS but provide the only available expertise for a specific project in a specific location. In addition, there is a steady stream of new products and upgrades coming from SAS Institute, and users may be unaware of them or not have the time to explore and implement them, even when the products and upgrades have been thoroughly vetted and are already in use in other parts of the organization. There are often local artifacts like macros and dashboards, developed in corners of the enterprise, that could be very useful to others so that they don't have to "reinvent the wheel".
LD-038 : The Interview Process: An Autistic Perspective
Laura Needleman, AstraZeneca
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 7
Neurodiversity is an emerging topic within our industry. It is a newer diversity category that we as an industry are just beginning to understand and incorporate into Diversity, Equity, and Inclusion (DE&I) plans. Many companies are beginning to understand there are benefits to seeking out and hiring neurodivergent talent. Topics include discussing various interview techniques and dissecting them from an autistic perspective. This presentation will also offer interviewing ideas that would better help to showcase neurodivergent talent during the interview process. I'll also be sharing my recommendations around providing interview questions in advance as well as formal skill testing as it relates to neurodivergent candidates. The target audience are those who engage in hiring as interviewers.
LD-056 : Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
Josh Horstman, Nested Loop Consulting
Richann Watson, DataRich Consulting
Tuesday, 2:30 PM - 3:20 PM, Location: LVL Ballroom: Continental 7
While many statisticians and programmers are content in a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. In this paper, two seasoned consultants share their experiences going independent. Topics include the advantages and disadvantages of independent consulting, getting started, finding work, operating your business, and what it takes to succeed. Whether you're thinking of declaring your own independence or just interested in hearing stories from the trenches, you're sure to gain a new perspective on this exciting adventure.
LD-095 : Lessons Learned from a Retired SAS® Programmer
Carey Smoak, Retired
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 7
I had a successful 38-year career as an epidemiologist and as a statistical SAS programmer. I retired in August of 2021 and have had time to reflect on my career. I have seen a lot of innovation in my 38 years. I used to write reports using DATA _NULL_. The advent of ODS and PROC REPORT has made report writing much simpler. Today's servers are smaller, but more powerful than the mainframe computers that I used early in my career. But I have also learned a lot of lessons along the way, and I would like to share the lessons that I have learned. I'll also share some tips on preparing for retirement.
LD-104 : Planning Your Next Career Move - Resume Tips for the Statistical Programmer
Sapan Patel, Merck
Lisa Pyle, Accenture
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Ballroom: Continental 7
The statistical programming market is HOT! No doubt about it, and you want to take the opportunity to build your career and move to the next level. A resume is your first impression when submitting to a company. It is circulated and used by many people in an organization to screen, evaluate and determine whether you fit the job role. As a prospective candidate, you must take time and effort to ensure that your resume is current, streamlined and highlights essential work experience related to the job you are applying for. Interviewers like us review many resumes, and over time we've agreed that there are a couple of quick wins to help the interviewer evaluate your resume and determine if you will fit the job role. This paper provides quick tips to prepare and modernize your resume for high-impact review so that you get that next interview.
LD-178 : Practical Tips for Effective Coaching for Leaders and Managers in Organizations
Priscilla Gathoni, Wakanyi Enterprises Inc.
Tuesday, 4:00 PM - 4:50 PM, Location: LVL Ballroom: Continental 7
As an executive or senior-level member of the organization, do you wish you had better skills in coaching, similar to a skilled professional? As a people manager, do you want to help your team members improve their performance in all aspects important to them? Do you find it challenging to manage people and their problems? Is developing a coaching mindset hindering you from being truly present in your coaching conversations? Do you struggle to handle different personality types? Coaching is a vital tool for every leader and manager to have at their disposal if they are committed to fulfilling and advancing the potential of their individuals and teams and improving organizational commitment. Coaching relationships are built upon truth, openness, and trust and allow the person being coached to be responsible for their own results and think creatively. Well-executed coaching empowers individuals to take action, increase their personal performance and professional effectiveness in problem-solving and decision-making skills, and influence others. We will explore the value of coaching, when to coach, coaching mistakes, and six practical tips for becoming an invaluable coach for your organization and your coaching business.
LD-196 : What's the F in specialized people? Let's talk FSP - the models, variations, and what it takes to be successful
Kathy Bradrick, Catalyst Clinical Research, LLC
Ershlena McDaniel, Catalyst Clinical Research
Lisa Stetler, Catalyst Clinical Research
Wednesday, 10:15 AM - 10:35 AM, Location: LVL Ballroom: Continental 7
FSP means something different to every client and service provider. Why are there so many variations? What are the pros and cons? Why is everybody talking about it? Let's explore different models and approaches, what makes collaborations successful, and how the roles and expectations can differ from traditional CRO models. We will explore the variety of FSP models, how the roles and responsibilities vary by collaboration, the challenges and expectations, and what it takes to realize success. We strive to shed light on the benefits and detriments to the client company, the FSP provider company, and the workers within these models. Our take home message is that while FSP seems to have so many meanings and so many different configurations, some things remain unchanged despite size and model. Establishing clear communication, expectations, and escalation pathways is key to making the collaboration successful.
LD-215 : Get a GPS to navigate your skills to find career purpose
Charu Shankar, SAS Institute
Lisa Mendez, Catalyst Clinical Research
Wednesday, 10:45 AM - 11:05 AM, Location: LVL Ballroom: Continental 7
The global pandemic saw employees take charge of their careers and focus on what inspired them in ways never seen before. Finding purpose can seem an overwhelming and daunting task in these changing times. To address this, Charu and Lisa have written this paper to guide knowledge workers in finding purpose and fulfillment in their work while bringing their best strengths forward. Using their combined skills and expertise in psychometrics, SAS, and testing and measurement, they will share self-assessment tools for understanding one's skill set and then applying it to job fulfillment. Each participant will be provided with a self-assessment tool that will be discussed during the presentation.
Metadata Management
MM-040 : Gear up the Metadata - Automating Patient Profile Generation, a Metadata Driven Programming Approach
Tanmay Khole, PTC Therapeutics
Aman Sharma, PTC Therapeutics
Lili Li, PTC Therapeutics
Durga Prasad Chinthapalli, PTC Therapeutics
Monday, 4:30 PM - 5:20 PM, Location: LVL Ballroom: Continental 7
Patient profiles are individual reports of a subject's clinical data and provide a great benefit to clinical or medical teams performing ongoing review in a clinical trial project. These reports are customized per reviewers' requests, since safety, efficacy, and other key significant data points vary with each study design. The programming challenges in generating patient profile reports include accommodating variances in data source, structure, mapping, and quality, and customizing report outputs and formats. The application featured in this paper is an MS Excel VBA-based utility with a user-friendly interface that uses SAS® macros in the backend, with a robust design to analyze all clinical trials data in CDISC SDTM or ADaM format and generate patient profiles through a GUI. The SAS® macros are designed using a metadata-driven programming approach in which users only need the metadata specification and the VBA-based utility to generate patient profiles, without the core programming conventionally required.
MM-118 : Masters of the Table Universe: Creating Table Shells Consistently and Efficiently Across All Studies
Michael Hagendoorn, Seagen Inc.
Ran Li, Seagen Inc.
Mimi Vigil, Seagen Inc.
Shan Yang, Seagen Inc.
Tuesday, 8:00 AM - 8:50 AM, Location: LVL Ballroom: Continental 7
Creating specifications for tables, listings, and figures (TLFs) is traditionally not for the faint of heart. Over the course of weeks or more, our brave author valiantly translates the statistical analysis plan into sometimes hundreds of pages of detailed Microsoft Word-formatted specifications for their study. Meanwhile, two offices over, another author is going through the same painful process of creating a separate Word shell for their own study... and so on. While capturing TLF shells for each study in this manner is the norm, we developed an alternate approach by establishing a single set of shells at the compound level that provides the metadata for all CSR and integration TLFs across the entire product. Such a master shell setup yields many advantages:
• Faster shell development for new studies and less maintenance for existing ones
• Instant visibility into where studies differ on any detail
• Higher consistency across studies, which elevates quality in TLF specs and programming
• Increased programming efficiency through expanded macro coverage and less custom code
• Enhanced departmental standards adoption to benefit compliance and review
• Annotation to ADaM unlocks pathways for future submission documentation and metadata repository
• Potential for powerful metadata-driven output generation and other innovations
We will share the design, management, and governance model of our master shell implementation. We'll also discuss programming and biostatistics perspectives on the benefits and challenges we observed, along with our solutions - so you can immediately leverage this setup and unleash the power of compound-level specifications!
MM-205 : Automating SDTM: A metadata-driven journey.
Keith Hibbetts, Eli Lilly and Company
Tuesday, 11:00 AM - 11:50 AM, Location: LVL Ballroom: Continental 7
SDTM is one of the deliverables in every clinical trial. Efficient creation of SDTM can save hundreds of hours of effort, yet automation of SDTM has proven to be a very difficult proposition. This paper details the journey Eli Lilly has been on pursuing SDTM automation. To use a car analogy, you need four wheels and an engine to drive. On this particular journey, the four wheels are: a robust set of standards, a metadata repository to store and maintain those standards, a set of generic macros for dataset creation, and a programming process to utilize those macros. The engine is metadata. By defining a metadata model that not only defines the source and target but also the logic to convert the source to the target, we can build out the rest of the components to make this vision a reality. A proof-of-concept project based on this idea achieved 96% automation of SDTM variables in a test study. Now we're on the road to making this concept a production-ready reality.
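The "engine is metadata" idea can be sketched generically: store each target variable's derivation logic as data, and let one generic routine apply it to any study's source records. This hypothetical Python sketch illustrates the concept only, not Lilly's actual metadata model or macros:

```python
# Hypothetical metadata: each entry defines a target variable and the
# logic that derives it from a raw source record.
METADATA = [
    {"target": "USUBJID", "logic": lambda r: f"{r['study']}-{r['subj']}"},
    {"target": "AGE",     "logic": lambda r: int(r["age_raw"])},
    {"target": "SEX",     "logic": lambda r: r["sex"].upper()},
]

def apply_metadata(record, metadata=METADATA):
    """Generic builder: derives every target variable from its metadata,
    so adding a variable means adding a metadata row, not new code."""
    return {m["target"]: m["logic"](record) for m in metadata}

row = apply_metadata({"study": "ABC123", "subj": "0001",
                      "age_raw": "54", "sex": "f"})
```

The payoff is the one described in the abstract: the generic routine never changes, and coverage grows by enriching the metadata.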
MM-206 : Challenges with Metadata Repository System Implementation: A Sponsor's Perspective
Radhika Kale, Bristol Myers Squibb
Reema Baweja, Bristol Myers Squibb
Monday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 7
In the modern world, Metadata Repository (MDR) systems are a critical element of metadata management, commonly used in a variety of industries, such as biotechnology and pharma, finance, healthcare, and manufacturing, to improve data governance and performance. An MDR is a database created to gather, store, and distribute contextual information (i.e., metadata) about data. An efficient MDR system enables frustration-free, end-to-end implementation of standards while adhering to quality, automation, and flexibility goals. Certain challenges can arise when implementing and working with MDR systems, whether commercially available or built in-house. This paper gives an overview of specific challenges from a sponsor's standpoint and helps set expectations for an ideal MDR system. Challenges are described in two categories: implementation challenges, including metadata strategy development, selection of standards and an MDR system, building a metadata team, and creating a data catalog; and operational challenges pertaining to data quality, data governance, integration with other systems, user adoption, maintenance and support, and scalability. Additionally, the paper explores opportunities to mitigate specific challenges, thereby providing meaningful input for efficient product enhancements.
MM-272 : Traceability: Not just about Data
Rohit Nagpal, Kite Pharma, Inc
Kavitha Mullela, EXPERIS Solutions
Tuesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 7
Sometimes in clinical trials we see gaps between data, documentation, and process, which lead to questions from cross-functional teams and regulatory agencies. To overcome these issues, we can consider traceability not just in the context of data, but as a combination of data, documentation, and process involving all cross-functional teams from data collection to submission. Traceability has many facets and is a joint effort that requires collaboration across all cross-functional teams. To accomplish this during the lifecycle of clinical trials, we use documentation trackers such as change control, database migration change, study status, process checklists, and delivery approval forms, in addition to the standard documents. By using these additional documents, we can easily track database migration changes and changes during the study lifecycle, recreate outputs at any point in a clinical trial, ensure quality outputs, inform cross-functional teams, and provide crucial information for integrated analyses. This paper provides examples of traceability concepts that can be used to produce a robust trail and achieve better outcomes: improved compliance, risk mitigation, data integrity, and reliability, and fewer follow-up questions from regulatory agencies.
MM-273 : Better CDISC Standards with Metadata Programming
Sunil Gupta, Gupta Programming
Abhishek Dabral, Alkermes Inc
Tuesday, 2:00 PM - 2:20 PM, Location: LVL Ballroom: Continental 7
Is metadata programming ready for prime time? Are you ready to start taking CDISC standards seriously and minimize coding? Are you ready to take advantage of metadata programming methods to enforce automation, standards, and higher quality control, in addition to reducing resources and time? This paper shows several key applications as well as techniques for metadata programming. We will explore beyond the basic dictionary views of dataset and variable attributes (SAS DICTIONARY.COLUMNS and DICTIONARY.TABLES) to more advanced programming methods of creating macro lists, looping through each item, and generating code. These components create a robust macro system that can be used across multiple studies and establish a foundation for corporate global standards. Tools that leverage metadata programming open a world of applying regulatory rules to standardize raw data (SDTM transformation), exploring data using a single-source-of-truth view (e.g., tables from the CDISC Library), and checking for CDISC conformance (SDTM/ADaM rules).
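As a hedged sketch of the style of metadata programming the abstract describes (the library, dataset, and macro names below are illustrative, not from the paper), variable metadata can be pulled from DICTIONARY.COLUMNS into a macro list and looped over to generate code:

```sas
/* Collect the character variables of a hypothetical WORK.DM dataset */
proc sql noprint;
  select name into :varlist separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='DM' and type='char';
quit;

/* Loop through the list, generating one action per variable */
%macro loop_vars;
  %local i var;
  %do i=1 %to %sysfunc(countw(&varlist));
    %let var=%scan(&varlist,&i);
    %put NOTE: processing &var;  /* code generation would go here */
  %end;
%mend loop_vars;
%loop_vars
```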
MM-315 : Macro to Automate Creation and Sync of Shell Document and TOC
Karen Walker, Walker Consulting LLC
Tuesday, 1:30 PM - 1:50 PM, Location: LVL Ballroom: Continental 7
In clinical trials, biostatisticians spend considerable time developing a Statistical Analysis Plan (SAP). The SAP contains mock-up shells for tables, figures, and listings (TFLs). A shared document, typically a long Word document, is used to edit and complete the TFL shells for the SAP. The metadata provided by the mock shells includes titles, subtitles, footnotes, comments, and formatting details. Statistical programmers are tasked with transferring this information from the mock-ups to an Excel table of contents (TOC), which SAS programs use to generate the final TFLs. Manual transfer of this metadata can be challenging, labor-intensive, and prone to errors. Both the shared shell document and the TOC are living documents, subject to change over time. Several SAS users have developed Excel macros, and others SAS macros, to facilitate the generation of the TOC. This paper introduces a novel approach that goes well beyond these. Using a SAS add-in for Word, a macro is developed to facilitate the creation of mock-up shell documents. SAS program templates are used to export the relevant information from the mock-up shells to the TOC programmatically. In addition, the program identifies discrepancies between the shared shell document and the TOC, synchronizing them while keeping a complete change history.
MM-327 : All You Need to Know about the New CDISC Analysis Results Standards!
Bhavin Busa, Independent
Bess LeRoy, CDISC
Tuesday, 10:00 AM - 10:50 AM, Location: LVL Ballroom: Continental 7
The CDISC Analysis Results Standard (ARS) Team is developing standards to improve and facilitate the automation, reproducibility, reusability, and traceability of analysis results (Tables, Figures and Listings, aka TFL). The team is working towards this objective by creating an Analysis Results Metadata Technical Specification (ARM-TS) to capture machine-readable TFL metadata and defining a formal Analysis Results Data (ARD) model to store the results data. The ARM-TS will support automation and creation of data displays (TFL) while the ARD will support traceability, reuse and reproducibility of analysis results. This presentation will be a pre-launch of the CDISC Analysis Results Standards where the attendees will get early access to the analysis results formal model. We will discuss various use cases, benefits, challenges and key considerations for successful implementation.
Panel Discussion
PN-339 : DEI Panel
Radha Railkar, Merck & Co.
Rohit Alluri, Merck
Melanie Besculides, Mount Sinai's Icahn School of Medicine
Jeffrey Lavenberg, Merck
Ji-Hyun Lee, University of Florida
Fanni Natanegara, Eli Lilly and Company
Lilia Rodriguez, SAS
Tuesday, 1:30 PM - 3:30 PM, Location: LVL Lobby: Golden Gate 6-7
Advancing workplace diversity is more important today than ever before. Benefits of a diverse and inclusive workplace include a deeper trust and commitment from employees, respect among employees of different backgrounds, and the ability to integrate different perspectives to drive innovation.
PN-340 : FDA Panel
Jesse Anderson, CDER OTS OCS DRRR
Dr. Yang Veronica Pei, CDER OND ODES DBIRBD
Liping Sun, CDER OTS OB DAI
Helena Sviglin, CDER OSP DSS
Wednesday, 10:00 AM - 12:00 PM, Location: LVL Lobby: Golden Gate 6-8
Collection of presentations by FDA CDER and CBER representatives. See the FDA Wednesday page for a complete schedule.
PN-342 : Technology
Paul Slagle, IQVIA
Matt Becker, SAS
Mike McDevitt, Syneos Health
Mike Stackhouse, Atorus Research
Monday, 10:30 AM - 11:30 AM, Location: LVL Lobby: Golden Gate 6-7
This panel will feature industry leaders and technology experts Matt Becker, Mike McDevitt, Mike Stackhouse, and Paul Slagle. They will share their insights on how the latest technological developments are revolutionizing the pharmaceutical industry. Join the discussion on cutting-edge technologies such as artificial intelligence and data-driven designs, and how these advancements are transforming everything from drug discovery to clinical trials. Share how you are putting digitalization to work!
Quick Programming Tips
QT-001 : A SAS® Macro to Convert CSV files to SAS Datasets
Zemin Zeng, Sanofi
Bo Yuan, Sanofi
Monday, 8:00 AM - 8:10 AM, Location: LVL Ballroom: Continental 8-9
Statistical programmers in the pharmaceutical industry often need to convert Comma-Separated Values (CSV) files to SAS datasets. This paper provides a short SAS macro to automatically convert CSV files to SAS datasets. We share the techniques used in the macro to programmatically identify the column variables in a CSV file without any manual input. The macro has been widely used at work and has been proven to increase programming efficiency and provide time/cost savings.
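A minimal sketch of the idea (not the authors' macro; the file path and dataset names are hypothetical) using PROC IMPORT, which reads the column names from the CSV header row:

```sas
%macro csv2sas(csvfile=, outds=);
  proc import datafile="&csvfile" out=&outds dbms=csv replace;
    guessingrows=max;  /* scan all rows before assigning variable types */
  run;
%mend csv2sas;

/* Example call with an illustrative path */
%csv2sas(csvfile=/data/ae.csv, outds=work.ae)
```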
QT-020 : A Macro to Identify Repeating SAS(r) BY Variables in a MERGE
Timothy Harrington, Navitas Data Sciences
Monday, 8:30 AM - 8:40 AM, Location: LVL Ballroom: Continental 8-9
SAS MERGEs are best performed on data sets with unique BY-variable values in one or both of the data sets being merged, i.e., one-to-one, one-to-many, or many-to-one merges. When there are duplicate BY values in both data sets, it is ambiguous which of the duplicated observations in one data set each repeating BY observation in the other data set should be matched to. The number of matching observations in the output data set would then be the product of the numbers of input observations (the Cartesian product). To indicate this situation, a note stating that the MERGE has repeats of BY values is written to the SAS log. Although this flags a duplicated-BY-value issue, it does not identify which observations are at fault. This macro performs a MERGE on two input data sets and, if there are repeats of any of the BY values, lists to the SAS log the observation numbers involved in both data sets, the values of the BY variables, and the number of occurrences of each repeated BY value.
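One simple way to surface such duplicates ahead of a MERGE (a sketch under assumed dataset and BY-variable names, not the paper's macro):

```sas
/* Report BY groups that occur more than once in a dataset */
proc sql;
  select usubjid, visit, count(*) as n_obs
  from work.vitals
  group by usubjid, visit
  having count(*) > 1;
quit;

/* Or flag the offending observations in a data step */
proc sort data=work.vitals out=srt; by usubjid visit; run;
data dups;
  set srt;
  by usubjid visit;
  if not (first.visit and last.visit);  /* BY group has repeats */
run;
```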
QT-030 : Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan, Accenture
Monday, 8:15 AM - 8:25 AM, Location: LVL Ballroom: Continental 8-9
We often want to run only parts of a program while preserving the entire program for documentation or future use. Some of the reasons for selectively running parts of a program are:
• Part of it has already run and the program timed out or encountered an unexpected error. It takes a long time to run, so we don't want to re-run the parts that ran successfully.
• We don't want to recreate data sets that were already created. This can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created.
• We currently need only some of the results from the program, but we want to preserve the entire program.
• We want to test new scenarios that require only subsets of the program.
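One common pattern for this (a sketch, not necessarily the author's approach) is to guard each section of the program with a macro switch, so completed sections can be skipped on re-runs:

```sas
%let run_import = N;  /* already ran successfully; skip on re-run */
%let run_report = Y;

%macro main;
  %if &run_import = Y %then %do;
    proc import datafile="raw.csv" out=work.raw dbms=csv replace; run;
  %end;
  %if &run_report = Y %then %do;
    proc print data=work.raw(obs=10); run;
  %end;
%mend main;
%main
```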
QT-044 : Macro Code to Test Existence of Various Objects
Ronald Fehd, Fragile-Free Software
Monday, 9:15 AM - 9:25 AM, Location: LVL Ballroom: Continental 8-9
SAS(R) software provides functions to check the existence of the objects which it manages --- catalogs and data sets --- as well as folders referred to by FILENAME and LIBNAME statements. The purpose of this paper is to provide a set of macro statements for assertions of existence and to highlight the exceptions where these functions return non-Boolean values.
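The base functions involved can be exercised directly; note that LIBREF() is one of the non-Boolean exceptions, returning zero when the libref is assigned:

```sas
%put dataset exists:  %sysfunc(exist(sashelp.class));         /* 1 */
%put view exists:     %sysfunc(exist(sashelp.vcolumn, VIEW)); /* 1 */
%put catalog exists:  %sysfunc(cexist(sashelp.devices));
%put folder exists:   %sysfunc(fileexist(%sysfunc(pathname(sashelp))));
%put libref assigned: %sysfunc(libref(sashelp));              /* 0 means assigned */
```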
QT-046 : Repetitive Analyses in SAS® - Use of Macros Versus Data Inflation and BY Group Processing
Brad Danner, IQVIA
Indrani Sarkar, IQVIA
Monday, 9:00 AM - 9:10 AM, Location: LVL Ballroom: Continental 8-9
While preparing clinical reports, we are commonly tasked with producing multiple outputs of the same analysis using different endpoints of interest, slightly different populations of interest, or a suite of categorical subgroups. Naturally, we can accomplish such repetitive tasks efficiently in SAS with macro processing. Alternatively, with "data inflation," an approach that does not employ macro processing, careful use of OUTPUT statements in the SAS data step "inflates" the source data so that all variations of the multiple analyses are in one dataset, which can then be passed through the analysis procedures once with BY-group processing. The objective of this article is to demonstrate these two approaches, either of which can be used for analysis and review. Outputs from both approaches can be consolidated and exported into one source, which makes the review process less time-consuming. Time-to-event analyses (Kaplan-Meier and Cox regression) will be used to demonstrate both techniques, which will be discussed and compared.
QT-047 : Confirmation of Best Overall Tumor Response in Oncology Clinical Trials per RECIST 1.1
Danyang Bing, ICON Clinical Research
Monday, 8:45 AM - 8:55 AM, Location: LVL Ballroom: Continental 8-9
In oncology clinical trials for solid tumors, the revised RECIST guideline (version 1.1) is the standard guidance for response evaluation. Many statistical programmers with oncology experience are familiar with tumor burden calculations and deriving best overall response following RECIST v1.1, but have limited experience with confirmation of response. In non-randomized trials where response is the primary endpoint, confirmation of partial response (PR) or complete response (CR) is required and handled in response analysis datasets. The RECIST v1.1 guideline does not provide the logic to handle all response scenarios in the data. Clarification of the confirmation logic to use for specific scenarios based on RECIST v1.1 is presented. Subsequent timeline requirements and minimum durations for stable disease (SD) are addressed. Finally, the handling of intervening responses of SD or not evaluable (NE) between two CR or PR response time points is explained.
QT-085 : Tips to Read In and Output Excel Spreadsheets in SAS
Chao Su, Merck
Jaime Yan, Merck
Changhong Shi, Merck
Monday, 10:00 AM - 10:10 AM, Location: LVL Ballroom: Continental 8-9
Excel is a powerful and versatile tool widely used in clinical trials to save and transfer data, especially in organizations where research software like SAS and R is not commonly used and researchers must rely on IT to manipulate data. Analysis and reporting (A&R) programmers play an essential role in receiving different types of Excel spreadsheets and creating customized Excel reports according to requests from different departments. Usable code to read in different types of spreadsheets and generate user-friendly Excel outputs is beneficial and meaningful. In this paper, a macro is developed to read in Excel spreadsheets of various types (.xls, .xlsx, .xlsm). Different methods are presented to output Excel files that meet complex requirements. This significantly relieves A&R programmers' workload and improves efficiency. It also helps smooth collaboration among different functional groups to achieve the common goal.
QT-099 : SAS Macro Design Considerations to Generate Subgroup Table and Forest Plot in Oncology Studies
Yizhuo Zhong, Merck
Christine Teng, Merck
Monday, 4:00 PM - 4:10 PM, Location: LVL Ballroom: Continental 8-9
Subgroup analyses are essential when there is anticipated heterogeneity within a target population and potential inconsistency of therapeutic response to the study treatment. Subgroup tables and forest plots are commonly required for the efficacy analysis in oncology studies. It is important that the analysis programs can handle additional subgroups which may be added for exploratory analyses or agency requests. This paper will review some common scenarios using categorical variables. These subgroup variables are generally provided through the subject level dataset such as ADSL and efficacy datasets supporting the analysis of overall survival (OS), progression-free survival (PFS), and objective response rate (ORR). The proposed design of a SAS macro would allow flexibility to add additional subgroups, handle different sorting orders within a subgroup, and display text of the output without needing to update the analysis programs. The design also considers the case when the subgroup size is too small for comparison to eliminate surprising statistical errors when generating the analysis reports in an unblinded environment.
QT-100 : With a View to Make Your Metadata Function(al): Exploring the FMTINFO() Function
Louise Hadden, Abt Associates Inc.
Monday, 10:15 AM - 10:25 AM, Location: LVL Ballroom: Continental 8-9
Many SAS® programmers are accustomed to using SAS metadata on SAS data stores and processes to enhance their coding and promote data-driven processing. SAS metadata is most frequently accessed through SAS functions, SAS View tables, and dictionary tables. Information can be gained regarding SAS data stores and, drilling down, the attributes of columns in data files. However, few programmers are aware of SAS's similar resources and capabilities with respect to SAS formats. This quick tip will briefly discuss how SAS metadata may be exploited in general and will demonstrate how to use the FMTINFO() function specifically.
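As a quick illustration (FMTINFO() is available in SAS 9.4M5 and later), the function takes a format name and an information type and returns the corresponding metadata:

```sas
data fmtmeta;
  length category $10 descr $200;
  category = fmtinfo('DATE', 'cat');   /* format category, e.g. date */
  descr    = fmtinfo('DATE', 'desc');  /* short description of the format */
  put category= descr=;
run;
```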
QT-121 : Smart Use of SAS Output System and SAS Macro for Statistic Test Selection
Mengxi Wang, University of Southern California
Monday, 1:30 PM - 1:40 PM, Location: LVL Ballroom: Continental 8-9
Choosing the optimal statistical test is crucial to generating accurate results in a quantitative research study. However, digging through piles of diagnostic test reports for useful results can at times be a tedious task for beginners in the field of biostatistics, especially when there are many variables to examine. The purpose of this paper is to share an efficient way to automate the selection between non-parametric and parametric tests using the SAS output system and SAS macros. The tests used as examples are some of the most widely used in a typical Table 1: the chi-square test, Fisher's exact test, the independent two-sample t-test, the Wilcoxon-Mann-Whitney test, ANOVA, and the Kruskal-Wallis test. Test selection is based on variable type, variable category, sample size, and sample distribution. The results of the test selection and associated p-values, as well as basic descriptive statistics, are then compiled into one dataset, which can either be printed for use as a handy reference or exported to Excel for further formatting.
QT-152 : A utility tool to assist with the setup of the startup environment for remote access
William Wei, Merck & Co, Inc.
Shunbing Zhao, Merck & Co.
Monday, 10:45 AM - 10:55 AM, Location: LVL Ballroom: Continental 8-9
Over the course of the clinical trial process, there are multiple deliverables for different purposes such as interim analyses, DMC reviews, safety evaluations, etc. These require programming updates for each deliverable, which live in multiple subfolders and may involve multiple programmers. Therefore, study-level information also needs to be updated in the programming environment settings, for example the data cut date, study-level macro variable values for titles/footnotes, folder structures, etc. Usually, programmers keep many copies of startup/setup files for development/validation purposes and for different SAS platforms. With this approach, it takes considerable effort to keep track of updates made in the main startup file; it is tedious and error-prone. If we maintain one central copy of the startup file, and the other startup files call the central startup file, the effort to track the many startup file updates is reduced. In remote submit mode, some techniques that SAS users do not often use need to be applied to pass SAS macro variables from the local machine to the server. In this paper, we share the techniques used to pass local macro variables to the server in order to set up the startup file. An example macro and a walkthrough of its implementation and use are also provided.
QT-165 : Running Python Code inside a SAS Program
Jim Box, SAS Institute
Monday, 11:00 AM - 11:10 AM, Location: LVL Ballroom: Continental 8-9
Did you know that you can execute Python code inside a SAS Program? With the SAS Viya Platform, you can call PROC PYTHON and pass variables and datasets easily between a Python call and a SAS program. In this paper, we will look at ways to integrate Python in your SAS Programs.
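A minimal sketch of the mechanics on SAS Viya (the macro variable names are illustrative): inside PROC PYTHON, the SAS callback object moves values between the macro facility and the Python session:

```sas
%let cutoff = 2023-01-01;

proc python;
submit;
cutoff = SAS.symget('cutoff')              # read a SAS macro variable
SAS.symput('msg', 'cutoff is ' + cutoff)   # write one back to SAS
endsubmit;
run;

%put &=msg;
```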
QT-175 : Let the Data Drive it and set your hands free - A Macro to create indicators for Special Interest AEs
Yiwei Liu, Eli Lilly
Jameson Cai, Eli Lilly
Monday, 11:15 AM - 11:25 AM, Location: LVL Ballroom: Continental 8-9
Adverse Events of Special Interest (AESIs) form an important part of the safety profile for a compound in a submission or CSR. AESI identification is complex because the information originates from MedDRA, Standardized MedDRA Queries (SMQs), Customized Queries (CQs), and/or adjudicated events determined by an independent Adjudication Committee (AC). Semi-annual MedDRA updates further complicate the reporting process. As AESIs are defined at the compound level, AESI reporting should be consistent across all clinical trials of the compound. A macro is developed to create a set of ADaM variables that identify multiple types of AESIs and streamline the process. It can be executed to automatically create all AESI tabulation and analysis reporting for multiple studies with a common definition. This method not only simplifies and standardizes the AESI reporting process, but also increases accuracy in AESI classification. The macro is portable to all compounds with a similar need.
QT-213 : SAS Macro to Automate Programmer Status Sheet
Jake Adler, PROMETRIKA
Denise Rossi, PROMETRIKA
Assir Abushouk, PROMETRIKA
Monday, 1:45 PM - 1:55 PM, Location: LVL Ballroom: Continental 8-9
When programming in clinical trials, you typically have a production and a QC programmer. For many organizations, tracking program completion depends on the production and QC programmers manually updating an Excel sheet with the status of their programming. Within this sheet, both programmers report the date their program was completed. This method presents problems, as manual input can result in user error; it is easy to forget to update the date after several rounds of feedback and updates, particularly when comments and updates occur on the same day and there is no real way to indicate the new completion date. Using Dynamic Data Exchange (DDE) in a SAS macro can automatically input the completion dates and the user running the program. This macro also checks for log issues, as well as QC COMPARE mismatches, which are reported in the Excel sheet.
QT-222 : A SAS® System Macro to Quickly Display Discrepant Values that are too Long for the COMPARE Procedure Output.
Kevin Viel, Navitas Data Sciences
Monday, 2:00 PM - 2:10 PM, Location: LVL Ballroom: Continental 8-9
At times, values compared by the SAS® System COMPARE procedure are truncated in the output (.lst) file because they are too long, and the differences are not shown. While the COMPARE procedure can produce various output data sets, that approach can be cumbersome. Instead, the programmer might want to display the discrepant values in the log, or redirect the log to a permanent file, for a quick view. The goal of this paper is to introduce a SAS macro that quickly displays these overly long values in the log for quick comparison.
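The underlying idea can be sketched as follows (not the paper's macro; the dataset and variable names are assumptions): merge the production and QC datasets and PUTLOG the full-length values of any mismatches:

```sas
data _null_;
  merge prod(rename=(aval=aval_prod))
        qc  (rename=(aval=aval_qc));
  by usubjid paramcd;
  if aval_prod ne aval_qc then do;
    putlog 'Mismatch for ' usubjid= paramcd=;
    putlog '  PROD: ' aval_prod;
    putlog '  QC:   ' aval_qc;
  end;
run;
```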
QT-233 : Put on the SAS® Sorting Hat and Discover Which Sort is Best for You!
Charu Shankar, SAS Institute
Louise Hadden, Abt Associates Inc.
Monday, 10:30 AM - 10:40 AM, Location: LVL Ballroom: Continental 8-9
The ordering and combining of data sets is one of the most ubiquitous tasks that programmers using any programming language face. The need to sort data is deeply entrenched in every analytic software package and is an integral part of many procedures and packages. Sorting, or arranging and ordering data and data sets, is an expensive process in terms of both time and resources consumed, and SAS® is no exception to that rule. Charu and Louise will explore some of the popular as well as lesser-known sort methods that SAS provides and incorporates into SAS processing. Emulate the Sorting Hat in Harry Potter! This paper and presentation provide the inside scoop on the dynamic processes that go on behind the scenes during SAS sorting, enabling you to pick the very best sorting method for your circumstances. Learn about some fantastical, magical SAS sorting processes: bubble, quick, threaded, and serpentine. Behold the effervescent bubble & squeak sort whilst sorting a hash object! In a hurry? Take a look at the quick sort. Looking for superior efficiency? Consider the threaded sort with PROC SQL. See how the hissing serpentine sort in SAS, like the slithering serpent Nagini sliding surreptitiously through walls, can come in handy! Explore the many facets of SAS sorting and consider: which sort will you choose, or which sort will choose you?
QT-234 : Using P21 Checks to Help Your DM Out!
Chad Fewell, Deciphera Pharmaceuticals
Jesse Beck, N/A
Monday, 2:15 PM - 2:25 PM, Location: LVL Ballroom: Continental 8-9
A lot of work goes into the cleaning of data. Data Management (DM) issues queries on an ongoing basis, and Statistical Programming (SP) escalates data issues seen within their programs. DM creates edit checks within the database to help with cleaning. In addition, they manually review the data on an ongoing basis looking for data that appears incorrect. Statistical Programming works to program raw data into SDTMs, checking for data issues along the way. There appears to be a "gap" between data issues that are reviewed on a continuous basis and issues that are discovered prior to submission. This is where Pinnacle 21 (P21) comes into play. P21 uses SDTMs to help clean the data, but sometimes, due to resourcing, SDTMs are not completed until close to study termination. This delays the discovery of issues that are difficult for DM to find manually but are essential to clean prior to Database Lock (DBL). "How do we help avoid this delay in checking more complex data issues?" one may ask. P21 provides an extensive list of all the checks it performs, along with the severity of the data issues. This paper will detail how to execute P21 checks to supplement the database edit checks and the ongoing data cleaning processes, providing DM with a detailed, user-friendly report of data issues that range in complexity and are typically not seen on an ongoing basis.
QT-236 : 10 Quick Tips for Getting Tipsy with SAS
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting
Monday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 8-9
There are many useful tips that do not warrant a full paper, but put some cool SAS® code together and you get a cocktail of SAS code goodness. This paper will provide ten great coding techniques that will help enhance your SAS programs. We will show you how to 1) add quotes around each token in a text string; 2) create column headers based on values using the TRANSPOSE procedure (PROC TRANSPOSE); 3) set missing values to zero without using a bunch of IF-THEN-ELSE statements; 4) use short-hand techniques for assigning lengths of consecutive variables and for initializing them to missing; 5) calculate the difference between the visit's actual study day and the target study day, accounting for there being no day zero (0); 6) use the SQL procedure (PROC SQL) to read variables from a data set into a macro variable; 7) use the XLSX engine to read data from multiple Microsoft® Excel® worksheets; 8) test whether a file, such as a Microsoft Excel file, exists before trying to open it; 9) use the DIV function to divide so you don't have to check whether the divisor is zero; and 10) use abbreviations for frequently used SAS code.
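Two of the listed tips can be sketched briefly (illustrative code, not necessarily the authors' implementations):

```sas
/* Tip 1: add quotes around each token in a text string */
data _null_;
  length quoted $200;
  str = 'AE CM DM LB VS';
  do i = 1 to countw(str);
    quoted = catx(' ', quoted, quote(strip(scan(str, i))));
  end;
  put quoted=;  /* quoted="AE" "CM" "DM" "LB" "VS" */
run;

/* Tip 6: read variable values from a data set into a macro variable */
proc sql noprint;
  select distinct name into :names separated by ', '
  from sashelp.class;
quit;
%put &=names;
```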
QT-239 : Taming the Large SAS® Dataset by Splitting it Up
David Franklin, TheProgrammersCabin.com
Monday, 2:45 PM - 2:55 PM, Location: LVL Ballroom: Continental 8-9
Large SAS datasets tie up computer resources and are often not useful as stand-alone entities. You may also want to split a dataset into pieces, either for the practicality of sending a dataset in small chunks to a single recipient or to send subsets of the data to different recipients. This paper will show the development of two macros that are useful for splitting a large dataset: the first splits by a specified number of observations per output dataset, and the second splits by variable values.
QT-263 : R Tables via GT for Regulatory Submissions
Phil Bowsher, RStudio Inc.
Monday, 3:00 PM - 3:10 PM, Location: LVL Ballroom: Continental 8-9
RStudio will be presenting an overview of the gt R package for the R user community at PharmaSUG. This is a great opportunity to learn and get inspired about new capabilities for generating TFLs (Tables, Figures, and Listings) for inclusion in Clinical Study Reports created in R. In this talk, we will review and reproduce a subset of common table outputs used in clinical reporting containing descriptive statistics, counts, and/or percentages. No prior knowledge of R or RStudio is needed. This short talk will provide an introduction to TFL-producing R programs and to gt as a flexible and powerful package for generating tables as part of your research and reporting TFL programming, with applications in drug development such as safety analysis and adverse events. A live environment will be available for attendees to explore the tables in real time.
QT-280 : Are you planning to create/validate CDISC data set in R? Here is a step-by-step guide!
Ganeshchandra Gupta, Ephicacy Consulting Group
Monday, 3:15 PM - 3:25 PM, Location: LVL Ballroom: Continental 8-9
In recent years, many pharmaceutical companies have adopted R as a data analysis tool. The main reasons for the increasing importance of R are the availability and ever-growing number of high-quality statistical methods, very good graphics capabilities, and a wide range of useful programming extensions. However, no single programming language can solve every problem you will encounter in your programming career. Though SAS is easy to learn and provides simpler coding options, R has a steeper learning curve depending on one's programming background. To work with R, as a rule of thumb, you have to know the basics of the R language, which is quite easy. And still, no one ever talks about how simple it is to do clinical trial data manipulation and create CDISC SDTM/ADaM data sets. The majority of pharma/biotech companies and CROs use a double-programming technique to validate SAS data sets, so we can use R programming to validate data sets created in a SAS environment, which could potentially reduce the license cost involved. This paper will provide a step-by-step guide to creating and validating the SDTM Demographics (DM) domain, along with a SAS code comparison. Begin from the beginning!
QT-314 : Assign Character Values from Logical Expression with CHOOSEC
Joel Campbell, Advanced Analytics, Inc.
Monday, 4:15 PM - 4:25 PM, Location: LVL Ballroom: Continental 8-9
If you've ever wished for a way to assign character values based on a logical expression in one line of code, well, I've got great news for you: CHOOSEC, you'll see! Usually, character values are assigned in the data step in one line of code with PUT() and a format or $format. But this requires creating the format in another procedure altogether, sometimes even requiring the programmer to scroll up, create the new format, and potentially lose track of the progress being made in the data step. At minimum, the format/$format method isn't a true One Line Solution™, since additional lines are needed to create the format before it's available in the data step. CHOOSEC() can simulate a format or $format in one line of code in the data step.
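For example (a sketch with illustrative values), pairing CHOOSEC() with WHICHC() acts like an inline $format:

```sas
data demo;
  length sexl $6;
  sex = 'F';
  /* WHICHC returns the position of SEX in the list (2 here),
     and CHOOSEC returns the character value at that position */
  sexl = choosec(whichc(sex, 'M', 'F'), 'Male', 'Female');
  put sexl=;  /* sexl=Female */
run;
```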
QT-336 : How to generate questionnaire data compliance report
Phaneendhar Gondesi, Blueprint Medicines
Monday, 2:30 PM - 2:40 PM, Location: LVL Ballroom: Continental 8-9
Achieving acceptable compliance is an important factor in the success of any clinical trial. Periodic monitoring of study data is key to ensuring the expected compliance. Although the acceptable compliance rate varies from study to study and from assessment to assessment, the basic concept of calculating compliance remains the same. In this paper, we discuss how to programmatically generate a compliance report for most questionnaire (QS) data.
Real World Evidence and Big Data
RW-050 : Real World Evidence in Distributed Data Networks: Lessons from a Post-Marketing Safety Study
Matthew Slaughter, Kaiser Permanente Center for Health Research
Denis Nyongesa, Kaiser Permanente
John Dickerson, Kaiser Permanente Center for Health Research
Jennifer Kuntz, Kaiser Permanente Center for Health Research
Monday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 7
As a case study to illustrate both opportunities and challenges in distributed data networks, this paper will focus on the implementation of a post-marketing safety study in the Healthcare Systems Research Network (HCSRN) via the associated Virtual Data Warehouse (VDW) common data model. In response to an FDA post-marketing requirement, this study establishes the incidence of angioedema in chronic heart failure patients treated with Sacubitril/Valsartan and incorporates data from multiple distributed data networks, including HCSRN. Distributed data networks present exciting opportunities for gathering real-world evidence by pooling standardized datasets across institutions. Common data models facilitate the efficient allocation of programming work to conduct analysis while allowing participating sites to retain control of their own data. However, large-scale and high-quality data collection combining data from disparate health data systems presents technical, administrative, and scientific challenges. In addition to describing programming, data management, and validation techniques used by HCSRN analysts in this study, we will compare design choices made by the various data networks involved in the project, and explore their practical consequences.
RW-060 : Exploring the spread of COVID-19 in the United States using unsupervised graph-based machine learning
Kostiantyn Drach, Intego Group
Iryna Kotenko, Intego Group
Monday, 8:30 AM - 9:20 AM, Location: LVL Ballroom: Continental 7
Real-world data can be essential for our understanding of clinical data, especially with the emergence of phenomena such as the COVID-19 outbreak. In this paper, we analyze how the spread of the virus has advanced across the U.S. during the initial phase of the pandemic using novel graph-based machine-learning techniques. First, a cloud of graphs is extracted from several publicly available datasets. In these graphs, each node corresponds to a single county (>3000 nodes per graph), whereby two counties are connected with an edge if they have similar patterns in the advance of the pandemic spread over a specific timeframe. A graph (or a subset of graphs) from the cloud with the most robust geometric properties is subsequently revealed. This constitutes a topological model of data. Next, unsupervised machine learning algorithms discover communities of nodes within the chosen graph relying on the pure geometric properties of the model. Finally, the highlighted communities are compared to each other based on the real-world data employed by the model to explain dissimilarities between the communities. A variety of publicly available real-world data, including healthcare, social, demographic, economic, and geographic data, was used in the analysis. Our geometric, data-driven approach reveals insights that would otherwise have been difficult to identify through the implementation of standard statistical methods alone. The focus on topological properties helps to identify the underlying geometry of the dataset and to discover a set of unrelated features that may be causing the similarity in the spread of COVID-19 across the U.S.
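The edge-construction step described in this abstract (connecting two counties when their spread patterns are similar over a timeframe) can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: the `similarity` and `build_graph` helpers are hypothetical, and the paper's method uses richer graph-topological criteria than a plain Pearson correlation threshold.

```python
import math

def similarity(a, b):
    """Pearson correlation between two counties' case-count time series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def build_graph(series, threshold=0.9):
    """Connect two counties with an edge when their spread patterns
    correlate above `threshold`; returns an edge list over county names."""
    names = sorted(series)
    return [(u, v) for i, u in enumerate(names) for v in names[i + 1:]
            if similarity(series[u], series[v]) >= threshold]
```

Community detection would then run on the resulting edge list; the choice of similarity measure and threshold drives which graphs in the "cloud" have robust geometry.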
RW-113 : Patient's Journey using Real World Data and its Advanced Analytics
Kevin Lee, Genpact
Monday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 7
Real World Data (RWD) is data collected outside of clinical trial studies, and Real-World Evidence (RWE) is achieved through insights from RWD. RWD sources include EMR data, health insurance claims, genomic data, and IoT data from apps and wearables. Anonymized RWD patient data has revolutionized how companies view patient data, since it captures longitudinal pharmacy prescriptions, medical claims, and diagnoses. The paper is written for those who want to understand how RWD patient data are collected and how they can be analyzed to support pharmaceutical companies. In particular, RWD patient data can support patient analytics, commercial analytics, and payer analytics, such as source of business, prescription switching, payment method, market analysis, promotional activities, drug launch, and forecasting. The paper also discusses the technologies that data scientists use for RWD, such as data warehouses, data visualization, open-source programming, cloud computing, GitHub, and machine learning.
RW-154 : Automating Non-Standard New Study Set Up with a SAS Based Work Process
Xingshu Zhu, Merck
Bo Zheng, Merck
Li Ma, Merck
Monday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 7
Starting a new study by utilizing broad institutional standard macros is essential and effective, especially for data quality review (DQR) and analysis and reporting (A&R) in real-world-evidence (RWE) studies. However, the rapid developmental pace of studies introduces the challenge of effectively managing standard macros, and often requires identifying which standard macros are needed and where they are located. Having a clearly defined work process and a tool for automatically copying specific macros and folders to the new study directory significantly reduces the time spent on manual work and improves reliability and efficiency across multiple studies. This paper highlights the functional portions of a real-life work process and a SAS macro that easily automate this task while preserving the original file timestamps for traceability.
RW-163 : CMS VRDC - A simplified overview. What to expect in terms of data, system, code.
Zeke Torres, Code629
Monday, 10:30 AM - 11:20 AM, Location: LVL Ballroom: Continental 7
Let's review a huge medical claims repository from CMS called the VRDC, the Virtual Research Data Center. It holds the Medicare/Medicaid claims of all participants nationwide (approximately over 150 million patients) and spans decades of data. Let's discuss some basics of navigating this realm: the types of data you will see, and the SIZE of them, too. We will discuss topics such as inpatient and outpatient claims (how they are similar and different) and review ICD-9 vs. ICD-10. This is not highly technical; it is more of the 5,000-foot view of the analytic ecosystem that is the VRDC. This presentation will also include an overview of why uploading and downloading need to be factored into how you manage and test your code. The VRDC is a great resource but also an expensive one. We want to foster a team environment where we code locally "off" the VRDC and run "on" the VRDC only after we have finished all of our testing and development and proven our code will work. There is also mention of some sample code (macros) that can be used on and off the VRDC. One macro allows us to easily "mask" result counts that are too low (under 11) to be publicized or downloaded.
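The small-cell masking rule mentioned at the end of this abstract (suppressing counts under 11 before results leave the VRDC) is simple to sketch. The macro itself is SAS; the Python helper below, with the hypothetical name `mask_low_counts`, only illustrates the logic.

```python
def mask_low_counts(counts, threshold=11, mask="<11"):
    """Replace any count below the disclosure threshold with a mask string.

    CMS cell-suppression rules forbid publicizing counts under 11,
    so masked values can safely leave the VRDC environment.
    """
    return [mask if c < threshold else c for c in counts]

# Example: beneficiary counts per county
print(mask_low_counts([152, 3, 0, 47, 10]))  # [152, '<11', '<11', 47, '<11']
```

In practice, complementary suppression (masking a second cell so the small one cannot be back-calculated from totals) is also needed; that is beyond this sketch.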
RW-322 : Novel Applications of Real World Data in Clinical Trial Operations: Clinical Trial Feasibility
Sherrine Eid, SAS Institute
Samiul Haque, SAS Institute, Inc
Monday, 2:00 PM - 2:50 PM, Location: LVL Ballroom: Continental 7
INTRODUCTION: Today, Real World Evidence (RWE) derived from Real World Data (RWD) is essential to explore target patient populations (TPP) and more accurately inform robust initial trial hypotheses and expected performance. Applying RWD creates more patient-centric protocols and identifies and reduces operational risk and development costs, while also increasing the likelihood of regulatory approval and patient retention. METHODS: Three scenarios of conventional and novel methods of informing target patient populations intended for Phase II and III clinical trials were explored and compared on overall population counts, demographic and baseline characteristic distributions, and scientific robustness using SAS Viya and Python. The first scenario (STRINGENT) implements a conventional approach where a subject matter expert defines the study TPP based on expertise and literature. The second scenario (RELAXED) leverages RWD with a subject matter expert to determine the impact of removing an exclusion criterion on the patient count, as well as on the scientific robustness of the study. The third scenario (ML+SME) leverages RWD and machine learning algorithms to determine the role and importance of comorbidities in defining the eligibility criteria. RESULTS: The results showed that relaxing eligibility criteria in the RELAXED scenario increased the population count without compromising the scientific robustness of study outcomes. Although the machine learning algorithms in the ML+SME scenario revealed additional potential exclusion criteria and smaller counts, they suggested a more precise TPP, which would yield less attrition, greater retention, and more efficient trial operations. CONCLUSION: RWE is a necessary and critical factor in assessing clinical trial feasibility.
RW-324 : Novel Applications of Real World Data in Clinical Trials: External Control Arms
Sherrine Eid, SAS Institute
Samiul Haque, SAS Institute, Inc
Monday, 1:30 PM - 1:50 PM, Location: LVL Ballroom: Continental 7
INTRODUCTION: Regulatory agencies have issued guidance on the selection and evaluation of control groups in clinical trials. External control arms (ECA) may be sourced from prior clinical trial data (individual or pooled) or from observational, real-world data (RWD), such as registries, electronic health records, and medical or pharmacy claims. By reducing or eliminating the need to enroll control participants, a synthetic or external control arm can increase efficiency, reduce delays, lower trial costs, and speed lifesaving therapies to market. ECAs should ideally be temporally and clinically relevant to the treatment arm to minimize bias, and they require sufficient patient-level data to ensure a statistically robust comparison. We address key considerations, risks, and possible mitigation strategies for sponsors employing ECAs in their trial designs. In general, the use of ECAs to determine comparative treatment effect is not widely applicable or appropriate for most clinical studies due to the potential for bias, such as confounding, selection bias, temporal bias, or immortal time bias. Even in a well-understood disease, lack of appropriate measurement of exposure or outcome, misaligned contemporality, or poor selection of an appropriate control can create an artificially significant demonstrated treatment effect that is not related to the therapeutic intervention. CONCLUSION: ECAs can be the future of drug development if scientific rigor is ensured through analytical methods that compensate for the compromise in study design. Key considerations in data source, ECA definition, and analytic methods are crucial to ensuring a valid ECA that will withstand scrutiny by health authorities.
Solution Development
SD-069 : Application of Tipping Point Analysis in Clinical Trials using the Multiple Imputation Procedure in SAS
Yunxia Sui, AbbVie
Xianwei Bu, AbbVie
Monday, 8:00 AM - 8:20 AM, Location: LVL Lobby: Golden Gate 8
In phase 3 clinical studies, tipping point analysis has been increasingly requested by regulatory agencies as a sensitivity analysis under the missing not at random (MNAR) assumption to assess the robustness of the primary analysis results. One way to implement the tipping point analysis is with the SAS procedure PROC MI, which involves two steps: step one imputes missing data using multiple imputation (MI) under the missing at random (MAR) assumption, and step two uses the MNAR statement to adjust the MI-imputed values by a pre-specified set of shift parameters for each treatment group independently. The tipping points are the outcomes at which the significance of the treatment effect is just reversed. In practice, the actual shifts to the MI-imputed values are not always exactly the same as the shift parameters specified in the MNAR statement. We summarize our experience with this issue and the potential pitfalls in implementing the tipping point analysis using PROC MI, and we propose alternative options so that the expected shift can be achieved. We propose a tipping point analysis method using the multiple imputation approach for both continuous and binary endpoints.
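The shift-and-retest loop at the heart of a tipping point analysis can be illustrated outside of PROC MI. The sketch below is a simplified, single-imputation illustration in Python: `tipping_point` shifts the imputed treatment values by each delta and reports the first shift at which a plain two-sample z-test loses significance. Both helper names are hypothetical, and a real analysis would combine multiple imputations via Rubin's rules rather than test a single imputed vector.

```python
import math

def z_test_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def tipping_point(trt_obs, trt_imputed, ctl, shifts, alpha=0.05):
    """Shift the imputed treatment values by each delta (toward the worst
    case) and return the first shift at which significance is lost."""
    for delta in shifts:
        shifted = [x + delta for x in trt_imputed]
        if z_test_pvalue(trt_obs + shifted, ctl) >= alpha:
            return delta
    return None  # effect remains significant over the whole grid
```

The returned delta is then judged for clinical plausibility: if only an implausibly large penalty on the imputed values overturns significance, the primary result is considered robust.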
SD-070 : Do it the smart way, renumber with PowerShell scripts!
Menaga Guruswamy Ponnupandy, ThermoFisher Scientific (PPD)
Monday, 8:30 AM - 8:50 AM, Location: LVL Lobby: Golden Gate 8
Renumbering large numbers of SAS programs is often a time-consuming and tedious task. Traditional methods, such as manual renaming or using SAS macros, lack the flexibility to adapt to different numbering schemes or platforms. This paper presents a step-by-step guide to renumbering SAS programs efficiently using PowerShell scripts, offering a dynamic and adaptable solution. By leveraging PowerShell, data analysts can streamline the renumbering process, save time, and ensure accurate numbering according to client specifications.
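The same renumbering idea is easy to prototype in any scripting language. As a hedged sketch (the paper's actual solution is a PowerShell script), the hypothetical `renumber` function below computes an old-name-to-new-name map from a client-supplied numbering scheme; a wrapper would then apply the map with file renames.

```python
def renumber(filenames, mapping):
    """Map each file name to its renumbered form using the first matching
    program number; names with no mapped number pass through unchanged."""
    out = {}
    for name in filenames:
        for old_no, new_no in mapping.items():
            if old_no in name:
                out[name] = name.replace(old_no, new_no)
                break
        else:
            out[name] = name
    return out
```

A practical detail worth copying from any robust script: apply the renames in two phases (first to temporary names, then to final names) so that overlapping old and new numbers cannot collide mid-run.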
SD-072 : Data Access Made Easy Using SAS® Studio
Kirk Paul Lafler, sasNerd
Shaonan Wang, Informatics Skunkworks
Nuoer Lu, UC San Diego Health
Zheyuan Walter Yu, Optimus Dental Supply
Daniel Qian, University of Washington
Monday, 10:00 AM - 10:50 AM, Location: LVL Lobby: Golden Gate 8
SAS® OnDemand for Academics (ODA) gives students, faculty, and SAS learners free access to SAS software and the SAS® Studio user interface using a web browser. SAS Studio provides a comprehensive and customizable integrated development environment (IDE) for all SAS users. A number of techniques will be introduced to showcase SAS Studio's ability to access a variety of data files; the application of point-and-click techniques using the Navigation Pane's Tasks and Utilities; the importation of external delimited text (or tab-delimited), comma-separated values (CSV), and Excel spreadsheet data files by accessing Import Data under Utilities; the reading of JSON data files; and the viewing of the results and SAS data sets that are produced. We also provide key takeaways to help users learn through the application of tips, techniques, and effective examples.
SD-082 : Battle of the Titans (Part II): PROC REPORT versus PROC TABULATE
Kirk Paul Lafler, sasNerd
Josh Horstman, Nested Loop Consulting
Ben Cochran, The Bedford Group
Ray Pass, "Retired and Having the Time of His Life"
Dan Bruns, "Very Happily Retired"
Monday, 11:00 AM - 11:50 AM, Location: LVL Lobby: Golden Gate 8
Should I use PROC REPORT or PROC TABULATE to produce that report? Which one will give me the control and flexibility to produce the report exactly the way I want it to look? Which one is easier to use? Which one is more powerful? WHICH ONE IS BETTER? If you have these and other questions about the pros and cons of the REPORT and TABULATE procedures, this presentation is for you. We will discuss, using real-life report scenarios, the strengths (and even a few weaknesses) of the two most powerful reporting procedures in SAS® (as we see it). We will provide you with the knowledge you need to make that difficult decision about which procedure to use to get the report you really want and need.
SD-084 : A Macro Utility for CDISC Datasets Cross Checking
Chao Su, Merck
Jaime Yan, Merck
Changhong Shi, Merck
Monday, 9:00 AM - 9:20 AM, Location: LVL Lobby: Golden Gate 8
High-quality data in clinical trials is essential for compliance with Good Clinical Practice (GCP) and regulatory requirements. However, in practical studies, data issues exist within and between ADaM and SDTM datasets. In order to identify and clean data issues before database lock (DBL) or other major milestones, a macro was developed to cross-check discrepancies between ADaM and SDTM datasets during analysis and reporting processes. In this paper, some common data checks among ADaM and SDTM datasets are presented and discussed. The findings are reported in an Excel spreadsheet with a friendly interface consisting of a neat summary tab and an individual formatted tab for each data issue category. Moreover, the modularized structure provides excellent scalability and flexibility for the user to add user-defined rules in simple and easy steps. This feature allows the macro to be used far beyond CDISC datasets: user-defined rules can be extended to various data structures and types across therapeutic areas and studies. This utility provides a friendly and flexible way to check and track data issues related to the A&R process accurately and efficiently.
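A minimal version of one such cross-check, subject-level consistency between SDTM DM and ADaM ADSL, can be sketched as follows. The macro in the paper is SAS and Excel based; this Python `cross_check_subjects` helper is a hypothetical illustration of the set logic only.

```python
def cross_check_subjects(sdtm_dm, adam_adsl):
    """Flag USUBJIDs that appear in one dataset but not the other.
    Inputs are lists of dicts (one per record) with a 'USUBJID' key."""
    dm_ids = {r["USUBJID"] for r in sdtm_dm}
    adsl_ids = {r["USUBJID"] for r in adam_adsl}
    findings = []
    for sid in sorted(dm_ids - adsl_ids):
        findings.append((sid, "in DM but not in ADSL"))
    for sid in sorted(adsl_ids - dm_ids):
        findings.append((sid, "in ADSL but not in DM"))
    return findings
```

Each additional rule (date ranges, treatment codes, visit windows) would follow the same pattern: compute a finding list, then write each list to its own formatted tab of the report.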
SD-098 : Introduction of Developing Resistance Dataset
Jenny Zhang, Merck & Co., Inc
Shunbing Zhao, Merck & Co.
Monday, 1:30 PM - 1:50 PM, Location: LVL Lobby: Golden Gate 8
Antimicrobial resistance (AMR) is increasingly being recognized as a global threat to public health, with resistance data providing important information used to guide the clinical development of a new investigational drug. We had an opportunity to develop a resistance dataset for an infectious disease clinical trial and faced many challenges in developing this complicated dataset. This paper introduces the dataset and summarizes the challenges and how to overcome them: (1) an introduction to the main variables in the resistance dataset; (2) translating the completed dataset specification into a streamlined algorithm and SAS code for genotypic and phenotypic data, providing positioning variables for hundreds of amino acid sequence data points for PR (protease) and RT (reverse transcriptase); (3) challenges in dealing with RTXXXX and PRXXXX variables in the spec, translating them into 99 RTXXXX variables, 560 RT variables, and 3 insertions.
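The positioning-variable idea in point (2), one variable per amino acid position, can be illustrated with a short sketch. The `position_variables` helper below is hypothetical; the paper's implementation is in SAS and additionally handles insertions and hundreds of positions.

```python
def position_variables(seq, prefix="RT"):
    """Expand an amino-acid sequence string into one variable per position,
    e.g. 'PIS' -> {'RT1': 'P', 'RT2': 'I', 'RT3': 'S'}."""
    return {f"{prefix}{i}": aa for i, aa in enumerate(seq, start=1)}
```

With positions named this way, a resistance rule such as "mutation at RT184" becomes a simple lookup on the derived variable rather than a substring scan of the raw sequence.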
SD-103 : A SAS Macro to Perform Consistency Check in CSR Footnote References
Jeff Xia, Merck
Chandana Sudini, Merck
Monday, 2:00 PM - 2:20 PM, Location: LVL Lobby: Golden Gate 8
Footnotes are an important part of the tables, figures, and listings (TFLs) for a CSR; they might include, but are not limited to, abbreviations, acronyms, and additional explanations that add value to the TFLs. Footnotes are normally provided in a sequential order, such as Roman numerals or letters. It is common to see inconsistency in footnote references between the body of a TFL and its footnote section, i.e., a reference number appears in the TFL body but there is no corresponding reference number in the footnote section, or vice versa. Catching these inconsistencies by eye is an attention-demanding and error-prone task. This paper introduces a SAS macro that compares the list of footnote reference numbers in the body of each TFL with those in the footnote section and flags any discrepancies. In addition, the macro generates a report listing the name of each TFL and the details of the discrepancies. It is suggested to run this macro after the TFLs are produced as part of the dry run package, and again before delivering the final TFLs to Clinical for the CSR.
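The core comparison the macro performs can be sketched with set logic. In this hedged Python illustration the reference markers are assumed to look like [1], [2], and so on; the hypothetical `footnote_discrepancies` helper stands in for the macro's actual parsing of TFL output files.

```python
import re

def footnote_discrepancies(body_text, footnote_text):
    """Compare footnote reference numbers cited in a table body against the
    numbers defined in its footnote section, returning what is missing on
    each side. Markers are assumed to look like [1], [2], ..."""
    cited = set(re.findall(r"\[(\d+)\]", body_text))
    defined = set(re.findall(r"\[(\d+)\]", footnote_text))
    return {"cited_but_undefined": sorted(cited - defined),
            "defined_but_uncited": sorted(defined - cited)}
```

Running this per TFL and collecting the two lists into one report gives exactly the kind of discrepancy summary the abstract describes.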
SD-109 : Importance of Creating a Learning Portal for Statistical Programming End-to-End Processes
Yogesh Pande, Merck Inc.
Donna Hyatt, Merck & Co., Inc.
Brandy Cahill, Merck & Co., Inc.
Monday, 2:30 PM - 2:50 PM, Location: LVL Lobby: Golden Gate 8
In the highly regulated pharmaceutical industry, there are multiple Standard Operating Procedures (SOPs) covering the mandatory processes that a clinical/statistical programmer needs to follow in their day-to-day programming activities. Multiple SOPs/processes can cause confusion and lead to non-compliance with the processes followed within the Statistical Programming Department (SPD). To avoid this situation, the statistical programming leadership team proposed creating a site/portal that lists important topics in one place, as hyperlinks, with each topic explained from start to end. For this reason, the statistical programming end-to-end (SP E2E) learning portal gained a lot of popularity within SPD among junior programmers, new hires, and even senior programmers. The goal of understanding which process should be followed, and when, was achieved. Developing the learning portal also ensured that the details of each process/topic reached the right audience and that the expectations are understood by every programmer within SPD. This paper explains the specific format used for each topic and the review process each topic goes through before being published in the SP E2E portal.
SD-122 : Building an Internal R Package for Statistical Analysis and Reporting in Clinical Trials: A SAS User's Perspective
Huei-Ling Chen, Merck & Co.
Heng Zhou, Merck & Co.
Nan Xiao, Merck & Co., Inc.
Monday, 3:00 PM - 3:20 PM, Location: LVL Lobby: Golden Gate 8
The programming language R has seen an increase in usage in the analysis and reporting sector of the pharmaceutical industry. Similar to how SAS programmers regularly write SAS macros, it is common for R users to write R functions to complete repetitive tasks, thus facilitating programming work. An R package is similar to a well-built SAS macro library: it includes a collection of functions, instruction documentation, sample data, and testing code with validation evidence. An R package formalizes access to the R functions. Yet new users may find a steep learning curve associated with creating an R package from scratch. This paper outlines the essential components of an R package and the valuable tools that help create these components. Relevant online reference materials are provided as well.
SD-167 : Bringing it All Together: Applying the Analytics Life Cycle for Natural Language Processing to Life Sciences
Sundaresh Sankaran, SAS
Tom Sabo, SAS
Monday, 4:00 PM - 4:50 PM, Location: LVL Lobby: Golden Gate 8
Life Sciences stakeholders, including the pharmaceutical industry and regulators, seek timely, accurate, and interpretable safety signal analysis. However, much of this data is unstructured freeform text, including physicians' notes and patient narratives. Manual review of such data is time-consuming and labor-intensive, involving considerable effort to understand the context behind safety signals. Volume pressure and a lack of standardization in review increase the chance of error, which can have expensive repercussions through regulatory holds, financial penalties, and reputational risk. Artificial Intelligence techniques such as Machine Learning and Natural Language Processing (NLP) help achieve better analysis, and these capabilities are accessible to subject matter experts and programmers alike. In this paper, we demonstrate the application of such techniques on SAS Viya, including SAS Visual Text Analytics, and show how they lead to faster realization of actionable insights. Automated dashboard explorations help domain experts and non-technical analytics consumers obtain early insights into topics and themes across freeform text, such as patterns of drug side effects or successfully treated symptoms. We further address the important issue of interpretability, to help organizations understand the logic behind models and to give subject matter experts the opportunity to refine these models without being data scientists. Ultimately, we aim to give readers a better understanding of how the analytics life cycle, particularly for NLP, can be applied to life sciences missions for a variety of user personas.
SD-169 : Effective APIs for SAS Language Applications
Randy Betancourt, Altair
Oliver Robinson, Altair
Tuesday, 8:00 AM - 8:50 AM, Location: LVL Lobby: Golden Gate 8
Pharmaceutical firms have large stocks of validated SAS language programs. Many programs are run episodically, requiring some alterations to logic to produce output. An improvement on this process is to pass parameters to validated autocall macros. Still, users often hand-edit programs, test them, and then copy and paste output to distribute results; in many cases, multiple manual copy-and-paste steps are required. SAS language autocall macro libraries provide a method for reusing SAS program logic. With changes in workforce composition and skills, firms have an imperative to automate this logic and to expand their software tooling and process automation to encompass Python, R, SQL, Perl, and scripting languages such as PowerShell and Linux shell scripts. We propose fully automating the distribution of SAS language programs, along with Python, R, and SQL, by enabling any consumer (browser, Excel, middleware applications, web portals, etc.) to make REST API calls, passing macro parameters to execute the program's logic. This paper describes a novel, easy-to-implement architecture and workflow for upgrading validated autocall macro libraries and exposing them as API calls that pass parameters to auto-generated API endpoints. Another goal is to describe a scalable software architecture based on the OpenAPI Initiative (OAI) and its OpenAPI Specification (OAS) that enables these capabilities.
SD-183 : Was the load okay?
Lisa Eckler, Lisa Eckler Consulting Inc.
Tuesday, 10:00 AM - 10:50 AM, Location: LVL Lobby: Golden Gate 8
After a series of database tables are loaded from multiple data sources and before using the data to feed automated reports and business intelligence tools, we want to know whether the load was complete and correct. This goes beyond confirming that the jobs ran without errors. The more precise concerns are: • Were all of the data sources ingested? • Were the right number of rows of data added or updated in each table? • Were all of the appropriate columns populated? • Do the data values make sense? • Are the values of categorical variables different than expected? This paper describes a fully automated SAS® process for comparing a new set of data loaded to one or more tables with previous sets to check for reasonableness and completeness and highlight potential problems. It can also be used more generally as a data comparison tool for two or more subsets of similar data.
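One of the checks described, whether each table's row count moved by a plausible amount relative to the previous load, reduces to a small comparison routine. The `check_load` function and its 10% tolerance below are illustrative assumptions, not the paper's SAS implementation.

```python
def check_load(prev_counts, new_counts, tolerance=0.10):
    """Compare per-table row counts from the previous load with the new load.
    Flags missing tables and counts that moved more than `tolerance` (10%)."""
    problems = []
    for table, old_n in prev_counts.items():
        if table not in new_counts:
            problems.append(f"{table}: missing from new load")
        elif old_n and abs(new_counts[table] - old_n) / old_n > tolerance:
            problems.append(f"{table}: row count {new_counts[table]} vs prior {old_n}")
    return problems
```

The same shape extends naturally to the other bullets: column fill rates and categorical value distributions can be compared against the prior load with per-check tolerances.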
SD-185 : A Light-Weight Framework to Manage Programs and Run All the TLFs in R
Chi-Hua Huang, Astellas Pharma Global Development, Inc.
Tuesday, 1:30 PM - 1:50 PM, Location: LVL Lobby: Golden Gate 8
In most studies, a large number of Table/Listing/Figure (TLF) programs are developed, and many standard and study-specific considerations take place in their analyses. For each study deliverable, a set of programs is selected and run to produce the corresponding outputs. Some programs have dependencies, and both program selection and dependencies determine the order in which the programs run, i.e., the processing time. Furthermore, R pre-loads data into memory, so the time to load physical data has a larger impact on processing, especially when dealing with large data. While R has been widely used to generate analysis results, there is not yet a comprehensive solution available to batch-run multiple R scripts concurrently. To address this, this paper presents a light-weight framework that utilizes the R base, tidyverse, sassy, and rstudioapi packages in conjunction with MS Excel to efficiently produce and manage the TLFs for a deliverable. In addition, it demonstrates the use of a delegate function to connect the individual programs with a run-all program, by embedding the specific logic into a function while creating all the TLFs.
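The scheduling problem described here, running TLF programs in an order that honors their dependencies while still batching concurrent work, is a topological-sort problem. As a hedged sketch (the paper's framework is in R; the `run_waves` helper is hypothetical Python), each returned wave contains programs whose dependencies are all satisfied by earlier waves and can therefore run concurrently.

```python
def run_waves(dependencies):
    """Group programs into waves that can run concurrently: every program
    in a wave depends only on programs completed in earlier waves.
    `dependencies` maps each program to the set of programs it needs."""
    remaining = {p: set(d) for p, d in dependencies.items()}
    done, waves = set(), []
    while remaining:
        ready = sorted(p for p, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("circular dependency among programs")
        waves.append(ready)
        done.update(ready)
        for p in ready:
            del remaining[p]
    return waves
```

Reading the dependency columns from the run-control spreadsheet into such a map would let a run-all program dispatch each wave to parallel workers.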
SD-207 : Acceleration and automation of genomic data analysis to meet corporate compliance standards using advanced cloud components
Gopal Joshi, Senior Scientist
Satyoki Chatterjee, Project Manager
Pankaj Choudhary, Bioinformatics Analyst
Sanjay Koshatwar, Circulants
Shekhar Seera, Circulants
Tuesday, 9:00 AM - 9:20 AM, Location: LVL Lobby: Golden Gate 8
Recent advancements in high-throughput next-generation sequencing (NGS) technologies have grown exponentially in genomic research, revolutionizing biological data analysis and enhancing the study of complex biological systems at an unprecedented scale. A technological limitation of NGS systems is the deluge of genomic data produced: it is difficult for a single workstation to execute sequential methods and deliver results quickly, and efficiency decreases significantly with manual intervention. To mitigate these issues, we developed an in-house pipeline, with the help of AWS services and tools like Snakemake and Kallisto, for automating RNA-seq data analysis. It is efficient, scalable, reproducible, version-controlled, transparent, and cost-effective for large volumes of data. In this study, we review RNA-sequencing techniques using AWS to analyze gene expression at the transcriptional level. The systematic approach allows CROs to transfer raw data using an SFTP server, followed by an automated transfer to Simple Storage Service (S3), preceded by data quality validation. Helper scripts then transfer data from S3 to Elastic File System (EFS), launch the FASTQ processing pipeline, clone a GitHub repo for the corresponding project, and leverage AWS Batch to spin up dynamic Elastic Compute Cloud (EC2) instances as desired. After successful execution, outputs are available in EFS, and secondary data analysis is performed using RStudio Workbench, ending with automated archival of results in S3.
SD-221 : SAS® System Macros to Summarize the COMPARE Procedure Results and SAS Logs for a Directory or Single File.
Kevin Viel, Navitas Data Sciences
Tuesday, 11:00 AM - 11:50 AM, Location: LVL Lobby: Golden Gate 8
A moderate project in clinical trial programming might have 30-50 ADaM data sets and 200-500 tables, listings, and figures. The convention is 100% independent programming with a comparison of the data sets. When the SAS® System is used, this may be referred to as DP-PC: Double Programming, PROC COMPARE. Further, a check of the SAS logs for certain words or phrases indicative of unacceptable NOTEs, ERROR messages, or WARNINGs is also appropriate. At times, a lead programmer or biostatistician may want to verify progress or confirm attestations that validation is complete, yet reviewing so many SAS .lst and .log files manually is cumbersome, and a sample may not suffice. The goal of this paper is to introduce two SAS macros, along with the ancillary macros they require, that summarize the results of COMPARE procedures and log checks for all files in a directory or for single files. Such activities are essential to a readiness audit for delivery or submission, and for routine programming.
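The log-check half of the utility boils down to scanning each .log file for problem phrases. The Python sketch below is a hypothetical illustration of that scan; the phrase list is a common but incomplete starting set, and the actual macros described in the paper are SAS.

```python
import re

# Illustrative (not exhaustive) set of phrases worth surfacing from SAS logs.
SUSPECT = re.compile(
    r"^(ERROR|WARNING)|uninitialized|repeats of BY values|"
    r"[Mm]issing values were generated|[Cc]onverted|"
    r"MERGE statement has more than one"
)

def scan_log(lines):
    """Return the log lines matching common problem phrases that a lead
    programmer would want surfaced across many .log files."""
    return [ln for ln in lines if SUSPECT.search(ln)]
```

Applied over every .log in a directory and combined with a parse of each PROC COMPARE .lst, this yields the one-page summary the abstract argues for.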
SD-230 : Automated Mockup Table and Metadata Generator
Jeff Cheng, Merck & Co., Inc.
Shunbing Zhao, Merck & Co.
Guowei Wu, Merck & Co., Inc.
Suhas Sanjee, Merck & Co., Inc.
Tuesday, 2:00 PM - 2:20 PM, Location: LVL Lobby: Golden Gate 8
Preparation and production of an analysis and reporting (A&R) package for a regulatory filing is a resource-intensive process. The A&R package includes, but is not limited to, analysis datasets, tables, listings, and figures (TLFs). To ensure that the TLFs meet the needs of the stakeholders, mockup TLFs are generated by the statisticians to communicate the TLF specifications to the statistical programming team. Currently, mockup TLF generation is a manual and time-consuming process. In addition, a statistical programmer creates call programs and the related documentation based on the mockups; populating and maintaining the same information consistently in multiple documents is challenging and tedious. In order to streamline this process, we developed a proof-of-concept R Shiny app that provides a user interface to: 1) gather and store the metadata required for generating TLFs once and use it to populate all relevant documents; 2) automatically generate mockup TLFs, required metadata, and draft call programs for standard macros based on user inputs; 3) make it easy to reuse and adopt mockup specifications from an existing study for a new one; and 4) create a preview of mockup TLFs and related documents. In this paper, we describe the Shiny app, discuss what we have learned, and outline the next steps.
SD-232 : Tired of Manual Language Translation? Give it a REST!
Shawn Hopkins, Seagen Inc.
Matthew Ness, Seagen Inc.
Tuesday, 2:30 PM - 2:50 PM, Location: LVL Lobby: Golden Gate 8
Working in a global environment or in partnership settings, we are bound to encounter situations where we need to translate content from one language to another in order to analyze clinical trial data. For SAS data sets with content in a foreign language, the FDA requires a verified English translation, which is often performed by a third party. However, this may take considerable time and could squeeze programming development too much to meet upcoming timelines. With that in mind, is there a way to get a "good-enough" data set translation in place quickly and efficiently, so programming can start development long before the official translation enters the scene? In this paper we share an innovative, fully automated approach to the informal translation of SAS data sets received in Extended Unix Code simplified Chinese encoding (EUC-CN) into UTF-8 encoding in English. Using Seagen's Microsoft Azure subscription with the Text Translation service (free to try), we translated both the variable labels and the content of character variables in each data set using the cloud-based REST API via PROC HTTP. This setup can immediately support translations to and from over 100 different languages, and it has been tremendously helpful for getting an early start on analysis programming to comfortably meet our timelines well before third-party translations are available.
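For readers curious what such a PROC HTTP call assembles, the sketch below builds the pieces of an Azure Translator v3 REST request in Python without sending it. The endpoint URL, header names, and body shape follow Azure's public v3 documentation as best I recall, and `build_translate_request` is a hypothetical helper, so verify the details (including any required region header for regional resources) against your own subscription.

```python
import json

def build_translate_request(texts, to_lang="en", from_lang="zh-Hans",
                            key="YOUR-KEY"):
    """Assemble URL, headers, and JSON body for an Azure Translator v3
    /translate call; a SAS PROC HTTP step posts the same pieces."""
    url = ("https://api.cognitive.microsofttranslator.com/translate"
           f"?api-version=3.0&from={from_lang}&to={to_lang}")
    headers = {"Ocp-Apim-Subscription-Key": key,
               "Content-Type": "application/json; charset=UTF-8"}
    body = json.dumps([{"Text": t} for t in texts])
    return url, headers, body
```

Looping this over each character variable's distinct values, then mapping translations back onto the data set, mirrors the workflow the abstract describes.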
SD-255 : Using Bundles for R Package Management
Magnus Mengelbier, Limelogic AB
Tuesday, 4:00 PM - 4:50 PM, Location: LVL Lobby: Golden Gate 8
The broader use of R within life sciences, from a niche personal statistical analysis tool to an organization-wide environment, is making R package management a more complex and comprehensive task. As the organization grows and the types of analysis expand, a substantial increase in the number of packages is certain to follow. R packages for dependency management, such as packrat and renv, are very applicable to personal R environments or settings where users are permitted to fully customize their collection of packages, but not necessarily to validated GCP environments. The R bundle approach discussed is designed to provide a similarly high degree of flexibility for dependency management while staying within the constraints imposed by GCP compliance. Additional benefits, such as a significant decrease in complexity and an improved overall life cycle of the R environment, are further explored through examples.
SD-257 : Using R to Create Population Pharmacokinetic Dataset
Yangwei Yan, Bristol Myers Squibb
Prema Sukumar, Bristol Myers Squibb
Neelima Thanneer, Bristol Myers Squibb
Tuesday, 3:00 PM - 3:20 PM, Location: LVL Lobby: Golden Gate 8
Population Pharmacokinetics (PopPK) analysis is essential to evaluate drug safety and efficacy among population subgroups in drug development. One of the key steps in PopPK modeling is to provide a structured input dataset for NONMEM and/or other modeling software. While SAS has long been a dominant tool for statistical programming and analysis in the pharmaceutical industry, R has become a trending programming tool and is widely used in multiple areas due to its power and flexibility in supporting statistical analysis and advanced visualization. However, the use of R in PopPK dataset preparation has not been discussed much. This paper demonstrates a step-by-step process to generate datasets for PopPK analysis in RStudio using R Markdown with example study data, offering programmers and pharmacometricians the flexibility of an alternative programming language. We utilized typical R packages, including tidyverse and lubridate, and source data such as ADaM ADPC, ADEX, and ADSL to create an ADPPK dataset complying with the latest ISoP (International Society of Pharmacometrics) standard, which is in the process of becoming the CDISC ADaM PopPK dataset standard. The paper focuses on handling variables with date and time formats, conducting dose time imputation, and deriving time-varying variables. We also briefly summarize the advantages of using R in preparing the PopPK analysis dataset.
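One of the derivations the abstract highlights, time elapsed since the most recent dose, can be illustrated with a tiny example (Python with invented timestamps for illustration; the paper itself works in R with tidyverse and lubridate):

```python
from datetime import datetime

# Toy dose and sample records for one subject (hypothetical values).
doses = [("2023-01-01 08:00", 100), ("2023-01-02 08:00", 100)]
samples = ["2023-01-01 09:30", "2023-01-02 10:00"]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

def time_after_dose(sample_ts):
    """Hours since the most recent dose at or before the sample time."""
    t = parse(sample_ts)
    prior = [parse(d) for d, _ in doses if parse(d) <= t]
    return (t - max(prior)).total_seconds() / 3600

tad = [time_after_dose(s) for s in samples]  # hours after last dose
```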
SD-264 : R Package Quality & Validation: Current Landscape
Phil Bowsher, RStudio Inc.
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Lobby: Golden Gate 8
RStudio will present an overview of current developments in R package quality and validation for the R user community at PharmaSUG. This talk will review various approaches that have developed in the pharma community when using R within the regulatory environment. This is a great opportunity to learn about best practices when approaching validation in R. No prior knowledge of R/RStudio is needed. This short talk will introduce the current landscape of validation as well as recent developments. RStudio will share insights and advice from the last 6 years of helping pharma organizations incorporate R into clinical environments. This presentation will highlight many of the current approaches to validation when adding R (and some Python) to a GxP environment.
SD-270 : Automating Data Validations with Pinnacle 21 Command Line Interface
Philipp Strigunov, Pinnacle 21
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Lobby: Golden Gate 2
Pinnacle 21 Command Line Interface (CLI) allows automation of validation jobs, resulting in higher reliability of results and better overall performance of the data preparation process. For example, the command line tool allows us to integrate Pinnacle 21 software with programming environments such as SAS® LSAF and to work with the native SAS7BDAT format. Pinnacle 21 CLI also supports the creation of a Define.xml file. In this paper, we will give an overview of the P21 CLI, explain its syntax and parameters, and discuss new features brought by recent updates. We will also present examples of automating dataset and define.xml validations using shell scripts and the SAS® programming language.
SD-300 : Standardized data handling framework for wearables
Madhu Annamalai, Algorics
Umayal Annamalai, Algorics
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Lobby: Golden Gate 2
Wearable devices have revolutionized the way clinical research is conducted in recent times, bringing technology closer to the patients. While wearables enable continuous and near-real-time monitoring of patient data, analysis and reporting continue to be challenging as the volume of data grows. In this session, we will discuss a problem-and-solutions framework for handling data from wearables to improve analysis and reporting. The key takeaways will include data integration strategies, data mapping based on CDISC standards, creating analysis models specific to wearable device data, and the generation of custom reports based on study requirements.
SD-311 : Exploring the use of AI-based image recognition and Machine Learning to Improve the Efficiency and Accuracy of TLFs Validation Process.
Mor Ram-On, Beaconcure
Wednesday, 10:45 AM - 11:05 AM, Location: LVL Lobby: Golden Gate 2
Validation of tables, listings, and figures (TLFs) is a complex and labor-intensive process, particularly in the realm of visual comparison of figures. Statistical programmers often rely on manual methods, which involve opening tables and figures in separate windows and comparing them for data consistency, format consistency, and alignment with mock shells. However, this approach is prone to errors and oversights. In this presentation, we will explore a technology-based solution that utilizes AI-based image recognition through the integration of machine learning (ML) and deep learning (DL) methods to automate these complex validation tasks. This approach not only improves the efficiency of the validation process but also aligns with clinical standards, resulting in a significant reduction in manual efforts.
SD-313 : Be a Lazy Validator - Let Your Code Do the Work
Cara Lacson, Advance Research Associates, Inc
Ray de la Rosa, Advance Research Associates, Inc.
Carol Matthews, Advance Research Associates, Inc.
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Lobby: Golden Gate 2
Regardless of the study phase or indication, programs written to produce output that supports clinical trials submitted to regulatory agencies have one thing in common - they need to be validated. While code and output can be independently validated by a number of methods, primary responsibility for ensuring that any code produces the correct result resides with the programmer who wrote that code. Both production and validation programmers are responsible for ensuring that their own code is valid before comparing the results. While many feel that this is a time-consuming process, there are many ways to quickly add code that effectively checks for data and logic issues so the act of validating your own code can be done efficiently. This paper will discuss a number of simple techniques to make your SAS and R programs self-validating so future code runs with updated data can be checked quickly and effectively.
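The "let your code do the work" idea can be as simple as having each program emit a list of data-issue messages on every run instead of relying on eyeballing. A minimal sketch (Python for illustration; the checks and toy records below are invented, not the paper's):

```python
def self_check(records):
    """Return data-issue messages rather than failing silently.

    Hypothetical checks in the spirit of the paper: duplicate keys,
    missing required values, end date before start date.
    """
    issues = []
    seen = set()
    for r in records:
        key = (r["subject"], r["visit"])
        if key in seen:
            issues.append(f"duplicate record for {key}")
        seen.add(key)
        if r.get("result") is None:
            issues.append(f"missing result for {key}")
        if r.get("end") is not None and r["end"] < r["start"]:
            issues.append(f"end before start for {key}")
    return issues

data = [
    {"subject": "001", "visit": 1, "result": 5.0, "start": 1, "end": 2},
    {"subject": "001", "visit": 1, "result": None, "start": 3, "end": 2},
]
problems = self_check(data)
```

Running such checks on every data refresh means later reruns validate themselves with no extra effort.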
SD-325 : Admiralonco - the cross-company R package for Oncology admirers
Neharika Sharma, GlaxoSmithKline Pharmaceuticals
Matthew Marino, GlaxoSmithKline Pharmaceuticals
Wednesday, 10:15 AM - 10:35 AM, Location: LVL Lobby: Golden Gate 2
As companies across the pharmaceutical industry focus on adopting the R language for the creation of submission packages and data analysis, there is a growing need to come out of silos and work collectively and efficiently on functions and utilities that meet common industry-wide needs. Through a collaborative effort between pharmaceutical companies including GlaxoSmithKline, Roche, Amgen, and Bristol Myers Squibb, Admiralonco was developed as an extension of the Admiral package to assist in the creation of oncology-specific, CDISC-compliant ADaM datasets such as ADRS and ADTTE. In this presentation, we will describe the contents and usability of the package, which is readily available to all statisticians and programmers and which we believe can help everyone increase the use of R when analyzing clinical data. We will provide an overview of the capabilities of this project and demonstrate the application of some of the functions within the package. Briefly taking the audience through the concepts of WRAP and GitHub, we plan to showcase examples of utilizing these functions in any project. We will also discuss the use of Agile working in a collaborative setup between different companies and the process that was followed in the development of this package. Finally, we will share our personal experience working on Admiralonco and discuss our learnings and hardships. Through this presentation, we hope the audience will learn about this package and consider benefiting from it.
SD-334 : Making Publication Metric Tracking Easy: Using an Automated, Integrated System of R, SAS®, and Microsoft Power BI to Ease the Pain of Assessing Publication Metrics
Joshua Cook, Andrews Research & Education Foundation
Jessica Truett, Andrews Research & Education Foundation
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Lobby: Golden Gate 2
Publications in peer-reviewed journals serve as a primary source of knowledge for many different areas of study, including data science and medicine. In addition to serving as a resource, publications are also quantified in the form of publication data metrics. These metrics include publication and citation count, affiliation spread, and journal impact factor. A combination of these metrics is often equated to the productivity and impact of the published author or affiliated institution. As a result, publication metrics have been used to make business decisions regarding promotions, tenure, awards, benchmarking, grant funding, and even clinical study sponsorship. Despite their widespread use, publication metrics are not easy to quantify and manage. The simplest way to obtain information on these metrics is to use a database search engine such as PubMed. However, manually searching PubMed quickly becomes an issue when authors have multiple aliases and affiliations, or common names. This problem is further exacerbated when an institution is comparing metrics across many different authors on a repetitive schedule. By leveraging the tools of a data scientist, primarily R, SAS®, and Microsoft Power BI, it is possible and relatively simple to build an integrated system that automates this arduous process for any organization. Customizable queries will be written and executed in R to extract publication metric data from a PubMed interface. The data frame from R will be connected to SAS® for data wrangling, and the resulting data set will be connected to Microsoft Power BI for analysis and visualization.
SD-335 : Low Code Approach to Clinical Application Development using SAS Studio Custom Step
David Olaleye, SAS Institute
Monday, 5:00 PM - 5:20 PM, Location: LVL Lobby: Golden Gate 8
SAS Studio Custom Steps offer a quick low code application development environment for programmers to develop custom steps to create a user interface on top of their code. It can be used to design and create dynamic user interfaces for end users with little or no programming experience to quickly gain access to data assets at their site, as well as be able to perform tasks such as data exploration, analysis, and visualization. Custom Steps come with cascading prompts and prompt hierarchies that enable creation of data dependencies between control objects, thus allowing users to query data and interact with the application. In this paper, I will demonstrate how to create a stand-alone SAS custom step, combine multiple custom steps in a SAS Studio Flow, and provide a real-world example of using SAS Studio flow via a point-and-click analytic interface to perform single-cohort and multiple-cohorts propensity score analyses.
Statistics and Analytics
SA-019 : A Macro to Apply Exclusion Criteria to SAS(r) PK data
Timothy Harrington, Navitas Data Sciences
Tuesday, 10:30 AM - 10:50 AM, Location: LVL Ballroom: Continental 2
PK observations in a SAS data set contain study drug dose administration or PK sampling analysis data. This macro checks the PK observations and marks observations for exclusion from analysis tables and figures due to confirmed or possibly invalid data. Examples of such invalid data are: a pre-dose sample having a later time than the dose to which it refers, duplicated observations, an excessive difference between a nominal time and its corresponding actual time, and no doses recorded for a given patient. The macro uses a flag variable with a code identifying the exclusion criterion met and a text string containing a description of the exclusion. Exclusions have an order of precedence for when an observation meets two or more of the criteria. As well as setting the exclusion flag and text, a note of the exclusion and its description is written to the SAS log with the observation number (_n_) and the BY variables (sort key) and their values. Observations not matching any of the exclusion criteria (the 'normal' situation) have a missing flag and blank text.
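The macro's order-of-precedence logic, where the first matching criterion determines the flag and text, can be sketched as follows (Python for illustration; the codes and rules below are invented, not the macro's actual criteria):

```python
# Ordered exclusion criteria: the first matching rule wins, mirroring the
# macro's order of precedence when an observation meets several criteria.
CRITERIA = [
    ("EX1", "pre-dose sample after dose time",
     lambda o: o["predose"] and o["time"] > o["dose_time"]),
    ("EX2", "nominal vs. actual time difference too large",
     lambda o: abs(o["time"] - o["nominal"]) > 2.0),
]

def flag_exclusion(obs):
    """Return (flag, text); both empty in the 'normal' no-exclusion case."""
    for code, text, rule in CRITERIA:
        if rule(obs):
            return code, text
    return "", ""

# This observation violates both rules, but EX1 takes precedence.
obs = {"predose": True, "time": 5.0, "dose_time": 4.0, "nominal": 0.0}
flag, text = flag_exclusion(obs)
```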
SA-068 : Validating novel maraca plots - R and SAS love story
Nicole Major, AstraZeneca
Srivathsa Ravikiran, AstraZeneca
Monika Huhn, AstraZeneca
Samvel Gasparyan, AstraZeneca
Martin Karpefors, AstraZeneca
Tuesday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 2
Hierarchical composite endpoints (HCE) are complex endpoints combining outcomes of different types and different clinical importance into an ordinal outcome that prioritizes the clinically most important (e.g., most severe) event of a patient. HCE can be analyzed with the win odds, an adaptation of the win ratio to include ties. One of the difficulties in interpreting HCE is the lack of proper tools for visualizing the treatment effect captured by HCE, given the complex nature of the endpoint. The recently introduced maraca plot solves this issue by providing a comprehensive visualization that clearly shows the treatment effects on the HCE and its components. The maraca package in R provides an easy-to-use implementation of maraca plots, building on powerful features provided by the ggplot2 package. The maraca package also provides the calculations for the complex statistical analyses involved in deriving the maraca plot, including the overall treatment effect characterized by win odds. An important gap in the package is the question of how to validate the analyses involved in deriving the maraca plots. In this paper we will demonstrate an approach using SAS to validate the outputs generated by the R maraca package and thereby combining the best of two worlds: the flexible plotting capabilities of R and the powerful data manipulation and statistical analysis tools of SAS.
SA-092 : Picking Scabs and Digging Scarabs: Refactoring User-Defined Decision Table Interpretation Using the SAS® Hash Object To Maximize Efficiency and Minimize Metaprogramming
Troy Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.
Tuesday, 1:30 PM - 2:20 PM, Location: LVL Ballroom: Continental 2
Decision tables allow users to express business rules and other decision rules within tables rather than coding them statically as conditional logic statements. In the first author's 2019 book, SAS® Data-Driven Development, he describes how decision tables embody the data independence that data-driven programming requires, and demonstrates a reusable solution that enables decision tables to be interpreted and operationalized through the SAS macro language. In their 2019 white paper Should I wear pants?, the authors demonstrate the configurability and reusability of this solution by utilizing the same data structure and underlying code to interpret unrelated business rules that describe unrelated domain data: pants wearing and vacationing in the Portuguese expanse. Finally, in the current paper, the authors refactor this code by replacing metaprogramming techniques and macro statements with a user-defined function that leverages a dynamic hash object to perform the decision table lookup. The new hash-based interpreter is unencumbered by inherent macro metaprogramming limitations. This anecdotal "scab picking", the subtle refactoring of software to expand functionality or improve performance, yields a more flexible interpreter that is more robust to diverse or difficult data, including special-character-laden data sets. In recognition of the authors' combined love for all-things-archaeological, the decision rules in this text separately model Mayan ceramics excavation and Egyptian scarab analysis.
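The hash-object lookup at the heart of the refactoring can be illustrated with an in-memory table keyed on the condition columns (a Python dictionary standing in for the SAS hash object; the rows and outcomes are invented, loosely echoing the pants-wearing example):

```python
# A decision table loaded once into an in-memory lookup, then probed per
# record; the data stay in the table, not in hard-coded IF/THEN logic.
decision_rows = [
    {"raining": "Y", "cold": "Y", "action": "wear pants and a coat"},
    {"raining": "Y", "cold": "N", "action": "wear pants"},
    {"raining": "N", "cold": "Y", "action": "wear pants"},
    {"raining": "N", "cold": "N", "action": "shorts are fine"},
]

# Build the lookup (the hash-object analogue) keyed on condition columns.
lookup = {(r["raining"], r["cold"]): r["action"] for r in decision_rows}

def decide(raining, cold):
    return lookup.get((raining, cold), "no rule matched")

choice = decide("Y", "N")
```

Changing the rules means editing the table, not the interpreter, which is the data independence the paper describes.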
SA-110 : Deep Learning to Classify Adverse Events from Patient Narratives
Tom Sabo, SAS
Sundaresh Sankaran, SAS
Qais Hatim, FDA
Tuesday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 2
Patient narratives reported in clinical study reports (CSRs) provide clinical evidence of adverse events that occurred to patients and help scientific reviewers during pharmacovigilance activities. The manual review of these narratives by safety reviewers is time consuming and resource intensive. How can we improve the efficiency of identifying safety signals from patient narratives? This paper and presentation describe an implementation that accurately categorizes one adverse event term, "Serotonin Syndrome", from postmarket narrative data as an example of what FDA is capable of when leveraging deep learning technology using SAS Viya. Furthermore, we leverage a supervised Boolean rule builder algorithm, which provides a layer of interpretability and the ability to interact with FDA subject matter experts to refine the models. We expect that the use of deep learning methodology will improve the accuracy of the medical coding (e.g., MedDRA coding) process for adverse events, benefiting reviewers during the safety review process. We will also discuss the relevance of the capabilities discussed in this paper for both healthcare and life sciences.
SA-166 : What is Machine Learning, anyway?
Jim Box, SAS Institute
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 2
Heard lots of talk about Machine Learning and Artificial Intelligence, but not really sure what it all means or how it differs from the statistics you once learned? In this presentation, we'll examine what machine learning is, cover the basics of the approach, and give an overview of some of the more popular algorithms and when they might be used.
SA-188 : Sensitivity Analysis for Missing Data Using Control-based Pattern Imputation
Jun Feng, Seagen Inc
Jingmin Liu, Seagen Inc.
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 2
Randomized controlled clinical trials are known to be an effective way to minimize bias and draw convincing conclusions on the efficacy and safety of a drug, but the data quality and the statistical methods used for the analyses will highly influence the results. Inevitable scenarios such as protocol deviations and subjects lost to follow-up will lead to missing data. How the missing data are handled is crucial for the integrity of the statistical analysis, especially for efficacy endpoints. Sensitivity analysis is a useful method to stress-test the credibility of statistical conclusions and explore the impact of the missing records. Using imputation methods to fill in these missing observations and considering the imputation results through sensitivity analysis renders a more robust statistical conclusion. Comparing outcomes between the original and the imputed data can highlight the influence of the missing data on the results and establish credibility of the conclusions. SAS provides an efficient way to perform such sensitivity analyses using the MI and MIANALYZE procedures. This paper illustrates the statistical background and implementation differences between the MAR (Missing at Random) and MNAR (Missing Not at Random) assumptions, along with an example of data manipulation for monotone missing data before sensitivity analysis, using Control-based Pattern Imputation with a mixed model in SAS, and shows how to compare and interpret the statistical analysis results.
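The control-based idea can be reduced to a toy single-imputation example: missing treatment-arm values are borrowed from the control arm rather than from the subject's own arm (Python with invented values for illustration; PROC MI instead draws multiple imputations from a full model fit to the control arm and MIANALYZE pools the results):

```python
# Invented post-baseline outcomes; None marks a missing treatment value.
control = [4.0, 5.0, 6.0]
treatment = [2.0, None, 3.0]

# Control-based (reference-based) fill: the missing treatment value is
# imputed from the control arm's observed mean, a deliberately
# conservative assumption about post-dropout behavior.
control_mean = sum(control) / len(control)
imputed = [v if v is not None else control_mean for v in treatment]
```

Comparing analyses of `treatment` (complete cases) and `imputed` is the kind of contrast a sensitivity analysis formalizes.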
SA-228 : Shifting the drug development paradigm with Adaptive Design and master protocol
Aman Bahl, SYNEOS HEALTH
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 2
Just as other industries have moved toward more flexible methodologies that foster continual improvement and operational efficiencies, clinical development is slowly ramping up the adoption of innovative designs, encouraged by regulatory agencies to speed progress, reduce inefficiencies, and improve success rates. In recent years there has been a rise in trial designs more flexible than traditional adaptive and group sequential trials. In this paper, we will discuss some of the common adaptive trial designs, including group sequential design, sample size re-estimation, phase I/II and phase II/III seamless designs, dose escalation, dose selection, and a mix of others, along with examples. We will also elaborate on various types of master protocol designs, such as platform, basket, and umbrella trials, along with examples. One of the most promising ways to make drug development more efficient, while enabling providers and patients to get better information about how a new medicine works, is through more modern approaches to the design of clinical trials. One such approach is the master protocol. Master protocols, being a collaborative approach to drug development, could help biopharma companies derisk research programs, improve the quality of evidence, and enhance R&D productivity by cutting research costs and time. We will discuss the following three types of master protocols in detail: basket trials, umbrella trials, and platform trials.
SA-269 : Statistical Considerations and Methods for Handling Missing Outcome Data in Clinical Trials During the Era of COVID-19
Xi Qian, BioPier, Inc.
Chengfei Lu, BioPier, Inc.
Tuesday, 11:00 AM - 11:50 AM, Location: LVL Ballroom: Continental 2
Missing data may seriously compromise inferences from clinical trials and thus undermine the reliability and interpretability of results. Although missing data present challenges for any clinical trial, these challenges, particularly in data analysis, are greater than expected for trials conducted during the COVID-19 pandemic. Statistical methods for dealing with missing data are categorized based on the types of assumptions that are made about the missing-data mechanisms. In this paper, we first introduce the definitions of three missing-data mechanisms and summarize commonly used statistical methods for handling missing data. We then focus on multiple imputation for both continuous and dichotomous outcome data and discuss issues in its practical implementation, including developing the imputation models, handling data with a monotone or non-monotone missing pattern, choosing the number of imputed data sets to create, and examining the robustness of the missing-data assumption with sensitivity analysis. We illustrate the application of multiple imputation through analyses of disease response rate and quality-of-life score data on lung cancer patients in a hypothetical superiority trial. SAS code for conducting multiple imputation is provided.
SA-284 : SAS-based Method for PK Noncompartmental Analysis and Validation
Hui Mao, Biopier Inc.
Lixin Gao, Biopier Inc
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 2
Pharmacokinetic (PK) analysis is an important part of clinical trials that explores drug effects in humans. Noncompartmental analysis (NCA) is a model-independent, cost-effective method and the industry consensus approach. Commercial PK software such as WinNonlin™ and open-source tools are the traditional choices for such analyses. This paper presents an alternative SAS-based PK NCA method that derives and validates PK parameters such as Cmax, Tmax, T1/2, AUClst, AUCinf, CLO, Lambdaz, Half-Life Lambdaz, Vz, etc. The key benefit of this method is improved operational efficiency and compliance, because no intermediate software is needed. It can be embedded directly in the statistical programming that creates CDISC SDTM PP datasets, PK summary tables, and figures, and the SAS code can be used directly for regulatory submission.
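The core NCA derivations named in the abstract reduce to a few formulas: Cmax/Tmax by inspection, AUClast by the linear trapezoidal rule, lambda_z from a log-linear fit of the terminal points, T1/2 = ln(2)/lambda_z, and AUCinf = AUClast + Clast/lambda_z. A sketch with invented data (Python for illustration; the paper implements this in SAS):

```python
import math

def nca(times, conc, n_tail=3):
    """Toy noncompartmental analysis on a single profile.

    Uses the linear trapezoidal rule for AUClast and a least-squares
    log-linear fit of the last n_tail points for lambda_z.
    """
    cmax = max(conc)
    tmax = times[conc.index(cmax)]
    auclast = sum((conc[i] + conc[i + 1]) / 2 * (times[i + 1] - times[i])
                  for i in range(len(times) - 1))
    # Slope of ln(C) vs. t over the terminal points; lambda_z is its negative.
    xs, ys = times[-n_tail:], [math.log(c) for c in conc[-n_tail:]]
    xbar, ybar = sum(xs) / n_tail, sum(ys) / n_tail
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    lambda_z = -slope
    thalf = math.log(2) / lambda_z
    aucinf = auclast + conc[-1] / lambda_z
    return cmax, tmax, auclast, lambda_z, thalf, aucinf

# Hypothetical profile: concentrations halve every 2 h in the terminal phase.
cmax, tmax, auclast, lz, thalf, aucinf = nca([0, 1, 2, 4, 8], [0, 10, 8, 4, 1])
```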
SA-285 : Guidelines for the Statistical Analysis in German Dossier Submissions
Meiling Gao, Biopier Inc.
Xi Qian, BioPier, Inc.
Hui Mao, Biopier Inc.
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Ballroom: Continental 2
German dossier submission is the regulatory submission process required by the German Federal Institute for Drugs and Medical Devices for the approval of new drugs, generic drugs, and biosimilars in Germany. The submission contains comprehensive information on the quality, efficacy, and safety of the drug, and must follow specific guidelines. Pharmaceutical companies must submit this information in a German dossier before launching their products in Germany. However, the submission process can be complicated and challenging, as it requires a thorough review and evaluation of drugs with complex statistical methods. This paper aims to fill the gap by providing guidelines for conducting statistical analysis in German dossier submissions. We will cover statistical strategies and common pitfalls when preparing the dossier submission. The paper focuses on a comprehensive discussion of the most commonly used statistical methods for analyzing a broad range of data and outcomes, including dichotomous, continuous, and time-to-event data. For each category of analysis, we will begin with an introduction to the relevant statistical basics, followed by a description of the sample data. We will also provide sample SAS code and guidelines for interpreting the results.
SA-303 : A Pain in My ISR - A Primer to Injection Site Reactions
Kjersten Offenbecker, GlaxoSmithKline
Fox Mulder, GlaxoSmithKline
Tuesday, 2:30 PM - 3:20 PM, Location: LVL Ballroom: Continental 2
We are all probably familiar with Adverse Events (AEs) and how to report them. But what about those special AEs that occur when your study product is an injectable? These special AEs are known as injection site reactions or ISRs and they are reported very differently than traditional AEs. In this paper we will explore what makes these AEs different and why they tend to be such a pain to programmers everywhere. We will not only explain what an ISR is and how it is different, we will also look at some examples of how they are reported and even how to help make them a little less of a pain.
Strategic Implementation & Innovation
SI-005 : What's the story in your subgroup analysis
Lucy Dai, Abbvie
Tuesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 3
As indicated in a NEJM paper on guidelines (November 2007), investigators frequently use analyses of subgroups of study participants to extract as much information as possible. Such analyses "... may provide useful information for the care of patients and for future research." However, subgroup analyses also introduce analytic challenges and can lead to overstated and misleading results. The purpose of this paper is to present some of the challenges in understanding and interpreting subgroup analysis results through three examples from real clinical trials.
SI-006 : Digital Data Flow (DDF) and Technological Solution Providers
Piyush Singh, TCS
Prasoon Sangwan, TCS
Tuesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 3
Digital Data Flow (DDF) is an initiative to organize and automate the processing of clinical data and study protocols. From a technology perspective, one of the key purposes of this initiative is to deliver technical standards that can be used to mechanize the study execution process, create a flexible solution, and minimize manual effort during the study life cycle. One of the most important principles of the DDF initiative is being vendor agnostic, which means that different organizations can implement their solution in their own way, using the reference architecture (RA) from DDF, from both process and technology perspectives. This paper explains how technology providers and technological product vendors can utilize the DDF deliverables to help pharmaceutical companies with new solutions and platforms to innovate and automate their manual, traditional study execution processes, which ultimately can help reduce the overall cost, duration, and operational effort of a study and increase the return. This paper also explains how pharma companies can utilize the strength of technology to take maximum advantage of DDF.
SI-016 : Fitting Logitoid-Normal distributions with MLE estimate by SAS SEVERITY and FCMP procedures
Lili Huang, BMS
Helen Dong, BMS
Yuanyuan Liu, Bristol Myers Squibb
Tuesday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 3
Real-world data in manufacturing are usually non-normally distributed, and the behavior of an applied statistical procedure depends on the distribution family from which the data come. Knowledge of the distribution family is necessary to explore that behavior. The SEVERITY procedure in SAS can fit distributions with MLE estimates of parameters. One limitation of PROC SEVERITY is that the default pool of probability distribution models is quite limited. Distributions such as the Johnson family (Johnson Su, Johnson Sl, and Johnson Sb), SHASH, and Logit-Normal are potentially applicable to manufacturing data; unfortunately, they are not available in PROC SEVERITY in the latest version, SAS/ETS® 14.3. In this paper, four Logitoid-Normal distributions are used as examples, in contrast with the Normal distribution, to demonstrate that customized distributions can be defined with the FCMP procedure so that the distribution model parameters can be fitted using PROC SEVERITY.
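For the Logit-Normal case specifically, the MLE has a closed form that a custom FCMP-defined distribution would recover numerically: if logit(X) is Normal(mu, sigma), the MLE of (mu, sigma) is the sample mean and population standard deviation of the logits. A sketch with invented data (Python for illustration):

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

def fit_logit_normal(data):
    """MLE for a Logit-Normal sample on (0, 1).

    The model says logit(X) ~ Normal(mu, sigma), so maximizing the
    likelihood reduces to the sample mean and population standard
    deviation of the logit-transformed data.
    """
    z = [logit(x) for x in data]
    mu = statistics.fmean(z)
    sigma = statistics.pstdev(z)   # population SD is the MLE of sigma
    return mu, sigma

# Symmetric toy sample, so the fitted mu should be essentially zero.
mu, sigma = fit_logit_normal([0.2, 0.5, 0.5, 0.8])
```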
SI-055 : We Can Work it Out: Dos and Don'ts for Small Biotech and CRO NDA/BLA submission partnership
Charity Quick, Rho, Inc.
Jiaan Illidge, Mersana
Tuesday, 10:30 AM - 10:50 AM, Location: LVL Ballroom: Continental 3
Many pharmaceutical/biotech companies rely on outsourcing much of their biometric and regulatory work to CROs, even for critical NDA or BLA submissions. At the start of 2020, as the reality of a global pandemic was setting in, a small pharma with only 2 statisticians and 5 programmers was able to file the company's first NDA with the support of a small CRO. The NDA comprised over 30 studies, including 2 pivotal phase IIIs and additional phase III open-label studies with back-to-back locks, as well as legacy and remediation work. As the sponsor programming manager and CRO programming lead, we will share the lessons learned from this multi-year successful collaboration, which began as a single outsourced phase II study and ended in a big win for both companies.
SI-088 : Key Safety Assessments following Chimeric Antigen Receptor (CAR) T-cell Therapy in Early Development Oncology
Leanne Tenry, Bristol Myers Squibb
Tamara Martin, Bristol Myers Squibb
Olga Belotserkovsky, Presenter
Ce Zhou, Bristol Myers Squibb
Ouying (Abraham) Zhao, Techdata Service Company LLC
Tuesday, 11:00 AM - 11:20 AM, Location: LVL Ballroom: Continental 3
In 2017 the U.S. Food and Drug Administration (FDA) approved the first chimeric antigen receptor (CAR) T-cell therapy to treat cancer, and over the past five years there have been six FDA approvals. CAR T-cell therapy harnesses a patient's white blood cells and genetically engineered T-cell receptors to target and attack cancerous cells. While the success of CAR T-cell therapy is favorable, the two most common toxicities associated with this treatment, Cytokine Release Syndrome (CRS) and Immune Effector Cell-Associated Neurotoxicity Syndrome (ICANS), are often severe and occasionally life-threatening. The purpose of this paper is to explore key safety assessments associated with CAR T-cell therapy, which are vital to accurately monitoring the drug's safety for patients. We present analyses of these two Adverse Events of Special Interest (AESIs), CRS and ICANS, evaluated in an overall summary-of-events table and a time-to-event analysis utilizing a high-low bar graph.
SI-102 : Key Statistical Programming Considerations in External Collaborative Clinical Trials
James Zhao, Merck & Co., Inc
Hong Qi, Merck & Co., Inc.
Mary Varughese, Merck & Co., Inc.
Tuesday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 3
Collaboration between the pharmaceutical industry (sponsor) and external partners is becoming increasingly popular in drug development, as it can be mutually beneficial. Throughout this collaboration, study activities from start-up to Interim Analysis (IA) are often performed by the partner who conducts the study, and the study data is transferred to the sponsor to support regulatory submission, which requires CDISC-compliant SDTM and ADaM datasets. This brings many challenges due to inconsistency in data collection among partners, the quality or format of data used for Analysis and Reporting (A&R), and the timing of sponsor access to the data for evaluation and transformation according to regulatory requirements. This paper discusses some key programming considerations during this process to improve the efficiency of data issue resolution, data transformation, statistical report generation, and submission package preparation.
SI-106 : Automation of Dataset Programming Based on Dataset Specification
Liqiang He, Atara Biotherapeutics
Tuesday, 1:30 PM - 1:50 PM, Location: LVL Ballroom: Continental 3
In the clinical trial field, standard datasets, such as SDTM domains and ADaM datasets, are an integral part of the electronic submission package and a prerequisite for TFL generation. Dataset programming is a time-consuming, tedious task for SAS programmers. A highly efficient, automated approach to dataset generation will prevent manual programming typos, save programming time and resources, and deliver high-quality work. The dataset specification is a detailed instruction for dataset programming and a major reference for dataset validation. This paper demonstrates a new practical approach to automating dataset programming based on the dataset specification. The pivot for successful auto-programming is rewriting variable derivations into SAS-readable form with the aid of keywords and punctuation marks in the dataset specification.
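The abstract does not show the paper's actual keyword scheme. Purely as an illustration (in Python rather than SAS, with a made-up two-keyword spec format; the variable and keyword names below are hypothetical), derivations written with controlled keywords and punctuation can be machine-translated into executable dataset-building logic:

```python
# Illustrative sketch only: assumes a spec rule shaped as "<KEYWORD> <target> = <source>".
def apply_derivation(rule, record):
    """Apply one keyword-based derivation rule from the spec to a source record."""
    tokens = rule.split()
    keyword, target, source = tokens[0], tokens[1], tokens[3]
    value = record.get(source)
    if keyword == "SET":        # direct copy from the source variable
        record[target] = value
    elif keyword == "UPPER":    # copy with upcasing, e.g. for coded terms
        record[target] = value.upper() if isinstance(value, str) else value
    return record

spec = ["SET USUBJID = SUBJID", "UPPER AETERM = TERM"]   # hypothetical spec rows
row = {"SUBJID": "1001", "TERM": "headache"}
for rule in spec:
    row = apply_derivation(rule, row)
# row now carries USUBJID and AETERM derived per the spec
```

The same idea, expressed as SAS code generation from spreadsheet-based specifications, is what the paper automates.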
SI-139 : Integrating Practices: How Statistical Programmers Differ and Align Within User Groups
Valeria Duran, Statistical Center for HIV/AIDS Research and Prevention at Fred Hutch
Radhika Etikala, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Haimavati Rammohan, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Tuesday, 2:00 PM - 2:20 PM, Location: LVL Ballroom: Continental 3
An expectation in the pharmaceutical industry is that every entity within an organization is aligned on day-to-day practices. Our organization strives to streamline processes to achieve traceable and reproducible outputs for Statistical Programmers coding in different languages, such as R and SAS. Although the fundamentals in programming practices are aligned, the day-to-day workflow deviates at departmental levels: stakeholders have different expectations for how data is processed and how workflows should proceed. This paper takes a closer look at (1) Statistical Programming practices, specifically with Validation, Version Control, Coding Practices, and Verification, (2) the tools and techniques that have contributed to the success of our analogous workflow, and (3) future work towards further alignment. The insight from this paper has implications for how different entities within a workplace can work towards the same goals while meeting regulatory standards.
SI-147 : An End-to-End and Fully Integrated Clinical Development Platform with eDC/Labs Data, Data Management, Medical Review, Statistical Analysis, Adaptive Design, etc.
Peter Wang, Johnson and Johnson
Vindyala Sunil, Janssen Research and Development
Tuesday, 2:30 PM - 2:50 PM, Location: LVL Ballroom: Continental 3
This end-to-end, fully integrated clinical development platform pulls raw data from eDC and dozens of central/local labs on schedule into LSAF, where data managers all over the world leverage automated workflows to check, correct, and validate clinical raw data to ensure its completeness and accuracy on a daily, and sometimes hourly, basis. Data then feeds into the Medical Review/Safety Portal for additional in-depth analysis per protocol and pre-set standards, and it is fed into AWS-hosted SAS and R GRIDs for programming and statistical analysis. Pinnacle 21 and Adaptive Design are fully integrated into LSAF based on "real-time" clinical evidence, which allows our clinical teams to adjust development protocols in a timely manner. It took Janssen six years to build this fully integrated platform, which started by integrating eDC and labs with LSAF and Medical Review in 2014. Pinnacle 21 and Adaptive Design functions were added two years later, by 2016. SAS and R GRIDs were then fully integrated in 2020. There are many success stories with excellent outcomes, and we plan to share our lessons learned as well. The platform has 5000+ users across the globe and is operational and supported 24x7.
SI-159 : SAS LOG/LST Filename Improvement for Easier Code Review, Code Audit, and Project Development History and Chronology
Zeke Torres, Code629
Tuesday, 3:00 PM - 3:20 PM, Location: LVL Ballroom: Continental 3
We show how to customize the names of the LOG/LST files and obtain a useful new name as the result. The new name of a log file might look like this: 20180904_hhmm_userID_name-of-code-that-was-run.log. The benefit is that when one user or a team of users builds code, everyone can see its progress; collaboration is simplified and results are easier for colleagues to share and consume. The typical problem is that we run our SAS code and get useful LOG/LST output, but after we run it again, the original is gone - it's overwritten. And if a co-worker or team member runs the same code, the replaced LOG/LST makes it hard to know who ran it. The solution uses the SAS configuration file and a set of SAS macros, and is offered as a recommended methodology for you to consider. My reason for employing this technique is that I am often searching for information in many LOG/LST files, either to debug, to help someone debug (myself, a team member, a client), or to track progress on a project or build of work where it is critical to gauge where and how things stand - especially when more than one person works on a project.
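The naming scheme itself is simple to sketch. A minimal illustration (in Python; the paper implements this via the SAS configuration file and macros, and the function name here is hypothetical):

```python
from datetime import datetime
from pathlib import Path

def log_name(program_path, user, when=None):
    """Build a chronology-friendly log name: YYYYMMDD_HHMM_userID_program.log"""
    when = when or datetime.now()
    stem = Path(program_path).stem          # program name without extension
    return f"{when:%Y%m%d}_{when:%H%M}_{user}_{stem}.log"

name = log_name("analysis/ae_summary.sas", "ztorres", datetime(2018, 9, 4, 14, 30))
# → "20180904_1430_ztorres_ae_summary.log"
```

Because the timestamp leads the name, a plain directory sort yields the run chronology, and the user ID shows who produced each log.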
SI-164 : Introduction of AWS Cloud Computing and its future for Biometric Department
Kevin Lee, Genpact
Tuesday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 3
When statistical programmers or statisticians start in open-source programming, we usually begin by installing Python and/or R on a local computer and writing code in a local IDE such as Jupyter Notebook or RStudio, but as biometric teams grow and advanced analytics become more prevalent, collaborative solutions and environments are needed. Traditional solutions have been SAS® servers, but nowadays there is a growing need and interest in cloud computing. The paper is written for those who want to know about the cloud computing environment (e.g., AWS) and its possible implementation for the Biometric Department. The paper will start with the main components of cloud computing - databases, servers, applications, data analytics, reports, visualization, dashboards, etc. - and its benefits: elasticity, control, flexibility, integration, reliability, security, low cost, and ease of getting started. The most popular cloud computing platforms are AWS, Google Cloud, and Microsoft Azure, and this paper will introduce the AWS cloud computing environment. The paper will also introduce the core technologies of AWS cloud computing - computing (EC2), storage (EBS, EFS, S3), database (Redshift, RDS, DynamoDB), security (IAM), and networking (VPC) - and how they can be integrated to support modern-day data analytics. Finally, the paper will introduce a department-driven cloud transition project in which a whole SAS programming department moved from a SAS Windows server to AWS cloud computing. It will also discuss the challenges, the lessons learned, and the future of cloud computing in the Biometric Department.
SI-170 : The implementation of Scrum in Pharmaceutical Data Analytics and Statistical Programming
Jagan Mohan Achi, Jazz Pharmaceuticals
Eliana D'Angelo, Scimitar - consulting for Jazz Pharmaceuticals
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 3
Scrum is a framework that is useful for addressing complex environments, such as Pharmaceutical Data Analytics and Statistical Programming. By using Scrum, teams can control risk, minimize investments, and improve predictability. Scrum teams focus on value and learning, rather than relying on opinions, which increases the chances of a good outcome and a high return on investment. In comparison to the traditional Waterfall approach, which is commonly used in our industry, Scrum allows teams to constantly adjust and adapt to changes, leading to more efficient and effective project delivery. The Scrum team is self-managing, cross-functional, and responsible for all product-related work.
SI-176 : Tips and traps on how to efficiently accelerate clinical trials to successful submission, approval, and launch
Michael Nessly, ICON PLC
Tuesday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 3
Scenario 1: The review division team has suggested that if your submission can be accelerated by 6 months, then your company will be at the advisory committee meeting along with your competition. Scenario 2: The preliminary results from your study have been presented at a clinical meeting, and a health authority has contacted you indicating that this study could be the basis of a successful submission, as it may provide effective treatment for patients who have none. These represent actual historic situations demanding rapid development and submission. I present here some basic approaches that have been effective in past accelerations. They require up-front effort, maturity, and discipline. Because cultural change is involved, these practices are not standard, nor even commonly encountered. Some of the practices presented include developing a submission focus as opposed to being study-centric, focusing on critical data, strong parsimony and control over the bulk of reporting, focusing on telling the story that is in the data, financial arrangements that avoid stopping the flow of work, and engaging cross-functional teams with clear lines of communication. While these practices are particularly important in key critical situations, especially those with unmet medical needs, they are also just plain good practice for all clinical trials.
SI-202 : Making Multilingual Programmers - A Targeted Approach to R for Clinical Trials Training
Jagan Mohan Achi, Jazz Pharmaceuticals
Ashley Tarasiewicz, Atorus Research
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 3
Over the past few years many pharmaceutical organizations have encountered the same challenge - there is a wealth of training available on open-source languages like R, but very little training specific to the traditional clinical trial workflows we use on a daily basis. Companies are beginning to see and realize the benefits of having a multilingual programming team - they can incorporate the best parts of each programming language in their processes to maximize efficiency. In recent months, Jazz Pharmaceuticals has implemented several successful strategies to train their SAS® programmers in the use of R. This paper will examine the challenges and successes of finding the right training content, format, and candidates for teaching clinical programmers how to use R.
SI-212 : R Package Qualification: Automation and Documentation in a Regulated Environment
Paul Bernecki, MSD
Nicole Jones, Merck
Uday Preetham Palukuru, MERCK
Abhilash Chimbirithy, Merck & Co.
Wednesday, 10:15 AM - 10:35 AM, Location: LVL Ballroom: Continental 3
In recent years, there has been an increasing trend toward using open-source software, such as R, in clinical trial analysis and reporting (A&R). The qualification of an R package is a critical step in ensuring its quality and its compliance within the highly regulated clinical trial environment. Given the sheer number of R packages, automating the qualification process is necessary to ensure consistent quality and increased efficiency. Aspects of our company's R package qualification process were presented at PharmaSUG 2022; the current paper focuses on updates made since then. First, an internal R package, mkqualify, streamlines the qualification process further while incorporating input from human reviewers. This package generates most of the documentation used to support the qualification. Second, the open-source riskmetric package generates the risk scores associated with an R package. Communicating the qualification process also plays an important part in ensuring user compliance and increasing user trust in the process. We discuss internally developed Shiny applications, which include real-time qualification results and automation of user qualification requests. Additionally, we discuss various ways to communicate with users about R package qualification, including training and blog posts. The elements of this qualification process are continuously evolving to ensure adherence to industry-wide best practices.
SI-227 : Metaverse in Healthcare - More Real than Imagined!
Aman Bahl, SYNEOS HEALTH
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Ballroom: Continental 3
The Metaverse is a digital environment implementing virtual reality (VR) and augmented reality (AR) where users can interact with one another, run experiments, and transition between the physical world and the virtual realm - in simpler terms, a fully immersive internet. The metaverse is becoming a new forum for scientists and engineers in the life sciences industry to meet, share ideas, and collaborate. It involves the convergence of three major technological trends, each of which has the potential to impact healthcare individually. Together, though, they could create entirely new channels for delivering care that have the potential to lower costs and vastly improve patient outcomes. These are telepresence (allowing people to be together virtually, even while we are apart physically), digital twinning, and blockchain. In this paper we will discuss some of the key applications of the metaverse in healthcare, e.g., radiology, pain management, plastic surgery, mental health, surgery, medical education and training, and fitness and wellness. Digital twinning - we will discuss some of its key applications in healthcare, e.g., diagnosis, monitoring, surgery, medical devices, drug development, and regulatory affairs. Blockchain - we will discuss some of its key applications in healthcare, e.g., neuroscience, EHRs, medical and biomedical research, pharmaceuticals, clinical trials, and genomics medicine. In the not-too-distant future, we expect that comprehensive healthcare in the metaverse will not only be feasible but will become the norm.
SI-235 : Secret SAS - You Can Write eCTD Using ODS Word in SAS
Pete Lund, Looking Glass Analytics
Anusha Minnikanti, Fred Hutchinson Cancer Center
Calins Alphonse, Fred Hutchinson Cancer Center
Julie Stofel, Fred Hutchinson Cancer Center
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 3
The Electronic Common Technical Document (eCTD) standard is the standard format for submitting applications, amendments, supplements, and reports to FDA's Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER) (https://www.fda.gov/drugs/electronic-regulatory-submission-and-review/electronic-common-technical-document-ectd). In a nutshell, the eCTD standard requires linked information in the tables of contents, tables, and figures. It also requires specific formatting for the sections of the report and the tables of contents. These requirements are typically met by manually inserting report information into a Microsoft® Word document and using Word's formatting capabilities, a time-consuming and error-prone process. However, the SAS® ODS Word destination (pre-production) available in version 9.4 M7 provides functionality to create reports in eCTD format with only a few manual, point-and-click steps in Word when complete. In this paper we present a number of available, but undocumented, ways to use the SAS® ODS Word destination to programmatically produce eCTD-formatted reports. A set of macros was developed to 1) insert headers that can be turned into Table of Contents entries, 2) produce captions for tables or figures that can be turned into Table of Figures or Table of Tables entries, and 3) insert blocks of text into the document. In addition, methods to include dynamic page numbers, bulleted lists, formatted text (bold, italic, hyperlinks, etc.), and empty table templates (for manual update) will be discussed.
SI-277 : PHUSE Safety Analytics Working Group - Overview and Deliverables Update
Nancy Brucken, IQVIA
Clio Wu, Chinook Therapeutics Inc.
Mary Nilsson, Eli Lilly & Company
Greg Ball, ASAP Process Consulting
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 3
The PHUSE Safety Analytics Working Group, a cross-disciplinary collaboration, is working to improve the content and implementation of clinical trial safety analyses for medical research, leading to better data interpretations and increased efficiency in clinical drug development and review processes. The Working Group has produced multiple deliverables (conference posters and presentations, white papers, publications, blogs, etc.) over the past 10 years and has multiple ongoing projects. This presentation will provide an overview of the Working Group and its associated project teams, share the teams' progress and key deliverables for awareness, and summarize ongoing projects. We encourage feedback, and we are accepting additional volunteers.
SI-302 : Auto-validation of SAS Macros through Regression Testing
Madhu Annamalai, Algorics
Umayal Annamalai, Algorics
Tuesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 3
Validation of SAS macros is a critical step for reporting quality outcomes. Typically, statistical programmers spend extensive time and effort validating macros. Furthermore, frequent code changes demanding repetitive manual validation make it even more resource-intensive and affect the quality of outcomes. Regression testing is a key step in agile product development, and its benefits have been established with a proven track record of test automation in many other fields; however, very limited content has been published on the application of regression testing to SAS macro validation. In this session, our goal is to demonstrate a comparison between manual validation of SAS macros and auto-validation through regression testing, along with the efficiency gained in time and effort. We will showcase our experience, in which automated regression testing proved possible for most standard macros: test programs are executed to generate a report confirming that existing functionality works as expected by comparing the current output against the validated output. The key takeaways from this session are the use of regression testing techniques for SAS macro validation, the learnings from our experience of having done this, and how it can be implemented.
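The heart of such a check is a cell-by-cell comparison of current output against a validated baseline. A minimal sketch (in Python; the paper's implementation runs SAS test programs and compares their outputs, and this function name is illustrative only):

```python
def regression_check(baseline_rows, current_rows):
    """Compare current macro output against the validated baseline.

    Returns a list of (row_index, column, baseline_value, current_value)
    tuples for every discrepancy; an empty list means the change passed.
    """
    diffs = []
    for i, (base, cur) in enumerate(zip(baseline_rows, current_rows)):
        for col in base:
            if base[col] != cur.get(col):
                diffs.append((i, col, base[col], cur.get(col)))
    if len(baseline_rows) != len(current_rows):
        diffs.append(("row_count", None, len(baseline_rows), len(current_rows)))
    return diffs

baseline = [{"USUBJID": "1001", "AVAL": 5.0}]
current  = [{"USUBJID": "1001", "AVAL": 5.0}]
assert regression_check(baseline, current) == []   # macro change passed
```

Rolled up across a suite of test programs, such comparisons become the pass/fail report that replaces repetitive manual validation; in SAS itself, PROC COMPARE plays the role of the comparison function.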
Submission Standards
SS-003 : Handling CRS/NT Data in CAR-T Studies and Submission
Joe Xi, Bristol Myers Squibb
Yuanyuan Liu, Bristol Myers Squibb
Monday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 8-9
Chimeric antigen receptor T (CAR-T) cell therapy has been a popular therapy in recent years, and many pharmaceutical/biotech companies are developing pipelines using this new technology. In most CAR-T studies, cytokine release syndrome (CRS) and neurotoxicity (NT) are the most common types of toxicity caused by CAR-T cells. As a result, the management of CRS and NT becomes essential. During regulatory submissions, agencies have required special or extra data and analyses beyond regular AE data reporting. To facilitate the analysis, we have produced supplemental specialized CRS and NT datasets in response to FDA requests. As a team from Juno/Celgene/BMS, we worked on the filings of both Breyanzi and Abecma (2 of the 6 available CAR-T products) consecutively and accumulated experience in handling CRS/NT data. In this paper, we share our experience on two topics: 1. How we organized CRS/NT data in the SDTM/ADaM package to support the CSR. 2. How we supported health authority (HA) review by providing supplemental CRS and NT data requested by the FDA review team. We hope this paper provides some insights to other teams who are working on CAR-T studies or preparing CAR-T product submission packages, from the perspective of handling CRS/NT data.
SS-045 : Optimizing Efficiency and Improving Data Quality through Meaningful Custom Fix Tips and Explanations
Jennifer Manzi, Pinnacle 21
Julie Ann Hood, Pinnacle 21
Tuesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 8-9
Prior to submitting study data to any regulatory agency, the standardized data is evaluated using at least one validation tool to ensure data conformance and quality. The issues in the resulting report are then triaged, first to determine the cause of each issue, then to decide on the best course of action to resolve or address it. Knowing which issues should be researched and which can only be addressed through explanations in the Reviewer's Guide is crucial to using a validation report efficiently. Organizational guidance on how to approach validation issues can save time and ensure each issue is addressed correctly, resulting in higher-quality, consistent data across studies. This paper will focus on steps to create meaningful fix tips that enable users to quickly research and resolve an issue. It will also include examples of when an explanation is the best option, and the details needed to create comprehensive standardized explanations.
SS-059 : CDISC Conformance and Compliance: So Many Resources, So Little Time!
Jennifer Fulton, Westat
Stephen Black, Westat
Tuesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 8-9
If you are new to CDISC, you may be overwhelmed by the variety and scope of reference resources required to produce a CDISC-compliant data package for submission to regulatory authorities. Or if you have been doing CDISC for a while, you may be realizing that it is time to expand your research beyond the implementation guides (IGs). Where to start? What if there is a conflict in the information provided? Which resources take precedence? How do I keep current? And how do I know what resources are out there? This paper attempts to answer those questions and more, putting the wide array of guidelines, software, educational tools, and contributing parties in one place. The paper will also provide links and descriptions so that when you are ready you will have the proverbial "CDISC World" at your fingertips.
SS-097 : Study Data Technical Rejection Criteria (TRC) Considerations for Multiple Data Packages in a Single Submission
Christine Teng, Merck
Si Ru Tang, Merck
Janet Low, Merck
Tuesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 8-9
The Study Data Technical Conformance Guide (TCG) provides technical recommendations to sponsors for the submission of animal and human study data and related information in a standardized electronic format in INDs, NDAs, ANDAs, and BLAs. Study datasets and their supportive files should be organized into a specific file directory structure. Organizing submission files within the appropriate folders allows automated systems to detect and validate the presence of expected data and datasets for review, and minimizes the need for manual processing. Effective September 15, 2021, the FDA implemented the Study Data Technical Rejection Criteria (TRC) validation, which can reject a submission if the criteria are not met, through the automated inbound process of the Electronic Submission Gateway (ESG). This paper will share the challenges and successful approaches considered when preparing multiple data packages for a single submission, including best practices and experience gained from submitting data packages from an interim analysis and a final analysis, supportive studies, integrated analyses, and meta-analyses.
SS-112 : Standardization of Reactogenicity Data into Findings
Charumathy Sreeraman, Ephicacy Lifescience Analytics
Tuesday, 10:00 AM - 10:20 AM, Location: LVL Ballroom: Continental 8-9
Reactogenicity events are key safety assessments in the vaccines therapeutic area (TA). Reactogenicity refers to a particular expected or generic reaction following vaccine administration. The term reaction usually implies that the adverse event has a causal relationship with the vaccination, or at least that a distinct possibility exists. Reactogenicity is evaluated by observing a pre-specified set of adverse events over a pre-defined observation period. Standardizing the reactogenicity data in the SDTM datasets implements the 'FLAT MODEL' strategy prescribed by the Vaccines TAUG. This paper will discuss standardizing diary data from study subjects into findings domains, as applicable, following the 'Flat Model' strategy.
SS-117 : Consistency Checks Automation across Regulatory Submission Documents in the eCTD M5 folder
Majdoub Haloui, Merck & Co. Inc.
Loganathan Ramasamy, Merck & Co., Inc.
Hemu Shere, Merck & Co., Inc.
Tuesday, 10:30 AM - 10:50 AM, Location: LVL Ballroom: Continental 8-9
The preparation of the data reviewer's guide (DRG) and the define.xml is an important step in a regulatory submission package for clinical trials. DRGs provide regulatory agency reviewers with additional context and a single point of orientation for SDTM/ADaM datasets submitted as part of eCTD Module 5. The define.xml provides the necessary information to describe the submitted datasets and their variables. High-quality DRGs and define.xml files are important for a successful regulatory submission to the FDA, NMPA, and PMDA. In this paper, we will provide an overview of a Python and React.js tool that performs consistency checks across DRGs and SDTM/ADaM defines, checks that are not typically covered by commercially available tools such as Pinnacle 21 Enterprise. This paper focuses on how these checks are created and on an efficient approach for running them.
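The abstract does not show the tool's code; as one illustration of the kind of cross-document check it describes (in Python, which the tool uses, though this function and its inputs are hypothetical), a simple check flags datasets mentioned in one document but missing from the other:

```python
def dataset_consistency(drg_datasets, define_datasets):
    """Flag datasets listed in the reviewer's guide but absent from
    define.xml, and vice versa (one simple cross-document check)."""
    drg, dfn = set(drg_datasets), set(define_datasets)
    return {"missing_from_define": sorted(drg - dfn),
            "missing_from_drg": sorted(dfn - drg)}

result = dataset_consistency(["ADSL", "ADAE", "ADLB"], ["ADSL", "ADAE"])
# → {'missing_from_define': ['ADLB'], 'missing_from_drg': []}
```

In practice the dataset lists would be parsed from the DRG document and the define.xml metadata rather than typed in, and similar set comparisons apply to variables, codelists, and value-level metadata.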
SS-124 : How Can I Put This? - Using a pre-defined spreadsheet to explain your Pinnacle 21 Enterprise Issues
Mike Lozano, Eli Lilly and Company
Tuesday, 11:00 AM - 11:20 AM, Location: LVL Ballroom: Continental 8-9
Pinnacle 21 Enterprise (P21E) compliance issues are going to happen on every study. Whether validating SDTM or ADaM, or using the tool to create define.xml documents, the odds are almost 100% that some compliance issues will be identified, and almost equally high that some of those issues cannot or will not be resolved. Even on the rare occasions when a sponsor uses a different compliance tool, the regulatory agencies (FDA, PMDA, etc.) will still run the data through P21E as part of their review process. So it is a given that at least some P21E compliance issues will exist. Lilly has taken a unique approach in our issue-resolution process through the use of a pre-defined spreadsheet that contains fix tips and suggested wording for most P21E issues. Our process improves both quality and speed by giving our study teams the ability to select explanations with text that is clear, appropriate, and consistent throughout the study. Having the fix tips available throughout the life of the study has helped teams better understand and manage issues in real time rather than as a back-end task. This has not only improved the quality of the explanations but has helped teams resolve issues while the trial is ongoing, so the overall number of unresolved issues at database lock is much smaller. This paper will explore the background on why we felt the issue-explanation spreadsheet was necessary to improve the quality and speed of the issue-explanation process.
SS-127 : Proposal for New ADaM Paired Variables: PARQUAL/PARTYPE
Elizabeth Dennis, EMB Statistical Solutions, LLC
Monika Kawohl, mainanalytics GmbH
Paul Slagle, IQVIA
Tuesday, 11:30 AM - 11:50 AM, Location: LVL Ballroom: Continental 8-9
For more than a decade, producers have struggled to create unique PARAM values to fully describe each analysis parameter. Even when it is less efficient to have fully unique PARAMs, it has been the requirement. With ADaM IG v3.0 this is expected to change. PARQUAL (and the paired variable PARTYPE) are expected additions that will allow PARAM to identify multiple analysis parameters. These are special-purpose variables intended to be an exception, not a common occurrence; in most cases they will be unnecessary. However, when the meaning of PARAM remains essentially unchanged except for a single qualifier (such as 'Investigator' vs. 'Central Reader'), PARQUAL can be a useful tool to simplify PARAM. This paper will summarize the current requirements of PARAM and PARCATy. It will review the history of past proposals for PARQUAL and its presence in TAUGs and other documents. The new requirements for PARQUAL and PARTYPE will be introduced, along with examples of correct usage. Examples of disallowed use cases will also be discussed. Finally, the status of the associated controlled terminology will be presented.
SS-140 : Working with Biomedical Concepts and SDTM Dataset Specializations for Define-XML using SAS® Open CST
Lex Jansen, CDISC
Tuesday, 1:30 PM - 2:20 PM, Location: LVL Ballroom: Continental 8-9
Biomedical Concepts are units of knowledge that relate to real-world entities. Getting Biomedical Concepts off the ground has been a long and challenging journey. There's little to debate that in theory Biomedical Concepts make sense, but implementation has been a great challenge with little to no realization of benefits. Perhaps that is because the scope has been too large and complex, making implementation extremely difficult. For this reason, CDISC has developed a simplified approach and model which includes an abstract conceptual layer that provides semantics as well as a simplified implementation layer of preconfigured dataset specializations (CDASH, SDTM, ...) linked to Biomedical Concepts. SDTM dataset specializations are ready to use building blocks for Define-XML. This provides immediate benefits to SDTM programmers and opens the door to efficient programming and automation. Biomedical Concepts are now available in CDISC Library via the API. This paper will show how SAS can work with Biomedical Concepts and SDTM Dataset Specializations. We will show how SDTM Dataset Specializations can be used by SAS Open CST for the creation of Value Level Metadata in Define-XML.
SS-146 : ADaM Datasets with Multiple Participations per Subject
Grace Fawcett, Syneos Health
Sandra Minjoe, ICON PLC
Elizabeth Dennis, EMB Statistical Solutions, LLC
Tuesday, 2:30 PM - 2:50 PM, Location: LVL Ballroom: Continental 8-9
The SDTM team has proposed a standardized technique for accommodating multiple participations per subject. In that proposal, the domain DM (Demographics) contains all subjects, with one primary record per subject, and the domain DC (Demographics for Multiple Participations) contains all subjects, with each participation per subject as a separate record. There is often also a need to represent the multiple participations in analysis, so the multiple participations must be brought into ADaM. A technique similar to the SDTM solution may be used, where the standard ADaM dataset ADSL still contains all subjects, with one primary record per subject, but another dataset, which we call ADPL (Participation-Level Analysis Dataset) is used to contain all participations per subject, with one record per participation per subject. This paper will summarize the SDTM DM vs. DC proposal and describe how a similar ADSL vs. ADPL solution could work in ADaM. We will provide some examples of what to include in ADSL vs. ADPL, showing how to use the data from these datasets in other datasets and for analysis. We will also give examples of issues that could become problematic across these datasets. Finally, we will give example text for an ADRG in order to make it clear what was done and where to find important information.
SS-216 : Regulatory Data Submission and CDISC Compliance: Sponsor and Vendor Collaboration Best Practices
Tabassum Ambia, Alnylam Pharmaceuticals, Inc
Tuesday, 3:00 PM - 3:20 PM, Location: LVL Ballroom: Continental 8-9
Submission of electronic data requires compliance with CDISC and evolving regulatory agency data standard requirements (e.g., FDA and PMDA). Close collaboration and partnership between sponsors and vendors throughout the data collection and analysis process is critical to ensure that data/submission packages efficiently meet requirements. Pinnacle 21 is widely used for data validation as well as for the creation and/or validation of define documents. Standards specified by the FDA/PMDA can be pre-loaded in P21 to generate submission documents and validate datasets and define packages. Alnylam has been using P21 Enterprise to ensure compliance between internal programming and data management departments, and has expanded access to this platform/process to our data management vendors to centrally load data, generate defines, run validation, document data and mapping issues, and collaborate to reconcile all issues. In this paper, we will discuss our experience in using a common platform with our vendors to review and ensure data submission compliance, the challenges and benefits of using a common platform to review data and submission issues, and our recommendations for best practices between sponsors and vendors.
SS-217 : BIMO Package: Challenges and Perspectives while keeping up with the upgrades in the BIMO Technical Conformance Guide and the BDRG Guidelines
Mathura Ramanathan, IQVIA
Sowmya Gabbula, IQVIA
Tuesday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 8-9
The Food and Drug Administration (FDA) released an updated version (v3.0) of the Bioresearch Monitoring (BIMO) Technical Conformance Guide (TCG) in August 2022. Coincidentally during this time, the PHUSE Working Group made its first release of the BIMO Data Reviewer's Guide (BDRG) package, which included a well-evolved BDRG template. As part of one of our NDA submissions, we prepared the BIMO package in line with these latest BIMO guidance documents. In this paper, we highlight the updates in the latest version of the BIMO TCG and their implications for our efforts in adapting to these upgrades while preparing the BIMO package. While the BDRG is still an optional document, we also generated it based on the BDRG template. In particular, the BDRG template included about 10 required sections, and we share our perspectives and describe the elements needed to complete these required components. Finally, as required in the BDRG, we also prepared a conformance report for the CLINSITE data using Pinnacle 21 Enterprise (P21E), and we highlight the technical challenges that we encountered while generating the P21 report.
SS-223 : A case study of a successful RTOR submission and approval for Rylaze
Sudhir Kedare, Jazz Pharmaceuticals
Dilip Nalla, Jazz Pharmaceuticals
Kumud Kanneganti, Jazz Pharmaceuticals
Jagan Mohan Achi, Jazz Pharmaceuticals
Tuesday, 4:00 PM - 4:20 PM, Location: LVL Ballroom: Continental 8-9
The RTOR (Real-Time Oncology Review) program, launched by the FDA's Oncology Center of Excellence and Office of Oncologic Diseases in February 2018, aims to expedite the review process for oncology treatments by allowing early submission of top-line efficacy and safety data. This enables FDA reviewers to begin the review process sooner and brings treatments to patients more quickly. The program also improves review quality and fosters early engagement between the sponsor and the FDA. This paper examines the experience of the Programming team at Jazz Pharmaceuticals, which successfully submitted and gained approval for Rylaze (BLA in June 2021 and sBLA in November 2022), a chemotherapeutic regimen for acute lymphoblastic leukemia and lymphoblastic lymphoma, under the RTOR program. It covers the team's role, involvement, and approach in the submission process, including steps taken to address post-submission requests from the FDA, providing valuable insights for anyone looking to use the RTOR program in the future.
SS-224 : From FDA to PMDA submission: How to resolve CDISC non-compliance issues
Karin Steffensen, H. Lundbeck
Carina Brixval, H. Lundbeck
Tuesday, 5:00 PM - 5:20 PM, Location: LVL Ballroom: Continental 8-9
From the perspective of both the US Food and Drug Administration (FDA) and Japan's Pharmaceuticals and Medical Devices Agency (PMDA), CDISC compliance is mandatory for a submission package. At the core of CDISC compliance are the validation rules specified by each regulatory agency and within an agency's Technical Conformance Rules. Validation rules may differ between agencies, especially with respect to the assessed severity of non-compliance, which can cause an application review to be suspended. Through a company acquisition, Lundbeck acquired a Biologics License Application (BLA) submission package that had been submitted to and accepted by the FDA, and that is, in time, also intended for a PMDA submission. To prepare for the PMDA submission, we investigated the FDA package previously submitted by the acquired company and discovered that, due to the difference in CDISC requirements between the FDA and PMDA, four of the trials within the FDA package would be considered CDISC non-compliant under the PMDA's CDISC compliance rules. The purpose of this paper is to describe how to update an existing FDA package to be CDISC compliant for a PMDA submission. Specifically, we share our approach to ensure that, while modifying data for CDISC compliance, transparency and traceability from data collection to analysis results remain intact and the results themselves are unaltered. We also briefly share our interactions with the PMDA and how we prepared for meetings with the agency.
SS-261 : Recent updates in BIMO Technical conformance guidance and use case scenario
Vaibhav Garg, Alnylam Pharmaceuticals
Sreedhar Bodepudi, Alnylam Pharmaceuticals
Wednesday, 8:00 AM - 8:20 AM, Location: LVL Ballroom: Continental 8-9
Since 2017, the FDA has recommended that sponsors submit Bioresearch Monitoring (BIMO) outputs along with a study electronic submission (eSub) package. The BIMO package includes clinical study-level information, subject-level data listings by clinical site, a summary-level clinical site dataset (clinsite.xpt), a data definition file (define.xml), and a BIMO Data Reviewer's Guide (bdrg.pdf). The FDA also released detailed requirements in the Bioresearch Monitoring (BIMO) Technical Conformance Guide (TCG) to detail BIMO submission package standards. The BIMO TCG provides detailed specifications, recommendations, and general considerations for preparing and submitting all submission-related components. The BIMO eSub package is used by the Center for Drug Evaluation and Research (CDER) for planning BIMO inspections for new drug applications (NDAs), biologics license applications (BLAs), and supplemental new drug applications (sNDAs) or supplemental biologics license applications (sBLAs) containing clinical data that are regulated by CDER. This paper will include a summary of the BIMO preparation requirements recommended by the FDA in BIMO TCG v2.0 (released in 2020) versus v3.0 (released in 2022). In addition, the authors will present one of Alnylam's use cases preparing and submitting BIMO, including lessons learned and best practices (i.e., where the analysis need triggered updates in developing the Summary-Level Clinical Site Dataset, and where BIMO population flags and related variables were updated by Alnylam's Statistical Programming and Statistics team to accommodate specific analysis needs).
SS-266 : Industry metrics for standards utilization and validation rules
Sergiy Sirichenko, Pinnacle 21
Wednesday, 8:30 AM - 8:50 AM, Location: LVL Ballroom: Continental 8-9
In recent years the industry has completed the adoption of standards for study data, which opens new opportunities. One of them is the availability of industry-wide analytics, which allows us to understand common trends and reveal potential issues. In this presentation, we will share metrics for standards utilization and compliance with regulatory requirements. We will discuss the potential use of industry-wide analytics for the enhancement of existing standards, improvement of validation rules, and refinement of current processes.
SS-274 : F2: FAIR and Filing - Assessing Data Fitness for FAIR and Filing
Bidhya Basnet, Roche-Genentech
Dyuthi Yellamraju, Roche-Genentech
Wednesday, 9:00 AM - 9:20 AM, Location: LVL Ballroom: Continental 8-9
Have you ever asked the question: "How are study programming tasks progressing?" When working on programming or filing activities for a study, a lot of time is spent planning what datasets and variables will be required to create the TLGs for a reporting activity. Upfront planning is good; however, keeping track of the progress of the different deliverables, such as SDTM/ADaM (including conformance), aCRF and Define.xml generation, reviewer's guide completion, and TLG creation, proves challenging, especially when many data science members are involved. Gauging progress often relies on word of mouth, guesstimates, and gut feeling. This drove us to explore a better approach to assess Filing and FAIR (Findable, Accessible, Interoperable and Re-usable) progress, which resulted in the creation of an R Shiny application. The F2 dashboard, as we call it, creates a color-coded visual of how the study is progressing: the colors indicate where progress is going well and identify areas where teams may need to place greater emphasis. The assessment is based on tangible evidence provided by the study team. The F2 dashboard helps gauge the progress of study deliverables at any point in time. It can help the team understand how it is FAIRing, spark conversations about roadblocks, and triage the issues the team is facing. Our overall objective is to be Filing and FAIR ready by database lock.
SS-276 : Adopting the New Integrated Analysis Data Reviewer's Guide (iADRG)
Randi McFarland, Ephicacy Consulting Group, Inc.
Srinivas Kovvuri, ADC Therapeutics USA
Kiran Kumar Kundarapu, Merck & Co., Inc
Satheesh Avvaru, Alexion Pharmaceuticals, Inc.
Wednesday, 9:45 AM - 10:05 AM, Location: LVL Ballroom: Continental 8-9
For pharmaceutical and biotechnology companies, the culmination of collecting clinical trial data on their investigational products is providing the integrated data and data analyses to regulatory agencies for approval. Integrated safety and efficacy data are submitted for regulatory review along with Analysis Data Reviewer's Guides. After sharing the draft and addressing public review comments, the PHUSE Optimizing the Use of Data Standards (ODS) Working Group is finalizing the integrated Analysis Data Reviewer's Guide (iADRG) template and supporting documents. These iADRG documents provide clarity and guidance for integrated data and analysis reporting. This paper will discuss key points and examples in adopting and implementing the new iADRG template. The iADRG submission document describes the traceability and transformation from individual study data to the integrated analysis data. Key analysis considerations around data re-mapping, redefining analysis flags, and data integration complexities are provided. Other points include harmonization of analysis data and documentation of differing regulatory agency requirements.
SS-282 : Handling of Vaccine SDTM Data for FDA CBER/OVRR Submission Compliance
Pragathi Mudundi, BioPier Inc.
Zhaoyu Xie, Biopier Inc.
Wednesday, 10:15 AM - 10:35 AM, Location: LVL Ballroom: Continental 8-9
SDTM (Study Data Tabulation Model) is one of the required standards for data submission to the FDA, and following the SDTM-IG is the conventional practice. However, submitting study datasets for vaccines to the Office of Vaccines Research and Review (OVRR) in the FDA's CBER division follows specific rules, a few of which deviate from the SDTM-IG. These rules are often ignored in practice, which may result in compliance issues and even rejection of vaccine study submissions. In this paper, we will cover the rectification work for a vaccine study to address SDTM compliance with FDA CBER/OVRR rules, which includes re-mapping reactogenicity data; recording solicited AEs in CE and FACE; mapping unsolicited AEs, MAAEs (PIMMC, NOCD), and deaths to the AE and FAAE domains; relocating rule-specified information into the domains where it belongs; and covering the unique mapping of biological assay, immunogenicity, and efficacy data.
SS-304 : As Simple as Falling Off a Log?: An Unusual Case Study of Mapping Data into the SDTM DA (Drug Accountability) Domain
Susan Mutter, PROMETRIKA, LLC
Monday, 4:30 PM - 4:50 PM, Location: LVL Ballroom: Continental 8-9
The SDTM DA (Drug Accountability) domain tabulates the amount of treatment units dispensed to a subject and the amount returned, to gauge dosing compliance for each treated subject in a study, and is often collected in a log-form format. It seems like a fairly straightforward domain, so mapping subject data to it should be as easy as falling off a log. Or is it? Add together a sponsor and two different CROs (Contract Research Organizations) with an evolving protocol, separate databases for the double-blind and open-label extension portions of the study, and a creative data entry approach, and you have a recipe for complexity. This paper will present a case study of mapping drug accountability data that was anything but simple.
e-Posters
PO-018 : Visualization for Success: Driving KPIs and Organizational Process Improvements via Portfolio-level Analytics
Philip Johnston, Pinnacle 21
Julie Ann Hood, Pinnacle 21
Monday, 10:30 AM - 11:20 AM, Location: ePoster Station 1
Cloud-based data diagnostic platforms enable organizations to build institutional memory and drive process improvements. Platforms that passively aggregate metrics spare teams from having to "wrangle KPIs" and instead visualize the macro-level trends in their data quality, conformance, standards adoption, submission risks, and team activity at a glance. This poster highlights how these data are showcased in P21 Enterprise's built-in Analytics module and suggests actionable steps based on these trends to support inter- and intra-departmental process improvements. It also demonstrates how the various portfolio-level reports, filters, and views now available within the application support organizations in their coordination efforts and the development of best practices. Impactful use cases include: benchmarking data quality across therapeutic areas and over time, eliminating Reject Issues, monitoring the uptake of new standards, prioritizing Issues for which to create standardized explanations, developing guidance for frequently occurring Validation Rules, visualizing efforts to balance workloads, and encouraging documentation through "gamification."
PO-067 : Submission survival guidelines for Statistical Programmers
Mekhala Acharya, Takeda Pharmaceuticals
Katlyn Buff, Takeda Pharmaceuticals
Norihiko Oharu, Takeda Pharmaceuticals
Monday, 2:00 PM - 2:50 PM, Location: ePoster Station 1
With accelerated cycle times, whether via *PRIME, RTOR, or SAKIGAKE, submission processes are being redesigned in major markets. As a result, pharmaceutical companies have to expedite drug approval through streamlined submission activities. This makes the statistical programming function a key stakeholder that needs to focus on a consistent approach across the portfolio, creating high-quality submission deliverables while maintaining speed and efficiency. This can only be accomplished through proactive planning, education, and awareness of submission requirements at every programming milestone. The intent of this poster is to provide a visualization of the programming life cycle for a successful submission. It will serve as a one-stop shop, consolidating guidelines from multiple sources and explaining each submission component in simple terms. These guidelines, in the form of a flowchart, will cover industry best practices, links to external resources, and the implications of upfront planning and ongoing risk assessment, all targeting a successful submission package. This poster will illustrate the details of the submission survival guidelines a programming team will need to plan deliverables and execute an efficient submission. *PRIME, RTOR, and SAKIGAKE are accelerated pathways in the EU, US, and Japan.
PO-080 : What PROC SQL can't handle while Data Step can?
Deming Li, Merck & Co., Inc
Tuesday, 10:30 AM - 11:20 AM, Location: ePoster Station 1
INTRODUCTION: Our code often consists of DATA steps, PROC SQL, or both; they are used interchangeably. Both tools work well for data manipulation, but there are quite a few things that PROC SQL can't handle: 1) Base SAS® can work with raw data files like text files or Excel, but PROC SQL can't. 2) Base SAS® transposes data easily using PROC TRANSPOSE or an array in a DATA step. 3) Base SAS® can process arrays and other iterations, but PROC SQL does not do this well. 4) Base SAS® lets us use FIRST. and LAST. variables in situations where we want to keep the first or last record in a group. 5) Base SAS® has PROC SORT with NODUPKEY, which allows us to select distinct observations by the specified BY variables; PROC SQL's DISTINCT keyword works like PROC SORT NODUP. 6) Base SAS® can create multiple datasets in one merge and provides good control via the IN= option; PROC SQL can create only one table at a time and does not give such control when doing joins. CONCLUSION: DATA step, PROC SQL, or both? It is a good question. Knowing the pros and cons of PROC SQL, it is not hard to answer.
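As an illustration of the first./last. point, group-wise first/last selection has a simple analogue outside SAS. The sketch below uses Python with invented data (this listing's examples use Python throughout so they can run anywhere); it keeps the last record per BY group, as `if last.subj;` would after a PROC SORT.

```python
from itertools import groupby

# Hypothetical visit records, the kind of data a DATA step would process BY subj.
records = [
    {"subj": "001", "visit": 1}, {"subj": "001", "visit": 2},
    {"subj": "002", "visit": 1}, {"subj": "002", "visit": 3},
]

def last_per_group(rows, key):
    """Analogue of 'if last.subj' after PROC SORT: keep one record per BY group."""
    out = []
    for _, grp in groupby(sorted(rows, key=key), key=key):
        out.append(list(grp)[-1])  # the LAST. record in each group
    return out

print(last_per_group(records, key=lambda r: r["subj"]))
```

In SQL the same result requires a correlated subquery or window function, which is the abstract's point about the DATA step being the more natural tool for sequential, group-aware logic.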
PO-089 : Creating a Centralized Controlled Terminology Mapping Repository
Danny Hsu, Seagen
Shreya Chakraborty, Seagen
Tuesday, 2:00 PM - 2:50 PM, Location: ePoster Station 1
Controlled Terminology (CT) is the set of code lists and valid values used with data items within CDISC-defined datasets for regulatory submission. The use of CT helps harmonize the data across all submitted studies and improves the efficiency of data review. It also opens the door to powerful internal automation for heightened quality and efficiency. The question of how to most effectively map collected data, especially free-text values from CRFs, into CT plays a very important role in each study and product team's development of CDISC datasets that are not only consistent and compliant but also support analyses and summaries efficiently. This article shares the concept of a centralized repository (such as a spreadsheet) to help streamline the process of CT mapping. A centralized CT mapping sheet is generated by collecting all mapped terms from previous studies and is used to identify new terms that need attention. Following this process, study teams can save time by focusing only on newly added terms, improving the efficiency of the data-mapping process and increasing the efficiency and quality of CDISC dataset generation and review. A centralized CT mapping sheet can support not only the creation of the code lists and value lists in define.xml but also the format mappings used in SAS programs.
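The repository idea can be sketched in a few lines: look each collected term up in the central mapping and route only unmatched terms to manual review. This Python fragment is illustrative only; the sample terms and the lower-casing match rule are assumptions, not the authors' implementation.

```python
# Hypothetical central repository: raw collected term -> CT submission value.
ct_repository = {
    "asa": "ASPIRIN",
    "aspirin 81mg": "ASPIRIN",
    "tylenol": "PARACETAMOL",
}

def split_terms(collected_terms, repository):
    """Return (already-mapped, new) terms; only new terms need manual review."""
    mapped, new = {}, []
    for term in collected_terms:
        key = term.strip().lower()
        if key in repository:
            mapped[term] = repository[key]
        else:
            new.append(term)
    return mapped, new

mapped, new = split_terms(["ASA", "Ibuprofen 200mg", "Tylenol"], ct_repository)
print("needs review:", new)
```

Once reviewed, the newly mapped terms are appended to the repository, so each study shrinks the manual-review workload of the next.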
PO-119 : Programmer's Perspective: Step into Awareness Regarding Clinical Trial Deliverables and Their Impact
Lyma Faroz, Seagen Inc.
Jinit Mistry, Seattle Genetics
Monday, 2:00 PM - 2:50 PM, Location: ePoster Station 2
There are a variety of deliverables that statistical programmers work through in their careers, including data sets, tables, listings, figures, patient narratives, and patient profiles, among many others. There is often a natural tendency for programmers to focus on generating such outputs without taking a step back to understand why those outputs were requested in the first place and how they help clinical study teams in decision making and assessing trial outcomes. When a programmer understands the bigger picture behind why a specific analytical output is requested, how it is used to gauge trial outcomes, and how it contributes to the overall strategy of getting an investigational product to patients in need, such insights can help increase the quality of production and QC output. A wide range of deliverables will be discussed in this paper, such as interim analyses, SMCs, DSURs, PBRERs, CSRs, publications, conferences, EudraCT and ClinicalTrials.gov, helping programmers gain a holistic understanding of what drives specific analyses throughout the life of a clinical trial.
PO-143 : Real Time Analytical Reporting Using OpenFDA
Shubhranshu Dutta, University of Rochester
Tuesday, 10:30 AM - 11:20 AM, Location: ePoster Station 2
One of the main challenges during the drug development process is knowing how a drug in a clinical trial interacts/might interact with other drugs/concomitant medications taken by a patient. The FDA has enabled access to real-world data via OpenFDA. With the use of OpenFDA APIs - specifically by creating daily refreshed data reports - we can track the emergence/development of common adverse reactions or drug interactions along with the severity and seriousness of such adverse events across various patients. I will be pulling data from OpenFDA APIs and creating reports that reflect real time data from the OpenFDA database for the purpose of analysis. I will also be exploring the metadata and discussing the interactive charts provided by OpenFDA to help with the right query selection from the database. Utilizing tools such as R libraries, Excel pivot charts and pivot tables, and converting JSON files into data for creating customized reports, I will also be discussing a way of automating reports and notifications to enable a faster alert system for clinical investigators, ensuring greater patient safety.
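As a small offline illustration of the kind of report described above, the fragment below tallies reaction terms from a response shaped like the openFDA drug adverse event endpoint (`results[].patient.reaction[].reactionmeddrapt`). The sample data are invented and heavily abbreviated; real responses carry many more fields.

```python
import json
from collections import Counter

# Invented, abbreviated sample in the shape of an openFDA drug-event response.
sample_response = json.loads("""
{"results": [
  {"serious": "1", "patient": {"reaction": [{"reactionmeddrapt": "NAUSEA"},
                                            {"reactionmeddrapt": "HEADACHE"}]}},
  {"serious": "2", "patient": {"reaction": [{"reactionmeddrapt": "NAUSEA"}]}}
]}
""")

def reaction_counts(response):
    """Count reported reaction terms across all adverse event records."""
    counts = Counter()
    for event in response.get("results", []):
        for reaction in event.get("patient", {}).get("reaction", []):
            counts[reaction["reactionmeddrapt"]] += 1
    return counts

print(reaction_counts(sample_response).most_common())
```

A daily refreshed report would fetch the live JSON from the API and feed the same tally into the alerting layer the abstract describes.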
PO-177 : A model for sponsors to support independent operation of IDMCs in clinical trials.
Bhanu Bayatapalli, University of Thiruvalluvar, India
Yiyi Chen, PhD in Biostatistics from University of Iowa
Tuesday, 10:30 AM - 11:20 AM, Location: ePoster Station 3
The Independent Data Monitoring Committee (IDMC) is entrusted with monitoring patients' safety and benefits, along with the power to stop a trial for safety, futility, or efficacy based on pre-specified stopping rules. Since the IDMC is a wholly independent entity appointed by the sponsor, sponsors often engage a third party, such as a contract research organization (CRO), to produce reports from the accumulated data for IDMC review and assessment. Ensuring IDMCs have the highest-quality reports for their review is therefore critical to ensure trials are stopped only for the right reasons or, just as importantly, kept going based on valid conclusions. The question then becomes how a sponsor can best ensure the quality of such reports without infringing upon the independence principle. In this paper we will share a working model we found to be successful, with special focus on: 1. development of the Scope of Work (SOW), charter, DTP, timelines, and meetings; 2. data sharing between the sponsor and CRO, and blinding of treatment arms; 3. prioritizing datasets and variables for IDMC analysis; 4. open versus closed reports; 5. the CRO output verification process to promote accurate IDMC review meetings.
PO-258 : Using R to Automate Clinical Trial Data Quality Review
Melanie Hullings, TrialSpark
Emily Murphy, TrialSpark
Derek Lawrence, TrialSpark
Michelle Cohen, TrialSpark
Andrew Burd, TrialSpark
Tuesday, 2:00 PM - 2:50 PM, Location: ePoster Station 2
Over the past 20 years, the detection of the majority of clinical research site quality issues has been the remit of Clinical Research Associates (CRAs) performing on-site monitoring. While there have been advances that enable some remote data review of electronic data capture (EDC) systems, there remains a large amount of manual work by CRAs and Clinical Data Managers (CDMs). Deviations from clinical trial requirements, including critical quality issues such as drug noncompliance and patients who were treated but did not meet all eligibility criteria, are logged in order to track and monitor data integrity and patient safety. To more efficiently facilitate detection, our CDM team created a script in RStudio that can surface potential protocol deviations in the EDC dataset. The script produces a list of findings for CRAs to confirm with sites and enter into the deviation database. Initial beta testing is in progress for a Phase II clinical trial and has already identified additional protocol deviations not detected during manual review. While this process will always require manual effort by CRAs to monitor at the site level and confirm findings, programmatic issue detection has the potential to decrease the manual review burden on study teams and improve real-time issue identification and clinical trial data quality overall. Future directions include refining, expanding, and standardizing scripts and productizing these features in the EDC.
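A rule-based check of this kind can be sketched as follows, in Python rather than R, with invented field names and a hypothetical visit-window rule; the team's actual script is not published in the abstract.

```python
# Hypothetical EDC dosing extract: planned vs. actual study day per visit.
edc_doses = [
    {"subj": "101", "visit": "W2", "planned_day": 14, "actual_day": 15},
    {"subj": "102", "visit": "W2", "planned_day": 14, "actual_day": 22},
]

def flag_window_deviations(doses, window=3):
    """Surface visits whose actual day misses the planned day by more than `window` days."""
    return [d for d in doses if abs(d["actual_day"] - d["planned_day"]) > window]

for finding in flag_window_deviations(edc_doses):
    print(f"Potential deviation: subject {finding['subj']}, visit {finding['visit']}")
```

Each flagged record becomes a candidate finding for a CRA to confirm with the site before it is entered into the deviation database, matching the workflow the abstract outlines.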
PO-294 : Data Doesn't Lie - Real Time Monitoring and Projecting on Clinical Trial Enrollment Progression
Wenjun He, EMMES
Monday, 10:30 AM - 11:20 AM, Location: ePoster Station 3
Determining an enrollment target and projecting realistic enrollment rates are challenging parts of conducting a clinical trial. Underperforming participant recruitment has been a long-standing problem for clinical trials. To address the question of whether ongoing recruitment will enable a trial to reach the planned target fast enough while staying within budget, real-time monitoring is key to a successful trial. This paper shows how to use SAS® to automatically visualize the dynamic accrual progression and simultaneously make a data-driven projection of the accrual rate and the completion date of enrollment. This data-driven projection can serve as a reference for the study team and sponsors when making future amendments to enrollment planning and timeline scheduling.
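The simplest form of such a projection is linear extrapolation of the observed accrual rate, sketched below in Python (the paper uses SAS; the dates and counts here are invented).

```python
from datetime import date, timedelta

# Hypothetical trial: 60 of 200 subjects enrolled in the first four months.
start = date(2023, 1, 1)
as_of = date(2023, 5, 1)
enrolled = 60
target = 200

def project_completion(start, as_of, enrolled, target):
    """Extrapolate the observed subjects-per-day rate out to the enrollment target."""
    days_elapsed = (as_of - start).days
    rate = enrolled / days_elapsed                 # subjects per day so far
    remaining_days = (target - enrolled) / rate    # days needed at current pace
    return as_of + timedelta(days=round(remaining_days))

print(project_completion(start, as_of, enrolled, target))
```

A production version would refit the rate on each data refresh (and could weight recent months more heavily), which is what makes the projection "real-time" in the sense the abstract describes.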
PO-296 : Implementing Agile Methodologies: Using Trello™ to Generate and Optimize Kanban Boards for Recurring SAS Programming Tasks
Gina Boccuzzi, PROMETRIKA, LLC.
Patrick Dowe, PROMETRIKA, LLC.
Monday, 2:00 PM - 2:50 PM, Location: ePoster Station 3
Managing Data Management programming requests in a CRO environment presents multiple challenges. Data Management requests come in all shapes and sizes, but they all have one thing in common: the sponsor wants the results as soon as possible. While most requests are on a recurring schedule (quarterly, monthly, weekly), the exact timing of each request is partly dependent on the receipt of external data, either from the sponsor or a third-party vendor. The importance of completing requests in a timely manner, while the data is still current, makes prioritization and resource planning difficult. With varying frequencies and timelines coming from multiple sponsors, an adequate tracking system is needed to fulfill project requirements. Trello™ is one online application that allows the generation and optimization of Kanban Boards in the Agile methodology. A Kanban Board is a visual representation of the life cycle of a task from start to finish. Creating a board to manage requests optimizes workflow and communication and automates many processes. Kanban Boards work well for Data Management requests, which are a contained set of requests with some regular frequency and limited updates after the initial programming. Successful use in this somewhat limited application suggests that the process can be scaled up for even more complex challenges.
PO-337 : The Effect of Bethanechol on Tracheobronchomalacia in Infants
Chary Akmyradov, Arkansas Children's Research Institute
Tuesday, 2:00 PM - 2:50 PM, Location: ePoster Station 3
This single-center retrospective cohort study included 33 subjects (11 cases, 22 controls) hospitalized between 2017 and 2022. Cases treated with Bethanechol were matched with controls who did not receive Bethanechol. Case-control matching of 1:2 was performed based on birth gestational age, sex, the severity of TBM, the severity of bronchopulmonary dysplasia, and respiratory support. The primary outcome was the effect of Bethanechol on the mean Pulmonary Severity Score. This presentation demonstrates the statistical methods, visualization, and SAS programming behind this clinical study.