Agenda

8:30 - 9:00	Conference Registration
9:00 - 9:15	Welcome Address Speech:
	Prof. Marek Rocki, Rector of SGH Warsaw School of Economics
	Prof. Michał Jerzy Zasada, Vice-Rector for International Cooperation of Warsaw University of Life Sciences - SGGW
	Prof. Joanna Plebaniak, Dean of the Collegium of Economic Analysis, SGH Warsaw School of Economics
	Prof. Tomasz Panek, Deputy Director of the Institute of Statistics and Demography, SGH Warsaw School of Economics
9:15 - 9:45	What are the high-dimensional data used for? Business of the predictive medicine
	Prof. Maciej Wiznerowicz, Professor of Medicine, Poznan University of Medical Sciences; Principal Investigator, Greater Poland Cancer Centre; President and co-founder, International Institute for Molecular Oncology in Poznan, POLAND
	Understanding the complex mechanisms of a disease ultimately leads to specific diagnostics and effective treatments. The sequencing of the human genome at the beginning of the 21st century opened the era of Big Data in the biomedical sciences. Since then, tens of thousands of genomes from various patient populations have been sequenced around the world. The initial analyses identified a myriad of genetic features associated with predispositions to common diseases such as cancer, diabetes, cardiovascular diseases, neurodegenerative disorders and many others. The sequencing of cancer genomes and transcriptomes over the last decade is paving new ways for precision oncology and more effective cancer treatment. The petabytes of genomic data obtained have pushed computational biology towards machine learning for modelling biomedical processes. Natural language processing is currently being implemented to extract actionable information from medical texts. At the same time, the development of telecommunication technologies (5G) and the conscious engagement of patients as stakeholders in connected health allow prospective medical data to be collected and monitored. We are entering the era of the internet of people.
9:45 - 10:30	Keynote Presentation: What are the high-dimensional data challenges? Business of statistical bioinformatics
	Prof. Tomasz Burzykowski, Professor of Biostatistics and Bioinformatics, Hasselt University; Vice-President of Research at the International Drug Development Institute (IDDI) in Louvain-la-Neuve, BELGIUM
	In the past decade, rapid developments have taken place in genetics and molecular biology. These developments have also strongly influenced the field of biostatistics. Biostatisticians working in clinical research are confronted daily with high-dimensional “omic” (genomic, proteomic, metabolomic, ...) data generated by a variety of advanced technologies used in molecular biology. The data are voluminous and have a complex structure. Hence, their analysis faces important challenges, including the need to manipulate huge datasets, to deal with more variables than observations, and to handle the numerical complexity of statistical methods. The presentation will discuss and illustrate these challenges and their consequences for biostatisticians' daily practice and, hence, for biostatistical education programs.
10:30 - 11:00	Big Data: how to improve health care and research
	Prof. Jeanine J. Houwing-Duistermaat, Professor of Data Analytics and Statistics, Department of Statistics, University of Leeds, UK; Professor of Statistical Genetics, Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, THE NETHERLANDS
	The falling cost of high-throughput techniques such as proteomics and whole genome sequencing, the widespread adoption of clinical e-health record (EHR) systems and the rise in the collection of diverse personal data mean that we now have ‘big data’ in the health sector. It is often hypothesized that these data sources can be used to improve health care and to reduce costs further. Because of the variety in systems and data, the lack of common infrastructure, system security, privacy issues and so on, the health care sector lags far behind other sectors in making use of big data. A lack of statistical methodology will be a limiting factor too. Biological processes are complex and hard to unravel compared with the processes studied in other sectors. As a consequence, the measured datasets are very diverse, heterogeneous and noisy. There is a need for methods for the integrated analysis of these datasets. Another challenge is to use electronic health records linked with other types of data, such as data on pollution, weather or family structure, for innovative and statistically efficient sampling designs. Accomplishing the goals of using big data in the health sector will require a multidisciplinary approach.
11:00 - 11:30	Break
11:30 - 12:15	Keynote Presentation: The role of biomedical data standards in the era of Big Data
	Jack Shostak, Associate Director of Statistics, Duke Clinical Research Institute, Durham, NC, USA
	We are living in an exciting time of biomedical big data, amazing computational power, and big dreams for leveraging biomedical data stores into knowledge. However, biomedical big data is big not just because it is voluminous, but also because of its complexity, diversity, and lack of organization. One of the ongoing roadblocks in moving big biomedical research forward is data integration, and one way to reduce that roadblock is to employ biomedical data standards. This talk will examine the various biomedical data standards (e.g., HL7, CDISC) and technologies, and look at a number of standards-based applications in the public and private sectors. The current state of affairs with regard to biomedical data standards will be summarized, and suggestions for a path forward will be offered.
12:15 - 13:00	Challenges in genetic profiling of cancers and implications for The Cancer Genome Atlas project
	Prof. Przemysław Biecek, Associate Professor, Warsaw University of Technology, University of Warsaw, POLAND
	The advent of high-throughput techniques in molecular biology has created an opportunity to scan various genetic traits very precisely in a large number of samples. For example, The Cancer Genome Atlas is a rich source of, literally, millions of biomarkers for over 11 000 patients across 33 forms of cancer. The size of the full raw datasets from a single platform is measured in terabytes. Such data may be used to develop new genetic signatures for various traits of clinical significance, e.g. predictive signatures for drug response or future survival. In this setting a signature is simply a classifier, but because of the size of the data and the large number of biomarkers, training a stable and useful signature remains a challenge. During the talk I will compare a number of methodologies that may be used in this context, such as random forest, gradient boosting and regularized logistic regression. We will also discuss other challenges that arise in the analysis of such data, i.e. variability of results between different snapshots, reproducibility, and accessibility of partial results. We will conclude with an overview of the infrastructure needed to satisfy the computational requirements of working with such data.
13:00 - 13:30	Mixed Effects Models: Software and Examples of Applications
	Prof. Andrzej Gałecki, Research Professor, Division of Geriatric Medicine, Department of Internal Medicine and Institute of Gerontology, University of Michigan Medical School, USA
	Mixed effects models are an important class of statistical models that can be used to analyze correlated data. Today’s applied statistician or research analyst has the luxury of working with a variety of powerful software procedures designed to fit mixed effects models. Software developments are propelled both by advances in statistical methodology and by technological progress in meeting computational demands. In addition to general-purpose software, stand-alone software has also been developed and has become popular within specific disciplines, such as the social sciences, econometrics, the pharmaceutical community, ecology and linguistic research. We acknowledge a general trend towards multipurpose software that can fit other classes of models, such as latent class models and structural equation models, in addition to mixed effects models. Examples of advanced applications of mixed effects models will be briefly discussed.
13:30 - 14:15	Lunch
14:15 - 14:45	Medical data - a challenge and an opportunity
	Prof. Magdalena Chechlińska, Head, Department of Immunology, Cancer Centre and Institute of Oncology, Warsaw, POLAND
	Medical data is remarkably heterogeneous and comes from many, usually non-integrated, sources. In addition, it is often unstructured and of poor quality. Hospital information systems, which collect and store most medical data, provide limited data mining functionality. Medical and research data of the M. Skłodowska-Curie Memorial Cancer Centre and Institute of Oncology (Poland) have been integrated in a data warehouse that is part of ONKOSYS - “A Comprehensive IT Platform for Cancer Research”, developed with European Union funds. The ONKOSYS warehouse provides numerous analytical tools, including SAS Advanced Analytics tools. Moreover, to extract data from text records, such as physicians’ freeform notes and descriptions of test results (currently over 8 million records, with a monthly increase of 60,000), SAS Content Categorization Studio has been applied for the first time in Poland for medical research purposes. The ONKOSYS data is not only a challenge for analysts, but also an exciting opportunity to explore large amounts of previously inaccessible, valuable medical datasets.
14:45 - 15:30	Extracting information from medical text documents
	Dominik Spinczyk, Faculty of Biomedical Engineering, Silesian University of Technology, POLAND
	Mariusz Dzieciątko, Business Solution Manager, Technology and Big Data Competency Center, SAS Institute, POLAND
	Taking into account the problem of query length in text analysis, the paper presents the possibility of comparing the content of medical texts directly, using a representation of unstructured document information as a term frequency matrix. Dimensionality reduction is performed using the Latent Semantic Indexing method. Two common metrics are used: cosine distance and the Jaccard metric. The analysis was performed using SAS Text Analytics elements on a set of a few hundred medical text documents.
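The two metrics named in the abstract above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three notes are invented, tokenization is naive whitespace splitting, and the function names are hypothetical. The Latent Semantic Indexing step (a truncated singular value decomposition of the term frequency matrix) is omitted, and this is not the authors' SAS Text Analytics workflow.

```python
from collections import Counter
import math


def term_counts(text):
    """Build a term-frequency vector: lower-case, split on whitespace, strip punctuation."""
    tokens = (word.strip(".,;:()").lower() for word in text.split())
    return Counter(token for token in tokens if token)


def cosine_distance(a, b):
    """1 - cosine similarity of two term-frequency vectors (1.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """1 - |shared terms| / |all terms|, ignoring how often each term occurs."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


# Invented example notes (assumption: not taken from any real data set).
docs = [
    "Chest CT shows a small nodule in the left lung.",
    "CT of the chest: small nodule, left lung.",
    "Blood test results are within normal limits.",
]
vectors = [term_counts(d) for d in docs]

for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(
            f"doc {i} vs doc {j}: "
            f"cosine={cosine_distance(vectors[i], vectors[j]):.3f}, "
            f"jaccard={jaccard_distance(vectors[i], vectors[j]):.3f}"
        )
```

Both distances are small for the two notes that describe the same finding and equal to 1 for notes that share no terms. Cosine distance takes term counts into account, while the Jaccard metric looks only at which terms are present.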
15:30 - 15:40	Closing Remarks:
	Prof. Ewa Frątczak, Head of Event History and Multilevel Analysis Unit, SGH Warsaw School of Economics
15:40	Networking Coffee