San Francisco Bay Area Chapter of the American Statistical Association Meeting

Title Big Data Meets Statistics

Presenter Dr. John Yijiang Li, Principal Data Scientist at Caspida

Time March 22nd, 4:30-6pm (4:30-5pm networking, 5-6pm presentation with Q&A)

Location 701B North Shoreline Blvd. Mountain View, CA 94043

Registration and Event Fee

Pre-registration is required for this event. Please send an email to sfasaofficers@gmail.com by March 20th 12pm PST with the following information:

Name, current affiliation and contact information

Whether you are a current member, or plan to renew your membership or join the chapter on site

The event is free for all current SFASA members. You are welcome to join the chapter on site. The regular membership fee is $9 per year and the student membership fee is $3 per year. If you plan to join or renew on site, please have a check for the amount due, made payable to SFASA, and a photo ID ready. If paying by cash, please bring the exact amount. We do not accept credit card payments at this time.

Abstract

Is a data scientist just a statistician who lives in San Francisco, and is data science just statistics on a Mac? This talk will offer some answers to, and thoughts on, this question. The speaker will share practices and lessons learned as a statistician working on big/fast data problems while holding the title of data scientist.

The first part of the talk will cover techniques and concepts that are useful in data science and big/fast data analytics but are less often considered or embraced by the traditional statistical world. The speaker will dive into both the stream (online) and batch (offline) computing paradigms to illustrate the potential gap, using practical examples drawn from his industry experience. The second part will briefly introduce the big data "A" team, namely Apache Spark, Apache Cassandra and Apache Kafka, a set of open-source software packages that have been crucial to recent advances in big/fast data analytics.

Short Bio of the Speaker

Dr. John Yijiang Li received his Ph.D. in Biostatistics from the University of Michigan at Ann Arbor. Part of his dissertation work on optimizing Kidney Paired Donation (KPD) programs was awarded the IBS (ENAR) distinguished paper award. After his doctorate, John joined Google to help evaluate and improve its search products, where he was a lead data scientist for search suggest and search feature (user experience) projects. John left Google for Caspida, an early-stage startup in enterprise security (later acquired by Splunk), where he serves as a principal data scientist tackling challenging machine learning problems related to cyberattacks. John is passionate about applying statistical learning methods with open-source technologies in the "big data" world.