San Francisco Bay Area Chapter of the American Statistical Association Meeting
Title: Big Data Meets Statistics
Presenter: Dr. John Yijiang Li, Principal Data Scientist at Caspida
Time: March 22nd, 4:30 - 6pm (4:30-5pm networking, 5-6pm presentation with Q&A)
Location: 701B North Shoreline Blvd., Mountain View, CA 94043
Registration and Event Fee
Pre-registration is required for this event. Please send an email to
sfasaofficers@gmail.com by March 20th, 12pm PST, with the following information:
Your name, current affiliation, and contact information
Whether you are a current member, or plan to renew your membership or join the chapter on site
The event is free to all current SFASA members. You are welcome to join the
chapter on site. The regular membership fee is $9 per year and the student
membership fee is $3 per year. Please have a check for the amount due, payable
to SFASA, and a photo ID ready for this purpose. If paying by cash, please bring
the exact amount. We do not accept credit card payments at this time.
Abstract
Is a data scientist just a statistician who lives in San Francisco, and is
data science just statistics on a Mac? This talk will provide
some answers and thoughts related to this question. The speaker will share
practices and lessons learned as a statistician working on big/fast data
problems while wearing the data scientist title.
The first part of the talk will cover techniques and
concepts that are useful in data science and big/fast data analytics but are
less often considered or embraced by the traditional statistical world. The
speaker will dive into both the stream (online) and batch (offline) computing
paradigms to illustrate this potential gap. Practical examples based on the
speaker's industry experience will support the presentation. The second part will
briefly introduce the big data "A" team, namely Apache Spark, Apache
Cassandra and Apache Kafka, a set of open source software packages that are
crucial for the recent advancement in big/fast data analytics.
Short Bio of the Speaker
Dr. John Yijiang Li received his
Ph.D. in Biostatistics from the University of Michigan at Ann Arbor. Part of
his dissertation work on optimizing Kidney Paired Donation (KPD) programs was
awarded the IBS (ENAR) distinguished paper award. After his doctorate, John
joined Google to help evaluate and improve its search products, where he was a
lead data scientist for search suggest and search feature (user experience)
projects. John left Google for Caspida, an early-stage startup in enterprise
security (later acquired by Splunk), where he serves as a principal data
scientist tackling challenging machine learning problems related to
cyberattacks. John is passionate
about applying statistical learning methods with open-source technologies in
the "big data" world.