Statistically Significant
Posts
Monitoring Price Fluctuations of Book Trade-In Values on Amazon
I am planning to finish school soon and I would like to shed some weight before moving on. I have collected a fair number of books that …
Andrew J. Landgraf
Apr 8, 2015
3 min read
data, R, scrape
Time Stacking and Time Slicing in R
Time lapses are a fun way to quickly show a long period of time. They typically involve setting up your camera on a tripod and taking …
Andrew J. Landgraf
Dec 24, 2014
3 min read
photography, R
Yet Another Baseball Defense Statistic
Fangraphs recently published an interesting dataset that measures defensive efficiency of fielders. For each player, the Inside Edge …
Andrew J. Landgraf
Apr 22, 2014
7 min read
statistics, data, baseball, R
Hastie and Tibshirani Interview Jerome Friedman
Trevor Hastie and Rob Tibshirani are currently teaching a MOOC covering an introduction to statistical learning. I am very familiar …
Andrew J. Landgraf
Mar 2, 2014
2 min read
statistics, data
Top Songs by Artist on CD102.5 in 2013
In a previous post, I showed you how to scrape playlist data from Columbus, OH alternative rock station CD102.5. Since it's the end of …
Andrew J. Landgraf
Dec 27, 2013
3 min read
shiny, data, ggplot, Visualization, R
When Did CD102.5 Book the Summerfest Artists?
CD1025’s Playlist and Summerfest. Last time, I showed you how to download CD1025’s playlist going back to last year and did some …
Aug 27, 2013
4 min read
statistics, ggplot, Visualization, R
Downloading and Analyzing CD1025's Playlist
CD1025 is an “alternative” radio station here in Columbus. They are one of the few remaining radio stations that are independently …
Aug 20, 2013
5 min read
data, exploratory data analysis, ggplot, Visualization, R
What Is the Probability of a 16 Seed Beating a 1 Seed?
Note: I started this post way back when the NCAA men's basketball tournament was going on, but didn't finish it until now. Since the …
Apr 21, 2013
6 min read
statistics, GAM, college basketball, ggplot, R
Easily Access Academic Journals Off Campus with a Firefox Bookmark
As a grad student, I do lots of searches for research related to my own. When I am off campus, a lot of the relevant results are not …
Apr 18, 2013
2 min read
firefox, productivity
Copying Data from Excel to R and Back
A lot of times we are given a data set in Excel format and we want to run a quick analysis using R's functionality to look at advanced …
Feb 24, 2013
4 min read
data, R
Text Decryption Using MCMC
The famous probabilist and statistician Persi Diaconis wrote an article not too long ago about the "Markov chain Monte Carlo (MCMC) …
Jan 23, 2013
8 min read
statistics, ggplot, Visualization, Bayesian, optimization, R
Restricted Boltzmann Machines in R
Restricted Boltzmann Machines (RBMs) are an unsupervised learning method (like principal components). An RBM is a probabilistic and …
Jan 14, 2013
6 min read
Deep Learning, ggplot, Visualization, machine learning, collaborative filtering, R
Factor Analysis of Baseball's Hall of Fame Voters
Jan 9, 2013
6 min read
statistics, Hall of Fame, Visualization, baseball, Factor Analysis, R
Quick Post About Getting and Plotting Polls in R
With the election nearly upon us, I wanted to share an easy way I just found to download polling data and graph a few with ggplot2. …
Nov 5, 2012
3 min read
Election, ggplot, Visualization, Polls, R
Finding the Best Subset of a GAM using Tabu Search and Visualizing It in R
Finding the best subset of variables for a regression is a very common task in statistics and machine learning. There are statistical …
Aug 24, 2012
7 min read
GAM, ggplot, Visualization, optimization, tabu search, R
A Matrix Factorization Model for Hitter/Pitcher Matchups
Matrix factorization has proven to be one of the most effective ways to do collaborative filtering. The most common example of …
Aug 10, 2012
6 min read
libFM, baseball, collaborative filtering, Matrix Factorization
The Magical Sparse Matrix
I have been toying around with Kaggle's Million Song Dataset Challenge recently because I have some interest in collaborative filtering …
Jul 20, 2012
2 min read
data, MATLAB
Random Forest Variable Importance
Random forests™ are great. They are one of the best "black-box" supervised learning methods. If you have lots of data and lots of …
Jul 19, 2012
7 min read
statistics, Random forest, R
Rounding in R
Forgive me if you are already aware of this, but I found it quite alarming. I know that most code is interpreted by the computer in …
Jun 15, 2012
6 min read
MATLAB, R
Space Time Swing Probability Plot for Ichiro
I was having some fun with PITCHf/x data and generalized additive models. PITCHf/x keeps track of the trajectory, path, location of …
May 30, 2012
2 min read
GAM, Visualization, baseball, R
Sending a Text in R
Don't you hate it when you are running a long piece of code and you keep checking the results every 15 minutes, hoping it will finish? …
May 25, 2012
1 min read
R
Cleveland Indians' Attendance
Recently, Chris Perez, the closer for the Indians, displayed some frustration with the fans for not supporting the team. Currently, …
May 20, 2012
3 min read
Visualization, baseball, R
What's Up with Albert Pujols?
After signing a huge deal with the Angels, Pujols has been having a really bad year. He hasn't hit a home run this year, breaking a …
May 5, 2012
3 min read
Bayesian, baseball, R
Visualizing the Correlations of a Matrix
Correlation matrices are a common way to look at the dependence of a set of variables. When the variables have spatial relationships, …
Feb 17, 2012
3 min read
Visualization, R
Unsupervised Image Segmentation with Spectral Clustering with R
That title is quite a mouthful. This quarter, I have been reading papers on Spectral Clustering for a reading group. The basic goal of …
Feb 12, 2012
5 min read
image segmentation, spectral clustering, R
Using JMP to Create a Map
I am a big fan of SAS's JMP software. It is the first statistical program I learned, and I really like how they emphasize visualization. …
Mar 10, 2011
2 min read
JMP, Visualization
Empirical Bayes Estimation of On Base Percentage
I guess you could call this On Bayes Percentage. *cough* Fresh off learning Bayesian techniques in one of my classes last quarter, I …
Dec 31, 2010
5 min read
Bayesian, baseball, R
Week 3 NFL Survival Odds
Continuing my series of trying to figure out which team is best to pick for survival football and then ignoring it, I present my week 3 …
Sep 21, 2010
2 min read
Week 2 NFL Survival Odds
So this is late, but I already did the analysis and I wanted to share my results for posterity. I used the same method as last time to …
Sep 20, 2010
3 min read
NFL Survival Odds
The NFL season is starting tomorrow night and I am in a survival league this year. If you are not familiar, in a survival league, each …
Sep 8, 2010
4 min read
Why We Blog
If the past is a predictor of future performance, then there is about a 99.3% chance that I will stop updating this in 2 weeks. But you …
Aug 29, 2010
1 min read