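These are my study notes for Week 3 of the Computing for Data Analysis course. This week's material was fairly hard and mostly about plotting. The notes below come mainly from the course slides; there is a lot of content, so I left it in the original English.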
1. Simulation
Functions for probability distributions in R:
 rnorm: generate random Normal variates with a given mean and standard deviation
 dnorm: evaluate the Normal probability density (with a given mean/SD) at a point (or vector of points)
 pnorm: evaluate the cumulative distribution function for a Normal distribution
 rpois: generate random Poisson variates with a given rate
Probability distribution functions usually have four functions associated with them. The functions are prefixed with a
 d for density
 r for random number generation
 p for cumulative distribution
 q for quantile function
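As a small illustration of the four prefixes, here is a sketch using the Normal and Poisson families. The variable names and the particular means, rates and sample sizes are my own choices, not from the course.

```r
# The four prefixes applied to the Normal distribution
set.seed(1)                                # fix the seed so the draws repeat
draws <- rnorm(5, mean = 10, sd = 2)       # r: five random Normal variates
dens  <- dnorm(0)                          # d: standard Normal density at 0 (about 0.3989)
prob  <- pnorm(1.96)                       # p: P(X <= 1.96), about 0.975
quant <- qnorm(0.975)                      # q: the 97.5% quantile, about 1.96

# The same naming scheme carries over to other families, e.g. Poisson
counts <- rpois(5, lambda = 3)             # five random Poisson counts with rate 3
ppois(2, lambda = 3)                       # P(X <= 2) for a Poisson with rate 3
```

Note that p and q are inverses of each other: pnorm(qnorm(0.975)) gives back 0.975.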
Summary
 Drawing samples from specific probability distributions can be done with r* functions
 Standard distributions are built in: Normal, Poisson, Binomial, Exponential, Gamma, etc.
 The sample function can be used to draw random samples from arbitrary vectors
 Setting the random number generator seed via set.seed is critical for reproducibility
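The points in the summary can be tried out with a short sketch like the one below. The seed values and vectors are arbitrary examples of mine.

```r
# Resetting the seed reproduces exactly the same random numbers
set.seed(20)
a <- rnorm(3)
set.seed(20)
b <- rnorm(3)
identical(a, b)                            # TRUE: same seed, same draws

# sample() draws from an arbitrary vector
set.seed(1)
s     <- sample(1:10, 4)                   # four values, without replacement
coins <- sample(c("H", "T"), 10, replace = TRUE)  # ten coin flips, with replacement
perm  <- sample(1:10)                      # no size given: a random permutation
```

Without set.seed, every run of the script would give different values for s, coins and perm.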
2. Plotting with Base Graphics
Base graphics are usually constructed piecemeal, with each aspect of the plot handled separately through a series of function calls; this is often conceptually simpler and allows plotting to mirror the thought process. Base graphics are used most commonly and are a very powerful system for creating 2D graphics.
 Calling plot(x, y) or hist(x) will launch a graphics device (if one is not already open) and draw the plot on the device
 If the arguments to plot are not of some special class, then the default method for plot is called; this function has many arguments, letting you set the title, x axis label, y axis label, etc.
 The base graphics system has many parameters that can be set and tweaked; these parameters are documented in ?par; it wouldn’t hurt to memorize this help page!
The par function is used to specify global graphics parameters that affect all plots in an R session. These parameters can often be overridden as arguments to specific plotting functions.
 pch: the plotting symbol (default is open circle)
 lty: the line type (default is solid line), can be dashed, dotted, etc.
 lwd: the line width, specified as an integer multiple (default is 1)
 col: the plotting color, specified as a number, string, or hex code; the colors function gives you a vector of colors by name (default is black)
 las: the orientation of the axis labels on the plot
 bg: the background color (default is transparent)
 mar: the margin size (default is bottom 5.1, left 4.1, top 4.1, right 2.1)
 oma: the outer margin size (default is 0 for all sides)
 mfrow: number of plots per row, column (plots are filled row-wise)
 mfcol: number of plots per row, column (plots are filled column-wise)
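To see how the par parameters above fit together, here is a minimal sketch. The data are simulated and the specific values (two panels, filled circles, dashed line) are just examples of mine. Saving the value returned by par makes it possible to restore the previous settings afterwards.

```r
# Set global parameters with par(); the old values are returned for later restoring
old_par <- par(mfrow = c(1, 2),            # 1 row, 2 columns, filled row-wise
               mar = c(4, 4, 2, 1),        # margins: bottom, left, top, right
               las = 1)                    # axis labels always horizontal

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

# Parameters such as pch, col, lty and lwd can be overridden per plotting call
plot(x, y, pch = 19, col = "blue", main = "pch = 19, col = blue")
plot(sort(x), type = "l", lty = 2, lwd = 2, main = "lty = 2, lwd = 2")

par(old_par)                               # restore the previous global settings
```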
 plot: make a scatterplot, or other type of plot depending on the class of the object being plotted
 lines: add lines to a plot, given a vector of x values and a corresponding vector of y values (or a 2-column matrix); this function just connects the dots
 points: add points to a plot
 text: add text labels to a plot using specified x, y coordinates
 title: add annotations to x, y axis labels, title, subtitle, outer margin
 mtext: add arbitrary text to the margins (inner or outer) of the plot
 axis: adding axis ticks/labels
The list of devices is found in ?Devices; there are also devices created by users on CRAN
 pdf: useful for line-type graphics, vector format, resizes well, usually portable
 postscript: older format, also vector format and resizes well, usually portable, can be used to create encapsulated postscript files, Windows systems often don’t have a postscript viewer
 xfig: good if you use Unix and want to edit a plot by hand
 png: bitmapped format, good for line drawings or images with solid colors, uses lossless compression (like the old GIF format), most web browsers can read this format natively, good for plotting many many many points, does not resize well
 jpeg: good for photographs or natural scenes, uses lossy compression, good for plotting many many many points, does not resize well, can be read by almost any computer and any web browser, not great for line drawings
 bitmap: needed to create bitmap files (png, jpeg) in certain situations (uses Ghostscript), also can be used to create a variety of other bitmapped formats not mentioned
 bmp: a native Windows bitmapped format
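The annotation functions and the file devices listed above can be combined in one sketch: open a pdf device, build the plot piece by piece, then close the device. The file name and the simulated data are my own placeholders; the file goes to a temporary directory so nothing is overwritten.

```r
# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")
pdf(out_file, width = 6, height = 4)       # open a pdf graphics device

set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
fit <- lm(y ~ x)                           # simple least-squares line to overlay

plot(x, y, type = "n", axes = FALSE, xlab = "", ylab = "")  # empty plot region
points(x, y, pch = 20)                     # add the points
ord <- order(x)
lines(x[ord], fitted(fit)[ord], lty = 2, col = "red")       # connect fitted values
text(min(x), max(y), "least-squares fit", pos = 4)          # label at given x, y
axis(1)                                    # x axis ticks and labels
axis(2)                                    # y axis ticks and labels
box()                                      # frame around the plot region
title(main = "Simulated data", xlab = "x", ylab = "y")
mtext("drawn on the pdf device", side = 3, line = 0.3, cex = 0.8)

dev.off()                                  # close the device; this writes the file
file.exists(out_file)
```

Swapping pdf(...) for png(...) (with width and height then given in pixels by default) should send the same plot to a bitmap file instead.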
3. Plotting with Lattice
Lattice Functions:
 xyplot: this is the main function for creating scatterplots
 bwplot: box-and-whiskers plots (“boxplots”)
 histogram: histograms
 stripplot: like a boxplot but with actual points
 dotplot: plot dots on “violin strings”
 splom: scatterplot matrix; like pairs in base graphics system
 levelplot, contourplot: for plotting “image” data
Lattice functions generally take a formula for their first argument, usually of the form: y ~ x | f * g
 On the left of the ~ is the y variable, on the right is the x variable
 After the | are conditioning variables; they are optional; the * indicates an interaction
 The second argument is the data frame or list from which the variables in the formula should be obtained.
 If no data frame or list is passed, then the parent frame is used.
 If no other arguments are passed, there are defaults that can be used.
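A minimal sketch of the formula interface described above, with one conditioning variable. The data frame d and its columns are simulated for illustration and are not from the course; lattice itself ships with standard R installations.

```r
library(lattice)

set.seed(1)
d <- data.frame(x = rnorm(120),
                f = gl(3, 40, labels = c("A", "B", "C")))  # 3-level factor
d$y <- d$x + as.numeric(d$f) + rnorm(120)   # y depends on x and on the group

# y against x, one panel per level of f; variables are looked up in d
p <- xyplot(y ~ x | f, data = d, layout = c(3, 1))
print(p)   # explicit print, so the plot is also drawn when run via source()
```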
Lattice functions behave differently from base graphics functions in one critical way.
 Base graphics functions plot data directly to the graphics device
 Lattice graphics functions return an object of class trellis.
 The print methods for lattice functions actually do the work of plotting the data on the graphics device.
 Lattice functions return “plot objects” that can, in principle, be stored (but it’s usually better to just save the code + data).
 On the command line, trellis objects are autoprinted so that it appears the function is plotting the data
 Lattice functions have a panel function which controls what happens inside each panel of the entire plot.
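The sketch below shows both points: the call returns a trellis object that is only drawn when printed, and a custom panel function controls what is drawn inside each panel. The data are simulated, and the choice of a median line plus a regression line is just an example of mine. No data argument is passed, so the variables are taken from the calling environment.

```r
library(lattice)

set.seed(2)
x <- rnorm(100)
f <- gl(2, 50, labels = c("Group 1", "Group 2"))
y <- x + rnorm(100)

p <- xyplot(y ~ x | f,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)               # the default scatterplot
              panel.abline(h = median(y), lty = 2)  # median of y within this panel
              panel.lmline(x, y, col = "red")       # regression line for this panel
            })

class(p)   # "trellis": nothing has been drawn yet
print(p)   # the print method does the actual plotting
```

Inside a function or a loop there is no autoprinting, so the explicit print(p) is needed there.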
4. Graphics with ggplot2
ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson’s Grammar of Graphics, a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers. (from Wikipedia)
The ggplot2 package implements a system for creating graphics in R based on a comprehensive and coherent grammar. This provides a consistency to graph creation often lacking in R, and allows the user to create graph types that are innovative and novel.
The simplest approach for creating graphs in ggplot2 is through qplot(), the quick plot function. The format is: qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)
 x : x values
 y : y values
 data: data frame to use (optional). If not specified, will create one, extracting vectors from the current environment.
 facets : faceting formula to use. Picks facet_wrap or facet_grid depending on whether the formula is one-sided or two-sided
 margins : whether or not margins will be displayed
 geom : character vector specifying geom to use. Defaults to “point” if x and y are specified, and “histogram” if only x is specified.
 stat : character vector specifying statistics to use
 position : character vector giving position adjustment to use
 xlim : limits for x axis
 ylim : limits for y axis
 log : which variables to log transform (“x”, “y”, or “xy”)
 main : character vector or expression for plot title
 xlab : character vector or expression for x axis label
 ylab : character vector or expression for y axis label
 asp : the y/x aspect ratio
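A short sketch of qplot using a few of the arguments listed above. It assumes the ggplot2 package is installed and uses the mpg data set that comes with it; the labels are my own. As far as I am aware, recent ggplot2 releases mark qplot() as deprecated in favour of ggplot(), so a deprecation warning may appear.

```r
library(ggplot2)   # assumes ggplot2 is installed; mpg is bundled with it

# x and y given: geom defaults to "point"; color is mapped to a variable
p1 <- qplot(displ, hwy, data = mpg, color = drv,
            xlab = "Engine displacement (litres)",
            ylab = "Highway miles per gallon",
            main = "Fuel economy by drive type")

# only x given: a histogram, with one row of panels per level of drv
# (two-sided faceting formula)
p2 <- qplot(hwy, data = mpg, geom = "histogram", binwidth = 2,
            facets = drv ~ .)

print(p1)
print(p2)
```

Like lattice, qplot returns a plot object, so the explicit print calls are what draw the plots when the code is not run at the top level.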
More about ggplot2 package: ggplot2 reference manual; ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham; graphics with ggplot2 in QuickR
Comments
kenny: If you already know some Matlab, this should all carry over, right?