Introduction
MARS® (Multivariate Adaptive Regression Splines) is a companion to CART that focuses on the development and deployment of accurate and easy-to-understand regression models.
MARS excels at finding thresholds and breaks in the relationships between a set of inputs and an outcome, and is thus well suited to detecting changes in the behavior of individuals or processes over time. Of all the Salford tools, MARS is the most adept at working with the small data sets frequently encountered in engineering contexts. MARS has also been involved in winning data mining competitions focused on large-database customer relationship management (CRM) topics. Areas where MARS has performed very well include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).
MARS is an innovative and flexible modeling tool that automates the building of accurate predictive models for continuous and binary dependent variables. Multivariate Adaptive Regression Splines was developed in the early 1990s by Jerry Friedman, a world-renowned statistician and one of the co-developers of CART. Salford Systems' MARS, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.
MARS excels at finding optimal variable transformations and interactions, the complex structure that often hides in high-dimensional data. In doing so, this new-generation approach to data mining uncovers business-critical patterns and relationships that are difficult, if not impossible, for other approaches to uncover.
Given a target variable and a set of candidate predictor variables, MARS automates all aspects of model development, including:
Separating relevant from irrelevant predictors
Large numbers of variables are examined using efficient algorithms, and all promising variables are identified.
Transforming predictor variables exhibiting a nonlinear relationship with the target variable
Every variable selected for entry into the model is repeatedly checked for a nonlinear response. Highly nonlinear functions can be traced closely by what is essentially piecewise linear regression.
Determining interactions between predictor variables
MARS repeatedly searches through the interactions allowed by the analyst. Unlike recursive partitioning schemes, MARS models may be constrained to forbid interactions of certain types, thus allowing some variables to enter only as main effects, while allowing other variables to enter as interactions, but only with a specified subset of other variables.
Handling missing values with new nested variable techniques
Certain variables are deemed to be meaningful (possibly non-missing) in the model only if particular conditions are met (e.g., X has a meaningful non-missing value only if categorical variable Y has a value in some range).
Conducting extensive self-tests to protect against overfitting
The user can choose to reserve a random subset of the data for testing, or use v-fold cross-validation to tune the final model selection parameters.
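The transformation step in the list above can be pictured with a small sketch. This is not Salford's implementation: it assumes a single predictor, synthetic data, and a plain exhaustive least-squares search over candidate knots, and the names hinge_pair and best_knot are invented for illustration.

```python
import numpy as np

def hinge_pair(x, knot):
    """Return the mirrored hinge columns max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def best_knot(x, y, candidates):
    """Pick the candidate knot whose hinge-pair fit has the lowest residual sum of squares."""
    best = None
    for knot in candidates:
        right, left = hinge_pair(x, knot)
        # Design matrix: intercept, right-hand hinge, left-hand hinge.
        design = np.column_stack([np.ones_like(x), right, left])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[1]:
            best = (float(knot), rss, coef)
    return best

# Synthetic data (illustrative only): flat until x = 4, then rising with slope 2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
y = 1.0 + 2.0 * np.maximum(0.0, x - 4.0) + rng.normal(0.0, 0.05, x.size)

knot, rss, coef = best_knot(x, y, candidates=x[5:-5])
print(f"selected knot: {knot:.2f}, slope right of knot: {coef[1]:.2f}")
```

On data like this the search should settle near the true break point, with a slope close to 2 to the right of the knot and close to 0 to the left. A full MARS fit repeats this kind of search across many variables and then prunes terms, which the sketch does not attempt.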
MARS enables analysts to search rapidly through a large space of candidate models and to identify a well-fitting one quickly, providing insights that can lead to a competitive advantage. With an easy-to-use GUI, intelligent default settings, and clear, attractive output, the software makes MARS' innovations accessible to analysts at all levels.
MARS for Windows also incorporates two alternative control modes that extend the program's features and capabilities. In addition to controlling MARS with the GUI, you can also issue commands at the command prompt or submit a command file.
User-Friendly Graphical User Interface
MARS' easy-to-use GUI allows the user to control the variables and functional forms to be entered into the model and the interactions to be considered or forbidden, while allowing the MARS algorithm to optimize those parts of the model the analyst chooses to leave free. Once the model is selected, the user can easily remove or add terms, instantly see the impact of changes on model fit, review diagnostics that assist in model selection, save the model, and apply it to new data for prediction.
MARS output is an easy-to-deploy regression model that can be automatically applied to new data from within MARS itself or exported as ready-to-run SAS® and C source code. To facilitate interpretation of the model, the output also includes interpretive summary reports as well as exportable two- and three-dimensional curve and surface plots.
The MARS® modeling engine is ideal for users who prefer results in a form similar to traditional regression while capturing essential nonlinearities and interactions. The MARS methodology’s approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other regression methods to reveal. The MARS modeling engine builds its model by piecing together a series of straight lines with each allowed its own slope. This permits the MARS modeling engine to trace out any pattern detected in the data.
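As a rough illustration of piecing straight lines together, the sketch below evaluates a hypothetical one-input model built from hinge terms of the form max(0, x - knot). The intercept, coefficients, knots, and the names hinge and predict are all made up for the example and do not come from any real MARS output.

```python
def hinge(value, knot, direction):
    """direction=+1 gives max(0, value - knot); direction=-1 gives max(0, knot - value)."""
    return max(0.0, direction * (value - knot))

# Hypothetical model: an intercept plus weighted hinge terms on a single input.
INTERCEPT = 20.0
TERMS = [
    # (coefficient, knot, direction)
    (1.5, 10.0, +1),   # slope becomes 1.5 once x exceeds 10
    (-2.0, 25.0, +1),  # slope changes by -2.0 once x exceeds 25
]

def predict(x):
    """Sum the intercept and every weighted hinge term at input x."""
    return INTERCEPT + sum(c * hinge(x, k, d) for c, k, d in TERMS)

for x in (0.0, 10.0, 20.0, 25.0, 30.0):
    print(x, predict(x))
```

The prediction is flat below 10, rises with slope 1.5 between 10 and 25, and falls with slope -0.5 above 25, so each segment has its own slope while the curve stays continuous at the knots. In MARS, interactions are represented as products of such hinge terms on different inputs.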
The MARS model is designed to predict numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend during a website visit. The MARS engine is also capable of producing high-quality classification models for a yes/no outcome. The MARS engine performs variable selection, variable transformation, interaction detection, and self-testing, all automatically and at high speed.
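The self-testing idea can be sketched with v-fold cross-validation, the option described earlier for tuning model selection. The example below is a minimal sketch under stated assumptions, not the procedure MARS itself runs: the data are synthetic, the knot location is taken as known, and fit_and_score and v_fold_mse are hypothetical helper names. It compares a straight-line model with a model that adds one hinge term, using only held-out error.

```python
import numpy as np

def fit_and_score(train_x, train_y, test_x, test_y, knot):
    """Fit intercept + slope (+ optional hinge at `knot`); return held-out mean squared error."""
    def design(x):
        cols = [np.ones_like(x), x]
        if knot is not None:
            cols.append(np.maximum(0.0, x - knot))
        return np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design(train_x), train_y, rcond=None)
    resid = test_y - design(test_x) @ coef
    return float(np.mean(resid ** 2))

def v_fold_mse(x, y, knot, v=5, seed=0):
    """Average held-out MSE over v folds for one candidate model."""
    order = np.random.default_rng(seed).permutation(x.size)
    folds = np.array_split(order, v)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(
            fit_and_score(x[train_idx], y[train_idx], x[test_idx], y[test_idx], knot)
        )
    return float(np.mean(scores))

# Synthetic data (illustrative only): flat until x = 6, then rising with slope 2.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 3.0 + 2.0 * np.maximum(0.0, x - 6.0) + rng.normal(0.0, 0.2, x.size)

mse_line = v_fold_mse(x, y, knot=None)
mse_hinge = v_fold_mse(x, y, knot=6.0)
print(f"straight line: {mse_line:.3f}   with hinge at 6: {mse_hinge:.3f}")
```

Because every score is computed on rows the fit never saw, a term that only chases noise would not lower the held-out error, which is the protection against overfitting that the self-tests are meant to provide.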
Stat-200 is a remarkably comprehensive general statistics package. It incorporates all the descriptive statistics, parametric and non-parametric statistical tests, graphics, and data transforms you will need for processing, analyzing, and presenting data. Stat-200 includes 60 parametric tests and procedures, 66 non-parametric tests and procedures, and 30 different descriptive statistics. ANOVA is comprehensive and includes general N-factor analysis. Many unusual procedures are included, e.g., survival analysis. All tests have associated information windows which give literature referen