# Regression Analysis


Quantitative Methods Project: Regression Analysis for the Pricing of Players in the Indian Premier League

### Executive Summary

The selling price of a player at the IPL auction is affected by more than one factor. Many of these factors affect each other, and others influence the selling price only indirectly. With more than 25 candidate independent variables and no clear relationship among them, the challenge in a multiple regression analysis is to build the model as carefully as possible. Of the various tools available, we used SPSS to run our regression analysis.

One of the reasons for preferring SPSS over other tools was the ease with which extraneous independent variables can be eliminated. Two methods were used to choose the best model in this project:

* Forward Model Building: independent variables are added to the model one at a time, in order of their significance, until the optimum model is reached.
* Backward Elimination: the complete set of independent variables is regressed, and the least significant predictors are eliminated one at a time until the optimum model is reached.
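The forward procedure can be illustrated with a minimal sketch. It is not the SPSS implementation: SPSS enters variables using significance (F-to-enter) criteria, whereas this sketch swaps in adjusted R Square as the stopping rule to stay dependency-free. The function names, the `min_gain` threshold, and the toy data are all hypothetical.

```python
def ols_r2(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))]
         for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(a[i][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for i in range(p):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [v - f * w for v, w in zip(a[i], a[c])]
    beta = [a[j][p] / a[j][j] for j in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(data, y, min_gain=1e-6):
    """Add one predictor at a time while adjusted R^2 improves by min_gain."""
    selected, best = [], float("-inf")
    remaining = list(data)
    while remaining:
        scores = {}
        for name in remaining:
            cols = [data[v] for v in selected + [name]]
            scores[name] = adjusted_r2(ols_r2(cols, y), len(y), len(cols))
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best < min_gain:
            break  # no remaining variable improves the model enough
        selected.append(candidate)
        best = scores[candidate]
        remaining.remove(candidate)
    return selected


# Toy data (hypothetical): y depends on x1 and z only; w is irrelevant.
toy = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "z": [0, 1, 0, 1, 1, 0, 1, 0],
    "w": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [5 + 2 * a + 4 * b for a, b in zip(toy["x1"], toy["z"])]
chosen = forward_select(toy, y)
```

Backward elimination runs the same loop in reverse: start from all predictors and drop the one whose removal hurts the criterion least, until every removal makes the model worse.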

Our analysis has shown that the following variables are the most significant predictors of the selling price:

* COUNTRY: whether the player is of Indian origin or not
* AGE_1: whether the player is below 25 years or not
* T_RUNS: total number of test runs scored by the player
* ODI_RUNS: total number of runs scored in ODI matches
* ODI_WICKET: total number of wickets taken by the player
* RUNS_S: total number of runs scored by the player
* BASE_PRICE: the base price of the player set in IPL

Using the calculated coefficients, the regression model equation can be stated as:

SOLD PRICE = -13366.247 + 219850.349(COUNTRY) + 204492.531(AGE_1) - 59.957(T_RUNS) + 53.878(ODI_RUNS) + 491.636(ODI_WICKET) + 194.445(RUNS_S) + 1.442(BASE_PRICE)

### Analysis of Results

The following is a snapshot of the estimated best regression model (explained in depth in the answer to Q1).

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772 (a) | .597 | .573 | 265690.463 |

a. Predictors: (Constant), BASE_PRICE, AGE_1, RUNS_S, ODI_WICKET, COUNTRY, T_RUNS, ODI_RUNS

* In the estimated regression model, BASE_PRICE is found to be the highest-impact predictor. This implies that, more than anything else, the benchmark base price of a player is the single strongest determinant of his selling price.
* The analysis shows that T_RUNS, i.e. the number of runs scored in test matches, negatively impacts the selling price of the player. Counterintuitive as it may seem, the model associates a stronger test batting record with a lower price at the IPL auction, other predictors held constant.
* The positive coefficient of AGE_1 indicates that players below 25 years of age are expected to command higher prices.
* Players from India are expected to command much higher bids than their foreign counterparts, as evidenced by the positive coefficient of COUNTRY.
* The total number of runs scored by a player (RUNS_S) positively impacts his selling price.
* The R Square value of the model is 0.597 (adjusted R Square 0.573), so the model explains only about 60% of the variation in selling price. This indicates that our regression model has limitations. The standard error of the estimate is also large, at 265690.463.

### Q3: What is the impact of the ability to score “SIXERS” on the player’s price?
To analyze the impact of the variable SIXERS, we add it to the regression model. The significance (p-value) of the t statistic for SIXERS is 0.862, far above 0.05, so we cannot reject the hypothesis that its coefficient is zero. At the same time, the significance of RUNS_S weakens to 0.054, which is just outside the 5% level. This suggests that the effect of SIXERS is already captured by the RUNS_S variable, and hence adding SIXERS to the regression is not a good idea.
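The t statistics behind this argument are simply each coefficient divided by its standard error. The sketch below recomputes them from the B and Std. Error values that SPSS reports for SIXERS and RUNS_S in the coefficients table of this model. The p-values use a normal approximation to the t distribution, which is an assumption made here to avoid external libraries; with 121 residual degrees of freedom it should land close to, but not exactly on, the SPSS figures. The function names are illustrative.

```python
import math


def t_statistic(b, std_error):
    """t statistic for testing whether a coefficient is zero."""
    return b / std_error


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# B and Std. Error as reported in the SPSS coefficients table.
t_sixers = t_statistic(379.400, 2170.467)
t_runs_s = t_statistic(180.730, 92.993)

p_sixers = two_sided_p_normal(t_sixers)
p_runs_s = two_sided_p_normal(t_runs_s)

print(f"SIXERS: t = {t_sixers:.3f}, approx. p = {p_sixers:.3f}")
print(f"RUNS_S: t = {t_runs_s:.3f}, approx. p = {p_runs_s:.3f}")
```

A p-value near 0.86 for SIXERS means the data give essentially no evidence that its coefficient differs from zero once RUNS_S is in the model.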

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772 (a) | .597 | .570 | 266752.420 |

a. Predictors: (Constant), SIXERS, AGE_1, ODI_WICKET, BASE_PRICE, COUNTRY, T_RUNS, RUNS_S, ODI_RUNS

ANOVA (b)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1.274E13 | 8 | 1.592E12 | 22.378 | .000 (a) |
| | Residual | 8.610E12 | 121 | 7.116E10 | | |
| | Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), SIXERS, AGE_1, ODI_WICKET, BASE_PRICE, COUNTRY, T_RUNS, RUNS_S, ODI_RUNS

b. Dependent Variable: SOLD_PRICE
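The Model Summary and ANOVA figures are linked by simple arithmetic, which can be checked with a short sketch. The inputs are the rounded sums of squares and degrees of freedom printed in the ANOVA table, so the results agree with the SPSS output only up to rounding. The function name and dictionary keys are illustrative.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Derive mean squares, F, R Square and adjusted R Square from ANOVA sums."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f": ms_regression / ms_residual,
        "r_square": ss_regression / ss_total,
        # Adjusted R Square compares residual variance with total variance.
        "adjusted_r_square": 1.0 - ms_residual / (ss_total / df_total),
    }


# Values from the ANOVA table of the model that includes SIXERS.
summary = anova_summary(1.274e13, 8.610e12, 8, 121)
for key, value in summary.items():
    print(f"{key}: {value:.4g}")
```

Because adjusted R Square divides the residual sum of squares by its degrees of freedom, adding a predictor that barely reduces the residual (as SIXERS does) lowers it, which is why it falls from .573 to .570 while R Square stays at .597.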

Coefficients (a)

| Model | Variable | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -13757.183 | 49696.116 | | -.277 | .782 | -112143.752 | 84629.386 |
| | COUNTRY | 221562.322 | 54461.595 | .269 | 4.068 | .000 | 113741.230 | 329383.414 |
| | AGE_1 | 203637.395 | 77003.067 | .165 | 2.645 | .009 | 51189.514 | 356085.275 |
| | T_RUNS | -58.977 | 17.455 | -.479 | -3.379 | .001 | -93.533 | -24.421 |
| | ODI_RUNS | 53.455 | 16.302 | .471 | 3.279 | .001 | 21.182 | 85.728 |
| | ODI_WICKET | 490.322 | 227.281 | .134 | 2.157 | .033 | 40.358 | 940.286 |
| | RUNS_S | 180.730 | 92.993 | .273 | 1.943 | .054 | -3.373 | 364.834 |
| | BASE_PRICE | 1.437 | .177 | .541 | 8.112 | .000 | 1.087 | 1.788 |
| | SIXERS | 379.400 | 2170.467 | .022 | .175 | .862 | -3917.611 | 4676.411 |

a. Dependent Variable: SOLD_PRICE

### Q4: What is the impact of the predictors batting strike rate and bowling strike rate on pricing? Identify the predictor that has the highest impact on the price of players.

To analyze the impact of batting strike rate (SR_B) and bowling strike rate (SR_BL) on pricing, we first regressed SOLD PRICE on these two independent variables alone. The R Square value is only .051, so on their own the two variables explain about 5% of the variation in SOLD PRICE. Although the overall F test is significant at the 5% level (Sig. = .036), this is far too weak to serve as a useful regression relationship.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .226 (a) | .051 | .036 | 399371.359 |

a. Predictors: (Constant), SR_BL, SR_B

ANOVA (b)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1.092E12 | 2 | 5.462E11 | 3.424 | .036 (a) |
| | Residual | 2.026E13 | 127 | 1.595E11 | | |
| | Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), SR_BL, SR_B

b. Dependent Variable: SOLD_PRICE

Coefficients (a)

| Model | Variable | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 217351.463 | 123693.460 | | 1.757 | .081 | -27415.572 | 462118.497 |
| | SR_B | 2188.101 | 980.961 | .193 | 2.231 | .027 | 246.955 | 4129.246 |
| | SR_BL | 3502.089 | 2307.595 | .131 | 1.518 | .132 | -1064.224 | 8068.402 |

a. Dependent Variable: SOLD_PRICE

Next, we added batting strike rate and bowling strike rate to the previous list of independent variables in the regression model. The significance values of the t statistics for batting strike rate and bowling strike rate are .958 and .935 respectively, far above 0.05, so neither coefficient is significantly different from zero.

Hence we can conclude that, once the other predictors are in the model, these two variables have no significant regression relationship with the dependent variable.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772 (a) | .597 | .566 | 267884.659 |

a. Predictors: (Constant), BASE_PRICE, SR_BL, SR_B, COUNTRY, ODI_WICKET, AGE_1, ODI_RUNS, RUNS_S, T_RUNS

ANOVA (b)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1.274E13 | 9 | 1.415E12 | 19.721 | .000 (a) |
| | Residual | 8.611E12 | 120 | 7.176E10 | | |
| | Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), BASE_PRICE, SR_BL, SR_B, COUNTRY, ODI_WICKET, AGE_1, ODI_RUNS, RUNS_S, T_RUNS

b. Dependent Variable: SOLD_PRICE

Coefficients (a)

| Model | Variable | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -15451.294 | 92855.275 | | -.166 | .868 | -199298.275 | 168395.688 |
| | SR_B | 38.111 | 729.828 | .003 | .052 | .958 | -1406.897 | 1483.119 |
| | SR_BL | -149.541 | 1819.943 | -.006 | -.082 | .935 | -3752.901 | 3453.819 |
| | AGE_1 | 207089.34 | 81757.329 | .168 | 2.533 | .013 | 45215.512 | 368963.155 |
| | COUNTRY | 220464.530 | 54256.883 | .267 | 4.063 | .000 | 113039.678 | 327889.382 |
| | T_RUNS | -60.151 | 17.118 | -.489 | -3.514 | .001 | -94.044 | -26.258 |
| | ODI_RUNS | 53.932 | 16.258 | .475 | 3.317 | .001 | 21.742 | 86.122 |
| | ODI_WICKET | 497.937 | 240.782 | .136 | 2.068 | .041 | 21.206 | 974.668 |
| | RUNS_S | 193.412 | 53.528 | .293 | 3.613 | .000 | 87.430 | 299.393 |
| | BASE_PRICE | 1.443 | .178 | .543 | 8.101 | .000 | 1.090 | 1.795 |

a. Dependent Variable: SOLD_PRICE
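The 95% confidence intervals in the coefficients table give the same verdict as the significance column: an interval that contains zero corresponds to a coefficient that is not significant at the 5% level. A minimal sketch of that check follows, assuming a rounded critical value of 1.98 for the t distribution with 120 residual degrees of freedom. That constant and the function names are assumptions made here, so the bounds match the SPSS output only approximately.

```python
# Approximate 97.5th percentile of the t distribution with 120 degrees of
# freedom (rounded; an assumption of this sketch, not taken from SPSS output).
T_CRITICAL_120_DF = 1.98


def confidence_interval(b, std_error, t_critical=T_CRITICAL_120_DF):
    """Approximate 95% confidence interval for a regression coefficient."""
    margin = t_critical * std_error
    return b - margin, b + margin


def excludes_zero(interval):
    """True when the interval lies entirely on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# B and Std. Error as reported in the coefficients table above.
ci_sr_b = confidence_interval(38.111, 729.828)
ci_sr_bl = confidence_interval(-149.541, 1819.943)
ci_base_price = confidence_interval(1.443, 0.178)

for name, ci in [("SR_B", ci_sr_b), ("SR_BL", ci_sr_bl), ("BASE_PRICE", ci_base_price)]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f}) excludes zero: {excludes_zero(ci)}")
```

Both strike-rate intervals straddle zero, while the BASE_PRICE interval sits well above it.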

Referring to the regression output given in Q1, the standardized coefficient of BASE_PRICE is the highest, at .543. Hence we can conclude that BASE_PRICE is the predictor with the highest impact on the price of players.

### Q8: How much should Mumbai Indians offer Sachin Tendulkar if they would like to retain him? Is the model sufficient to predict the price of Icon players?

SOLD PRICE = -13366.247 + 219850.349(COUNTRY) + 204492.531(AGE_1) - 59.957(T_RUNS) + 53.878(ODI_RUNS) + 491.636(ODI_WICKET)