Daniel Code: 2015

31/12/2015

[Excel]IF Function

一、IF

作用：根据条件进行判断并返回不同的值。

示例：

1、如果A1单元格值大于100，显示“完成”，否则显示“未完成”

=IF(A1>100，"完成"，"未完成")

2、判断成绩

=IF(A1<60，"不及格"，IF(A1<80，"良好"，"优秀"))

3、如果A1的值大于60并且B1不为空，显示“已结束”否则显示“未结束”

=IF(AND(A1>60，B1<>"")，"已结束"，"未结束")

二、IFerror

作用：把错误值显示为指定的内容

示例：

1、如果A1/B1返回错误值，则显示空

=Iferror(a1/b1，""）

2、如果Vlookup查找不到值，则显示空

=Iferror(vlookup(省略)，"")

三、CountIF

作用：根据条件统计个数

示例：

1、计算A列有多少财务部的人员

=Countif(a：a，"财务部")

2、计算A列有多少个包括“北京”的公司名称

=Countif(a：a，"*北京*)

四、SumIF

作用：根据条件求和

1、统计C列大于1000的销售金额之和

=Sumif(c：c，">1000")

2、统计A列产品为“三星”所对应B列的销售个数

=Sumif(a：a，"三星"，b：b)

五、CountIFs

作用：多条件计数

示例：

1、公司1的人事部有多少人（A列公司名称，B列部门名称）

=COUNTIFS(A2：A11，"公司1"，B2：B11，"人事部")

2、工资在3000~4000之间的有多少人（D列为工资）

=COUNTIFS(D2：D11，">3000"，D2：D11，"<4000")

六、SumIFs

作用：多条件求和

示例：

1、公司1人事部工资和（A列公司名称，B列部门名称。D列工资）

=SUMIFS(D2：D11，A2：A11，"公司1"，B2：B11，"人事部")

2、A列为"电视"，B列包括34的C列数量之和

=Sumifs(C：C，A：A，"电视"，b：b，"*34*")

七、AverageIF

作用：根据条件计算平均数

示例：

1、计算C列单价小于10的平均单价。

=AVERAGEIF(C：C，"<10")

2、计算C产品的平均单价(A列为产品名称)

=AVERAGEIF(A：A，"C"，B：B)

八、AverageIFs

作用：多条件计算平均值

示例：

1、计算甲公司，经理级的平均工资（A列公司名，B列职称，C列工资金额）

=AVERAGEIFS(D2：D11，A2：A11，"甲公司"，B2：B11，"经理")

2、统计工资在4000~8000之间的平均工资

=AVERAGEIFS(D2：D11，D2：D11，">4000"，D2：D11，"<8000")

29/12/2015

[SAS]Proc Logistic

***Predicting the probability of getting into graduate school (admit=1) versus not getting in (admit=0)
(Predicting the probability of Y=1 vs Y=0)

proc logistic data=data.binary descending;

class rank / param=ref ;

model admit = gre gpa rank;

run;

Model Information
Data Set	DATA.BINARY	Written by SAS
Response Variable	ADMIT
Number of Response Levels	2
Model	binary logit
Optimization Technique	Fisher's scoring

Number of Observations Read	400
Number of Observations Used	400

Response Profile
Ordered Value	ADMIT	Total Frequency
1	1	127
2	0	273

Probability modeled is ADMIT=1.

Class Level Information
Class	Value	Design Variables
RANK	1	1	0	0
	2	0	1	0
	3	0	0	1
	4	0	0	0

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion	Intercept Only	Intercept and Covariates
AIC	501.977	470.517
SC	505.968	494.466
-2 Log L	499.977	458.517

The portion of the output labeled Model Fit Statistics describes and tests the overall fit of the model. The -2 Log L (499.977) can be used in comparisons of nested models.

Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr > ChiSq
Likelihood Ratio	41.4590	5	<.0001
Score	40.1603	5	<.0001
Wald	36.1390	5	<.0001

The likelihood ratio chi-square of 41.4590 with a p-value of 0.0001 tells us that our model as a whole fits significantly better than an empty model.
The Score and Wald tests are asymptotically equivalent tests of the same hypothesis tested by the likelihood ratio test, not surprisingly, these tests also indicate that the model is statistically significant.

Type 3 Analysis of Effects
Effect	DF	Wald Chi-Square	Pr > ChiSq
GRE	1	4.2842	0.0385
GPA	1	5.8714	0.0154
RANK	3	20.8949	0.0001

Analysis of Maximum Likelihood Estimates
Parameter		DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		1	-5.5414	1.1381	23.7081	<.0001
GRE		1	0.00226	0.00109	4.2842	0.0385
GPA		1	0.8040	0.3318	5.8714	0.0154
RANK	1	1	1.5514	0.4178	13.7870	0.0002
RANK	2	1	0.8760	0.3667	5.7056	0.0169
RANK	3	1	0.2112	0.3929	0.2891	0.5908

This shows the coefficients (labeled Estimate), their standard errors (error), the Wald Chi-Square statistic, and associated p-values. The coefficients for gre, and gpa are statistically significant, as are the terms for rank=1 and rank=2 (versus the omitted category rank=4). The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable.

For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
The coefficients for the categories of rank have a slightly different interpretation. For example, having attended an undergraduate institution with arank of 1, versus an institution with a rank of 4, increases the log odds of admission by 1.55.

Odds Ratio Estimates
Effect	Point Estimate	95% Wald Confidence Limits
GRE	1.002	1.000	1.004
GPA	2.235	1.166	4.282
RANK 1 vs 4	4.718	2.080	10.701
RANK 2 vs 4	2.401	1.170	4.927
RANK 3 vs 4	1.235	0.572	2.668

This gives the coefficients as odds ratios. An odds ratio is the exponentiated coefficient, and can be interpreted as the multiplicative change in the odds for a one unit change in the predictor variable.
For a one unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of 2.24.

Association of Predicted Probabilities and Observed Responses
Percent Concordant	69.1	Somers' D	0.386
Percent Discordant	30.6	Gamma	0.387
Percent Tied	0.3	Tau-a	0.168
Pairs	34671	c	0.693

Reference:

http://www.ats.ucla.edu/stat/sas/library/logistic.pdf

http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm

http://www.ats.ucla.edu/stat/sas/seminars/sas_logistic/logistic_prog.sas.txt

26/12/2015

[Excel]Notes for Excel 2010 Expert - Applying formula and function

=sum(A1:B1)
=sum(A1:B1,A2:B2) *two range
=sum(A1:B1 A1:A2) *only the overlap cell

press Ctrl many cell and input text then Ctrl + Enter to multi-input

=index(array,row_num,column_num)
=match(lookup_value,lookup_array,match_type)

=rank.eq(num,ref,order)
=mode.sngl(num1,num2...)

=choose(index_num,vaalue1,value2) *set 1 to text1 and 2 to text2

=text(value,format_text) * turn num to string

19/12/2015

[R]Basic syntax

Declare variable
x<-2
x<-y<-7

Remove variable
rm(x)

Check data type: numeric/character/date/Logical
class(x)
is.numeric(x)

Assign integer
x<-5L
is.integer(x)

15/12/2015

[SAS]Change the Work location to D:\ drive

Change Target to:
"C:\Program Files\SAS Institute\SAS\V8\sas.exe" -CONFIG "C:\Program Files\SAS Institute\SAS\V8\SASV8.CFG" -work "D:\temp"

Run below code to check:

proc options option=work;
run;

03/10/2015

[DOS command]Menu & Error checking

@echo off
CLS
TITLE Daniel Application Menu

echo.
echo ***************************************
echo * *
echo * Daniel Application Menu *
echo * *
echo ***************************************
echo.
echo 1 - Datamart and File Status Checking
echo 2 - Log Error Checking
echo.
echo.

::command to call different bat
set /p "cmd=Enter the command [Q > Exit]:"
IF %cmd% EQU 1 (
call "F:\project_DOS command\batch - Status checking.bat"
) ELSE (
IF %cmd% EQU 2 (
call "F:\project_DOS command\batch - Log error checking.bat"
) ELSE (
IF %cmd% EQU Q (
Exit
) ELSE (
echo Wrong Command
)))
pause

________________________________________________________________

@echo off
CLS
Title Error checking

call:checkerror "F:\project_DOS command\error(201509).log" test

::report layout
echo.
echo ***********************************
echo * *
echo * Error Checking *
echo * *
echo ***********************************
echo.
echo file name Error count
echo _____________________________________________
echo error(201509).txt %error_cnt_test%
echo.
echo.
::report layout

::command to call different bat
set /p "cmd=Enter the (0 to menu ; Q to Quit):
IF %cmd% EQU 0 (
call "F:\project_DOS command\Menu.bat"
) ELSE (
IF %cmd% EQU Q (
Exit
) ELSE (
call "F:\project_DOS command\batch - Log error checking.bat"
))
pause

::Funciton Declare
:checkerror
set "file=%~1"
set /a cnt=0
for /f %%a in ('type "%file%"^|find /I /c "error"') do set /a error_cnt_%2=%%a

27/09/2015

[D3.js]Bar chart

[D3.js]SVG code

26/09/2015

[D3.js]HTML +CSS +Javascript +D3 Basic coding

<!DOCTYPE html>
<html lang="en">
<head>

<meta charset="utf-8">
<script type="text/javascript" src="d3/d3.js">
</script>

<title></title>


<link rel="stylesheet" type="text/css" href="mystyle.css">

</head>

<body>


<script type="text/javascript" src="myscript.js"></script>

</body>
</html>

13/09/2015

[SAS]Email with SAS

reference: http://www2.sas.com/proceedings/sugi29/178-29.pdf

06/09/2015

[SAS]Create dummy variables

data b;
set a;
dummuy_a = (a="Y");
/*if a = Y then dummy_a = 1 else dummy_a =0*/
run;

[SAS]Sampling - Proc Surveyselect V.S. data step

Suppose you wish to select a random sample from a large SAS dataset. No problem. The PROC SURVEYSELECT step below randomly selects a 2 percent sample:

proc surveyselect data=large out=sample
  method=srs   /* simple random sample */
  n=1000000;  /* sample size */
run;

Do you have a SAS/STAT license? If not, or if you seek maximum efficiency, keep reading.

I benchmarked the above program against a SAS dataset with 50 million rows and 3 columns, running SAS 9.4 on my 64-bit Windows 7 machine.

Time: 2.8 seconds

Is the DATA step more efficient? It might be, if you know some tricks. Let's look at a few.

The DATA step below selects a random sample of approximately 1 million rows.

899  data sample;
900     set large;
901     if rand('uniform') < .02;
902  run;
 
NOTE: There were 50000000 observations read from the data set WORK.LARGE.
NOTE: The data set WORK.SAMPLE has 1001125 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           3.31 seconds
      cpu time            3.29 seconds

The RAND function generates random numbers between 0 and 1. The UNIFORM argument means that every value between 0 and 1 has an equal chance of being generated. The subsetting IF statement selects a random sample of approximately 2 percent.

Time: 3.3 seconds (Winner: PROC SURVEYSELECT, but keep reading)

The SET statement above reads all 50 million rows by default, which are then filtered by the subsetting IF statement. We can, however, change the SET statement to read only 1 million rows, bypassing the other 49 million, for a huge performance improvement, like this:

data sample;
   set large (obs=1000000);
run;

Time: .1 second

This, of course, is cheating, because the DATA step above reads the first million rows, which is not a random sample.

The DATA step below, like the one above, also reads only 1 million rows, bypassing the other 49 million, for very fast performance. This time, instead of reading the first million rows, the SET statement reads every 50th row.

data sample;
   do RowNum=50 to 50000000 by 50;
   set large point=RowNum;
   output;
   end;
   stop;
run;

The POINT= option designates a numeric variable whose value represents a row number. The SET statement jumps directly to each designated row, and reads that row, bypassing intervening rows.

The selected sample may be more representative than the first million rows, but it's still not random.

Time: .9 second

The DATA step below, like the two above, also reads only 1 million rows, bypassing the other 49 million. This time, however, the datastep selects a random sample, comparable to PROC SURVEYSELECT.

data sample;
 
   drop Need Pool Ratio Row;
 
   Need= 1000000;  * Desired sample size;
   Pool=50000000;  * Number of rows in dataset;
 
   do until (Need=0);         * Continue until no additional sample rows needed;
 
      Row+1;   * Row number. Every row in the pool is a candidate           
                 for possible inclusion in the sample;
 
      Ratio=Need/Pool;          * Initial ratio is .02 or 2 percent.  
  Ratio is continuoulsy adjusted as necessary;
 
if rand('uniform') < ratio then do; * Expression will be true for 
    exactly 2 percent of rows;
 
        set large point=Row;     * Jump to desired row. Read desired row;
     output;       * Output desired row;
     Need=Need-1;      * Because a row was selected, subtract 1  
  from remaining rows needed.;
  end;
 
      Pool=Pool-1;       * Subtract 1 from pool, regardless of       
  whether a row was selected or not;
 
   end;
 
   stop;
 
run;

Method Time Advantage

PROC SURVEYSELECT 2.8 seconds ease of use

DATA step 2.2 seconds speed

source from http://blogs.sas.com/content/sastraining/

[D3.js]Simple Bar Chart

<!doctype html>
<html>
<head>

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.6/d3.min.js" charset="utf-8"></script>

</head>
<body>

<script>

var dataArray =[20, 40, 100];

var canvas = d3.select("body")
.append("svg")
.attr("width", 500)
.attr("height", 500);

var bars = canvas.selectAll("rect")
.data(dataArray)
.enter()
.append("rect")
.attr("width", function(d){return d;})
.attr("height", 50)
.attr("y", function(d,i){return i*100;});

</script>

</body>

</html>

05/09/2015

[D3.js]First Class from youyube learning

D3 = Data-Driven Documents

<!doctype html>
<html>

<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.6/d3.min.js" charset="utf-8"></script>
</head>

<body>
<p> this is a paragraph</p>

<script>
d3.select("body")
.append("p")
.style("color","red")
.text("Hi");

console.log(d3)
</script>

</body>
</html>

## using notepad++

SVG shapes
var canvas = d3.select("body")
.append("svg")
.attr("width",500)
.attr("height",500);

var circle = canvas.append("circle")
.attr("cx",250)
.attr("cy",250)
.attr("r",250)
.attr("fill","red")

30/08/2015

[Data Visualization]Colour Template

http://loading.io/color/random/

http://colorschemedesigner.com/csd-3.5/

03/05/2015

[R]Beginning

Package
library(aod)

library(ggplot2)

Import
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")

View the first few rows of the data

head(mydata)

Generate simple statistic summary
summary(mydata)

20/02/2015

[SAS]SAS variables

X1-Xn
X:
X--A
X_numeric_A
X_character_A
_numeric_
_character_
_all_
_N_
_error_