포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ STDDEV Oracle Function ]
STDDEV함수는 지정한 인수 expr의 표본 표준편차를 반환한다. 집계 함수와 분석함수로 이용할 수 있다. STDDEV_SAMP와의 차이점은 입력 데이터가 단지 1 행으로 구성된 경우 STDDEV는 Zero를 반환하고, STDDEV_SAMP는 Null을 반환한다.
- 함수 설명 : STDDEV 오라클 함수 링크
1. Oracle(오라클)
1) 집계 함수
급여의 표본 표준편차를 반환한다.
Oracle Programming |
SELECT STDDEV(sal) as STDDEV_VAR
FROM EMP
Results |
STDDEV_VAR
-------------------------------------------
1182.503223516271699458653359613061928508
2) 분석함수
개별 부서별 급여의 표본 표준편차를 반환한다.
Oracle Programming |
SELECT DEPTNO, EMPNO,
STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV
FROM EMP;
Results |
DEPTNO EMPNO STDDEV
--------------------------------------------------------
10 7782 1893.629671644731419588549193663502608485
10 7934 1893.629671644731419588549193663502608485
10 7839 1893.629671644731419588549193663502608485
20 7902 1123.332096933048557320586159803793607308
20 7788 1123.332096933048557320586159803793607308
20 7566 1123.332096933048557320586159803793607308
20 7369 1123.332096933048557320586159803793607308
20 7876 1123.332096933048557320586159803793607308
30 7521 668.331255192114045025291283415053677345
30 7654 668.331255192114045025291283415053677345
30 7844 668.331255192114045025291283415053677345
30 7900 668.331255192114045025291283415053677345
30 7499 668.331255192114045025291283415053677345
30 7698 668.331255192114045025291283415053677345
2. Python Pandas(파이썬)
1) 집계 함수
- stdev() 함수
급여의 표본 표준편차를 반환한다.
Python Programming |
from statistics import stdev
stdev(emp['sal'])
Results |
1182.5032235162716
- std() 함수
개별 부서별 급여의 표본 표준편차를 반환한다.
Python Programming |
withmooc = emp.copy()
sal_STDDEV = withmooc.sort_values('sal',ascending = False).groupby('deptno')['sal'].std()
sal_STDDEV
Results |
deptno
10 1893.629672
20 1123.332097
30 668.331255
Name: sal, dtype: float64
2) 분석함수
- transform('std')
분석함수를 사용하여서 개별 부서별 급여의 표본 표준편차를 반환한다.
Python Programming |
withmooc = emp.copy()
withmooc['sal_STDDEV'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform('std')
withmooc.sort_values('deptno').head(7)
Results |
empno ename job mgr hiredate sal comm deptno sal_STDDEV
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1893.629672
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1893.629672
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1893.629672
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1123.332097
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 1123.332097
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 1123.332097
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 1123.332097
3. R Programming (R Package)
1) 집계 함수
- sd() 함수
%%R
sd(emp$sal)
Results |
[1] 1182.503
2) 분석함수
- sd() 함수
%%R
withmooc <- emp
withmooc['sal_STDDEV'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { sd(x) }))
withmooc[order(withmooc$deptno),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_STDDEV
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1894.
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1894.
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1894.
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1123.
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1123.
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1123.
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1123.
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1123.
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 668.
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 668.
4. R Dplyr Package
1) 집계 함수
- sd() 함수
R Programming |
%%R
emp %>%
dplyr::summarize(sal_STDDEV = sd(sal) )
Results |
# A tibble: 1 x 1
sal_STDDEV
<dbl>
1 1183.
2) 분석함수
- sd() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_STDDEV = sd(sal) ) %>%
dplyr::arrange(deptno) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: deptno [2]
empno ename job mgr hiredate sal comm deptno sal_STDDEV
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1894.
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1894.
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1894.
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1123.
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1123.
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1123.
5. R sqldf Package
1) 집계 함수
- stdev() 함수
R Programming |
%%R
sqldf(" SELECT stdev(sal) sal_STDDEV FROM emp ")
Results |
sal_STDDEV
1 1182.503
- stddev_samp() 함수
RH2 패키지에서 지원하는 stddev_samp() 함수를 사용한다.
R Programming |
%%R
library(RH2)
sqldf(" SELECT stddev_samp(sal) sal_STDDEV FROM emp ")
Results |
sal_STDDEV
1 1182.503
2) 분석함수
- stdev() 함수
현재 stdev에 대한 분석 함수를 지원하지 않아서 그룹별 급여의 표준편차를 계산 후 데이터를 Join 하였다.
R Programming |
%%R
sqldf(" select a.*,b.sal_STDDEV
from emp a
left join (SELECT deptno,stdev(sal) sal_STDDEV FROM emp group by deptno) b
on a.deptno = b.deptno
order by a.deptno")[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_STDDEV
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1893.6297
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1893.6297
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1893.6297
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1123.3321
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1123.3321
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1123.3321
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1123.3321
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1123.3321
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 668.3313
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 668.3313
6. Python pandasql Package
- 현재 지원하지 않음.
7. R data.table Package
1) 집계 함수
- sd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, list(sal_STDDEV = sd(sal) ), by =deptno]
Results |
deptno sal_STDDEV
1: 20 1123.3321
2: 30 668.3313
3: 10 1893.6297
2) 분석함수
- sd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, b.sal_STDDEV := sd(sal), by = deptno][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno b.sal_STDDEV
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1893.6297
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1893.6297
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1893.6297
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1123.3321
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1123.3321
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1123.3321
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1123.3321
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1123.3321
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 668.3313
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 668.3313
- sd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, `:=` ( b.sal_STDDEV = sd(unlist(.SD)) ),
by = deptno,
.SDcols = c("sal")][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno b.sal_STDDEV
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1893.6297
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1893.6297
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1893.6297
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1123.3321
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1123.3321
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1123.3321
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1123.3321
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1123.3321
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 668.3313
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 668.3313
8. Python Duckdb의 SQL
1) 집계 함수
Python Programming |
%%sql
SELECT STDDEV(sal) as stdev_func FROM emp
Python Programming |
duckdb.sql(" SELECT STDDEV(sal) as stdev_func FROM emp ").df()
Results |
stdev_func
0 1182.503224
2) 분석 함수
Python Programming |
%%sql
SELECT DEPTNO, EMPNO,
STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV
FROM emp
order by deptno
LIMIT 6
Python Programming |
duckdb.sql(" SELECT DEPTNO, EMPNO, \
STDDEV(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV \
FROM emp \
order by deptno \
LIMIT 6 ").df()
Results |
deptno empno STDDEV
0 10 7782 1893.629672
1 10 7839 1893.629672
2 10 7934 1893.629672
3 20 7369 1123.332097
4 20 7566 1123.332097
5 20 7788 1123.332097
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글