본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 전처리- 문자함수 예제] 코드테이블 결합과 조건문 - 73

by 기서무나구물 2021. 8. 30.

포스팅 목차

    73. Display empno, ename, deptno from EMP table. Instead of display department numbers display the related department name (use decode function).

     

    * 부서번호에 해당하는 부서명을 출력하시오. (decode 함수, case when)


    • Oracle : 내부조인(Inner join), DECODE 구문, Case When 구문
    • 파이썬 Pandas : pd.merge(), np.select(), If 조건문 사용자 정의 함수, apply, lambda 함수, 딕셔너리 Map, IF 조건문
    • R 프로그래밍 : merge(), function(x), case_when(), ifelse()
    • R Dplyr Package : dplyr::left_join(), dplyr::select(), mutate(), dplyr::case_when() 조건문
    • R sqldf Package : 내부조인(Inner join), DECODE 구문, Case When 구문
    • Python pandasql Package : 내부조인(Inner join), DECODE 구문, Case When 구문
    • R data.table Package : dplyr::case_when() 조건문,
    • SAS Proc SQL : 내부조인(Inner join), Case When 구문
    • SAS Data Step : merge 구문, IF 조건문
    • Python Dfply Package : left_join 구문
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.
    Oracle Programming
    select e.empno, e.ename, d.dname 
    from   emp e,
           dept d 
    where  e.deptno = d.deptno

     


    decode 함수를 사용하여서 부서코드에 해당하는 부서명을 작성 후 검색하여서 해당하는 부서명 출력한다.

    Oracle Programming
    SELECT E.EMPNO,E.ENAME,DECODE(E.DEPTNO,10,'ACCOUNTING',20,'RESEARCH',30,'SALES',40,'OPERATIONS') dname 
    FROM   EMP E;

     


    2. Python Pandas(파이썬)

    • deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.
    Python Programming
    pd.merge(emp, dept, left_on='deptno', right_on='deptno', how='left')[["empno","ename","dname"]].head()

     

    Results
      empno ename dname
    0 7369 SMITH RESEARCH
    1 7499 ALLEN SALES
    2 7521 WARD SALES
    3 7566 JONES RESEARCH
    4 7654 MARTIN SALES

    np.select 함수를 사용하여서 부서코드에 해당하는 부서명을 출력한다. 조건을 비교하여서 대응되는 값을 선택하여 출력한다.

    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    withmooc['dname'] = np.select( [ withmooc['deptno'] == 10,
                                     withmooc['deptno'] == 20,
                                     withmooc['deptno'] == 30,
                                     withmooc['deptno'] == 40 ],
                                   ['ACCOUNTING','RESEARCH','SALES','OPERATIONS'] )
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno dname
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 RESEARCH
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 SALES
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 SALES
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 RESEARCH
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 SALES

     


    조건문으로 구성된 함수를 작성하여서 부서코드에 해당하는 부서명을 출력한다.

    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    def get_category(job):
        if   job == 10 : cat = 'ACCOUNTING'
        elif job == 20 : cat = 'RESEARCH'
        elif job == 30 : cat = 'SALES'
        elif job == 40 : cat = 'OPERATIONS'
        else : cat = 'Etc'
    
        return cat
    
    withmooc['dname'] = withmooc['deptno'].apply(lambda x : get_category(x))
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno dname
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 RESEARCH
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 SALES
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 SALES
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 RESEARCH
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 SALES

     


    부서코드와 대응되는 부서명으로 구성된 딕셔너리를 1차적으로 생성하고, map 함수를 사용하여서 딕셔너리에서 대응되는 부서명을 검색한다.

    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    # 부서코드와 대응되는 부서명으로 구성된 딕셔너리
    map_dictionary ={10 : "ACCOUNTING", 20 :"RESEARCH", 30 : "SALES",40:"OPERATIONS"} 
    
    withmooc['dname'] = withmooc['deptno'].map(map_dictionary) 
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno dname
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 RESEARCH
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 SALES
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 SALES
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 RESEARCH
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 SALES

     


    IF ~ ELSE 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.

    Python Programming
    withmooc['deptno'] =  withmooc['deptno'].apply(lambda x : 'ACCOUNTING' if x == 10 else ('RESEARCH' if x == 20 else ('SALES' if x == 30 else ('OPERATIONS' if x == 40 else 'ETC'))) ) 
    
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN RESEARCH
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 SALES
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 SALES
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN RESEARCH
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 SALES

     


    3. R Programming (R Package)

    deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.

    R Programming
    %%R
    
    merge(emp,dept,by.x=c("deptno"),
                   by.y=c("deptno"),all.x=T)[c("empno","ename","deptno","dname")][1:7, ]

     

    Results
      empno  ename deptno      dname
    1  7782  CLARK     10 ACCOUNTING
    2  7839   KING     10 ACCOUNTING
    3  7934 MILLER     10 ACCOUNTING
    4  7369  SMITH     20   RESEARCH
    5  7876  ADAMS     20   RESEARCH
    6  7566  JONES     20   RESEARCH
    7  7902   FORD     20   RESEARCH

     


    • case_when 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.
    R Programming
    %%R
    
    withmooc <- emp
    withmooc['dname'] = lapply(withmooc['deptno'], function(x) case_when( x == 10 ~ "ACCOUNTING",
                                                                          x == 20 ~ "RESEARCH",
                                                                          x == 30 ~ "SALES",
                                                                          x == 40 ~ "OPERATIONS",) ) 
    
    head(withmooc)

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno dname   
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>   
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 RESEARCH
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALES   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALES   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 RESEARCH
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALES   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 SALES   

     


    • ifelse 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.
    R Programming
    %%R
    withmooc <- emp
    
    withmooc['dname'] = ifelse(withmooc$deptno == 10 ,'ACCOUNTING',
                               ifelse(withmooc$deptno == 20, 'RESEARCH',
                                      ifelse(withmooc$deptno == 30, 'SALES',
                                             ifelse(withmooc$deptno == 40, 'OPERATIONS',
                                                    'ETC'))))
    head(withmooc)

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno dname   
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>   
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 RESEARCH
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALES   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALES   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 RESEARCH
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALES   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 SALES   

     


    4. R Dplyr Package

    deptno 테이블을 결합(Left Join)하여서 부서코드에 해당하는 부서명을 출력한다.

    R Programming
    %%R
    
    emp %>% 
      dplyr::left_join(dept, by = c('deptno' = "deptno")) %>%
      dplyr::select(empno, ename,deptno, dname) %>%
      head()

     

    Results
    # A tibble: 6 x 4
      empno ename  deptno dname   
      <dbl> <chr>   <dbl> <chr>   
    1  7369 SMITH      20 RESEARCH
    2  7499 ALLEN      30 SALES   
    3  7521 WARD       30 SALES   
    4  7566 JONES      20 RESEARCH
    5  7654 MARTIN     30 SALES   
    6  7698 BLAKE      30 SALES   

     


    case_when 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.

    R Programming
    %%R
    
    emp %>% 
      mutate(dname = dplyr::case_when(  deptno == 10 ~ "ACCOUNTING",
                                        deptno == 20 ~ "RESEARCH",
                                        deptno == 30 ~ "SALES",
                                        deptno == 40 ~ "OPERATIONS",
                                        TRUE ~ "ETC"
                                     )
             ) %>%
      head()

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno dname   
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>   
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 RESEARCH
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALES   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALES   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 RESEARCH
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALES   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 SALES   

     


    5. R sqldf Package

    deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.

    R Programming
    %%R
    
    sqldf(" select e.empno, e.ename, d.dname 
            from   emp e,dept d 
            where  e.deptno = d.deptno") %>% head()

     

    Results
      empno  ename    dname
    1  7369  SMITH RESEARCH
    2  7499  ALLEN    SALES
    3  7521   WARD    SALES
    4  7566  JONES RESEARCH
    5  7654 MARTIN    SALES
    6  7698  BLAKE    SALES

     


    • Case When 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.
    R Programming
    %%R
    
    sqldf("select E.EMPNO,E.ENAME,                                    \
                  case when E.DEPTNO = 10 then 'ACCOUNTING'           \
                       when E.DEPTNO = 20 then 'RESEARCH'             \
                       when E.DEPTNO = 30 then 'SALES'                \
                       when E.DEPTNO = 40 then 'OPERATIONS' end dname \
           from emp E") %>% head()

     

    Results
      empno  ename    dname
    1  7369  SMITH RESEARCH
    2  7499  ALLEN    SALES
    3  7521   WARD    SALES
    4  7566  JONES RESEARCH
    5  7654 MARTIN    SALES
    6  7698  BLAKE    SALES

     


    6. Python pandasql Package

    • deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.
    Python Programming
    ps.sqldf("select e.empno, e.ename, d.dname \
              from emp e inner join dept d on  e.deptno = d.deptno;").head()

     

    Results
      empno ename dname
    0 7369 SMITH RESEARCH
    1 7499 ALLEN SALES
    2 7521 WARD SALES
    3 7566 JONES RESEARCH
    4 7654 MARTIN SALES

     


    Python Programming
    ps.sqldf("SELECT E.EMPNO,E.ENAME, \
                     CASE WHEN E.DEPTNO = 10 THEN 'ACCOUNTING'           \
                          WHEN E.DEPTNO = 20 THEN 'RESEARCH'             \
                          WHEN E.DEPTNO = 30 THEN 'SALES'                \
                          WHEN E.DEPTNO = 40 THEN 'OPERATIONS' END dname \
              from emp E").head()

     

    Results
      empno ename dname
    0 7369 SMITH RESEARCH
    1 7499 ALLEN SALES
    2 7521 WARD SALES
    3 7566 JONES RESEARCH
    4 7654 MARTIN SALES

     


    7. R data.table Package

    • Case_when 조건문을 사용하여서 부서코드에 해당하는 부서명을 출력한다.
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[,ename_str := dplyr::case_when( deptno == 10 ~ "ACCOUNTING",
                                        deptno == 20 ~ "RESEARCH",
                                        deptno == 30 ~ "SALES",
                                        deptno == 40 ~ "OPERATIONS",
                                        TRUE ~ "ETC"
      )][1:7, ]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno  ename_str
    1:  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20   RESEARCH
    2:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30      SALES
    3:  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30      SALES
    4:  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20   RESEARCH
    5:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      SALES
    6:  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30      SALES
    7:  7782  CLARK  MANAGER 7839 1981-01-09 2450   NA     10 ACCOUNTING

     


    8. SAS Proc SQL

    • deptno 테이블을 결합하여서 부서코드에 해당하는 부서명을 출력한다.
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select e.empno, e.ename, d.dname 
        from   emp e, dept d 
        where  e.deptno = d.deptno;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=3);RUN;

     

    Results
    OBS empno ename dname
    1 7369 SMITH RESEARCH
    2 7499 ALLEN SALES
    3 7521 WARD SALES

     


    9. SAS Data Step

    • deptno 테이블을 결합(Merge)하여서 부서코드에 해당하는 부서명을 출력한다.
    SAS Programming
    %%SAS sas
    
    proc sort data=emp out=emp_1;
         by deptno;
    run;
    
    proc sort data=dept out=dept_1;
         by deptno;
    run;
    
    DATA STATSAS_2; 
     merge emp_1(in=a) dept_1(in=b);
         by deptno;
         if a;
         keep ename empno dname;
    RUN;
    
    PROC PRINT data=STATSAS_2(obs=3);RUN;

     

    Results
    OBS empno ename dname
    1 7782 CLARK ACCOUNTING
    2 7839 KING ACCOUNTING
    3 7934 MILLER ACCOUNTING

     


    10. Python Dfply Package

     

    Python Programming
    emp >> left_join(dept, by=['deptno']) >> select(X.empno, X.ename, X.deptno, X.dname) >> head()

     

    Results
      empno ename deptno dname
    0 7369 SMITH 20 RESEARCH
    1 7499 ALLEN 30 SALES
    2 7521 WARD 30 SALES
    3 7566 JONES 20 RESEARCH
    4 7654 MARTIN 30 SALES

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

    반응형

    댓글