본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 전처리- 문자함수 예제] 문자 대문자 변환 - 63

by 기서무나구물 2021. 8. 24.

포스팅 목차

    63. Display the names of employees in Upper case.

     

    * 직원의 이름을 대문자로 변경하여 출력하시오.


    • Oracle : Upper 함수
    • 파이썬 Pandas : upper()
    • R 프로그래밍 : lapply(), toupper 함수
    • R Dplyr Package : mutate(), toupper 함수
    • R sqldf Package : upper 함수
    • Python pandasql Package : upper 함수
    • R data.table Package : stringr::str_to_upper()
    • SAS Proc SQL : Upper 함수
    • SAS Data Step : upcase() 함수
    • Python Dfply Package : .str.upper()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • Upper 함수를 사용하여서 직원 이름을 대문자로 변환한다.
    Oracle Programming
    select ename,
           upper(ename) ename_upper
    from   emp;

     


    2. Python Pandas(파이썬)

    • lower 함수는 직원명을 소문자로 변환하고, upper() 함수는 대문자로 변환한다.
    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    withmooc['ename'] = withmooc['ename'].str.lower()
    display(withmooc.head())
    print()
    
    withmooc['ename'] = withmooc['ename'].str.upper()
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 smith CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 allen SALESMAN 7698.0 1981/02/20 1600 300.0 30
    2 7521 ward SALESMAN 7698.0 1981/02/22 1250 500.0 30
    3 7566 jones MANAGER 7839.0 1981/04/02 2975 NaN 20
    4 7654 martin SALESMAN 7698.0 1981/09/28 1250 1400.0 30

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30

     


    3. R Programming (R Package)

    tolower 함수는 직원명을 소문자로 변환하고, toupper 함수는 대문자로 변환한다.

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['ename'] = lapply(withmooc['ename'],tolower)
    print(head(withmooc))
    
    withmooc['ename'] = lapply(withmooc['ename'],toupper)
    head(withmooc)

     

    Results
    # A tibble: 6 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7369 smith  CLERK     7902 1980-12-17   800    NA     20
    2  7499 allen  SALESMAN  7698 1981-02-20  1600   300     30
    3  7521 ward   SALESMAN  7698 1981-02-22  1250   500     30
    4  7566 jones  MANAGER   7839 1981-04-02  2975    NA     20
    5  7654 martin SALESMAN  7698 1981-09-28  1250  1400     30
    6  7698 blake  MANAGER   7839 1981-03-01  2850    NA     30
    # A tibble: 6 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30

     


    4. R Dplyr Package

    tolower 함수는 직원명을 소문자로 변환하고, toupper 함수는 대문자로 변환한다. mutate() 함수를 사용하여서 신규 변수를 생성한다.

    R Programming
    %%R
    
    emp %>% 
      dplyr::mutate(ename_lower = tolower(ename)) %>%
      dplyr::mutate(ename_upper = toupper(ename)) %>%
      head()

     

    Results
    # A tibble: 6 x 10
      empno ename  job        mgr hiredate     sal  comm deptno ename_lower ename_upper
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>       <chr>      
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 smith       SMITH      
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 allen       ALLEN      
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 ward        WARD       
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 jones       JONES      
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 martin      MARTIN     
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 blake       BLAKE      

     


    5. R sqldf Package

    Upper 함수를 사용하여서 직원 이름을 대문자로 변환한다.

    R Programming
    %%R
    
    sqldf("select upper(ename) ename_upper from emp") %>% head()

     

    Results
      ename_upper
    1       SMITH
    2       ALLEN
    3        WARD
    4       JONES
    5      MARTIN
    6       BLAKE

     


    6. Python pandasql Package

    Upper 함수를 사용하여서 직원 이름을 대문자로 변환한다.

    Python Programming
    ps.sqldf("select upper(ename) ename_upper from emp").head()

     

    Results
      ename_upper
    0 SMITH
    1 ALLEN
    2 WARD
    3 JONES
    4 MARTIN

     


    7. R data.table Package

    stringr::str_to_lower 함수는 직원명을 소문자로 변환하고, stringr::str_to_upper 함수는 대문자로 변환한다.

    R Programming
    %%R
    
    DT <- data.table(emp)
    DT[, ':='( ename_lower = stringr::str_to_lower(ename),
               ename_upper = stringr::str_to_upper(ename) ) ][1:5,]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno ename_lower ename_upper
    1:  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20       smith       SMITH
    2:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30       allen       ALLEN
    3:  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30        ward        WARD
    4:  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20       jones       JONES
    5:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30      martin      MARTIN

     


    8. SAS Proc SQL

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select upper(ename) as ename_upper
        from   emp;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=3);RUN;

     

    Results
    OBS ename_upper
    1 SMITH
    2 ALLEN
    3 WARD

     


    9. SAS Data Step

    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2; 
     SET emp;
         ename_lowcase = lowcase(ename);
         ename_upper = upcase(ename_lowcase);
         keep ename ename_lowcase ename_upper;
    RUN;
    
    PROC PRINT data=STATSAS_2(obs=3);RUN;

     

    Results
    OBS ename ename_lowcase ename_upper
    1 SMITH smith SMITH
    2 ALLEN allen ALLEN
    3 WARD ward WARD

     


    10. Python Dfply Package

    Python Programming
    emp >> mutate(ename_lower = X.ename.str.lower(), 
                  ename_upper = X.ename.str.upper() ) >> \
      head()

     

    Results
      empno ename job mgr hiredate sal comm deptno ename_lower ename_upper
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 smith SMITH
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 allen ALLEN
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 ward WARD
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 jones JONES
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 martin MARTIN

     


    [ 다양한 데이터 전처리 방법 목록 링크] SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE

    반응형

    댓글