포스팅 목차
144. List all employees with hire date in the format ‘June 4 1988’?
* 직원들의 입사일자를 월 일 년(‘June 4 1988’) 형식으로 출력하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- Month Name : abbreviated month(“Jan”), unabbreviated month(“January”)
- [날짜 포맷] 날짜 출력 포맷 지정
- 데이터 전처리 - SQL, Pandas, R Prog, Dplyr, Dfply, SQLDF, PANDASQL, DATA.TABLE, SAS, Proc Sql
|
1. Oracle(오라클)
to_char() 함수에 출력형식을 지정하여서 입사일자를 출력한다.
Oracle Programming |
select empno,ename,hiredate,
to_char(hiredate,'month dd yyyy') month_name
from emp;
2. Python Pandas(파이썬)
to_datetime()함수를 사용하여 문자형(object)으로 할당되어 있는 입사일자를 일자시간타입(datetime64)으로 변환 후 strftime() 함수를 사용하여 출력형식(영문월 일 년)을 지정하여 출력한다.
Python Programming |
pd.to_datetime(emp['hiredate']).dt.strftime('%B %d %Y').head()
Results |
0 December 17 1980
1 February 20 1981
2 February 22 1981
3 April 02 1981
4 September 28 1981
Name: hiredate, dtype: object
to_datetime()함수를 사용하여 문자형(object)으로 할당되어 있는 입사일자를 일자시간타입(datetime64)으로 변환 후 apply() 함수 내에서 strftime() 함수에 출력형식(영문월 일 년)을 지정하여 출력한다.
Python Programming |
pd.to_datetime(emp['hiredate']).apply(lambda x: x.strftime('%B %d %Y')).head()
Results |
0 December 17 1980
1 February 20 1981
2 February 22 1981
3 April 02 1981
4 September 28 1981
Name: hiredate, dtype: object
[참고] 변수 타입 확인하기
Python Programming |
from pandas.api.types import is_string_dtype
emp.info()
type(emp['hiredate'])
emp.dtypes
is_string_dtype(emp['hiredate'])
Results |
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 empno 14 non-null int64
1 ename 14 non-null object
2 job 14 non-null object
3 mgr 13 non-null float64
4 hiredate 14 non-null object
5 sal 14 non-null int64
6 comm 4 non-null float64
7 deptno 14 non-null int64
dtypes: float64(2), int64(3), object(3)
memory usage: 1.0+ KB
True
3. R Programming (R Package)
날짜 관련 로케일(locale)을 영문으로 지정하여서 format() 함수 내에서 출력형식(영문월 일 년)을 지정하여 출력한다. 월에 대하여 영문이름으로 출력을 위해 지정하였던 옵션을 초기화 하기 위하여 작업을 완료 후 “LC_TIME” 에 설정하였던 값을 제거하여 기본 시스템에서 제공하는 값이 사용된다.
- Colab에서는 month name이 기본으로 출력 됨.
R Programming |
%%R
class(withmooc$hiredate)
Results |
[1] "Date"
R Programming |
%%R
Sys.setlocale("LC_TIME","English")
withmooc <- emp
withmooc['Month_name'] <- format(as.Date(withmooc$hiredate),'%b %d %Y')
print(withmooc[1:7, -c(3,4) ])
Sys.setlocale("LC_TIME","")
Results |
# A tibble: 7 x 8
empno ename hiredate sal comm deptno salary Month_name
<dbl> <chr> <date> <dbl> <dbl> <dbl> <chr> <chr>
1 7369 SMITH 1980-12-17 800 NA 20 $920 Dec 17 1980
2 7499 ALLEN 1981-02-20 1600 300 30 $1,840 Feb 20 1981
3 7521 WARD 1981-02-22 1250 500 30 $1,438 Feb 22 1981
4 7566 JONES 1981-04-02 2975 NA 20 $3,421 Apr 02 1981
5 7654 MARTIN 1981-09-28 1250 1400 30 $1,438 Sep 28 1981
6 7698 BLAKE 1981-03-01 2850 NA 30 $3,278 Mar 01 1981
7 7782 CLARK 1981-01-09 2450 NA 10 $2,818 Jan 09 1981
[1] "Korean_Korea.949"
날짜 관련 로케일(locale)을 영문으로 지정한 후 as.character() 함수 내에서 출력형식(‘영문월 일 년’)을 지정하여 출력한다. 월에 대하여 영문 약칭명으로 출력하기 위해 지정하였던 옵션을 초기화 하기 위하여 작업을 완료 후 “LC_TIME” 에 설정하였던 값을 제거하여 시스템에서 제공하는 기본 값으로 다시 변경된다.
R Programming |
%%R
Sys.setlocale("LC_TIME","English")
withmooc <- emp
withmooc['Month_name'] <- as.character(withmooc$hiredate, format = '%b %d %Y')
print(withmooc[1:7, -c(3,4) ])
Sys.setlocale("LC_TIME","")
Results |
# A tibble: 7 x 8
empno ename hiredate sal comm deptno salary Month_name
<dbl> <chr> <date> <dbl> <dbl> <dbl> <chr> <chr>
1 7369 SMITH 1980-12-17 800 NA 20 $920 Dec 17 1980
2 7499 ALLEN 1981-02-20 1600 300 30 $1,840 Feb 20 1981
3 7521 WARD 1981-02-22 1250 500 30 $1,438 Feb 22 1981
4 7566 JONES 1981-04-02 2975 NA 20 $3,421 Apr 02 1981
5 7654 MARTIN 1981-09-28 1250 1400 30 $1,438 Sep 28 1981
6 7698 BLAKE 1981-03-01 2850 NA 30 $3,278 Mar 01 1981
7 7782 CLARK 1981-01-09 2450 NA 10 $2,818 Jan 09 1981
[1] "Korean_Korea.949"
4. R Dplyr Package
날짜 관련 로케일(locale)을 영문으로 지정한 후 as.character() 함수 내에서 출력형식(‘영문월약칭명 일 년’)을 지정하여 출력한다. 월에 대하여 영문 약칭명으로 출력하기 위해 지정하였던 옵션을 초기화 하기 위하여 작업을 완료 후 “LC_TIME” 에 설정하였던 값을 제거하여 시스템에서 제공하는 기본 값으로 다시 변경된다.
R Programming |
%%R
Sys.setlocale("LC_TIME","English")
emp %>%
dplyr::mutate(Month_name = as.character(hiredate, format = '%b %d %Y') ) %>%
head() %>%
print
Sys.setlocale("LC_TIME","")
Results |
# A tibble: 6 x 10
empno ename job mgr hiredate sal comm deptno salary Month_name
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr> <chr>
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 $920 Dec 17 1980
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 $1,840 Feb 20 1981
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 $1,438 Feb 22 1981
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 $3,421 Apr 02 1981
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 $1,438 Sep 28 1981
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 $3,278 Mar 01 1981
[1] "Korean_Korea.949"
5. R sqldf Package
substr() 함수를 사용하여서 입사월에 해당하는 영문 약칭명을 추출하고, strftime() 함수에 출력형식을 지정하여 입사일과 입사년을 선택한 후 파이프(‘||’) 연산자로 월의 영문 약칭명+입사일자+입사년월을 연결하여 출력한다.
- sqlite에서 month name 지원 안 함.(case when / substr 등으로 직접 전환 처리)
- https://stackoverflow.com/questions/650480/get-month-from-datetime-in-sqlite
R Programming |
%%R
sqldf( " select hiredate,
datetime(hiredate * 3600 * 24,'unixepoch') hiredate_dtime,
strftime('%d %Y', datetime(hiredate * 3600 * 24,'unixepoch')) day_year,
substr('JanFebMarAprMayJunJulAugSepOctNovDec', 1 + 3*strftime('%m', datetime(hiredate * 3600 * 24,'unixepoch')), -3) month_name,
substr('JanFebMarAprMayJunJulAugSepOctNovDec', 1 + 3*strftime('%m', datetime(hiredate * 3600 * 24,'unixepoch')), -3) ||' '||strftime('%d %Y', datetime(hiredate * 3600 * 24,'unixepoch')) month_name1
from emp" ) %>% head()
Results |
hiredate hiredate_dtime day_year month_name month_name1
1 1980-12-17 1980-12-17 00:00:00 17 1980 Dec Dec 17 1980
2 1981-02-20 1981-02-20 00:00:00 20 1981 Feb Feb 20 1981
3 1981-02-22 1981-02-22 00:00:00 22 1981 Feb Feb 22 1981
4 1981-04-02 1981-04-02 00:00:00 02 1981 Apr Apr 02 1981
5 1981-09-28 1981-09-28 00:00:00 28 1981 Sep Sep 28 1981
6 1981-03-01 1981-03-01 00:00:00 01 1981 Mar Mar 01 1981
6. Python pandasql Package
Python Programming |
import copy
ps.sqldf(" select hiredate, \
date(substr(hiredate,1,4)||'-'||substr(hiredate,6,2)||'-'||substr(hiredate,9,2)) hiredate_dtime, \
strftime('%d %Y',date(substr(hiredate,1,4)||'-'||substr(hiredate,6,2)||'-'||substr(hiredate,9,2))) day_year, \
substr('JanFebMarAprMayJunJulAugSepOctNovDec', 1 + 3*strftime('%m', date(substr(hiredate,1,4)||'-'||substr(hiredate,6,2)||'-'||substr(hiredate,9,2)) ), -3) month_name, \
substr('JanFebMarAprMayJunJulAugSepOctNovDec', 1 + 3*strftime('%m', date(substr(hiredate,1,4)||'-'||substr(hiredate,6,2)||'-'||substr(hiredate,9,2)) ), -3) ||' '||strftime('%d %Y', datetime(hiredate * 3600 * 24,'unixepoch')) month_name1 \
from emp ").head()
Results |
hiredate | hiredate_dtime | day_year | month_name | month_name | |
0 | 1980/12/17 | 1980-12-17 | 17 1980 | Dec | Dec 04 1975 |
1 | 1981/02/20 | 1981-02-20 | 20 1981 | Feb | Feb 05 1975 |
2 | 1981/02/22 | 1981-02-22 | 22 1981 | Feb | Feb 05 1975 |
3 | 1981/04/02 | 1981-04-02 | 02 1981 | Apr | Apr 05 1975 |
4 | 1981/09/28 | 1981-09-28 | 28 1981 | Sep | Sep 05 1975 |
7. R data.table Package
날짜 관련 로케일(locale)을 영문으로 지정한 후 as.character() 함수 내에서 출력형식(‘영문월약칭명 일 년’)을 지정하여 출력한다. 월에 대하여 영문 약칭명으로 출력하기 위해 지정하였던 옵션을 초기화 하기 위하여 작업을 완료 후 “LC_TIME” 에 설정하였던 값을 제거하여 시스템에서 제공하는 기본 값으로 다시 변경된다.
R Programming |
%%R
Sys.setlocale("LC_TIME","English")
DT <- data.table(emp)
dept_DT <- data.table(dept)
salgrade_DT <- data.table(salgrade)
DT[, Month_name := as.character(hiredate, format = '%b %d %Y') ]
print(DT[1:5, ])
Sys.setlocale("LC_TIME","")
Results |
empno ename job mgr hiredate sal comm deptno salary Month_name
1: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 $920 Dec 17 1980
2: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 $1,840 Feb 20 1981
3: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 $1,438 Feb 22 1981
4: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 $3,421 Apr 02 1981
5: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 $1,438 Sep 28 1981
[1] "Korean_Korea.949"
8. SAS Proc SQL
- 통계분석연구회 팁 참조(FORMATINFO / SASHELP.VFORMAT);
- NLDATE Function 링크
SAS Programming |
%%SAS sas
* 사용자 출력형식(format) 생성;
PROC FORMAT;
PICTURE DTTEST
LOW-HIGH = '%B %d %Y' (DATATYPE=DATE);
RUN;
* 사용자 출력형식(format) 생성;
proc format;
picture langtsda (default=50) other='%B, %d, %Y' (datatype=date);
run;
options locale=English_Unitedstates;
* options locale=Korean_Korea;
PROC SQL;
CREATE TABLE STATSAS_1 AS
select HIREDATE,
PUT(HIREDATE,monname.) AS HIREDATE_CHR,
PUT(HIREDATE,EURDFMY.) AS HIREDATE_CHR1,
PUT(HIREDATE,EURDFWDX.) AS HIREDATE_CHR2,
PUT(HIREDATE,DATE9.) AS HIREDATE_CHR3,
PUT(HIREDATE,DATE11.) AS HIREDATE_CHR4,
PUT(HIREDATE,NLDATEYM.) AS HIREDATE_CHR5,
PUT(HIREDATE,DTTEST17.) as HIREDATE_CHR6,
PUT(HIREDATE,langtsda19.) as HIREDATE_CHR7,
nldate(HIREDATE,'%B %d %Y') as HIREDATE_CHR8
from emp;
QUIT;
PROC PRINT;RUN;
Results |
hiredate | HIREDATE_CHR | HIREDATE_CHR1 | HIREDATE_CHR2 | HIREDATE_CHR3 | HIREDATE_CHR4 | HIREDATE_CHR5 | HIREDATE_CHR6 | HIREDATE_CHR7 | HIREDATE_CHR8 |
1980-12-17 | December | DEC1980 | 17 December 1980 | 17DEC1980 | 17-DEC-1980 | December 1980 | December 17 1980 | December, 17, 1980 | December 17 1980 |
1981-02-20 | February | FEB1981 | 20 February 1981 | 20FEB1981 | 20-FEB-1981 | February 1981 | February 20 1981 | February, 20, 1981 | February 20 1981 |
1981-02-22 | February | FEB1981 | 22 February 1981 | 22FEB1981 | 22-FEB-1981 | February 1981 | February 22 1981 | February, 22, 1981 | February 22 1981 |
1981-04-02 | April | APR1981 | 2 April 1981 | 02APR1981 | 02-APR-1981 | April 1981 | April 2 1981 | April, 2, 1981 | April 02 1981 |
1981-09-28 | September | SEP1981 | 28 September 1981 | 28SEP1981 | 28-SEP-1981 | September 1981 | September 28 1981 | September, 28, 1981 | September 28 1981 |
1981-03-01 | March | MAR1981 | 1 March 1981 | 01MAR1981 | 01-MAR-1981 | March 1981 | March 1 1981 | March, 1, 1981 | March 01 1981 |
1981-01-09 | January | JAN1981 | 9 January 1981 | 09JAN1981 | 09-JAN-1981 | January 1981 | January 9 1981 | January, 9, 1981 | January 09 1981 |
1982-12-09 | December | DEC1982 | 9 December 1982 | 09DEC1982 | 09-DEC-1982 | December 1982 | December 9 1982 | December, 9, 1982 | December 09 1982 |
1981-11-17 | November | NOV1981 | 17 November 1981 | 17NOV1981 | 17-NOV-1981 | November 1981 | November 17 1981 | November, 17, 1981 | November 17 1981 |
1981-09-08 | September | SEP1981 | 8 September 1981 | 08SEP1981 | 08-SEP-1981 | September 1981 | September 8 1981 | September, 8, 1981 | September 08 1981 |
1983-01-12 | January | JAN1983 | 12 January 1983 | 12JAN1983 | 12-JAN-1983 | January 1983 | January 12 1983 | January, 12, 1983 | January 12 1983 |
1981-12-03 | December | DEC1981 | 3 December 1981 | 03DEC1981 | 03-DEC-1981 | December 1981 | December 3 1981 | December, 3, 1981 | December 03 1981 |
1981-12-03 | December | DEC1981 | 3 December 1981 | 03DEC1981 | 03-DEC-1981 | December 1981 | December 3 1981 | December, 3, 1981 | December 03 1981 |
1982-01-23 | January | JAN1982 | 23 January 1982 | 23JAN1982 | 23-JAN-1982 | January 1982 | January 23 1982 | January, 23, 1982 | January 23 1982 |
9. SAS Data Step
SAS Programming |
%%SAS sas
options locale=English_Unitedstates;
* options locale=Korean_Korea;
* 사용자 입력형식(informat) 생성;
proc format;
invalue ifmt (default=25) '/(\D+) (\d+) (\d+)/' (REGEXP) = [ANYDTDTE25.]
other=_error_;
run;
PROC SQL;
CREATE TABLE STATSAS_2 AS
select HIREDATE,
PUT(HIREDATE,yymmddn8.) AS HIRE_CHAR_1,
INPUT('01'||HIREDATE_CHR||STRIP(put(year(today()),4.)),ANYDTDTE25.) AS HIRE_DATE FORMAT=monname.,
INPUT('01'||HIREDATE_CHR1,ANYDTDTE25.) AS HIRE_DATE1 FORMAT=EURDFMY.,
INPUT(compress(HIREDATE_CHR2),ANYDTDTE25.) AS HIRE_DATE2 FORMAT=EURDFWDX.,
INPUT(HIREDATE_CHR3,DATE9.) AS HIRE_DATE3 FORMAT=DATE9.,
INPUT(HIREDATE_CHR4,DATE11.) AS HIRE_DATE4 FORMAT=DATE11.,
INPUT('01'||HIREDATE_CHR5,ANYDTDTE25.) AS HIRE_DATE5 FORMAT=NLDATEYM.,
INPUT(HIREDATE_CHR6,ANYDTDTE25.) AS HIRE_DATE6 FORMAT=DTTEST17.,
INPUT(HIREDATE_CHR6,ifmt25.) AS HIRE_DATE6_1 FORMAT=DTTEST25.,
INPUT(HIREDATE_CHR6,ANYDTDTE25.) AS HIRE_DATE7 FORMAT=langtsda19.,
INPUT(HIREDATE_CHR8,ANYDTDTE25.) AS HIRE_DATE8 FORMAT=nldate.,
INPUT(HIREDATE_CHR8,nldate200.) AS HIRE_DATE8_1 FORMAT=nldate.
from STATSAS_1;
QUIT;
PROC PRINT data=STATSAS_2(obs=5);RUN;
Results |
hiredate | HIRE_CHAR_1 | HIRE_DATE | HIRE_DATE1 | HIRE_DATE2 | HIRE_DATE3 | HIRE_DATE4 | HIRE_DATE5 | HIRE_DATE6 | HIRE_DATE6_1 | HIRE_DATE7 | HIRE_DATE8 | HIRE_DATE8_1 |
1980-12-17 | 19801217 | December | DEC1980 | 17 December 1980 | 17DEC1980 | 17-DEC-1980 | December 1980 | December 17 1980 | December 17 1980 | December, 17, 1980 | December 17, 1980 | December 17, 1980 |
1981-02-20 | 19810220 | February | FEB1981 | 20 February 1981 | 20FEB1981 | 20-FEB-1981 | February 1981 | February 20 1981 | February 20 1981 | February, 20, 1981 | February 20, 1981 | February 20, 1981 |
1981-02-22 | 19810222 | February | FEB1981 | 22 February 1981 | 22FEB1981 | 22-FEB-1981 | February 1981 | February 22 1981 | February 22 1981 | February, 22, 1981 | February 22, 1981 | February 22, 1981 |
1981-04-02 | 19810402 | April | APR1981 | 2 April 1981 | 02APR1981 | 02-APR-1981 | April 1981 | April 2 1981 | April 2 1981 | April, 2, 1981 | April 02, 1981 | April 02, 1981 |
1981-09-28 | 19810928 | September | SEP1981 | 28 September 1981 | 28SEP1981 | 28-SEP-1981 | September 1981 | September 28 1981 | September 28 1981 | September, 28, 1981 | September 28, 1981 | September 28, 1981 |
10. Python Dfply Package
Python Programming |
emp >> \
mutate( hire_date_1 = make_symbolic(pd.to_datetime)(X.hiredate).dt.strftime('%B %d %Y') ,
hire_date_2 = make_symbolic(pd.to_datetime)(X.hiredate).apply(lambda x: x.strftime('%B %d %Y')) ) >> \
select( ~X.job, ~X.mgr, ~X.comm, ~X.deptno ) >> \
head()
Results |
empno | ename | hiredate | sal | hire_date_1 | hire_date_2 | |
0 | 7369 | SMITH | 1980/12/17 | 800 | December 17 1980 | December 17 1980 |
1 | 7499 | ALLEN | 1981/02/20 | 1600 | February 20 1981 | February 20 1981 |
2 | 7521 | WARD | 1981/02/22 | 1250 | February 22 1981 | February 22 1981 |
3 | 7566 | JONES | 1981/04/02 | 2975 | April 02 1981 | April 02 1981 |
4 | 7654 | MARTIN | 1981/09/28 | 1250 | September 28 1981 | September 28 1981 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트
댓글