[Repost] Has Data Science Become a Pseudo-Science?

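A summary of a discussion and its replies posted on the /r/DataScience subreddit
The author is disillusioned that data science is being carried out under the “generative AI” label alone, with no validation or evaluation
In reality, ChatGPT-generated code performed only a simple z-score calculation, and the project advanced to the brink of deployment without any evaluation of model performance
The community commonly pointed to a "ship it if it works" corporate culture, lack of validation, avoidance of accountability, and the sacrifice of scientific ethics
Many practitioners report similar problems and express strong concern about the drift toward "pseudo-science"
Some, however, argue that the practicality of fast experimentation and simple solutions should also be recognized, and stress a balanced view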

Data Science Has Become a Pseudo-Science

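The author completed a master's and a PhD in Europe and has worked in data science for 10 years, moving between industry and academia
Over the past two years, it has become increasingly common to present results under the "generative AI" name with no validation at all
For example, in a project aimed at anomaly detection in time series, ChatGPT-generated code computed only the z-score of the difference in means before and after each inflection point, and deployment was discussed without any performance metric
This is pseudo-science: asking a black box and following its answer without scientific thinking, while even questioning it is treated as taboo
As a result, the author is considering a return to academia and wrote the post to ask whether peers share this experience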
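The approach described above can be sketched in a few lines, which helps show how little it does on its own. This is a minimal illustration, not the team's actual code: the post does not say how the z-score was defined, so the sketch assumes the before/after mean differences are standardized across all series. The function names, the toy data, the 1.5 threshold, and the ground-truth labels are all hypothetical. The second function shows the kind of check the author asked for and did not get, namely comparing flags against known labels.

```python
import numpy as np


def mean_shift_zscores(series_list, inflection_idx):
    """Naive score: mean after the inflection point minus mean before it,
    standardized across all series (an assumed reading of the post)."""
    diffs = np.array([
        np.mean(s[i:]) - np.mean(s[:i])
        for s, i in zip(series_list, inflection_idx)
    ])
    std = diffs.std()
    if std == 0:
        # All series shifted by the same amount: nothing stands out.
        return np.zeros_like(diffs)
    return (diffs - diffs.mean()) / std


def precision_recall(flags, labels):
    """Precision and recall of boolean flags against boolean ground truth."""
    flags = np.asarray(flags, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    tp = int(np.sum(flags & labels))
    fp = int(np.sum(flags & ~labels))
    fn = int(np.sum(~flags & labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up data: four short series, each with its inflection at index 2.
series = [
    np.array([1.0, 1.0, 1.0, 1.0]),
    np.array([1.0, 1.0, 2.0, 2.0]),
    np.array([0.0, 0.0, 0.0, 0.0]),
    np.array([1.0, 1.0, 11.0, 11.0]),
]
inflection = [2, 2, 2, 2]
labels = [False, False, False, True]  # hypothetical ground truth

z = mean_shift_zscores(series, inflection)
flags = np.abs(z) > 1.5  # arbitrary threshold, chosen only for illustration
precision, recall = precision_recall(flags, labels)
print("z-scores:", np.round(z, 3))
print("flags:", flags.tolist(), "precision:", precision, "recall:", recall)
```

Without labeled cases, a baseline, or at least a sensitivity check on the threshold, the scoring function alone gives no indication of whether the flagged series correspond to fraud, which is the gap the author points to.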

Comment summary

Main points of agreement

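A “just ship it if it works” philosophy is pervasive (u/Illustrious-Pound266)
Some startups failed after pushing AI with no validation or roadmap (u/gothicserp3nt)
Unintentional bias and discrimination are not properly reviewed (u/tehMarzipanEmperor)
At most companies, RAG and AI are oversold and run for show rather than for accuracy (u/castleking, u/flowanvindir)
The mood in industry amounts to "performance theater" (u/Ty4Readin, u/faulerauslaender)
Rushed deployments, flashy reports, and AI adoption without measurement have become the norm in the push to show results (u/glittering_tiger8996, u/Emergency-Job4136)
Many see this as a long-standing situation that GenAI has merely made more blatant (u/RoomyRoots, u/303uru, u/TARehman)
Explainability is low and reliability is poor, but the output gets adopted because it is fast
Accountability for corporate decision-making is disappearing (u/empathic_psychopath8, u/Jollyhrothgar)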

Other perspectives

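If a simple approach solves the problem, it deserves pragmatic recognition (u/AnarkittenSurprise)
Many comments also held that DS “always included unscientific elements” or is “science in name only” (u/TaiChuanDoAddct, u/Time-Combination4710, u/LighterningZ)
What matters is less the use of AI tools itself than the ability to use them responsibly (u/Dror_sim, u/ResearchMindless6419)
Some criticized work where “there is data but no logic” and people are “just running packages without statistical knowledge” (u/gyp_casino, u/tmotytmoty)
Many hold that what really matters is domain knowledge and mathematical thinking, and that AI and coding are only tools (u/MightBeRong, u/Dror_sim)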

Institutional and educational issues

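MSDS programs are academically useful but often have little bearing on employment (u/throwaway_ghost_122)
As educational standards fall and demand grows for degrees for their own sake, quality across the field is declining (u/Yam_Cheap)
Academia is no exception: unvalidated papers and shallow analyses are increasingly common there too (u/joule_3am, u/Mishtle)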

Experiences by industry

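Insurance and healthcare, being strictly regulated, still require validity checks and legal review (u/Mishtle, u/mikka1)
By contrast, startups, sales, gaming, and some manufacturing favor speed and show (u/Vercingetorex89, u/Brackens_World)
In the public sector too, the adoption of ChatGPT is eroding earlier validation processes (u/TheFluffyEngineer, u/joule_3am)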

Disillusionment and thoughts of leaving

Many practitioners are considering leaving industry or moving to academia (u/thro0away12, u/Emotional_Plane_3500, u/candidFIRE)
A few take the positive view that this is a chance for genuinely skilled people to stand out (u/OddEditor2467, u/sideshowbob01)

Satire and resignation

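“These days, just importing pandas makes you a data scientist” (u/vesnikos)
The reality centers on pleasing the boss rather than on probabilistic thinking and scientific validation (u/tmotytmoty, u/WignerVille)
Many take the realist view that “then as now, it was a stretch to call DS in companies a science” (u/TaiChuanDoAddct, u/LighterningZ)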

Conclusion

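The post and its comments illustrate how data science practice has recently been driven by fast delivery and AI marketing rather than by scientific rigor and validation
There is deep concern that the “generative AI” label shuts down reasonable criticism, and that unvalidated code flows straight into deployment
Neither academia nor industry is perfect, but the discussion looks set to continue: for data science to become a ‘science’ in the true sense, the community needs critical thinking, better education, and reflection on its working culture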

https://news.hada.io/topic?id=22025&utm_source=slack&utm_medium=bot&utm_campaign=T079373U492

Original post: https://www.reddit.com/r/datascience/comments/1lluwlv/data_science_has_become_a_pseudoscience/

I’ve been working in data science for the last ten years, both in industry and academia, having pursued a master’s and PhD in Europe. My experience in the industry, overall, has been very positive. I’ve had the opportunity to work with brilliant people on exciting, high-impact projects. Of course, there were the usual high-stress situations, nonsense PowerPoints, and impossible deadlines, but the work largely felt meaningful.

However, over the past two years or so, it feels like the field has taken a sharp turn. Just yesterday, I attended a technical presentation from the analytics team. The project aimed to identify anomalies in a dataset composed of multiple time series, each containing a clear inflection point. The team’s hypothesis was that these trajectories might indicate entities engaged in some sort of fraud.

The team claimed to have solved the task using “generative AI”. They didn’t go into methodological details but presented results that, according to them, were amazing. Curious, especially since the project was heading toward deployment, I asked about validation, performance metrics, or baseline comparisons. None were presented.

Later, I found out that “generative AI” meant asking ChatGPT to generate code. The code simply computed the mean of each series before and after the inflection point, then calculated the z-score of the difference. No model evaluation. No metrics. No baselines. Absolutely no model criticism. Just a naive approach, packaged and executed very, very quickly under the label of generative AI.

The moment I understood the proposed solution, my immediate thought was "I need to get as far away from this company as possible". I share this anecdote because it summarizes much of what I’ve witnessed in the field over the past two years. It feels like data science is drifting toward a kind of pseudo-science where we consult a black-box oracle for answers, and questioning its outputs is treated as anti-innovation, while no one really understands how the outputs were generated.

After several experiences like this, I’m seriously considering focusing on academia. Working on projects like these is eroding any hope I have in the field. I know this won’t work, and yet the label generative AI seems to make it unquestionable. So I came here to ask: is this experience shared among other DSs?
