BBong's Story

Learn/'24_Fall_(EE599) DataScience

(Paper) GraphStorm

QBBong 2024. 12. 22. 22:42

GraphStorm: All-in-one Graph Machine Learning Framework for Industry Applications

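The professor's style is to hand out a list of recent papers, have each student pick one and present an analysis of it, and then run a quiz based on that presentation.

Out of the 40-odd papers, I picked this one because it looked like the easiest (it had no equations).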


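Summary

GraphStorm is a framework that simplifies applying graph machine learning (GML) in industrial settings and makes it scalable. Released in May 2023, it streamlines large-scale graph processing, model training, and inference through a no-code/low-code solution.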


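Key features

  1. Scalability: handles graphs with billions of nodes and edges, and scales to fit the available hardware.
  2. Ease of use: graph construction and training can be run with a one-line command.
  3. Advanced features:
    • Multimodal data integration: models text, image, and graph data together.
    • Featureless nodes: uses information from neighboring nodes or learnable embeddings.
    • Isolated nodes: improves performance through GNN distillation.
  4. Industry validation: performance verified on large-scale datasets such as Microsoft Academic Graph (MAG) and the Amazon Review Dataset.
  5. Architecture: a four-layer structure consisting of a distributed graph engine, a data pipeline, model training/inference, and a model zoo.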
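
To make the "featureless nodes" idea concrete, here is a minimal sketch of one of the two options mentioned in the list: building a feature for a node that has none from the features of its neighbors. This is plain Python and not GraphStorm's API; the function name `fill_missing_features`, the toy graph, and the choice of a simple mean are all assumptions made for illustration.

```python
from collections import defaultdict


def fill_missing_features(edges, features, dim):
    """Give each featureless node the mean of its neighbors' feature vectors.

    Hypothetical helper for illustration only, not part of GraphStorm.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    filled = dict(features)
    for node in neighbors:
        if node in features:
            continue  # node already has its own features
        known = [features[n] for n in neighbors[node] if n in features]
        if known:
            # Column-wise mean over the neighbors that do have features.
            filled[node] = [sum(col) / len(known) for col in zip(*known)]
        else:
            # No neighbor has features either: fall back to a zero vector.
            filled[node] = [0.0] * dim
    return filled


# Toy graph (made up): papers have features, authors do not.
edges = [("paper1", "author1"), ("paper2", "author1"), ("paper2", "author2")]
features = {"paper1": [1.0, 0.0], "paper2": [0.0, 1.0]}

filled = fill_missing_features(edges, features, dim=2)
print(filled["author1"])  # [0.5, 0.5]
print(filled["author2"])  # [0.0, 1.0]
```

The other option in the list, learnable embeddings, would replace the mean with a per-node vector that is trained together with the model.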

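Performance evaluation

  • Evaluation datasets: large heterogeneous graphs such as MAG and the Amazon Review Dataset.
  • Evaluation tasks:
    • Node classification: for example, predicting journal type or brand.
    • Link prediction: identifying paper citation or co-purchase relationships.
  • Efficiency:
    • Processes graphs with millions to hundreds of millions of nodes within a few hours.
    • Strong scalability and performance across a variety of datasets.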
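
As a rough picture of what the link prediction task above involves, the sketch below scores candidate destination nodes against a source node and checks where the true destination lands in the ranking. The embeddings are made-up toy values, and both the dot-product score and the reciprocal-rank measure are common choices assumed here for illustration; nothing in this post confirms they are what the paper uses.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reciprocal_rank(src, true_dst, candidates, emb):
    """Rank candidates by dot-product score with src; return 1 / rank of true_dst.

    Hypothetical helper for illustration only.
    """
    scores = {c: dot(emb[src], emb[c]) for c in candidates}
    ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)
    return 1.0 / (ranked.index(true_dst) + 1)


# Toy node embeddings (made up); in practice a trained GNN would produce these.
emb = {
    "paperA": [1.0, 0.0],
    "paperB": [0.9, 0.1],
    "paperC": [0.0, 1.0],
    "paperD": [0.5, 0.5],
}
candidates = ["paperB", "paperC", "paperD"]

# Does paperA cite paperB? The true destination is ranked first here.
rr_b = reciprocal_rank("paperA", "paperB", candidates, emb)
# paperD is only the second-best candidate for paperA.
rr_d = reciprocal_rank("paperA", "paperD", candidates, emb)
print(rr_b, rr_d)  # 1.0 0.5
```

Averaging this value over many held-out edges gives a single ranking score for the model.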

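Technical contributions and future work

  • Contributions:
    • A simplified GML pipeline.
    • A scalable modeling solution for industry graphs.
    • Performance improvements over existing production models.
  • Future directions:
    • Support for larger datasets.
    • Integration with cloud platforms.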

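[Presentation slides]

It has the advantage of being easy to apply, but since it is a framework built by Amazon, there is almost no publicly available information about it.

I don't know whether I'll ever get to use it properly.

Come to think of it, this semester had far more quizzes than last semester. (18 papers in total...)

(On the other hand, there were no assignments, and since the schedule slipped a bit, the milestones wrapped up at 3, so maybe I should say it was better than last semester... )

The recent papers are listed below.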

Recent paper list (2024)

  1. Levie, Ron
    A graphon-signal analysis of graph neural networks. Advances in Neural Information Processing Systems 36 (2024).
    Paper link

  2. Amirhossein Farzam, Allen Tannenbaum, and Guillermo Sapiro
    From Geometry to Causality-Ricci Curvature and the Reliability of Causal Inference on Networks. 41st International Conference on Machine Learning, 2024.
    Paper link

  3. Khang Nguyen, Nong Minh Hieu, Vinh Duc Nguyen, Nhat Ho, Stanley Osher, and Tan Minh Nguyen
    Revisiting over-smoothing and over-squashing using Ollivier-Ricci curvature. International Conference on Machine Learning, 2023.
    Paper link

  4. Glover, Cory, and Albert-László Barabási
    Measuring Entanglement in Physical Networks. Physical Review Letters 133, no. 7 (2024): 077401.
    Paper link

  5. Xue, Leyang, Shengling Gao, Lazaros K. Gallos, Orr Levy, Bnaya Gross, Zengru Di, and Shlomo Havlin
    Nucleation phenomena and extreme vulnerability of spatial k-core systems. Nature Communications 15, no. 1 (2024): 5850.
    Paper link

  6. Meng, Xiangyi, Onur Varol, and Albert-László Barabási
    Hidden citations obscure true impact in science. PNAS Nexus 3, no. 5 (2024): pgae155.
    Paper link

  7. Jiang, Chunheng, Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Yizhou Sun, and Jianxi Gao
    Network properties determine neural network performance. Nature Communications 15, no. 1 (2024): 5718.
    Paper link

  8. Li, Changbin, Kangshuo Li, Yuzhe Ou, Lance M. Kaplan, Audun Jøsang, Jin-Hee Cho, Dong Hyun Jeong, and Feng Chen
    Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty. ICLR 2024.
    Paper link

  9. Lianghao Xia and Chao Huang
    AnyGraph: Graph Foundation Model in the Wild. ACM KDD, 2024.
    Paper link

  10. Zappala, Emanuele, Antonio Henrique de Oliveira Fonseca, Josue Ortega Caro, Andrew Henry Moberly, Michael James Higley, Jessica Cardin, and David van Dijk
    Learning integral operators via neural integral equations. Nature Machine Intelligence (2024): 1-17.
    Paper link

  11. Sandhu, Romeil S., Tryphon T. Georgiou, and Allen R. Tannenbaum
    Ricci curvature: An economic indicator for market fragility and systemic risk. Science Advances 2, no. 5 (2016): e1501495.
    Paper link

  12. Cao, Qianying, Somdatta Goswami, and George Em Karniadakis
    Laplace neural operator for solving differential equations. Nature Machine Intelligence 6, no. 6 (2024): 631-640.
    Paper link

  13. Rusch, T. Konstantin, Nathan Kirk, Michael M. Bronstein, Christiane Lemieux, and Daniela Rus
    Message-Passing Monte Carlo: Generating low-discrepancy point sets via graph neural networks. Proceedings of the National Academy of Sciences 121, no. 40 (2024): e2409913121.
    Paper link

  14. Trivedi, Puja, Ryan A. Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, and Danai Koutra
    Editing Partially Observable Networks via Graph Diffusion Models. 41st International Conference on Machine Learning, 2024.
    Paper link

  15. Zhong, Yi, Gaozheng Li, Ji Yang, Houbing Zheng, Yongqiang Yu, Jiheng Zhang, Heng Luo, Biao Wang, and Zuquan Weng
    Learning motif-based graphs for drug–drug interaction prediction via local–global self-attention. Nature Machine Intelligence (2024): 1-12.
    Paper link

  16. Zahra Kadkhodaie, Florentin Guth, Eero P. Simoncelli, and Stéphane Mallat
    Generalization in diffusion models arises from geometry-adaptive harmonic representation. ICLR 2024 (Best Paper Award).
    Paper link

  17. Berrueta, Thomas A., Allison Pinosky, and Todd D. Murphey
    Maximum diffusion reinforcement learning. Nature Machine Intelligence (2024): 1-11.
    Paper link

  18. Gan, Quan, Minjie Wang, David Wipf, and Christos Faloutsos
    Graph Machine Learning Meets Multi-Table Relational Data. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024.
    Paper link

  19. Zheng, Da, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang et al.
    GraphStorm: All-in-one graph machine learning framework for industry applications. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024.
    Paper link

  20. Ke, Qing, Alexander J. Gates, and Albert-László Barabási
    A network-based normalized impact measure reveals successful periods of scientific discovery across disciplines. Proceedings of the National Academy of Sciences 120, no. 48 (2023): e2309378120.
    Paper link

  21. Mahowald, Kyle, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, and Evelina Fedorenko
    Dissociating language and thought in large language models. Trends in Cognitive Sciences (2024).
    Paper link

  22. Wu, Tao, Xiangyun Gao, Feng An, Xiaotian Sun, Haizhong An, Zhen Su, Shraddha Gupta, Jianxi Gao, and Jürgen Kurths
    Predicting multiple observations in complex systems through low-dimensional embeddings. Nature Communications 15, no. 1 (2024): 2242.
    Paper link

  23. Gan, Xiao, Zixin Shu, Xinyan Wang, Dengying Yan, Jun Li, Shany Ofaim, Réka Albert et al.
    Network medicine framework reveals generic herb-symptom effectiveness of traditional Chinese medicine. Science Advances 9, no. 43 (2023): eadh0215.
    Paper link

  24. Zhang, Yang, Lorenzo Boninsegna, Muyu Yang, Tom Misteli, Frank Alber, and Jian Ma
    Computational methods for analysing multiscale 3D genome organization. Nature Reviews Genetics 25, no. 2 (2024): 123-141.
    Paper link

  25. Chen, Guozhang, and Pulin Gong
    A spatiotemporal mechanism of visual attention: Superdiffusive motion and theta oscillations of neural population activity patterns. Science Advances 8, no. 16 (2022): eabl4995.
    Paper link

  26. Fatemi, Bahare, Jonathan Halcrow, and Bryan Perozzi
    Talk like a graph: Encoding graphs for large language models. ICLR 2024.
    Paper link

  27. Max, Kevin, Laura Kriener, Garibaldi Pineda García, Thomas Nowotny, Ismael Jaras, Walter Senn, and Mihai A. Petrovici
    Learning efficient backprojections across cortical hierarchies in real time. Nature Machine Intelligence (2024): 1-12.
    Paper link

  28. Cao, Duanhua, Geng Chen, Jiaxin Jiang, Jie Yu, Runze Zhang, Mingan Chen, Wei Zhang et al.
    Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling. Nature Machine Intelligence (2024): 1-13.
    Paper link

  29. Koh, Huan Yee, Anh TN Nguyen, Shirui Pan, Lauren T. May, and Geoffrey I. Webb
    Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data. Nature Machine Intelligence (2024): 1-15.
    Paper link

  30. Chen, Dong, Jian Liu, and Guo-Wei Wei
    Multiscale topology-enabled structure-to-sequence transformer for protein–ligand interaction predictions. Nature Machine Intelligence 6, no. 7 (2024): 799-810.
    Paper link

  31. Schlegel, P., Yin, Y., Bates, A.S. et al.
    Whole-brain annotation and multi-connectome cell typing of Drosophila. Nature 634, 139–152 (2024).
    Paper link

  32. Gu, Albert, and Tri Dao
    Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023).
    Paper link

  33. Ruiz, Luana, Luiz Chamon, and Alejandro Ribeiro
    Graphon neural networks and the transferability of graph neural networks. Advances in Neural Information Processing Systems 33 (2020): 1702-1712.
    Paper link

  34. Fabian, Christian, Kai Cui, and Heinz Koeppl
    Learning sparse graphon mean field games. International Conference on Artificial Intelligence and Statistics, 2023.
    Paper link

  35. Xia, Xinyue, Gal Mishne, and Yusu Wang
    Implicit graphon neural representation. International Conference on Artificial Intelligence and Statistics, 2023.
    Paper link

  36. Cheng, Chaoran, and Jian Peng
    Equivariant neural operator learning with graphon convolution. Advances in Neural Information Processing Systems 36 (2024).
    Paper link

  37. Sun, Yifei, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, and Lei Chen
    Fine-Tuning Graph Neural Networks by Preserving Graph Generative Patterns. Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
    Paper link

  38. Cervino, Juan, Luana Ruiz, and Alejandro Ribeiro
    Learning by transference: Training graph neural networks on growing graphs. IEEE Transactions on Signal Processing 71 (2023): 233-247.
    Paper link

  39. Ruiz, Luana, Luiz FO Chamon, and Alejandro Ribeiro
    Transferability properties of graph neural networks. IEEE Transactions on Signal Processing (2023).
    Paper link

  40. Chatterjee, Anirban, Soham Dan, and Bhaswar B. Bhattacharya
    Higher-Order Graphon Theory: Fluctuations, Degeneracies, and Inference. arXiv preprint arXiv:2404.13822 (2024).
    Paper link

  41. Cao, Yuxuan, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, and Yang Yang
    When to Pre-Train Graph Neural Networks? From Data Generation Perspective! Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
    Paper link

  42. Böker, Jan, Ron Levie, Ningyuan Huang, Soledad Villar, and Christopher Morris
    Fine-grained expressivity of graph neural networks. Advances in Neural Information Processing Systems 36 (2024).
    Paper link


Other interesting papers

  • He, Zhongmou, Jing Zhu, Shengyi Qian, Joyce Chai, and Danai Koutra
    LinkGPT: Teaching Large Language Models To Predict Missing Links. arXiv preprint arXiv:2406.04640 (2024).
    Paper link

  • Barbero, Federico, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João GM Araújo, Alex Vitvitskyi, Razvan Pascanu, and Petar Veličković
    Transformers need glasses! Information over-squashing in language tasks. arXiv preprint arXiv:2406.04267 (2024).
    Paper link