BBong's Story
(Final Project) Machine Learning-based Intraday Stock Price Prediction with high-frequency data analysis
QBBong 2024. 12. 22. 22:25
Introduction
Stock market prediction is a complex but fascinating area in financial technology. Our project aimed to develop a machine-learning-based intraday stock price prediction model using high-frequency data. By progressing through three milestones, we refined our approach, integrated meaningful features, and tackled challenges in data handling and model optimization.
This blog post summarizes the journey through each milestone, highlighting our methods, findings, and lessons learned.
Milestone 1: Project Foundation
Objective
Define the scope and framework for the project, set up the foundational tools, and collect preliminary data.
- Problem Definition
  The project focused on predicting stock price movements using machine learning. Specifically, we aimed to:
  - Analyze high-frequency intraday data.
  - Incorporate external macroeconomic indicators for enhanced prediction.
- Data Collection
  We leveraged yfinance to gather historical minute-by-minute price data for S&P 500 companies and AAPL. Data fields included Open, High, Low, Close, Volume, and Ticker.
- Initial Setup
  Tools such as Python, pandas, and scikit-learn were used for data preprocessing and exploratory analysis.
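The data collection step can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names `fetch_minute_bars` and `tidy_bars`, the `5d` period, and the cleaning rules are assumptions. Yahoo typically serves 1-minute bars only for a short recent window, so the period is kept small.

```python
import pandas as pd

# Column layout described in the post.
FIELDS = ["Open", "High", "Low", "Close", "Volume", "Ticker"]


def tidy_bars(raw: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Keep the OHLCV fields, tag rows with the ticker, drop unusable rows."""
    bars = raw.copy()
    bars["Ticker"] = ticker
    bars = bars[FIELDS]
    # Drop repeated timestamps and put the bars in chronological order.
    bars = bars[~bars.index.duplicated(keep="first")].sort_index()
    # A bar without a closing price cannot be used for features or labels.
    return bars.dropna(subset=["Close"])


def fetch_minute_bars(ticker: str, period: str = "5d") -> pd.DataFrame:
    """Download 1-minute bars for one ticker (requires network access)."""
    # Imported lazily so that tidy_bars can be used without yfinance installed.
    import yfinance as yf

    raw = yf.Ticker(ticker).history(period=period, interval="1m")
    return tidy_bars(raw, ticker)
```

Calling `fetch_minute_bars("AAPL")` would return one row per minute; looping over a list of tickers and concatenating the results gives a multi-ticker table.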
Outcome
We laid the groundwork for future milestones by ensuring reliable data access and defining the problem in terms of machine learning.
Milestone 2: Feature Engineering and Labeling
Objective
Engineer meaningful features and label the data for supervised learning.
- Feature Engineering
  Introduced Bollinger Bands as a technical indicator to capture stock price volatility:
  - SMA: Simple Moving Average.
  - Upper Band and Lower Band: SMA ± 2 × Standard Deviation.
- Triple Barrier Labeling
  Data was labeled using the triple-barrier method:
  - Profit Target and Stop Loss: Defined based on standard deviation and correction factors.
  - Labels: 1 for upward movement, -1 for downward movement, and 0 for no significant change.
- External Indicators
  Added macroeconomic factors, including:
  - Fed Rate: Represents interest rates.
  - Crude Oil Price: Reflects global economic health.
  - VIX Index: Captures market volatility.
- Challenges
  - Handling high-frequency data required robust cleaning and interpolation.
  - Balancing labels involved experimenting with correction factors to address class imbalance.
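The Bollinger Band features and the external indicators listed above could be built along these lines. This is a sketch under assumptions: the 20-bar window, the use of the sample standard deviation, and the helper names `add_bollinger_bands` and `attach_daily_indicators` are not taken from the project, and the macro table is assumed to be a daily series with one column per indicator.

```python
import pandas as pd


def add_bollinger_bands(
    bars: pd.DataFrame, window: int = 20, k: float = 2.0
) -> pd.DataFrame:
    """Append SMA, Upper Band and Lower Band computed from the Close column."""
    out = bars.copy()
    rolling = out["Close"].rolling(window)
    out["SMA"] = rolling.mean()
    std = rolling.std()  # sample standard deviation (pandas default, ddof=1)
    out["Upper Band"] = out["SMA"] + k * std
    out["Lower Band"] = out["SMA"] - k * std
    return out


def attach_daily_indicators(bars: pd.DataFrame, macro: pd.DataFrame) -> pd.DataFrame:
    """Give each minute bar the latest daily macro row at or before its timestamp."""
    # merge_asof needs both indexes sorted and of the same dtype
    # (both timezone-aware or both timezone-naive).
    return pd.merge_asof(
        bars.sort_index(),
        macro.sort_index(),
        left_index=True,
        right_index=True,
        direction="backward",
    )
```

If the daily values are end-of-day figures, shifting the macro table forward by one day before merging would avoid giving intraday bars information that was not yet available.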
Outcome
This milestone concluded with a fully labeled dataset and a well-engineered feature set for model training.
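For readers unfamiliar with triple-barrier labeling, the idea can be sketched as below. The details here are assumptions rather than the project's exact rules: barriers are placed at the current close plus or minus a correction factor times a volatility estimate, touches are checked on closing prices only, and the 30-bar horizon and the name `triple_barrier_labels` are illustrative.

```python
import numpy as np
import pandas as pd


def triple_barrier_labels(
    close: pd.Series,
    vol: pd.Series,
    horizon: int = 30,
    pt_factor: float = 1.0,
    sl_factor: float = 1.0,
) -> pd.Series:
    """Label each bar 1, -1 or 0 by which barrier is touched first.

    1  -> profit target (upper barrier) hit first
    -1 -> stop loss (lower barrier) hit first
    0  -> neither hit within `horizon` bars (vertical barrier)
    """
    prices = close.to_numpy(dtype=float)
    sigma = vol.to_numpy(dtype=float)
    labels = np.zeros(len(prices), dtype=int)
    for t in range(len(prices)):
        if not sigma[t] > 0:
            continue  # missing or zero volatility: label stays 0 in this sketch
        upper = prices[t] + pt_factor * sigma[t]
        lower = prices[t] - sl_factor * sigma[t]
        # Walk forward until a horizontal barrier is touched or the horizon ends.
        for future in prices[t + 1 : t + 1 + horizon]:
            if future >= upper:
                labels[t] = 1
                break
            if future <= lower:
                labels[t] = -1
                break
    return pd.Series(labels, index=close.index, name="Label")
```

Raising `pt_factor` and `sl_factor` widens the barriers and produces more 0 labels, while lowering them produces more 1 and -1 labels, which is one way the correction factors can be used to rebalance the classes.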
Milestone 3: Model Training and Finalization
Objective
Train machine learning models and evaluate their performance using the engineered features.
- Model Selection
  - Experimented with Random Forest Classifier and Support Vector Machines (SVM) for their robustness and interpretability.
- Data Splitting
  Divided the dataset into training, validation, and testing sets to ensure unbiased evaluation.
- Optimization
  - Correction factors were fine-tuned to achieve balanced labels.
  - Feature importance analysis revealed that external indicators significantly improved prediction accuracy.
- Performance Evaluation
  Metrics such as precision, recall, and F1-score highlighted strengths in predicting upward movements. Downward movement predictions posed challenges due to class imbalance.
- Conclusions
  - Successfully captured market trends using high-frequency data and macroeconomic indicators.
  - Insights gained could be extended to real-time trading systems with further refinements.
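The training and evaluation steps above might look roughly like this with scikit-learn. Several choices are assumptions rather than the project's settings: the split is chronological with 70/15/15 proportions, `class_weight="balanced"` is one alternative way to handle imbalance (the project tuned correction factors instead), and the names `chronological_split` and `train_and_report` are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report


def chronological_split(frame: pd.DataFrame, train: float = 0.7, val: float = 0.15):
    """Split rows in time order so later bars never leak into training."""
    n = len(frame)
    i, j = int(n * train), int(n * (train + val))
    return frame.iloc[:i], frame.iloc[i:j], frame.iloc[j:]


def train_and_report(frame: pd.DataFrame, feature_cols, label_col: str = "Label"):
    """Fit a random forest and report precision, recall and F1 per class."""
    train, val, test = chronological_split(frame)
    model = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    )
    model.fit(train[feature_cols], train[label_col])
    reports = {
        name: classification_report(
            part[label_col], model.predict(part[feature_cols]), zero_division=0
        )
        for name, part in (("validation", val), ("test", test))
    }
    # Impurity-based importances: a rough view of which features the forest used.
    importances = pd.Series(
        model.feature_importances_, index=list(feature_cols)
    ).sort_values(ascending=False)
    return model, reports, importances
```

The validation report is meant for tuning choices such as the correction factors, with the test report reserved for the final check.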
Conclusion
This project demonstrated the potential of combining technical and macroeconomic features to predict stock price movements. Each milestone addressed critical aspects of the problem, from foundational setup to feature engineering and final model training.
While the results were promising, challenges such as data imbalance and computational efficiency remain areas for future exploration. This journey reinforced the importance of data preprocessing, feature selection, and iterative optimization in developing predictive models.
What’s Next?
For future work, integrating deep learning models like LSTMs could capture temporal dependencies in high-frequency data. Additionally, experimenting with real-time data pipelines may bring the project closer to deployment in live trading scenarios.