
(Final Project) Machine Learning-based Intraday Stock Price Prediction with high-frequency data analysis

QBBong 2024. 12. 22. 22:25



Introduction

Stock market prediction is a complex but fascinating area in financial technology. Our project aimed to develop a machine-learning-based intraday stock price prediction model using high-frequency data. By progressing through three milestones, we refined our approach, integrated meaningful features, and tackled challenges in data handling and model optimization.

This blog post summarizes the journey through each milestone, highlighting our methods, findings, and lessons learned.

Milestone 1: Project Foundation

Objective

Define the scope and framework for the project, set up the foundational tools, and collect preliminary data.

Problem Definition
The project focused on predicting stock price movements using machine learning. Specifically, we aimed to:
- Analyze high-frequency intraday data.
- Incorporate external macroeconomic indicators for enhanced prediction.
Data Collection
We used yfinance to gather historical minute-by-minute price data for S&P 500 companies and AAPL. Each record included the Open, High, Low, Close, and Volume fields plus the Ticker symbol.
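
As a rough illustration of the data collection step, the sketch below pulls 1-minute bars for one ticker through yfinance's `Ticker.history` and keeps only the fields listed above. The function names `fetch_minute_bars` and `tidy_bars` and the `5d` period are assumptions made for this example, not the project's actual code. Yahoo Finance only serves 1-minute bars for a short recent window, so a longer history would need repeated downloads that are stored locally.

```python
import pandas as pd

FIELDS = ["Open", "High", "Low", "Close", "Volume"]


def tidy_bars(raw: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Keep the OHLCV fields and tag every row with its ticker symbol."""
    bars = raw.loc[:, FIELDS].copy()
    bars["Ticker"] = ticker
    return bars.sort_index()


def fetch_minute_bars(ticker: str, period: str = "5d") -> pd.DataFrame:
    """Download 1-minute bars for one ticker (needs network access)."""
    # Imported lazily so tidy_bars() can be used and tested offline.
    import yfinance as yf

    raw = yf.Ticker(ticker).history(period=period, interval="1m")
    return tidy_bars(raw, ticker)
```

Calling `fetch_minute_bars("AAPL")` would return one DataFrame per ticker; frames for several tickers can then be stacked with `pd.concat`.
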
Initial Setup
Tools such as Python, pandas, and scikit-learn were used for data preprocessing and exploratory analysis.

Outcome

We laid the groundwork for the later milestones by securing reliable data access and framing price prediction as a machine learning problem.

Milestone 2: Feature Engineering and Labeling

Objective

Engineer meaningful features and label the data for supervised learning.

Feature Engineering
We introduced Bollinger Bands as a technical indicator of stock price volatility:
- SMA: the simple moving average of the price over a rolling window.
- Upper Band and Lower Band: SMA ± 2 × the rolling standard deviation over the same window.
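
The band definitions above can be sketched with pandas rolling windows. The 20-bar window, the column names, and the toy price series are assumptions for illustration; the post does not state which window length the project used.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Return the SMA and the upper/lower bands at k standard deviations."""
    sma = close.rolling(window).mean()
    # pandas uses the sample standard deviation (ddof=1) by default.
    std = close.rolling(window).std()
    return pd.DataFrame({"SMA": sma, "Upper": sma + k * std, "Lower": sma - k * std})


# Toy random-walk prices, only to show the call.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 0.1, 120).cumsum())
bands = bollinger_bands(close)
```

The first `window - 1` rows are NaN because the rolling window is not yet full; those rows are usually dropped before training.
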
Triple Barrier Labeling
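We labeled the data with the triple-barrier method:
- Profit Target and Stop Loss: the upper and lower price barriers, defined from the standard deviation scaled by correction factors.
- Time Limit: in the standard formulation of the method, a third, time-based barrier closes each window if neither price barrier is touched first.
- Labels: 1 for upward movement (profit target hit first), -1 for downward movement (stop loss hit first), and 0 for no significant change.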
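
A minimal sketch of triple-barrier labeling follows, assuming the barriers sit at the current price plus or minus a per-bar volatility estimate scaled by a correction factor, and that the window closes after a fixed number of bars. The parameter names (`horizon`, `up_factor`, `down_factor`) are hypothetical, and the exact barrier formula used in the project may differ.

```python
import numpy as np
import pandas as pd


def triple_barrier_labels(
    close: pd.Series,
    vol: pd.Series,
    horizon: int = 30,
    up_factor: float = 1.0,
    down_factor: float = 1.0,
) -> pd.Series:
    """Label each bar 1, -1, or 0 by the first barrier the later prices touch.

    vol is a per-bar volatility estimate in price units (e.g. a rolling std).
    """
    prices = close.to_numpy(dtype=float)
    sigma = vol.to_numpy(dtype=float)
    labels = np.zeros(len(prices), dtype=int)
    for i in range(len(prices)):
        if np.isnan(sigma[i]):
            continue  # no volatility estimate yet: label stays 0
        upper = prices[i] + up_factor * sigma[i]    # profit target
        lower = prices[i] - down_factor * sigma[i]  # stop loss
        # Walk forward until a price barrier is hit or the time limit expires.
        for p in prices[i + 1 : i + 1 + horizon]:
            if p >= upper:
                labels[i] = 1
                break
            if p <= lower:
                labels[i] = -1
                break
    return pd.Series(labels, index=close.index, name="label")


close = pd.Series([100.0, 100.5, 101.2, 100.1, 99.0, 98.5, 99.2])
vol = pd.Series(1.0, index=close.index)
labels = triple_barrier_labels(close, vol, horizon=3)
# labels: [1, -1, -1, -1, 0, 0, 0]
```

Widening a barrier (a larger correction factor) turns more bars into 0 labels, and narrowing it does the opposite, which is why these factors double as a knob for class balance. Bars near the end of the series see fewer than `horizon` future bars and default to 0 here.
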
External Indicators
We added macroeconomic factors, including:
- Fed Rate: the federal funds rate, representing the level of interest rates.
- Crude Oil Price: a proxy for global economic conditions.
- VIX Index: a measure of expected market volatility.
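
Macroeconomic series are typically much lower-frequency than minute bars, so they have to be aligned before they can serve as features. One way to do this, sketched below, is `pandas.merge_asof` with `direction="backward"`, which attaches the most recent macro observation at or before each bar. The column names and all numeric values are made up for illustration.

```python
import pandas as pd


def attach_macro(bars: pd.DataFrame, macro: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest macro row at or before each bar's timestamp."""
    bars = bars.sort_values("timestamp")
    macro = macro.sort_values("timestamp")
    return pd.merge_asof(bars, macro, on="timestamp", direction="backward")


# Toy minute bars and toy daily indicators (values are placeholders).
bars = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-11-04 09:30", "2024-11-04 09:31", "2024-11-05 09:30"]
    ),
    "Close": [100.0, 100.1, 101.0],
})
macro = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-11-04", "2024-11-05"]),
    "fed_rate": [5.0, 5.0],
    "crude_oil": [70.0, 71.0],
    "vix": [20.0, 19.0],
})
merged = attach_macro(bars, macro)
```

If a daily value is only published after the market closes, stamping it at midnight of the same day would leak future information into intraday bars; shifting the macro timestamps forward by one day before merging avoids that.
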
Challenges
- Handling high-frequency data required robust cleaning and interpolation.
- Class imbalance in the labels required experimenting with the correction factors until the label distribution was reasonably balanced.
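
The cleaning step can be sketched as follows, assuming a single trading session indexed by timestamp: drop duplicate timestamps, rebuild the full one-minute grid, interpolate missing prices linearly in time, and treat missing volume as zero. The function name and the choice of linear interpolation are assumptions; the post does not say which interpolation the project used.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def clean_minute_bars(bars: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, regularize to a 1-minute grid, and fill gaps (one session)."""
    # Keep the last record when a timestamp appears more than once.
    bars = bars[~bars.index.duplicated(keep="last")].sort_index()
    # Rebuild every minute between the first and last observed bar.
    grid = pd.date_range(bars.index.min(), bars.index.max(), freq="1min")
    bars = bars.reindex(grid)
    # Linear-in-time interpolation for prices; no trades means zero volume.
    bars[PRICE_COLS] = bars[PRICE_COLS].interpolate(method="time")
    bars["Volume"] = bars["Volume"].fillna(0)
    return bars
```

Interpolation uses the next observed price to fill a gap, so it looks slightly into the future; forward-filling (`ffill`) is the stricter alternative when that matters. For data spanning several days, the function should be applied per session so that overnight gaps are not filled.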

Outcome

This milestone concluded with a fully labeled dataset and a well-engineered feature set for model training.

Milestone 3: Model Training and Finalization

Objective

Train machine learning models and evaluate their performance using the engineered features.

Model Selection
- Experimented with a Random Forest classifier and Support Vector Machines (SVM), chosen for their robustness and interpretability; the Random Forest in particular exposes feature importances.
Data Splitting
Divided the dataset into training, validation, and testing sets to ensure unbiased evaluation.
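
For time-ordered data, a common way to make such a split is chronologically, so that validation and test bars always come after the training bars. The sketch below assumes that approach and a 70/15/15 ratio; the post does not state the ratios or whether the project's split was chronological.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_frac: float = 0.7, val_frac: float = 0.15):
    """Split a time-indexed frame into train/validation/test without shuffling."""
    df = df.sort_index()
    n = len(df)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


# Toy frame with 100 one-minute rows.
index = pd.date_range("2024-11-04 09:30", periods=100, freq="1min")
data = pd.DataFrame({"Close": range(100)}, index=index)
train, val, test = chronological_split(data)
```

A shuffled split would let the model train on bars that occur after the ones it is evaluated on, which tends to overstate performance on price data.
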
Optimization
- Correction factors were fine-tuned to achieve balanced labels.
- Feature importance analysis revealed that external indicators significantly improved prediction accuracy.
Performance Evaluation
Precision, recall, and F1-score showed that the models were strongest at predicting upward movements. Downward movements were harder to predict, which we attribute to class imbalance.
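
To show how per-class metrics and feature importances can be read off a Random Forest, here is a small sketch on synthetic data. The feature names are hypothetical, the labels are generated from the first synthetic column purely so the example has some signal, and `class_weight="balanced"` is one common mitigation for imbalance rather than something the project is known to have used. The numbers it prints say nothing about the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
feature_names = ["bb_width", "fed_rate", "crude_oil", "vix"]  # hypothetical
X = rng.normal(size=(600, 4))
# Synthetic, imbalanced labels: -1 is deliberately the rarest class.
y = np.where(X[:, 0] > 0.3, 1, np.where(X[:, 0] < -1.2, -1, 0))

# Ordered split: the first 450 rows train, the remaining 150 test.
X_train, X_test = X[:450], X[450:]
y_train, y_test = y[:450], y[450:]

model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 reported separately for each label.
print(classification_report(y_test, y_pred, labels=[-1, 0, 1], zero_division=0))

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
```

Reading the report row by row makes an imbalance problem visible: a class with few samples typically shows lower recall even when overall accuracy looks fine.
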
Conclusions
- Successfully captured market trends using high-frequency data and macroeconomic indicators.
- Insights gained could be extended to real-time trading systems with further refinements.

Conclusion

This project demonstrated the potential of combining technical and macroeconomic features to predict stock price movements. Each milestone addressed critical aspects of the problem, from foundational setup to feature engineering and final model training.

While the results were promising, challenges such as data imbalance and computational efficiency remain areas for future exploration. This journey reinforced the importance of data preprocessing, feature selection, and iterative optimization in developing predictive models.

What’s Next?

For future work, integrating deep learning models like LSTMs could capture temporal dependencies in high-frequency data. Additionally, experimenting with real-time data pipelines may bring the project closer to deployment in live trading scenarios.

(Milestone3)Presentation.pdf

1.8 MB

(Final)Project_Report.pdf

1.7 MB
