This project is a public API designed to provide basketball data from the website basketball-reference.com. It uses Flask for hosting the endpoints, AWS RDS for PostgreSQL database hosting, and will be deployed using Zeet. The API currently has two endpoints, with more planned for future releases.
NOTE: The database is currently hosted on AWS RDS, but the endpoints still run only on localhost; they are not publicly available yet.
To get started with this project, clone the repository and install the required dependencies.
git clone https://github.com/yourusername/basketball-reference-api.git
cd basketball-reference-api
pip install -r requirements.txt
To run the API locally, execute the following command:
export FLASK_APP=endpoints.py
flask run
This will start the Flask server on http://127.0.0.1:5000/.
- Description: Retrieves player data.
- Method: GET
- Parameters:
  - player_name (required): the name of the player.
- Description: Retrieves team data.
- Method: GET
- Parameters:
  - team_id (required): the unique code for the team.
  - year (required): the year the team played in.
- Description: Retrieves the player's past few games.
- Method: GET
- Parameters:
  - player_name (required): the name of the player.
Basketball Reference allows data to be scraped from its website, but it rate limits quite strictly: more than a few scraping attempts within a few seconds can get your IP banned for over an hour. To work around this, the project has its own PostgreSQL database. When the API receives a GET request, it first checks whether the database already holds the requested data. If it does, the data is returned straight from the database without any scraping. If it does not, a web scraping script runs, and the newly scraped data is added to the database so that later requests for the same data need not scrape again.
Additionally, to avoid the IP ban, the API is rate limited to about one request every 5 seconds. Finally, a proxy rotation feature was recently implemented: when an endpoint receives a request for data, the web scraper is routed through a randomly chosen proxy to reduce the risk of an IP ban from basketball-reference.com. Together, these measures let the API retrieve and serve data from the site.
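The flow described above (check the database first, scrape only on a miss, throttle, and pick a random proxy) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the class name `CachedScraper`, the `scrape(key, proxy)` callable, and the in-memory dict standing in for the PostgreSQL table are all hypothetical. It also assumes the 5-second limit applies to outgoing scrapes, which the description does not state explicitly.

```python
import random
import time


class CachedScraper:
    """Cache-first lookup with throttled, proxy-routed scraping on a miss.

    Hypothetical sketch: a dict stands in for the PostgreSQL table, and
    `scrape` is any callable taking (key, proxy) and returning the data.
    """

    def __init__(self, scrape, proxies, min_interval=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._scrape = scrape
        self._proxies = list(proxies)
        self._min_interval = min_interval  # seconds between scrapes
        self._clock = clock                # injectable for testing
        self._sleep = sleep                # injectable for testing
        self._cache = {}                   # stand-in for the database
        self._last_scrape = None           # time of the previous scrape

    def get(self, key):
        # 1. Serve from the cache when the data is already stored.
        if key in self._cache:
            return self._cache[key]
        # 2. On a miss, wait out the rate limit before scraping.
        self._throttle()
        # 3. Route the scrape through a randomly chosen proxy.
        proxy = random.choice(self._proxies)
        data = self._scrape(key, proxy)
        # 4. Store the result so the next request skips the scrape.
        self._cache[key] = data
        return data

    def _throttle(self):
        if self._last_scrape is not None:
            wait = self._min_interval - (self._clock() - self._last_scrape)
            if wait > 0:
                self._sleep(wait)
        self._last_scrape = self._clock()
```

Calling `get` twice with the same key triggers only one scrape. A second, different key waits until the minimum interval has elapsed since the previous scrape. The `clock` and `sleep` parameters exist only so the timing can be replaced in tests.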
This module contains the Player class, which uses web scraping techniques to extract player data from basketball-reference.com.
This module contains the TeamScraper class, which uses web scraping techniques to extract team data from basketball-reference.com.
This module defines the Flask routes for the API endpoints.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes.
- Commit your changes (git commit -m 'Add some feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.