I would like a script, preferably in Python, that will
1.scan a folder and subfolders
2. remove some files based on their file extension
3. extract text from PDF, Word, Excel, Html and possibly other formats, from files within the folders.
4. write the extracted text to a .txt file that is named as the original
5. deletes the original PDF, Word, ...
Skills: html, python