author : hahally
start : 2020.1.11
Colab keep-alive script
function ClickConnect(){
    console.log("Clicked on connect button");
    document.querySelector("colab-connect-button").click()
}
setInterval(ClickConnect, 60000)
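Paste the script into the browser's developer-tools console while the Colab notebook page is open; it clicks the connect button once every 60000 ms (60 s), which keeps the session from being dropped for inactivity.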
Django
Common commands
django-admin startproject locallibrary    # create a project
python manage.py startapp catalog         # create an app
python manage.py runserver                # start the development server
python manage.py makemigrations           # generate database migrations
python manage.py migrate                  # apply the migrations
python manage.py createsuperuser          # create an admin account
views.py
import markdown

posts.content = markdown.markdown(
    posts.content,
    extensions=[
        # 'extra' bundles common extensions such as abbreviations and tables
        'markdown.extensions.extra',
        # syntax highlighting extension
        'markdown.extensions.codehilite',
    ]
)
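For context, a minimal sketch of a detail view built around that call. The Post model, the pk URL argument, and the post_detail.html template are illustrative assumptions, not part of the original notes:

import markdown
from django.shortcuts import get_object_or_404, render

from .models import Post  # assumed model with a `content` text field

def post_detail(request, pk):
    post = get_object_or_404(Post, pk=pk)
    # convert the stored Markdown to HTML before handing it to the template
    post.content = markdown.markdown(
        post.content,
        extensions=[
            'markdown.extensions.extra',
            'markdown.extensions.codehilite',
        ],
    )
    return render(request, 'post_detail.html', {'post': post})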
Scrapy common commands
scrapy startproject proj      # create a project
scrapy crawl spider_name      # run a spider
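A minimal spider sketch for reference; the class, the selectors, and the start URL (Scrapy's tutorial site quotes.toscrape.com) are illustrative assumptions:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'spider_name'  # the name used by `scrapy crawl spider_name`
    start_urls = ['https://quotes.toscrape.com/']

    def parse(self, response):
        # yield one item per quote block on the page
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get()}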
Python package index mirrors
Tsinghua University mirror
https://pypi.tuna.tsinghua.edu.cn/simple/
Aliyun mirror
http://mirrors.aliyun.com/pypi/simple/
USTC mirror
https://pypi.mirrors.ustc.edu.cn/simple/
Douban mirror
http://pypi.douban.com/simple/
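Any of these can be passed to pip for a single install with -i (numpy here is just a placeholder package):

pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple/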
Notes
Author : hahally
createTime : 2019.10.26
abstract : study notes for a big data minor
jdk
Environment variable configuration
Append the following at the bottom of /etc/profile or ~/.bashrc:
JAVA_HOME=/usr/java/jdk1.8.0_162
JRE_HOME=$JAVA_HOME/jre
# note: the variable Java actually reads is CLASSPATH (not CLASS_PATH)
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASSPATH PATH
[root@master~]# source /etc/profile    apply the configuration
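To verify the setup took effect (the version should match the JDK path above):

[root@master~]# java -version    should report java version "1.8.0_162"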
Windows DOS
Commands
C:\Users\ACER>netstat -aon | findstr "8081"    find the PID listening on port 8081
C:\Users\ACER>taskkill /f /t /pid 10144        kill that process tree by PID (use /pid for a numeric PID; /im expects an image name)
Linux
Commands
[root@master~]# tar -zxvf [*].tar.gz -C [path]                 extract an archive into [path]
[root@master~]# yum -y remove firewalld                        uninstall the firewall
[root@master~]# systemctl stop/status/start firewalld          stop / check the status of / start the firewall service
[root@master~]# netstat -tunlp | grep [port]                   check which process occupies a port
[root@master~]# sudo passwd root                               set the root password
[root@master~]# sudo ln -s /usr/local/jdk1.8.0_162/bin/ bin    create a symbolic link
[root@master~]# cp [-r] file/filedir filepath                  copy a file or directory

Ubuntu: ens33 lost, reconnect
[root@master~]# sudo service network-manager stop
[root@master~]# sudo rm /var/lib/NetworkManager/NetworkManager.state
[root@master~]# sudo gedit /etc/NetworkManager/NetworkManager.conf    (change false to true)
[root@master~]# sudo service network-manager restart

CentOS: ens33 lost, reconnect
[root@master~]# systemctl stop NetworkManager
[root@master~]# systemctl disable NetworkManager
[root@master~]# sudo ifup ens33                                bring ens33 back up
[root@master~]# systemctl restart network
[root@master~]# systemctl start NetworkManager

[root@master~]# sudo ps -e | grep ssh                          check whether the ssh service is running
git
Commands
git init                                               initialize a repository
git add filename                                       stage a file for commit
git commit [-m] [message]                              commit the staged changes
git remote add origin https://github.com/[username]/[repo].git
git push -u origin master -f                           push to the remote branch (-f force-overwrites remote history)
git clone https://github.com/[username]/[repo].git     clone a repository
docker
Commands
[root@master~]# sudo docker run -it -v /home/hahally/myimage:/data --name slave2 -h slave2 new_image:newhadoop /bin/bash    run a container with a shared directory
[root@master~]# sudo docker start slave2              start a container
[root@master~]# sudo docker exec -i -t s2 /bin/bash   enter a running container
[root@master~]# docker commit master new_image:tag    commit a container to an image
[root@master~]# sudo docker rm containername          remove a container
[root@master~]# sudo docker rmi imagesname            remove an image
[root@master~]# sudo docker rename name1 name2        rename a container
hadoop
Commands
[root@master~]# hadoop dfsadmin -report           report HDFS disk usage
[root@master~]# hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input /wordcount/output    run a jar
[root@master~]# hadoop dfsadmin -safemode leave   leave safe mode
[root@master~]# hadoop jar x.jar MainClassName [inputPath] [outputPath]
Running Hadoop's bundled MapReduce example
[root@master hadoop-2.7.5]# hadoop fs -mkdir -p /wordcount/input           create a directory
[root@master hadoop-2.7.5]# hadoop fs -put a.txt b.txt /wordcount/input    upload files into the input directory
[root@master hadoop-2.7.5]# cd share/hadoop/mapreduce/                     enter the directory containing the example jar
[root@master mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input /wordcount/output    run the jar
[root@master mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000     view the output
Spark
Environment variables
vim ~/.bashrc

# the environment variable configuration now looks like this
export HADOOP_HOME=/usr/local/hadoop
export SPARK_HOME=/usr/local/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
export JAVA_HOME=/usr/local/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/spark/bin:/usr/local/spark/sbin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH

# apply the configuration
source ~/.bashrc
spark-env.sh
cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
vim ./conf/spark-env.sh

# append the following configuration at the end of the file:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
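SPARK_DIST_CLASSPATH makes Spark pick up Hadoop's jars from the local installation, while HADOOP_CONF_DIR and YARN_CONF_DIR point Spark at the cluster configuration; the latter two are what allow the --master yarn submission shown below to find the cluster.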
Running
/usr/local/spark/bin/run-example SparkPi
/usr/local/spark/bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
/usr/local/spark/bin/spark-submit ../examples/src/main/python/pi.py 2>&1 | grep 'Pi'
/usr/local/spark/bin/spark-submit --master yarn --deploy-mode cluster /usr/local/spark/examples/src/main/python/wordcount.py hdfs://master:9000/words.txt
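For reference, a minimal sketch of what such a wordcount script looks like; this is an illustrative stand-in, not the actual examples/src/main/python/wordcount.py shipped with Spark:

import sys
from pyspark import SparkContext

if __name__ == '__main__':
    sc = SparkContext(appName='wordcount')
    # split lines into words, pair each word with 1, then sum the pairs per word
    counts = (sc.textFile(sys.argv[1])  # e.g. hdfs://master:9000/words.txt
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print(word, count)
    sc.stop()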
Caveats
Before re-running a jar, delete the /output directory first; Hadoop refuses to write to an existing output path, so the job fails and there is no output to view.
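For example, to clear the output of the wordcount run above:

[root@master~]# hadoop fs -rm -r /wordcount/output    remove the old output directory before re-running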
HBase environment variables
export JAVA_HOME=/usr/local/jdk1.8.0_162
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/usr/local/hadoop-2.7.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/usr/local/hbase-1.3.6/bin
export HBASE_HOME=/usr/local/hbase-1.3.6
export HBASE_CLASSPATH=/usr/local/hbase-1.3.6/lib/hbase-common-1.3.6.jar:/usr/local/hbase-1.3.6/lib/hbase-server-1.3.6.jar